From f25ba6dccc3bfe7e1524f4498a171be038507c45 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Mon, 8 May 2017 15:54:30 -0700
Subject: mm, compaction: reorder fields in struct compact_control

Patch series "try to reduce fragmenting fallbacks", v3.

Last year, Johannes Weiner has reported a regression in page mobility
grouping [1] and while the exact cause was not found, I've come up with
some ways to improve it by reducing the number of allocations falling
back to different migratetype and causing permanent fragmentation.

The series was tested with mmtests stress-highalloc modified to do
GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats of
each run, as the extfrag stats are quite volatile (note the stats below
are sums, not averages, as it was less perl hacking for me).

Success rate are the same, already high due to the low allocation order
used, so I'm not including them.

Compaction stats:
(the patches are stacked, and I haven't measured the
non-functional-changes patches separately)

                                patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
Compaction stalls                 22449       24680       24846       19765       22059       17480
Compaction success                12971       14836       14608       10475       11632        8757
Compaction failures                9477        9843       10238        9290       10426        8722
Page migrate success            3109022     3370438     3312164     1695105     1608435     2111379
Page migrate failure             911588     1149065     1028264     1112675     1077251     1026367
Compaction pages isolated       7242983     8015530     7782467     4629063     4402787     5377665
Compaction migrate scanned    980838938   987367943   957690188   917647238   947155598  1018922197
Compaction free scanned       557926893   598946443   602236894   594024490   541169699   763651731
Compaction cost                   10243       10578       10304        8286        8398        9440

Compaction stats are mostly within noise until patch 4, which decreases
the number of compactions, and migrations. Part of that could be due to
more pageblocks marked as unmovable, and async compaction skipping
those. This changes a bit with patch 7, but not so much. Patch 8
increases free scanner stats and migrations, which comes from the
changed termination criteria. Interestingly number of compactions
decreases - probably the fully compacted pageblock satisfies multiple
subsequent allocations, so it amortizes.

Next comes the extfrag tracepoint, where "fragmenting" means that an
allocation had to fallback to a pageblock of another migratetype which
wasn't fully free (which is almost all of the fallbacks). I have locally
added another tracepoint for "Page steal" into steal_suitable_fallback()
which triggers in situations where we are allowed to do
move_freepages_block(). If we decide to also do
set_pageblock_migratetype(), it's "Pages steal with pageblock" with
break down for which allocation migratetype we are stealing and from
which fallback migratetype. The last part "due to counting" comes from
patch 4 and counts the events where the counting of movable pages
allowed us to change pageblock's migratetype, while the number of free
pages alone wouldn't be enough to cross the threshold.

                                                          patch 1   patch 2   patch 3   patch 4   patch 7   patch 8
Page alloc extfrag event                                 10155066   8522968  10164959  15622080  13727068  13140319
Extfrag fragmenting                                      10149231   8517025  10159040  15616925  13721391  13134792
Extfrag fragmenting for unmovable                          159504    168500    184177     97835     70625     56948
Extfrag fragmenting unmovable placed with movable          153613    163549    172693     91740     64099     50917
Extfrag fragmenting unmovable placed with reclaim.           5891      4951     11484      6095      6526      6031
Extfrag fragmenting for reclaimable                          4738      4829      6345      4822      5640      5378
Extfrag fragmenting reclaimable placed with movable          1836      1902      1851      1579      1739      1760
Extfrag fragmenting reclaimable placed with unmov.           2902      2927      4494      3243      3901      3618
Extfrag fragmenting for movable                           9984989   8343696   9968518  15514268  13645126  13072466
Pages steal                                                179954    192291    210880    123254     94545     81486
Pages steal with pageblock                                  22153     18943     20154     33562     29969     33444
Pages steal with pageblock for unmovable                    14350     12858     13256     20660     19003     20852
Pages steal with pageblock for unmovable from mov.          12812     11402     11683     19072     17467     19298
Pages steal with pageblock for unmovable from recl.          1538      1456      1573      1588      1536      1554
Pages steal with pageblock for movable                       7114      5489      5965     11787     10012     11493
Pages steal with pageblock for movable from unmov.           6885      5291      5541     11179      9525     10885
Pages steal with pageblock for movable from recl.             229       198       424       608       487       608
Pages steal with pageblock for reclaimable                    689       596       933      1115       954      1099
Pages steal with pageblock for reclaimable from unmov.        273       219       537       658       547       667
Pages steal with pageblock for reclaimable from mov.          416       377       396       457       407       432
Pages steal with pageblock due to counting                                                11834     10075      7530
... for unmovable                                                                          8993      7381      4616
... for movable                                                                            2792      2653      2851
... for reclaimable                                                                          49        41        63

What we can see is that "Extfrag fragmenting for unmovable" and "...
placed with movable" drops with almost each patch, which is good as we
are polluting less movable pageblocks with unmovable pages.

The most significant change is patch 4 with movable page counting. On
the other hand it increases "Extfrag fragmenting for movable" by 50%.
"Pages steal" drops though, so these movable allocation fallbacks find
only small free pages and are not allowed to steal whole pageblocks
back. "Pages steal with pageblock" raises, because the patch increases
the chances of pageblock migratetype changes to happen. This affects all
migratetypes.

The summary is that patch 4 is not a clear win wrt these stats, but I
believe that the tradeoff it makes is a good one. There's less pollution
of movable pageblocks by unmovable allocations. There's less stealing
between pageblock, and those that remain have higher chance of changing
migratetype also the pageblock itself, so it should more faithfully
reflect the migratetype of the pages within the pageblock. The increase
of movable allocations falling back to unmovable pageblock might look
dramatic, but those allocations can be migrated by compaction when
needed, and other patches in the series (7-9) improve that aspect.

Patches 7 and 8 continue the trend of reduced unmovable fallbacks and
also reduce the impact on movable fallbacks from patch 4.

[1] https://www.spinics.net/lists/linux-mm/msg114237.html

This patch (of 8):

While currently there are (mostly by accident) no holes in struct
compact_control (on x86_64), but we are going to add more bool flags, so
place them all together to the end of the structure. While at it, just
order all fields from largest to smallest.

Link: http://lkml.kernel.org/r/20170307131545.28577-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Acked-by: Johannes Weiner
Cc: Joonsoo Kim
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/internal.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 04d08ef91224..004471b72977 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -183,6 +183,7 @@ extern int user_min_free_kbytes;
 struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
+	struct zone *zone;
 	unsigned long nr_freepages;	/* Number of isolated free pages */
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long total_migrate_scanned;
@@ -190,16 +191,15 @@ struct compact_control {
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
+	int order;			/* order a direct compactor needs */
+	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
+	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool ignore_block_suitable;	/* Scan blocks considered unsuitable */
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
-	int order;			/* order a direct compactor needs */
-	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
-	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
-	const int classzone_idx;	/* zone index of a direct compactor */
-	struct zone *zone;
 	bool contended;			/* Signal lock or sched contention */
 };
 
--
cgit

From 228d7e33903040a0b9dd9a5ee9b3a49c538c0613 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Mon, 8 May 2017 15:54:33 -0700
Subject: mm, compaction: remove redundant watermark check in compact_finished()

When detecting whether compaction has succeeded in forming a high-order
page, __compact_finished() employs a watermark check, followed by an own
search for a suitable page in the freelists. This is not ideal for two
reasons:

- The watermark check also searches high-order freelists, but has a
  less strict criteria wrt fallback. It's therefore redundant and waste
  of cycles. This was different in the past when high-order watermark
  check attempted to apply reserves to high-order pages.

- The watermark check might actually fail due to lack of order-0 pages.
  Compaction can't help with that, so there's no point in continuing
  because of that. It's possible that high-order page still exists and
  it terminates.

This patch therefore removes the watermark check. This should save some
cycles and terminate compaction sooner in some cases.
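
As an illustration of the reasoning above, here is a toy model (not kernel
code) of the freelist search that __compact_finished() is left with once
the watermark check is gone. struct toy_zone, its nr_free counters and
toy_suitable_page_free() are hypothetical stand-ins for zone->free_area[]
and the free list tests; the fallback to other migratetypes that the real
function also considers is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants, chosen to resemble a typical configuration. */
#define TOY_MAX_ORDER		11
#define TOY_MIGRATE_TYPES	3

enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE };

struct toy_zone {
	/* nr_free[order][mt]: free pages of that order on the list of mt */
	unsigned long nr_free[TOY_MAX_ORDER][TOY_MIGRATE_TYPES];
};

/*
 * Return true if a free page of at least the requested order sits on the
 * free list of the requested migratetype. No watermark is consulted, so
 * the result depends only on whether a suitable high-order page exists.
 */
static bool toy_suitable_page_free(const struct toy_zone *zone,
				   unsigned int order,
				   enum toy_migratetype mt)
{
	assert(order < TOY_MAX_ORDER);

	for (unsigned int o = order; o < TOY_MAX_ORDER; o++) {
		if (zone->nr_free[o][mt] > 0)
			return true;
	}

	return false;
}
```

In this sketch a zone whose only free page is a single order-5 unmovable
page already satisfies an order-4 unmovable request, regardless of how
many order-0 pages are free, which is the situation the second bullet
describes.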

Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Acked-by: Johannes Weiner
Cc: Joonsoo Kim
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/compaction.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 09c5282ebdd2..01b1fb8f6f47 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1280,7 +1280,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 			    const int migratetype)
 {
 	unsigned int order;
-	unsigned long watermark;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1308,13 +1307,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
-	/* Compaction run is not finished if the watermark is not met */
-	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
-
-	if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
-							cc->alloc_flags))
-		return COMPACT_CONTINUE;
-
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &zone->free_area[order];
--
cgit

From 3bc48f96cf11ce8699e419d5e47ae0d456403274 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Mon, 8 May 2017 15:54:37 -0700
Subject: mm, page_alloc: split smallest stolen page in fallback

The __rmqueue_fallback() function is called when there's no free page of
requested migratetype, and we need to steal from a different one. There
are various heuristics to make this event infrequent and reduce
permanent fragmentation. The main one is to try stealing from a
pageblock that has the most free pages, and possibly steal them all at
once and convert the whole pageblock. Precise searching for such
pageblock would be expensive, so instead the heuristics walks the free
lists from MAX_ORDER down to requested order and assumes that the block
with highest-order free page is likely to also have the most free pages
in total.

Chances are that together with the highest-order page, we steal also
pages of lower orders from the same block. But then we still split the
highest order page. This is wasteful and can contribute to fragmentation
instead of avoiding it.

This patch thus changes __rmqueue_fallback() to just steal the page(s)
and put them on the freelist of the requested migratetype, and only
report whether it was successful. Then we pick (and eventually split)
the smallest page with __rmqueue_smallest(). This all happens under zone
lock, so nobody can steal it from us in the process.

This should reduce fragmentation due to fallbacks. At worst we are only
stealing a single highest-order page and waste some cycles by moving it
between lists and then removing it, but fallback is not exactly hot path
so that should not be a concern. As a side benefit the patch removes
some duplicate code by reusing __rmqueue_smallest().

[vbabka@suse.cz: fix endless loop in the modified __rmqueue()]
  Link: http://lkml.kernel.org/r/59d71b35-d556-4fc9-ee2e-1574259282fd@suse.cz
Link: http://lkml.kernel.org/r/20170307131545.28577-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Acked-by: Johannes Weiner
Cc: Joonsoo Kim
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/page_alloc.c | 62 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c25de46c58f..2f1118b4dda4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1948,23 +1948,44 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
  * use it's pages as requested migratetype in the future.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
-							  int start_type)
+					int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
+	struct free_area *area;
 	int pages;
 
+	/*
+	 * This can happen due to races and we want to prevent broken
+	 * highatomic accounting.
+	 */
+	if (is_migrate_highatomic_page(page))
+		goto single_page;
+
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
 		change_pageblock_range(page, current_order, start_type);
-		return;
+		goto single_page;
 	}
 
+	/* We are not allowed to try stealing from the whole block */
+	if (!whole_block)
+		goto single_page;
+
 	pages = move_freepages_block(zone, page, start_type);
+	/* moving whole block can fail due to zone boundary conditions */
+	if (!pages)
+		goto single_page;
 
 	/* Claim the whole block if over half of it is free */
 	if (pages >= (1 << (pageblock_order-1)) ||
 			page_group_by_mobility_disabled)
 		set_pageblock_migratetype(page, start_type);
+
+	return;
+
+single_page:
+	area = &zone->free_area[current_order];
+	list_move(&page->lru, &area->free_list[start_type]);
 }
 
 /*
@@ -2123,8 +2144,13 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 	return false;
 }
 
-/* Remove an element from the buddy allocator from the fallback list */
-static inline struct page *
+/*
+ * Try finding a free buddy page on the fallback list and put it on the free
+ * list of requested migratetype, possibly along with other pages from the same
+ * block, depending on fragmentation avoidance heuristics. Returns true if
+ * fallback was found so that __rmqueue_smallest() can grab it.
+ */
+static inline bool
 __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 {
 	struct free_area *area;
@@ -2145,32 +2171,17 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 
 		page = list_first_entry(&area->free_list[fallback_mt],
 						struct page, lru);
-		if (can_steal && !is_migrate_highatomic_page(page))
-			steal_suitable_fallback(zone, page, start_migratetype);
 
-		/* Remove the page from the freelists */
-		area->nr_free--;
-		list_del(&page->lru);
-		rmv_page_order(page);
-
-		expand(zone, page, order, current_order, area,
-					start_migratetype);
-		/*
-		 * The pcppage_migratetype may differ from pageblock's
-		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
-		 */
-		set_pcppage_migratetype(page, start_migratetype);
+		steal_suitable_fallback(zone, page, start_migratetype,
+								can_steal);
 
 		trace_mm_page_alloc_extfrag(page, order, current_order,
 			start_migratetype, fallback_mt);
 
-		return page;
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
 /*
@@ -2182,13 +2193,14 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
+retry:
 	page = __rmqueue_smallest(zone, order, migratetype);
 	if (unlikely(!page)) {
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype);
+		if (!page && __rmqueue_fallback(zone, order, migratetype))
+			goto retry;
 	}
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
--
cgit

From 02aa0cdd72483c6dd436ed24d1000f86e0038d28 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Mon, 8 May 2017 15:54:40 -0700
Subject: mm, page_alloc: count movable pages when stealing from pageblock

When stealing pages from pageblock of a different migratetype, we count
how many free pages were stolen, and change the pageblock's migratetype
if more than half of the pageblock was free. This might be too
conservative, as there might be other pages that are not free, but were
allocated with the same migratetype as our allocation requested.

While we cannot determine the migratetype of allocated pages precisely
(at least without the page_owner functionality enabled), we can count
pages that compaction would try to isolate for migration - those are
either on LRU or __PageMovable(). The rest can be assumed to be
MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily
distinguish. This counting can be done as part of free page stealing
with little additional overhead.

The page stealing code is changed so that it considers free pages plus
pages of the "good" migratetype for the decision whether to change
pageblock's migratetype.

The result should be more accurate migratetype of pageblocks wrt the
actual pages in the pageblocks, when stealing from semi-occupied
pageblocks. This should help the efficiency of page grouping by
mobility.

In testing based on 4.9 kernel with stress-highalloc from mmtests
configured for order-4 GFP_KERNEL allocations, this patch has reduced
the number of unmovable allocations falling back to movable pageblocks
by 47%. The number of movable allocations falling back to other
pageblocks are increased by 55%, but these events don't cause permanent
fragmentation, so the tradeoff should be positive. Later patches also
offset the movable fallback increase to some extent.
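
The decision described above can be sketched as a small standalone
function. This is a simplified model, not the kernel implementation: the
toy_* names, the enum and the 512-page pageblock (order 9) are
assumptions made for illustration, and the
page_group_by_mobility_disabled override is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's migratetypes and pageblock size. */
enum toy_mt { TOY_MT_UNMOVABLE, TOY_MT_MOVABLE, TOY_MT_RECLAIMABLE };

#define TOY_PAGEBLOCK_ORDER	9
#define TOY_PAGEBLOCK_NR_PAGES	(1 << TOY_PAGEBLOCK_ORDER)

/*
 * Decide whether the stealing allocation (start_type) may take over a
 * pageblock currently marked old_block_type, given how many of its pages
 * are free and how many allocated pages look movable.
 */
static bool toy_should_claim_block(enum toy_mt start_type,
				   enum toy_mt old_block_type,
				   int free_pages, int movable_pages)
{
	int alike_pages;

	assert(free_pages >= 0 && movable_pages >= 0);
	assert(free_pages + movable_pages <= TOY_PAGEBLOCK_NR_PAGES);

	/* Nothing could be moved, so only a single page is stolen. */
	if (free_pages == 0)
		return false;

	if (start_type == TOY_MT_MOVABLE) {
		/* Movable allocation: movable pages are compatible. */
		alike_pages = movable_pages;
	} else if (old_block_type == TOY_MT_MOVABLE) {
		/* Non-movable allocation in a movable block: everything
		 * that is neither free nor movable counts as compatible. */
		alike_pages = TOY_PAGEBLOCK_NR_PAGES
					- (free_pages + movable_pages);
	} else {
		/* Unmovable vs. reclaimable cannot be told apart. */
		alike_pages = 0;
	}

	/* Claim the block if at least half is free or compatible. */
	return free_pages + alike_pages >= TOY_PAGEBLOCK_NR_PAGES / 2;
}
```

Under these assumptions, an unmovable allocation that finds 100 free and
100 movable pages in a movable block counts 512 - 200 = 312 compatible
pages, so 100 + 312 = 412 >= 256 and the block is converted. Counting
free pages alone (100 < 256) would have left it marked movable.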

[akpm@linux-foundation.org: merge fix]
Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/page-isolation.h |  5 +--
 mm/page_alloc.c                | 74 +++++++++++++++++++++++++++++++++---------
 mm/page_isolation.c            |  5 +--
 3 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d64706f2a..d4cd2014fa6f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,10 +33,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype);
-int move_freepages(struct zone *zone,
-			  struct page *start_page, struct page *end_page,
-			  int migratetype);
+				int migratetype, int *num_movable);
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2f1118b4dda4..d90792addeb9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1832,9 +1832,9 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
  * Note that start_page and end_pages are not aligned on a pageblock
  * boundary. If alignment is required, use move_freepages_block()
  */
-int move_freepages(struct zone *zone,
+static int move_freepages(struct zone *zone,
 			  struct page *start_page, struct page *end_page,
-			  int migratetype)
+			  int migratetype, int *num_movable)
 {
 	struct page *page;
 	unsigned int order;
@@ -1851,6 +1851,9 @@ int move_freepages(struct zone *zone,
 	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
 #endif
 
+	if (num_movable)
+		*num_movable = 0;
+
 	for (page = start_page; page <= end_page;) {
 		if (!pfn_valid_within(page_to_pfn(page))) {
 			page++;
@@ -1861,6 +1864,15 @@ int move_freepages(struct zone *zone,
 		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
 
 		if (!PageBuddy(page)) {
+			/*
+			 * We assume that pages that could be isolated for
+			 * migration are movable. But we don't actually try
+			 * isolating, as that would be expensive.
+			 */
+			if (num_movable &&
+					(PageLRU(page) || __PageMovable(page)))
+				(*num_movable)++;
+
 			page++;
 			continue;
 		}
@@ -1876,7 +1888,7 @@ int move_freepages(struct zone *zone,
 }
 
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+				int migratetype, int *num_movable)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -1893,7 +1905,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_page, end_page, migratetype);
+	return move_freepages(zone, start_page, end_page, migratetype,
+								num_movable);
 }
 
 static void change_pageblock_range(struct page *pageblock_page,
@@ -1943,22 +1956,26 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
 /*
  * This function implements actual steal behaviour. If order is large enough,
  * we can steal whole pageblock. If not, we first move freepages in this
- * pageblock and check whether half of pages are moved or not. If half of
- * pages are moved, we can change migratetype of pageblock and permanently
- * use it's pages as requested migratetype in the future.
+ * pageblock to our migratetype and determine how many already-allocated pages
+ * are there in the pageblock with a compatible migratetype. If at least half
+ * of pages are free or compatible, we can change migratetype of the pageblock
+ * itself, so pages freed in the future will be put on the correct free list.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
 					int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
-	int pages;
+	int free_pages, movable_pages, alike_pages;
+	int old_block_type;
+
+	old_block_type = get_pageblock_migratetype(page);
 
 	/*
 	 * This can happen due to races and we want to prevent broken
 	 * highatomic accounting.
 	 */
-	if (is_migrate_highatomic_page(page))
+	if (is_migrate_highatomic(old_block_type))
 		goto single_page;
 
 	/* Take ownership for orders >= pageblock_order */
@@ -1971,13 +1988,39 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block)
 		goto single_page;
 
-	pages = move_freepages_block(zone, page, start_type);
+	free_pages = move_freepages_block(zone, page, start_type,
+						&movable_pages);
+	/*
+	 * Determine how many pages are compatible with our allocation.
+	 * For movable allocation, it's the number of movable pages which
+	 * we just obtained. For other types it's a bit more tricky.
+	 */
+	if (start_type == MIGRATE_MOVABLE) {
+		alike_pages = movable_pages;
+	} else {
+		/*
+		 * If we are falling back a RECLAIMABLE or UNMOVABLE allocation
+		 * to MOVABLE pageblock, consider all non-movable pages as
+		 * compatible. If it's UNMOVABLE falling back to RECLAIMABLE or
+		 * vice versa, be conservative since we can't distinguish the
+		 * exact migratetype of non-movable pages.
+		 */
+		if (old_block_type == MIGRATE_MOVABLE)
+			alike_pages = pageblock_nr_pages
+						- (free_pages + movable_pages);
+		else
+			alike_pages = 0;
+	}
+
 	/* moving whole block can fail due to zone boundary conditions */
-	if (!pages)
+	if (!free_pages)
 		goto single_page;
 
-	/* Claim the whole block if over half of it is free */
-	if (pages >= (1 << (pageblock_order-1)) ||
+	/*
+	 * If a sufficient number of pages in the block are either free or of
+	 * comparable migratability as our allocation, claim the whole block.
+	 */
+	if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
 			page_group_by_mobility_disabled)
 		set_pageblock_migratetype(page, start_type);
 
@@ -2055,7 +2098,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	    && !is_migrate_cma(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
-		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
 	}
 
 out_unlock:
@@ -2132,7 +2175,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			 * may increase.
 			 */
 			set_pageblock_migratetype(page, ac->migratetype);
-			ret = move_freepages_block(zone, page, ac->migratetype);
+			ret = move_freepages_block(zone, page, ac->migratetype,
+									NULL);
 			if (ret) {
 				spin_unlock_irqrestore(&zone->lock, flags);
 				return ret;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7927bbb54a4e..5092e4ef00c8 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -66,7 +66,8 @@ out:
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	}
@@ -120,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 * pageblock scanning for freepage moving.
 	 */
 	if (!isolated_page) {
-		nr_pages = move_freepages_block(zone, page, migratetype);
+		nr_pages = move_freepages_block(zone, page, migratetype, NULL);
 		__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	}
 	set_pageblock_migratetype(page, migratetype);
--
cgit

From b682debd97153706ffbe2fe3f8ec30a7ee11f9e1 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Mon, 8 May 2017 15:54:43 -0700
Subject: mm, compaction: change migrate_async_suitable() to suitable_migration_source()

Preparation for making the decisions more complex and depending on
compact_control flags. No functional change.

Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Acked-by: Johannes Weiner
Cc: Joonsoo Kim
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/mmzone.h |  5 +++++
 mm/compaction.c        | 19 +++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e0c3c5e3d8a0..ebaccd4e7d8c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -74,6 +74,11 @@ extern char * const migratetype_names[MIGRATE_TYPES];
 # define is_migrate_cma_page(_page) false
 #endif
 
+static inline bool is_migrate_movable(int mt)
+{
+	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/compaction.c b/mm/compaction.c
index 01b1fb8f6f47..a20876e37648 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,11 +89,6 @@ static void map_pages(struct list_head *list)
 	list_splice(&tmp_list, list);
 }
 
-static inline bool migrate_async_suitable(int migratetype)
-{
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
-}
-
 #ifdef CONFIG_COMPACTION
 
 int PageMovable(struct page *page)
@@ -988,6 +983,15 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
 
+static bool suitable_migration_source(struct compact_control *cc,
+							struct page *page)
+{
+	if (cc->mode != MIGRATE_ASYNC)
+		return true;
+
+	return is_migrate_movable(get_pageblock_migratetype(page));
+}
+
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct compact_control *cc,
 							struct page *page)
@@ -1007,7 +1011,7 @@ static bool suitable_migration_target(struct compact_control *cc,
 		return true;
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(get_pageblock_migratetype(page)))
+	if (is_migrate_movable(get_pageblock_migratetype(page)))
 		return true;
 
 	/* Otherwise skip the block */
@@ -1242,8 +1246,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * Async compaction is optimistic to see if the minimum amount
 		 * of work satisfies the allocation.
 		 */
-		if (cc->mode == MIGRATE_ASYNC &&
-		    !migrate_async_suitable(get_pageblock_migratetype(page)))
+		if (!suitable_migration_source(cc, page))
 			continue;
 
 		/* Perform the isolation */
--
cgit

From d39773a0622c267fef3f79e3b1f0e7bdbad8a1a8 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Mon, 8 May 2017 15:54:46 -0700
Subject: mm, compaction: add migratetype to compact_control

Preparation patch. We are going to need migratetype at lower layers than
compact_zone() and compact_finished().

Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Acked-by: Johannes Weiner
Cc: Joonsoo Kim
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/compaction.c | 15 +++++++--------
 mm/internal.h   |  1 +
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a20876e37648..365b3c8ae943 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1279,10 +1279,11 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone, struct compact_control *cc,
-			    const int migratetype)
+static enum compact_result __compact_finished(struct zone *zone,
+						struct compact_control *cc)
 {
 	unsigned int order;
+	const int migratetype = cc->migratetype;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1338,12 +1339,11 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 }
 
 static enum compact_result compact_finished(struct zone *zone,
-			struct compact_control *cc,
-			const int migratetype)
+			struct compact_control *cc)
 {
 	int ret;
 
-	ret = __compact_finished(zone, cc, migratetype);
+	ret = __compact_finished(zone, cc);
 	trace_mm_compaction_finished(zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;
@@ -1476,9 +1476,9 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
-	const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
+	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
@@ -1528,8 +1528,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	migrate_prep_local();
 
-	while ((ret = compact_finished(zone, cc, migratetype)) ==
-						COMPACT_CONTINUE) {
+	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
 
 		switch (isolate_migratepages(zone, cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index 004471b72977..e7e709fd3043 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -193,6 +193,7 @@ struct compact_control {
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
+	int migratetype;		/* migratetype of direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
--
cgit

From 282722b0d258ec23fc79d80165418fee83f01736 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Mon, 8 May 2017 15:54:49 -0700
Subject: mm, compaction: restrict async compaction to pageblocks of same migratetype

The migrate scanner in async compaction is currently limited to
MIGRATE_MOVABLE pageblocks. This is a heuristic intended to reduce
latency, based on the assumption that non-MOVABLE pageblocks are
unlikely to contain movable pages.

However, with the exception of THP's, most high-order allocations are
not movable. Should the async compaction succeed, this increases the
chance that the non-MOVABLE allocations will fallback to a MOVABLE
pageblock, making the long-term fragmentation worse.

This patch attempts to help the situation by changing async direct
compaction so that the migrate scanner only scans the pageblocks of the
requested migratetype. If it's a non-MOVABLE type and there are such
pageblocks that do contain movable pages, chances are that the
allocation can succeed within one of such pageblocks, removing the need
for a fallback. If that fails, the subsequent sync attempt will ignore
this restriction.
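
The restriction described above can be modelled in isolation as follows.
This is a sketch under stated assumptions rather than the kernel source:
the toy_* enums and struct toy_compact_control are hypothetical
stand-ins for the migratetypes, enum migrate_mode and the three
compact_control fields involved, and the pageblock's migratetype is
passed in directly instead of being looked up from a struct page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types, for illustration only. */
enum toy_block_mt {
	TOY_BLK_UNMOVABLE,
	TOY_BLK_MOVABLE,
	TOY_BLK_RECLAIMABLE,
	TOY_BLK_CMA,
};

enum toy_mode { TOY_MODE_ASYNC, TOY_MODE_SYNC };

struct toy_compact_control {
	enum toy_mode mode;		/* async or sync migration */
	bool direct_compaction;		/* false for background compaction */
	enum toy_block_mt migratetype;	/* migratetype of the allocation */
};

/* Should the migrate scanner take pages from a block of type block_mt? */
static bool toy_suitable_migration_source(const struct toy_compact_control *cc,
					  enum toy_block_mt block_mt)
{
	assert(cc);

	/* Only async direct compaction is restricted. */
	if (cc->mode != TOY_MODE_ASYNC || !cc->direct_compaction)
		return true;

	/* Movable requests keep scanning movable and CMA blocks. */
	if (cc->migratetype == TOY_BLK_MOVABLE)
		return block_mt == TOY_BLK_MOVABLE || block_mt == TOY_BLK_CMA;

	/* Other requests scan only blocks of their own migratetype. */
	return block_mt == cc->migratetype;
}
```

With this model, an async direct compactor working for an unmovable
allocation skips movable pageblocks, while the same request in sync mode,
or a compactor that is not a direct one, scans every block.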
In testing based on 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 30%. The number of movable allocations falling back is reduced by 12%. Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz Signed-off-by: Vlastimil Babka Cc: Mel Gorman Cc: Johannes Weiner Cc: Joonsoo Kim Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/compaction.c | 11 +++++++++-- mm/page_alloc.c | 20 +++++++++++++------- 2 files changed, 22 insertions(+), 9 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 365b3c8ae943..206847d35978 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -986,10 +986,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn, static bool suitable_migration_source(struct compact_control *cc, struct page *page) { - if (cc->mode != MIGRATE_ASYNC) + int block_mt; + + if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction) return true; - return is_migrate_movable(get_pageblock_migratetype(page)); + block_mt = get_pageblock_migratetype(page); + + if (cc->migratetype == MIGRATE_MOVABLE) + return is_migrate_movable(block_mt); + else + return block_mt == cc->migratetype; } /* Returns true if the page is within a block suitable for migration to */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d90792addeb9..e7486afa7fa7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3665,6 +3665,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct alloc_context *ac) { bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM; + const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER; struct page *page = NULL; unsigned int alloc_flags; unsigned long did_some_progress; @@ -3732,12 +3733,17 @@ retry_cpuset: /* * For costly allocations, try direct compaction first, as it's likely - * that we have enough base pages and don't 
need to reclaim. Don't try - * that for allocations that are allowed to ignore watermarks, as the - * ALLOC_NO_WATERMARKS attempt didn't yet happen. + * that we have enough base pages and don't need to reclaim. For non- + * movable high-order allocations, do that as well, as compaction will + * try prevent permanent fragmentation by migrating from blocks of the + * same migratetype. + * Don't try this for allocations that are allowed to ignore + * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen. */ - if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER && - !gfp_pfmemalloc_allowed(gfp_mask)) { + if (can_direct_reclaim && + (costly_order || + (order > 0 && ac->migratetype != MIGRATE_MOVABLE)) + && !gfp_pfmemalloc_allowed(gfp_mask)) { page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, INIT_COMPACT_PRIORITY, @@ -3749,7 +3755,7 @@ retry_cpuset: * Checks for costly allocations with __GFP_NORETRY, which * includes THP page fault allocations */ - if (gfp_mask & __GFP_NORETRY) { + if (costly_order && (gfp_mask & __GFP_NORETRY)) { /* * If compaction is deferred for high-order allocations, * it is because sync compaction recently failed. If @@ -3830,7 +3836,7 @@ retry: * Do not retry costly high order allocations unless they are * __GFP_REPEAT */ - if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) + if (costly_order && !(gfp_mask & __GFP_REPEAT)) goto nopage; if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, -- cgit From baf6a9a1db5a40ebfa5d3e761428d3deb2cc3a3b Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 8 May 2017 15:54:52 -0700 Subject: mm, compaction: finish whole pageblock to reduce fragmentation The main goal of direct compaction is to form a high-order page for allocation, but it should also help against long-term fragmentation when possible. 
Most lower-than-pageblock-order compactions are for non-movable allocations, which means that if we compact in a movable pageblock and terminate as soon as we create the high-order page, it's unlikely that the fallback heuristics will claim the whole block. Instead there might be a single unmovable page in a pageblock full of movable pages, and the next unmovable allocation might pick another pageblock and increase long-term fragmentation.

To help against such scenarios, this patch changes the termination criteria for compaction so that the current pageblock is finished even if the high-order page already exists. Note that the high-order page might have formed elsewhere in the zone due to parallel activity, but this patch doesn't try to detect that.

This is only done with sync compaction, because async compaction is limited to pageblocks of the same migratetype, where it cannot result in a migratetype fallback. (Async compaction also eagerly skips order-aligned blocks where isolation fails, which is against the goal of migrating away as much of the pageblock as possible.)

As a result of this patch, long-term memory fragmentation should be reduced. In testing based on a 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 20%.
The number Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz Signed-off-by: Vlastimil Babka Acked-by: Mel Gorman Acked-by: Johannes Weiner Cc: Joonsoo Kim Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++-- mm/internal.h | 1 + 2 files changed, 35 insertions(+), 2 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 206847d35978..613c59e928cb 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone, if (is_via_compact_memory(cc->order)) return COMPACT_CONTINUE; + if (cc->finishing_block) { + /* + * We have finished the pageblock, but better check again that + * we really succeeded. + */ + if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages)) + cc->finishing_block = false; + else + return COMPACT_CONTINUE; + } + /* Direct compactor: Is a suitable page free? */ for (order = cc->order; order < MAX_ORDER; order++) { struct free_area *area = &zone->free_area[order]; @@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone, * other migratetype buddy lists. */ if (find_suitable_fallback(area, order, migratetype, - true, &can_steal) != -1) - return COMPACT_SUCCESS; + true, &can_steal) != -1) { + + /* movable pages are OK in any pageblock */ + if (migratetype == MIGRATE_MOVABLE) + return COMPACT_SUCCESS; + + /* + * We are stealing for a non-movable allocation. Make + * sure we finish compacting the current pageblock + * first so it is as free as possible and we won't + * have to steal another one soon. This only applies + * to sync compaction, as async compaction operates + * on pageblocks of the same migratetype. 
+ */ + if (cc->mode == MIGRATE_ASYNC || + IS_ALIGNED(cc->migrate_pfn, + pageblock_nr_pages)) { + return COMPACT_SUCCESS; + } + + cc->finishing_block = true; + return COMPACT_CONTINUE; + } } return COMPACT_NO_SUITABLE_PAGE; diff --git a/mm/internal.h b/mm/internal.h index e7e709fd3043..0e4f558412fb 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -202,6 +202,7 @@ struct compact_control { bool direct_compaction; /* False from kcompactd or /proc/... */ bool whole_zone; /* Whole zone should/has been scanned */ bool contended; /* Signal lock or sched contention */ + bool finishing_block; /* Finishing current pageblock */ }; unsigned long -- cgit From f245e1c17a702964ad552878d01a10e53cf0e8e5 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Mon, 8 May 2017 15:54:55 -0700 Subject: fs/proc/inode.c: remove cast from memory allocation Coccinelle emits this warning: WARNING: casting value returned by memory allocation function to (struct proc_inode *) is useless. Remove unnecessary cast. Link: http://lkml.kernel.org/r/1487745720-16967-1-git-send-email-me@tobin.cc Signed-off-by: Tobin C. Harding Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 2cc7a8030275..e250910cffc8 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -58,7 +58,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb) struct proc_inode *ei; struct inode *inode; - ei = (struct proc_inode *)kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL); + ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL); if (!ei) return NULL; ei->pid = NULL; -- cgit From 63259457a2eea54cc3b3a284b4bc7da52398a19a Mon Sep 17 00:00:00 2001 From: Gao Feng Date: Mon, 8 May 2017 15:54:58 -0700 Subject: proc/sysctl: fix the int overflow for jiffies conversion do_proc_dointvec_jiffies_conv() uses LONG_MAX/HZ as the max value to avoid overflow. 
But *valp is actually an int, so the conversion can still overflow. For example, echo 2147483647 > ./sys/net/ipv4/tcp_keepalive_time Then, cat ./sys/net/ipv4/tcp_keepalive_time The output is "-1", which is not expected. Use INT_MAX/HZ as the max value instead of LONG_MAX/HZ to fix it. Link: http://lkml.kernel.org/r/1490109532-9228-1-git-send-email-fgao@ikuai8.com Signed-off-by: Gao Feng Cc: Arnaldo Carvalho de Melo Cc: Ingo Molnar Cc: Alexey Dobriyan Cc: Eric Dumazet Cc: Josh Poimboeuf Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/sysctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 21343d110296..4dfba1a76cc3 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2576,7 +2576,7 @@ static int do_proc_dointvec_jiffies_conv(bool *negp, unsigned long *lvalp, int write, void *data) { if (write) { - if (*lvalp > LONG_MAX / HZ) + if (*lvalp > INT_MAX / HZ) return 1; *valp = *negp ? -(*lvalp*HZ) : (*lvalp*HZ); } else { -- cgit From 3d88936f35bc67f7bee2efd4eec69baf26c17a5a Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Mon, 8 May 2017 15:55:02 -0700 Subject: drivers/virt/fsl_hypervisor.c: use get_user_pages_unlocked() Moving from get_user_pages() to get_user_pages_unlocked() simplifies the code and takes advantage of VM_FAULT_RETRY functionality when faulting in pages.
Link: http://lkml.kernel.org/r/20161101194332.23961-1-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes Cc: Michal Hocko Cc: Paolo Bonzini Cc: Kumar Gala Cc: Mihai Caraman Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/virt/fsl_hypervisor.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/virt/fsl_hypervisor.c b/drivers/virt/fsl_hypervisor.c index 150ce2abf6c8..d3eca879a0a8 100644 --- a/drivers/virt/fsl_hypervisor.c +++ b/drivers/virt/fsl_hypervisor.c @@ -243,11 +243,8 @@ static long ioctl_memcpy(struct fsl_hv_ioctl_memcpy __user *p) sg_list = PTR_ALIGN(sg_list_unaligned, sizeof(struct fh_sg_list)); /* Get the physical addresses of the source buffer */ - down_read(&current->mm->mmap_sem); - num_pinned = get_user_pages(param.local_vaddr - lb_offset, - num_pages, (param.source == -1) ? 0 : FOLL_WRITE, - pages, NULL); - up_read(&current->mm->mmap_sem); + num_pinned = get_user_pages_unlocked(param.local_vaddr - lb_offset, + num_pages, pages, (param.source == -1) ? 0 : FOLL_WRITE); if (num_pinned != num_pages) { /* get_user_pages() failed */ -- cgit From 7c30f352c852bae2715ad65ac4a38ca9af7d7696 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Mon, 8 May 2017 15:55:05 -0700 Subject: jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp jiffies_64 is defined in kernel/time/timer.c with ____cacheline_aligned_in_smp, however this macro is not part of the declaration of jiffies and jiffies_64 in jiffies.h.
As a result clang generates the following warning: kernel/time/timer.c:57:26: error: section does not match previous declaration [-Werror,-Wsection] __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES; ^ include/linux/cache.h:39:36: note: expanded from macro '__cacheline_aligned_in_smp' ^ include/linux/cache.h:34:4: note: expanded from macro '__cacheline_aligned' __section__(".data..cacheline_aligned"))) ^ include/linux/jiffies.h:77:12: note: previous attribute is here extern u64 __jiffy_data jiffies_64; ^ include/linux/jiffies.h:70:38: note: expanded from macro '__jiffy_data' Link: http://lkml.kernel.org/r/20170403190200.70273-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke Cc: "Jason A . Donenfeld" Cc: Grant Grundler Cc: Michael Davidson Cc: Greg Hackmann Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/jiffies.h | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index 624215cebee5..36872fbb815d 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -1,6 +1,7 @@ #ifndef _LINUX_JIFFIES_H #define _LINUX_JIFFIES_H +#include #include #include #include @@ -63,19 +64,13 @@ extern int register_refined_jiffies(long clock_tick_rate); /* TICK_USEC is the time between ticks in usec assuming fake USER_HZ */ #define TICK_USEC ((1000000UL + USER_HZ/2) / USER_HZ) -/* some arch's have a small-data section that can be accessed register-relative - * but that can only take up to, say, 4-byte variables. jiffies being part of - * an 8-byte variable may not be correctly accessed unless we force the issue - */ -#define __jiffy_data __attribute__((section(".data"))) - /* * The 64-bit value is not atomic - you MUST NOT read it * without sampling the sequence number in jiffies_lock. * get_jiffies_64() will do this for you as appropriate. 
*/ -extern u64 __jiffy_data jiffies_64; -extern unsigned long volatile __jiffy_data jiffies; +extern u64 __cacheline_aligned_in_smp jiffies_64; +extern unsigned long volatile __cacheline_aligned_in_smp jiffies; #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void); -- cgit From 31b8cc80776c1b5a17abda6e0bbb5c615b9d90e4 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 8 May 2017 15:55:08 -0700 Subject: make help: add tools help target Add a top-level Makefile help target for Userspace tools. Also make each help "heading" end with a colon ':'. Link: http://lkml.kernel.org/r/55c986ff-3966-3e47-2984-7349da2cce51@infradead.org Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Makefile | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 43534cca1de9..220121fdca4d 100644 --- a/Makefile +++ b/Makefile @@ -1374,7 +1374,7 @@ help: @echo ' headers_install - Install sanitised kernel headers to INSTALL_HDR_PATH'; \ echo ' (default: $(INSTALL_HDR_PATH))'; \ echo '' - @echo 'Static analysers' + @echo 'Static analysers:' @echo ' checkstack - Generate a list of stack hogs' @echo ' namespacecheck - Name space analysis on compiled kernel' @echo ' versioncheck - Sanity check on version.h usage' @@ -1384,7 +1384,7 @@ help: @echo ' headerdep - Detect inclusion cycles in headers' @$(MAKE) -f $(srctree)/scripts/Makefile.help checker-help @echo '' - @echo 'Kernel selftest' + @echo 'Kernel selftest:' @echo ' kselftest - Build and run kernel selftest (run as root)' @echo ' Build, install, and boot kernel before' @echo ' running kselftest on it' @@ -1392,6 +1392,10 @@ help: @echo ' kselftest-merge - Merge all the config dependencies of kselftest to existed' @echo ' .config.' 
@echo '' + @echo 'Userspace tools targets:' + @echo ' use "make tools/help"' + @echo ' or "cd tools; make help"' + @echo '' @echo 'Kernel packaging:' @$(MAKE) $(build)=$(package-dir) help @echo '' -- cgit From 780cbcf28781511d2cb235c375127265209796a8 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Mon, 8 May 2017 15:55:11 -0700 Subject: kernel/hung_task.c: defer showing held locks When I was running my testcase, which may block hundreds of threads on fs locks, I got a lockup due to the output from debug_show_all_locks() added by commit b2d4c2edb2e4 ("locking/hung_task: Show all locks"). For example, if 1000 threads are blocked in TASK_UNINTERRUPTIBLE state and 500 of those 1000 threads hold some lock, debug_show_all_locks() called from the for_each_process_thread() loop will report the locks held by 500 threads 1000 times. This is too much noise. In order to make sure rcu_lock_break() is called frequently, we should avoid calling debug_show_all_locks() from the for_each_process_thread() loop, because debug_show_all_locks() itself effectively runs a for_each_process_thread() loop. Let's defer calling debug_show_all_locks() until just before panic() or until after leaving the for_each_process_thread() loop.
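The deferral pattern can be sketched in user space as below. This is a toy model, not the kernel code: `dump_all_locks()` stands in for debug_show_all_locks(), and `check_task()` and `scan_tasks()` are hypothetical names for the per-task check and the scanning loop.

```c
#include <stdbool.h>

static int lock_dumps;            /* how often the expensive report ran */
static bool show_lock_pending;    /* set in the loop, consumed after it */

/* Stand-in for the expensive report that walks every task itself. */
static void dump_all_locks(void)
{
	lock_dumps++;
}

/* Per-task check: only record that a report is wanted. */
static void check_task(bool hung)
{
	if (hung)
		show_lock_pending = true;
}

/* Scan all tasks; returns how many reports this scan produced. */
static int scan_tasks(const bool *hung, int n)
{
	int before = lock_dumps;
	int i;

	show_lock_pending = false;
	for (i = 0; i < n; i++)
		check_task(hung[i]);

	/* One report after the loop, however many tasks were hung. */
	if (show_lock_pending)
		dump_all_locks();

	return lock_dumps - before;
}
```

With this shape, a scan that finds several hung tasks still produces a single report, and a scan that finds none produces no report at all.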
Link: http://lkml.kernel.org/r/1489296834-60436-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa Reviewed-by: Vegard Nossum Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/hung_task.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/hung_task.c b/kernel/hung_task.c index f0f8e2a8496f..751593ed7c0b 100644 --- a/kernel/hung_task.c +++ b/kernel/hung_task.c @@ -43,6 +43,7 @@ unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_ int __read_mostly sysctl_hung_task_warnings = 10; static int __read_mostly did_panic; +static bool hung_task_show_lock; static struct task_struct *watchdog_task; @@ -120,12 +121,14 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout) pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\"" " disables this message.\n"); sched_show_task(t); - debug_show_all_locks(); + hung_task_show_lock = true; } touch_nmi_watchdog(); if (sysctl_hung_task_panic) { + if (hung_task_show_lock) + debug_show_all_locks(); trigger_all_cpu_backtrace(); panic("hung_task: blocked tasks"); } @@ -172,6 +175,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) if (test_taint(TAINT_DIE) || did_panic) return; + hung_task_show_lock = false; rcu_read_lock(); for_each_process_thread(g, t) { if (!max_count--) @@ -187,6 +191,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) } unlock: rcu_read_unlock(); + if (hung_task_show_lock) + debug_show_all_locks(); } static long hung_timeout_jiffies(unsigned long last_checked, -- cgit From 146180c052a00172f4dc08eaade836fd02f61fb5 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 8 May 2017 15:55:14 -0700 Subject: drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests The "DIV_ROUND_UP(size, PAGE_SIZE)" operation can overflow if "size" is more than ULLONG_MAX - PAGE_SIZE. 
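The wrap-around can be reproduced in user space with a small sketch. `MODEL_PAGE_SIZE` (assumed to be 4 KiB), `pages_unchecked()` and `pages_checked()` are illustrative names, and `DIV_ROUND_UP` is re-defined locally with the usual add-then-divide form.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL   /* assumption: 4 KiB pages */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Unchecked variant: the addition inside DIV_ROUND_UP can wrap. */
static uint64_t pages_unchecked(uint64_t size)
{
	return DIV_ROUND_UP(size, MODEL_PAGE_SIZE) + 1;
}

/* Checked variant: reject sizes that would wrap before rounding up. */
static bool pages_checked(uint64_t size, uint64_t *num_pages)
{
	if (size > SIZE_MAX - MODEL_PAGE_SIZE)
		return false;

	*num_pages = DIV_ROUND_UP(size, MODEL_PAGE_SIZE) + 1;
	return true;
}
```

For a size of UINT64_MAX the unchecked variant computes a tiny page count because the sum wraps past zero, which is the case the added range check rejects.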
Link: http://lkml.kernel.org/r/20170322111950.GA11279@mwanda Signed-off-by: Dan Carpenter Cc: Jorgen Hansen Cc: Masahiro Yamada Cc: Michal Hocko Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/misc/vmw_vmci/vmci_queue_pair.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c index 498c0854305f..06c4974ee8dd 100644 --- a/drivers/misc/vmw_vmci/vmci_queue_pair.c +++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c @@ -298,8 +298,11 @@ static void *qp_alloc_queue(u64 size, u32 flags) size_t pas_size; size_t vas_size; size_t queue_size = sizeof(*queue) + sizeof(*queue->kernel_if); - const u64 num_pages = DIV_ROUND_UP(size, PAGE_SIZE) + 1; + u64 num_pages; + if (size > SIZE_MAX - PAGE_SIZE) + return NULL; + num_pages = DIV_ROUND_UP(size, PAGE_SIZE) + 1; if (num_pages > (SIZE_MAX - queue_size) / (sizeof(*queue->kernel_if->u.g.pas) + @@ -624,9 +627,12 @@ static struct vmci_queue *qp_host_alloc_queue(u64 size) { struct vmci_queue *queue; size_t queue_page_size; - const u64 num_pages = DIV_ROUND_UP(size, PAGE_SIZE) + 1; + u64 num_pages; const size_t queue_size = sizeof(*queue) + sizeof(*(queue->kernel_if)); + if (size > SIZE_MAX - PAGE_SIZE) + return NULL; + num_pages = DIV_ROUND_UP(size, PAGE_SIZE) + 1; if (num_pages > (SIZE_MAX - queue_size) / sizeof(*queue->kernel_if->u.h.page)) return NULL; -- cgit From 8128a31eaadbcdfa37774bbd28f3f00bac69996a Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 8 May 2017 15:55:17 -0700 Subject: drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR() c2port_device_register() never returns NULL, it uses error pointers. 
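Why the NULL test can never fire is easier to see with a user-space model of error pointers. Everything here is a stand-in: `model_err_ptr()`, `model_is_err()` and `model_ptr_err()` imitate the kernel helpers under the assumption that error codes occupy the top 4095 addresses, and `model_register()` is a hypothetical registration function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer lies in the range reserved for error codes. */
static bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical registration function: fails with an error pointer. */
static void *model_register(bool fail)
{
	static int device;

	return fail ? model_err_ptr(-ENOMEM) : (void *)&device;
}

/* Old check: an error pointer is not NULL, so the failure is missed. */
static long init_with_null_check(bool fail)
{
	void *dev = model_register(fail);

	if (!dev)
		return -ENODEV;
	return 0;
}

/* New check: detects the failure and propagates the real error code. */
static long init_with_is_err(bool fail)
{
	void *dev = model_register(fail);

	if (model_is_err(dev))
		return model_ptr_err(dev);
	return 0;
}
```

In this model the NULL check reports success even when registration fails, while the error-pointer check returns the original error code.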
Link: http://lkml.kernel.org/r/20170412083321.GC3250@mwanda Fixes: 65131cd52b9e ("c2port: add c2port support for Eurotech Duramar 2150") Signed-off-by: Dan Carpenter Acked-by: Rodolfo Giometti Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/misc/c2port/c2port-duramar2150.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/c2port/c2port-duramar2150.c b/drivers/misc/c2port/c2port-duramar2150.c index 5484301d57d9..3dc61ea7dc64 100644 --- a/drivers/misc/c2port/c2port-duramar2150.c +++ b/drivers/misc/c2port/c2port-duramar2150.c @@ -129,8 +129,8 @@ static int __init duramar2150_c2port_init(void) duramar2150_c2port_dev = c2port_device_register("uc", &duramar2150_c2port_ops, NULL); - if (!duramar2150_c2port_dev) { - ret = -ENODEV; + if (IS_ERR(duramar2150_c2port_dev)) { + ret = PTR_ERR(duramar2150_c2port_dev); goto free_region; } -- cgit From ebd03a9aacc7d47a9d5dc7ed63adf594f9a0391c Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 8 May 2017 15:55:20 -0700 Subject: Revert "lib/test_sort.c: make it explicitly non-modular" Patch series "lib: add module support to sort tests". This patch series allows to compile the array-based and linked list sort test code either to loadable modules, or builtin into the kernel. It's very valuable to have modular tests, so you can run them just by insmodding the test modules, instead of needing a separate kernel that runs them at boot. This patch (of 3): This reverts commit 8893f519330bb073a49c5b4676fce4be6f1be15d. It's very valuable to have modular tests, so you can run them just by insmodding the test modules, instead of needing a separate kernel that runs them at boot. 
Link: http://lkml.kernel.org/r/1488287219-15832-2-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven Reviewed-by: Andy Shevchenko Cc: Arnd Bergmann Cc: Paul Gortmaker Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/test_sort.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/lib/test_sort.c b/lib/test_sort.c index 4db3911db50a..d389c1cc2f6c 100644 --- a/lib/test_sort.c +++ b/lib/test_sort.c @@ -1,11 +1,8 @@ #include #include -#include +#include -/* - * A simple boot-time regression test - * License: GPL - */ +/* a simple boot-time regression test */ #define TEST_LEN 1000 @@ -41,4 +38,6 @@ exit: kfree(a); return err; } -subsys_initcall(test_sort_init); + +module_init(test_sort_init); +MODULE_LICENSE("GPL"); -- cgit From 5c4e679898eabca844059743c05253cf33b90e94 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 8 May 2017 15:55:23 -0700 Subject: lib: add module support to array-based sort tests Allow to compile the array-based sort test code either to a loadable module, or builtin into the kernel. Link: http://lkml.kernel.org/r/1488287219-15832-3-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven Reviewed-by: Andy Shevchenko Cc: Arnd Bergmann Cc: Paul Gortmaker Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/Kconfig.debug | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index e2a617e09ab7..de3f7c151320 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1728,10 +1728,11 @@ config TEST_LIST_SORT If unsure, say N. config TEST_SORT - bool "Array-based sort test" - depends on DEBUG_KERNEL + tristate "Array-based sort test" + depends on DEBUG_KERNEL || m help - This option enables the self-test function of 'sort()' at boot. + This option enables the self-test function of 'sort()' at boot, + or at module load time. If unsure, say N. 
-- cgit From e327fd7c86678ba22d2b7cd25f6ceef1ffe29c8a Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 8 May 2017 15:55:26 -0700 Subject: lib: add module support to linked list sorting tests Extract the linked list sorting test code into its own source file, to allow to compile it either to a loadable module, or builtin into the kernel. Link: http://lkml.kernel.org/r/1488287219-15832-4-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven Reviewed-by: Andy Shevchenko Cc: Arnd Bergmann Cc: Paul Gortmaker Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/Kconfig.debug | 7 +-- lib/Makefile | 1 + lib/list_sort.c | 149 -------------------------------------------------- lib/test_list_sort.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 155 insertions(+), 152 deletions(-) create mode 100644 lib/test_list_sort.c diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index de3f7c151320..e4587ebe52c7 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1719,11 +1719,12 @@ config LKDTM Documentation/fault-injection/provoke-crashes.txt config TEST_LIST_SORT - bool "Linked list sorting test" - depends on DEBUG_KERNEL + tristate "Linked list sorting test" + depends on DEBUG_KERNEL || m help Enable this to turn on 'list_sort()' function test. This test is - executed only once during system boot, so affects only boot time. + executed only once during system boot (so affects only boot time), + or at module load time. If unsure, say N. 
diff --git a/lib/Makefile b/lib/Makefile index a155c73e3437..0166fbc0fa81 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -52,6 +52,7 @@ obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o obj-$(CONFIG_TEST_KASAN) += test_kasan.o obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o +obj-$(CONFIG_TEST_LIST_SORT) += test_list_sort.o obj-$(CONFIG_TEST_LKM) += test_module.o obj-$(CONFIG_TEST_RHASHTABLE) += test_rhashtable.o obj-$(CONFIG_TEST_SORT) += test_sort.o diff --git a/lib/list_sort.c b/lib/list_sort.c index 3fe401067e20..9e9acc37652f 100644 --- a/lib/list_sort.c +++ b/lib/list_sort.c @@ -1,6 +1,3 @@ - -#define pr_fmt(fmt) "list_sort_test: " fmt - #include #include #include @@ -145,149 +142,3 @@ void list_sort(void *priv, struct list_head *head, merge_and_restore_back_links(priv, cmp, head, part[max_lev], list); } EXPORT_SYMBOL(list_sort); - -#ifdef CONFIG_TEST_LIST_SORT - -#include -#include - -/* - * The pattern of set bits in the list length determines which cases - * are hit in list_sort(). 
- */ -#define TEST_LIST_LEN (512+128+2) /* not including head */ - -#define TEST_POISON1 0xDEADBEEF -#define TEST_POISON2 0xA324354C - -struct debug_el { - unsigned int poison1; - struct list_head list; - unsigned int poison2; - int value; - unsigned serial; -}; - -/* Array, containing pointers to all elements in the test list */ -static struct debug_el **elts __initdata; - -static int __init check(struct debug_el *ela, struct debug_el *elb) -{ - if (ela->serial >= TEST_LIST_LEN) { - pr_err("error: incorrect serial %d\n", ela->serial); - return -EINVAL; - } - if (elb->serial >= TEST_LIST_LEN) { - pr_err("error: incorrect serial %d\n", elb->serial); - return -EINVAL; - } - if (elts[ela->serial] != ela || elts[elb->serial] != elb) { - pr_err("error: phantom element\n"); - return -EINVAL; - } - if (ela->poison1 != TEST_POISON1 || ela->poison2 != TEST_POISON2) { - pr_err("error: bad poison: %#x/%#x\n", - ela->poison1, ela->poison2); - return -EINVAL; - } - if (elb->poison1 != TEST_POISON1 || elb->poison2 != TEST_POISON2) { - pr_err("error: bad poison: %#x/%#x\n", - elb->poison1, elb->poison2); - return -EINVAL; - } - return 0; -} - -static int __init cmp(void *priv, struct list_head *a, struct list_head *b) -{ - struct debug_el *ela, *elb; - - ela = container_of(a, struct debug_el, list); - elb = container_of(b, struct debug_el, list); - - check(ela, elb); - return ela->value - elb->value; -} - -static int __init list_sort_test(void) -{ - int i, count = 1, err = -ENOMEM; - struct debug_el *el; - struct list_head *cur; - LIST_HEAD(head); - - pr_debug("start testing list_sort()\n"); - - elts = kcalloc(TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL); - if (!elts) { - pr_err("error: cannot allocate memory\n"); - return err; - } - - for (i = 0; i < TEST_LIST_LEN; i++) { - el = kmalloc(sizeof(*el), GFP_KERNEL); - if (!el) { - pr_err("error: cannot allocate memory\n"); - goto exit; - } - /* force some equivalencies */ - el->value = prandom_u32() % (TEST_LIST_LEN / 3); - el->serial 
= i; - el->poison1 = TEST_POISON1; - el->poison2 = TEST_POISON2; - elts[i] = el; - list_add_tail(&el->list, &head); - } - - list_sort(NULL, &head, cmp); - - err = -EINVAL; - for (cur = head.next; cur->next != &head; cur = cur->next) { - struct debug_el *el1; - int cmp_result; - - if (cur->next->prev != cur) { - pr_err("error: list is corrupted\n"); - goto exit; - } - - cmp_result = cmp(NULL, cur, cur->next); - if (cmp_result > 0) { - pr_err("error: list is not sorted\n"); - goto exit; - } - - el = container_of(cur, struct debug_el, list); - el1 = container_of(cur->next, struct debug_el, list); - if (cmp_result == 0 && el->serial >= el1->serial) { - pr_err("error: order of equivalent elements not " - "preserved\n"); - goto exit; - } - - if (check(el, el1)) { - pr_err("error: element check failed\n"); - goto exit; - } - count++; - } - if (head.prev != cur) { - pr_err("error: list is corrupted\n"); - goto exit; - } - - - if (count != TEST_LIST_LEN) { - pr_err("error: bad list length %d", count); - goto exit; - } - - err = 0; -exit: - for (i = 0; i < TEST_LIST_LEN; i++) - kfree(elts[i]); - kfree(elts); - return err; -} -late_initcall(list_sort_test); -#endif /* CONFIG_TEST_LIST_SORT */ diff --git a/lib/test_list_sort.c b/lib/test_list_sort.c new file mode 100644 index 000000000000..28e817387b04 --- /dev/null +++ b/lib/test_list_sort.c @@ -0,0 +1,150 @@ +#define pr_fmt(fmt) "list_sort_test: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* + * The pattern of set bits in the list length determines which cases + * are hit in list_sort(). 
+ */ +#define TEST_LIST_LEN (512+128+2) /* not including head */ + +#define TEST_POISON1 0xDEADBEEF +#define TEST_POISON2 0xA324354C + +struct debug_el { + unsigned int poison1; + struct list_head list; + unsigned int poison2; + int value; + unsigned serial; +}; + +/* Array, containing pointers to all elements in the test list */ +static struct debug_el **elts __initdata; + +static int __init check(struct debug_el *ela, struct debug_el *elb) +{ + if (ela->serial >= TEST_LIST_LEN) { + pr_err("error: incorrect serial %d\n", ela->serial); + return -EINVAL; + } + if (elb->serial >= TEST_LIST_LEN) { + pr_err("error: incorrect serial %d\n", elb->serial); + return -EINVAL; + } + if (elts[ela->serial] != ela || elts[elb->serial] != elb) { + pr_err("error: phantom element\n"); + return -EINVAL; + } + if (ela->poison1 != TEST_POISON1 || ela->poison2 != TEST_POISON2) { + pr_err("error: bad poison: %#x/%#x\n", + ela->poison1, ela->poison2); + return -EINVAL; + } + if (elb->poison1 != TEST_POISON1 || elb->poison2 != TEST_POISON2) { + pr_err("error: bad poison: %#x/%#x\n", + elb->poison1, elb->poison2); + return -EINVAL; + } + return 0; +} + +static int __init cmp(void *priv, struct list_head *a, struct list_head *b) +{ + struct debug_el *ela, *elb; + + ela = container_of(a, struct debug_el, list); + elb = container_of(b, struct debug_el, list); + + check(ela, elb); + return ela->value - elb->value; +} + +static int __init list_sort_test(void) +{ + int i, count = 1, err = -ENOMEM; + struct debug_el *el; + struct list_head *cur; + LIST_HEAD(head); + + pr_debug("start testing list_sort()\n"); + + elts = kcalloc(TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL); + if (!elts) { + pr_err("error: cannot allocate memory\n"); + return err; + } + + for (i = 0; i < TEST_LIST_LEN; i++) { + el = kmalloc(sizeof(*el), GFP_KERNEL); + if (!el) { + pr_err("error: cannot allocate memory\n"); + goto exit; + } + /* force some equivalencies */ + el->value = prandom_u32() % (TEST_LIST_LEN / 3); + el->serial 
= i; + el->poison1 = TEST_POISON1; + el->poison2 = TEST_POISON2; + elts[i] = el; + list_add_tail(&el->list, &head); + } + + list_sort(NULL, &head, cmp); + + err = -EINVAL; + for (cur = head.next; cur->next != &head; cur = cur->next) { + struct debug_el *el1; + int cmp_result; + + if (cur->next->prev != cur) { + pr_err("error: list is corrupted\n"); + goto exit; + } + + cmp_result = cmp(NULL, cur, cur->next); + if (cmp_result > 0) { + pr_err("error: list is not sorted\n"); + goto exit; + } + + el = container_of(cur, struct debug_el, list); + el1 = container_of(cur->next, struct debug_el, list); + if (cmp_result == 0 && el->serial >= el1->serial) { + pr_err("error: order of equivalent elements not " + "preserved\n"); + goto exit; + } + + if (check(el, el1)) { + pr_err("error: element check failed\n"); + goto exit; + } + count++; + } + if (head.prev != cur) { + pr_err("error: list is corrupted\n"); + goto exit; + } + + + if (count != TEST_LIST_LEN) { + pr_err("error: bad list length %d", count); + goto exit; + } + + err = 0; +exit: + for (i = 0; i < TEST_LIST_LEN; i++) + kfree(elts[i]); + kfree(elts); + return err; +} +module_init(list_sort_test); +MODULE_LICENSE("GPL"); -- cgit From 0cd5246bf8e61bb1f61a4bd1203573f8a9adf085 Mon Sep 17 00:00:00 2001 From: "Luis R. Rodriguez" Date: Mon, 8 May 2017 15:55:30 -0700 Subject: firmware/Makefile: force recompilation if makefile changes If you modify the target asm we currently do not force the recompilation of the firmware files. The target asm is in the firmware/Makefile, peg this file as a dependency to require re-compilation of firmware targets when the asm changes. Link: http://lkml.kernel.org/r/20170123150727.4883-1-mcgrof@kernel.org Signed-off-by: Luis R. 
Rodriguez Cc: Masahiro Yamada Cc: Michal Marek Cc: Ming Lei Cc: Greg Kroah-Hartman Cc: Tom Gundersen Cc: David Woodhouse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- firmware/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/firmware/Makefile b/firmware/Makefile index e297e1b52636..fa3e81c2a97b 100644 --- a/firmware/Makefile +++ b/firmware/Makefile @@ -176,7 +176,8 @@ quiet_cmd_fwbin = MK_FW $@ wordsize_deps := $(wildcard include/config/64bit.h include/config/32bit.h \ include/config/ppc32.h include/config/ppc64.h \ include/config/superh32.h include/config/superh64.h \ - include/config/x86_32.h include/config/x86_64.h) + include/config/x86_32.h include/config/x86_64.h \ + firmware/Makefile) $(patsubst %,$(obj)/%.gen.S, $(fw-shipped-y)): %: $(wordsize_deps) $(call cmd,fwbin,$(patsubst %.gen.S,%,$@)) -- cgit From cd8618ab3df3ac6018cecb9dc626ff72c39eb503 Mon Sep 17 00:00:00 2001 From: Ruslan Bilovol Date: Mon, 8 May 2017 15:55:33 -0700 Subject: checkpatch: remove obsolete CONFIG_EXPERIMENTAL checks Config EXPERIMENTAL was removed from the kernel in 2013 (see commit 3d374d09f16f: "final removal of CONFIG_EXPERIMENTAL"), so there is no longer any reason to do these checks. Link: http://lkml.kernel.org/r/1488234097-20119-1-git-send-email-ruslan.bilovol@gmail.com Signed-off-by: Ruslan Bilovol Acked-by: Kees Cook Acked-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index baa3c7be04ad..30eeba4f1602 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2757,13 +2757,6 @@ sub process { #print "is_start<$is_start> is_end<$is_end> length<$length>\n"; } -# discourage the addition of CONFIG_EXPERIMENTAL in Kconfig.
- if ($realfile =~ /Kconfig/ && - $line =~ /.\s*depends on\s+.*\bEXPERIMENTAL\b/) { - WARN("CONFIG_EXPERIMENTAL", - "Use of CONFIG_EXPERIMENTAL is deprecated. For alternatives, see https://lkml.org/lkml/2012/10/23/580\n"); - } - # discourage the use of boolean for type definition attributes of Kconfig options if ($realfile =~ /Kconfig/ && $line =~ /^\+\s*\bboolean\b/) { @@ -3157,12 +3150,6 @@ sub process { } } -# discourage the addition of CONFIG_EXPERIMENTAL in #if(def). - if ($line =~ /^\+\s*\#\s*if.*\bCONFIG_EXPERIMENTAL\b/) { - WARN("CONFIG_EXPERIMENTAL", - "Use of CONFIG_EXPERIMENTAL is deprecated. For alternatives, see https://lkml.org/lkml/2012/10/23/580\n"); - } - # check for RCS/CVS revision markers if ($rawline =~ /^\+.*\$(Revision|Log|Id)(?:\$|)/) { WARN("CVS_KEYWORD", -- cgit From 0b523769ebb9473c60df1b0f70615aa82ebac2c9 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:55:36 -0700 Subject: checkpatch: add ability to find bad uses of vsprintf %p extensions %pK was at least once misused as %pk in an out-of-tree module. This led to some security concerns. Add the ability to track single and multiple line statements for misuses of %p. [akpm@linux-foundation.org: add helpful comment into lib/vsprintf.c] [akpm@linux-foundation.org: text tweak] Link: http://lkml.kernel.org/r/163a690510e636a23187c0dc9caa09ddac6d4cde.1488228427.git.joe@perches.com Signed-off-by: Joe Perches Acked-by: Kees Cook Acked-by: William Roberts Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/vsprintf.c | 3 +++ scripts/checkpatch.pl | 26 ++++++++++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 176641cc549d..2d41de3f98a1 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -1477,6 +1477,9 @@ int kptr_restrict __read_mostly; * by an extra set of alphanumeric characters that are extended format * specifiers. * + * Please update scripts/checkpatch.pl when adding/removing conversion + * characters.
(Search for "check for vsprintf extension"). + * * Right now we handle: * * - 'F' For symbolic function descriptor pointers with offset diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 30eeba4f1602..732bb3e2fe9a 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -5663,6 +5663,32 @@ sub process { } } + # check for vsprintf extension %p misuses + if ($^V && $^V ge 5.10.0 && + defined $stat && + $stat =~ /^\+(?![^\{]*\{\s*).*\b(\w+)\s*\(.*$String\s*,/s && + $1 !~ /^_*volatile_*$/) { + my $bad_extension = ""; + my $lc = $stat =~ tr@\n@@; + $lc = $lc + $linenr; + for (my $count = $linenr; $count <= $lc; $count++) { + my $fmt = get_quoted_string($lines[$count - 1], raw_line($count, 0)); + $fmt =~ s/%%//g; + if ($fmt =~ /(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGN]).)/) { + $bad_extension = $1; + last; + } + } + if ($bad_extension ne "") { + my $stat_real = raw_line($linenr, 0); + for (my $count = $linenr + 1; $count <= $lc; $count++) { + $stat_real = $stat_real . "\n" . raw_line($count, 0); + } + WARN("VSPRINTF_POINTER_EXTENSION", + "Invalid vsprintf pointer extension '$bad_extension'\n" . "$here\n$stat_real\n"); + } + } + # Check for misused memsets if ($^V && $^V ge 5.10.0 && defined $stat && -- cgit From 4dbed76f2429516b9519620dacdda6c750640e8d Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:55:39 -0700 Subject: checkpatch: improve EMBEDDED_FUNCTION_NAME test The existing behavior relies on patch context to identify function declarations. Add the ability to find function declarations when there is an open brace in column 1. This finds function declarations only in specific single line forms where the function name is on a single line like: int foo(args...) { and int foo(args...) 
{ It does not recognize function declarations like: int foo(int bar, int baz) { Link: http://lkml.kernel.org/r/738d74bbbe1a06b80f11ed504818107c68903095.1488155636.git.joe@perches.com Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 732bb3e2fe9a..832e8150dba3 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -3126,6 +3126,17 @@ sub process { # check we are in a valid C source file if not then ignore this hunk next if ($realfile !~ /\.(h|c)$/); +# check if this appears to be the start function declaration, save the name + if ($sline =~ /^\+\{\s*$/ && + $prevline =~ /^\+(?:(?:(?:$Storage|$Inline)\s*)*\s*$Type\s*)?($Ident)\(/) { + $context_function = $1; + } + +# check if this appears to be the end of function declaration + if ($sline =~ /^\+\}\s*$/) { + undef $context_function; + } + # check indentation of any line with a bare else # (but not if it is a multiple line "if (foo) return bar; else return baz;") # if the previous line is a break or return and is indented 1 tab more... -- cgit From eb3a58de3ec4940fc6b2a9f810895d772a2a9794 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:55:42 -0700 Subject: checkpatch: allow space leading blank lines in email headers Allow a line with a leading space that is otherwise blank in the email headers, as it can be a line-wrapped SpamAssassin multiple line string or any other valid RFC 2822/5322 email header. A line containing only a space causes checkpatch to erroneously think that it is in the content body rather than the headers, and thus to flag a mail header as an unwrapped long comment line.
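The difference between the old header test `/^\s+\S/` and the new one `/^\s+(?:\S|$)/` can be modelled outside Perl. The following is only a sketch in C under stated assumptions: the helper names `leading_space`, `is_header_continuation_old` and `is_header_continuation_new` are hypothetical and not part of checkpatch, and `isspace()` is used as an approximation of Perl's `\s`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count the leading whitespace characters of a line. */
static size_t leading_space(const char *line)
{
	size_t n = 0;

	while (line[n] != '\0' && isspace((unsigned char)line[n]))
		n++;
	return n;
}

/* Models the old test /^\s+\S/: whitespace followed by a non-space. */
static bool is_header_continuation_old(const char *line)
{
	size_t n = leading_space(line);

	return n > 0 && line[n] != '\0';
}

/* Models the new test /^\s+(?:\S|$)/: whitespace-only lines match too. */
static bool is_header_continuation_new(const char *line)
{
	return leading_space(line) > 0;
}
```

With these helpers, a line holding a single space is rejected by the old model and accepted by the new one, while an ordinary folded header line is accepted by both.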
Link: http://lkml.kernel.org/r/d75a9f0b78b3488078429f4037d9fff3bdfa3b78.1490247180.git.joe@perches.com Signed-off-by: Joe Perches Reported-by: Darren Hart (VMware) Tested-by: Darren Hart (VMware) Reviewed-by: Darren Hart (VMware) Original-patch-by: John 'Warthog9' Hawley (VMware) Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 832e8150dba3..089c974aa3a5 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2628,8 +2628,8 @@ sub process { # Check if it's the start of a commit log # (not a header line and we haven't seen the patch filename) if ($in_header_lines && $realfile =~ /^$/ && - !($rawline =~ /^\s+\S/ || - $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) { + !($rawline =~ /^\s+(?:\S|$)/ || + $rawline =~ /^(?:commit\b|from\b|[\w-]+:)/i)) { $in_header_lines = 0; $in_commit_log = 1; $has_commit_log = 1; -- cgit From d9190e4e1a057ad55027cbefd1d21f6eae93fe0e Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:55:45 -0700 Subject: checkpatch: avoid suggesting struct definitions should be const Many structs are generally used as const, and there is a known list of these structs. Struct definitions should not generally be declared const. Add a test for the lack of an open brace immediately after the struct name to avoid flagging definitions. This avoids the false positive "struct foo should normally be const" message only when the open brace is on the same line as the definition.
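To illustrate the distinction this test draws, here is a small C sketch. `struct demo_ops` and everything named `demo_*` are hypothetical stand-ins for an entry on the const-struct list, not kernel types. The definition opens a brace right after the struct name and is not something to declare const; the instance is the kind of line the CONST_STRUCT message is aimed at.

```c
/* Definition: 'struct demo_ops {' names a type, not an object. */
struct demo_ops {
	int (*open)(void);
	int (*close)(void);
};

static int demo_open(void)  { return 0; }
static int demo_close(void) { return 1; }

/* Instance: an ops table like this is normally declared const. */
static const struct demo_ops demo_default_ops = {
	.open  = demo_open,
	.close = demo_close,
};

/* Callers take a pointer to const, so the table stays read-only. */
static int demo_run(const struct demo_ops *ops)
{
	return ops->open() + ops->close();
}
```

Only the second form, `struct demo_ops name = ...` without `const`, is a reasonable candidate for the warning.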
Link: http://lkml.kernel.org/r/0dce709150d712e66f1b90b03827634b53b28085.1491845946.git.joe@perches.com Signed-off-by: Joe Perches Cc: Arthur Brainville Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 089c974aa3a5..3a1cb9d7474e 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -6090,11 +6090,11 @@ sub process { } # check for various structs that are normally const (ops, kgdb, device_tree) +# and avoid what seem like struct definitions 'struct foo {' if ($line !~ /\bconst\b/ && - $line =~ /\bstruct\s+($const_structs)\b/) { + $line =~ /\bstruct\s+($const_structs)\b(?!\s*\{)/) { WARN("CONST_STRUCT", - "struct $1 should normally be const\n" . - $herecurr); + "struct $1 should normally be const\n" . $herecurr); } # use of NR_CPUS is usually wrong -- cgit From e795556a581a849bf33b22521bf8ebc97ab899ea Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:55:48 -0700 Subject: checkpatch: improve MULTISTATEMENT_MACRO_USE_DO_WHILE test The logic currently misses macros that start with an if statement, e.g.: #define foo(bar) if (bar) baz; Add a test for macro content that starts with if. Link: http://lkml.kernel.org/r/a9d41aafe1673889caf1a9850208fb7fd74107a0.1491783914.git.joe@perches.com Signed-off-by: Joe Perches Reported-by: Andreas Mohr Original-patch-by: Alfonso Lima Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 3a1cb9d7474e..feb9e856f11d 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -4849,8 +4849,10 @@ sub process { $dstat !~ /^\(\{/ && # ({...
$ctx !~ /^.\s*#\s*define\s+TRACE_(?:SYSTEM|INCLUDE_FILE|INCLUDE_PATH)\b/) { - - if ($dstat =~ /;/) { + if ($dstat =~ /^\s*if\b/) { + ERROR("MULTISTATEMENT_MACRO_USE_DO_WHILE", + "Macros starting with if should be enclosed by a do - while loop to avoid possible if/else logic defects\n" . "$herectx"); + } elsif ($dstat =~ /;/) { ERROR("MULTISTATEMENT_MACRO_USE_DO_WHILE", "Macros with multiple statements should be enclosed in a do - while loop\n" . "$herectx"); } else { -- cgit From e4b7d3091def85336d838e38dea91be37bd4d1f5 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:55:51 -0700 Subject: checkpatch: clarify the EMBEDDED_FUNCTION_NAME message Try to make the conversion of embedded function names to "%s: ", __func__ a bit clearer. Add a bit more information to the comment describing the test too. Link: http://lkml.kernel.org/r/38f5d32f0aec1cd98cb9ceeedd6a736cc9a802db.1491759835.git.joe@perches.com Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index feb9e856f11d..3e2d9b0fe5a5 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -5174,14 +5174,16 @@ sub process { "break quoted strings at a space character\n" . $hereprev); } -#check for an embedded function name in a string when the function is known -# as part of a diff. 
This does not work for -f --file checking as it -#depends on patch context providing the function name +# check for an embedded function name in a string when the function is known +# This does not work very well for -f --file checking as it depends on patch +# context providing the function name or a single line form for in-file +# function declarations if ($line =~ /^\+.*$String/ && defined($context_function) && - get_quoted_string($line, $rawline) =~ /\b$context_function\b/) { + get_quoted_string($line, $rawline) =~ /\b$context_function\b/ && + length(get_quoted_string($line, $rawline)) != (length($context_function) + 2)) { WARN("EMBEDDED_FUNCTION_NAME", - "Prefer using \"%s\", __func__ to embedded function names\n" . $herecurr); + "Prefer using '\"%s...\", __func__' to using '$context_function', this function's name, in a string\n" . $herecurr); } # check for spaces before a quoted newline -- cgit From e882dbfc248cf28d6afd6fc6d8db8be58a824158 Mon Sep 17 00:00:00 2001 From: Wei Wang Date: Mon, 8 May 2017 15:55:54 -0700 Subject: checkpatch: special audit for revert commit line Currently checkpatch.pl does not recognize git's default commit revert message and will complain about the hash format. Add special audit for revert commit message line to fix it. 
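As a rough illustration of what the added exclusion `/^This reverts commit [0-9a-f]{7,40}/` accepts, the pattern can be sketched in C. The function name `is_revert_line` is hypothetical and only models the regex; it is not code from checkpatch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Models /^This reverts commit [0-9a-f]{7,40}/: the fixed prefix followed
 * by at least seven lowercase hex digits. The pattern has no trailing
 * anchor, so anything after the seventh digit is accepted.
 */
static bool is_revert_line(const char *line)
{
	static const char prefix[] = "This reverts commit ";
	size_t len = sizeof(prefix) - 1;
	size_t i;

	if (strncmp(line, prefix, len) != 0)
		return false;

	for (i = 0; i < 7; i++) {
		char c = line[len + i];

		/* A NUL terminator fails this test, so we never read past it. */
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return false;
	}
	return true;
}
```

A default `git revert` message line with a full 40-character hash followed by a period passes this check, while a lower-cased prefix or a hash shorter than seven digits does not.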
Link: http://lkml.kernel.org/r/20170411191532.74381-1-wvw@google.com Signed-off-by: Wei Wang Acked-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 3e2d9b0fe5a5..b1befa2cec26 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2539,6 +2539,7 @@ sub process { # Check for git id commit length and improperly formed commit descriptions if ($in_commit_log && !$commit_log_possible_stack_dump && $line !~ /^\s*(?:Link|Patchwork|http|https|BugLink):/i && + $line !~ /^This reverts commit [0-9a-f]{7,40}/ && ($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i || ($line =~ /(?:\s|^)[0-9a-f]{12,40}(?:[\s"'\(\[]|$)/i && $line !~ /[\<\[][0-9a-f]{12,40}[\>\]]/i && -- cgit From 1b4a2ed4c8773524cc7890c4cd57d58b39c049eb Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:55:57 -0700 Subject: checkpatch: improve k.alloc with multiplication and sizeof test Find multi-line uses of k.alloc by using the $stat variable and not the $line variable. This can still --fix only the single line variant though. 
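For context on why the ALLOC_WITH_MULTIPLY message prefers an array-style allocator over an open-coded multiply, here is a userspace sketch in C. `alloc_array` is a hypothetical stand-in written for this example, assuming the usual overflow-checking behaviour of such helpers; it is not the kernel implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for a kmalloc_array()-style helper:
 * refuse the request when n * size would overflow size_t instead of
 * silently allocating a short buffer.
 *
 * The multi-line shape that a statement-based test can see looks like:
 *
 *	p = kmalloc(n *
 *		    sizeof(*p), GFP_KERNEL);
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

The open-coded `malloc(n * size)` would wrap around for a large `n`, whereas the helper returns NULL in that case.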
Link: http://lkml.kernel.org/r/3f4b23d37cd4c7d8628eefc25afe83ba8fb3ab55.1493167076.git.joe@perches.com Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index b1befa2cec26..d2c074feaa7d 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -5922,7 +5922,8 @@ sub process { # check for k[mz]alloc with multiplies that could be kmalloc_array/kcalloc if ($^V && $^V ge 5.10.0 && - $line =~ /\b($Lval)\s*\=\s*(?:$balanced_parens)?\s*(k[mz]alloc)\s*\(\s*($FuncArg)\s*\*\s*($FuncArg)\s*,/) { + defined $stat && + $stat =~ /^\+\s*($Lval)\s*\=\s*(?:$balanced_parens)?\s*(k[mz]alloc)\s*\(\s*($FuncArg)\s*\*\s*($FuncArg)\s*,/) { my $oldfunc = $3; my $a1 = $4; my $a2 = $10; @@ -5936,11 +5937,17 @@ sub process { } if ($r1 !~ /^sizeof\b/ && $r2 =~ /^sizeof\s*\S/ && !($r1 =~ /^$Constant$/ || $r1 =~ /^[A-Z_][A-Z0-9_]*$/)) { + my $ctx = ''; + my $herectx = $here . "\n"; + my $cnt = statement_rawlines($stat); + for (my $n = 0; $n < $cnt; $n++) { + $herectx .= raw_line($linenr, $n) . "\n"; + } if (WARN("ALLOC_WITH_MULTIPLY", - "Prefer $newfunc over $oldfunc with multiply\n" . $herecurr) && + "Prefer $newfunc over $oldfunc with multiply\n" . $herectx) && + $cnt == 1 && $fix) { $fixed[$fixlinenr] =~ s/\b($Lval)\s*\=\s*(?:$balanced_parens)?\s*(k[mz]alloc)\s*\(\s*($FuncArg)\s*\*\s*($FuncArg)/$1 . ' = ' . "$newfunc(" . trim($r1) . ', ' . trim($r2)/e; - } } } -- cgit From 75ad8c575a5ad105e2afc2051c68abceb9c65431 Mon Sep 17 00:00:00 2001 From: Jerome Forissier Date: Mon, 8 May 2017 15:56:00 -0700 Subject: checkpatch: add --typedefsfile When using checkpatch on out-of-tree code, it may occur that some project-specific types are used, which will cause spurious warnings. Add the --typedefsfile option as a way to extend the known types and deal with this issue. This was developed for OP-TEE [1]. 
We run a Travis job on all pull requests [2], and checkpatch is part of that. The typical false warning we get on a regular basis is with some pointers to functions returning TEE_Result [3], which is a typedef from the GlobalPlatform APIs. We consider it is acceptable to use GP types in the OP-TEE core implementation, that's why this patch would be helpful for us. [1] https://github.com/OP-TEE/optee_os [2] https://travis-ci.org/OP-TEE/optee_os/builds [3] https://travis-ci.org/OP-TEE/optee_os/builds/193355335#L1733 Link: http://lkml.kernel.org/r/ba1124d6dfa599bb0dd1d8919dd45dd09ce541a4.1492702192.git.jerome.forissier@linaro.org Signed-off-by: Jerome Forissier Cc: Joe Perches Cc: Andy Whitcroft Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 52 ++++++++++++++++++++++++++++++++++----------------- 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index d2c074feaa7d..65bb50076632 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -55,6 +55,7 @@ my $spelling_file = "$D/spelling.txt"; my $codespell = 0; my $codespellfile = "/usr/share/codespell/dictionary.txt"; my $conststructsfile = "$D/const_structs.checkpatch"; +my $typedefsfile = ""; my $color = 1; my $allow_c99_comments = 1; @@ -113,6 +114,7 @@ Options: --codespell Use the codespell dictionary for spelling/typos (default:/usr/share/codespell/dictionary.txt) --codespellfile Use this codespell dictionary + --typedefsfile Read additional types from this file --color Use colors when output is STDOUT (default: on) -h, --help, --version display this help and exit @@ -208,6 +210,7 @@ GetOptions( 'test-only=s' => \$tst_only, 'codespell!' => \$codespell, 'codespellfile=s' => \$codespellfile, + 'typedefsfile=s' => \$typedefsfile, 'color!' 
=> \$color, 'h|help' => \$help, 'version' => \$help @@ -629,28 +632,43 @@ if ($codespell) { $misspellings = join("|", sort keys %spelling_fix) if keys %spelling_fix; -my $const_structs = ""; -if (open(my $conststructs, '<', $conststructsfile)) { - while (<$conststructs>) { - my $line = $_; +sub read_words { + my ($wordsRef, $file) = @_; - $line =~ s/\s*\n?$//g; - $line =~ s/^\s*//g; + if (open(my $words, '<', $file)) { + while (<$words>) { + my $line = $_; - next if ($line =~ m/^\s*#/); - next if ($line =~ m/^\s*$/); - if ($line =~ /\s/) { - print("$conststructsfile: '$line' invalid - ignored\n"); - next; - } + $line =~ s/\s*\n?$//g; + $line =~ s/^\s*//g; - $const_structs .= '|' if ($const_structs ne ""); - $const_structs .= $line; + next if ($line =~ m/^\s*#/); + next if ($line =~ m/^\s*$/); + if ($line =~ /\s/) { + print("$file: '$line' invalid - ignored\n"); + next; + } + + $$wordsRef .= '|' if ($$wordsRef ne ""); + $$wordsRef .= $line; + } + close($file); + return 1; } - close($conststructsfile); -} else { - warn "No structs that should be const will be found - file '$conststructsfile': $!\n"; + + return 0; +} + +my $const_structs = ""; +read_words(\$const_structs, $conststructsfile) + or warn "No structs that should be const will be found - file '$conststructsfile': $!\n"; + +my $typeOtherTypedefs = ""; +if (length($typedefsfile)) { + read_words(\$typeOtherTypedefs, $typedefsfile) + or warn "No additional types will be considered - file '$typedefsfile': $!\n"; } +$typeTypedefs .= '|' . $typeOtherTypedefs if ($typeOtherTypedefs ne ""); sub build_types { my $mods = "(?x: \n" . join("|\n ", (@modifierList, @modifierListFile)) . 
"\n)"; -- cgit From 74fd4f347bfc10c1b19a18d0760f220eed1b2023 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:56:02 -0700 Subject: checkpatch: improve the embedded function name test for patch contexts The current test works only for a single patch context as it is done in the foreach ($rawlines) loop that precedes the loop where the actual $context_function variable is used. Move the set of $context_function into the foreach (@lines) loop where it is useful for each patch context. Link: http://lkml.kernel.org/r/6c675a31c74fbfad4fc45b9f462303d60ca2a283.1493486091.git.joe@perches.com Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 65bb50076632..33740404cd9e 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2213,8 +2213,7 @@ sub process { } #next; } - if ($rawline=~/^\@\@ -\d+(?:,\d+)? \+(\d+)(,(\d+))? \@\@(.*)/) { - my $context = $4; + if ($rawline =~ /^\@\@ -\d+(?:,\d+)? \+(\d+)(,(\d+))? \@\@/) { $realline=$1-1; if (defined $2) { $realcnt=$3+1; @@ -2223,12 +2222,6 @@ sub process { } $in_comment = 0; - if ($context =~ /\b(\w+)\s*\(/) { - $context_function = $1; - } else { - undef $context_function; - } - # Guestimate if this is a continuing comment. Run # the context looking for a comment "edge". If this # edge is a close comment then we must be in a comment @@ -2299,7 +2292,8 @@ sub process { #extract the line range in the file after the patch is applied if (!$in_commit_log && - $line =~ /^\@\@ -\d+(?:,\d+)? \+(\d+)(,(\d+))? \@\@/) { + $line =~ /^\@\@ -\d+(?:,\d+)? \+(\d+)(,(\d+))? 
\@\@(.*)/) { + my $context = $4; $is_patch = 1; $first_line = $linenr + 1; $realline=$1-1; @@ -2315,6 +2309,11 @@ sub process { %suppress_whiletrailers = (); %suppress_export = (); $suppress_statement = 0; + if ($context =~ /\b(\w+)\s*\(/) { + $context_function = $1; + } else { + undef $context_function; + } next; # track the line number as we move through the hunk, note that -- cgit From f6950a735f29e782bc219ece22bb91d6e1ab7bbc Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:56:05 -0700 Subject: checkpatch: improve the SUSPECT_CODE_INDENT test The current SUSPECT_CODE_INDENT test does not recognize several defective code style defects where code following a logical test is inappropriately indented. Before this patch, for code like: if (foo) bar(); checkpatch would not emit a warning. Improve the test to warn when code after a logical test has the same indentation as the logical test. Perform the same indentation test for "else" blocks too. Link: http://lkml.kernel.org/r/df2374b68c4a68af2b7ef08afe486584811f610a.1493683942.git.joe@perches.com Signed-off-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- scripts/checkpatch.pl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 33740404cd9e..4b9569fa931b 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -3354,7 +3354,7 @@ sub process { } # Check relative indent for conditionals and blocks. 
- if ($line =~ /\b(?:(?:if|while|for|(?:[a-z_]+|)for_each[a-z_]+)\s*\(|do\b)/ && $line !~ /^.\s*#/ && $line !~ /\}\s*while\s*/) { + if ($line =~ /\b(?:(?:if|while|for|(?:[a-z_]+|)for_each[a-z_]+)\s*\(|(?:do|else)\b)/ && $line !~ /^.\s*#/ && $line !~ /\}\s*while\s*/) { ($stat, $cond, $line_nr_next, $remain_next, $off_next) = ctx_statement_block($linenr, $realcnt, 0) if (!defined $stat); @@ -3446,6 +3446,8 @@ sub process { if ($check && $s ne '' && (($sindent % 8) != 0 || ($sindent < $indent) || + ($sindent == $indent && + ($s !~ /^\s*(?:\}|\{|else\b)/)) || ($sindent > $indent + 8))) { WARN("SUSPECT_CODE_INDENT", "suspect code indent for conditional statements ($indent, $sindent)\n" . $herecurr . "$stat_real\n"); -- cgit From 7fe6a42e87ef20edd7faf1b2f4bf18d95922d1e4 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 8 May 2017 15:56:08 -0700 Subject: reiserfs: use designated initializers Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. 
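A minimal C sketch of the difference between positional and designated initializers follows. `struct demo_item_ops` and the `demo_*` functions are hypothetical and much smaller than the reiserfs structure touched by this patch; they only show why binding by member name does not depend on member order.

```c
struct demo_item_ops {
	int (*bytes_number)(int);
	int (*unit_num)(int);
};

static int demo_bytes(int x) { return x * 4; }
static int demo_units(int x) { return x + 1; }

/* Positional: silently wrong if the member order ever changes. */
static struct demo_item_ops positional_ops = {
	demo_bytes,
	demo_units
};

/* Designated: each function is bound to its member by name. */
static struct demo_item_ops designated_ops = {
	.bytes_number = demo_bytes,
	.unit_num = demo_units
};
```

Both tables are equivalent with the current layout; only the designated form would stay correct if the members of the structure were reordered.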
Link: http://lkml.kernel.org/r/20170329210419.GA40066@beast Signed-off-by: Kees Cook Cc: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/reiserfs/item_ops.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/fs/reiserfs/item_ops.c b/fs/reiserfs/item_ops.c index aca73dd73906..e3c558d1b78c 100644 --- a/fs/reiserfs/item_ops.c +++ b/fs/reiserfs/item_ops.c @@ -724,18 +724,18 @@ static void errcatch_print_vi(struct virtual_item *vi) } static struct item_operations errcatch_ops = { - errcatch_bytes_number, - errcatch_decrement_key, - errcatch_is_left_mergeable, - errcatch_print_item, - errcatch_check_item, - - errcatch_create_vi, - errcatch_check_left, - errcatch_check_right, - errcatch_part_size, - errcatch_unit_num, - errcatch_print_vi + .bytes_number = errcatch_bytes_number, + .decrement_key = errcatch_decrement_key, + .is_left_mergeable = errcatch_is_left_mergeable, + .print_item = errcatch_print_item, + .check_item = errcatch_check_item, + + .create_vi = errcatch_create_vi, + .check_left = errcatch_check_left, + .check_right = errcatch_check_right, + .part_size = errcatch_part_size, + .unit_num = errcatch_unit_num, + .print_vi = errcatch_print_vi }; #if ! (TYPE_STAT_DATA == 0 && TYPE_INDIRECT == 1 && TYPE_DIRECT == 2 && TYPE_DIRENTRY == 3) -- cgit From 19659c59af52df22d1b85208a2c37b2d46290541 Mon Sep 17 00:00:00 2001 From: Hoeun Ryu Date: Mon, 8 May 2017 15:56:11 -0700 Subject: fork: free vmapped stacks in cache when cpus are offline Using virtually mapped stack, kernel stacks are allocated via vmalloc. In the current implementation, two stacks per cpu can be cached when tasks are freed and the cached stacks are used again in task duplications. But the cached stacks may remain unfreed even when cpu are offline. By adding a cpu hotplug callback to free the cached stacks when a cpu goes offline, the pages of the cached stacks are not wasted. 
Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com Signed-off-by: Hoeun Ryu Reviewed-by: Thomas Gleixner Acked-by: Michal Hocko Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Kees Cook Cc: "Eric W. Biederman" Cc: Oleg Nesterov Cc: Mateusz Guzik Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/kernel/fork.c b/kernel/fork.c index dd5a371c392a..55e325f4b457 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -179,6 +179,24 @@ void __weak arch_release_thread_stack(unsigned long *stack) */ #define NR_CACHED_STACKS 2 static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]); + +static int free_vm_stack_cache(unsigned int cpu) +{ + struct vm_struct **cached_vm_stacks = per_cpu_ptr(cached_stacks, cpu); + int i; + + for (i = 0; i < NR_CACHED_STACKS; i++) { + struct vm_struct *vm_stack = cached_vm_stacks[i]; + + if (!vm_stack) + continue; + + vfree(vm_stack->addr); + cached_vm_stacks[i] = NULL; + } + + return 0; +} #endif static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) @@ -467,6 +485,11 @@ void __init fork_init(void) for (i = 0; i < UCOUNT_COUNTS; i++) { init_user_ns.ucount_max[i] = max_threads/2; } + +#ifdef CONFIG_VMAP_STACK + cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "fork:vm_stack_cache", + NULL, free_vm_stack_cache); +#endif } int __weak arch_dup_task_struct(struct task_struct *dst, -- cgit From c311c797998c1e70eade463dd60b843da4f1a203 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Mon, 8 May 2017 15:56:15 -0700 Subject: cpumask: make "nr_cpumask_bits" unsigned Bit searching functions accept "unsigned long" indices but "nr_cpumask_bits" is "int" which is signed, so inevitable sign extensions occur on x86_64. Those MOVSX are #1 MOVSX bloat by number of uses across whole kernel. Change "nr_cpumask_bits" to unsigned, this number can't be negative after all. 
This allows implicit zero-extension on x86_64 without MOVSX. Change signed comparisons into unsigned comparisons where necessary. Other uses look fine because each is either an argument passed to a function or a comparison that is already unsigned. Net win on an allyesconfig type of kernel: ~2.8 KB (!)

add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833)
function                                     old     new   delta
xen_exit_mmap                                691     735     +44
qstat_read                                   426     440     +14
__cpufreq_cooling_register                  1678    1687      +9
trace_rb_cpu_prepare                         447     455      +8
vermagic                                      54      60      +6
nfp_driver_version                            54      60      +6
rcu_torture_stats_print                     1147    1151      +4
find_next_push_cpu                           267     269      +2
xen_irq_resume                               961     960      -1
...
init_vp_index                                946     906     -40
od_set_powersave_bias                        328     281     -47
power_cpu_exit                               193     139     -54
arch_show_interrupts                        3538    3484     -54
select_idle_sibling                         1558    1471     -87
Total: Before=158358910, After=158356077, chg -0.00%

The same arguments apply to "nr_cpu_ids", but I haven't yet found enough courage to delve into this issue (and a proper fix may require a new type "cpu_t", which is a whole separate story).
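Turning a signed comparison into an unsigned one changes how negative values are treated, which is worth keeping in mind when reading the casts in this patch. The C sketch below uses hypothetical names (`demo_nr_bits`, `in_range_signed`, `in_range_unsigned`) and a made-up limit of 8; it only demonstrates the language rule and makes no claim about any particular caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an unsigned limit such as nr_cpumask_bits. */
static const unsigned int demo_nr_bits = 8;

/* Signed view: -1 is below the limit, so it counts as in range. */
static bool in_range_signed(int cpu)
{
	return cpu < (int)demo_nr_bits;
}

/* Unsigned view: -1 converts to UINT_MAX and fails the same check. */
static bool in_range_unsigned(int cpu)
{
	return (unsigned int)cpu < demo_nr_bits;
}
```

For non-negative values the two checks agree; they differ only for negative inputs such as -1.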
Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2
Signed-off-by: Alexey Dobriyan
Cc: Rusty Russell
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 arch/mips/kernel/perf_event_mipsxx.c | 2 +-
 arch/s390/kernel/perf_cpum_sf.c      | 2 +-
 include/linux/cpumask.h              | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 9452b02ce079..313a88b2973f 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -618,7 +618,7 @@ static int mipspmu_event_init(struct perf_event *event)
 		return -ENOENT;
 	}
 
-	if (event->cpu >= nr_cpumask_bits ||
+	if ((unsigned int)event->cpu >= nr_cpumask_bits ||
 	    (event->cpu >= 0 && !cpu_online(event->cpu)))
 		return -ENODEV;
 
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 9a4f279d25ca..ca960d0370d5 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -823,7 +823,7 @@ static int cpumsf_pmu_event_init(struct perf_event *event)
 	}
 
 	/* Check online status of the CPU to which the event is pinned */
-	if (event->cpu >= nr_cpumask_bits ||
+	if ((unsigned int)event->cpu >= nr_cpumask_bits ||
 	    (event->cpu >= 0 && !cpu_online(event->cpu)))
 		return -ENODEV;
 
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 1a675604b17d..2404ad238c0b 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -40,9 +40,9 @@ extern int nr_cpu_ids;
 #ifdef CONFIG_CPUMASK_OFFSTACK
 /* Assuming NR_CPUS is huge, a runtime limit is more efficient.  Also,
  * not all bits may be allocated. */
-#define nr_cpumask_bits	nr_cpu_ids
+#define nr_cpumask_bits	((unsigned int)nr_cpu_ids)
 #else
-#define nr_cpumask_bits	NR_CPUS
+#define nr_cpumask_bits	((unsigned int)NR_CPUS)
 #endif
 
 /*
--
cgit

From 692f66f26a4c19d73249736aa973c13a1521b387 Mon Sep 17 00:00:00 2001
From: Hari Bathini
Date: Mon, 8 May 2017 15:56:18 -0700
Subject: crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE

Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and reuse
crashkernel parameter for fadump", v4.

Traditionally, kdump is used to save vmcore in case of a crash. Some
architectures like powerpc can save vmcore using architecture specific
support instead of the kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by the dump capture
kernel. The crashkernel parameter can be reused for memory reservation by
such architecture specific infrastructure.

This patchset removes the dependency on CONFIG_KEXEC for the crashkernel
parameter and vmcoreinfo related code, as both can be reused without kexec
support. Also, the crashkernel parameter is reused instead of
fadump_reserve_mem to reserve memory for fadump.

The first patch moves crashkernel parameter parsing and vmcoreinfo related
code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The second
patch reuses the definitions of the append_elf_note() & final_note()
functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
removes the dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
in powerpc. The next patch reuses the crashkernel parameter for reserving
memory for fadump, instead of the fadump_reserve_mem parameter. This has
the advantage of supporting every syntax the crashkernel parameter
accepts for fadump as well. The last patch updates the fadump kernel
documentation about use of the crashkernel parameter.

This patch (of 5):

Traditionally, kdump is used to save vmcore in case of a crash.
Some architectures like powerpc can save vmcore using architecture specific
support instead of the kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by the dump capture
kernel. The crashkernel parameter can be reused for memory reservation by
such architecture specific infrastructure.

But currently, code related to vmcoreinfo and parsing of the crashkernel
parameter is built under CONFIG_KEXEC_CORE. This patch introduces
CONFIG_CRASH_CORE and moves the above mentioned code under this config,
allowing code reuse without a dependency on CONFIG_KEXEC. There is no
functional change with this patch.

Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini
Acked-by: Dave Young
Cc: Fenghua Yu
Cc: Tony Luck
Cc: Eric Biederman
Cc: Mahesh Salgaonkar
Cc: Vivek Goyal
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 arch/Kconfig               |   4 +
 include/linux/crash_core.h |  65 +++++++
 include/linux/kexec.h      |  57 +-----
 include/linux/printk.h     |   4 +-
 kernel/Makefile            |   1 +
 kernel/crash_core.c        | 445 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec_core.c        | 403 ----------------------------------------
 kernel/ksysfs.c            |   8 +-
 kernel/printk/printk.c     |   6 +-
 9 files changed, 531 insertions(+), 462 deletions(-)
 create mode 100644 include/linux/crash_core.h
 create mode 100644 kernel/crash_core.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 640999412d11..dcbd462b68b1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -2,7 +2,11 @@
 # General architecture dependent options
 #
 
+config CRASH_CORE
+	bool
+
 config KEXEC_CORE
+	select CRASH_CORE
 	bool
 
 config HAVE_IMA_KEXEC
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
new file mode 100644
index 000000000000..18d0f946fda3
--- /dev/null
+++ b/include/linux/crash_core.h
@@ -0,0 +1,65 @@
+#ifndef LINUX_CRASH_CORE_H
+#define LINUX_CRASH_CORE_H
+
+#include
+#include
+#include
+
+#define CRASH_CORE_NOTE_NAME	   "CORE"
+#define CRASH_CORE_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
+#define CRASH_CORE_NOTE_NAME_BYTES ALIGN(sizeof(CRASH_CORE_NOTE_NAME), 4)
+#define CRASH_CORE_NOTE_DESC_BYTES ALIGN(sizeof(struct elf_prstatus), 4)
+
+#define CRASH_CORE_NOTE_BYTES	   ((CRASH_CORE_NOTE_HEAD_BYTES * 2) +	\
+				     CRASH_CORE_NOTE_NAME_BYTES +	\
+				     CRASH_CORE_NOTE_DESC_BYTES)
+
+#define VMCOREINFO_BYTES	   (4096)
+#define VMCOREINFO_NOTE_NAME	   "VMCOREINFO"
+#define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4)
+#define VMCOREINFO_NOTE_SIZE	   ((CRASH_CORE_NOTE_HEAD_BYTES * 2) +	\
+				     VMCOREINFO_NOTE_NAME_BYTES +	\
+				     VMCOREINFO_BYTES)
+
+typedef u32 note_buf_t[CRASH_CORE_NOTE_BYTES/4];
+
+void crash_save_vmcoreinfo(void);
+void arch_crash_save_vmcoreinfo(void);
+__printf(1, 2)
+void vmcoreinfo_append_str(const char *fmt, ...);
+phys_addr_t paddr_vmcoreinfo_note(void);
+
+#define VMCOREINFO_OSRELEASE(value) \
+	vmcoreinfo_append_str("OSRELEASE=%s\n", value)
+#define VMCOREINFO_PAGESIZE(value) \
+	vmcoreinfo_append_str("PAGESIZE=%ld\n", value)
+#define VMCOREINFO_SYMBOL(name) \
+	vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)
+#define VMCOREINFO_SIZE(name) \
+	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
+			      (unsigned long)sizeof(name))
+#define VMCOREINFO_STRUCT_SIZE(name) \
+	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
+			      (unsigned long)sizeof(struct name))
+#define VMCOREINFO_OFFSET(name, field) \
+	vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \
+			      (unsigned long)offsetof(struct name, field))
+#define VMCOREINFO_LENGTH(name, value) \
+	vmcoreinfo_append_str("LENGTH(%s)=%lu\n", #name, (unsigned long)value)
+#define VMCOREINFO_NUMBER(name) \
+	vmcoreinfo_append_str("NUMBER(%s)=%ld\n", #name, (long)name)
+#define VMCOREINFO_CONFIG(name) \
+	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
+
+extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
+extern size_t vmcoreinfo_size;
+extern size_t vmcoreinfo_max_size;
+
+int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, + unsigned long long *crash_size, unsigned long long *crash_base); +int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, + unsigned long long *crash_size, unsigned long long *crash_base); +int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, + unsigned long long *crash_size, unsigned long long *crash_base); + +#endif /* LINUX_CRASH_CORE_H */ diff --git a/include/linux/kexec.h b/include/linux/kexec.h index d419d0e51fe5..c9481ebcbc0c 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -14,17 +14,15 @@ #if !defined(__ASSEMBLY__) +#include #include #include #ifdef CONFIG_KEXEC_CORE #include -#include #include #include -#include -#include #include #include @@ -62,19 +60,15 @@ #define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE #endif -#define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4) -#define KEXEC_CORE_NOTE_NAME "CORE" -#define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4) -#define KEXEC_CORE_NOTE_DESC_BYTES ALIGN(sizeof(struct elf_prstatus), 4) +#define KEXEC_CORE_NOTE_NAME CRASH_CORE_NOTE_NAME + /* * The per-cpu notes area is a list of notes terminated by a "NULL" * note header. For kdump, the code in vmcore.c runs in the context * of the second kernel to combine them into one note. 
*/ #ifndef KEXEC_NOTE_BYTES -#define KEXEC_NOTE_BYTES ( (KEXEC_NOTE_HEAD_BYTES * 2) + \ - KEXEC_CORE_NOTE_NAME_BYTES + \ - KEXEC_CORE_NOTE_DESC_BYTES ) +#define KEXEC_NOTE_BYTES CRASH_CORE_NOTE_BYTES #endif /* @@ -256,33 +250,6 @@ extern void crash_kexec(struct pt_regs *); int kexec_should_crash(struct task_struct *); int kexec_crash_loaded(void); void crash_save_cpu(struct pt_regs *regs, int cpu); -void crash_save_vmcoreinfo(void); -void arch_crash_save_vmcoreinfo(void); -__printf(1, 2) -void vmcoreinfo_append_str(const char *fmt, ...); -phys_addr_t paddr_vmcoreinfo_note(void); - -#define VMCOREINFO_OSRELEASE(value) \ - vmcoreinfo_append_str("OSRELEASE=%s\n", value) -#define VMCOREINFO_PAGESIZE(value) \ - vmcoreinfo_append_str("PAGESIZE=%ld\n", value) -#define VMCOREINFO_SYMBOL(name) \ - vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name) -#define VMCOREINFO_SIZE(name) \ - vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \ - (unsigned long)sizeof(name)) -#define VMCOREINFO_STRUCT_SIZE(name) \ - vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \ - (unsigned long)sizeof(struct name)) -#define VMCOREINFO_OFFSET(name, field) \ - vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \ - (unsigned long)offsetof(struct name, field)) -#define VMCOREINFO_LENGTH(name, value) \ - vmcoreinfo_append_str("LENGTH(%s)=%lu\n", #name, (unsigned long)value) -#define VMCOREINFO_NUMBER(name) \ - vmcoreinfo_append_str("NUMBER(%s)=%ld\n", #name, (long)name) -#define VMCOREINFO_CONFIG(name) \ - vmcoreinfo_append_str("CONFIG_%s=y\n", #name) extern struct kimage *kexec_image; extern struct kimage *kexec_crash_image; @@ -303,31 +270,15 @@ extern int kexec_load_disabled; #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \ KEXEC_FILE_NO_INITRAMFS) -#define VMCOREINFO_BYTES (4096) -#define VMCOREINFO_NOTE_NAME "VMCOREINFO" -#define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4) -#define VMCOREINFO_NOTE_SIZE (KEXEC_NOTE_HEAD_BYTES*2 + 
VMCOREINFO_BYTES \ - + VMCOREINFO_NOTE_NAME_BYTES) - /* Location of a reserved region to hold the crash kernel. */ extern struct resource crashk_res; extern struct resource crashk_low_res; -typedef u32 note_buf_t[KEXEC_NOTE_BYTES/4]; extern note_buf_t __percpu *crash_notes; -extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; -extern size_t vmcoreinfo_size; -extern size_t vmcoreinfo_max_size; /* flag to track if kexec reboot is in progress */ extern bool kexec_in_progress; -int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, - unsigned long long *crash_size, unsigned long long *crash_base); -int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, - unsigned long long *crash_size, unsigned long long *crash_base); -int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, - unsigned long long *crash_size, unsigned long long *crash_base); int crash_shrink_memory(unsigned long new_size); size_t crash_get_memory_size(void); void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); diff --git a/include/linux/printk.h b/include/linux/printk.h index 571257e0f53d..e10f27468322 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -198,7 +198,7 @@ extern void wake_up_klogd(void); char *log_buf_addr_get(void); u32 log_buf_len_get(void); -void log_buf_kexec_setup(void); +void log_buf_vmcoreinfo_setup(void); void __init setup_log_buf(int early); __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...); void dump_stack_print_info(const char *log_lvl); @@ -246,7 +246,7 @@ static inline u32 log_buf_len_get(void) return 0; } -static inline void log_buf_kexec_setup(void) +static inline void log_buf_vmcoreinfo_setup(void) { } diff --git a/kernel/Makefile b/kernel/Makefile index b302b4731d16..72aa080f91f0 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_MODULE_SIG) += module_signing.o obj-$(CONFIG_KALLSYMS) += kallsyms.o 
obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o +obj-$(CONFIG_CRASH_CORE) += crash_core.o obj-$(CONFIG_KEXEC_CORE) += kexec_core.o obj-$(CONFIG_KEXEC) += kexec.o obj-$(CONFIG_KEXEC_FILE) += kexec_file.o diff --git a/kernel/crash_core.c b/kernel/crash_core.c new file mode 100644 index 000000000000..4261587a34d2 --- /dev/null +++ b/kernel/crash_core.c @@ -0,0 +1,445 @@ +/* + * crash.c - kernel crash support code. + * Copyright (C) 2002-2004 Eric Biederman + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#include +#include +#include + +#include +#include + +/* vmcoreinfo stuff */ +static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES]; +u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; +size_t vmcoreinfo_size; +size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); + +/* + * parsing the "crashkernel" commandline + * + * this code is intended to be called from architecture specific code + */ + + +/* + * This function parses command lines in the format + * + * crashkernel=ramsize-range:size[,...][@offset] + * + * The function returns 0 on success and -EINVAL on failure. 
+ */ +static int __init parse_crashkernel_mem(char *cmdline, + unsigned long long system_ram, + unsigned long long *crash_size, + unsigned long long *crash_base) +{ + char *cur = cmdline, *tmp; + + /* for each entry of the comma-separated list */ + do { + unsigned long long start, end = ULLONG_MAX, size; + + /* get the start of the range */ + start = memparse(cur, &tmp); + if (cur == tmp) { + pr_warn("crashkernel: Memory value expected\n"); + return -EINVAL; + } + cur = tmp; + if (*cur != '-') { + pr_warn("crashkernel: '-' expected\n"); + return -EINVAL; + } + cur++; + + /* if no ':' is here, than we read the end */ + if (*cur != ':') { + end = memparse(cur, &tmp); + if (cur == tmp) { + pr_warn("crashkernel: Memory value expected\n"); + return -EINVAL; + } + cur = tmp; + if (end <= start) { + pr_warn("crashkernel: end <= start\n"); + return -EINVAL; + } + } + + if (*cur != ':') { + pr_warn("crashkernel: ':' expected\n"); + return -EINVAL; + } + cur++; + + size = memparse(cur, &tmp); + if (cur == tmp) { + pr_warn("Memory value expected\n"); + return -EINVAL; + } + cur = tmp; + if (size >= system_ram) { + pr_warn("crashkernel: invalid size\n"); + return -EINVAL; + } + + /* match ? */ + if (system_ram >= start && system_ram < end) { + *crash_size = size; + break; + } + } while (*cur++ == ','); + + if (*crash_size > 0) { + while (*cur && *cur != ' ' && *cur != '@') + cur++; + if (*cur == '@') { + cur++; + *crash_base = memparse(cur, &tmp); + if (cur == tmp) { + pr_warn("Memory value expected after '@'\n"); + return -EINVAL; + } + } + } + + return 0; +} + +/* + * That function parses "simple" (old) crashkernel command lines like + * + * crashkernel=size[@offset] + * + * It returns 0 on success and -EINVAL on failure. 
+ */ +static int __init parse_crashkernel_simple(char *cmdline, + unsigned long long *crash_size, + unsigned long long *crash_base) +{ + char *cur = cmdline; + + *crash_size = memparse(cmdline, &cur); + if (cmdline == cur) { + pr_warn("crashkernel: memory value expected\n"); + return -EINVAL; + } + + if (*cur == '@') + *crash_base = memparse(cur+1, &cur); + else if (*cur != ' ' && *cur != '\0') { + pr_warn("crashkernel: unrecognized char: %c\n", *cur); + return -EINVAL; + } + + return 0; +} + +#define SUFFIX_HIGH 0 +#define SUFFIX_LOW 1 +#define SUFFIX_NULL 2 +static __initdata char *suffix_tbl[] = { + [SUFFIX_HIGH] = ",high", + [SUFFIX_LOW] = ",low", + [SUFFIX_NULL] = NULL, +}; + +/* + * That function parses "suffix" crashkernel command lines like + * + * crashkernel=size,[high|low] + * + * It returns 0 on success and -EINVAL on failure. + */ +static int __init parse_crashkernel_suffix(char *cmdline, + unsigned long long *crash_size, + const char *suffix) +{ + char *cur = cmdline; + + *crash_size = memparse(cmdline, &cur); + if (cmdline == cur) { + pr_warn("crashkernel: memory value expected\n"); + return -EINVAL; + } + + /* check with suffix */ + if (strncmp(cur, suffix, strlen(suffix))) { + pr_warn("crashkernel: unrecognized char: %c\n", *cur); + return -EINVAL; + } + cur += strlen(suffix); + if (*cur != ' ' && *cur != '\0') { + pr_warn("crashkernel: unrecognized char: %c\n", *cur); + return -EINVAL; + } + + return 0; +} + +static __init char *get_last_crashkernel(char *cmdline, + const char *name, + const char *suffix) +{ + char *p = cmdline, *ck_cmdline = NULL; + + /* find crashkernel and use the last one if there are more */ + p = strstr(p, name); + while (p) { + char *end_p = strchr(p, ' '); + char *q; + + if (!end_p) + end_p = p + strlen(p); + + if (!suffix) { + int i; + + /* skip the one with any known suffix */ + for (i = 0; suffix_tbl[i]; i++) { + q = end_p - strlen(suffix_tbl[i]); + if (!strncmp(q, suffix_tbl[i], + strlen(suffix_tbl[i]))) + goto next; + 
} + ck_cmdline = p; + } else { + q = end_p - strlen(suffix); + if (!strncmp(q, suffix, strlen(suffix))) + ck_cmdline = p; + } +next: + p = strstr(p+1, name); + } + + if (!ck_cmdline) + return NULL; + + return ck_cmdline; +} + +static int __init __parse_crashkernel(char *cmdline, + unsigned long long system_ram, + unsigned long long *crash_size, + unsigned long long *crash_base, + const char *name, + const char *suffix) +{ + char *first_colon, *first_space; + char *ck_cmdline; + + BUG_ON(!crash_size || !crash_base); + *crash_size = 0; + *crash_base = 0; + + ck_cmdline = get_last_crashkernel(cmdline, name, suffix); + + if (!ck_cmdline) + return -EINVAL; + + ck_cmdline += strlen(name); + + if (suffix) + return parse_crashkernel_suffix(ck_cmdline, crash_size, + suffix); + /* + * if the commandline contains a ':', then that's the extended + * syntax -- if not, it must be the classic syntax + */ + first_colon = strchr(ck_cmdline, ':'); + first_space = strchr(ck_cmdline, ' '); + if (first_colon && (!first_space || first_colon < first_space)) + return parse_crashkernel_mem(ck_cmdline, system_ram, + crash_size, crash_base); + + return parse_crashkernel_simple(ck_cmdline, crash_size, crash_base); +} + +/* + * That function is the entry point for command line parsing and should be + * called from the arch-specific code. 
+ */
+int __init parse_crashkernel(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+				   "crashkernel=", NULL);
+}
+
+int __init parse_crashkernel_high(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+				   "crashkernel=", suffix_tbl[SUFFIX_HIGH]);
+}
+
+int __init parse_crashkernel_low(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+				   "crashkernel=", suffix_tbl[SUFFIX_LOW]);
+}
+
+static u32 *append_elf_note(u32 *buf, char *name, unsigned int type,
+			    void *data, size_t data_len)
+{
+	struct elf_note note;
+
+	note.n_namesz = strlen(name) + 1;
+	note.n_descsz = data_len;
+	note.n_type   = type;
+	memcpy(buf, &note, sizeof(note));
+	buf += (sizeof(note) + 3)/4;
+	memcpy(buf, name, note.n_namesz);
+	buf += (note.n_namesz + 3)/4;
+	memcpy(buf, data, note.n_descsz);
+	buf += (note.n_descsz + 3)/4;
+
+	return buf;
+}
+
+static void final_note(u32 *buf)
+{
+	struct elf_note note;
+
+	note.n_namesz = 0;
+	note.n_descsz = 0;
+	note.n_type   = 0;
+	memcpy(buf, &note, sizeof(note));
+}
+
+static void update_vmcoreinfo_note(void)
+{
+	u32 *buf = vmcoreinfo_note;
+
+	if (!vmcoreinfo_size)
+		return;
+	buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, 0, vmcoreinfo_data,
+			      vmcoreinfo_size);
+	final_note(buf);
+}
+
+void crash_save_vmcoreinfo(void)
+{
+	vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds());
+	update_vmcoreinfo_note();
+}
+
+void vmcoreinfo_append_str(const char *fmt, ...)
+{ + va_list args; + char buf[0x50]; + size_t r; + + va_start(args, fmt); + r = vscnprintf(buf, sizeof(buf), fmt, args); + va_end(args); + + r = min(r, vmcoreinfo_max_size - vmcoreinfo_size); + + memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r); + + vmcoreinfo_size += r; +} + +/* + * provide an empty default implementation here -- architecture + * code may override this + */ +void __weak arch_crash_save_vmcoreinfo(void) +{} + +phys_addr_t __weak paddr_vmcoreinfo_note(void) +{ + return __pa_symbol((unsigned long)(char *)&vmcoreinfo_note); +} + +static int __init crash_save_vmcoreinfo_init(void) +{ + VMCOREINFO_OSRELEASE(init_uts_ns.name.release); + VMCOREINFO_PAGESIZE(PAGE_SIZE); + + VMCOREINFO_SYMBOL(init_uts_ns); + VMCOREINFO_SYMBOL(node_online_map); +#ifdef CONFIG_MMU + VMCOREINFO_SYMBOL(swapper_pg_dir); +#endif + VMCOREINFO_SYMBOL(_stext); + VMCOREINFO_SYMBOL(vmap_area_list); + +#ifndef CONFIG_NEED_MULTIPLE_NODES + VMCOREINFO_SYMBOL(mem_map); + VMCOREINFO_SYMBOL(contig_page_data); +#endif +#ifdef CONFIG_SPARSEMEM + VMCOREINFO_SYMBOL(mem_section); + VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS); + VMCOREINFO_STRUCT_SIZE(mem_section); + VMCOREINFO_OFFSET(mem_section, section_mem_map); +#endif + VMCOREINFO_STRUCT_SIZE(page); + VMCOREINFO_STRUCT_SIZE(pglist_data); + VMCOREINFO_STRUCT_SIZE(zone); + VMCOREINFO_STRUCT_SIZE(free_area); + VMCOREINFO_STRUCT_SIZE(list_head); + VMCOREINFO_SIZE(nodemask_t); + VMCOREINFO_OFFSET(page, flags); + VMCOREINFO_OFFSET(page, _refcount); + VMCOREINFO_OFFSET(page, mapping); + VMCOREINFO_OFFSET(page, lru); + VMCOREINFO_OFFSET(page, _mapcount); + VMCOREINFO_OFFSET(page, private); + VMCOREINFO_OFFSET(page, compound_dtor); + VMCOREINFO_OFFSET(page, compound_order); + VMCOREINFO_OFFSET(page, compound_head); + VMCOREINFO_OFFSET(pglist_data, node_zones); + VMCOREINFO_OFFSET(pglist_data, nr_zones); +#ifdef CONFIG_FLAT_NODE_MEM_MAP + VMCOREINFO_OFFSET(pglist_data, node_mem_map); +#endif + VMCOREINFO_OFFSET(pglist_data, node_start_pfn); + 
VMCOREINFO_OFFSET(pglist_data, node_spanned_pages); + VMCOREINFO_OFFSET(pglist_data, node_id); + VMCOREINFO_OFFSET(zone, free_area); + VMCOREINFO_OFFSET(zone, vm_stat); + VMCOREINFO_OFFSET(zone, spanned_pages); + VMCOREINFO_OFFSET(free_area, free_list); + VMCOREINFO_OFFSET(list_head, next); + VMCOREINFO_OFFSET(list_head, prev); + VMCOREINFO_OFFSET(vmap_area, va_start); + VMCOREINFO_OFFSET(vmap_area, list); + VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER); + log_buf_vmcoreinfo_setup(); + VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES); + VMCOREINFO_NUMBER(NR_FREE_PAGES); + VMCOREINFO_NUMBER(PG_lru); + VMCOREINFO_NUMBER(PG_private); + VMCOREINFO_NUMBER(PG_swapcache); + VMCOREINFO_NUMBER(PG_slab); +#ifdef CONFIG_MEMORY_FAILURE + VMCOREINFO_NUMBER(PG_hwpoison); +#endif + VMCOREINFO_NUMBER(PG_head_mask); + VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE); +#ifdef CONFIG_HUGETLB_PAGE + VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR); +#endif + + arch_crash_save_vmcoreinfo(); + update_vmcoreinfo_note(); + + return 0; +} + +subsys_initcall(crash_save_vmcoreinfo_init); diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index bfe62d5b3872..9dd722912850 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -51,12 +51,6 @@ DEFINE_MUTEX(kexec_mutex); /* Per cpu memory for storing cpu states in case of system crash. 
*/ note_buf_t __percpu *crash_notes; -/* vmcoreinfo stuff */ -static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES]; -u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; -size_t vmcoreinfo_size; -size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); - /* Flag to indicate we are going to kexec a new kernel */ bool kexec_in_progress = false; @@ -1084,403 +1078,6 @@ static int __init crash_notes_memory_init(void) subsys_initcall(crash_notes_memory_init); -/* - * parsing the "crashkernel" commandline - * - * this code is intended to be called from architecture specific code - */ - - -/* - * This function parses command lines in the format - * - * crashkernel=ramsize-range:size[,...][@offset] - * - * The function returns 0 on success and -EINVAL on failure. - */ -static int __init parse_crashkernel_mem(char *cmdline, - unsigned long long system_ram, - unsigned long long *crash_size, - unsigned long long *crash_base) -{ - char *cur = cmdline, *tmp; - - /* for each entry of the comma-separated list */ - do { - unsigned long long start, end = ULLONG_MAX, size; - - /* get the start of the range */ - start = memparse(cur, &tmp); - if (cur == tmp) { - pr_warn("crashkernel: Memory value expected\n"); - return -EINVAL; - } - cur = tmp; - if (*cur != '-') { - pr_warn("crashkernel: '-' expected\n"); - return -EINVAL; - } - cur++; - - /* if no ':' is here, than we read the end */ - if (*cur != ':') { - end = memparse(cur, &tmp); - if (cur == tmp) { - pr_warn("crashkernel: Memory value expected\n"); - return -EINVAL; - } - cur = tmp; - if (end <= start) { - pr_warn("crashkernel: end <= start\n"); - return -EINVAL; - } - } - - if (*cur != ':') { - pr_warn("crashkernel: ':' expected\n"); - return -EINVAL; - } - cur++; - - size = memparse(cur, &tmp); - if (cur == tmp) { - pr_warn("Memory value expected\n"); - return -EINVAL; - } - cur = tmp; - if (size >= system_ram) { - pr_warn("crashkernel: invalid size\n"); - return -EINVAL; - } - - /* match ? 
*/ - if (system_ram >= start && system_ram < end) { - *crash_size = size; - break; - } - } while (*cur++ == ','); - - if (*crash_size > 0) { - while (*cur && *cur != ' ' && *cur != '@') - cur++; - if (*cur == '@') { - cur++; - *crash_base = memparse(cur, &tmp); - if (cur == tmp) { - pr_warn("Memory value expected after '@'\n"); - return -EINVAL; - } - } - } - - return 0; -} - -/* - * That function parses "simple" (old) crashkernel command lines like - * - * crashkernel=size[@offset] - * - * It returns 0 on success and -EINVAL on failure. - */ -static int __init parse_crashkernel_simple(char *cmdline, - unsigned long long *crash_size, - unsigned long long *crash_base) -{ - char *cur = cmdline; - - *crash_size = memparse(cmdline, &cur); - if (cmdline == cur) { - pr_warn("crashkernel: memory value expected\n"); - return -EINVAL; - } - - if (*cur == '@') - *crash_base = memparse(cur+1, &cur); - else if (*cur != ' ' && *cur != '\0') { - pr_warn("crashkernel: unrecognized char: %c\n", *cur); - return -EINVAL; - } - - return 0; -} - -#define SUFFIX_HIGH 0 -#define SUFFIX_LOW 1 -#define SUFFIX_NULL 2 -static __initdata char *suffix_tbl[] = { - [SUFFIX_HIGH] = ",high", - [SUFFIX_LOW] = ",low", - [SUFFIX_NULL] = NULL, -}; - -/* - * That function parses "suffix" crashkernel command lines like - * - * crashkernel=size,[high|low] - * - * It returns 0 on success and -EINVAL on failure. 
- */ -static int __init parse_crashkernel_suffix(char *cmdline, - unsigned long long *crash_size, - const char *suffix) -{ - char *cur = cmdline; - - *crash_size = memparse(cmdline, &cur); - if (cmdline == cur) { - pr_warn("crashkernel: memory value expected\n"); - return -EINVAL; - } - - /* check with suffix */ - if (strncmp(cur, suffix, strlen(suffix))) { - pr_warn("crashkernel: unrecognized char: %c\n", *cur); - return -EINVAL; - } - cur += strlen(suffix); - if (*cur != ' ' && *cur != '\0') { - pr_warn("crashkernel: unrecognized char: %c\n", *cur); - return -EINVAL; - } - - return 0; -} - -static __init char *get_last_crashkernel(char *cmdline, - const char *name, - const char *suffix) -{ - char *p = cmdline, *ck_cmdline = NULL; - - /* find crashkernel and use the last one if there are more */ - p = strstr(p, name); - while (p) { - char *end_p = strchr(p, ' '); - char *q; - - if (!end_p) - end_p = p + strlen(p); - - if (!suffix) { - int i; - - /* skip the one with any known suffix */ - for (i = 0; suffix_tbl[i]; i++) { - q = end_p - strlen(suffix_tbl[i]); - if (!strncmp(q, suffix_tbl[i], - strlen(suffix_tbl[i]))) - goto next; - } - ck_cmdline = p; - } else { - q = end_p - strlen(suffix); - if (!strncmp(q, suffix, strlen(suffix))) - ck_cmdline = p; - } -next: - p = strstr(p+1, name); - } - - if (!ck_cmdline) - return NULL; - - return ck_cmdline; -} - -static int __init __parse_crashkernel(char *cmdline, - unsigned long long system_ram, - unsigned long long *crash_size, - unsigned long long *crash_base, - const char *name, - const char *suffix) -{ - char *first_colon, *first_space; - char *ck_cmdline; - - BUG_ON(!crash_size || !crash_base); - *crash_size = 0; - *crash_base = 0; - - ck_cmdline = get_last_crashkernel(cmdline, name, suffix); - - if (!ck_cmdline) - return -EINVAL; - - ck_cmdline += strlen(name); - - if (suffix) - return parse_crashkernel_suffix(ck_cmdline, crash_size, - suffix); - /* - * if the commandline contains a ':', then that's the extended - * 
syntax -- if not, it must be the classic syntax - */ - first_colon = strchr(ck_cmdline, ':'); - first_space = strchr(ck_cmdline, ' '); - if (first_colon && (!first_space || first_colon < first_space)) - return parse_crashkernel_mem(ck_cmdline, system_ram, - crash_size, crash_base); - - return parse_crashkernel_simple(ck_cmdline, crash_size, crash_base); -} - -/* - * That function is the entry point for command line parsing and should be - * called from the arch-specific code. - */ -int __init parse_crashkernel(char *cmdline, - unsigned long long system_ram, - unsigned long long *crash_size, - unsigned long long *crash_base) -{ - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - "crashkernel=", NULL); -} - -int __init parse_crashkernel_high(char *cmdline, - unsigned long long system_ram, - unsigned long long *crash_size, - unsigned long long *crash_base) -{ - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - "crashkernel=", suffix_tbl[SUFFIX_HIGH]); -} - -int __init parse_crashkernel_low(char *cmdline, - unsigned long long system_ram, - unsigned long long *crash_size, - unsigned long long *crash_base) -{ - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - "crashkernel=", suffix_tbl[SUFFIX_LOW]); -} - -static void update_vmcoreinfo_note(void) -{ - u32 *buf = vmcoreinfo_note; - - if (!vmcoreinfo_size) - return; - buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, 0, vmcoreinfo_data, - vmcoreinfo_size); - final_note(buf); -} - -void crash_save_vmcoreinfo(void) -{ - vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds()); - update_vmcoreinfo_note(); -} - -void vmcoreinfo_append_str(const char *fmt, ...) 
-{ - va_list args; - char buf[0x50]; - size_t r; - - va_start(args, fmt); - r = vscnprintf(buf, sizeof(buf), fmt, args); - va_end(args); - - r = min(r, vmcoreinfo_max_size - vmcoreinfo_size); - - memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r); - - vmcoreinfo_size += r; -} - -/* - * provide an empty default implementation here -- architecture - * code may override this - */ -void __weak arch_crash_save_vmcoreinfo(void) -{} - -phys_addr_t __weak paddr_vmcoreinfo_note(void) -{ - return __pa_symbol((unsigned long)(char *)&vmcoreinfo_note); -} - -static int __init crash_save_vmcoreinfo_init(void) -{ - VMCOREINFO_OSRELEASE(init_uts_ns.name.release); - VMCOREINFO_PAGESIZE(PAGE_SIZE); - - VMCOREINFO_SYMBOL(init_uts_ns); - VMCOREINFO_SYMBOL(node_online_map); -#ifdef CONFIG_MMU - VMCOREINFO_SYMBOL(swapper_pg_dir); -#endif - VMCOREINFO_SYMBOL(_stext); - VMCOREINFO_SYMBOL(vmap_area_list); - -#ifndef CONFIG_NEED_MULTIPLE_NODES - VMCOREINFO_SYMBOL(mem_map); - VMCOREINFO_SYMBOL(contig_page_data); -#endif -#ifdef CONFIG_SPARSEMEM - VMCOREINFO_SYMBOL(mem_section); - VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS); - VMCOREINFO_STRUCT_SIZE(mem_section); - VMCOREINFO_OFFSET(mem_section, section_mem_map); -#endif - VMCOREINFO_STRUCT_SIZE(page); - VMCOREINFO_STRUCT_SIZE(pglist_data); - VMCOREINFO_STRUCT_SIZE(zone); - VMCOREINFO_STRUCT_SIZE(free_area); - VMCOREINFO_STRUCT_SIZE(list_head); - VMCOREINFO_SIZE(nodemask_t); - VMCOREINFO_OFFSET(page, flags); - VMCOREINFO_OFFSET(page, _refcount); - VMCOREINFO_OFFSET(page, mapping); - VMCOREINFO_OFFSET(page, lru); - VMCOREINFO_OFFSET(page, _mapcount); - VMCOREINFO_OFFSET(page, private); - VMCOREINFO_OFFSET(page, compound_dtor); - VMCOREINFO_OFFSET(page, compound_order); - VMCOREINFO_OFFSET(page, compound_head); - VMCOREINFO_OFFSET(pglist_data, node_zones); - VMCOREINFO_OFFSET(pglist_data, nr_zones); -#ifdef CONFIG_FLAT_NODE_MEM_MAP - VMCOREINFO_OFFSET(pglist_data, node_mem_map); -#endif - VMCOREINFO_OFFSET(pglist_data, node_start_pfn); - 
VMCOREINFO_OFFSET(pglist_data, node_spanned_pages); - VMCOREINFO_OFFSET(pglist_data, node_id); - VMCOREINFO_OFFSET(zone, free_area); - VMCOREINFO_OFFSET(zone, vm_stat); - VMCOREINFO_OFFSET(zone, spanned_pages); - VMCOREINFO_OFFSET(free_area, free_list); - VMCOREINFO_OFFSET(list_head, next); - VMCOREINFO_OFFSET(list_head, prev); - VMCOREINFO_OFFSET(vmap_area, va_start); - VMCOREINFO_OFFSET(vmap_area, list); - VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER); - log_buf_kexec_setup(); - VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES); - VMCOREINFO_NUMBER(NR_FREE_PAGES); - VMCOREINFO_NUMBER(PG_lru); - VMCOREINFO_NUMBER(PG_private); - VMCOREINFO_NUMBER(PG_swapcache); - VMCOREINFO_NUMBER(PG_slab); -#ifdef CONFIG_MEMORY_FAILURE - VMCOREINFO_NUMBER(PG_hwpoison); -#endif - VMCOREINFO_NUMBER(PG_head_mask); - VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE); -#ifdef CONFIG_HUGETLB_PAGE - VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR); -#endif - - arch_crash_save_vmcoreinfo(); - update_vmcoreinfo_note(); - - return 0; -} - -subsys_initcall(crash_save_vmcoreinfo_init); - /* * Move into place and start executing a preloaded standalone * executable. If nothing was preloaded return an error. 
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index 0999679d6f26..23cd70651238 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -125,6 +125,10 @@ static ssize_t kexec_crash_size_store(struct kobject *kobj, } KERNEL_ATTR_RW(kexec_crash_size); +#endif /* CONFIG_KEXEC_CORE */ + +#ifdef CONFIG_CRASH_CORE + static ssize_t vmcoreinfo_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -134,7 +138,7 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj, } KERNEL_ATTR_RO(vmcoreinfo); -#endif /* CONFIG_KEXEC_CORE */ +#endif /* CONFIG_CRASH_CORE */ /* whether file capabilities are enabled */ static ssize_t fscaps_show(struct kobject *kobj, @@ -219,6 +223,8 @@ static struct attribute * kernel_attrs[] = { &kexec_loaded_attr.attr, &kexec_crash_loaded_attr.attr, &kexec_crash_size_attr.attr, +#endif +#ifdef CONFIG_CRASH_CORE &vmcoreinfo_attr.attr, #endif #ifndef CONFIG_TINY_RCU diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 779479ac9f57..fb2d1591f671 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -32,7 +32,7 @@ #include #include #include -#include +#include #include #include #include @@ -1002,7 +1002,7 @@ const struct file_operations kmsg_fops = { .release = devkmsg_release, }; -#ifdef CONFIG_KEXEC_CORE +#ifdef CONFIG_CRASH_CORE /* * This appends the listed symbols to /proc/vmcore * @@ -1011,7 +1011,7 @@ const struct file_operations kmsg_fops = { * symbols are specifically used so that utilities can access and extract the * dmesg log from a vmcore file after a crash. */ -void log_buf_kexec_setup(void) +void log_buf_vmcoreinfo_setup(void) { VMCOREINFO_SYMBOL(log_buf); VMCOREINFO_SYMBOL(log_buf_len); -- cgit From 51dbd92520d4344fef78481b1bcbc3a7de32b69b Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Mon, 8 May 2017 15:56:21 -0700 Subject: ia64: reuse append_elf_note() and final_note() functions Get rid of multiple definitions of append_elf_note() & final_note() functions. 
Reuse these functions compiled under CONFIG_CRASH_CORE Also, define Elf_Word and use it instead of generic u32 or the more specific Elf64_Word. Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini Acked-by: Dave Young Acked-by: Tony Luck Cc: Fenghua Yu Cc: Eric Biederman Cc: Mahesh Salgaonkar Cc: Vivek Goyal Cc: Michael Ellerman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ia64/kernel/crash.c | 22 ---------------------- include/linux/crash_core.h | 4 ++++ include/linux/elf.h | 2 ++ kernel/crash_core.c | 34 ++++++++++++++-------------------- kernel/kexec_core.c | 28 ---------------------------- 5 files changed, 20 insertions(+), 70 deletions(-) diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c index 2955f359e2a7..75859a07d75b 100644 --- a/arch/ia64/kernel/crash.c +++ b/arch/ia64/kernel/crash.c @@ -27,28 +27,6 @@ static int kdump_freeze_monarch; static int kdump_on_init = 1; static int kdump_on_fatal_mca = 1; -static inline Elf64_Word -*append_elf_note(Elf64_Word *buf, char *name, unsigned type, void *data, - size_t data_len) -{ - struct elf_note *note = (struct elf_note *)buf; - note->n_namesz = strlen(name) + 1; - note->n_descsz = data_len; - note->n_type = type; - buf += (sizeof(*note) + 3)/4; - memcpy(buf, name, note->n_namesz); - buf += (note->n_namesz + 3)/4; - memcpy(buf, data, data_len); - buf += (data_len + 3)/4; - return buf; -} - -static void -final_note(void *buf) -{ - memset(buf, 0, sizeof(struct elf_note)); -} - extern void ia64_dump_cpu_regs(void *); static DEFINE_PER_CPU(struct elf_prstatus, elf_prstatus); diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 18d0f946fda3..541a197ba4a2 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -55,6 +55,10 @@ extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; extern size_t vmcoreinfo_size; extern size_t vmcoreinfo_max_size; +Elf_Word 
*append_elf_note(Elf_Word *buf, char *name, unsigned int type, + void *data, size_t data_len); +void final_note(Elf_Word *buf); + int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base); int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, diff --git a/include/linux/elf.h b/include/linux/elf.h index 20fa8d8ae313..ba069e8f4f78 100644 --- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -29,6 +29,7 @@ extern Elf32_Dyn _DYNAMIC []; #define elf_note elf32_note #define elf_addr_t Elf32_Off #define Elf_Half Elf32_Half +#define Elf_Word Elf32_Word #else @@ -39,6 +40,7 @@ extern Elf64_Dyn _DYNAMIC []; #define elf_note elf64_note #define elf_addr_t Elf64_Off #define Elf_Half Elf64_Half +#define Elf_Word Elf64_Word #endif diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 4261587a34d2..fcbd568f1e95 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -291,32 +291,26 @@ int __init parse_crashkernel_low(char *cmdline, "crashkernel=", suffix_tbl[SUFFIX_LOW]); } -static u32 *append_elf_note(u32 *buf, char *name, unsigned int type, - void *data, size_t data_len) +Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, + void *data, size_t data_len) { - struct elf_note note; - - note.n_namesz = strlen(name) + 1; - note.n_descsz = data_len; - note.n_type = type; - memcpy(buf, &note, sizeof(note)); - buf += (sizeof(note) + 3)/4; - memcpy(buf, name, note.n_namesz); - buf += (note.n_namesz + 3)/4; - memcpy(buf, data, note.n_descsz); - buf += (note.n_descsz + 3)/4; + struct elf_note *note = (struct elf_note *)buf; + + note->n_namesz = strlen(name) + 1; + note->n_descsz = data_len; + note->n_type = type; + buf += DIV_ROUND_UP(sizeof(*note), sizeof(Elf_Word)); + memcpy(buf, name, note->n_namesz); + buf += DIV_ROUND_UP(note->n_namesz, sizeof(Elf_Word)); + memcpy(buf, data, data_len); + buf += DIV_ROUND_UP(data_len, sizeof(Elf_Word)); return buf; } 
-static void final_note(u32 *buf) +void final_note(Elf_Word *buf) { - struct elf_note note; - - note.n_namesz = 0; - note.n_descsz = 0; - note.n_type = 0; - memcpy(buf, &note, sizeof(note)); + memset(buf, 0, sizeof(struct elf_note)); } static void update_vmcoreinfo_note(void) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 9dd722912850..ae1a3ba24df5 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -990,34 +990,6 @@ unlock: return ret; } -static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data, - size_t data_len) -{ - struct elf_note note; - - note.n_namesz = strlen(name) + 1; - note.n_descsz = data_len; - note.n_type = type; - memcpy(buf, &note, sizeof(note)); - buf += (sizeof(note) + 3)/4; - memcpy(buf, name, note.n_namesz); - buf += (note.n_namesz + 3)/4; - memcpy(buf, data, note.n_descsz); - buf += (note.n_descsz + 3)/4; - - return buf; -} - -static void final_note(u32 *buf) -{ - struct elf_note note; - - note.n_namesz = 0; - note.n_descsz = 0; - note.n_type = 0; - memcpy(buf, &note, sizeof(note)); -} - void crash_save_cpu(struct pt_regs *regs, int cpu) { struct elf_prstatus prstatus; -- cgit From 22bd0177bd08677a8888f4d1d8361b0326f9119b Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Mon, 8 May 2017 15:56:24 -0700 Subject: powerpc/fadump: remove dependency with CONFIG_KEXEC Now that crashkernel parameter parsing and vmcoreinfo related code is moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove dependency with CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid of definitions of fadump_append_elf_note() & fadump_final_note() functions to reuse similar functions compiled under CONFIG_CRASH_CORE. 
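For readers unfamiliar with the note layout that the shared helpers produce, here is a minimal user-space sketch. It mirrors the append_elf_note() and final_note() bodies shown in the kernel/crash_core.c hunk, but the 32-bit Elf_Word typedef, the local struct elf_note, and the const qualifiers on the parameters are assumptions made so that it builds outside the kernel. The caller is assumed to pass a zeroed buffer, because the padding bytes after the name and the payload are not written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-ins for the kernel's Elf_Word and struct elf_note. */
typedef uint32_t Elf_Word;

struct elf_note {
	Elf_Word n_namesz;	/* name length, including the trailing NUL */
	Elf_Word n_descsz;	/* payload (descriptor) length in bytes */
	Elf_Word n_type;	/* note type, for example NT_PRSTATUS */
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Write header, name and payload, each rounded up to a whole Elf_Word. */
static Elf_Word *append_elf_note(Elf_Word *buf, const char *name,
				 unsigned int type, const void *data,
				 size_t data_len)
{
	struct elf_note *note = (struct elf_note *)buf;

	note->n_namesz = strlen(name) + 1;
	note->n_descsz = data_len;
	note->n_type = type;
	buf += DIV_ROUND_UP(sizeof(*note), sizeof(Elf_Word));
	memcpy(buf, name, note->n_namesz);
	buf += DIV_ROUND_UP(note->n_namesz, sizeof(Elf_Word));
	memcpy(buf, data, data_len);
	buf += DIV_ROUND_UP(data_len, sizeof(Elf_Word));

	return buf;
}

/* An all-zero header terminates the sequence of notes. */
static void final_note(Elf_Word *buf)
{
	memset(buf, 0, sizeof(struct elf_note));
}
```

With a 5-byte name ("CORE" plus NUL) and a 6-byte payload, one note occupies 3 + 2 + 2 = 7 words in this sketch, and the returned pointer is where the next note, or the terminating final_note(), goes.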
Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini Reviewed-by: Mahesh Salgaonkar Acked-by: Michael Ellerman Cc: Fenghua Yu Cc: Tony Luck Cc: Dave Young Cc: Eric Biederman Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/powerpc/Kconfig | 10 ++++++---- arch/powerpc/include/asm/fadump.h | 2 ++ arch/powerpc/kernel/crash.c | 2 -- arch/powerpc/kernel/fadump.c | 34 +++------------------------------- arch/powerpc/kernel/setup-common.c | 5 +++++ 5 files changed, 16 insertions(+), 37 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index f07f727cbfd2..d8834e8bfb05 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -571,21 +571,23 @@ config RELOCATABLE_TEST relocation code. config CRASH_DUMP - bool "Build a kdump crash kernel" + bool "Build a dump capture kernel" depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP) select RELOCATABLE if PPC64 || 44x || FSL_BOOKE help - Build a kernel suitable for use as a kdump capture kernel. + Build a kernel suitable for use as a dump capture kernel. The same kernel binary can be used as production kernel and dump capture kernel. config FA_DUMP bool "Firmware-assisted dump" - depends on PPC64 && PPC_RTAS && CRASH_DUMP && KEXEC_CORE + depends on PPC64 && PPC_RTAS + select CRASH_CORE + select CRASH_DUMP help A robust mechanism to get reliable kernel crash dump with assistance from firmware. This approach does not use kexec, - instead firmware assists in booting the kdump kernel + instead firmware assists in booting the capture kernel while preserving memory contents. Firmware-assisted dump is meant to be a kdump replacement offering robustness and speed not possible without system firmware assistance. 
diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h index 0031806475f0..60b91084f33c 100644 --- a/arch/powerpc/include/asm/fadump.h +++ b/arch/powerpc/include/asm/fadump.h @@ -73,6 +73,8 @@ reg_entry++; \ }) +extern int crashing_cpu; + /* Kernel Dump section info */ struct fadump_section { __be32 request_flag; diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c index 47b63de81f9b..cbabb5adccd9 100644 --- a/arch/powerpc/kernel/crash.c +++ b/arch/powerpc/kernel/crash.c @@ -43,8 +43,6 @@ #define IPI_TIMEOUT 10000 #define REAL_MODE_TIMEOUT 10000 -/* This keeps a track of which one is the crashing cpu. */ -int crashing_cpu = -1; static int time_to_dump; #define CRASH_HANDLER_MAX 3 diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 243dbef7e926..d2e1476d9870 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -527,34 +527,6 @@ fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs) return reg_entry; } -static u32 *fadump_append_elf_note(u32 *buf, char *name, unsigned type, - void *data, size_t data_len) -{ - struct elf_note note; - - note.n_namesz = strlen(name) + 1; - note.n_descsz = data_len; - note.n_type = type; - memcpy(buf, &note, sizeof(note)); - buf += (sizeof(note) + 3)/4; - memcpy(buf, name, note.n_namesz); - buf += (note.n_namesz + 3)/4; - memcpy(buf, data, note.n_descsz); - buf += (note.n_descsz + 3)/4; - - return buf; -} - -static void fadump_final_note(u32 *buf) -{ - struct elf_note note; - - note.n_namesz = 0; - note.n_descsz = 0; - note.n_type = 0; - memcpy(buf, &note, sizeof(note)); -} - static u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) { struct elf_prstatus prstatus; @@ -565,8 +537,8 @@ static u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) * prstatus.pr_pid = ???? 
*/ elf_core_copy_kernel_regs(&prstatus.pr_reg, regs); - buf = fadump_append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS, - &prstatus, sizeof(prstatus)); + buf = append_elf_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS, + &prstatus, sizeof(prstatus)); return buf; } @@ -707,7 +679,7 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) note_buf = fadump_regs_to_elf_notes(note_buf, &regs); } } - fadump_final_note(note_buf); + final_note(note_buf); if (fdh) { pr_debug("Updating elfcore header (%llx) with cpu notes\n", diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 5c10b5925ac2..69e077180db6 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -125,6 +125,11 @@ int ppc_do_canonicalize_irqs; EXPORT_SYMBOL(ppc_do_canonicalize_irqs); #endif +#ifdef CONFIG_CRASH_CORE +/* This keeps a track of which one is the crashing cpu. */ +int crashing_cpu = -1; +#endif + /* also used by kexec */ void machine_shutdown(void) { -- cgit From 11550dc0a00b793236e8dedcf1f489f4627ddf7e Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Mon, 8 May 2017 15:56:28 -0700 Subject: powerpc/fadump: reuse crashkernel parameter for fadump memory reservation fadump supports specifying memory to reserve for fadump's crash kernel with fadump_reserve_mem kernel parameter. This parameter currently supports passing a fixed memory size, like fadump_reserve_mem=<size> only. This patch aims to add support for other syntaxes like range-based memory size <range1>:<size1>[,<range2>:<size2>,<range3>:<size3>,...] which allows using the same parameter to boot the kernel with different system RAM sizes. As crashkernel parameter already supports the above mentioned syntaxes, this patch deprecates fadump_reserve_mem parameter and reuses crashkernel parameter instead, to specify memory for fadump's crash kernel memory reservation as well. 
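To illustrate how a range-based crashkernel= value selects a reservation size, here is a simplified user-space sketch. It is not the kernel's parse_crashkernel(): the helper names parse_size() and crash_size_for_ram() are hypothetical, only the K/M/G suffixes are handled, and any trailing @offset is simply never consulted, which matches fadump ignoring the base.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse "<number>[K|M|G]" and advance *end past it. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits; the caller checks *end */

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Walk "<start>-[<end>]:<size>[,...]" and return the size of the first
 * range that contains system_ram, or 0 if none matches or the string is
 * malformed.
 */
static unsigned long long crash_size_for_ram(const char *spec,
					     unsigned long long system_ram)
{
	const char *cur = spec;

	while (*cur) {
		unsigned long long start, stop = ULLONG_MAX, size;
		char *end;

		start = parse_size(cur, &end);
		if (end == cur || *end != '-')
			return 0;
		cur = end + 1;

		if (*cur != ':') {	/* the range end is optional */
			stop = parse_size(cur, &end);
			if (end == cur)
				return 0;
			cur = end;
		}
		if (*cur != ':')
			return 0;
		cur++;

		size = parse_size(cur, &end);
		if (end == cur)
			return 0;
		cur = end;

		if (system_ram >= start && system_ram < stop)
			return size;

		if (*cur != ',')
			break;		/* '@offset' or end of string */
		cur++;
	}
	return 0;
}
```

For example, with the value 512M-2G:64M,2G-:128M a machine with 1G of RAM would get 64M and a machine with 4G would get 128M, so one command line can serve machines of different sizes.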
If any offset is provided in crashkernel parameter, it will be ignored in case of fadump, as fadump reserves memory at end of RAM. Advantages using crashkernel parameter instead of fadump_reserve_mem parameter are one less kernel parameter overall, code reuse and support for multiple syntaxes to specify memory. Suggested-by: Dave Young Link: http://lkml.kernel.org/r/149035346749.6881.911095631212975718.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini Reviewed-by: Mahesh Salgaonkar Acked-by: Michael Ellerman Cc: Fenghua Yu Cc: Tony Luck Cc: Dave Young Cc: Eric Biederman Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/powerpc/kernel/fadump.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index d2e1476d9870..466569e26278 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -209,14 +209,20 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm, */ static inline unsigned long fadump_calculate_reserve_size(void) { - unsigned long size; + int ret; + unsigned long long base, size; /* - * Check if the size is specified through fadump_reserve_mem= cmdline - * option. If yes, then use that. + * Check if the size is specified through crashkernel= cmdline + * option. If yes, then use that but ignore base as fadump + * reserves memory at end of RAM. 
*/ - if (fw_dump.reserve_bootvar) + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), + &size, &base); + if (ret == 0 && size > 0) { + fw_dump.reserve_bootvar = (unsigned long)size; return fw_dump.reserve_bootvar; + } /* divide by 20 to get 5% of value */ size = memblock_end_of_DRAM() / 20; @@ -371,15 +377,6 @@ static int __init early_fadump_param(char *p) } early_param("fadump", early_fadump_param); -/* Look for fadump_reserve_mem= cmdline option */ -static int __init early_fadump_reserve_mem(char *p) -{ - if (p) - fw_dump.reserve_bootvar = memparse(p, &p); - return 0; -} -early_param("fadump_reserve_mem", early_fadump_reserve_mem); - static void register_fw_dump(struct fadump_mem_struct *fdm) { int rc; -- cgit From 92019efc6cf2126edc482342446f1252d091081a Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Mon, 8 May 2017 15:56:31 -0700 Subject: powerpc/fadump: update documentation about crashkernel parameter reuse As we are reusing crashkernel parameter instead of fadump_reserve_mem parameter to specify the memory to reserve for fadump's crash kernel, update the documentation accordingly. Link: http://lkml.kernel.org/r/149035347559.6881.14224829694291758581.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini Acked-by: Michael Ellerman Cc: Fenghua Yu Cc: Tony Luck Cc: Dave Young Cc: Eric Biederman Cc: Mahesh Salgaonkar Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/powerpc/firmware-assisted-dump.txt | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index 19b1e3d09a19..9cabaf8a207e 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -55,10 +55,14 @@ as follows: booted with restricted memory. By default, the boot memory size will be the larger of 5% of system RAM or 256MB. 
Alternatively, user can also specify boot memory size - through boot parameter 'fadump_reserve_mem=' which will - override the default calculated size. Use this option - if default boot memory size is not sufficient for second - kernel to boot successfully. + through boot parameter 'crashkernel=' which will override + the default calculated size. Use this option if default + boot memory size is not sufficient for second kernel to + boot successfully. For syntax of crashkernel= parameter, + refer to Documentation/kdump/kdump.txt. If any offset is + provided in crashkernel= parameter, it will be ignored + as fadump reserves memory at end of RAM for boot memory + dump preservation in case of a crash. -- After the low memory (boot memory) area has been saved, the firmware will reset PCI and other hardware state. It will @@ -158,13 +162,16 @@ How to enable firmware-assisted dump (fadump): 1. Set config option CONFIG_FA_DUMP=y and build kernel. 2. Boot into linux kernel with 'fadump=on' kernel cmdline option. -3. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline +3. Optionally, user can also set 'crashkernel=' kernel cmdline to specify size of the memory to reserve for boot memory dump preservation. -NOTE: If firmware-assisted dump fails to reserve memory then it will - fallback to existing kdump mechanism if 'crashkernel=' option - is set at kernel cmdline. +NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead + use 'crashkernel=' to specify size of the memory to reserve + for boot memory dump preservation. + 2. If firmware-assisted dump fails to reserve memory then it + will fallback to existing kdump mechanism if 'crashkernel=' + option is set at kernel cmdline. 
Sysfs/debugfs files: ------------ -- cgit From 8896c23d2ef803f1883fea73117a435925c2b4c4 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Mon, 8 May 2017 15:56:34 -0700 Subject: pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() alloc_pidmap() advances pid_namespace::last_pid. When the first pid allocation fails, the next created process will have pid 2 and pid_ns_prepare_proc() won't be called. So, pid_namespace::proc_mnt will never be initialized (not to mention that there won't be a child reaper). I saw a crash stack from such a case on kernel 3.10: BUG: unable to handle kernel NULL pointer dereference at (null) IP: proc_flush_task+0x8f/0x1b0 Call Trace: release_task+0x3f/0x490 wait_consider_task.part.10+0x7ff/0xb00 do_wait+0x11f/0x280 SyS_wait4+0x7d/0x110 We may fix this by restoring last_pid to 0 or by prohibiting further allocations. There was a similar issue in Oleg Nesterov's commit 314a8ad0f18a ("pidns: fix free_pid() to handle the first fork failure"), and it was fixed by prohibiting allocation, so let's do the same here. 
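The failure sequence can be modelled with a small user-space toy. Everything here is hypothetical (struct toy_pid_ns, toy_alloc_pid(), the boolean flags, and the choice of -ENOMEM as the error code); it only illustrates why the second allocation must be refused once setup for pid 1 has failed, and it is not the kernel's alloc_pid().

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a pid namespace; the field names are illustrative only. */
struct toy_pid_ns {
	int last_pid;		/* last pid handed out */
	bool proc_ready;	/* stands in for proc_mnt being initialized */
	bool alloc_disabled;	/* stands in for disable_pid_allocation() */
};

/*
 * Hand out the next pid. prepare_fails simulates pid_ns_prepare_proc()
 * failing for the would-be child reaper; apply_fix selects the patched
 * behaviour.
 */
static int toy_alloc_pid(struct toy_pid_ns *ns, bool prepare_fails,
			 bool apply_fix)
{
	int pid;

	if (ns->alloc_disabled)
		return -ENOMEM;

	pid = ++ns->last_pid;	/* advances even if we fail below */
	if (pid == 1) {		/* pid 1 is the child reaper */
		if (prepare_fails) {
			if (apply_fix)
				ns->alloc_disabled = true;
			return -ENOMEM;
		}
		ns->proc_ready = true;
	}
	return pid;
}
```

Without the fix, a second call returns pid 2 while proc_ready is still false, which corresponds to the uninitialized proc_mnt behind the crash above. With the fix, the second call fails as well.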
Link: http://lkml.kernel.org/r/149201021004.4863.6762095011554287922.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai Acked-by: Cyrill Gorcunov Cc: Andrei Vagin Cc: Andreas Gruenbacher Cc: Kees Cook Cc: Michael Kerrisk Cc: Al Viro Cc: Oleg Nesterov Cc: Paul Moore Cc: Eric Biederman Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Serge Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/pid.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/pid.c b/kernel/pid.c index 0143ac0ddceb..fd1cde1e4576 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -321,8 +321,10 @@ struct pid *alloc_pid(struct pid_namespace *ns) } if (unlikely(is_child_reaper(pid))) { - if (pid_ns_prepare_proc(ns)) + if (pid_ns_prepare_proc(ns)) { + disable_pid_allocation(ns); goto out_free; + } } get_pid_ns(ns); -- cgit From 25b14e92af1a563c7331466ca59188f88050bbf0 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Mon, 8 May 2017 15:56:38 -0700 Subject: ns: allow ns_entries to have custom symlink content Patch series "Expose task pid_ns_for_children to userspace". pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks that do setns(CLONE_NEWPID) in the course of their work. If they have a custom pid_ns_for_children before dump, they must have the same ns after restore. Otherwise, the restored task ends up in an environment it does not expect. This patchset solves the problem. 
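As a rough illustration of the "name:[inum]" link content that a checkpoint tool would compare, here is a user-space sketch. The helpers format_ns_link() and parse_ns_link() are hypothetical; the first only mimics the "%s:[%u]" format and the real_ns_name fallback used by ns_get_name() in the fs/nsfs.c hunk of this series, and the second shows one way a tool might split the link text again.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build link content the way ns_get_name() does after this series:
 * prefer real_ns_name when it is set, otherwise fall back to name.
 */
static int format_ns_link(char *buf, size_t size, const char *name,
			  const char *real_ns_name, unsigned int inum)
{
	const char *prefix = real_ns_name ? real_ns_name : name;

	return snprintf(buf, size, "%s:[%u]", prefix, inum);
}

/*
 * Split "type:[inum]" back into its parts. type must have room for
 * 16 bytes. Returns 0 on success and -1 if the text does not match.
 */
static int parse_ns_link(const char *link, char type[16], unsigned int *inum)
{
	return sscanf(link, "%15[^:]:[%u]", type, inum) == 2 ? 0 : -1;
}
```

A tool could readlink(2) both /proc/[pid]/ns/pid and /proc/[pid]/ns/pid_for_children, parse the two strings, and treat differing inode numbers as a sign that the task has a custom pid_ns_for_children.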
It exposes pid_ns_for_children in the ns directory in the standard way, under the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] This patch (of 2): Make it possible for the link content prefix yyy to differ from the link name xxx: $ readlink /proc/[pid]/ns/xxx yyy:[4026531838] This will be used in the next patch. Link: http://lkml.kernel.org/r/149201120318.6007.7362655181033883000.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai Reviewed-by: Cyrill Gorcunov Acked-by: Andrei Vagin Cc: Andreas Gruenbacher Cc: Kees Cook Cc: Michael Kerrisk Cc: Al Viro Cc: Oleg Nesterov Cc: Paul Moore Cc: Eric Biederman Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Serge Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/nsfs.c | 4 +++- include/linux/proc_ns.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/nsfs.c b/fs/nsfs.c index 323f492e0822..f3db56e83dd2 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -196,9 +196,11 @@ int ns_get_name(char *buf, size_t size, struct task_struct *task, { struct ns_common *ns; int res = -ENOENT; + const char *name; ns = ns_ops->get(task); if (ns) { - res = snprintf(buf, size, "%s:[%u]", ns_ops->name, ns->inum); + name = ns_ops->real_ns_name ? 
: ns_ops->name; + res = snprintf(buf, size, "%s:[%u]", name, ns->inum); ns_ops->put(ns); } return res; diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index 12cb8bd81d2d..88dba3b53375 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -14,6 +14,7 @@ struct inode; struct proc_ns_operations { const char *name; + const char *real_ns_name; int type; struct ns_common *(*get)(struct task_struct *task); void (*put)(struct ns_common *ns); -- cgit From eaa0d190bfe1ed891b814a52712dcd852554cb08 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Mon, 8 May 2017 15:56:41 -0700 Subject: pidns: expose task pid_ns_for_children to userspace pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks that do setns(CLONE_NEWPID) in the course of their work. This patch solves the problem by exposing pid_ns_for_children in the ns directory in the standard way, under the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai Cc: Andrei Vagin Cc: Andreas Gruenbacher Cc: Kees Cook Cc: Michael Kerrisk Cc: Al Viro Cc: Oleg Nesterov Cc: Paul Moore Cc: Eric Biederman Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Serge Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/namespaces.c | 1 + include/linux/proc_ns.h | 1 + kernel/pid_namespace.c | 34 ++++++++++++++++++++++++++++++++++ 3 files changed, 36 insertions(+) diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c index 766f0c637ad1..3803b24ca220 100644 --- a/fs/proc/namespaces.c +++ b/fs/proc/namespaces.c @@ -23,6 +23,7 @@ static const struct proc_ns_operations *ns_entries[] 
= { #endif #ifdef CONFIG_PID_NS &pidns_operations, + &pidns_for_children_operations, #endif #ifdef CONFIG_USER_NS &userns_operations, diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index 88dba3b53375..58ab28d81fc2 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -27,6 +27,7 @@ extern const struct proc_ns_operations netns_operations; extern const struct proc_ns_operations utsns_operations; extern const struct proc_ns_operations ipcns_operations; extern const struct proc_ns_operations pidns_operations; +extern const struct proc_ns_operations pidns_for_children_operations; extern const struct proc_ns_operations userns_operations; extern const struct proc_ns_operations mntns_operations; extern const struct proc_ns_operations cgroupns_operations; diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index de461aa0bf9a..d1f3e9f558b8 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -374,6 +374,29 @@ static struct ns_common *pidns_get(struct task_struct *task) return ns ? &ns->ns : NULL; } +static struct ns_common *pidns_for_children_get(struct task_struct *task) +{ + struct pid_namespace *ns = NULL; + + task_lock(task); + if (task->nsproxy) { + ns = task->nsproxy->pid_ns_for_children; + get_pid_ns(ns); + } + task_unlock(task); + + if (ns) { + read_lock(&tasklist_lock); + if (!ns->child_reaper) { + put_pid_ns(ns); + ns = NULL; + } + read_unlock(&tasklist_lock); + } + + return ns ? 
&ns->ns : NULL; +} + static void pidns_put(struct ns_common *ns) { put_pid_ns(to_pid_ns(ns)); @@ -443,6 +466,17 @@ const struct proc_ns_operations pidns_operations = { .get_parent = pidns_get_parent, }; +const struct proc_ns_operations pidns_for_children_operations = { + .name = "pid_for_children", + .real_ns_name = "pid", + .type = CLONE_NEWPID, + .get = pidns_for_children_get, + .put = pidns_put, + .install = pidns_install, + .owner = pidns_owner, + .get_parent = pidns_get_parent, +}; + static __init int pid_namespaces_init(void) { pid_ns_cachep = KMEM_CACHE(pid_namespace, SLAB_PANIC); -- cgit From 8c733420bdd5a6cc2d8540fb5aa32d1fbf3cb3ff Mon Sep 17 00:00:00 2001 From: Zhang Xiao Date: Mon, 8 May 2017 15:56:45 -0700 Subject: taskstats: add e/u/stime for TGID command The elapsed time, user CPU time and system CPU time for the thread group status request are presently left at zero. Fill these in. [akpm@linux-foundation.org: run ktime_get_ns() a single time] [akpm@linux-foundation.org: include linux/sched/cputime.h for task_cputime()] Link: http://lkml.kernel.org/r/1488508424-12322-1-git-send-email-xiao.zhang@windriver.com Signed-off-by: Zhang Xiao Cc: Balbir Singh Cc: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/taskstats.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/taskstats.c b/kernel/taskstats.c index 8a5e44236f78..4559e914452b 100644 --- a/kernel/taskstats.c +++ b/kernel/taskstats.c @@ -30,6 +30,7 @@ #include #include #include +#include /* * Maximum length of a cpumask that can be specified in @@ -210,6 +211,8 @@ static int fill_stats_for_tgid(pid_t tgid, struct taskstats *stats) struct task_struct *tsk, *first; unsigned long flags; int rc = -ESRCH; + u64 delta, utime, stime; + u64 start_time; /* * Add additional stats from live tasks except zombie thread group @@ -227,6 +230,7 @@ static int fill_stats_for_tgid(pid_t tgid, struct taskstats *stats) memset(stats, 0, sizeof(*stats)); tsk = first; 
+ start_time = ktime_get_ns(); do { if (tsk->exit_state) continue; @@ -238,6 +242,16 @@ static int fill_stats_for_tgid(pid_t tgid, struct taskstats *stats) */ delayacct_add_tsk(stats, tsk); + /* calculate task elapsed time in nsec */ + delta = start_time - tsk->start_time; + /* Convert to micro seconds */ + do_div(delta, NSEC_PER_USEC); + stats->ac_etime += delta; + + task_cputime(tsk, &utime, &stime); + stats->ac_utime += div_u64(utime, NSEC_PER_USEC); + stats->ac_stime += div_u64(stime, NSEC_PER_USEC); + stats->nvcsw += tsk->nvcsw; stats->nivcsw += tsk->nivcsw; } while_each_thread(first, tsk); -- cgit From f61e869d519c0c11a8d80a503cfdfb4897df855a Mon Sep 17 00:00:00 2001 From: Dmitry Vyukov Date: Mon, 8 May 2017 15:56:48 -0700 Subject: kcov: simplify interrupt check in_interrupt() semantics are confusing and wrong for most users as it also returns true when bh is disabled. Thus we open coded a proper check for interrupts in __sanitizer_cov_trace_pc() with a lengthy explanatory comment. Use the new in_task() predicate instead. Link: http://lkml.kernel.org/r/20170321091026.139655-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov Cc: Kefeng Wang Cc: James Morse Cc: Alexander Popov Cc: Andrey Konovalov Cc: Hillf Danton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/kcov.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/kernel/kcov.c b/kernel/kcov.c index 85e5546cd791..cd771993f96f 100644 --- a/kernel/kcov.c +++ b/kernel/kcov.c @@ -60,15 +60,8 @@ void notrace __sanitizer_cov_trace_pc(void) /* * We are interested in code coverage as a function of a syscall inputs, * so we ignore code executed in interrupts. - * The checks for whether we are in an interrupt are open-coded, because - * 1. We can't use in_interrupt() here, since it also returns true - * when we are inside local_bh_disable() section. - * 2. 
We don't want to use (in_irq() | in_serving_softirq() | in_nmi()), - * since that leads to slower generated code (three separate tests, - * one for each of the flags). */ - if (!t || (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET - | NMI_MASK))) + if (!t || !in_task()) return; mode = READ_ONCE(t->kcov_mode); if (mode == KCOV_MODE_TRACE) { -- cgit From f2ad37da805414e9385e7ca2961e1d0431df3799 Mon Sep 17 00:00:00 2001 From: Dmitry Vyukov Date: Mon, 8 May 2017 15:56:51 -0700 Subject: lib/fault-inject.c: use correct check for interrupts in_interrupt() also returns true when bh is disabled in task context. That's not what fail_task() wants to check. Use the new in_task() predicate that does the right thing. Link: http://lkml.kernel.org/r/20170321091805.140676-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov Reviewed-by: Akinobu Mita Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/fault-inject.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/fault-inject.c b/lib/fault-inject.c index 6a823a53e357..4ff157159a0d 100644 --- a/lib/fault-inject.c +++ b/lib/fault-inject.c @@ -56,7 +56,7 @@ static void fail_dump(struct fault_attr *attr) static bool fail_task(struct fault_attr *attr, struct task_struct *task) { - return !in_interrupt() && task->make_it_fail; + return in_task() && task->make_it_fail; } #define MAX_STACK_TRACE_DEPTH 32 -- cgit From da5e108b0288d390dae40f51c09bbb30358bf7a7 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Mon, 8 May 2017 15:56:54 -0700 Subject: lib/zlib_inflate/inftrees.c: fix potential buffer overflow smatch says: WARNING: please, no spaces at the start of a line #30: FILE: lib/zlib_inflate/inftrees.c:112: + for (min = 1; min < MAXBITS; min++)$ total: 0 errors, 1 warnings, 8 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. 
./patches/zlib-inflate-fix-potential-buffer-overflow.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/zlib_inflate/inftrees.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/zlib_inflate/inftrees.c b/lib/zlib_inflate/inftrees.c index 3fe6ce5b53e5..028943052926 100644 --- a/lib/zlib_inflate/inftrees.c +++ b/lib/zlib_inflate/inftrees.c @@ -109,7 +109,7 @@ int zlib_inflate_table(codetype type, unsigned short *lens, unsigned codes, *bits = 1; return 0; /* no symbols, but wait for decoding to report error */ } - for (min = 1; min <= MAXBITS; min++) + for (min = 1; min < MAXBITS; min++) if (count[min] != 0) break; if (root < min) root = min; -- cgit From cff75e0b6fe835800f8e08a32d731119cd9e3b79 Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Mon, 8 May 2017 15:56:57 -0700 Subject: initramfs: provide a way to ignore image provided by bootloader Many "embedded" architectures provide CMDLINE_FORCE to allow the kernel to override the command line provided by an inflexible bootloader. However, there is currently no way for the kernel to override the initramfs image provided by the bootloader, meaning there are still ways for bootloaders to make things difficult for us. Fix this by introducing INITRAMFS_FORCE which can prevent the kernel from loading the bootloader supplied image. We use CMDLINE_FORCE (and its friend CMDLINE_EXTEND) to imply that the system has an inflexible bootloader. This allows us to avoid presenting this config option to users of systems where inflexible bootloaders aren't usually a problem. 
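The effect of the new option on populate_rootfs() can be summarized with a tiny user-space model. The enum and the function pick_rootfs_source() are hypothetical, and the runtime flag initramfs_force stands in for the compile-time IS_ENABLED(CONFIG_INITRAMFS_FORCE) test that the init/initramfs.c hunk adds.

```c
#include <stdbool.h>

/* Hypothetical labels for what ends up populating the rootfs. */
enum rootfs_source {
	ROOTFS_BUILTIN_ONLY,		/* only the built-in initramfs */
	ROOTFS_BUILTIN_PLUS_BOOTLOADER,	/* built-in, then bootloader image */
};

/*
 * The built-in initramfs is always unpacked first. The bootloader image
 * is considered only if one was passed (initrd_start is non-zero) and
 * the force option is off.
 */
static enum rootfs_source pick_rootfs_source(unsigned long initrd_start,
					     bool initramfs_force)
{
	if (initrd_start && !initramfs_force)
		return ROOTFS_BUILTIN_PLUS_BOOTLOADER;

	return ROOTFS_BUILTIN_ONLY;
}
```

Because the real check uses IS_ENABLED(), the compiler can discard the bootloader branch entirely when the option is set; the runtime flag here exists only to make both outcomes easy to exercise.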
Link: http://lkml.kernel.org/r/20170217121940.30126-1-daniel.thompson@linaro.org Signed-off-by: Daniel Thompson Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/initramfs.c | 2 +- usr/Kconfig | 10 ++++++++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/init/initramfs.c b/init/initramfs.c index 8daf7ac6c7e2..9d180273ee8c 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -613,7 +613,7 @@ static int __init populate_rootfs(void) if (err) panic("%s", err); /* Failed to decompress INTERNAL initramfs */ /* If available load the bootloader supplied initrd */ - if (initrd_start) { + if (initrd_start && !IS_ENABLED(CONFIG_INITRAMFS_FORCE)) { #ifdef CONFIG_BLK_DEV_RAM int fd; printk(KERN_INFO "Trying to unpack rootfs image as initramfs...\n"); diff --git a/usr/Kconfig b/usr/Kconfig index 6278f135256d..c0c48507e44e 100644 --- a/usr/Kconfig +++ b/usr/Kconfig @@ -21,6 +21,16 @@ config INITRAMFS_SOURCE If you are not sure, leave it blank. +config INITRAMFS_FORCE + bool "Ignore the initramfs passed by the bootloader" + depends on CMDLINE_EXTEND || CMDLINE_FORCE + help + This option causes the kernel to ignore the initramfs image + (or initrd image) passed to it by the bootloader. This is + analogous to CMDLINE_FORCE, which is found on some architectures, + and is useful if you cannot or don't want to change the image + your bootloader passes to the kernel. + config INITRAMFS_ROOT_UID int "User ID to map to 0 (user root)" depends on INITRAMFS_SOURCE!="" -- cgit From 046aa1265f08d8400bd4b63171238c9daba15ec3 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 8 May 2017 15:57:00 -0700 Subject: initramfs: use vfs_stat/lstat directly sys_newlstat is a system call implementation that is meant for user space, and that copies kernel-internal data structure to the user format, which is not needed for in-kernel users. 
Further, as we rearrange the system call implementation so we can extend it with 64-bit time_t, the prototype for sys_newlstat changes. This changes the initramfs code to use vfs_lstat directly, to get it out of the way of the time_t changes, and make it slightly more efficient in the process. Along the same lines we also replace sys_stat and sys_stat64 with vfs_stat. Link: http://lkml.kernel.org/r/20170314214932.4052842-1-arnd@arndb.de Signed-off-by: Arnd Bergmann Cc: Alexander Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/do_mounts.h | 22 ++++------------------ init/initramfs.c | 12 ++++++------ 2 files changed, 10 insertions(+), 24 deletions(-) diff --git a/init/do_mounts.h b/init/do_mounts.h index 067af1d9e8b6..282d65bfd674 100644 --- a/init/do_mounts.h +++ b/init/do_mounts.h @@ -19,29 +19,15 @@ static inline int create_dev(char *name, dev_t dev) return sys_mknod(name, S_IFBLK|0600, new_encode_dev(dev)); } -#if BITS_PER_LONG == 32 static inline u32 bstat(char *name) { - struct stat64 stat; - if (sys_stat64(name, &stat) != 0) + struct kstat stat; + if (vfs_stat(name, &stat) != 0) return 0; - if (!S_ISBLK(stat.st_mode)) + if (!S_ISBLK(stat.mode)) return 0; - if (stat.st_rdev != (u32)stat.st_rdev) - return 0; - return stat.st_rdev; -} -#else -static inline u32 bstat(char *name) -{ - struct stat stat; - if (sys_newstat(name, &stat) != 0) - return 0; - if (!S_ISBLK(stat.st_mode)) - return 0; - return stat.st_rdev; + return stat.rdev; } -#endif #ifdef CONFIG_BLK_DEV_RAM diff --git a/init/initramfs.c b/init/initramfs.c index 9d180273ee8c..8a532050043f 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -312,10 +312,10 @@ static int __init maybe_link(void) static void __init clean_path(char *path, umode_t fmode) { - struct stat st; + struct kstat st; - if (!sys_newlstat(path, &st) && (st.st_mode ^ fmode) & S_IFMT) { - if (S_ISDIR(st.st_mode)) + if (!vfs_lstat(path, &st) && (st.mode ^ fmode) & S_IFMT) { + if (S_ISDIR(st.mode)) 
sys_rmdir(path); else sys_unlink(path); @@ -581,13 +581,13 @@ static void __init clean_rootfs(void) num = sys_getdents64(fd, dirp, BUF_SIZE); while (num > 0) { while (num > 0) { - struct stat st; + struct kstat st; int ret; - ret = sys_newlstat(dirp->d_name, &st); + ret = vfs_lstat(dirp->d_name, &st); WARN_ON_ONCE(ret); if (!ret) { - if (S_ISDIR(st.st_mode)) + if (S_ISDIR(st.mode)) sys_rmdir(dirp->d_name); else sys_unlink(dirp->d_name); -- cgit From f0cb88026f51a0d8c952cf5e09e1933ef10b50b4 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Mon, 8 May 2017 15:57:03 -0700 Subject: ipc/shm: some shmat cleanups Clean up early flag and address some minutia. Link: http://lkml.kernel.org/r/1486673582-6979-3-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso Cc: Manfred Spraul Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- ipc/shm.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/ipc/shm.c b/ipc/shm.c index 481d2a9c298a..34c4344e8d4b 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1095,11 +1095,11 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr, unsigned long shmlba) { struct shmid_kernel *shp; - unsigned long addr; + unsigned long addr = (unsigned long)shmaddr; unsigned long size; struct file *file; int err; - unsigned long flags; + unsigned long flags = MAP_SHARED; unsigned long prot; int acc_mode; struct ipc_namespace *ns; @@ -1111,7 +1111,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, err = -EINVAL; if (shmid < 0) goto out; - else if ((addr = (ulong)shmaddr)) { + + if (addr) { if (addr & (shmlba - 1)) { /* * Round down to the nearest multiple of shmlba. 
@@ -1126,13 +1127,10 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, #endif goto out; } - flags = MAP_SHARED | MAP_FIXED; - } else { - if ((shmflg & SHM_REMAP)) - goto out; - flags = MAP_SHARED; - } + flags |= MAP_FIXED; + } else if ((shmflg & SHM_REMAP)) + goto out; if (shmflg & SHM_RDONLY) { prot = PROT_READ; -- cgit From 60f3e00d25b44e3aa51846590d1e10f408466a83 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Mon, 8 May 2017 15:57:06 -0700 Subject: sysv,ipc: cacheline align kern_ipc_perm Assign 'struct kern_ipc_perm' its own cacheline to avoid false sharing with sysv ipc calls. While the structure itself is rather read-mostly throughout the lifespan of ipc, the spinlock causes most of the invalidations. One example is commit 31a7c4746e9 ("ipc/sem.c: cacheline align the ipc spinlock for semaphores"). Therefore, extend this to all ipc. The effect of cacheline alignment on sems can be seen in sembench, which deals mostly with semtimedop wait/wakes is seen to improve raw throughput (worker loops) between 8 to 12% on a 24-core x86 with over 4 threads. 
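For illustration only (not part of this patch): a minimal user-space sketch of what type-level cacheline alignment does to layout. It assumes a 64-byte line; sketch_perm and sketch_sem_array are hypothetical stand-ins for kern_ipc_perm and sem_array, and C11 alignas stands in for the kernel's ____cacheline_aligned_in_smp, which only applies on SMP builds.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas, alignof */
#include <stddef.h>   /* offsetof */

/* Assumption: 64-byte cache lines; the kernel derives this per architecture. */
#define SKETCH_CACHELINE 64

/* Hypothetical stand-in for struct kern_ipc_perm: a hot lock plus read-mostly fields. */
struct sketch_perm {
	alignas(SKETCH_CACHELINE) int lock; /* aligning the first member aligns the whole struct */
	int id;
	unsigned long seq;
};

/* Hypothetical container, loosely modelled on struct sem_array. */
struct sketch_sem_array {
	struct sketch_perm perm; /* occupies a full cache line, padding included */
	long ctime;              /* starts on the following line */
};

/* The aligned type begins on a line boundary and is padded to whole lines. */
static_assert(alignof(struct sketch_perm) == SKETCH_CACHELINE,
	      "perm starts on a cache line boundary");
static_assert(sizeof(struct sketch_perm) % SKETCH_CACHELINE == 0,
	      "perm is padded to whole cache lines");
```

Because the alignment is attached to the type rather than to one member declaration, every structure embedding it gets the same separation, which is presumably why the sem_array annotation can be dropped in favour of the one on kern_ipc_perm.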
Link: http://lkml.kernel.org/r/1486673582-6979-4-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso Cc: Manfred Spraul Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/ipc.h | 7 +++---- include/linux/sem.h | 3 +-- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/include/linux/ipc.h b/include/linux/ipc.h index 9d84942ae2e5..71fd92d81b26 100644 --- a/include/linux/ipc.h +++ b/include/linux/ipc.h @@ -8,8 +8,7 @@ #define IPCMNI 32768 /* <= MAX_INT limit for ipc arrays (including sysctl changes) */ /* used by in-kernel data structures */ -struct kern_ipc_perm -{ +struct kern_ipc_perm { spinlock_t lock; bool deleted; int id; @@ -18,9 +17,9 @@ struct kern_ipc_perm kgid_t gid; kuid_t cuid; kgid_t cgid; - umode_t mode; + umode_t mode; unsigned long seq; void *security; -}; +} ____cacheline_aligned_in_smp; #endif /* _LINUX_IPC_H */ diff --git a/include/linux/sem.h b/include/linux/sem.h index 4fc222f8755d..9edec926e9d9 100644 --- a/include/linux/sem.h +++ b/include/linux/sem.h @@ -10,8 +10,7 @@ struct task_struct; /* One sem_array data structure for each set of semaphores in the system. */ struct sem_array { - struct kern_ipc_perm ____cacheline_aligned_in_smp - sem_perm; /* permissions .. see ipc.h */ + struct kern_ipc_perm sem_perm; /* permissions .. see ipc.h */ time_t sem_ctime; /* last change time */ struct sem *sem_base; /* ptr to first semaphore in array */ struct list_head pending_alter; /* pending operations */ -- cgit From a7c3e901a46ff54c016d040847eda598a9e3e653 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:09 -0700 Subject: mm: introduce kv[mz]alloc helpers Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. 
Most of them are not careful enough or simply do not care about the underlying semantics of the kmalloc/page allocator, which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke really disruptive steps like the OOM killer to move forward, which doesn't sound appropriate when we consider that the vmalloc fallback is available. As can be seen, implementing kvmalloc requires quite an intimate knowledge of the page allocator and the memory reclaim internals, which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers I could find have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more appropriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko Signed-off-by: Stephen Rothwell Reviewed-by: Andreas Dilger [ext4 part] Acked-by: Vlastimil Babka Cc: John Hubbard Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/kvm/lapic.c | 4 ++-- arch/x86/kvm/page_track.c | 4 ++-- arch/x86/kvm/x86.c | 4 ++-- drivers/md/dm-stats.c | 7 +----- fs/ext4/mballoc.c | 2 +- fs/ext4/super.c | 4 ++-- fs/f2fs/f2fs.h | 20 ----------------- fs/f2fs/file.c | 4 ++-- fs/f2fs/node.c | 6 +++--- fs/f2fs/segment.c | 14 ++++++------ fs/seq_file.c | 16 +------------- include/linux/kvm_host.h | 2 -- include/linux/mm.h | 14 ++++++++++++ include/linux/vmalloc.h | 1 + ipc/util.c | 7 +----- mm/nommu.c | 5 +++++ mm/util.c | 45 +++++++++++++++++++++++++++++++++++++++ mm/vmalloc.c | 9 +++++++- security/apparmor/apparmorfs.c | 2 +- security/apparmor/include/lib.h | 11 ---------- security/apparmor/lib.c | 30 -------------------------- security/apparmor/match.c | 2 +- security/apparmor/policy_unpack.c | 2 +- virt/kvm/kvm_main.c | 18 +++------------- 24 files changed, 103 insertions(+), 130 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index bad6a25067bc..d2a892fc92bf 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -177,8 +177,8 @@ static void recalculate_apic_map(struct kvm 
*kvm) if (kvm_apic_present(vcpu)) max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic)); - new = kvm_kvzalloc(sizeof(struct kvm_apic_map) + - sizeof(struct kvm_lapic *) * ((u64)max_id + 1)); + new = kvzalloc(sizeof(struct kvm_apic_map) + + sizeof(struct kvm_lapic *) * ((u64)max_id + 1), GFP_KERNEL); if (!new) goto out; diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c index 60168cdd0546..ea67dc876316 100644 --- a/arch/x86/kvm/page_track.c +++ b/arch/x86/kvm/page_track.c @@ -40,8 +40,8 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot, int i; for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) { - slot->arch.gfn_track[i] = kvm_kvzalloc(npages * - sizeof(*slot->arch.gfn_track[i])); + slot->arch.gfn_track[i] = kvzalloc(npages * + sizeof(*slot->arch.gfn_track[i]), GFP_KERNEL); if (!slot->arch.gfn_track[i]) goto track_free; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ccbd45ecd41a..ee22226e3807 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8199,13 +8199,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, slot->base_gfn, level) + 1; slot->arch.rmap[i] = - kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap[i])); + kvzalloc(lpages * sizeof(*slot->arch.rmap[i]), GFP_KERNEL); if (!slot->arch.rmap[i]) goto out_free; if (i == 0) continue; - linfo = kvm_kvzalloc(lpages * sizeof(*linfo)); + linfo = kvzalloc(lpages * sizeof(*linfo), GFP_KERNEL); if (!linfo) goto out_free; diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c index 0250e7e521ab..6028d8247f58 100644 --- a/drivers/md/dm-stats.c +++ b/drivers/md/dm-stats.c @@ -146,12 +146,7 @@ static void *dm_kvzalloc(size_t alloc_size, int node) if (!claim_shared_memory(alloc_size)) return NULL; - if (alloc_size <= KMALLOC_MAX_SIZE) { - p = kzalloc_node(alloc_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, node); - if (p) - return p; - } - p = vzalloc_node(alloc_size, node); + p = kvzalloc_node(alloc_size, GFP_KERNEL | 
__GFP_NOMEMALLOC, node); if (p) return p; diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 354dc1a894c2..b60698c104fd 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2393,7 +2393,7 @@ int ext4_mb_alloc_groupinfo(struct super_block *sb, ext4_group_t ngroups) return 0; size = roundup_pow_of_two(sizeof(*sbi->s_group_info) * size); - new_groupinfo = ext4_kvzalloc(size, GFP_KERNEL); + new_groupinfo = kvzalloc(size, GFP_KERNEL); if (!new_groupinfo) { ext4_msg(sb, KERN_ERR, "can't allocate buddy meta group"); return -ENOMEM; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a9c72e39a4ee..b2c74644d5de 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2153,7 +2153,7 @@ int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup) return 0; size = roundup_pow_of_two(size * sizeof(struct flex_groups)); - new_groups = ext4_kvzalloc(size, GFP_KERNEL); + new_groups = kvzalloc(size, GFP_KERNEL); if (!new_groups) { ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups", size / (int) sizeof(struct flex_groups)); @@ -3887,7 +3887,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) goto failed_mount; } } - sbi->s_group_desc = ext4_kvmalloc(db_count * + sbi->s_group_desc = kvmalloc(db_count * sizeof(struct buffer_head *), GFP_KERNEL); if (sbi->s_group_desc == NULL) { diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 0a6e115562f6..1fc17a1fc5d0 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2005,26 +2005,6 @@ static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi, return kmalloc(size, flags); } -static inline void *f2fs_kvmalloc(size_t size, gfp_t flags) -{ - void *ret; - - ret = kmalloc(size, flags | __GFP_NOWARN); - if (!ret) - ret = __vmalloc(size, flags, PAGE_KERNEL); - return ret; -} - -static inline void *f2fs_kvzalloc(size_t size, gfp_t flags) -{ - void *ret; - - ret = kzalloc(size, flags | __GFP_NOWARN); - if (!ret) - ret = __vmalloc(size, flags | __GFP_ZERO, PAGE_KERNEL); - return ret; -} - 
#define get_inode_mode(i) \ ((is_inode_flag_set(i, FI_ACL_MODE)) ? \ (F2FS_I(i)->i_acl_mode) : ((i)->i_mode)) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 5f7317875a67..0849af78381f 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1012,11 +1012,11 @@ static int __exchange_data_block(struct inode *src_inode, while (len) { olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len); - src_blkaddr = f2fs_kvzalloc(sizeof(block_t) * olen, GFP_KERNEL); + src_blkaddr = kvzalloc(sizeof(block_t) * olen, GFP_KERNEL); if (!src_blkaddr) return -ENOMEM; - do_replace = f2fs_kvzalloc(sizeof(int) * olen, GFP_KERNEL); + do_replace = kvzalloc(sizeof(int) * olen, GFP_KERNEL); if (!do_replace) { kvfree(src_blkaddr); return -ENOMEM; diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 481aa8dc79f4..0ea1dca8a0e2 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -2621,17 +2621,17 @@ static int init_free_nid_cache(struct f2fs_sb_info *sbi) { struct f2fs_nm_info *nm_i = NM_I(sbi); - nm_i->free_nid_bitmap = f2fs_kvzalloc(nm_i->nat_blocks * + nm_i->free_nid_bitmap = kvzalloc(nm_i->nat_blocks * NAT_ENTRY_BITMAP_SIZE, GFP_KERNEL); if (!nm_i->free_nid_bitmap) return -ENOMEM; - nm_i->nat_block_bitmap = f2fs_kvzalloc(nm_i->nat_blocks / 8, + nm_i->nat_block_bitmap = kvzalloc(nm_i->nat_blocks / 8, GFP_KERNEL); if (!nm_i->nat_block_bitmap) return -ENOMEM; - nm_i->free_nid_count = f2fs_kvzalloc(nm_i->nat_blocks * + nm_i->free_nid_count = kvzalloc(nm_i->nat_blocks * sizeof(unsigned short), GFP_KERNEL); if (!nm_i->free_nid_count) return -ENOMEM; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 29ef7088c558..13806f642ab5 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2501,13 +2501,13 @@ static int build_sit_info(struct f2fs_sb_info *sbi) SM_I(sbi)->sit_info = sit_i; - sit_i->sentries = f2fs_kvzalloc(MAIN_SEGS(sbi) * + sit_i->sentries = kvzalloc(MAIN_SEGS(sbi) * sizeof(struct seg_entry), GFP_KERNEL); if (!sit_i->sentries) return -ENOMEM; bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi)); - 
sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL); + sit_i->dirty_sentries_bitmap = kvzalloc(bitmap_size, GFP_KERNEL); if (!sit_i->dirty_sentries_bitmap) return -ENOMEM; @@ -2540,7 +2540,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi) return -ENOMEM; if (sbi->segs_per_sec > 1) { - sit_i->sec_entries = f2fs_kvzalloc(MAIN_SECS(sbi) * + sit_i->sec_entries = kvzalloc(MAIN_SECS(sbi) * sizeof(struct sec_entry), GFP_KERNEL); if (!sit_i->sec_entries) return -ENOMEM; @@ -2591,12 +2591,12 @@ static int build_free_segmap(struct f2fs_sb_info *sbi) SM_I(sbi)->free_info = free_i; bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi)); - free_i->free_segmap = f2fs_kvmalloc(bitmap_size, GFP_KERNEL); + free_i->free_segmap = kvmalloc(bitmap_size, GFP_KERNEL); if (!free_i->free_segmap) return -ENOMEM; sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi)); - free_i->free_secmap = f2fs_kvmalloc(sec_bitmap_size, GFP_KERNEL); + free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL); if (!free_i->free_secmap) return -ENOMEM; @@ -2764,7 +2764,7 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi) struct dirty_seglist_info *dirty_i = DIRTY_I(sbi); unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi)); - dirty_i->victim_secmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL); + dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL); if (!dirty_i->victim_secmap) return -ENOMEM; return 0; @@ -2786,7 +2786,7 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi) bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi)); for (i = 0; i < NR_DIRTY_TYPE; i++) { - dirty_i->dirty_segmap[i] = f2fs_kvzalloc(bitmap_size, GFP_KERNEL); + dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL); if (!dirty_i->dirty_segmap[i]) return -ENOMEM; } diff --git a/fs/seq_file.c b/fs/seq_file.c index ca69fb99e41a..dc7c2be963ed 100644 --- a/fs/seq_file.c +++ b/fs/seq_file.c @@ -25,21 +25,7 @@ static void seq_set_overflow(struct seq_file *m) static void *seq_buf_alloc(unsigned long size) { 
- void *buf; - gfp_t gfp = GFP_KERNEL; - - /* - * For high order allocations, use __GFP_NORETRY to avoid oom-killing - - * it's better to fall back to vmalloc() than to kill things. For small - * allocations, just use GFP_KERNEL which will oom kill, thus no need - * for vmalloc fallback. - */ - if (size > PAGE_SIZE) - gfp |= __GFP_NORETRY | __GFP_NOWARN; - buf = kmalloc(size, gfp); - if (!buf && size > PAGE_SIZE) - buf = vmalloc(size); - return buf; + return kvmalloc(size, GFP_KERNEL); } /** diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d0250744507a..5d9b2a08e553 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -767,8 +767,6 @@ void kvm_arch_check_processor_compat(void *rtn); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu); -void *kvm_kvzalloc(unsigned long size); - #ifndef __KVM_HAVE_ARCH_VM_ALLOC static inline struct kvm *kvm_arch_alloc_vm(void) { diff --git a/include/linux/mm.h b/include/linux/mm.h index 5d22e69f51ea..08e2849d27ca 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -518,6 +518,20 @@ static inline int is_vmalloc_or_module_addr(const void *x) } #endif +extern void *kvmalloc_node(size_t size, gfp_t flags, int node); +static inline void *kvmalloc(size_t size, gfp_t flags) +{ + return kvmalloc_node(size, flags, NUMA_NO_NODE); +} +static inline void *kvzalloc_node(size_t size, gfp_t flags, int node) +{ + return kvmalloc_node(size, flags | __GFP_ZERO, node); +} +static inline void *kvzalloc(size_t size, gfp_t flags) +{ + return kvmalloc(size, flags | __GFP_ZERO); +} + extern void kvfree(const void *addr); static inline atomic_t *compound_mapcount_ptr(struct page *page) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index d68edffbf142..46991ad3ddd5 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -80,6 +80,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align, unsigned long start, 
unsigned long end, gfp_t gfp_mask, pgprot_t prot, unsigned long vm_flags, int node, const void *caller); +extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags); extern void vfree(const void *addr); extern void vfree_atomic(const void *addr); diff --git a/ipc/util.c b/ipc/util.c index 3459a16a9df9..caec7b1bfaa3 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -403,12 +403,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp) */ void *ipc_alloc(int size) { - void *out; - if (size > PAGE_SIZE) - out = vmalloc(size); - else - out = kmalloc(size, GFP_KERNEL); - return out; + return kvmalloc(size, GFP_KERNEL); } /** diff --git a/mm/nommu.c b/mm/nommu.c index 2d131b97a851..a80411d258fc 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -237,6 +237,11 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot) } EXPORT_SYMBOL(__vmalloc); +void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags) +{ + return __vmalloc(size, flags, PAGE_KERNEL); +} + void *vmalloc_user(unsigned long size) { void *ret; diff --git a/mm/util.c b/mm/util.c index 656dc5e37a87..10a14a0ac3c2 100644 --- a/mm/util.c +++ b/mm/util.c @@ -329,6 +329,51 @@ unsigned long vm_mmap(struct file *file, unsigned long addr, } EXPORT_SYMBOL(vm_mmap); +/** + * kvmalloc_node - attempt to allocate physically contiguous memory, but upon + * failure, fall back to non-contiguous (vmalloc) allocation. + * @size: size of the request. + * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL. + * @node: numa node to allocate from + * + * Uses kmalloc to get the memory but if the allocation fails then falls back + * to the vmalloc allocator. Use kvfree for freeing the memory. + * + * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported + * + * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people. 
+ */ +void *kvmalloc_node(size_t size, gfp_t flags, int node) +{ + gfp_t kmalloc_flags = flags; + void *ret; + + /* + * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables) + * so the given set of flags has to be compatible. + */ + WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); + + /* + * Make sure that larger requests are not too disruptive - no OOM + * killer and no allocation failure warnings as we have a fallback + */ + if (size > PAGE_SIZE) + kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN; + + ret = kmalloc_node(size, kmalloc_flags, node); + + /* + * It doesn't really make sense to fallback to vmalloc for sub page + * requests + */ + if (ret || size <= PAGE_SIZE) + return ret; + + return __vmalloc_node_flags(size, node, flags | __GFP_HIGHMEM); +} +EXPORT_SYMBOL(kvmalloc_node); + void kvfree(const void *addr) { if (is_vmalloc_addr(addr)) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index b52aeed3f58e..33603239560e 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1786,6 +1786,13 @@ fail: * Allocate enough pages to cover @size from the page level * allocator with @gfp_mask flags. Map them into contiguous * kernel virtual space, using a pagetable protection of @prot. + * + * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT + * and __GFP_NOFAIL are not supported + * + * Any use of gfp flags outside of GFP_KERNEL should be consulted + * with mm people. 
+ * */ static void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, pgprot_t prot, @@ -1802,7 +1809,7 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot) } EXPORT_SYMBOL(__vmalloc); -static inline void *__vmalloc_node_flags(unsigned long size, +void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags) { return __vmalloc_node(size, 1, flags, PAGE_KERNEL, diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c index 41073f70eb41..be0b49897a67 100644 --- a/security/apparmor/apparmorfs.c +++ b/security/apparmor/apparmorfs.c @@ -98,7 +98,7 @@ static struct aa_loaddata *aa_simple_write_to_buffer(const char __user *userbuf, return ERR_PTR(-ESPIPE); /* freed by caller to simple_write_to_buffer */ - data = kvmalloc(sizeof(*data) + alloc_size); + data = kvmalloc(sizeof(*data) + alloc_size, GFP_KERNEL); if (data == NULL) return ERR_PTR(-ENOMEM); kref_init(&data->count); diff --git a/security/apparmor/include/lib.h b/security/apparmor/include/lib.h index 0291ff3902f9..550a700563b4 100644 --- a/security/apparmor/include/lib.h +++ b/security/apparmor/include/lib.h @@ -64,17 +64,6 @@ char *aa_split_fqname(char *args, char **ns_name); const char *aa_splitn_fqname(const char *fqname, size_t n, const char **ns_name, size_t *ns_len); void aa_info_message(const char *str); -void *__aa_kvmalloc(size_t size, gfp_t flags); - -static inline void *kvmalloc(size_t size) -{ - return __aa_kvmalloc(size, 0); -} - -static inline void *kvzalloc(size_t size) -{ - return __aa_kvmalloc(size, __GFP_ZERO); -} /** * aa_strneq - compare null terminated @str to a non null terminated substring diff --git a/security/apparmor/lib.c b/security/apparmor/lib.c index 32cafc12593e..7cd788a9445b 100644 --- a/security/apparmor/lib.c +++ b/security/apparmor/lib.c @@ -128,36 +128,6 @@ void aa_info_message(const char *str) printk(KERN_INFO "AppArmor: %s\n", str); } -/** - * __aa_kvmalloc - do allocation preferring kmalloc but falling back to 
vmalloc - * @size: how many bytes of memory are required - * @flags: the type of memory to allocate (see kmalloc). - * - * Return: allocated buffer or NULL if failed - * - * It is possible that policy being loaded from the user is larger than - * what can be allocated by kmalloc, in those cases fall back to vmalloc. - */ -void *__aa_kvmalloc(size_t size, gfp_t flags) -{ - void *buffer = NULL; - - if (size == 0) - return NULL; - - /* do not attempt kmalloc if we need more than 16 pages at once */ - if (size <= (16*PAGE_SIZE)) - buffer = kmalloc(size, flags | GFP_KERNEL | __GFP_NORETRY | - __GFP_NOWARN); - if (!buffer) { - if (flags & __GFP_ZERO) - buffer = vzalloc(size); - else - buffer = vmalloc(size); - } - return buffer; -} - /** * aa_policy_init - initialize a policy structure * @policy: policy to initialize (NOT NULL) diff --git a/security/apparmor/match.c b/security/apparmor/match.c index eb0efef746f5..960c913381e2 100644 --- a/security/apparmor/match.c +++ b/security/apparmor/match.c @@ -88,7 +88,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize) if (bsize < tsize) goto out; - table = kvzalloc(tsize); + table = kvzalloc(tsize, GFP_KERNEL); if (table) { table->td_id = th.td_id; table->td_flags = th.td_flags; diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c index 2e37c9c26bbd..f3422a91353c 100644 --- a/security/apparmor/policy_unpack.c +++ b/security/apparmor/policy_unpack.c @@ -487,7 +487,7 @@ fail: static void *kvmemdup(const void *src, size_t len) { - void *p = kvmalloc(len); + void *p = kvmalloc(len, GFP_KERNEL); if (p) memcpy(p, src, len); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 88257b311cb5..aca22d36be9c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -504,7 +504,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void) int i; struct kvm_memslots *slots; - slots = kvm_kvzalloc(sizeof(struct kvm_memslots)); + slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); 
if (!slots) return NULL; @@ -689,18 +689,6 @@ out_err_no_disable: return ERR_PTR(r); } -/* - * Avoid using vmalloc for a small buffer. - * Should not be used when the size is statically known. - */ -void *kvm_kvzalloc(unsigned long size) -{ - if (size > PAGE_SIZE) - return vzalloc(size); - else - return kzalloc(size, GFP_KERNEL); -} - static void kvm_destroy_devices(struct kvm *kvm) { struct kvm_device *dev, *tmp; @@ -782,7 +770,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot) { unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot); - memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes); + memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL); if (!memslot->dirty_bitmap) return -ENOMEM; @@ -1008,7 +996,7 @@ int __kvm_set_memory_region(struct kvm *kvm, goto out_free; } - slots = kvm_kvzalloc(sizeof(struct kvm_memslots)); + slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); if (!slots) goto out_free; memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots)); -- cgit From 1f5307b1e094bfffa83c65c40ac6e3415c108780 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:12 -0700 Subject: mm, vmalloc: properly track vmalloc users __vmalloc_node_flags used to be static inline but this has changed by "mm: introduce kv[mz]alloc helpers" because kvmalloc_node needs to use it as well and the code is outside of the vmalloc proper. I haven't realized that changing this will lead to a subtle bug though. The function is responsible to track the caller as well. This caller is then printed by /proc/vmallocinfo. If __vmalloc_node_flags is not inline then we would get only direct users of __vmalloc_node_flags as callers (e.g. v[mz]alloc) which reduces usefulness of this debugging feature considerably. 
It simply doesn't help to see that the given range belongs to vmalloc as a caller: 0xffffc90002c79000-0xffffc90002c7d000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N0=3 0xffffc90002c81000-0xffffc90002c85000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 0xffffc90002c8d000-0xffffc90002c91000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 0xffffc90002c95000-0xffffc90002c99000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 We really want to catch the _caller_ of the vmalloc function. Fix this issue by making __vmalloc_node_flags static inline again. Link: http://lkml.kernel.org/r/20170502134657.12381-1-mhocko@kernel.org Signed-off-by: Michal Hocko Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vmalloc.h | 19 +++++++++++++++++++ mm/vmalloc.c | 12 +----------- 2 files changed, 20 insertions(+), 11 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 46991ad3ddd5..0328ce003992 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -6,6 +6,7 @@ #include #include #include /* pgprot_t */ +#include /* PAGE_KERNEL */ #include struct vm_area_struct; /* vma defining user mapping in mm_types.h */ @@ -80,7 +81,25 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align, unsigned long start, unsigned long end, gfp_t gfp_mask, pgprot_t prot, unsigned long vm_flags, int node, const void *caller); +#ifndef CONFIG_MMU extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags); +#else +extern void *__vmalloc_node(unsigned long size, unsigned long align, + gfp_t gfp_mask, pgprot_t prot, + int node, const void *caller); + +/* + * We really want to have this inlined due to caller tracking. This + * function is used by the highlevel vmalloc apis and so we want to track + * their callers and inlining will achieve that. 
+ */ +static inline void *__vmalloc_node_flags(unsigned long size, + int node, gfp_t flags) +{ + return __vmalloc_node(size, 1, flags, PAGE_KERNEL, + node, __builtin_return_address(0)); +} +#endif extern void vfree(const void *addr); extern void vfree_atomic(const void *addr); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 33603239560e..717b1e8b942c 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1649,9 +1649,6 @@ void *vmap(struct page **pages, unsigned int count, } EXPORT_SYMBOL(vmap); -static void *__vmalloc_node(unsigned long size, unsigned long align, - gfp_t gfp_mask, pgprot_t prot, - int node, const void *caller); static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot, int node) { @@ -1794,7 +1791,7 @@ fail: * with mm people. * */ -static void *__vmalloc_node(unsigned long size, unsigned long align, +void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, pgprot_t prot, int node, const void *caller) { @@ -1809,13 +1806,6 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot) } EXPORT_SYMBOL(__vmalloc); -void *__vmalloc_node_flags(unsigned long size, - int node, gfp_t flags) -{ - return __vmalloc_node(size, 1, flags, PAGE_KERNEL, - node, __builtin_return_address(0)); -} - /** * vmalloc - allocate virtually contiguous memory * @size: allocation size -- cgit From 6c5ab6511f718c3fb19bcc3f78a90b0e0b601675 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:15 -0700 Subject: mm: support __GFP_REPEAT in kvmalloc_node for >32kB vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp. vhost_vsock because it would really like to prefer kmalloc to the vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device allocation to vmalloc") for more context. Michael Tsirkin has also noted: "__GFP_REPEAT overhead is during allocation time. Using vmalloc means all accesses are slowed down. Allocation is not on data path, accesses are." 
The same applies to other vhost_kvzalloc users. Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two things to be careful about. First, we should avoid invoking the OOM killer and so have to add __GFP_NORETRY by default. Second, we have to override __GFP_REPEAT for !costly order requests, as __GFP_REPEAT is ignored for !costly orders. Supporting __GFP_REPEAT-like semantics for !costly requests is possible, but it would require changes in the page allocator. This is out of scope for this patch. This patch shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170306103032.2540-3-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Vlastimil Babka Acked-by: Michael S. Tsirkin Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/vhost/net.c | 9 +++------ drivers/vhost/vhost.c | 15 +++------------ drivers/vhost/vsock.c | 9 +++------ mm/util.c | 18 +++++++++++++++--- 4 files changed, 24 insertions(+), 27 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 9b519897cc17..f61f852d6cfd 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -817,12 +817,9 @@ static int vhost_net_open(struct inode *inode, struct file *f) struct vhost_virtqueue **vqs; int i; - n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); - if (!n) { - n = vmalloc(sizeof *n); - if (!n) - return -ENOMEM; - } + n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT); + if (!n) + return -ENOMEM; vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL); if (!vqs) { kvfree(n); diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index f0ba362d4c10..042030e5a035 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -534,18 +534,9 @@ err_mm: } EXPORT_SYMBOL_GPL(vhost_dev_set_owner); -static void *vhost_kvzalloc(unsigned long size) -{ - void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); - - if (!n) - n = vzalloc(size); - return n; -} - struct vhost_umem
*vhost_dev_reset_owner_prepare(void) { - return vhost_kvzalloc(sizeof(struct vhost_umem)); + return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL); } EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare); @@ -1276,7 +1267,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_access_ok); static struct vhost_umem *vhost_umem_alloc(void) { - struct vhost_umem *umem = vhost_kvzalloc(sizeof(*umem)); + struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL); if (!umem) return NULL; @@ -1302,7 +1293,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) return -EOPNOTSUPP; if (mem.nregions > max_mem_regions) return -E2BIG; - newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions)); + newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL); if (!newmem) return -ENOMEM; diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index d939ac1a4997..3acef3c5d8ed 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -508,12 +508,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) /* This struct is large and allocation could fail, fall back to vmalloc * if there is no other way. */ - vsock = kzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); - if (!vsock) { - vsock = vmalloc(sizeof(*vsock)); - if (!vsock) - return -ENOMEM; - } + vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_REPEAT); + if (!vsock) + return -ENOMEM; vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL); if (!vqs) { diff --git a/mm/util.c b/mm/util.c index 10a14a0ac3c2..f4e590b2c0da 100644 --- a/mm/util.c +++ b/mm/util.c @@ -339,7 +339,9 @@ EXPORT_SYMBOL(vm_mmap); * Uses kmalloc to get the memory but if the allocation fails then falls back * to the vmalloc allocator. Use kvfree for freeing the memory. * - * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported + * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. 
__GFP_REPEAT + * is supported only for large (>32kB) allocations, and it should be used only if + * kmalloc is preferable to the vmalloc fallback, due to visible performance drawbacks. * * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people. */ @@ -358,8 +360,18 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) * Make sure that larger requests are not too disruptive - no OOM * killer and no allocation failure warnings as we have a fallback */ - if (size > PAGE_SIZE) - kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN; + if (size > PAGE_SIZE) { + kmalloc_flags |= __GFP_NOWARN; + + /* + * We have to override __GFP_REPEAT by __GFP_NORETRY for !costly + * requests because there is no other way to tell the allocator + * that we want to fail rather than retry endlessly. + */ + if (!(kmalloc_flags & __GFP_REPEAT) || + (size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) + kmalloc_flags |= __GFP_NORETRY; + } ret = kmalloc_node(size, kmalloc_flags, node); -- cgit From 43ca5bc4f72ed22e6e20feabdd3eab3c721d98cd Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:18 -0700 Subject: lib/rhashtable.c: simplify a strange allocation pattern The alloc_bucket_locks allocation pattern is quite unusual. We prefer vmalloc when CONFIG_NUMA is enabled. The rationale is that vmalloc will respect the memory policy of the current process and so the backing memory will get distributed over multiple nodes if the requester is configured properly. At least that is the intention; in reality, rhashtable is shrunk and expanded from a kernel worker, so no mempolicy can be assumed. Let's just simplify the code and use the kvmalloc helper, which is a transparent way to use kmalloc with vmalloc fallback, if the caller is allowed to block, and plain kmalloc_array with the given gfp flags otherwise.
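The allocator choice described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the DEMO_* flag values and the demo_* names are hypothetical stand-ins, not the kernel's definitions. The one kernel fact it leans on is that gfpflags_allow_blocking() tests the direct-reclaim bit of the gfp mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the real gfp values differ. */
#define DEMO_GFP_DIRECT_RECLAIM 0x1u /* caller may sleep in direct reclaim */
#define DEMO_GFP_KSWAPD_RECLAIM 0x2u /* caller may wake kswapd */
#define DEMO_GFP_HIGH           0x4u /* caller may dip into reserves */

/* Rough models of a sleeping and a non-sleeping allocation context. */
#define DEMO_GFP_KERNEL (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_ATOMIC (DEMO_GFP_HIGH | DEMO_GFP_KSWAPD_RECLAIM)

/* The model only makes sense if the two contexts differ in this bit. */
static_assert((DEMO_GFP_KERNEL & DEMO_GFP_DIRECT_RECLAIM) != 0,
	      "sleeping context must allow direct reclaim");
static_assert((DEMO_GFP_ATOMIC & DEMO_GFP_DIRECT_RECLAIM) == 0,
	      "atomic context must not allow direct reclaim");

enum demo_allocator {
	DEMO_KVMALLOC,      /* kmalloc first, vmalloc as a fallback */
	DEMO_KMALLOC_ARRAY, /* physically contiguous memory only */
};

/* Model of gfpflags_allow_blocking(): may the caller sleep? */
static bool demo_allow_blocking(unsigned int gfp)
{
	return (gfp & DEMO_GFP_DIRECT_RECLAIM) != 0;
}

/* vmalloc may sleep, so the fallback is only usable by blocking callers. */
static enum demo_allocator demo_pick_locks_allocator(unsigned int gfp)
{
	if (demo_allow_blocking(gfp))
		return DEMO_KVMALLOC;
	return DEMO_KMALLOC_ARRAY;
}
```

The point of the split is that the vmalloc fallback can sleep, so a caller in atomic context has to stay with kmalloc_array and accept that the allocation may fail.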
Link: http://lkml.kernel.org/r/20170306103032.2540-4-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Vlastimil Babka Cc: Tom Herbert Cc: Eric Dumazet Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/rhashtable.c | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/lib/rhashtable.c b/lib/rhashtable.c index a930e436db5d..d9e7274a04cd 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -86,16 +86,9 @@ static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl, size = min(size, 1U << tbl->nest); if (sizeof(spinlock_t) != 0) { - tbl->locks = NULL; -#ifdef CONFIG_NUMA - if (size * sizeof(spinlock_t) > PAGE_SIZE && - gfp == GFP_KERNEL) - tbl->locks = vmalloc(size * sizeof(spinlock_t)); -#endif - if (gfp != GFP_KERNEL) - gfp |= __GFP_NOWARN | __GFP_NORETRY; - - if (!tbl->locks) + if (gfpflags_allow_blocking(gfp)) + tbl->locks = kvmalloc(size * sizeof(spinlock_t), gfp); + else tbl->locks = kmalloc_array(size, sizeof(spinlock_t), gfp); if (!tbl->locks) -- cgit From 847f716f9ec2c61f57690c871a307f1349d472d0 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:21 -0700 Subject: net/ipv6/ila/ila_xlat.c: simplify a strange allocation pattern alloc_ila_locks seems to have copied the alloc_bucket_locks allocation pattern, which is quite unusual. The default allocation size is 320 * sizeof(spinlock_t), which is less than a page unless lockdep is enabled, in which case the performance benefit is really questionable and not worth the subtle code IMHO. Also note that the context in which we call ila_init_net (modprobe or a task creating a net namespace) has to be properly configured. Let's just simplify the code and use the kvmalloc helper, which is a transparent way to use kmalloc with vmalloc fallback.
Link: http://lkml.kernel.org/r/20170306103032.2540-5-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Vlastimil Babka Cc: Tom Herbert Cc: Eric Dumazet Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- net/ipv6/ila/ila_xlat.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c index af8f52ee7180..2fd5ca151dcf 100644 --- a/net/ipv6/ila/ila_xlat.c +++ b/net/ipv6/ila/ila_xlat.c @@ -41,13 +41,7 @@ static int alloc_ila_locks(struct ila_net *ilan) size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU); if (sizeof(spinlock_t) != 0) { -#ifdef CONFIG_NUMA - if (size * sizeof(spinlock_t) > PAGE_SIZE) - ilan->locks = vmalloc(size * sizeof(spinlock_t)); - else -#endif - ilan->locks = kmalloc_array(size, sizeof(spinlock_t), - GFP_KERNEL); + ilan->locks = kvmalloc(size * sizeof(spinlock_t), GFP_KERNEL); if (!ilan->locks) return -ENOMEM; for (i = 0; i < size; i++) -- cgit From 81be3dee96346fbe08c31be5ef74f03f6b63cf68 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:24 -0700 Subject: fs/xattr.c: zero out memory copied to userspace in getxattr getxattr uses vmalloc to allocate memory if kzalloc fails. This buffer is filled by vfs_getxattr and then copied to userspace. vmalloc, however, doesn't zero out the memory, so if the specific implementation of the xattr handler is sloppy, we can theoretically expose kernel memory. There is no sign that this actually happens, but let's make sure it cannot and use vzalloc instead.
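The exposure described above can be sketched in userspace. This is an illustrative model only, not the kernel code path: demo_sloppy_handler and the other demo_* names are hypothetical, malloc plus an explicit fill pattern stands in for a non-zeroing allocator that returns recycled memory, and the "sloppy" handler is assumed to report more valid bytes than it actually wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUF_SIZE   64
#define DEMO_STALE_BYTE 0x5a /* stands in for data left by a previous user */

/*
 * Hypothetical sloppy handler: writes only `written` bytes but reports
 * `claimed` bytes as valid.
 */
static size_t demo_sloppy_handler(unsigned char *buf, size_t written,
				  size_t claimed)
{
	memset(buf, 'A', written);
	return claimed;
}

/*
 * Returns how many stale bytes fall inside the range the caller would
 * copy out. `zeroed` models kzalloc/vzalloc; without it the buffer keeps
 * its previous contents, as with plain vmalloc.
 */
static size_t demo_stale_bytes_copied(int zeroed)
{
	unsigned char *buf = malloc(DEMO_BUF_SIZE);
	size_t len, i, stale = 0;

	assert(buf != NULL); /* sketch only: treat failure as fatal */
	memset(buf, DEMO_STALE_BYTE, DEMO_BUF_SIZE); /* simulate old data */
	if (zeroed)
		memset(buf, 0, DEMO_BUF_SIZE);

	/* Handler fills 16 bytes but claims 32 are valid. */
	len = demo_sloppy_handler(buf, 16, 32);
	for (i = 0; i < len; i++)
		if (buf[i] == DEMO_STALE_BYTE)
			stale++;

	free(buf);
	return stale;
}
```

With a zeroed buffer the over-reported tail contains only zeros, so a handler bug of this kind degrades to returning padding rather than old memory contents.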
Fixes: 779302e67835 ("fs/xattr.c:getxattr(): improve handling of allocation failures") Link: http://lkml.kernel.org/r/20170306103327.2766-1-mhocko@kernel.org Acked-by: Kees Cook Reported-by: Vlastimil Babka Signed-off-by: Michal Hocko Cc: [3.6+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/xattr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xattr.c b/fs/xattr.c index 7e3317cf4045..94f49a082dd2 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -530,7 +530,7 @@ getxattr(struct dentry *d, const char __user *name, void __user *value, size = XATTR_SIZE_MAX; kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); if (!kvalue) { - kvalue = vmalloc(size); + kvalue = vzalloc(size); if (!kvalue) return -ENOMEM; } -- cgit From 752ade68cbd81d0321dfecc188f655a945551b25 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:27 -0700 Subject: treewide: use kv[mz]alloc* rather than opencoded variants There are many code paths open-coding kvmalloc. Let's use the helper instead. The main difference from kvmalloc is that those users usually do not consider all aspects of the memory allocator. E.g. allocation requests <= 32kB (with 4kB pages) basically never fail and invoke the OOM killer to satisfy the allocation. This sounds too disruptive for something that has a reasonable fallback, namely vmalloc. On the other hand, those requests might previously have fallen back to vmalloc even when the memory allocator would have succeeded after several more reclaim/compaction attempts. There is no guarantee that something like that happens, though. This patch converts many of those places to the kv[mz]alloc* helpers because they are more conservative.
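What "more conservative" means in practice follows from how kvmalloc_node adjusts the gfp flags before it tries kmalloc (see the mm/util.c hunk earlier in this series). The following userspace model of that adjustment is a sketch, not the kernel implementation: the DEMO_* constants are hypothetical values chosen for illustration, with a 4kB page and a costly order of 3 so that the threshold is 32kB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for illustration; only the relationships matter. */
#define DEMO_PAGE_SIZE    4096u
#define DEMO_COSTLY_ORDER 3    /* mirrors PAGE_ALLOC_COSTLY_ORDER */
#define DEMO_GFP_NOWARN   0x1u /* no allocation failure warning */
#define DEMO_GFP_NORETRY  0x2u /* fail early, never invoke the OOM killer */
#define DEMO_GFP_REPEAT   0x4u /* try hard before failing */

static_assert((DEMO_PAGE_SIZE << DEMO_COSTLY_ORDER) == 32768u,
	      "costly threshold is 32kB with 4kB pages");

/*
 * Model of the flags kvmalloc_node passes to kmalloc_node. Requests of
 * at most one page are left alone; larger ones must not warn and, unless
 * the caller asked for __GFP_REPEAT on a costly (>32kB) request, must
 * fail early because vmalloc is available as a fallback.
 */
static unsigned int demo_kmalloc_flags(size_t size, unsigned int flags)
{
	unsigned int kmalloc_flags = flags;

	if (size > DEMO_PAGE_SIZE) {
		kmalloc_flags |= DEMO_GFP_NOWARN;

		if (!(kmalloc_flags & DEMO_GFP_REPEAT) ||
		    size <= ((size_t)DEMO_PAGE_SIZE << DEMO_COSTLY_ORDER))
			kmalloc_flags |= DEMO_GFP_NORETRY;
	}
	return kmalloc_flags;
}
```

An open-coded kmalloc(size, GFP_KERNEL | __GFP_NOWARN) followed by vmalloc only suppresses the warning; in this model the helper additionally sets the no-retry flag for every multi-page request that did not ask for __GFP_REPEAT on a >32kB size.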
Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org Signed-off-by: Michal Hocko Reviewed-by: Boris Ostrovsky # Xen bits Acked-by: Kees Cook Acked-by: Vlastimil Babka Acked-by: Andreas Dilger # Lustre Acked-by: Christian Borntraeger # KVM/s390 Acked-by: Dan Williams # nvdim Acked-by: David Sterba # btrfs Acked-by: Ilya Dryomov # Ceph Acked-by: Tariq Toukan # mlx4 Acked-by: Leon Romanovsky # mlx5 Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Herbert Xu Cc: Anton Vorontsov Cc: Colin Cross Cc: Tony Luck Cc: "Rafael J. Wysocki" Cc: Ben Skeggs Cc: Kent Overstreet Cc: Santosh Raspatur Cc: Hariprasad S Cc: Yishai Hadas Cc: Oleg Drokin Cc: "Yan, Zheng" Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Eric Dumazet Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/s390/kvm/kvm-s390.c | 10 ++----- crypto/lzo.c | 4 +-- drivers/acpi/apei/erst.c | 8 ++---- drivers/char/agp/generic.c | 8 +----- drivers/gpu/drm/nouveau/nouveau_gem.c | 4 +-- drivers/md/bcache/util.h | 12 ++------ drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h | 3 -- drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 29 +++---------------- drivers/net/ethernet/chelsio/cxgb3/l2t.c | 8 +----- drivers/net/ethernet/chelsio/cxgb3/l2t.h | 1 - drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c | 12 ++++---- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 3 -- drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 10 +++---- drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c | 8 +++--- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 31 ++++---------------- drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c | 14 ++++----- drivers/net/ethernet/chelsio/cxgb4/l2t.c | 2 +- drivers/net/ethernet/chelsio/cxgb4/sched.c | 12 ++++---- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 9 ++---- drivers/net/ethernet/mellanox/mlx4/mr.c | 9 ++---- drivers/nvdimm/dimm_devs.c | 5 +--- .../staging/lustre/lnet/libcfs/linux/linux-mem.c | 11 +------- drivers/xen/evtchn.c | 14 +-------- fs/btrfs/ctree.c | 9 
++---- fs/btrfs/ioctl.c | 9 ++---- fs/btrfs/send.c | 27 ++++++------------ fs/ceph/file.c | 9 ++---- fs/select.c | 5 +--- fs/xattr.c | 27 ++++++------------ include/linux/mlx5/driver.h | 7 +---- include/linux/mm.h | 8 ++++++ lib/iov_iter.c | 5 +--- mm/frame_vector.c | 5 +--- net/ipv4/inet_hashtables.c | 6 +--- net/ipv4/tcp_metrics.c | 5 +--- net/mpls/af_mpls.c | 5 +--- net/netfilter/x_tables.c | 21 +++----------- net/netfilter/xt_recent.c | 5 +--- net/sched/sch_choke.c | 5 +--- net/sched/sch_fq_codel.c | 26 ++++------------- net/sched/sch_hhf.c | 33 ++++++---------------- net/sched/sch_netem.c | 6 +--- net/sched/sch_sfq.c | 6 +--- security/keys/keyctl.c | 22 ++++----------- 44 files changed, 128 insertions(+), 350 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index d5c5c911821a..323297e55e80 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -1166,10 +1166,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args) if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX) return -EINVAL; - keys = kmalloc_array(args->count, sizeof(uint8_t), - GFP_KERNEL | __GFP_NOWARN); - if (!keys) - keys = vmalloc(sizeof(uint8_t) * args->count); + keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL); if (!keys) return -ENOMEM; @@ -1211,10 +1208,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args) if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX) return -EINVAL; - keys = kmalloc_array(args->count, sizeof(uint8_t), - GFP_KERNEL | __GFP_NOWARN); - if (!keys) - keys = vmalloc(sizeof(uint8_t) * args->count); + keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL); if (!keys) return -ENOMEM; diff --git a/crypto/lzo.c b/crypto/lzo.c index 168df784da84..218567d717d6 100644 --- a/crypto/lzo.c +++ b/crypto/lzo.c @@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm) { void *ctx; - ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN); - if 
(!ctx) - ctx = vmalloc(LZO1X_MEM_COMPRESS); + ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL); if (!ctx) return ERR_PTR(-ENOMEM); diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index 7207e5fc9d3d..2c462beee551 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -513,7 +513,7 @@ retry: if (i < erst_record_id_cache.len) goto retry; if (erst_record_id_cache.len >= erst_record_id_cache.size) { - int new_size, alloc_size; + int new_size; u64 *new_entries; new_size = erst_record_id_cache.size * 2; @@ -524,11 +524,7 @@ retry: pr_warn(FW_WARN "too many record IDs!\n"); return 0; } - alloc_size = new_size * sizeof(entries[0]); - if (alloc_size < PAGE_SIZE) - new_entries = kmalloc(alloc_size, GFP_KERNEL); - else - new_entries = vmalloc(alloc_size); + new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL); if (!new_entries) return -ENOMEM; memcpy(new_entries, entries, diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c index f002fa5d1887..bdf418cac8ef 100644 --- a/drivers/char/agp/generic.c +++ b/drivers/char/agp/generic.c @@ -88,13 +88,7 @@ static int agp_get_key(void) void agp_alloc_page_array(size_t size, struct agp_memory *mem) { - mem->pages = NULL; - - if (size <= 2*PAGE_SIZE) - mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN); - if (mem->pages == NULL) { - mem->pages = vmalloc(size); - } + mem->pages = kvmalloc(size, GFP_KERNEL); } EXPORT_SYMBOL(agp_alloc_page_array); diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index ca5397beb357..2170534101ca 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size) size *= nmemb; - mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN); - if (!mem) - mem = vmalloc(size); + mem = kvmalloc(size, GFP_KERNEL); if (!mem) return ERR_PTR(-ENOMEM); diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h index 
5d13930f0f22..cb8d2ccbb6c6 100644 --- a/drivers/md/bcache/util.h +++ b/drivers/md/bcache/util.h @@ -43,11 +43,7 @@ struct closure; (heap)->used = 0; \ (heap)->size = (_size); \ _bytes = (heap)->size * sizeof(*(heap)->data); \ - (heap)->data = NULL; \ - if (_bytes < KMALLOC_MAX_SIZE) \ - (heap)->data = kmalloc(_bytes, (gfp)); \ - if ((!(heap)->data) && ((gfp) & GFP_KERNEL)) \ - (heap)->data = vmalloc(_bytes); \ + (heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL); \ (heap)->data; \ }) @@ -136,12 +132,8 @@ do { \ \ (fifo)->mask = _allocated_size - 1; \ (fifo)->front = (fifo)->back = 0; \ - (fifo)->data = NULL; \ \ - if (_bytes < KMALLOC_MAX_SIZE) \ - (fifo)->data = kmalloc(_bytes, (gfp)); \ - if ((!(fifo)->data) && ((gfp) & GFP_KERNEL)) \ - (fifo)->data = vmalloc(_bytes); \ + (fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL); \ (fifo)->data; \ }) diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h index 920d918ed193..f04e81f33795 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h @@ -41,9 +41,6 @@ #define VALIDATE_TID 1 -void *cxgb_alloc_mem(unsigned long size); -void cxgb_free_mem(void *addr); - /* * Map an ATID or STID to their entries in the corresponding TID tables. */ diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c index 76684dcb874c..fa81445e334c 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c @@ -1151,27 +1151,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new, l2t_release(tdev, e); } -/* - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc. - * The allocated memory is cleared. 
- */ -void *cxgb_alloc_mem(unsigned long size) -{ - void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); - - if (!p) - p = vzalloc(size); - return p; -} - -/* - * Free memory allocated through t3_alloc_mem(). - */ -void cxgb_free_mem(void *addr) -{ - kvfree(addr); -} - /* * Allocate and initialize the TID tables. Returns 0 on success. */ @@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids, unsigned long size = ntids * sizeof(*t->tid_tab) + natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab); - t->tid_tab = cxgb_alloc_mem(size); + t->tid_tab = kvzalloc(size, GFP_KERNEL); if (!t->tid_tab) return -ENOMEM; @@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids, static void free_tid_maps(struct tid_info *t) { - cxgb_free_mem(t->tid_tab); + kvfree(t->tid_tab); } static inline void add_adapter(struct adapter *adap) @@ -1293,7 +1272,7 @@ int cxgb3_offload_activate(struct adapter *adapter) return 0; out_free_l2t: - t3_free_l2t(l2td); + kvfree(l2td); out_free: kfree(t); return err; @@ -1302,7 +1281,7 @@ out_free: static void clean_l2_data(struct rcu_head *head) { struct l2t_data *d = container_of(head, struct l2t_data, rcu_head); - t3_free_l2t(d); + kvfree(d); } diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c index 52063587e1e9..26264125865f 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c +++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c @@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity) struct l2t_data *d; int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry); - d = cxgb_alloc_mem(size); + d = kvzalloc(size, GFP_KERNEL); if (!d) return NULL; @@ -462,9 +462,3 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity) } return d; } - -void t3_free_l2t(struct l2t_data *d) -{ - cxgb_free_mem(d); -} - diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.h b/drivers/net/ethernet/chelsio/cxgb3/l2t.h index 
8cffcdfd5678..c2fd323c4078 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/l2t.h +++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.h @@ -115,7 +115,6 @@ int t3_l2t_send_slow(struct t3cdev *dev, struct sk_buff *skb, struct l2t_entry *e); void t3_l2t_send_event(struct t3cdev *dev, struct l2t_entry *e); struct l2t_data *t3_init_l2t(unsigned int l2t_capacity); -void t3_free_l2t(struct l2t_data *d); int cxgb3_ofld_send(struct t3cdev *dev, struct sk_buff *skb); diff --git a/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c index 7ad43af6bde1..3103ef9b561d 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c +++ b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c @@ -290,8 +290,8 @@ struct clip_tbl *t4_init_clip_tbl(unsigned int clipt_start, if (clipt_size < CLIPT_MIN_HASH_BUCKETS) return NULL; - ctbl = t4_alloc_mem(sizeof(*ctbl) + - clipt_size*sizeof(struct list_head)); + ctbl = kvzalloc(sizeof(*ctbl) + + clipt_size*sizeof(struct list_head), GFP_KERNEL); if (!ctbl) return NULL; @@ -305,9 +305,9 @@ struct clip_tbl *t4_init_clip_tbl(unsigned int clipt_start, for (i = 0; i < ctbl->clipt_size; ++i) INIT_LIST_HEAD(&ctbl->hash_list[i]); - cl_list = t4_alloc_mem(clipt_size*sizeof(struct clip_entry)); + cl_list = kvzalloc(clipt_size*sizeof(struct clip_entry), GFP_KERNEL); if (!cl_list) { - t4_free_mem(ctbl); + kvfree(ctbl); return NULL; } ctbl->cl_list = (void *)cl_list; @@ -326,8 +326,8 @@ void t4_cleanup_clip_tbl(struct adapter *adap) if (ctbl) { if (ctbl->cl_list) - t4_free_mem(ctbl->cl_list); - t4_free_mem(ctbl); + kvfree(ctbl->cl_list); + kvfree(ctbl); } } EXPORT_SYMBOL(t4_cleanup_clip_tbl); diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index 163543b1ea0b..1d2be2dd19dd 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -1184,8 +1184,6 @@ extern const char cxgb4_driver_version[]; void t4_os_portmod_changed(const struct 
adapter *adap, int port_id); void t4_os_link_changed(struct adapter *adap, int port_id, int link_stat); -void *t4_alloc_mem(size_t size); - void t4_free_sge_resources(struct adapter *adap); void t4_free_ofld_rxqs(struct adapter *adap, int n, struct sge_ofld_rxq *q); irq_handler_t t4_intr_handler(struct adapter *adap); @@ -1557,7 +1555,6 @@ int t4_sched_params(struct adapter *adapter, int type, int level, int mode, int rateunit, int ratemode, int channel, int class, int minrate, int maxrate, int weight, int pktsize); void t4_sge_decode_idma_state(struct adapter *adapter, int state); -void t4_free_mem(void *addr); void t4_idma_monitor_init(struct adapter *adapter, struct sge_idma_monitor_state *idma); void t4_idma_monitor(struct adapter *adapter, diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c index f6e739da7bb7..1fa34b009891 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c @@ -2634,7 +2634,7 @@ static ssize_t mem_read(struct file *file, char __user *buf, size_t count, if (count > avail - pos) count = avail - pos; - data = t4_alloc_mem(count); + data = kvzalloc(count, GFP_KERNEL); if (!data) return -ENOMEM; @@ -2642,12 +2642,12 @@ static ssize_t mem_read(struct file *file, char __user *buf, size_t count, ret = t4_memory_rw(adap, 0, mem, pos, count, data, T4_MEMORY_READ); spin_unlock(&adap->win0_lock); if (ret) { - t4_free_mem(data); + kvfree(data); return ret; } ret = copy_to_user(buf, data, count); - t4_free_mem(data); + kvfree(data); if (ret) return -EFAULT; @@ -2753,7 +2753,7 @@ static ssize_t blocked_fl_read(struct file *filp, char __user *ubuf, adap->sge.egr_sz, adap->sge.blocked_fl); len += sprintf(buf + len, "\n"); size = simple_read_from_buffer(ubuf, count, ppos, buf, len); - t4_free_mem(buf); + kvfree(buf); return size; } @@ -2773,7 +2773,7 @@ static ssize_t blocked_fl_write(struct file *filp, const char __user *ubuf, 
return err; bitmap_copy(adap->sge.blocked_fl, t, adap->sge.egr_sz); - t4_free_mem(t); + kvfree(t); return count; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c index 02f80febeb91..0ba7866c8259 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c @@ -969,7 +969,7 @@ static int get_eeprom(struct net_device *dev, struct ethtool_eeprom *e, { int i, err = 0; struct adapter *adapter = netdev2adap(dev); - u8 *buf = t4_alloc_mem(EEPROMSIZE); + u8 *buf = kvzalloc(EEPROMSIZE, GFP_KERNEL); if (!buf) return -ENOMEM; @@ -980,7 +980,7 @@ static int get_eeprom(struct net_device *dev, struct ethtool_eeprom *e, if (!err) memcpy(data, buf + e->offset, e->len); - t4_free_mem(buf); + kvfree(buf); return err; } @@ -1009,7 +1009,7 @@ static int set_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, if (aligned_offset != eeprom->offset || aligned_len != eeprom->len) { /* RMW possibly needed for first or last words. */ - buf = t4_alloc_mem(aligned_len); + buf = kvzalloc(aligned_len, GFP_KERNEL); if (!buf) return -ENOMEM; err = eeprom_rd_phys(adapter, aligned_offset, (u32 *)buf); @@ -1037,7 +1037,7 @@ static int set_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, err = t4_seeprom_wp(adapter, true); out: if (buf != data) - t4_free_mem(buf); + kvfree(buf); return err; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index c12c4a3b82b5..38a5c6764bb5 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -880,27 +880,6 @@ freeout: return err; } -/* - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc. - * The allocated memory is cleared. 
- */ -void *t4_alloc_mem(size_t size) -{ - void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); - - if (!p) - p = vzalloc(size); - return p; -} - -/* - * Free memory allocated through alloc_mem(). - */ -void t4_free_mem(void *addr) -{ - kvfree(addr); -} - static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb, void *accel_priv, select_queue_fallback_t fallback) { @@ -1299,7 +1278,7 @@ static int tid_init(struct tid_info *t) max_ftids * sizeof(*t->ftid_tab) + ftid_bmap_size * sizeof(long); - t->tid_tab = t4_alloc_mem(size); + t->tid_tab = kvzalloc(size, GFP_KERNEL); if (!t->tid_tab) return -ENOMEM; @@ -3445,7 +3424,7 @@ static int adap_init0(struct adapter *adap) /* allocate memory to read the header of the firmware on the * card */ - card_fw = t4_alloc_mem(sizeof(*card_fw)); + card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL); /* Get FW from from /lib/firmware/ */ ret = request_firmware(&fw, fw_info->fw_mod_name, @@ -3465,7 +3444,7 @@ static int adap_init0(struct adapter *adap) /* Cleaning up */ release_firmware(fw); - t4_free_mem(card_fw); + kvfree(card_fw); if (ret < 0) goto bye; @@ -4470,9 +4449,9 @@ static void free_some_resources(struct adapter *adapter) { unsigned int i; - t4_free_mem(adapter->l2t); + kvfree(adapter->l2t); t4_cleanup_sched(adapter); - t4_free_mem(adapter->tids.tid_tab); + kvfree(adapter->tids.tid_tab); cxgb4_cleanup_tc_u32(adapter); kfree(adapter->sge.egr_map); kfree(adapter->sge.ingr_map); diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c index a1b19422b339..ef06ce8247ab 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c @@ -432,9 +432,9 @@ void cxgb4_cleanup_tc_u32(struct adapter *adap) for (i = 0; i < t->size; i++) { struct cxgb4_link *link = &t->table[i]; - t4_free_mem(link->tid_map); + kvfree(link->tid_map); } - t4_free_mem(adap->tc_u32); + kvfree(adap->tc_u32); } struct cxgb4_tc_u32_table 
*cxgb4_init_tc_u32(struct adapter *adap) @@ -446,8 +446,8 @@ struct cxgb4_tc_u32_table *cxgb4_init_tc_u32(struct adapter *adap) if (!max_tids) return NULL; - t = t4_alloc_mem(sizeof(*t) + - (max_tids * sizeof(struct cxgb4_link))); + t = kvzalloc(sizeof(*t) + + (max_tids * sizeof(struct cxgb4_link)), GFP_KERNEL); if (!t) return NULL; @@ -458,7 +458,7 @@ struct cxgb4_tc_u32_table *cxgb4_init_tc_u32(struct adapter *adap) unsigned int bmap_size; bmap_size = BITS_TO_LONGS(max_tids); - link->tid_map = t4_alloc_mem(sizeof(unsigned long) * bmap_size); + link->tid_map = kvzalloc(sizeof(unsigned long) * bmap_size, GFP_KERNEL); if (!link->tid_map) goto out_no_mem; bitmap_zero(link->tid_map, max_tids); @@ -471,11 +471,11 @@ out_no_mem: struct cxgb4_link *link = &t->table[i]; if (link->tid_map) - t4_free_mem(link->tid_map); + kvfree(link->tid_map); } if (t) - t4_free_mem(t); + kvfree(t); return NULL; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/l2t.c b/drivers/net/ethernet/chelsio/cxgb4/l2t.c index 7c8c5b9a3c22..6f3692db29af 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/l2t.c +++ b/drivers/net/ethernet/chelsio/cxgb4/l2t.c @@ -646,7 +646,7 @@ struct l2t_data *t4_init_l2t(unsigned int l2t_start, unsigned int l2t_end) if (l2t_size < L2T_MIN_HASH_BUCKETS) return NULL; - d = t4_alloc_mem(sizeof(*d) + l2t_size * sizeof(struct l2t_entry)); + d = kvzalloc(sizeof(*d) + l2t_size * sizeof(struct l2t_entry), GFP_KERNEL); if (!d) return NULL; diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.c b/drivers/net/ethernet/chelsio/cxgb4/sched.c index c9026352a842..02acff741f11 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sched.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c @@ -177,7 +177,7 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p) } list_del(&qe->list); - t4_free_mem(qe); + kvfree(qe); if (atomic_dec_and_test(&e->refcnt)) { e->state = SCHED_STATE_UNUSED; memset(&e->info, 0, sizeof(e->info)); @@ -201,7 +201,7 @@ static int 
t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p) if (p->queue < 0 || p->queue >= pi->nqsets) return -ERANGE; - qe = t4_alloc_mem(sizeof(struct sched_queue_entry)); + qe = kvzalloc(sizeof(struct sched_queue_entry), GFP_KERNEL); if (!qe) return -ENOMEM; @@ -211,7 +211,7 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p) /* Unbind queue from any existing class */ err = t4_sched_queue_unbind(pi, p); if (err) { - t4_free_mem(qe); + kvfree(qe); goto out; } @@ -224,7 +224,7 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p) spin_lock(&e->lock); err = t4_sched_bind_unbind_op(pi, (void *)qe, SCHED_QUEUE, true); if (err) { - t4_free_mem(qe); + kvfree(qe); spin_unlock(&e->lock); goto out; } @@ -512,7 +512,7 @@ struct sched_table *t4_init_sched(unsigned int sched_size) struct sched_table *s; unsigned int i; - s = t4_alloc_mem(sizeof(*s) + sched_size * sizeof(struct sched_class)); + s = kvzalloc(sizeof(*s) + sched_size * sizeof(struct sched_class), GFP_KERNEL); if (!s) return NULL; @@ -548,6 +548,6 @@ void t4_cleanup_sched(struct adapter *adap) t4_sched_class_free(pi, e); write_unlock(&s->rw_lock); } - t4_free_mem(s); + kvfree(s); } } diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index 3ba89bc43d74..6ffd1849a604 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv, ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS; tmp = size * sizeof(struct mlx4_en_tx_info); - ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node); + ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node); if (!ring->tx_info) { - ring->tx_info = vmalloc(tmp); - if (!ring->tx_info) { - err = -ENOMEM; - goto err_ring; - } + err = -ENOMEM; + goto err_ring; } en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n", diff --git 
a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c index db65f72879e9..ce852ca22a96 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mr.c +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c @@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order) for (i = 0; i <= buddy->max_order; ++i) { s = BITS_TO_LONGS(1 << (buddy->max_order - i)); - buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN); - if (!buddy->bits[i]) { - buddy->bits[i] = vzalloc(s * sizeof(long)); - if (!buddy->bits[i]) - goto err_out_free; - } + buddy->bits[i] = kvmalloc_array(s, sizeof(long), GFP_KERNEL | __GFP_ZERO); + if (!buddy->bits[i]) + goto err_out_free; } set_bit(0, buddy->bits[buddy->max_order]); diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index fac1e9fbd11d..9852a3355509 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -106,10 +106,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd) return -ENXIO; } - ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL); - if (!ndd->data) - ndd->data = vmalloc(ndd->nsarea.config_size); - + ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL); if (!ndd->data) return -ENOMEM; diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c index a6a76a681ea9..8f638267e704 100644 --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c @@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc); void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size, gfp_t flags) { - void *ret; - - ret = kzalloc_node(size, flags | __GFP_NOWARN, - cfs_cpt_spread_node(cptab, cpt)); - if (!ret) { - WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH))); - ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt)); - } - - return ret; + return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt)); } 
EXPORT_SYMBOL(libcfs_kvzalloc_cpt); diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c index 6890897a6f30..10f1ef582659 100644 --- a/drivers/xen/evtchn.c +++ b/drivers/xen/evtchn.c @@ -87,18 +87,6 @@ struct user_evtchn { bool enabled; }; -static evtchn_port_t *evtchn_alloc_ring(unsigned int size) -{ - evtchn_port_t *ring; - size_t s = size * sizeof(*ring); - - ring = kmalloc(s, GFP_KERNEL); - if (!ring) - ring = vmalloc(s); - - return ring; -} - static void evtchn_free_ring(evtchn_port_t *ring) { kvfree(ring); @@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u) else new_size = 2 * u->ring_size; - new_ring = evtchn_alloc_ring(new_size); + new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL); if (!new_ring) return -ENOMEM; diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 7dc8844037e0..1c3b6c54d5ee 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -5392,13 +5392,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root, goto out; } - tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN); + tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL); if (!tmp_buf) { - tmp_buf = vmalloc(fs_info->nodesize); - if (!tmp_buf) { - ret = -ENOMEM; - goto out; - } + ret = -ENOMEM; + goto out; } left_path->search_commit_root = 1; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index dabfc7ac48a6..922a66fce401 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3539,12 +3539,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode, u64 last_dest_end = destoff; ret = -ENOMEM; - buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN); - if (!buf) { - buf = vmalloc(fs_info->nodesize); - if (!buf) - return ret; - } + buf = kvmalloc(fs_info->nodesize, GFP_KERNEL); + if (!buf) + return ret; path = btrfs_alloc_path(); if (!path) { diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index a60d5bfb8a49..3f645cd67b54 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -6360,22 +6360,16 @@ long btrfs_ioctl_send(struct file 
*mnt_file, void __user *arg_) sctx->clone_roots_cnt = arg->clone_sources_count; sctx->send_max_size = BTRFS_SEND_BUF_SIZE; - sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN); + sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL); if (!sctx->send_buf) { - sctx->send_buf = vmalloc(sctx->send_max_size); - if (!sctx->send_buf) { - ret = -ENOMEM; - goto out; - } + ret = -ENOMEM; + goto out; } - sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN); + sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL); if (!sctx->read_buf) { - sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE); - if (!sctx->read_buf) { - ret = -ENOMEM; - goto out; - } + ret = -ENOMEM; + goto out; } sctx->pending_dir_moves = RB_ROOT; @@ -6396,13 +6390,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources); if (arg->clone_sources_count) { - clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN); + clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL); if (!clone_sources_tmp) { - clone_sources_tmp = vmalloc(alloc_size); - if (!clone_sources_tmp) { - ret = -ENOMEM; - goto out; - } + ret = -ENOMEM; + goto out; } ret = copy_from_user(clone_sources_tmp, arg->clone_sources, diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 26cc95421cca..18c045e2ead6 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes, align = (unsigned long)(it->iov->iov_base + it->iov_offset) & (PAGE_SIZE - 1); npages = calc_pages_for(align, nbytes); - pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL); - if (!pages) { - pages = vmalloc(sizeof(*pages) * npages); - if (!pages) - return ERR_PTR(-ENOMEM); - } + pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL); + if (!pages) + return ERR_PTR(-ENOMEM); for (idx = 0; idx < npages; ) { size_t start; diff --git a/fs/select.c b/fs/select.c index bd4b2ccfd346..d6c652a31e99 
100644 --- a/fs/select.c +++ b/fs/select.c @@ -633,10 +633,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, goto out_nofds; alloc_size = 6 * size; - bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN); - if (!bits && alloc_size > PAGE_SIZE) - bits = vmalloc(alloc_size); - + bits = kvmalloc(alloc_size, GFP_KERNEL); if (!bits) goto out_nofds; } diff --git a/fs/xattr.c b/fs/xattr.c index 94f49a082dd2..464c94bf65f9 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -431,12 +431,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value, if (size) { if (size > XATTR_SIZE_MAX) return -E2BIG; - kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN); - if (!kvalue) { - kvalue = vmalloc(size); - if (!kvalue) - return -ENOMEM; - } + kvalue = kvmalloc(size, GFP_KERNEL); + if (!kvalue) + return -ENOMEM; if (copy_from_user(kvalue, value, size)) { error = -EFAULT; goto out; @@ -528,12 +525,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value, if (size) { if (size > XATTR_SIZE_MAX) size = XATTR_SIZE_MAX; - kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); - if (!kvalue) { - kvalue = vzalloc(size); - if (!kvalue) - return -ENOMEM; - } + kvalue = kvzalloc(size, GFP_KERNEL); + if (!kvalue) + return -ENOMEM; } error = vfs_getxattr(d, kname, kvalue, size); @@ -611,12 +605,9 @@ listxattr(struct dentry *d, char __user *list, size_t size) if (size) { if (size > XATTR_LIST_MAX) size = XATTR_LIST_MAX; - klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL); - if (!klist) { - klist = vmalloc(size); - if (!klist) - return -ENOMEM; - } + klist = kvmalloc(size, GFP_KERNEL); + if (!klist) + return -ENOMEM; } error = vfs_listxattr(d, klist, size); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 3fece51dcf13..18fc65b84b79 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -892,12 +892,7 @@ static inline u16 cmdif_rev(struct mlx5_core_dev *dev) static inline void *mlx5_vzalloc(unsigned 
long size) { - void *rtn; - - rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); - if (!rtn) - rtn = vzalloc(size); - return rtn; + return kvzalloc(size, GFP_KERNEL); } static inline u32 mlx5_base_mkey(const u32 key) diff --git a/include/linux/mm.h b/include/linux/mm.h index 08e2849d27ca..7cb17c6b97de 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -532,6 +532,14 @@ static inline void *kvzalloc(size_t size, gfp_t flags) return kvmalloc(size, flags | __GFP_ZERO); } +static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags) +{ + if (size != 0 && n > SIZE_MAX / size) + return NULL; + + return kvmalloc(n * size, flags); +} + extern void kvfree(const void *addr); static inline atomic_t *compound_mapcount_ptr(struct page *page) diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 4952311422c1..ae82d9cea553 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1028,10 +1028,7 @@ EXPORT_SYMBOL(iov_iter_get_pages); static struct page **get_pages_array(size_t n) { - struct page **p = kmalloc(n * sizeof(struct page *), GFP_KERNEL); - if (!p) - p = vmalloc(n * sizeof(struct page *)); - return p; + return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL); } static ssize_t pipe_get_pages_alloc(struct iov_iter *i, diff --git a/mm/frame_vector.c b/mm/frame_vector.c index db77dcb38afd..72ebec18629c 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -200,10 +200,7 @@ struct frame_vector *frame_vector_create(unsigned int nr_frames) * Avoid higher order allocations, use vmalloc instead. It should * be rare anyway. 
*/ - if (size <= PAGE_SIZE) - vec = kmalloc(size, GFP_KERNEL); - else - vec = vmalloc(size); + vec = kvmalloc(size, GFP_KERNEL); if (!vec) return NULL; vec->nr_allocated = nr_frames; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 8bea74298173..e9a59d2d91d4 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -678,11 +678,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo) /* no more locks than number of hash buckets */ nblocks = min(nblocks, hashinfo->ehash_mask + 1); - hashinfo->ehash_locks = kmalloc_array(nblocks, locksz, - GFP_KERNEL | __GFP_NOWARN); - if (!hashinfo->ehash_locks) - hashinfo->ehash_locks = vmalloc(nblocks * locksz); - + hashinfo->ehash_locks = kvmalloc_array(nblocks, locksz, GFP_KERNEL); if (!hashinfo->ehash_locks) return -ENOMEM; diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 9d0d4f39e42b..653bbd67e3a3 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -1011,10 +1011,7 @@ static int __net_init tcp_net_metrics_init(struct net *net) tcp_metrics_hash_log = order_base_2(slots); size = sizeof(struct tcpm_hash_bucket) << tcp_metrics_hash_log; - tcp_metrics_hash = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); - if (!tcp_metrics_hash) - tcp_metrics_hash = vzalloc(size); - + tcp_metrics_hash = kvzalloc(size, GFP_KERNEL); if (!tcp_metrics_hash) return -ENOMEM; diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index 088e2b459d0f..257ec66009da 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -2005,10 +2005,7 @@ static int resize_platform_label_table(struct net *net, size_t limit) unsigned index; if (size) { - labels = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY); - if (!labels) - labels = vzalloc(size); - + labels = kvzalloc(size, GFP_KERNEL); if (!labels) goto nolabels; } diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index f134d384852f..3d0584665b5d 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c 
@@ -763,17 +763,8 @@ EXPORT_SYMBOL(xt_check_entry_offsets); */ unsigned int *xt_alloc_entry_offsets(unsigned int size) { - unsigned int *off; + return kvmalloc_array(size, sizeof(unsigned int), GFP_KERNEL | __GFP_ZERO); - off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN); - - if (off) - return off; - - if (size < (SIZE_MAX / sizeof(unsigned int))) - off = vmalloc(size * sizeof(unsigned int)); - - return off; } EXPORT_SYMBOL(xt_alloc_entry_offsets); @@ -1116,7 +1107,7 @@ static int xt_jumpstack_alloc(struct xt_table_info *i) size = sizeof(void **) * nr_cpu_ids; if (size > PAGE_SIZE) - i->jumpstack = vzalloc(size); + i->jumpstack = kvzalloc(size, GFP_KERNEL); else i->jumpstack = kzalloc(size, GFP_KERNEL); if (i->jumpstack == NULL) @@ -1138,12 +1129,8 @@ static int xt_jumpstack_alloc(struct xt_table_info *i) */ size = sizeof(void *) * i->stacksize * 2u; for_each_possible_cpu(cpu) { - if (size > PAGE_SIZE) - i->jumpstack[cpu] = vmalloc_node(size, - cpu_to_node(cpu)); - else - i->jumpstack[cpu] = kmalloc_node(size, - GFP_KERNEL, cpu_to_node(cpu)); + i->jumpstack[cpu] = kvmalloc_node(size, GFP_KERNEL, + cpu_to_node(cpu)); if (i->jumpstack[cpu] == NULL) /* * Freeing will be done later on by the callers. 
The diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c index 37d581a31cff..3f6c4fa78bdb 100644 --- a/net/netfilter/xt_recent.c +++ b/net/netfilter/xt_recent.c @@ -388,10 +388,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par, } sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size; - if (sz <= PAGE_SIZE) - t = kzalloc(sz, GFP_KERNEL); - else - t = vzalloc(sz); + t = kvzalloc(sz, GFP_KERNEL); if (t == NULL) { ret = -ENOMEM; goto out; diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c index d00f4c7c2f3a..b30a2c70bd48 100644 --- a/net/sched/sch_choke.c +++ b/net/sched/sch_choke.c @@ -376,10 +376,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt) if (mask != q->tab_mask) { struct sk_buff **ntab; - ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), - GFP_KERNEL | __GFP_NOWARN); - if (!ntab) - ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *)); + ntab = kvmalloc_array((mask + 1), sizeof(struct sk_buff *), GFP_KERNEL | __GFP_ZERO); if (!ntab) return -ENOMEM; diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index 18bbb5476c83..9201abce928c 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -446,27 +446,13 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt) return 0; } -static void *fq_codel_zalloc(size_t sz) -{ - void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN); - - if (!ptr) - ptr = vzalloc(sz); - return ptr; -} - -static void fq_codel_free(void *addr) -{ - kvfree(addr); -} - static void fq_codel_destroy(struct Qdisc *sch) { struct fq_codel_sched_data *q = qdisc_priv(sch); tcf_destroy_chain(&q->filter_list); - fq_codel_free(q->backlogs); - fq_codel_free(q->flows); + kvfree(q->backlogs); + kvfree(q->flows); } static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt) @@ -493,13 +479,13 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt) } if (!q->flows) { - q->flows = fq_codel_zalloc(q->flows_cnt * - sizeof(struct fq_codel_flow)); + 
q->flows = kvzalloc(q->flows_cnt * + sizeof(struct fq_codel_flow), GFP_KERNEL); if (!q->flows) return -ENOMEM; - q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32)); + q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL); if (!q->backlogs) { - fq_codel_free(q->flows); + kvfree(q->flows); return -ENOMEM; } for (i = 0; i < q->flows_cnt; i++) { diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c index c19d346e6c5a..51d3ba682af9 100644 --- a/net/sched/sch_hhf.c +++ b/net/sched/sch_hhf.c @@ -467,29 +467,14 @@ static void hhf_reset(struct Qdisc *sch) rtnl_kfree_skbs(skb, skb); } -static void *hhf_zalloc(size_t sz) -{ - void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN); - - if (!ptr) - ptr = vzalloc(sz); - - return ptr; -} - -static void hhf_free(void *addr) -{ - kvfree(addr); -} - static void hhf_destroy(struct Qdisc *sch) { int i; struct hhf_sched_data *q = qdisc_priv(sch); for (i = 0; i < HHF_ARRAYS_CNT; i++) { - hhf_free(q->hhf_arrays[i]); - hhf_free(q->hhf_valid_bits[i]); + kvfree(q->hhf_arrays[i]); + kvfree(q->hhf_valid_bits[i]); } for (i = 0; i < HH_FLOWS_CNT; i++) { @@ -503,7 +488,7 @@ static void hhf_destroy(struct Qdisc *sch) kfree(flow); } } - hhf_free(q->hh_flows); + kvfree(q->hh_flows); } static const struct nla_policy hhf_policy[TCA_HHF_MAX + 1] = { @@ -609,8 +594,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt) if (!q->hh_flows) { /* Initialize heavy-hitter flow table. */ - q->hh_flows = hhf_zalloc(HH_FLOWS_CNT * - sizeof(struct list_head)); + q->hh_flows = kvzalloc(HH_FLOWS_CNT * + sizeof(struct list_head), GFP_KERNEL); if (!q->hh_flows) return -ENOMEM; for (i = 0; i < HH_FLOWS_CNT; i++) @@ -624,8 +609,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt) /* Initialize heavy-hitter filter arrays. 
*/ for (i = 0; i < HHF_ARRAYS_CNT; i++) { - q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN * - sizeof(u32)); + q->hhf_arrays[i] = kvzalloc(HHF_ARRAYS_LEN * + sizeof(u32), GFP_KERNEL); if (!q->hhf_arrays[i]) { /* Note: hhf_destroy() will be called * by our caller. @@ -637,8 +622,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt) /* Initialize valid bits of heavy-hitter filter arrays. */ for (i = 0; i < HHF_ARRAYS_CNT; i++) { - q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN / - BITS_PER_BYTE); + q->hhf_valid_bits[i] = kvzalloc(HHF_ARRAYS_LEN / + BITS_PER_BYTE, GFP_KERNEL); if (!q->hhf_valid_bits[i]) { /* Note: hhf_destroy() will be called * by our caller. diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index f0ce4780f395..1b3dd6190e93 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -702,15 +702,11 @@ static int get_dist_table(struct Qdisc *sch, const struct nlattr *attr) spinlock_t *root_lock; struct disttable *d; int i; - size_t s; if (n > NETEM_DIST_MAX) return -EINVAL; - s = sizeof(struct disttable) + n * sizeof(s16); - d = kmalloc(s, GFP_KERNEL | __GFP_NOWARN); - if (!d) - d = vmalloc(s); + d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL); if (!d) return -ENOMEM; diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index b00e02c139de..332d94be6e1c 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -685,11 +685,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt) static void *sfq_alloc(size_t sz) { - void *ptr = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN); - - if (!ptr) - ptr = vmalloc(sz); - return ptr; + return kvmalloc(sz, GFP_KERNEL); } static void sfq_free(void *addr) diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 82a9e1851108..447a7d5cee0f 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -101,14 +101,9 @@ SYSCALL_DEFINE5(add_key, const char __user *, _type, if (_payload) { ret = -ENOMEM; - payload = kmalloc(plen, GFP_KERNEL | 
__GFP_NOWARN); - if (!payload) { - if (plen <= PAGE_SIZE) - goto error2; - payload = vmalloc(plen); - if (!payload) - goto error2; - } + payload = kvmalloc(plen, GFP_KERNEL); + if (!payload) + goto error2; ret = -EFAULT; if (copy_from_user(payload, _payload, plen) != 0) @@ -1071,14 +1066,9 @@ long keyctl_instantiate_key_common(key_serial_t id, if (from) { ret = -ENOMEM; - payload = kmalloc(plen, GFP_KERNEL); - if (!payload) { - if (plen <= PAGE_SIZE) - goto error; - payload = vmalloc(plen); - if (!payload) - goto error; - } + payload = kvmalloc(plen, GFP_KERNEL); + if (!payload) + goto error; ret = -EFAULT; if (!copy_from_iter_full(payload, plen, from)) -- cgit From da6bc57a8f02dd90d07071b4cd067f2de26c9192 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:31 -0700 Subject: net: use kvmalloc with __GFP_REPEAT rather than open coded variant fq_alloc_node, alloc_netdev_mqs and netif_alloc* open-code kmalloc with a vmalloc fallback. Use the kvmalloc variant instead. Keep the __GFP_REPEAT flag based on the explanation from Eric: "At the time, tests on the hardware I had in my labs showed that vmalloc() could deliver pages spread all over the memory and that was a small penalty (once memory is fragmented enough, not at boot time)" The way the code is constructed means, however, that we prefer to go and hit the OOM killer before we fall back to vmalloc for requests <=32kB (with 4kB pages) in the current code. This is rather disruptive for something that can be achieved with the fallback. On the other hand, __GFP_REPEAT doesn't have any useful semantics for these requests. So the effect of this patch is that requests which fit into 32kB will fall back to vmalloc more easily now.
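As a rough illustration of the 32kB boundary described in the message above, the following userspace C sketch models when a retry-hard hint such as __GFP_REPEAT can make a difference. It is not kernel code: MODEL_PAGE_SIZE, MODEL_COSTLY_ORDER, model_cheap_limit() and model_repeat_is_meaningful() are hypothetical names, and the order-3 threshold is an assumption derived from "32kB with 4kB pages".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model constants, not kernel definitions. */
#define MODEL_PAGE_SIZE    4096u  /* assumes 4kB pages, as in the message */
#define MODEL_COSTLY_ORDER 3u     /* assumption: 4kB << 3 == 32kB */

/* Largest request size for which the retry hint has no useful effect. */
static size_t model_cheap_limit(void)
{
	return (size_t)MODEL_PAGE_SIZE << MODEL_COSTLY_ORDER;
}

/*
 * Returns true only for requests above the 32kB limit. At or below the
 * limit, the commit message says the hint has no useful semantics.
 */
static bool model_repeat_is_meaningful(size_t size)
{
	assert(size > 0); /* zero-sized requests are outside this model */
	return size > model_cheap_limit();
}
```

Under these assumptions, a 32kB queue array is treated like any other small request, while a request of 32kB plus one byte is the first size for which the hint is modelled as relevant.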
Link: http://lkml.kernel.org/r/20170306103327.2766-3-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Vlastimil Babka Cc: Eric Dumazet Cc: David Miller Cc: Shakeel Butt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- net/core/dev.c | 24 +++++++++--------------- net/sched/sch_fq.c | 12 +----------- 2 files changed, 10 insertions(+), 26 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index d07aa5ffb511..99924d16f2bd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7264,12 +7264,10 @@ static int netif_alloc_rx_queues(struct net_device *dev) BUG_ON(count < 1); - rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); - if (!rx) { - rx = vzalloc(sz); - if (!rx) - return -ENOMEM; - } + rx = kvzalloc(sz, GFP_KERNEL | __GFP_REPEAT); + if (!rx) + return -ENOMEM; + dev->_rx = rx; for (i = 0; i < count; i++) @@ -7306,12 +7304,10 @@ static int netif_alloc_netdev_queues(struct net_device *dev) if (count < 1 || count > 0xffff) return -EINVAL; - tx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); - if (!tx) { - tx = vzalloc(sz); - if (!tx) - return -ENOMEM; - } + tx = kvzalloc(sz, GFP_KERNEL | __GFP_REPEAT); + if (!tx) + return -ENOMEM; + dev->_tx = tx; netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL); @@ -7845,9 +7841,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, /* ensure 32-byte alignment of whole construct */ alloc_size += NETDEV_ALIGN - 1; - p = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); - if (!p) - p = vzalloc(alloc_size); + p = kvzalloc(alloc_size, GFP_KERNEL | __GFP_REPEAT); if (!p) return NULL; diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c index da4f67bda0ee..b488721a0059 100644 --- a/net/sched/sch_fq.c +++ b/net/sched/sch_fq.c @@ -624,16 +624,6 @@ static void fq_rehash(struct fq_sched_data *q, q->stat_gc_flows += fcnt; } -static void *fq_alloc_node(size_t sz, int node) -{ - void *ptr; - - ptr = kmalloc_node(sz, GFP_KERNEL | __GFP_REPEAT | __GFP_NOWARN, 
node); - if (!ptr) - ptr = vmalloc_node(sz, node); - return ptr; -} - static void fq_free(void *addr) { kvfree(addr); @@ -650,7 +640,7 @@ static int fq_resize(struct Qdisc *sch, u32 log) return 0; /* If XPS was setup, we can allocate memory on right NUMA node */ - array = fq_alloc_node(sizeof(struct rb_root) << log, + array = kvmalloc_node(sizeof(struct rb_root) << log, GFP_KERNEL | __GFP_REPEAT, netdev_queue_numa_node_read(sch->dev_queue)); if (!array) return -ENOMEM; -- cgit From d224e938189771dbd1e3b68ee8603a949bee76bb Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:34 -0700 Subject: drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant copy_params uses kmalloc with vmalloc fallback. We already have a helper for that - kvmalloc. This caller requires GFP_NOIO semantic so it hasn't been converted with many others by previous patches. All we need to achieve this semantic is to use the scope memalloc_noio_{save,restore} around kvmalloc. Link: http://lkml.kernel.org/r/20170306103327.2766-4-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Vlastimil Babka Cc: Mikulas Patocka Cc: Mike Snitzer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/md/dm-ioctl.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index 2d5d7064acbf..0555b4410e05 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1691,6 +1691,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern struct dm_ioctl *dmi; int secure_data; const size_t minimum_data_size = offsetof(struct dm_ioctl, data); + unsigned noio_flag; if (copy_from_user(param_kernel, user, minimum_data_size)) return -EFAULT; @@ -1713,15 +1714,9 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern * Use kmalloc() rather than vmalloc() when we can. 
*/ dmi = NULL; - if (param_kernel->data_size <= KMALLOC_MAX_SIZE) - dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); - - if (!dmi) { - unsigned noio_flag; - noio_flag = memalloc_noio_save(); - dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL); - memalloc_noio_restore(noio_flag); - } + noio_flag = memalloc_noio_save(); + dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL); + memalloc_noio_restore(noio_flag); if (!dmi) { if (secure_data && clear_user(user, param_kernel->data_size)) -- cgit From bc4e54f6e966e9ca35064cd60f91b1478c07a1b2 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:37 -0700 Subject: drivers/md/bcache/super.c: use kvmalloc bcache_device_init uses kmalloc for small requests and vmalloc for those which are larger than 64 pages. This alone is a strange criterion. Moreover kmalloc can fallback to vmalloc on the failure. Let's simply use kvmalloc instead as it knows how to handle the fallback properly Link: http://lkml.kernel.org/r/20170306103327.2766-5-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Vlastimil Babka Cc: Kent Overstreet Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/md/bcache/super.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 85e3f21c2514..e57353e39168 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -767,16 +767,12 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size, } n = d->nr_stripes * sizeof(atomic_t); - d->stripe_sectors_dirty = n < PAGE_SIZE << 6 - ? kzalloc(n, GFP_KERNEL) - : vzalloc(n); + d->stripe_sectors_dirty = kvzalloc(n, GFP_KERNEL); if (!d->stripe_sectors_dirty) return -ENOMEM; n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long); - d->full_dirty_stripes = n < PAGE_SIZE << 6 - ? 
kzalloc(n, GFP_KERNEL) - : vzalloc(n); + d->full_dirty_stripes = kvzalloc(n, GFP_KERNEL); if (!d->full_dirty_stripes) return -ENOMEM; -- cgit From 54f180d3c181277457fb003dd9524c2aa1ef8160 Mon Sep 17 00:00:00 2001 From: Huang Ying Date: Mon, 8 May 2017 15:57:40 -0700 Subject: mm, swap: use kvzalloc to allocate some swap data structures Currently vzalloc() is used in the swap code to allocate various data structures, such as the swap cache, swap slots cache, cluster info, etc., because their size may be too large on some systems for a normal kzalloc() to succeed. But kzalloc() has some advantages, for example less memory fragmentation and less TLB pressure. So change the data structure allocation in the swap code to use kvzalloc(), which tries kzalloc() first and falls back to vzalloc() if kzalloc() fails. In general, although kmalloc() reduces the number of high-order pages in the short term, vmalloc() causes more pain for memory fragmentation in the long term. And the swap data structure allocations changed in this patch are expected to be long-term allocations. From Dave Hansen: "for example, we have a two-page data structure. vmalloc() takes two effectively random order-0 pages, probably from two different 2M pages and pins them. That "kills" two 2M pages. kmalloc(), allocating two *contiguous* pages, will not cross a 2M boundary. That means it will only "kill" the possibility of a single 2M page. More 2M pages == less fragmentation." The allocation in this patch occurs at swapon time, which usually happens during system boot, so there is usually a good chance of allocating the contiguous pages successfully. The allocation for swap_map[] in struct swap_info_struct is not changed, because it is usually quite large and vmalloc_to_page() is used for it. That makes it a little harder to change.
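The two-page argument quoted above can be checked with a little arithmetic. The userspace C sketch below counts how many distinct 2M regions a set of 4kB page frame numbers touches. It is only a model: PAGES_PER_2M, region_of(), count_pinned_regions() and pinned_by_pair() are hypothetical names, and 4kB base pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4kB base pages, so one 2M region holds 512 of them. */
#define PAGES_PER_2M 512ul

/* Index of the 2M region that contains the given page frame number. */
static unsigned long region_of(unsigned long pfn)
{
	return pfn / PAGES_PER_2M;
}

/*
 * Count the distinct 2M regions touched by n page frame numbers.
 * Quadratic, which is fine for the handful of pages modelled here.
 */
static size_t count_pinned_regions(const unsigned long *pfns, size_t n)
{
	size_t count = 0;

	assert(pfns != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Was this region already seen at an earlier index? */
		for (j = 0; j < i; j++)
			if (region_of(pfns[j]) == region_of(pfns[i]))
				break;
		if (j == i)
			count++;
	}
	return count;
}

/* Convenience wrapper for the two-page structure in the quote. */
static size_t pinned_by_pair(unsigned long pfn_a, unsigned long pfn_b)
{
	const unsigned long pfns[2] = { pfn_a, pfn_b };

	return count_pinned_regions(pfns, 2);
}
```

With two scattered frames, for example 100 and 5000, the pair lands in regions 0 and 9 and pins two 2M regions. A naturally aligned contiguous pair such as 1022 and 1023 stays inside region 1 and pins only one.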
Link: http://lkml.kernel.org/r/20170407064911.25447-1-ying.huang@intel.com Signed-off-by: Huang Ying Acked-by: Tim Chen Acked-by: Michal Hocko Acked-by: Rik van Riel Cc: Dave Hansen Cc: Hugh Dickins Cc: Shaohua Li Cc: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/swap_slots.c | 19 +++++++++++-------- mm/swap_state.c | 2 +- mm/swapfile.c | 10 ++++++---- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/mm/swap_slots.c b/mm/swap_slots.c index aa1c415f4abd..58f6c78f1dad 100644 --- a/mm/swap_slots.c +++ b/mm/swap_slots.c @@ -31,6 +31,7 @@ #include #include #include +#include #ifdef CONFIG_SWAP @@ -119,16 +120,18 @@ static int alloc_swap_slot_cache(unsigned int cpu) /* * Do allocation outside swap_slots_cache_mutex - * as vzalloc could trigger reclaim and get_swap_page, + * as kvzalloc could trigger reclaim and get_swap_page, * which can lock swap_slots_cache_mutex. */ - slots = vzalloc(sizeof(swp_entry_t) * SWAP_SLOTS_CACHE_SIZE); + slots = kvzalloc(sizeof(swp_entry_t) * SWAP_SLOTS_CACHE_SIZE, + GFP_KERNEL); if (!slots) return -ENOMEM; - slots_ret = vzalloc(sizeof(swp_entry_t) * SWAP_SLOTS_CACHE_SIZE); + slots_ret = kvzalloc(sizeof(swp_entry_t) * SWAP_SLOTS_CACHE_SIZE, + GFP_KERNEL); if (!slots_ret) { - vfree(slots); + kvfree(slots); return -ENOMEM; } @@ -152,9 +155,9 @@ static int alloc_swap_slot_cache(unsigned int cpu) out: mutex_unlock(&swap_slots_cache_mutex); if (slots) - vfree(slots); + kvfree(slots); if (slots_ret) - vfree(slots_ret); + kvfree(slots_ret); return 0; } @@ -171,7 +174,7 @@ static void drain_slots_cache_cpu(unsigned int cpu, unsigned int type, cache->cur = 0; cache->nr = 0; if (free_slots && cache->slots) { - vfree(cache->slots); + kvfree(cache->slots); cache->slots = NULL; } mutex_unlock(&cache->alloc_lock); @@ -186,7 +189,7 @@ static void drain_slots_cache_cpu(unsigned int cpu, unsigned int type, } spin_unlock_irq(&cache->free_lock); if (slots) - vfree(slots); + kvfree(slots); } } diff --git 
a/mm/swap_state.c b/mm/swap_state.c index 7bfb9bd1ca21..539b8885e3d1 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -523,7 +523,7 @@ int init_swap_address_space(unsigned int type, unsigned long nr_pages) unsigned int i, nr; nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES); - spaces = vzalloc(sizeof(struct address_space) * nr); + spaces = kvzalloc(sizeof(struct address_space) * nr, GFP_KERNEL); if (!spaces) return -ENOMEM; for (i = 0; i < nr; i++) { diff --git a/mm/swapfile.c b/mm/swapfile.c index b86b2aca3fb9..4f6cba1b6632 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2270,8 +2270,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) free_percpu(p->percpu_cluster); p->percpu_cluster = NULL; vfree(swap_map); - vfree(cluster_info); - vfree(frontswap_map); + kvfree(cluster_info); + kvfree(frontswap_map); /* Destroy swap account information */ swap_cgroup_swapoff(p->type); exit_swap_address_space(p->type); @@ -2794,7 +2794,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) p->cluster_next = 1 + (prandom_u32() % p->highest_bit); nr_cluster = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER); - cluster_info = vzalloc(nr_cluster * sizeof(*cluster_info)); + cluster_info = kvzalloc(nr_cluster * sizeof(*cluster_info), + GFP_KERNEL); if (!cluster_info) { error = -ENOMEM; goto bad_swap; @@ -2827,7 +2828,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) } /* frontswap enabled? 
set up bit-per-page map for frontswap */ if (IS_ENABLED(CONFIG_FRONTSWAP)) - frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long)); + frontswap_map = kvzalloc(BITS_TO_LONGS(maxpages) * sizeof(long), + GFP_KERNEL); if (p->bdev &&(swap_flags & SWAP_FLAG_DISCARD) && swap_discardable(p)) { /* -- cgit From 19809c2da28aee5860ad9a2eff760730a0710df0 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 8 May 2017 15:57:44 -0700 Subject: mm, vmalloc: use __GFP_HIGHMEM implicitly __vmalloc* allows users to provide gfp flags for the underlying allocation. This API is quite popular $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l 77 The only problem is that many people are not aware that they really want to give __GFP_HIGHMEM along with other flags because there is really no reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages which are mapped to the kernel vmalloc space. About half of users don't use this flag, though. This signals that we make the API unnecessarily too complex. This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM are simplified and drop the flag. 
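The effect of the change described above can be sketched in plain userspace C. This is only a model under stated assumptions: the `MODEL_GFP_*` bit values and the `model_*` helper names are invented for illustration and are not the kernel's definitions. It mirrors the one-line change to `alloc_mask` in `__vmalloc_area_node()`, where `__GFP_HIGHMEM` is now OR-ed in alongside `__GFP_NOWARN`.

```c
#include <assert.h>

/*
 * Illustrative stand-ins for gfp bits. The values are made up for this
 * sketch and are NOT the kernel's real definitions.
 */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_KERNEL   0x01u
#define MODEL_GFP_HIGHMEM  0x02u
#define MODEL_GFP_ZERO     0x04u
#define MODEL_GFP_NOWARN   0x08u

/* Before the patch: only NOWARN was added to the caller's mask. */
static model_gfp_t model_alloc_mask_old(model_gfp_t gfp_mask)
{
	return gfp_mask | MODEL_GFP_NOWARN;
}

/* After the patch: HIGHMEM is OR-ed in on behalf of every caller. */
static model_gfp_t model_alloc_mask_new(model_gfp_t gfp_mask)
{
	return gfp_mask | MODEL_GFP_HIGHMEM | MODEL_GFP_NOWARN;
}

/* Compare a caller that forgot the flag with one that passed it. */
static void model_check(void)
{
	model_gfp_t forgetful = MODEL_GFP_KERNEL | MODEL_GFP_ZERO;
	model_gfp_t careful = forgetful | MODEL_GFP_HIGHMEM;

	/* Old behaviour: the forgetful caller's mask lacks HIGHMEM. */
	assert(!(model_alloc_mask_old(forgetful) & MODEL_GFP_HIGHMEM));

	/* New behaviour: both callers end up with the same mask. */
	assert(model_alloc_mask_new(forgetful) == model_alloc_mask_new(careful));
}
```

Because OR-ing a bit that is already set changes nothing, callers that still pass the flag explicitly keep working, which is presumably why existing users can drop it without any behavioural difference.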
Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org Signed-off-by: Michal Hocko Reviewed-by: Matthew Wilcox Cc: Al Viro Cc: Vlastimil Babka Cc: David Rientjes Cc: Cristopher Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/parisc/kernel/module.c | 2 +- arch/x86/kernel/module.c | 2 +- drivers/block/drbd/drbd_bitmap.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 ++-- drivers/md/dm-bufio.c | 2 +- fs/btrfs/free-space-tree.c | 3 +-- fs/file.c | 2 +- fs/xfs/kmem.c | 2 +- include/drm/drm_mem_util.h | 9 +++------ kernel/bpf/core.c | 9 +++------ kernel/bpf/syscall.c | 3 +-- kernel/fork.c | 2 +- kernel/groups.c | 2 +- kernel/module.c | 2 +- mm/kasan/kasan.c | 2 +- mm/nommu.c | 3 +-- mm/util.c | 2 +- mm/vmalloc.c | 14 +++++++------- net/ceph/ceph_common.c | 2 +- net/netfilter/x_tables.c | 3 +-- 20 files changed, 31 insertions(+), 41 deletions(-) diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c index c66c943d9322..f1a76935a314 100644 --- a/arch/parisc/kernel/module.c +++ b/arch/parisc/kernel/module.c @@ -218,7 +218,7 @@ void *module_alloc(unsigned long size) * easier than trying to map the text, data, init_text and * init_data correctly */ return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END, - GFP_KERNEL | __GFP_HIGHMEM, + GFP_KERNEL, PAGE_KERNEL_RWX, 0, NUMA_NO_NODE, __builtin_return_address(0)); } diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index 477ae806c2fa..f67bd3205df7 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -85,7 +85,7 @@ void *module_alloc(unsigned long size) p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR + get_module_load_offset(), - MODULES_END, GFP_KERNEL | __GFP_HIGHMEM, + MODULES_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, __builtin_return_address(0)); if (p && (kasan_module_alloc(p, size) < 0)) { diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c index 
dece26f119d4..a804a4107fbc 100644 --- a/drivers/block/drbd/drbd_bitmap.c +++ b/drivers/block/drbd/drbd_bitmap.c @@ -409,7 +409,7 @@ static struct page **bm_realloc_pages(struct drbd_bitmap *b, unsigned long want) new_pages = kzalloc(bytes, GFP_NOIO | __GFP_NOWARN); if (!new_pages) { new_pages = __vmalloc(bytes, - GFP_NOIO | __GFP_HIGHMEM | __GFP_ZERO, + GFP_NOIO | __GFP_ZERO, PAGE_KERNEL); if (!new_pages) return NULL; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c index d019b5e311cc..2d955d7d7b6d 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c @@ -161,8 +161,8 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) file_size += sizeof(*iter.hdr) * n_obj; /* Allocate the file in vmalloc memory, it's likely to be big */ - iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_HIGHMEM | - __GFP_NOWARN | __GFP_NORETRY, PAGE_KERNEL); + iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY, + PAGE_KERNEL); if (!iter.start) { dev_warn(gpu->dev, "failed to allocate devcoredump file\n"); return; diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index c92c31b23e54..5db11a405129 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -406,7 +406,7 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask, if (gfp_mask & __GFP_NORETRY) noio_flag = memalloc_noio_save(); - ptr = __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM, PAGE_KERNEL); + ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL); if (gfp_mask & __GFP_NORETRY) memalloc_noio_restore(noio_flag); diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c index dd7fb22a955a..fc0bd8406758 100644 --- a/fs/btrfs/free-space-tree.c +++ b/fs/btrfs/free-space-tree.c @@ -167,8 +167,7 @@ static u8 *alloc_bitmap(u32 bitmap_size) if (mem) return mem; - return __vmalloc(bitmap_size, GFP_NOFS | __GFP_HIGHMEM | __GFP_ZERO, - PAGE_KERNEL); + return __vmalloc(bitmap_size, 
GFP_NOFS | __GFP_ZERO, PAGE_KERNEL); } int convert_free_space_to_bitmaps(struct btrfs_trans_handle *trans, diff --git a/fs/file.c b/fs/file.c index ad6f094f2eff..1c2972e3a405 100644 --- a/fs/file.c +++ b/fs/file.c @@ -42,7 +42,7 @@ static void *alloc_fdmem(size_t size) if (data != NULL) return data; } - return __vmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_HIGHMEM, PAGE_KERNEL); + return __vmalloc(size, GFP_KERNEL_ACCOUNT, PAGE_KERNEL); } static void __free_fdtable(struct fdtable *fdt) diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c index 780fc8986dab..393b6849aeb3 100644 --- a/fs/xfs/kmem.c +++ b/fs/xfs/kmem.c @@ -67,7 +67,7 @@ kmem_zalloc_large(size_t size, xfs_km_flags_t flags) nofs_flag = memalloc_nofs_save(); lflags = kmem_flags_convert(flags); - ptr = __vmalloc(size, lflags | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL); + ptr = __vmalloc(size, lflags | __GFP_ZERO, PAGE_KERNEL); if (flags & KM_NOFS) memalloc_nofs_restore(nofs_flag); diff --git a/include/drm/drm_mem_util.h b/include/drm/drm_mem_util.h index 70d4e221a3ad..d0f6cf2e5324 100644 --- a/include/drm/drm_mem_util.h +++ b/include/drm/drm_mem_util.h @@ -37,8 +37,7 @@ static __inline__ void *drm_calloc_large(size_t nmemb, size_t size) if (size * nmemb <= PAGE_SIZE) return kcalloc(nmemb, size, GFP_KERNEL); - return __vmalloc(size * nmemb, - GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL); + return vzalloc(size * nmemb); } /* Modeled after cairo's malloc_ab, it's like calloc but without the zeroing. 
*/ @@ -50,8 +49,7 @@ static __inline__ void *drm_malloc_ab(size_t nmemb, size_t size) if (size * nmemb <= PAGE_SIZE) return kmalloc(nmemb * size, GFP_KERNEL); - return __vmalloc(size * nmemb, - GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL); + return vmalloc(size * nmemb); } static __inline__ void *drm_malloc_gfp(size_t nmemb, size_t size, gfp_t gfp) @@ -69,8 +67,7 @@ static __inline__ void *drm_malloc_gfp(size_t nmemb, size_t size, gfp_t gfp) return ptr; } - return __vmalloc(size * nmemb, - gfp | __GFP_HIGHMEM, PAGE_KERNEL); + return __vmalloc(size * nmemb, gfp, PAGE_KERNEL); } static __inline void drm_free_large(void *ptr) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 6f81e0f5a0fa..dedf367f59bb 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -76,8 +76,7 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags) { - gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO | - gfp_extra_flags; + gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags; struct bpf_prog_aux *aux; struct bpf_prog *fp; @@ -107,8 +106,7 @@ EXPORT_SYMBOL_GPL(bpf_prog_alloc); struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size, gfp_t gfp_extra_flags) { - gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO | - gfp_extra_flags; + gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags; struct bpf_prog *fp; u32 pages, delta; int ret; @@ -655,8 +653,7 @@ out: static struct bpf_prog *bpf_prog_clone_create(struct bpf_prog *fp_other, gfp_t gfp_extra_flags) { - gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO | - gfp_extra_flags; + gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags; struct bpf_prog *fp; fp = __vmalloc(fp_other->pages * PAGE_SIZE, gfp_flags, PAGE_KERNEL); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 13642c73dca0..fd2411fd6914 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -67,8 
+67,7 @@ void *bpf_map_area_alloc(size_t size) return area; } - return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags, - PAGE_KERNEL); + return __vmalloc(size, GFP_KERNEL | flags, PAGE_KERNEL); } void bpf_map_area_free(void *area) diff --git a/kernel/fork.c b/kernel/fork.c index 55e325f4b457..08ba696aa561 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -221,7 +221,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE, VMALLOC_START, VMALLOC_END, - THREADINFO_GFP | __GFP_HIGHMEM, + THREADINFO_GFP, PAGE_KERNEL, 0, node, __builtin_return_address(0)); diff --git a/kernel/groups.c b/kernel/groups.c index 8dd7a61b7115..d09727692a2a 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -18,7 +18,7 @@ struct group_info *groups_alloc(int gidsetsize) len = sizeof(struct group_info) + sizeof(kgid_t) * gidsetsize; gi = kmalloc(len, GFP_KERNEL_ACCOUNT|__GFP_NOWARN|__GFP_NORETRY); if (!gi) - gi = __vmalloc(len, GFP_KERNEL_ACCOUNT|__GFP_HIGHMEM, PAGE_KERNEL); + gi = __vmalloc(len, GFP_KERNEL_ACCOUNT, PAGE_KERNEL); if (!gi) return NULL; diff --git a/kernel/module.c b/kernel/module.c index f37308b733d8..2b316b954828 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -2864,7 +2864,7 @@ static int copy_module_from_user(const void __user *umod, unsigned long len, /* Suck in entire file: we'll want most of it. 
*/ info->hdr = __vmalloc(info->len, - GFP_KERNEL | __GFP_HIGHMEM | __GFP_NOWARN, PAGE_KERNEL); + GFP_KERNEL | __GFP_NOWARN, PAGE_KERNEL); if (!info->hdr) return -ENOMEM; diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index 9348d27088c1..b10da59cf765 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -691,7 +691,7 @@ int kasan_module_alloc(void *addr, size_t size) ret = __vmalloc_node_range(shadow_size, 1, shadow_start, shadow_start + shadow_size, - GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, + GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL, VM_NO_GUARD, NUMA_NO_NODE, __builtin_return_address(0)); diff --git a/mm/nommu.c b/mm/nommu.c index a80411d258fc..fc184f597d59 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -246,8 +246,7 @@ void *vmalloc_user(unsigned long size) { void *ret; - ret = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, - PAGE_KERNEL); + ret = __vmalloc(size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL); if (ret) { struct vm_area_struct *vma; diff --git a/mm/util.c b/mm/util.c index f4e590b2c0da..718154debc87 100644 --- a/mm/util.c +++ b/mm/util.c @@ -382,7 +382,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) if (ret || size <= PAGE_SIZE) return ret; - return __vmalloc_node_flags(size, node, flags | __GFP_HIGHMEM); + return __vmalloc_node_flags(size, node, flags); } EXPORT_SYMBOL(kvmalloc_node); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 717b1e8b942c..1dda6d8a200a 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1655,7 +1655,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, struct page **pages; unsigned int nr_pages, array_size, i; const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; - const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN; + const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN; nr_pages = get_vm_area_size(area) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); @@ -1818,7 +1818,7 @@ EXPORT_SYMBOL(__vmalloc); void *vmalloc(unsigned long size) { return 
__vmalloc_node_flags(size, NUMA_NO_NODE, - GFP_KERNEL | __GFP_HIGHMEM); + GFP_KERNEL); } EXPORT_SYMBOL(vmalloc); @@ -1835,7 +1835,7 @@ EXPORT_SYMBOL(vmalloc); void *vzalloc(unsigned long size) { return __vmalloc_node_flags(size, NUMA_NO_NODE, - GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); + GFP_KERNEL | __GFP_ZERO); } EXPORT_SYMBOL(vzalloc); @@ -1852,7 +1852,7 @@ void *vmalloc_user(unsigned long size) void *ret; ret = __vmalloc_node(size, SHMLBA, - GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, + GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL, NUMA_NO_NODE, __builtin_return_address(0)); if (ret) { @@ -1876,7 +1876,7 @@ EXPORT_SYMBOL(vmalloc_user); */ void *vmalloc_node(unsigned long size, int node) { - return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL, + return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL, node, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc_node); @@ -1896,7 +1896,7 @@ EXPORT_SYMBOL(vmalloc_node); void *vzalloc_node(unsigned long size, int node) { return __vmalloc_node_flags(size, node, - GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); + GFP_KERNEL | __GFP_ZERO); } EXPORT_SYMBOL(vzalloc_node); @@ -1918,7 +1918,7 @@ EXPORT_SYMBOL(vzalloc_node); void *vmalloc_exec(unsigned long size) { - return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC, + return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL_EXEC, NUMA_NO_NODE, __builtin_return_address(0)); } diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c index 108533859a53..4eb773ccce11 100644 --- a/net/ceph/ceph_common.c +++ b/net/ceph/ceph_common.c @@ -187,7 +187,7 @@ void *ceph_kvmalloc(size_t size, gfp_t flags) return ptr; } - return __vmalloc(size, flags | __GFP_HIGHMEM, PAGE_KERNEL); + return __vmalloc(size, flags, PAGE_KERNEL); } diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 3d0584665b5d..8876b7da6884 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -998,8 +998,7 @@ struct xt_table_info 
*xt_alloc_table_info(unsigned int size) if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY); if (!info) { - info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN | - __GFP_NORETRY | __GFP_HIGHMEM, + info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY, PAGE_KERNEL); if (!info) return NULL; -- cgit From d1b7c9344b628f8bbb55a0775667f33b8eafac82 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Mon, 8 May 2017 15:57:47 -0700 Subject: scripts/spelling.txt: add "memory" pattern and fix typos Fix typos and add the following to the scripts/spelling.txt: momery||memory Link: http://lkml.kernel.org/r/20170317011131.6881-1-sboyd@codeaurora.org Signed-off-by: Stephen Boyd Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 2 +- drivers/leds/leds-lp5521.c | 2 +- drivers/leds/leds-lp5523.c | 2 +- drivers/leds/leds-lp5562.c | 2 +- scripts/spelling.txt | 1 + 5 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c index 014c8262bfff..157adf381c18 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c @@ -1721,7 +1721,7 @@ int hns_roce_v1_write_mtpt(void *mb_buf, struct hns_roce_mr *mr, roce_set_field(mpt_entry->mpt_byte_64, MPT_BYTE_64_L_KEY_IDX_H_M, MPT_BYTE_64_L_KEY_IDX_H_S, mtpt_idx >> MTPT_IDX_SHIFT); - /* DMA momery regsiter */ + /* DMA memory regsiter */ if (mr->type == MR_TYPE_DMA) return 0; diff --git a/drivers/leds/leds-lp5521.c b/drivers/leds/leds-lp5521.c index 549b315ca8fe..f53c8cda1bde 100644 --- a/drivers/leds/leds-lp5521.c +++ b/drivers/leds/leds-lp5521.c @@ -281,7 +281,7 @@ static void lp5521_firmware_loaded(struct lp55xx_chip *chip) } /* - * Program momery sequence + * Program memory sequence * 1) set engine mode to "LOAD" * 2) write firmware data into program memory */ diff --git 
a/drivers/leds/leds-lp5523.c b/drivers/leds/leds-lp5523.c index c5b30f06218a..e9ba8cd32d66 100644 --- a/drivers/leds/leds-lp5523.c +++ b/drivers/leds/leds-lp5523.c @@ -387,7 +387,7 @@ static void lp5523_firmware_loaded(struct lp55xx_chip *chip) } /* - * Program momery sequence + * Program memory sequence * 1) set engine mode to "LOAD" * 2) write firmware data into program memory */ diff --git a/drivers/leds/leds-lp5562.c b/drivers/leds/leds-lp5562.c index b75333803a63..90892585bcb5 100644 --- a/drivers/leds/leds-lp5562.c +++ b/drivers/leds/leds-lp5562.c @@ -270,7 +270,7 @@ static void lp5562_firmware_loaded(struct lp55xx_chip *chip) } /* - * Program momery sequence + * Program memory sequence * 1) set engine mode to "LOAD" * 2) write firmware data into program memory */ diff --git a/scripts/spelling.txt b/scripts/spelling.txt index b67e74b22826..d778d7b42a78 100644 --- a/scripts/spelling.txt +++ b/scripts/spelling.txt @@ -691,6 +691,7 @@ miximum||maximum mmnemonic||mnemonic mnay||many modulues||modules +momery||memory monochorome||monochrome monochromo||monochrome monocrome||monochrome -- cgit From ad61dd303a0f2439bb104349e2d2ec91a3010ce0 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Mon, 8 May 2017 15:57:50 -0700 Subject: scripts/spelling.txt: add regsiter -> register spelling mistake This typo is quite common. Fix it and add it to the spelling file so that checkpatch catches it earlier. 
Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org Signed-off-by: Stephen Boyd Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arc/kernel/unwind.c | 2 +- arch/arm/kernel/kgdb.c | 2 +- arch/arm/mach-ixp4xx/common-pci.c | 4 ++-- arch/m68k/ifpsp060/src/ilsp.S | 2 +- arch/m68k/ifpsp060/src/isp.S | 2 +- arch/mips/cavium-octeon/executive/cvmx-helper-rgmii.c | 2 +- arch/mips/include/asm/octeon/cvmx-helper-rgmii.h | 2 +- arch/parisc/kernel/entry.S | 2 +- arch/powerpc/mm/icswx.c | 2 +- drivers/acpi/cppc_acpi.c | 2 +- drivers/clk/qcom/common.c | 2 +- drivers/cpufreq/sti-cpufreq.c | 4 ++-- drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 2 +- drivers/infiniband/hw/hns/hns_roce_mr.c | 2 +- drivers/net/can/rcar/rcar_canfd.c | 2 +- drivers/net/ethernet/amd/amd8111e.h | 4 ++-- drivers/net/ethernet/atheros/atl1c/atl1c_hw.c | 2 +- drivers/net/ethernet/intel/igb/e1000_phy.c | 2 +- drivers/scsi/isci/registers.h | 4 ++-- drivers/scsi/mpt3sas/mpt3sas_base.h | 2 +- include/linux/bcma/bcma_driver_pci.h | 2 +- include/linux/ftrace.h | 2 +- include/uapi/linux/ipmi.h | 2 +- scripts/spelling.txt | 1 + sound/soc/soc-core.c | 2 +- 25 files changed, 29 insertions(+), 28 deletions(-) diff --git a/arch/arc/kernel/unwind.c b/arch/arc/kernel/unwind.c index b6e4f7a7419b..333daab7def0 100644 --- a/arch/arc/kernel/unwind.c +++ b/arch/arc/kernel/unwind.c @@ -845,7 +845,7 @@ static int processCFI(const u8 *start, const u8 *end, unsigned long targetLoc, * state->dataAlign; break; case DW_CFA_def_cfa_register: - unw_debug("cfa_def_cfa_regsiter: "); + unw_debug("cfa_def_cfa_register: "); state->cfa.reg = get_uleb128(&ptr.p8, end); break; /*todo case DW_CFA_def_cfa_expression: */ diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c index 9232caee7060..1bb4c40a3135 100644 --- a/arch/arm/kernel/kgdb.c +++ b/arch/arm/kernel/kgdb.c @@ -269,7 +269,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt) /* * Register our undef instruction hooks with ARM undef 
core. - * We regsiter a hook specifically looking for the KGB break inst + * We register a hook specifically looking for the KGB break inst * and we handle the normal undef case within the do_undefinstr * handler. */ diff --git a/arch/arm/mach-ixp4xx/common-pci.c b/arch/arm/mach-ixp4xx/common-pci.c index 4977296f0c78..bcf3df59f71b 100644 --- a/arch/arm/mach-ixp4xx/common-pci.c +++ b/arch/arm/mach-ixp4xx/common-pci.c @@ -43,14 +43,14 @@ int (*ixp4xx_pci_read)(u32 addr, u32 cmd, u32* data); /* - * Base address for PCI regsiter region + * Base address for PCI register region */ unsigned long ixp4xx_pci_reg_base = 0; /* * PCI cfg an I/O routines are done by programming a * command/byte enable register, and then read/writing - * the data from a data regsiter. We need to ensure + * the data from a data register. We need to ensure * these transactions are atomic or we will end up * with corrupt data on the bus or in a driver. */ diff --git a/arch/m68k/ifpsp060/src/ilsp.S b/arch/m68k/ifpsp060/src/ilsp.S index 970abaf3303e..dd5b2c357e95 100644 --- a/arch/m68k/ifpsp060/src/ilsp.S +++ b/arch/m68k/ifpsp060/src/ilsp.S @@ -776,7 +776,7 @@ muls64_zero: # ALGORITHM *********************************************************** # # In the interest of simplicity, all operands are converted to # # longword size whether the operation is byte, word, or long. The # -# bounds are sign extended accordingly. If Rn is a data regsiter, Rn is # +# bounds are sign extended accordingly. If Rn is a data register, Rn is # # also sign extended. If Rn is an address register, it need not be sign # # extended since the full register is always used. # # The condition codes are set correctly before the final "rts". # diff --git a/arch/m68k/ifpsp060/src/isp.S b/arch/m68k/ifpsp060/src/isp.S index b865c1a052ba..29a9f8629b9d 100644 --- a/arch/m68k/ifpsp060/src/isp.S +++ b/arch/m68k/ifpsp060/src/isp.S @@ -1876,7 +1876,7 @@ movp_read_err: # word, or longword sized operands. 
Then, in the interest of # # simplicity, all operands are converted to longword size whether the # # operation is byte, word, or long. The bounds are sign extended # -# accordingly. If Rn is a data regsiter, Rn is also sign extended. If # +# accordingly. If Rn is a data register, Rn is also sign extended. If # # Rn is an address register, it need not be sign extended since the # # full register is always used. # # The comparisons are made and the condition codes calculated. # diff --git a/arch/mips/cavium-octeon/executive/cvmx-helper-rgmii.c b/arch/mips/cavium-octeon/executive/cvmx-helper-rgmii.c index ba4753c23b03..d18ed5af62f4 100644 --- a/arch/mips/cavium-octeon/executive/cvmx-helper-rgmii.c +++ b/arch/mips/cavium-octeon/executive/cvmx-helper-rgmii.c @@ -152,7 +152,7 @@ static int __cvmx_helper_errata_asx_pass1(int interface, int port, } /** - * Configure all of the ASX, GMX, and PKO regsiters required + * Configure all of the ASX, GMX, and PKO registers required * to get RGMII to function on the supplied interface. * * @interface: PKO Interface to configure (0 or 1) diff --git a/arch/mips/include/asm/octeon/cvmx-helper-rgmii.h b/arch/mips/include/asm/octeon/cvmx-helper-rgmii.h index f89775be7654..f7a95d7de140 100644 --- a/arch/mips/include/asm/octeon/cvmx-helper-rgmii.h +++ b/arch/mips/include/asm/octeon/cvmx-helper-rgmii.h @@ -55,7 +55,7 @@ extern int __cvmx_helper_rgmii_probe(int interface); extern void cvmx_helper_rgmii_internal_loopback(int port); /** - * Configure all of the ASX, GMX, and PKO regsiters required + * Configure all of the ASX, GMX, and PKO registers required * to get RGMII to function on the supplied interface. 
* * @interface: PKO Interface to configure (0 or 1) diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S index ad4cb1613c57..a4fd296c958e 100644 --- a/arch/parisc/kernel/entry.S +++ b/arch/parisc/kernel/entry.S @@ -1369,7 +1369,7 @@ nadtlb_nullify: /* When there is no translation for the probe address then we - must nullify the insn and return zero in the target regsiter. + must nullify the insn and return zero in the target register. This will indicate to the calling code that it does not have write/read privileges to this address. diff --git a/arch/powerpc/mm/icswx.c b/arch/powerpc/mm/icswx.c index 915412e4d5ba..1fa794d7d59f 100644 --- a/arch/powerpc/mm/icswx.c +++ b/arch/powerpc/mm/icswx.c @@ -186,7 +186,7 @@ static u32 acop_get_inst(struct pt_regs *regs) } /** - * @regs: regsiters at time of interrupt + * @regs: registers at time of interrupt * @address: storage address * @error_code: Fault code, usually the DSISR or ESR depending on * processor type diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c index 6cbe6036da99..e5b47f032d9a 100644 --- a/drivers/acpi/cppc_acpi.c +++ b/drivers/acpi/cppc_acpi.c @@ -95,7 +95,7 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr); /* pcc mapped address + header size + offset within PCC subspace */ #define GET_PCC_VADDR(offs) (pcc_data.pcc_comm_addr + 0x8 + (offs)) -/* Check if a CPC regsiter is in PCC */ +/* Check if a CPC register is in PCC */ #define CPC_IN_PCC(cpc) ((cpc)->type == ACPI_TYPE_BUFFER && \ (cpc)->cpc_entry.reg.space_id == \ ACPI_ADR_SPACE_PLATFORM_COMM) diff --git a/drivers/clk/qcom/common.c b/drivers/clk/qcom/common.c index 03f9d316f969..d523991c945f 100644 --- a/drivers/clk/qcom/common.c +++ b/drivers/clk/qcom/common.c @@ -128,7 +128,7 @@ static void qcom_cc_gdsc_unregister(void *data) /* * Backwards compatibility with old DTs. 
Register a pass-through factor 1/1 - * clock to translate 'path' clk into 'name' clk and regsiter the 'path' + * clock to translate 'path' clk into 'name' clk and register the 'path' * clk as a fixed rate clock if it isn't present. */ static int _qcom_cc_register_board_clk(struct device *dev, const char *path, diff --git a/drivers/cpufreq/sti-cpufreq.c b/drivers/cpufreq/sti-cpufreq.c index a7db9011d5fe..d2d0430d09d4 100644 --- a/drivers/cpufreq/sti-cpufreq.c +++ b/drivers/cpufreq/sti-cpufreq.c @@ -236,7 +236,7 @@ use_defaults: return 0; } -static int sti_cpufreq_fetch_syscon_regsiters(void) +static int sti_cpufreq_fetch_syscon_registers(void) { struct device *dev = ddata.cpu; struct device_node *np = dev->of_node; @@ -275,7 +275,7 @@ static int sti_cpufreq_init(void) goto skip_voltage_scaling; } - ret = sti_cpufreq_fetch_syscon_regsiters(); + ret = sti_cpufreq_fetch_syscon_registers(); if (ret) goto skip_voltage_scaling; diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c index 157adf381c18..37d5d29597a4 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c @@ -1721,7 +1721,7 @@ int hns_roce_v1_write_mtpt(void *mb_buf, struct hns_roce_mr *mr, roce_set_field(mpt_entry->mpt_byte_64, MPT_BYTE_64_L_KEY_IDX_H_M, MPT_BYTE_64_L_KEY_IDX_H_S, mtpt_idx >> MTPT_IDX_SHIFT); - /* DMA memory regsiter */ + /* DMA memory register */ if (mr->type == MR_TYPE_DMA) return 0; diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c index dc5c97c8f070..80fc01ffd8bd 100644 --- a/drivers/infiniband/hw/hns/hns_roce_mr.c +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c @@ -205,7 +205,7 @@ int hns_roce_mtt_init(struct hns_roce_dev *hr_dev, int npages, int page_shift, return 0; } - /* Note: if page_shift is zero, FAST memory regsiter */ + /* Note: if page_shift is zero, FAST memory register */ mtt->page_shift = page_shift; /* Compute MTT entry necessary */ 
diff --git a/drivers/net/can/rcar/rcar_canfd.c b/drivers/net/can/rcar/rcar_canfd.c index 4ef07d97156d..602c19e23f05 100644 --- a/drivers/net/can/rcar/rcar_canfd.c +++ b/drivers/net/can/rcar/rcar_canfd.c @@ -413,7 +413,7 @@ /* RSCFDnRPGACCr */ #define RCANFD_C_RPGACC(r) (0x1900 + (0x04 * (r))) -/* CAN FD mode specific regsiter map */ +/* CAN FD mode specific register map */ /* RSCFDnCFDCmXXX -> RCANFD_F_XXX(m) */ #define RCANFD_F_DCFG(m) (0x0500 + (0x20 * (m))) diff --git a/drivers/net/ethernet/amd/amd8111e.h b/drivers/net/ethernet/amd/amd8111e.h index 7cdb18512407..2a57b46fd6a6 100644 --- a/drivers/net/ethernet/amd/amd8111e.h +++ b/drivers/net/ethernet/amd/amd8111e.h @@ -48,7 +48,7 @@ eg., if the value 10011010b is written into the least significant byte of a comm /* 32 bit registers */ #define ASF_STAT 0x00 /* ASF status register */ -#define CHIPID 0x04 /* Chip ID regsiter */ +#define CHIPID 0x04 /* Chip ID register */ #define MIB_DATA 0x10 /* MIB data register */ #define MIB_ADDR 0x14 /* MIB address register */ #define STAT0 0x30 /* Status0 register */ @@ -648,7 +648,7 @@ typedef enum { /* driver ioctl parameters */ #define AMD8111E_REG_DUMP_LEN 13*sizeof(u32) -/* amd8111e desriptor format */ +/* amd8111e descriptor format */ struct amd8111e_tx_dr{ diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c index a8b80c56ac25..73efdb05a490 100644 --- a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c @@ -307,7 +307,7 @@ void atl1c_start_phy_polling(struct atl1c_hw *hw, u16 clk_sel) /* * atl1c_read_phy_core - * core function to read register in PHY via MDIO control regsiter. + * core function to read register in PHY via MDIO control register. 
* ext: extension register (see IEEE 802.3) * dev: device address (see IEEE 802.3 DEVAD, PRTAD is fixed to 0) * reg: reg to read diff --git a/drivers/net/ethernet/intel/igb/e1000_phy.c b/drivers/net/ethernet/intel/igb/e1000_phy.c index 68812d783f33..413025bdcb50 100644 --- a/drivers/net/ethernet/intel/igb/e1000_phy.c +++ b/drivers/net/ethernet/intel/igb/e1000_phy.c @@ -127,7 +127,7 @@ out: * @offset: register offset to be read * @data: pointer to the read data * - * Reads the MDI control regsiter in the PHY at offset and stores the + * Reads the MDI control register in the PHY at offset and stores the * information read to data. **/ s32 igb_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data) diff --git a/drivers/scsi/isci/registers.h b/drivers/scsi/isci/registers.h index 97f3ceb8d724..63468cfe3e4a 100644 --- a/drivers/scsi/isci/registers.h +++ b/drivers/scsi/isci/registers.h @@ -652,7 +652,7 @@ struct scu_iit_entry { /* - * TODO: Where is the SAS_LNKTOV regsiter? + * TODO: Where is the SAS_LNKTOV register? * TODO: Where is the SAS_PHYTOV register? */ #define SCU_SAS_TRANSMIT_IDENTIFICATION_SMP_TARGET_SHIFT (1) @@ -1827,7 +1827,7 @@ struct scu_peg_registers { }; /** - * struct scu_registers - SCU regsiters including both PEG registers if we turn + * struct scu_registers - SCU registers including both PEG registers if we turn * on that compile option. All of these registers are in the memory mapped * space returned from BAR1. 
* diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 8981806fb13f..099ab4ca7edf 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -1421,7 +1421,7 @@ void mpt3sas_ctl_add_to_event_log(struct MPT3SAS_ADAPTER *ioc, Mpi2EventNotificationReply_t *mpi_reply); void mpt3sas_enable_diag_buffer(struct MPT3SAS_ADAPTER *ioc, - u8 bits_to_regsiter); + u8 bits_to_register); int mpt3sas_send_diag_release(struct MPT3SAS_ADAPTER *ioc, u8 buffer_type, u8 *issue_reset); diff --git a/include/linux/bcma/bcma_driver_pci.h b/include/linux/bcma/bcma_driver_pci.h index 9657f11d48a7..bca6a5e4ca3d 100644 --- a/include/linux/bcma/bcma_driver_pci.h +++ b/include/linux/bcma/bcma_driver_pci.h @@ -80,7 +80,7 @@ struct pci_dev; #define BCMA_CORE_PCI_MDIODATA_DEV_TX 0x1e /* SERDES TX Dev */ #define BCMA_CORE_PCI_MDIODATA_DEV_RX 0x1f /* SERDES RX Dev */ #define BCMA_CORE_PCI_PCIEIND_ADDR 0x0130 /* indirect access to the internal register */ -#define BCMA_CORE_PCI_PCIEIND_DATA 0x0134 /* Data to/from the internal regsiter */ +#define BCMA_CORE_PCI_PCIEIND_DATA 0x0134 /* Data to/from the internal register */ #define BCMA_CORE_PCI_CLKREQENCTRL 0x0138 /* >= rev 6, Clkreq rdma control */ #define BCMA_CORE_PCI_PCICFG0 0x0400 /* PCI config space 0 (rev >= 8) */ #define BCMA_CORE_PCI_PCICFG1 0x0500 /* PCI config space 1 (rev >= 8) */ diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 6d2a63e4ea52..473f088aabea 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -72,7 +72,7 @@ ftrace_func_t ftrace_ops_get_func(struct ftrace_ops *ops); * CONTROL, SAVE_REGS, SAVE_REGS_IF_SUPPORTED, RECURSION_SAFE, STUB and * IPMODIFY are a kind of attribute flags which can be set only before * registering the ftrace_ops, and can not be modified while registered. 
- * Changing those attribute flags after regsitering ftrace_ops will + * Changing those attribute flags after registering ftrace_ops will * cause unexpected results. * * ENABLED - set/unset when ftrace_ops is registered/unregistered diff --git a/include/uapi/linux/ipmi.h b/include/uapi/linux/ipmi.h index 7b26a62e5707..b9095a27a08a 100644 --- a/include/uapi/linux/ipmi.h +++ b/include/uapi/linux/ipmi.h @@ -355,7 +355,7 @@ struct ipmi_cmdspec { #define IPMICTL_REGISTER_FOR_CMD _IOR(IPMI_IOC_MAGIC, 14, \ struct ipmi_cmdspec) /* - * Unregister a regsitered command. error values: + * Unregister a registered command. error values: * - EFAULT - an address supplied was invalid. * - ENOENT - The netfn/cmd was not found registered for this user. */ diff --git a/scripts/spelling.txt b/scripts/spelling.txt index d778d7b42a78..7b6d25a86202 100644 --- a/scripts/spelling.txt +++ b/scripts/spelling.txt @@ -891,6 +891,7 @@ registerd||registered registeresd||registered registes||registers registraration||registration +regsiter||register regster||register regualar||regular reguator||regulator diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 525f2f397b4c..aae099c0e502 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -936,7 +936,7 @@ static struct snd_soc_component *soc_find_component( * * @dlc: name of the DAI and optional component info to match * - * This function will search all regsitered components and their DAIs to + * This function will search all registered components and their DAIs to * find the DAI of the same name. The component's of_node and name * should also match if being specified. 
* -- cgit From 6e7c2b4dd36d8336f876c66a31d9b84616f3b2c4 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 8 May 2017 15:57:53 -0700 Subject: scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances Fix typos and add the following to the scripts/spelling.txt: intialisation||initialisation intialised||initialised intialise||initialise This commit does not intend to change the British spelling itself. Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- certs/blacklist.c | 2 +- drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 8 ++++---- drivers/video/fbdev/intelfb/intelfbdrv.c | 2 +- fs/inode.c | 2 +- fs/xfs/xfs_log_recover.c | 2 +- scripts/spelling.txt | 3 +++ 6 files changed, 11 insertions(+), 8 deletions(-) diff --git a/certs/blacklist.c b/certs/blacklist.c index 3eddce0e307a..3a507b9e2568 100644 --- a/certs/blacklist.c +++ b/certs/blacklist.c @@ -140,7 +140,7 @@ int is_hash_blacklisted(const u8 *hash, size_t hash_len, const char *type) EXPORT_SYMBOL_GPL(is_hash_blacklisted); /* - * Intialise the blacklist + * Initialise the blacklist */ static int __init blacklist_init(void) { diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c index d54490d3f7ad..1e594351a60f 100644 --- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c +++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c @@ -387,7 +387,7 @@ static void sxgbe_free_rx_buffers(struct net_device *dev, /** * init_tx_ring - init the TX descriptor ring * @dev: net device structure - * @tx_ring: ring to be intialised + * @tx_ring: ring to be initialised * @tx_rsize: ring size * Description: this function initializes the DMA TX descriptor */ @@ -437,7 +437,7 @@ dmamem_err: /** * free_rx_ring - free the RX descriptor ring * @dev: net device structure - * @rx_ring: ring to be intialised + * @rx_ring: ring 
to be initialised * @rx_rsize: ring size * Description: this function initializes the DMA RX descriptor */ @@ -453,7 +453,7 @@ static void free_rx_ring(struct device *dev, struct sxgbe_rx_queue *rx_ring, /** * init_rx_ring - init the RX descriptor ring * @dev: net device structure - * @rx_ring: ring to be intialised + * @rx_ring: ring to be initialised * @rx_rsize: ring size * Description: this function initializes the DMA RX descriptor */ @@ -539,7 +539,7 @@ err_free_dma_rx: /** * free_tx_ring - free the TX descriptor ring * @dev: net device structure - * @tx_ring: ring to be intialised + * @tx_ring: ring to be initialised * @tx_rsize: ring size * Description: this function initializes the DMA TX descriptor */ diff --git a/drivers/video/fbdev/intelfb/intelfbdrv.c b/drivers/video/fbdev/intelfb/intelfbdrv.c index ff2a5d2023e1..6b444400a86c 100644 --- a/drivers/video/fbdev/intelfb/intelfbdrv.c +++ b/drivers/video/fbdev/intelfb/intelfbdrv.c @@ -934,7 +934,7 @@ static __inline__ int var_to_refresh(const struct fb_var_screeninfo *var) } /*************************************************************** - * Various intialisation functions * + * Various initialisation functions * ***************************************************************/ static void get_initial_mode(struct intelfb_info *dinfo) diff --git a/fs/inode.c b/fs/inode.c index 131b2bcebc48..6ad1edb52045 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -119,7 +119,7 @@ static int no_open(struct inode *inode, struct file *file) } /** - * inode_init_always - perform inode structure intialisation + * inode_init_always - perform inode structure initialisation * @sb: superblock inode belongs to * @inode: inode to initialise * diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 4a98762ec8b4..cd0b077deb35 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3796,7 +3796,7 @@ xlog_recover_bud_pass2( * This routine is called when an inode create format structure is found in a * 
committed transaction in the log. It's purpose is to initialise the inodes * being allocated on disk. This requires us to get inode cluster buffers that - * match the range to be intialised, stamped with inode templates and written + * match the range to be initialised, stamped with inode templates and written * by delayed write so that subsequent modifications will hit the cached buffer * and only need writing out at the end of recovery. */ diff --git a/scripts/spelling.txt b/scripts/spelling.txt index 7b6d25a86202..aeca2c25de32 100644 --- a/scripts/spelling.txt +++ b/scripts/spelling.txt @@ -605,6 +605,9 @@ interruptted||interrupted interupted||interrupted interupt||interrupt intial||initial +intialisation||initialisation +intialised||initialised +intialise||initialise intialization||initialization intialized||initialized intialize||initialize -- cgit From 8ac1ed791401790968fd00ca63ca4fa814677199 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 8 May 2017 15:57:56 -0700 Subject: treewide: spelling: correct diffrent[iate] and banlance typos Add these misspellings to scripts/spelling.txt too Link: http://lkml.kernel.org/r/962aace119675e5fe87be2a88ddac1a5486f8e60.1490931810.git.joe@perches.com Signed-off-by: Joe Perches Acked-by: Mauro Carvalho Chehab Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/media/dvb-frontends/drx39xyj/drx_dap_fasi.h | 2 +- drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_int.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_main.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 2 +- include/linux/mlx4/device.h | 2 +- scripts/spelling.txt | 3 +++ 7 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/media/dvb-frontends/drx39xyj/drx_dap_fasi.h b/drivers/media/dvb-frontends/drx39xyj/drx_dap_fasi.h index 354ec07eae87..23ae72468025 100644 --- a/drivers/media/dvb-frontends/drx39xyj/drx_dap_fasi.h +++ b/drivers/media/dvb-frontends/drx39xyj/drx_dap_fasi.h @@ -70,7 
+70,7 @@ * (3) both long and short but short preferred and long only when necesarry * * These modes must be selected compile time via compile switches. -* Compile switch settings for the diffrent modes: +* Compile switch settings for the different modes: * (1) DRXDAPFASI_LONG_ADDR_ALLOWED=0, DRXDAPFASI_SHORT_ADDR_ALLOWED=1 * (2) DRXDAPFASI_LONG_ADDR_ALLOWED=1, DRXDAPFASI_SHORT_ADDR_ALLOWED=0 * (3) DRXDAPFASI_LONG_ADDR_ALLOWED=1, DRXDAPFASI_SHORT_ADDR_ALLOWED=1 diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c index cea6bdcde33f..8baf9d3eb4b1 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c @@ -1591,7 +1591,7 @@ static int __bnx2x_vlan_mac_execute_step(struct bnx2x *bp, if (rc != 0) { __bnx2x_vlan_mac_h_pend(bp, o, *ramrod_flags); - /* Calling function should not diffrentiate between this case + /* Calling function should not differentiate between this case * and the case in which there is already a pending ramrod */ rc = 1; diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c index 0ed24d6e6c65..40f057edeafc 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_int.c +++ b/drivers/net/ethernet/qlogic/qed/qed_int.c @@ -3058,7 +3058,7 @@ int qed_int_igu_read_cam(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) /* There's a possibility the igu_sb_cnt_iov doesn't properly reflect * the number of VF SBs [especially for first VF on engine, as we can't - * diffrentiate between empty entries and its entries]. + * differentiate between empty entries and its entries]. * Since we don't really support more SBs than VFs today, prevent any * such configuration by sanitizing the number of SBs to equal the * number of VFs. 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index b7ad36b91e12..c67ff1411799 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -978,7 +978,7 @@ static int qed_slowpath_start(struct qed_dev *cdev, if (rc) goto err2; - /* First Dword used to diffrentiate between various sources */ + /* First Dword used to differentiate between various sources */ data = cdev->firmware->data + sizeof(u32); qed_dbg_pf_init(cdev); diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c b/drivers/net/ethernet/qlogic/qed/qed_sriov.c index d5df29f787c5..f5ed54d611ec 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c +++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c @@ -625,7 +625,7 @@ int qed_iov_hw_info(struct qed_hwfn *p_hwfn) * - If !ARI, VFs would start on next device. * so offset - (256 - pf_id) would provide the number. * Utilize the fact that (256 - pf_id) is achieved only by later - * to diffrentiate between the two. + * to differentiate between the two. */ if (p_hwfn->cdev->p_iov_info->offset < (256 - p_hwfn->abs_pf_id)) { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 74b765ce48ab..d5bed0875d30 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -108,7 +108,7 @@ enum { MLX4_MFUNC_EQE_MASK = (MLX4_MFUNC_MAX_EQES - 1) }; -/* Driver supports 3 diffrent device methods to manage traffic steering: +/* Driver supports 3 different device methods to manage traffic steering: * -device managed - High level API for ib and eth flow steering. FW is * managing flow steering tables. * - B0 steering mode - Common low level API for ib and (if supported) eth. 
diff --git a/scripts/spelling.txt b/scripts/spelling.txt index aeca2c25de32..eb38f49d4b75 100644 --- a/scripts/spelling.txt +++ b/scripts/spelling.txt @@ -179,6 +179,7 @@ bakup||backup baloon||balloon baloons||balloons bandwith||bandwidth +banlance||balance batery||battery beacuse||because becasue||because @@ -375,6 +376,8 @@ dictionnary||dictionary didnt||didn't diferent||different differrence||difference +diffrent||different +diffrentiate||differentiate difinition||definition diplay||display direectly||directly -- cgit From 299878bac326c890699c696ebba26f56fe93fc75 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:57:59 -0700 Subject: treewide: move set_memory_* functions away from cacheflush.h Patch series "set_memory_* functions header refactor", v3. The set_memory_* APIs came out of a desire to have a better way to change memory attributes. Many of these attributes were linked to cache functionality so the prototypes were put in cacheflush.h. These days, the APIs have grown and have a much wider use than just cache APIs. To support this growth, split off set_memory_* and friends into a separate header file to avoid growing cacheflush.h for APIs that have nothing to do with caches. 
Link: http://lkml.kernel.org/r/1488920133-27229-2-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Russell King Acked-by: Mark Rutland Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm/include/asm/cacheflush.h | 21 +-------- arch/arm/include/asm/set_memory.h | 32 ++++++++++++++ arch/arm64/include/asm/Kbuild | 1 + arch/arm64/include/asm/cacheflush.h | 5 +-- arch/s390/include/asm/cacheflush.h | 28 +----------- arch/s390/include/asm/set_memory.h | 31 +++++++++++++ arch/x86/include/asm/cacheflush.h | 86 +----------------------------------- arch/x86/include/asm/set_memory.h | 87 +++++++++++++++++++++++++++++++++++++ include/asm-generic/set_memory.h | 12 +++++ 9 files changed, 167 insertions(+), 136 deletions(-) create mode 100644 arch/arm/include/asm/set_memory.h create mode 100644 arch/s390/include/asm/set_memory.h create mode 100644 arch/x86/include/asm/set_memory.h create mode 100644 include/asm-generic/set_memory.h diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index 02454fa15d2c..1cb9d118bb16 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -16,6 +16,7 @@ #include #include #include +#include #define CACHE_COLOUR(vaddr) ((vaddr & (SHMLBA - 1)) >> PAGE_SHIFT) @@ -478,26 +479,6 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size) : : : "r0","r1","r2","r3","r4","r5","r6","r7", \ "r9","r10","lr","memory" ) -#ifdef CONFIG_MMU -int set_memory_ro(unsigned long addr, int numpages); -int set_memory_rw(unsigned long addr, int numpages); -int set_memory_x(unsigned long addr, int numpages); -int set_memory_nx(unsigned long addr, int numpages); -#else -static inline int set_memory_ro(unsigned long addr, int numpages) { return 0; } -static inline int set_memory_rw(unsigned long addr, int numpages) { return 0; } -static inline int set_memory_x(unsigned long addr, int numpages) { return 0; } -static inline int set_memory_nx(unsigned long 
addr, int numpages) { return 0; } -#endif - -#ifdef CONFIG_STRICT_KERNEL_RWX -void set_kernel_text_rw(void); -void set_kernel_text_ro(void); -#else -static inline void set_kernel_text_rw(void) { } -static inline void set_kernel_text_ro(void) { } -#endif - void flush_uprobe_xol_access(struct page *page, unsigned long uaddr, void *kaddr, unsigned long len); diff --git a/arch/arm/include/asm/set_memory.h b/arch/arm/include/asm/set_memory.h new file mode 100644 index 000000000000..5aa4315abe91 --- /dev/null +++ b/arch/arm/include/asm/set_memory.h @@ -0,0 +1,32 @@ +/* + * Copyright (C) 1999-2002 Russell King + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef _ASMARM_SET_MEMORY_H +#define _ASMARM_SET_MEMORY_H + +#ifdef CONFIG_MMU +int set_memory_ro(unsigned long addr, int numpages); +int set_memory_rw(unsigned long addr, int numpages); +int set_memory_x(unsigned long addr, int numpages); +int set_memory_nx(unsigned long addr, int numpages); +#else +static inline int set_memory_ro(unsigned long addr, int numpages) { return 0; } +static inline int set_memory_rw(unsigned long addr, int numpages) { return 0; } +static inline int set_memory_x(unsigned long addr, int numpages) { return 0; } +static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; } +#endif + +#ifdef CONFIG_STRICT_KERNEL_RWX +void set_kernel_text_rw(void); +void set_kernel_text_ro(void); +#else +static inline void set_kernel_text_rw(void) { } +static inline void set_kernel_text_ro(void) { } +#endif + +#endif diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index a12f1afc95a3..a7a97a608033 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -29,6 +29,7 @@ generic-y += rwsem.h generic-y += segment.h generic-y += sembuf.h generic-y += serial.h +generic-y += set_memory.h generic-y 
+= shmbuf.h generic-y += simd.h generic-y += sizes.h diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h index 728f933cef8c..0927f47607e2 100644 --- a/arch/arm64/include/asm/cacheflush.h +++ b/arch/arm64/include/asm/cacheflush.h @@ -20,6 +20,7 @@ #define __ASM_CACHEFLUSH_H #include +#include /* * This flag is used to indicate that the page pointed to by a pte is clean @@ -150,10 +151,6 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end) { } -int set_memory_ro(unsigned long addr, int numpages); -int set_memory_rw(unsigned long addr, int numpages); -int set_memory_x(unsigned long addr, int numpages); -int set_memory_nx(unsigned long addr, int numpages); int set_memory_valid(unsigned long addr, unsigned long size, int enable); #endif diff --git a/arch/s390/include/asm/cacheflush.h b/arch/s390/include/asm/cacheflush.h index 0499334f9473..afe296515f76 100644 --- a/arch/s390/include/asm/cacheflush.h +++ b/arch/s390/include/asm/cacheflush.h @@ -3,32 +3,6 @@ /* Caches aren't brain-dead on the s390. 
*/ #include - -#define SET_MEMORY_RO 1UL -#define SET_MEMORY_RW 2UL -#define SET_MEMORY_NX 4UL -#define SET_MEMORY_X 8UL - -int __set_memory(unsigned long addr, int numpages, unsigned long flags); - -static inline int set_memory_ro(unsigned long addr, int numpages) -{ - return __set_memory(addr, numpages, SET_MEMORY_RO); -} - -static inline int set_memory_rw(unsigned long addr, int numpages) -{ - return __set_memory(addr, numpages, SET_MEMORY_RW); -} - -static inline int set_memory_nx(unsigned long addr, int numpages) -{ - return __set_memory(addr, numpages, SET_MEMORY_NX); -} - -static inline int set_memory_x(unsigned long addr, int numpages) -{ - return __set_memory(addr, numpages, SET_MEMORY_X); -} +#include #endif /* _S390_CACHEFLUSH_H */ diff --git a/arch/s390/include/asm/set_memory.h b/arch/s390/include/asm/set_memory.h new file mode 100644 index 000000000000..46a4db44c47a --- /dev/null +++ b/arch/s390/include/asm/set_memory.h @@ -0,0 +1,31 @@ +#ifndef _ASMS390_SET_MEMORY_H +#define _ASMS390_SET_MEMORY_H + +#define SET_MEMORY_RO 1UL +#define SET_MEMORY_RW 2UL +#define SET_MEMORY_NX 4UL +#define SET_MEMORY_X 8UL + +int __set_memory(unsigned long addr, int numpages, unsigned long flags); + +static inline int set_memory_ro(unsigned long addr, int numpages) +{ + return __set_memory(addr, numpages, SET_MEMORY_RO); +} + +static inline int set_memory_rw(unsigned long addr, int numpages) +{ + return __set_memory(addr, numpages, SET_MEMORY_RW); +} + +static inline int set_memory_nx(unsigned long addr, int numpages) +{ + return __set_memory(addr, numpages, SET_MEMORY_NX); +} + +static inline int set_memory_x(unsigned long addr, int numpages) +{ + return __set_memory(addr, numpages, SET_MEMORY_X); +} + +#endif diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h index e7e1942edff7..3d7db6f35aeb 100644 --- a/arch/x86/include/asm/cacheflush.h +++ b/arch/x86/include/asm/cacheflush.h @@ -4,94 +4,10 @@ /* Caches aren't brain-dead on the intel. 
*/ #include #include - -/* - * The set_memory_* API can be used to change various attributes of a virtual - * address range. The attributes include: - * Cachability : UnCached, WriteCombining, WriteThrough, WriteBack - * Executability : eXeutable, NoteXecutable - * Read/Write : ReadOnly, ReadWrite - * Presence : NotPresent - * - * Within a category, the attributes are mutually exclusive. - * - * The implementation of this API will take care of various aspects that - * are associated with changing such attributes, such as: - * - Flushing TLBs - * - Flushing CPU caches - * - Making sure aliases of the memory behind the mapping don't violate - * coherency rules as defined by the CPU in the system. - * - * What this API does not do: - * - Provide exclusion between various callers - including callers that - * operation on other mappings of the same physical page - * - Restore default attributes when a page is freed - * - Guarantee that mappings other than the requested one are - * in any state, other than that these do not violate rules for - * the CPU you have. Do not depend on any effects on other mappings, - * CPUs other than the one you have may have more relaxed rules. - * The caller is required to take care of these. 
- */ - -int _set_memory_uc(unsigned long addr, int numpages); -int _set_memory_wc(unsigned long addr, int numpages); -int _set_memory_wt(unsigned long addr, int numpages); -int _set_memory_wb(unsigned long addr, int numpages); -int set_memory_uc(unsigned long addr, int numpages); -int set_memory_wc(unsigned long addr, int numpages); -int set_memory_wt(unsigned long addr, int numpages); -int set_memory_wb(unsigned long addr, int numpages); -int set_memory_x(unsigned long addr, int numpages); -int set_memory_nx(unsigned long addr, int numpages); -int set_memory_ro(unsigned long addr, int numpages); -int set_memory_rw(unsigned long addr, int numpages); -int set_memory_np(unsigned long addr, int numpages); -int set_memory_4k(unsigned long addr, int numpages); - -int set_memory_array_uc(unsigned long *addr, int addrinarray); -int set_memory_array_wc(unsigned long *addr, int addrinarray); -int set_memory_array_wt(unsigned long *addr, int addrinarray); -int set_memory_array_wb(unsigned long *addr, int addrinarray); - -int set_pages_array_uc(struct page **pages, int addrinarray); -int set_pages_array_wc(struct page **pages, int addrinarray); -int set_pages_array_wt(struct page **pages, int addrinarray); -int set_pages_array_wb(struct page **pages, int addrinarray); - -/* - * For legacy compatibility with the old APIs, a few functions - * are provided that work on a "struct page". - * These functions operate ONLY on the 1:1 kernel mapping of the - * memory that the struct page represents, and internally just - * call the set_memory_* function. See the description of the - * set_memory_* function for more details on conventions. - * - * These APIs should be considered *deprecated* and are likely going to - * be removed in the future. - * The reason for this is the implicit operation on the 1:1 mapping only, - * making this not a generally useful API. 
- * - * Specifically, many users of the old APIs had a virtual address, - * called virt_to_page() or vmalloc_to_page() on that address to - * get a struct page* that the old API required. - * To convert these cases, use set_memory_*() on the original - * virtual address, do not use these functions. - */ - -int set_pages_uc(struct page *page, int numpages); -int set_pages_wb(struct page *page, int numpages); -int set_pages_x(struct page *page, int numpages); -int set_pages_nx(struct page *page, int numpages); -int set_pages_ro(struct page *page, int numpages); -int set_pages_rw(struct page *page, int numpages); - +#include void clflush_cache_range(void *addr, unsigned int size); #define mmio_flush_range(addr, size) clflush_cache_range(addr, size) -extern int kernel_set_to_readonly; -void set_kernel_text_rw(void); -void set_kernel_text_ro(void); - #endif /* _ASM_X86_CACHEFLUSH_H */ diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h new file mode 100644 index 000000000000..eaec6c364e42 --- /dev/null +++ b/arch/x86/include/asm/set_memory.h @@ -0,0 +1,87 @@ +#ifndef _ASM_X86_SET_MEMORY_H +#define _ASM_X86_SET_MEMORY_H + +#include +#include + +/* + * The set_memory_* API can be used to change various attributes of a virtual + * address range. The attributes include: + * Cachability : UnCached, WriteCombining, WriteThrough, WriteBack + * Executability : eXeutable, NoteXecutable + * Read/Write : ReadOnly, ReadWrite + * Presence : NotPresent + * + * Within a category, the attributes are mutually exclusive. + * + * The implementation of this API will take care of various aspects that + * are associated with changing such attributes, such as: + * - Flushing TLBs + * - Flushing CPU caches + * - Making sure aliases of the memory behind the mapping don't violate + * coherency rules as defined by the CPU in the system. 
+ * + * What this API does not do: + * - Provide exclusion between various callers - including callers that + * operation on other mappings of the same physical page + * - Restore default attributes when a page is freed + * - Guarantee that mappings other than the requested one are + * in any state, other than that these do not violate rules for + * the CPU you have. Do not depend on any effects on other mappings, + * CPUs other than the one you have may have more relaxed rules. + * The caller is required to take care of these. + */ + +int _set_memory_uc(unsigned long addr, int numpages); +int _set_memory_wc(unsigned long addr, int numpages); +int _set_memory_wt(unsigned long addr, int numpages); +int _set_memory_wb(unsigned long addr, int numpages); +int set_memory_uc(unsigned long addr, int numpages); +int set_memory_wc(unsigned long addr, int numpages); +int set_memory_wt(unsigned long addr, int numpages); +int set_memory_wb(unsigned long addr, int numpages); +int set_memory_np(unsigned long addr, int numpages); +int set_memory_4k(unsigned long addr, int numpages); + +int set_memory_array_uc(unsigned long *addr, int addrinarray); +int set_memory_array_wc(unsigned long *addr, int addrinarray); +int set_memory_array_wt(unsigned long *addr, int addrinarray); +int set_memory_array_wb(unsigned long *addr, int addrinarray); + +int set_pages_array_uc(struct page **pages, int addrinarray); +int set_pages_array_wc(struct page **pages, int addrinarray); +int set_pages_array_wt(struct page **pages, int addrinarray); +int set_pages_array_wb(struct page **pages, int addrinarray); + +/* + * For legacy compatibility with the old APIs, a few functions + * are provided that work on a "struct page". + * These functions operate ONLY on the 1:1 kernel mapping of the + * memory that the struct page represents, and internally just + * call the set_memory_* function. See the description of the + * set_memory_* function for more details on conventions. 
+ * + * These APIs should be considered *deprecated* and are likely going to + * be removed in the future. + * The reason for this is the implicit operation on the 1:1 mapping only, + * making this not a generally useful API. + * + * Specifically, many users of the old APIs had a virtual address, + * called virt_to_page() or vmalloc_to_page() on that address to + * get a struct page* that the old API required. + * To convert these cases, use set_memory_*() on the original + * virtual address, do not use these functions. + */ + +int set_pages_uc(struct page *page, int numpages); +int set_pages_wb(struct page *page, int numpages); +int set_pages_x(struct page *page, int numpages); +int set_pages_nx(struct page *page, int numpages); +int set_pages_ro(struct page *page, int numpages); +int set_pages_rw(struct page *page, int numpages); + +extern int kernel_set_to_readonly; +void set_kernel_text_rw(void); +void set_kernel_text_ro(void); + +#endif /* _ASM_X86_SET_MEMORY_H */ diff --git a/include/asm-generic/set_memory.h b/include/asm-generic/set_memory.h new file mode 100644 index 000000000000..83e81f8996b2 --- /dev/null +++ b/include/asm-generic/set_memory.h @@ -0,0 +1,12 @@ +#ifndef __ASM_SET_MEMORY_H +#define __ASM_SET_MEMORY_H + +/* + * Functions to change memory attributes. + */ +int set_memory_ro(unsigned long addr, int numpages); +int set_memory_rw(unsigned long addr, int numpages); +int set_memory_x(unsigned long addr, int numpages); +int set_memory_nx(unsigned long addr, int numpages); + +#endif -- cgit From 74d86a70636a0a5eec76efcff24bee9681e01804 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:02 -0700 Subject: arm: use set_memory.h header set_memory_* functions have moved to set_memory.h. 
Switch to this explicitly

Link: http://lkml.kernel.org/r/1488920133-27229-3-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott
Acked-by: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 arch/arm/kernel/ftrace.c        | 1 +
 arch/arm/kernel/machine_kexec.c | 1 +
 arch/arm/mm/pageattr.c          | 1 +
 arch/arm/net/bpf_jit_32.c       | 1 +
 4 files changed, 4 insertions(+)

diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 3f1759411d51..dea3e965fe88 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #ifdef CONFIG_THUMB2_KERNEL
 #define NOP 0xf85deb04 /* pop.w {lr} */
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index b18c1ea56bed..15495887ca14 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 extern void relocate_new_kernel(void);
 extern const unsigned int relocate_new_kernel_size;
diff --git a/arch/arm/mm/pageattr.c b/arch/arm/mm/pageattr.c
index 3b69f2642513..1403cb4a0c3d 100644
--- a/arch/arm/mm/pageattr.c
+++ b/arch/arm/mm/pageattr.c
@@ -15,6 +15,7 @@
 #include
 #include
+#include
 struct page_change_data {
 	pgprot_t set_mask;
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 93d0b6d0b63e..d5b9fa19b684 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -18,6 +18,7 @@
 #include
 #include
+#include
 #include
 #include
-- cgit
From d4bbc30bb059ec6bbb17edd3d3f98c5edeee7494 Mon Sep 17 00:00:00 2001
From: Laura Abbott
Date: Mon, 8 May 2017 15:58:05 -0700
Subject: arm64: use set_memory.h header

The set_memory_* functions have moved to set_memory.h. Use that header explicitly.
Link: http://lkml.kernel.org/r/1488920133-27229-4-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott
Acked-by: Catalin Marinas
Acked-by: Mark Rutland
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 arch/arm64/mm/pageattr.c      | 1 +
 arch/arm64/net/bpf_jit_comp.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 3212ee0558f6..a682a0a2a0fa 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -17,6 +17,7 @@
 #include
 #include
+#include
 #include
 struct page_change_data {
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index d68abde52740..c6e53580aefe 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include "bpf_jit.h"
-- cgit
From e6c7c63001920a57f23c8f5d6f652bfc4bea327b Mon Sep 17 00:00:00 2001
From: Laura Abbott
Date: Mon, 8 May 2017 15:58:08 -0700
Subject: s390: use set_memory.h header

set_memory_* functions have moved to set_memory.h.
Switch to this explicitly Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/s390/kernel/ftrace.c | 1 + arch/s390/kernel/kprobes.c | 2 +- arch/s390/kernel/machine_kexec.c | 1 + arch/s390/mm/init.c | 1 + arch/s390/mm/pageattr.c | 1 + arch/s390/mm/vmem.c | 1 + arch/s390/net/bpf_jit_comp.c | 1 + 7 files changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c index 60a8a4e207ed..27477f34cc0a 100644 --- a/arch/s390/kernel/ftrace.c +++ b/arch/s390/kernel/ftrace.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "entry.h" /* diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c index 76f9eda1d7c0..3d6a99746454 100644 --- a/arch/s390/kernel/kprobes.c +++ b/arch/s390/kernel/kprobes.c @@ -31,7 +31,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/arch/s390/kernel/machine_kexec.c b/arch/s390/kernel/machine_kexec.c index db5658daf994..49a6bd45957b 100644 --- a/arch/s390/kernel/machine_kexec.c +++ b/arch/s390/kernel/machine_kexec.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index ee5066718b21..ee6a1d3d4983 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -39,6 +39,7 @@ #include #include #include +#include pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir); diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c index fc321c5ec30e..49e721f3645e 100644 --- a/arch/s390/mm/pageattr.c +++ b/arch/s390/mm/pageattr.c @@ -8,6 +8,7 @@ #include #include #include +#include static inline unsigned long sske_frame(unsigned long addr, unsigned char skey) { diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 60d38993f232..c33c94b4be60 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ 
-17,6 +17,7 @@ #include #include #include +#include static DEFINE_MUTEX(vmem_mutex); diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 4ecf6d687509..6e97a2e3fd8d 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "bpf_jit.h" int bpf_jit_enable __read_mostly; -- cgit From d11636511ed97ceda66a08ecff99f100e1107b76 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:11 -0700 Subject: x86: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-6-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/kernel/amd_gart_64.c | 2 +- arch/x86/kernel/cpu/amd.c | 2 +- arch/x86/kernel/cpu/bugs.c | 2 +- arch/x86/kernel/ftrace.c | 2 +- arch/x86/kernel/machine_kexec_32.c | 2 +- arch/x86/kernel/machine_kexec_64.c | 1 + arch/x86/mm/init.c | 2 +- arch/x86/mm/init_32.c | 2 +- arch/x86/mm/init_64.c | 2 +- arch/x86/mm/ioremap.c | 2 +- arch/x86/mm/pageattr.c | 1 + arch/x86/net/bpf_jit_comp.c | 1 + arch/x86/pci/pcbios.c | 2 +- arch/x86/platform/efi/efi.c | 2 +- arch/x86/realmode/init.c | 2 +- 15 files changed, 15 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c index df083efe6ee0..815dd63f49d0 100644 --- a/arch/x86/kernel/amd_gart_64.c +++ b/arch/x86/kernel/amd_gart_64.c @@ -36,7 +36,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index c36140d788fe..ee8f11800295 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -16,7 +16,7 @@ #ifdef CONFIG_X86_64 # include -# include +# include #endif #include "cpu.h" diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 
a44ef52184df..0af86d9242da 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -17,7 +17,7 @@ #include #include #include -#include +#include void __init check_bugs(void) { diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 8ee76dce9140..0651e974dcb3 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -24,7 +24,7 @@ #include -#include +#include #include #include #include diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c index 5f43cec296c5..8c53c5d7a1bc 100644 --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -23,7 +23,7 @@ #include #include #include -#include +#include #include static void set_idt(void *newidt, __u16 limit) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 085c3b300d32..ce640428d6fe 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -27,6 +27,7 @@ #include #include #include +#include #ifdef CONFIG_KEXEC_FILE static struct kexec_file_ops *kexec_file_loaders[] = { diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 138bad2fb6bc..cbc87ea98751 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -5,7 +5,7 @@ #include #include /* for max_low_pfn */ -#include +#include #include #include #include diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index f34d275ee201..99fb83819a5f 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -48,7 +48,7 @@ #include #include #include -#include +#include #include #include diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 745e5e183169..41270b96403d 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -50,7 +50,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index e4f7b25df18e..bbc558b88a88 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ 
-14,7 +14,7 @@ #include #include -#include +#include #include #include #include diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 56b22fa504df..1dcd2be4cce4 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -24,6 +24,7 @@ #include #include #include +#include /* * The current flushing context - we pass it instead of 5 arguments: diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 14f840df1d95..f58939393eef 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -12,6 +12,7 @@ #include #include #include +#include #include int bpf_jit_enable __read_mostly; diff --git a/arch/x86/pci/pcbios.c b/arch/x86/pci/pcbios.c index 29e9ba6ace9d..c1bdb9edcae7 100644 --- a/arch/x86/pci/pcbios.c +++ b/arch/x86/pci/pcbios.c @@ -11,7 +11,7 @@ #include #include #include -#include +#include /* BIOS32 signature: "_32_" */ #define BIOS32_SIGNATURE (('_' << 0) + ('3' << 8) + ('2' << 16) + ('_' << 24)) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index a15cf815ac4e..7e76a4d8304b 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -49,7 +49,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 5db706f14111..a163a90af4aa 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -2,7 +2,7 @@ #include #include -#include +#include #include #include #include -- cgit From e47036b45a3f02d35648d4683b9e26f26a60e231 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:14 -0700 Subject: agp: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. 
Link: http://lkml.kernel.org/r/1488920133-27229-7-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/agp/amd-k7-agp.c | 1 + drivers/char/agp/ati-agp.c | 1 + drivers/char/agp/generic.c | 4 +++- drivers/char/agp/intel-gtt.c | 1 + drivers/char/agp/sworks-agp.c | 1 + 5 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c index 3661a51e93e2..5fbd333e4c6d 100644 --- a/drivers/char/agp/amd-k7-agp.c +++ b/drivers/char/agp/amd-k7-agp.c @@ -9,6 +9,7 @@ #include #include #include +#include #include "agp.h" #define AMD_MMBASE_BAR 1 diff --git a/drivers/char/agp/ati-agp.c b/drivers/char/agp/ati-agp.c index 75a9786a77e6..0b5ec7af2414 100644 --- a/drivers/char/agp/ati-agp.c +++ b/drivers/char/agp/ati-agp.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "agp.h" #define ATI_GART_MMBASE_BAR 1 diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c index bdf418cac8ef..658664a5a5aa 100644 --- a/drivers/char/agp/generic.c +++ b/drivers/char/agp/generic.c @@ -39,7 +39,9 @@ #include #include #include -#include +#ifdef CONFIG_X86 +#include +#endif #include #include "agp.h" diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c index 7fcc2a9d1d5a..9b6b6023193b 100644 --- a/drivers/char/agp/intel-gtt.c +++ b/drivers/char/agp/intel-gtt.c @@ -25,6 +25,7 @@ #include "agp.h" #include "intel-agp.h" #include +#include /* * If we have Intel graphics, we're not going to have anything other than diff --git a/drivers/char/agp/sworks-agp.c b/drivers/char/agp/sworks-agp.c index 9b163b49d976..03be4ac79b0d 100644 --- a/drivers/char/agp/sworks-agp.c +++ b/drivers/char/agp/sworks-agp.c @@ -9,6 +9,7 @@ #include #include #include +#include #include "agp.h" #define SVWRKS_COMMAND 0x04 -- cgit From ed3ba07946631f5c3a091fb37b018f7570f242b1 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 
15:58:17 -0700 Subject: drm: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. [akpm@linux-foundation.org: track drivers/gpu/drm/i915/i915_gem_gtt.c linux-next changes] Link: http://lkml.kernel.org/r/1488920133-27229-8-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 3 +++ drivers/gpu/drm/gma500/gtt.c | 1 + drivers/gpu/drm/gma500/psb_drv.c | 1 + drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ drivers/gpu/drm/radeon/radeon_gart.c | 3 +++ drivers/gpu/drm/ttm/ttm_page_alloc.c | 3 +++ drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 3 +++ drivers/gpu/drm/ttm/ttm_tt.c | 3 +++ 8 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 6d691abe889c..2ee327d69775 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -27,6 +27,9 @@ */ #include #include +#ifdef CONFIG_X86 +#include +#endif #include "amdgpu.h" /* diff --git a/drivers/gpu/drm/gma500/gtt.c b/drivers/gpu/drm/gma500/gtt.c index 3f4f424196b2..3949b0990916 100644 --- a/drivers/gpu/drm/gma500/gtt.c +++ b/drivers/gpu/drm/gma500/gtt.c @@ -21,6 +21,7 @@ #include #include +#include #include "psb_drv.h" #include "blitter.h" diff --git a/drivers/gpu/drm/gma500/psb_drv.c b/drivers/gpu/drm/gma500/psb_drv.c index 5ee93ff55608..1f9b35afefee 100644 --- a/drivers/gpu/drm/gma500/psb_drv.c +++ b/drivers/gpu/drm/gma500/psb_drv.c @@ -35,6 +35,7 @@ #include #include #include +#include static struct drm_driver driver; static int psb_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 8bab4aea63e6..2aa6b97fd22f 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -31,6 +31,8 @@ #include #include +#include + 
#include #include diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c4777c8d0312..0b3ec35515f3 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -27,6 +27,9 @@ */ #include #include +#ifdef CONFIG_X86 +#include +#endif #include "radeon.h" /* diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index a37de5db5731..eeddc1e48409 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -51,6 +51,9 @@ #if IS_ENABLED(CONFIG_AGP) #include #endif +#ifdef CONFIG_X86 +#include +#endif #define NUM_PAGES_TO_ALLOC (PAGE_SIZE/sizeof(struct page *)) #define SMALL_ALLOCATION 16 diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index cec4b4baa179..90ddbdca93bd 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -53,6 +53,9 @@ #if IS_ENABLED(CONFIG_AGP) #include #endif +#ifdef CONFIG_X86 +#include +#endif #define NUM_PAGES_TO_ALLOC (PAGE_SIZE/sizeof(struct page *)) #define SMALL_ALLOCATION 4 diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index aee3c00f836e..5260179d788a 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -44,6 +44,9 @@ #include #include #include +#ifdef CONFIG_X86 +#include +#endif /** * Allocates storage for pointers to the pages that back the ttm. -- cgit From 0c14dac9a411d20ca59e3b39724c1393f27635bc Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:20 -0700 Subject: drivers/hwtracing/intel_th/msu.c: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. 
Link: http://lkml.kernel.org/r/1488920133-27229-9-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Alexander Shishkin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/hwtracing/intel_th/msu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c index e88afe1a435c..dbbe31df74df 100644 --- a/drivers/hwtracing/intel_th/msu.c +++ b/drivers/hwtracing/intel_th/msu.c @@ -27,7 +27,9 @@ #include #include -#include +#ifdef CONFIG_X86 +#include +#endif #include "intel_th.h" #include "msu.h" -- cgit From 23f19a563b2f15d9bf391cb9d3c1829bee56a44b Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:23 -0700 Subject: drivers/watchdog/hpwdt.c: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-10-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Guenter Roeck Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/watchdog/hpwdt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c index 70c7194e2810..67fbe35ce7cf 100644 --- a/drivers/watchdog/hpwdt.c +++ b/drivers/watchdog/hpwdt.c @@ -34,7 +34,7 @@ #include #include #include -#include +#include #endif /* CONFIG_HPWDT_NMI_DECODING */ #include #include -- cgit From 2d0bde57f3527ffac9279b4c8ba61060ba395b1a Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:26 -0700 Subject: include/linux/filter.h: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. 
Link: http://lkml.kernel.org/r/1488920133-27229-11-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Daniel Borkmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/filter.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 9a7786db14fa..56197f82af45 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -19,7 +19,9 @@ #include -#include +#ifdef CONFIG_ARCH_HAS_SET_MEMORY +#include +#endif #include #include -- cgit From bbca07c307166a753155e79a874d07023f4edd20 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:29 -0700 Subject: kernel/module.c: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-12-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Jessica Yu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/module.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/module.c b/kernel/module.c index 2b316b954828..4a3665f8f837 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -49,6 +49,9 @@ #include #include #include +#ifdef CONFIG_STRICT_MODULE_RWX +#include +#endif #include #include #include -- cgit From 50327ddfbc926e68da1958e4fac51f1106f5e730 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:32 -0700 Subject: kernel/power/snapshot.c: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. 
Link: http://lkml.kernel.org/r/1488920133-27229-13-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/power/snapshot.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index d79a38de425a..3b1e0f3ad07f 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -36,6 +36,9 @@ #include #include #include +#ifdef CONFIG_STRICT_KERNEL_RWX +#include +#endif #include "power.h" -- cgit From 7f80f513588dac3d9cbf886a3f2f6354d821695e Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:35 -0700 Subject: alsa: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-14-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Takashi Iwai Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- sound/pci/hda/hda_intel.c | 2 +- sound/pci/intel8x0.c | 4 +++- sound/x86/intel_hdmi_audio.c | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index b786fbab029f..1770f085c2a6 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -53,7 +53,7 @@ #ifdef CONFIG_X86 /* for snoop control */ #include -#include +#include #include #endif #include diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c index 9720a30dbfff..6d17b171c17b 100644 --- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -40,7 +40,9 @@ #include /* for 440MX workaround */ #include -#include +#ifdef CONFIG_X86 +#include +#endif MODULE_AUTHOR("Jaroslav Kysela "); MODULE_DESCRIPTION("Intel 82801AA,82901AB,i810,i820,i830,i840,i845,MX440; SiS 7012; Ali 5455"); diff --git a/sound/x86/intel_hdmi_audio.c b/sound/x86/intel_hdmi_audio.c index c505b019e09c..664b7fe206d6 100644 --- a/sound/x86/intel_hdmi_audio.c +++ b/sound/x86/intel_hdmi_audio.c @@ -30,7 +30,7 @@ 
#include #include #include -#include +#include #include #include #include -- cgit From 056d16b2141d66611437fa12c2016c1c99cd7e3d Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:38 -0700 Subject: drivers/misc/sram-exec.c: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-15-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/misc/sram-exec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/misc/sram-exec.c b/drivers/misc/sram-exec.c index ac522417c462..3d528a13b8fc 100644 --- a/drivers/misc/sram-exec.c +++ b/drivers/misc/sram-exec.c @@ -16,9 +16,10 @@ #include #include +#include #include -#include +#include #include "sram.h" -- cgit From 8d5a1181f32f555f1182ba4da56d9db0ce2c60c9 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:41 -0700 Subject: drivers/video/fbdev/vermilion/vermilion.c: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. 
Link: http://lkml.kernel.org/r/1488920133-27229-16-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Acked-by: Bartlomiej Zolnierkiewicz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/fbdev/vermilion/vermilion.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/video/fbdev/vermilion/vermilion.c b/drivers/video/fbdev/vermilion/vermilion.c index 1c1e95a0b8fa..ce4c4729a5e8 100644 --- a/drivers/video/fbdev/vermilion/vermilion.c +++ b/drivers/video/fbdev/vermilion/vermilion.c @@ -37,7 +37,7 @@ #include #include #include -#include +#include #include #include -- cgit From 880d5a36efe7479287c36194bffdcd6d7a183bab Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 8 May 2017 15:58:44 -0700 Subject: drivers/staging/media/atomisp/pci/atomisp2: use set_memory.h Cc: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c | 3 ++- drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_dynamic_pool.c | 2 +- drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_reserved_pool.c | 3 ++- drivers/staging/media/atomisp/pci/atomisp2/mmu/isp_mmu.c | 5 ++++- 4 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c index 40ac3582fb7a..11162f595fc7 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c +++ b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c @@ -36,12 +36,13 @@ #include #include #include -#include #include #include #include #include +#include + #include "atomisp_internal.h" #include "hmm/hmm_common.h" #include "hmm/hmm_pool.h" diff --git a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_dynamic_pool.c b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_dynamic_pool.c index 639b8cdf7a5e..19e0e9ee37de 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_dynamic_pool.c +++ 
b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_dynamic_pool.c @@ -27,7 +27,7 @@ #include #include -#include "asm/cacheflush.h" +#include #include "atomisp_internal.h" diff --git a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_reserved_pool.c b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_reserved_pool.c index 4000c05652e1..bf6586805f7f 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_reserved_pool.c +++ b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_reserved_pool.c @@ -27,7 +27,8 @@ #include #include -#include "asm/cacheflush.h" +#include + #include "atomisp_internal.h" #include "hmm/hmm_pool.h" diff --git a/drivers/staging/media/atomisp/pci/atomisp2/mmu/isp_mmu.c b/drivers/staging/media/atomisp/pci/atomisp2/mmu/isp_mmu.c index 2009e3a11b86..706bd43e8b1b 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/mmu/isp_mmu.c +++ b/drivers/staging/media/atomisp/pci/atomisp2/mmu/isp_mmu.c @@ -30,13 +30,16 @@ #include /* for kmalloc */ #include #include -#include #include #include #include #include #include +#ifdef CONFIG_X86 +#include +#endif + #include "atomisp_internal.h" #include "mmu/isp_mmu.h" -- cgit From e6ccbff0e90cf4bf012bf369dbdaf84c6faaedaa Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 15:58:47 -0700 Subject: treewide: decouple cacheflush.h and set_memory.h Now that all call sites have been converted, completely decouple cacheflush.h and set_memory.h. [sfr@canb.auug.org.au: kprobes/x86: merge fix for set_memory.h decoupling] Link: http://lkml.kernel.org/r/20170418180903.10300fd3@canb.auug.org.au Link: http://lkml.kernel.org/r/1488920133-27229-17-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott Signed-off-by: Stephen Rothwell Acked-by: Catalin Marinas Acked-by: Mark Rutland Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm/include/asm/cacheflush.h | 1 - arch/arm64/include/asm/cacheflush.h | 1 - arch/s390/include/asm/Kbuild | 1 + arch/s390/include/asm/cacheflush.h | 8 --------
arch/x86/include/asm/cacheflush.h | 1 - arch/x86/kernel/kprobes/core.c | 1 + arch/x86/kernel/kprobes/opt.c | 1 + 7 files changed, 3 insertions(+), 11 deletions(-) delete mode 100644 arch/s390/include/asm/cacheflush.h diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index 1cb9d118bb16..d69bebf697e7 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -16,7 +16,6 @@ #include #include #include -#include #define CACHE_COLOUR(vaddr) ((vaddr & (SHMLBA - 1)) >> PAGE_SHIFT) diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h index 0927f47607e2..d74a284abdc2 100644 --- a/arch/arm64/include/asm/cacheflush.h +++ b/arch/arm64/include/asm/cacheflush.h @@ -20,7 +20,6 @@ #define __ASM_CACHEFLUSH_H #include -#include /* * This flag is used to indicate that the page pointed to by a pte is clean diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index 7e3481eb2174..45092b12f54f 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -1,4 +1,5 @@ generic-y += asm-offsets.h +generic-y += cacheflush.h generic-y += clkdev.h generic-y += dma-contiguous.h generic-y += div64.h diff --git a/arch/s390/include/asm/cacheflush.h b/arch/s390/include/asm/cacheflush.h deleted file mode 100644 index afe296515f76..000000000000 --- a/arch/s390/include/asm/cacheflush.h +++ /dev/null @@ -1,8 +0,0 @@ -#ifndef _S390_CACHEFLUSH_H -#define _S390_CACHEFLUSH_H - -/* Caches aren't brain-dead on the s390. */ -#include -#include - -#endif /* _S390_CACHEFLUSH_H */ diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h index 3d7db6f35aeb..8b4140f6724f 100644 --- a/arch/x86/include/asm/cacheflush.h +++ b/arch/x86/include/asm/cacheflush.h @@ -4,7 +4,6 @@ /* Caches aren't brain-dead on the intel. 
*/ #include #include -#include void clflush_cache_range(void *addr, unsigned int size); diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 19e1f2a6d7b0..5b2bbfbb3712 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -61,6 +61,7 @@ #include #include #include +#include #include "common.h" diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c index 9aadff3d0902..901c640d152f 100644 --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -37,6 +37,7 @@ #include #include #include +#include #include "common.h" -- cgit From ec48c940da6cb96c4be6638d0f2efade24d5242a Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Mon, 8 May 2017 15:58:50 -0700 Subject: kref: remove WARN_ON for NULL release functions The kref functions check for NULL release functions. This WARN_ON seems rather pointless. We will eventually release and then just crash nicely. It is also somewhat expensive because these functions are inlined in a lot of places. 
Removing the WARN_ONs saves around 2.3k in this kernel (likely more in others with more drivers) text data bss dec hex filename 9083992 5367600 11116544 25568136 1862388 vmlinux-before-load-avg 9070166 5367600 11116544 25554310 185ed86 vmlinux-load-avg Link: http://lkml.kernel.org/r/20170315021431.13107-5-andi@firstfloor.org Signed-off-by: Andi Kleen Acked-by: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kref.h | 6 ------ 1 file changed, 6 deletions(-) diff --git a/include/linux/kref.h b/include/linux/kref.h index f4156f88f557..29220724bf1c 100644 --- a/include/linux/kref.h +++ b/include/linux/kref.h @@ -66,8 +66,6 @@ static inline void kref_get(struct kref *kref) */ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)) { - WARN_ON(release == NULL); - if (refcount_dec_and_test(&kref->refcount)) { release(kref); return 1; @@ -79,8 +77,6 @@ static inline int kref_put_mutex(struct kref *kref, void (*release)(struct kref *kref), struct mutex *lock) { - WARN_ON(release == NULL); - if (refcount_dec_and_mutex_lock(&kref->refcount, lock)) { release(kref); return 1; @@ -92,8 +88,6 @@ static inline int kref_put_lock(struct kref *kref, void (*release)(struct kref *kref), spinlock_t *lock) { - WARN_ON(release == NULL); - if (refcount_dec_and_lock(&kref->refcount, lock)) { release(kref); return 1; -- cgit From 68b43744c1fb3f86238527a696c0dc5f2bd6ea1b Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Mon, 8 May 2017 15:58:53 -0700 Subject: drivers/scsi/megaraid: remove expensive inline from megasas_return_cmd Remove an inline from a fairly big function that is used often. It's unlikely that calling or not calling it makes a lot of difference. Saves around 8k text in my kernel. 
text data bss dec hex filename 9047801 5367568 11116544 25531913 1859609 vmlinux-before-megasas 9039417 5367568 11116544 25523529 1857549 vmlinux-megasas Link: http://lkml.kernel.org/r/20170315021431.13107-7-andi@firstfloor.org Signed-off-by: Andi Kleen Cc: Kashyap Desai Cc: Sumit Saxena Cc: James Bottomley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/scsi/megaraid/megaraid_sas_base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index 0016f12cc563..316c3df0c3fd 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -244,7 +244,7 @@ struct megasas_cmd *megasas_get_cmd(struct megasas_instance * @instance: Adapter soft state * @cmd: Command packet to be returned to free command pool */ -inline void +void megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd) { unsigned long flags; -- cgit From f44a2920c84af809883ecbbd08d47fb5fe47c8ad Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Mon, 8 May 2017 15:58:56 -0700 Subject: include/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_dec pagefault_disabled_dec is frequently used inline, and it has a WARN_ON for underflow that expands to about 6.5k of extra code. The warning doesn't seem useful enough to be worth that much code, so remove it. If it is needed, it could be made to depend on some debug kernel option.
Saves ~6.5k in my kernel text data bss dec hex filename 9039417 5367568 11116544 25523529 1857549 vmlinux-before-pf 9032805 5367568 11116544 25516917 1855b75 vmlinux-pf Link: http://lkml.kernel.org/r/20170315021431.13107-8-andi@firstfloor.org Signed-off-by: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/uaccess.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index e0cbfb09e60f..201418d5e15c 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -203,7 +203,6 @@ static __always_inline void pagefault_disabled_inc(void) static __always_inline void pagefault_disabled_dec(void) { current->pagefault_disabled--; - WARN_ON(current->pagefault_disabled < 0); } /* -- cgit From c718a97514e4d77c97a35734b728aaf541a0621b Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Mon, 8 May 2017 15:58:59 -0700 Subject: fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag Commit afddba49d18f ("fs: introduce write_begin, write_end, and perform_write aops") introduced AOP_FLAG_UNINTERRUPTIBLE flag which was checked in pagecache_write_begin(), but that check was removed by 4e02ed4b4a2f ("fs: remove prepare_write/commit_write"). Between these two commits, commit d9414774dc0c ("cifs: Convert cifs to new aops.") added a check in cifs_write_begin(), but that check was soon removed by commit a98ee8c1c707 ("[CIFS] fix regression in cifs_write_begin/cifs_write_end"). Therefore, AOP_FLAG_UNINTERRUPTIBLE flag is checked nowhere. Let's remove this flag. This patch has no functionality changes. 
Link: http://lkml.kernel.org/r/1489294781-53494-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa Reviewed-by: Jeff Layton Reviewed-by: Christoph Hellwig Cc: Nick Piggin Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/vfs.txt | 3 +-- fs/buffer.c | 13 +++++-------- fs/exofs/dir.c | 3 +-- fs/hfs/extent.c | 4 ++-- fs/hfsplus/extents.c | 5 ++--- fs/iomap.c | 13 +++---------- fs/namei.c | 2 +- include/linux/fs.h | 5 ++--- mm/filemap.c | 6 ------ 9 files changed, 17 insertions(+), 37 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 94dd27ef4a76..f42b90687d40 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -694,8 +694,7 @@ struct address_space_operations { write_end: After a successful write_begin, and data copy, write_end must be called. len is the original len passed to write_begin, and copied - is the amount that was able to be copied (copied == len is always true - if write_begin was called with the AOP_FLAG_UNINTERRUPTIBLE flag). + is the amount that was able to be copied. The filesystem must take care of unlocking the page and releasing it refcount, and updating i_size. 
diff --git a/fs/buffer.c b/fs/buffer.c index 9196f2a270da..c3c7455efa3f 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2379,8 +2379,7 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size) goto out; err = pagecache_write_begin(NULL, mapping, size, 0, - AOP_FLAG_UNINTERRUPTIBLE|AOP_FLAG_CONT_EXPAND, - &page, &fsdata); + AOP_FLAG_CONT_EXPAND, &page, &fsdata); if (err) goto out; @@ -2415,9 +2414,8 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = PAGE_SIZE - zerofrom; - err = pagecache_write_begin(file, mapping, curpos, len, - AOP_FLAG_UNINTERRUPTIBLE, - &page, &fsdata); + err = pagecache_write_begin(file, mapping, curpos, len, 0, + &page, &fsdata); if (err) goto out; zero_user(page, zerofrom, len); @@ -2449,9 +2447,8 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = offset - zerofrom; - err = pagecache_write_begin(file, mapping, curpos, len, - AOP_FLAG_UNINTERRUPTIBLE, - &page, &fsdata); + err = pagecache_write_begin(file, mapping, curpos, len, 0, + &page, &fsdata); if (err) goto out; zero_user(page, zerofrom, len); diff --git a/fs/exofs/dir.c b/fs/exofs/dir.c index 42f9a0a0c4ca..8eeb694332fe 100644 --- a/fs/exofs/dir.c +++ b/fs/exofs/dir.c @@ -405,8 +405,7 @@ int exofs_set_link(struct inode *dir, struct exofs_dir_entry *de, int err; lock_page(page); - err = exofs_write_begin(NULL, page->mapping, pos, len, - AOP_FLAG_UNINTERRUPTIBLE, &page, NULL); + err = exofs_write_begin(NULL, page->mapping, pos, len, 0, &page, NULL); if (err) EXOFS_ERR("exofs_set_link: exofs_write_begin FAILED => %d\n", err); diff --git a/fs/hfs/extent.c b/fs/hfs/extent.c index e33a0d36a93e..5d0182654580 100644 --- a/fs/hfs/extent.c +++ b/fs/hfs/extent.c @@ -485,8 +485,8 @@ void hfs_file_truncate(struct inode *inode) /* XXX: Can use generic_cont_expand? 
*/ size = inode->i_size - 1; - res = pagecache_write_begin(NULL, mapping, size+1, 0, - AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata); + res = pagecache_write_begin(NULL, mapping, size+1, 0, 0, + &page, &fsdata); if (!res) { res = pagecache_write_end(NULL, mapping, size+1, 0, 0, page, fsdata); diff --git a/fs/hfsplus/extents.c b/fs/hfsplus/extents.c index feca524ce2a5..a3eb640b4f8f 100644 --- a/fs/hfsplus/extents.c +++ b/fs/hfsplus/extents.c @@ -545,9 +545,8 @@ void hfsplus_file_truncate(struct inode *inode) void *fsdata; loff_t size = inode->i_size; - res = pagecache_write_begin(NULL, mapping, size, 0, - AOP_FLAG_UNINTERRUPTIBLE, - &page, &fsdata); + res = pagecache_write_begin(NULL, mapping, size, 0, 0, + &page, &fsdata); if (res) return; res = pagecache_write_end(NULL, mapping, size, diff --git a/fs/iomap.c b/fs/iomap.c index 1faabe09b8fd..4b10892967a5 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -158,12 +158,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data, ssize_t written = 0; unsigned int flags = AOP_FLAG_NOFS; - /* - * Copies from kernel address space cannot fail (NFSD is a big user). 
- */ - if (!iter_is_iovec(i)) - flags |= AOP_FLAG_UNINTERRUPTIBLE; - do { struct page *page; unsigned long offset; /* Offset into pagecache page */ @@ -291,8 +285,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data, return PTR_ERR(rpage); status = iomap_write_begin(inode, pos, bytes, - AOP_FLAG_NOFS | AOP_FLAG_UNINTERRUPTIBLE, - &page, iomap); + AOP_FLAG_NOFS, &page, iomap); put_page(rpage); if (unlikely(status)) return status; @@ -343,8 +336,8 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset, struct page *page; int status; - status = iomap_write_begin(inode, pos, bytes, - AOP_FLAG_UNINTERRUPTIBLE | AOP_FLAG_NOFS, &page, iomap); + status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page, + iomap); if (status) return status; diff --git a/fs/namei.c b/fs/namei.c index 9a7f8bd748d8..7286f87ce863 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4766,7 +4766,7 @@ int __page_symlink(struct inode *inode, const char *symname, int len, int nofs) struct page *page; void *fsdata; int err; - unsigned int flags = AOP_FLAG_UNINTERRUPTIBLE; + unsigned int flags = 0; if (nofs) flags |= AOP_FLAG_NOFS; diff --git a/include/linux/fs.h b/include/linux/fs.h index 5d62d2c47939..249dad4e8d26 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -250,9 +250,8 @@ enum positive_aop_returns { AOP_TRUNCATED_PAGE = 0x80001, }; -#define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */ -#define AOP_FLAG_CONT_EXPAND 0x0002 /* called from cont_expand */ -#define AOP_FLAG_NOFS 0x0004 /* used by filesystem to direct +#define AOP_FLAG_CONT_EXPAND 0x0001 /* called from cont_expand */ +#define AOP_FLAG_NOFS 0x0002 /* used by filesystem to direct * helper code (eg buffer layer) * to clear GFP_FS from alloc */ diff --git a/mm/filemap.c b/mm/filemap.c index 681da61080bc..b7b973b47d8d 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2791,12 +2791,6 @@ ssize_t generic_perform_write(struct file *file, ssize_t written = 0; 
unsigned int flags = 0; - /* - * Copies from kernel address space cannot fail (NFSD is a big user). - */ - if (!iter_is_iovec(i)) - flags |= AOP_FLAG_UNINTERRUPTIBLE; - do { struct page *page; unsigned long offset; /* Offset into pagecache page */ -- cgit From 929f9d285a212fde8703b2c0d540d3a79e93b2cd Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 8 May 2017 15:59:02 -0700 Subject: Documentation/vm/transhuge.txt: fix trivial typos [akpm@linux-foundation.org: fixes per Randy] Link: http://lkml.kernel.org/r/20170405210259.2067-1-sj38.park@gmail.com Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/transhuge.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt index cd28d5ee5273..4dde03b44ad1 100644 --- a/Documentation/vm/transhuge.txt +++ b/Documentation/vm/transhuge.txt @@ -266,7 +266,7 @@ for each mapping. The number of file transparent huge pages mapped to userspace is available by reading ShmemPmdMapped and ShmemHugePages fields in /proc/meminfo. -To identify what applications are mapping file transparent huge pages, it +To identify what applications are mapping file transparent huge pages, it is necessary to read /proc/PID/smaps and count the FileHugeMapped fields for each mapping. @@ -292,7 +292,7 @@ thp_collapse_alloc_failed is incremented if khugepaged found a range the allocation. thp_file_alloc is incremented every time a file huge page is successfully -i allocated. + allocated. thp_file_mapped is incremented every time a file huge page is mapped into user address space. @@ -501,7 +501,7 @@ scanner can get reference to a page is get_page_unless_zero(). All tail pages have zero ->_refcount until atomic_add(). This prevents the scanner from getting a reference to the tail page up to that point. After the -atomic_add() we don't care about the ->_refcount value. 
We already known how +atomic_add() we don't care about the ->_refcount value. We already known how many references should be uncharged from the head page. For head page get_page_unless_zero() will succeed and we don't mind. It's @@ -519,8 +519,8 @@ comes. Splitting will free up unused subpages. Splitting the page right away is not an option due to locking context in the place where we can detect partial unmap. It's also might be -counterproductive since in many cases partial unmap unmap happens during -exit(2) if an THP crosses VMA boundary. +counterproductive since in many cases partial unmap happens during exit(2) if +a THP crosses a VMA boundary. Function deferred_split_huge_page() is used to queue page for splitting. The splitting itself will happen when we get memory pressure via shrinker -- cgit From 063246641d4a9e9de84a2466fbad50112faf88dc Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 8 May 2017 15:59:05 -0700 Subject: format-security: move static strings to const While examining output from trial builds with -Wformat-security enabled, many strings were found that should be defined as "const", or as a char array instead of char pointer. This makes some static analysis easier, by producing fewer false positives. As these are all trivial changes, it seemed best to put them all in a single patch rather than chopping them up per maintainer. Link: http://lkml.kernel.org/r/20170405214711.GA5711@beast Signed-off-by: Kees Cook Acked-by: Jes Sorensen [runner.c] Cc: Tony Lindgren Cc: Russell King Cc: "Maciej W. Rozycki" Cc: Ralf Baechle Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: Daniel Vetter Cc: Jani Nikula Cc: Sean Paul Cc: David Airlie Cc: Yisen Zhuang Cc: Salil Mehta Cc: Thomas Bogendoerfer Cc: Jiri Slaby Cc: Patrice Chotard Cc: "David S. 
Miller" Cc: James Hogan Cc: Paul Burton Cc: Matt Redfearn Cc: Paolo Bonzini Cc: Ingo Molnar Cc: Rasmus Villemoes Cc: Mugunthan V N Cc: Felipe Balbi Cc: Jarod Wilson Cc: Florian Westphal Cc: Antonio Quartulli Cc: Dmitry Torokhov Cc: Kejian Yan Cc: Daode Huang Cc: Qianqian Xie Cc: Philippe Reynes Cc: Colin Ian King Cc: Eric Dumazet Cc: Christian Gromm Cc: Andrey Shvetsov Cc: Jason Litzinger Cc: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm/mach-omap2/board-n8x0.c | 2 +- arch/mips/dec/prom/init.c | 6 +++--- arch/mips/kernel/traps.c | 4 ++-- drivers/char/dsp56k.c | 2 +- drivers/cpufreq/powernow-k8.c | 3 ++- drivers/gpu/drm/drm_fb_helper.c | 2 +- drivers/net/ethernet/amd/atarilance.c | 4 ++-- drivers/net/ethernet/amd/declance.c | 2 +- drivers/net/ethernet/amd/sun3lance.c | 3 ++- drivers/net/ethernet/cirrus/mac89x0.c | 2 +- drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h | 2 +- drivers/net/ethernet/natsemi/sonic.h | 2 +- drivers/net/ethernet/toshiba/tc35815.c | 2 +- drivers/net/fddi/defxx.c | 2 +- drivers/net/hippi/rrunner.c | 3 ++- drivers/staging/most/mostcore/core.c | 2 +- drivers/tty/n_hdlc.c | 10 +++++----- drivers/tty/serial/st-asc.c | 2 +- net/decnet/af_decnet.c | 3 ++- 19 files changed, 31 insertions(+), 27 deletions(-) diff --git a/arch/arm/mach-omap2/board-n8x0.c b/arch/arm/mach-omap2/board-n8x0.c index 6b6fda65fb3b..91272db09fa3 100644 --- a/arch/arm/mach-omap2/board-n8x0.c +++ b/arch/arm/mach-omap2/board-n8x0.c @@ -117,7 +117,7 @@ static struct musb_hdrc_platform_data tusb_data = { static void __init n8x0_usb_init(void) { int ret = 0; - static char announce[] __initdata = KERN_INFO "TUSB 6010\n"; + static const char announce[] __initconst = KERN_INFO "TUSB 6010\n"; /* PM companion chip power control pin */ ret = gpio_request_one(TUSB6010_GPIO_ENABLE, GPIOF_OUT_INIT_LOW, diff --git a/arch/mips/dec/prom/init.c b/arch/mips/dec/prom/init.c index 4e1761e0a09a..d88eb7a6662b 100644 --- a/arch/mips/dec/prom/init.c +++ 
b/arch/mips/dec/prom/init.c @@ -88,7 +88,7 @@ void __init which_prom(s32 magic, s32 *prom_vec) void __init prom_init(void) { extern void dec_machine_halt(void); - static char cpu_msg[] __initdata = + static const char cpu_msg[] __initconst = "Sorry, this kernel is compiled for a wrong CPU type!\n"; s32 argc = fw_arg0; s32 *argv = (void *)fw_arg1; @@ -111,7 +111,7 @@ void __init prom_init(void) #if defined(CONFIG_CPU_R3000) if ((current_cpu_type() == CPU_R4000SC) || (current_cpu_type() == CPU_R4400SC)) { - static char r4k_msg[] __initdata = + static const char r4k_msg[] __initconst = "Please recompile with \"CONFIG_CPU_R4x00 = y\".\n"; printk(cpu_msg); printk(r4k_msg); @@ -122,7 +122,7 @@ void __init prom_init(void) #if defined(CONFIG_CPU_R4X00) if ((current_cpu_type() == CPU_R3000) || (current_cpu_type() == CPU_R3000A)) { - static char r3k_msg[] __initdata = + static const char r3k_msg[] __initconst = "Please recompile with \"CONFIG_CPU_R3000 = y\".\n"; printk(cpu_msg); printk(r3k_msg); diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index b49e7bf9f950..9681b5877140 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2256,8 +2256,8 @@ void set_handler(unsigned long offset, void *addr, unsigned long size) local_flush_icache_range(ebase + offset, ebase + offset + size); } -static char panic_null_cerr[] = - "Trying to set NULL cache error exception handler"; +static const char panic_null_cerr[] = + "Trying to set NULL cache error exception handler\n"; /* * Install uncached CPU exception handler. 
diff --git a/drivers/char/dsp56k.c b/drivers/char/dsp56k.c index 50aa9ba91f25..0d7b577e0ff0 100644 --- a/drivers/char/dsp56k.c +++ b/drivers/char/dsp56k.c @@ -489,7 +489,7 @@ static const struct file_operations dsp56k_fops = { /****** Init and module functions ******/ -static char banner[] __initdata = KERN_INFO "DSP56k driver installed\n"; +static const char banner[] __initconst = KERN_INFO "DSP56k driver installed\n"; static int __init dsp56k_init_driver(void) { diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c index 0b5bf135b090..062d71434e47 100644 --- a/drivers/cpufreq/powernow-k8.c +++ b/drivers/cpufreq/powernow-k8.c @@ -1171,7 +1171,8 @@ static struct cpufreq_driver cpufreq_amd64_driver = { static void __request_acpi_cpufreq(void) { - const char *cur_drv, *drv = "acpi-cpufreq"; + const char drv[] = "acpi-cpufreq"; + const char *cur_drv; cur_drv = cpufreq_get_current_driver(); if (!cur_drv) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index a0ea3241c651..1f178b878e42 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -2446,7 +2446,7 @@ EXPORT_SYMBOL(drm_fb_helper_hotplug_event); int __init drm_fb_helper_modinit(void) { #if defined(CONFIG_FRAMEBUFFER_CONSOLE_MODULE) && !defined(CONFIG_EXPERT) - const char *name = "fbcon"; + const char name[] = "fbcon"; struct module *fbcon; mutex_lock(&module_mutex); diff --git a/drivers/net/ethernet/amd/atarilance.c b/drivers/net/ethernet/amd/atarilance.c index 796c37a5bbde..c5b81268c284 100644 --- a/drivers/net/ethernet/amd/atarilance.c +++ b/drivers/net/ethernet/amd/atarilance.c @@ -42,8 +42,8 @@ */ -static char version[] = "atarilance.c: v1.3 04/04/96 " - "Roman.Hodek@informatik.uni-erlangen.de\n"; +static const char version[] = "atarilance.c: v1.3 04/04/96 " + "Roman.Hodek@informatik.uni-erlangen.de\n"; #include #include diff --git a/drivers/net/ethernet/amd/declance.c b/drivers/net/ethernet/amd/declance.c index 
6c98901f1b89..82cc81385033 100644 --- a/drivers/net/ethernet/amd/declance.c +++ b/drivers/net/ethernet/amd/declance.c @@ -72,7 +72,7 @@ #include #include -static char version[] = +static const char version[] = "declance.c: v0.011 by Linux MIPS DECstation task force\n"; MODULE_AUTHOR("Linux MIPS DECstation task force"); diff --git a/drivers/net/ethernet/amd/sun3lance.c b/drivers/net/ethernet/amd/sun3lance.c index 12bb4f1489fc..77b1db267730 100644 --- a/drivers/net/ethernet/amd/sun3lance.c +++ b/drivers/net/ethernet/amd/sun3lance.c @@ -21,7 +21,8 @@ */ -static char *version = "sun3lance.c: v1.2 1/12/2001 Sam Creasey (sammy@sammy.net)\n"; +static const char version[] = +"sun3lance.c: v1.2 1/12/2001 Sam Creasey (sammy@sammy.net)\n"; #include #include diff --git a/drivers/net/ethernet/cirrus/mac89x0.c b/drivers/net/ethernet/cirrus/mac89x0.c index b600fbbbf679..f910f0f386d6 100644 --- a/drivers/net/ethernet/cirrus/mac89x0.c +++ b/drivers/net/ethernet/cirrus/mac89x0.c @@ -56,7 +56,7 @@ local_irq_{dis,en}able() */ -static char *version = +static const char version[] = "cs89x0.c:v1.02 11/26/96 Russell Nelson \n"; /* ======================= configure the driver here ======================= */ diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h index 24dfba53a0f2..bbc0a98e7ca3 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h @@ -405,7 +405,7 @@ struct mac_driver { }; struct mac_stats_string { - char desc[ETH_GSTRING_LEN]; + const char desc[ETH_GSTRING_LEN]; unsigned long offset; }; diff --git a/drivers/net/ethernet/natsemi/sonic.h b/drivers/net/ethernet/natsemi/sonic.h index 07091dd27e5d..7b0a8db57af9 100644 --- a/drivers/net/ethernet/natsemi/sonic.h +++ b/drivers/net/ethernet/natsemi/sonic.h @@ -444,7 +444,7 @@ static inline __u16 sonic_rra_get(struct net_device* dev, int entry, (entry * SIZEOF_SONIC_RR) + offset); } -static const char *version 
= +static const char version[] = "sonic.c:v0.92 20.9.98 tsbogend@alpha.franken.de\n"; #endif /* SONIC_H */ diff --git a/drivers/net/ethernet/toshiba/tc35815.c b/drivers/net/ethernet/toshiba/tc35815.c index 3dadee1080b9..d9db8a06afd2 100644 --- a/drivers/net/ethernet/toshiba/tc35815.c +++ b/drivers/net/ethernet/toshiba/tc35815.c @@ -23,7 +23,7 @@ */ #define DRV_VERSION "1.39" -static const char *version = "tc35815.c:v" DRV_VERSION "\n"; +static const char version[] = "tc35815.c:v" DRV_VERSION "\n"; #define MODNAME "tc35815" #include diff --git a/drivers/net/fddi/defxx.c b/drivers/net/fddi/defxx.c index b0de8ecd7fe8..f4a816cf012a 100644 --- a/drivers/net/fddi/defxx.c +++ b/drivers/net/fddi/defxx.c @@ -228,7 +228,7 @@ #define DRV_VERSION "v1.11" #define DRV_RELDATE "2014/07/01" -static char version[] = +static const char version[] = DRV_NAME ": " DRV_VERSION " " DRV_RELDATE " Lawrence V. Stefani and others\n"; diff --git a/drivers/net/hippi/rrunner.c b/drivers/net/hippi/rrunner.c index dd7fc6659ad4..9b0d6148e994 100644 --- a/drivers/net/hippi/rrunner.c +++ b/drivers/net/hippi/rrunner.c @@ -60,7 +60,8 @@ MODULE_AUTHOR("Jes Sorensen "); MODULE_DESCRIPTION("Essential RoadRunner HIPPI driver"); MODULE_LICENSE("GPL"); -static char version[] = "rrunner.c: v0.50 11/11/2002 Jes Sorensen (jes@wildopensource.com)\n"; +static const char version[] = +"rrunner.c: v0.50 11/11/2002 Jes Sorensen (jes@wildopensource.com)\n"; static const struct net_device_ops rr_netdev_ops = { diff --git a/drivers/staging/most/mostcore/core.c b/drivers/staging/most/mostcore/core.c index 675b2a9e66c1..069269db394c 100644 --- a/drivers/staging/most/mostcore/core.c +++ b/drivers/staging/most/mostcore/core.c @@ -82,7 +82,7 @@ struct most_inst_obj { static const struct { int most_ch_data_type; - char *name; + const char *name; } ch_data_type[] = { { MOST_CH_CONTROL, "control\n" }, { MOST_CH_ASYNC, "async\n" }, diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c index e94aea8c0d05..7b2a466616d6 100644 
--- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -939,11 +939,11 @@ static struct n_hdlc_buf *n_hdlc_buf_get(struct n_hdlc_buf_list *buf_list) return buf; } /* end of n_hdlc_buf_get() */ -static char hdlc_banner[] __initdata = +static const char hdlc_banner[] __initconst = KERN_INFO "HDLC line discipline maxframe=%u\n"; -static char hdlc_register_ok[] __initdata = +static const char hdlc_register_ok[] __initconst = KERN_INFO "N_HDLC line discipline registered.\n"; -static char hdlc_register_fail[] __initdata = +static const char hdlc_register_fail[] __initconst = KERN_ERR "error registering line discipline: %d\n"; static int __init n_hdlc_init(void) @@ -968,9 +968,9 @@ static int __init n_hdlc_init(void) } /* end of init_module() */ -static char hdlc_unregister_ok[] __exitdata = +static const char hdlc_unregister_ok[] __exitdata = KERN_INFO "N_HDLC: line discipline unregistered\n"; -static char hdlc_unregister_fail[] __exitdata = +static const char hdlc_unregister_fail[] __exitdata = KERN_ERR "N_HDLC: can't unregister line discipline (err = %d)\n"; static void __exit n_hdlc_exit(void) diff --git a/drivers/tty/serial/st-asc.c b/drivers/tty/serial/st-asc.c index c334bcc59c64..a93a3167a9c6 100644 --- a/drivers/tty/serial/st-asc.c +++ b/drivers/tty/serial/st-asc.c @@ -986,7 +986,7 @@ static struct platform_driver asc_serial_driver = { static int __init asc_init(void) { int ret; - static char banner[] __initdata = + static const char banner[] __initconst = KERN_INFO "STMicroelectronics ASC driver initialized\n"; printk(banner); diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c index 9afa2a5030b2..405483a07efc 100644 --- a/net/decnet/af_decnet.c +++ b/net/decnet/af_decnet.c @@ -2361,7 +2361,8 @@ MODULE_AUTHOR("Linux DECnet Project Team"); MODULE_LICENSE("GPL"); MODULE_ALIAS_NETPROTO(PF_DECnet); -static char banner[] __initdata = KERN_INFO "NET4: DECnet for Linux: V.2.5.68s (C) 1995-2003 Linux DECnet Project Team\n"; +static const char banner[] 
__initconst = KERN_INFO +"NET4: DECnet for Linux: V.2.5.68s (C) 1995-2003 Linux DECnet Project Team\n"; static int __init decnet_init(void) { -- cgit From 48fbfe50f1d5fef51bac98d105d2a28df42a1205 Mon Sep 17 00:00:00 2001 From: Deepa Dinamani Date: Mon, 8 May 2017 15:59:10 -0700 Subject: fs: f2fs: use ktime_get_real_seconds for sit_info times CURRENT_TIME_SEC is not y2038 safe. Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment timestamps used by GC algorithm including the segment mtime timestamps. Link: http://lkml.kernel.org/r/1491613030-11599-2-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann Cc: Jaegeuk Kim Cc: Chao Yu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/f2fs/segment.c | 2 +- fs/f2fs/segment.h | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 13806f642ab5..87c962705550 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2573,7 +2573,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi) sit_i->dirty_sentries = 0; sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK; sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time); - sit_i->mounted_time = CURRENT_TIME_SEC.tv_sec; + sit_i->mounted_time = ktime_get_real_seconds(); mutex_init(&sit_i->sentry_lock); return 0; } diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 5e8ad4280a50..313b99040aa6 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -691,8 +691,9 @@ static inline void set_to_next_sit(struct sit_info *sit_i, unsigned int start) static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi) { struct sit_info *sit_i = SIT_I(sbi); - return sit_i->elapsed_time + CURRENT_TIME_SEC.tv_sec - - sit_i->mounted_time; + time64_t now = ktime_get_real_seconds(); + + return sit_i->elapsed_time + now - sit_i->mounted_time; } static inline void set_summary(struct f2fs_summary *sum, nid_t nid, -- cgit From 
51aad0aee5b70e26347e4d891d568518909f3452 Mon Sep 17 00:00:00 2001 From: Deepa Dinamani Date: Mon, 8 May 2017 15:59:13 -0700 Subject: trace: make trace_hwlat timestamp y2038 safe struct timespec is not y2038 safe on 32 bit machines and needs to be replaced by struct timespec64 in order to represent times beyond year 2038 on such machines. Fix all the timestamp representation in struct trace_hwlat and all the corresponding implementations. Link: http://lkml.kernel.org/r/1491613030-11599-3-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani Acked-by: Steven Rostedt (VMware) Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/trace/trace_entries.h | 6 +++--- kernel/trace/trace_hwlat.c | 14 +++++++------- kernel/trace/trace_output.c | 9 ++++----- 3 files changed, 14 insertions(+), 15 deletions(-) diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index c203ac4df791..adcdbbeae010 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -348,14 +348,14 @@ FTRACE_ENTRY(hwlat, hwlat_entry, __field( u64, duration ) __field( u64, outer_duration ) __field( u64, nmi_total_ts ) - __field_struct( struct timespec, timestamp ) - __field_desc( long, timestamp, tv_sec ) + __field_struct( struct timespec64, timestamp ) + __field_desc( s64, timestamp, tv_sec ) __field_desc( long, timestamp, tv_nsec ) __field( unsigned int, nmi_count ) __field( unsigned int, seqnum ) ), - F_printk("cnt:%u\tts:%010lu.%010lu\tinner:%llu\touter:%llunmi-ts:%llu\tnmi-count:%u\n", + F_printk("cnt:%u\tts:%010llu.%010lu\tinner:%llu\touter:%llunmi-ts:%llu\tnmi-count:%u\n", __entry->seqnum, __entry->tv_sec, __entry->tv_nsec, diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c index 21ea6ae77d93..d7c8e4ec3d9d 100644 --- a/kernel/trace/trace_hwlat.c +++ b/kernel/trace/trace_hwlat.c @@ -79,12 +79,12 @@ static u64 last_tracing_thresh = DEFAULT_LAT_THRESHOLD * NSEC_PER_USEC; /* Individual latency samples are 
stored here when detected. */ struct hwlat_sample { - u64 seqnum; /* unique sequence */ - u64 duration; /* delta */ - u64 outer_duration; /* delta (outer loop) */ - u64 nmi_total_ts; /* Total time spent in NMIs */ - struct timespec timestamp; /* wall time */ - int nmi_count; /* # NMIs during this sample */ + u64 seqnum; /* unique sequence */ + u64 duration; /* delta */ + u64 outer_duration; /* delta (outer loop) */ + u64 nmi_total_ts; /* Total time spent in NMIs */ + struct timespec64 timestamp; /* wall time */ + int nmi_count; /* # NMIs during this sample */ }; /* keep the global state somewhere. */ @@ -250,7 +250,7 @@ static int get_sample(void) s.seqnum = hwlat_data.count; s.duration = sample; s.outer_duration = outer_sample; - s.timestamp = CURRENT_TIME; + ktime_get_real_ts64(&s.timestamp); s.nmi_total_ts = nmi_total_ts; s.nmi_count = nmi_count; trace_hwlat_sample(&s); diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 02a4aeb22c47..08f9bab8089e 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -4,7 +4,6 @@ * Copyright (C) 2008 Red Hat Inc, Steven Rostedt * */ - #include #include #include @@ -1161,11 +1160,11 @@ trace_hwlat_print(struct trace_iterator *iter, int flags, trace_assign_type(field, entry); - trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%ld.%09ld", + trace_seq_printf(s, "#%-5u inner/outer(us): %4llu/%-5llu ts:%lld.%09ld", field->seqnum, field->duration, field->outer_duration, - field->timestamp.tv_sec, + (long long)field->timestamp.tv_sec, field->timestamp.tv_nsec); if (field->nmi_count) { @@ -1195,10 +1194,10 @@ trace_hwlat_raw(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - trace_seq_printf(s, "%llu %lld %ld %09ld %u\n", + trace_seq_printf(s, "%llu %lld %lld %09ld %u\n", field->duration, field->outer_duration, - field->timestamp.tv_sec, + (long long)field->timestamp.tv_sec, field->timestamp.tv_nsec, field->seqnum); -- cgit From 
e37fea58f771c2674709099a09ddafd058fef634 Mon Sep 17 00:00:00 2001
From: Deepa Dinamani
Date: Mon, 8 May 2017 15:59:16 -0700
Subject: fs: cifs: replace CURRENT_TIME by other appropriate apis

The CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all uses of CURRENT_TIME with current_time() for filesystem times, and with ktime_get_* functions for authentication timestamps and timezone calculations.

This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence makes them y2038 safe. The CURRENT_TIME macro will be deleted before merging the aforementioned change.

The inode timestamps read from the server are assumed to have correct granularity and range.

The patch also assumes that the difference between server and client times lies in the range INT_MIN..INT_MAX. This is valid because it is the difference between the current times on the server and the client, and the largest timezone difference is in the range of one day.

All cifs timestamps currently use the timespec representation internally. Authentication and timezone timestamps can also be transitioned to timespec64 when all other cifs timestamps are transitioned to timespec64.

Link: http://lkml.kernel.org/r/1491613030-11599-4-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
Cc: Steve French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 fs/cifs/cifsencrypt.c |  4 +++-
 fs/cifs/cifssmb.c     | 10 +++++-----
 fs/cifs/inode.c       | 28 +++++++++++++++-------------
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index 058ac9b36f04..68abbb0db608 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -478,6 +478,7 @@ find_timestamp(struct cifs_ses *ses)
 	unsigned char *blobptr;
 	unsigned char *blobend;
 	struct ntlmssp2_name *attrptr;
+	struct timespec ts;

 	if (!ses->auth_key.len || !ses->auth_key.response)
 		return 0;
@@ -502,7 +503,8 @@ find_timestamp(struct cifs_ses *ses)
 		blobptr += attrsize; /* advance attr value */
 	}

-	return cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
+	ktime_get_real_ts(&ts);
+	return cpu_to_le64(cifs_UnixTimeToNT(ts));
 }

 static int calc_ntlmv2_hash(struct cifs_ses *ses, char *ntlmv2_hash,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 205fd94f52fd..4c01b3f9abf0 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -478,14 +478,14 @@ decode_lanman_negprot_rsp(struct TCP_Server_Info *server, NEGOTIATE_RSP *pSMBr)
 		 * this requirement.
 		 */
 		int val, seconds, remain, result;
-		struct timespec ts, utc;
-		utc = CURRENT_TIME;
+		struct timespec ts;
+		unsigned long utc = ktime_get_real_seconds();
 		ts = cnvrtDosUnixTm(rsp->SrvTime.Date,
 				    rsp->SrvTime.Time, 0);
 		cifs_dbg(FYI, "SrvTime %d sec since 1970 (utc: %d) diff: %d\n",
-			 (int)ts.tv_sec, (int)utc.tv_sec,
-			 (int)(utc.tv_sec - ts.tv_sec));
-		val = (int)(utc.tv_sec - ts.tv_sec);
+			 (int)ts.tv_sec, (int)utc,
+			 (int)(utc - ts.tv_sec));
+		val = (int)(utc - ts.tv_sec);
 		seconds = abs(val);
 		result = (seconds / MIN_TZ_ADJ) * MIN_TZ_ADJ;
 		remain = seconds % MIN_TZ_ADJ;
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index b261db34103c..c3b2fa0b2ec8 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -322,9 +322,9 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb)
 	fattr->cf_mode = S_IFDIR | S_IXUGO | S_IRWXU;
 	fattr->cf_uid = cifs_sb->mnt_uid;
 	fattr->cf_gid = cifs_sb->mnt_gid;
-	fattr->cf_atime = CURRENT_TIME;
-	fattr->cf_ctime = CURRENT_TIME;
-	fattr->cf_mtime = CURRENT_TIME;
+	ktime_get_real_ts(&fattr->cf_mtime);
+	fattr->cf_mtime = timespec_trunc(fattr->cf_mtime, sb->s_time_gran);
+	fattr->cf_atime = fattr->cf_ctime = fattr->cf_mtime;
 	fattr->cf_nlink = 2;
 	fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL;
 }
@@ -586,9 +586,10 @@ static int cifs_sfu_mode(struct cifs_fattr *fattr, const unsigned char *path,
 /* Fill a cifs_fattr struct with info from FILE_ALL_INFO */
 static void
 cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
-		       struct cifs_sb_info *cifs_sb, bool adjust_tz,
+		       struct super_block *sb, bool adjust_tz,
 		       bool symlink)
 {
+	struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
 	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);

 	memset(fattr, 0, sizeof(*fattr));
@@ -598,8 +599,10 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,

 	if (info->LastAccessTime)
 		fattr->cf_atime = cifs_NTtimeToUnix(info->LastAccessTime);
-	else
-		fattr->cf_atime = CURRENT_TIME;
+	else {
+		ktime_get_real_ts(&fattr->cf_atime);
+
fattr->cf_atime = timespec_trunc(fattr->cf_atime, sb->s_time_gran);
+	}

 	fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime);
 	fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime);
@@ -659,7 +662,6 @@ cifs_get_file_info(struct file *filp)
 	FILE_ALL_INFO find_data;
 	struct cifs_fattr fattr;
 	struct inode *inode = file_inode(filp);
-	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 	struct cifsFileInfo *cfile = filp->private_data;
 	struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
 	struct TCP_Server_Info *server = tcon->ses->server;
@@ -671,7 +673,7 @@ cifs_get_file_info(struct file *filp)
 	rc = server->ops->query_file_info(xid, tcon, &cfile->fid, &find_data);
 	switch (rc) {
 	case 0:
-		cifs_all_info_to_fattr(&fattr, &find_data, cifs_sb, false,
+		cifs_all_info_to_fattr(&fattr, &find_data, inode->i_sb, false,
 				       false);
 		break;
 	case -EREMOTE:
@@ -753,7 +755,7 @@ cifs_get_inode_info(struct inode **inode, const char *full_path,
 	}

 	if (!rc) {
-		cifs_all_info_to_fattr(&fattr, data, cifs_sb, adjust_tz,
+		cifs_all_info_to_fattr(&fattr, data, sb, adjust_tz,
 				       symlink);
 	} else if (rc == -EREMOTE) {
 		cifs_create_dfs_fattr(&fattr, sb);
@@ -1363,9 +1365,9 @@ out_reval:
 		cifs_inode = CIFS_I(inode);
 		cifs_inode->time = 0;	/* will force revalidate to get info
 					   when needed */
-		inode->i_ctime = current_fs_time(sb);
+		inode->i_ctime = current_time(inode);
 	}
-	dir->i_ctime = dir->i_mtime = current_fs_time(sb);
+	dir->i_ctime = dir->i_mtime = current_time(dir);
 	cifs_inode = CIFS_I(dir);
 	CIFS_I(dir)->time = 0;	/* force revalidate of dir as well */
 unlink_out:
@@ -1633,7 +1635,7 @@ int cifs_rmdir(struct inode *inode, struct dentry *direntry)
 	cifsInode->time = 0;

 	d_inode(direntry)->i_ctime = inode->i_ctime = inode->i_mtime =
-		current_fs_time(inode->i_sb);
+		current_time(inode);

 rmdir_exit:
 	kfree(full_path);
@@ -1806,7 +1808,7 @@ unlink_target:
 	CIFS_I(source_dir)->time = CIFS_I(target_dir)->time = 0;

 	source_dir->i_ctime = source_dir->i_mtime = target_dir->i_ctime =
-		target_dir->i_mtime =
current_fs_time(source_dir->i_sb);
+		target_dir->i_mtime = current_time(source_dir);

 cifs_rename_exit:
 	kfree(info_buf_source);
-- cgit

From 1134e091006a61d7ea4c33748b598972d1edc5c4 Mon Sep 17 00:00:00 2001
From: Deepa Dinamani
Date: Mon, 8 May 2017 15:59:19 -0700
Subject: fs: ceph: CURRENT_TIME with ktime_get_real_ts()

CURRENT_TIME is not y2038 safe. The macro will be deleted and all the references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe. Retain timespec for timestamp representation here as ceph uses it internally everywhere. These references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts() instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc() to truncate the timestamp, using the right granularity from the superblock.

This api will be transitioned to be y2038 safe along with vfs.

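As a rough illustration of the truncation step described in the ceph message above, the following userspace sketch rounds a timestamp down to a given granularity. It is not the kernel's timespec_trunc(); the helper name trunc_to_gran() and the constant SKETCH_NSEC_PER_SEC are assumptions made only for this example.

```c
#include <time.h>

/* Assumed name for this sketch: nanoseconds in one second. */
#define SKETCH_NSEC_PER_SEC 1000000000L

/*
 * Hypothetical userspace helper, not a kernel API.
 * Rounds t.tv_nsec down to a multiple of gran (in nanoseconds).
 * A granularity of 1 ns, or any value outside 1..1 s, leaves t unchanged.
 */
static struct timespec trunc_to_gran(struct timespec t, long gran)
{
	if (gran == SKETCH_NSEC_PER_SEC)
		t.tv_nsec = 0;			/* whole-second granularity */
	else if (gran > 1 && gran < SKETCH_NSEC_PER_SEC)
		t.tv_nsec -= t.tv_nsec % gran;	/* drop the sub-granule part */
	return t;
}
```

With a granularity of 1000 ns, for example, a tv_nsec of 123456789 would be stored as 123456000, so the value sent to the server matches what a filesystem with microsecond timestamps can represent.
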
Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani
Reviewed-by: Arnd Bergmann
M: Ilya Dryomov
M: "Yan, Zheng"
M: Sage Weil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 drivers/block/rbd.c   | 2 +-
 fs/ceph/mds_client.c  | 4 +++-
 net/ceph/messenger.c  | 6 ++++--
 net/ceph/osd_client.c | 4 ++--
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 3670e8dd03fe..26812c1ed0cf 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
 {
 	struct ceph_osd_request *osd_req = obj_request->osd_req;

-	osd_req->r_mtime = CURRENT_TIME;
+	ktime_get_real_ts(&osd_req->r_mtime);
 	osd_req->r_data_offset = obj_request->offset;
 }

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index c681762d76e6..1d3fa90d40b9 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1666,6 +1666,7 @@ struct ceph_mds_request *
 ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
 {
 	struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS);
+	struct timespec ts;

 	if (!req)
 		return ERR_PTR(-ENOMEM);
@@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
 	init_completion(&req->r_safe_completion);
 	INIT_LIST_HEAD(&req->r_unsafe_item);

-	req->r_stamp = current_fs_time(mdsc->fsc->sb);
+	ktime_get_real_ts(&ts);
+	req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran);

 	req->r_op = op;
 	req->r_direct_mode = mode;
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index f76bb3332613..5766a6c896c4 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1386,8 +1386,9 @@ static void prepare_write_keepalive(struct ceph_connection *con)
 	dout("prepare_write_keepalive %p\n", con);
 	con_out_kvec_reset(con);
 	if (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2) {
-		struct timespec now =
CURRENT_TIME;
+		struct timespec now;

+		ktime_get_real_ts(&now);
 		con_out_kvec_add(con, sizeof(tag_keepalive2), &tag_keepalive2);
 		ceph_encode_timespec(&con->out_temp_keepalive2, &now);
 		con_out_kvec_add(con, sizeof(con->out_temp_keepalive2),
@@ -3176,8 +3177,9 @@ bool ceph_con_keepalive_expired(struct ceph_connection *con,
 {
 	if (interval > 0 &&
 	    (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2)) {
-		struct timespec now = CURRENT_TIME;
+		struct timespec now;
 		struct timespec ts;
+		ktime_get_real_ts(&now);
 		jiffies_to_timespec(interval, &ts);
 		ts = timespec_add(con->last_keepalive_ack, ts);
 		return timespec_compare(&now, &ts) >= 0;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index e15ea9e4c495..242d7c0d92f8 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -3574,7 +3574,7 @@ ceph_osdc_watch(struct ceph_osd_client *osdc,
 	ceph_oid_copy(&lreq->t.base_oid, oid);
 	ceph_oloc_copy(&lreq->t.base_oloc, oloc);
 	lreq->t.flags = CEPH_OSD_FLAG_WRITE;
-	lreq->mtime = CURRENT_TIME;
+	ktime_get_real_ts(&lreq->mtime);

 	lreq->reg_req = alloc_linger_request(lreq);
 	if (!lreq->reg_req) {
@@ -3632,7 +3632,7 @@ int ceph_osdc_unwatch(struct ceph_osd_client *osdc,
 	ceph_oid_copy(&req->r_base_oid, &lreq->t.base_oid);
 	ceph_oloc_copy(&req->r_base_oloc, &lreq->t.base_oloc);
 	req->r_flags = CEPH_OSD_FLAG_WRITE;
-	req->r_mtime = CURRENT_TIME;
+	ktime_get_real_ts(&req->r_mtime);
 	osd_req_op_watch_init(req, 0, lreq->linger_id,
 			      CEPH_OSD_WATCH_OP_UNWATCH);

-- cgit

From a88e99e976582814cf73acd04134f52a620f3416 Mon Sep 17 00:00:00 2001
From: Deepa Dinamani
Date: Mon, 8 May 2017 15:59:22 -0700
Subject: fs: ufs: use ktime_get_real_ts64() for birthtime

CURRENT_TIME is not y2038 safe. Replace it with ktime_get_real_ts64(). Inode time formats are already 64 bits long and accommodate time64_t.

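To illustrate why a 64 bit seconds field matters for the birthtime, a minimal userspace sketch follows. The helper fits_in_32bit_seconds() and the constant LAST_32BIT_SECOND are hypothetical names used only for this example; they are not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Last second representable in a signed 32-bit counter of seconds
 * since the Unix epoch: 2038-01-19 03:14:07 UTC.
 */
#define LAST_32BIT_SECOND INT64_C(2147483647)

/*
 * Hypothetical helper, not a kernel API: reports whether a 64-bit
 * seconds value would survive being stored in a signed 32-bit field.
 */
static bool fits_in_32bit_seconds(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}
```

One second past LAST_32BIT_SECOND no longer fits in the 32 bit field, while an int64_t (the role time64_t plays in the patch) holds it without loss.
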
Link: http://lkml.kernel.org/r/1491613030-11599-6-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani Cc: Evgeniy Dushistov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ufs/ialloc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/ufs/ialloc.c b/fs/ufs/ialloc.c index 9774555b3721..d1dd8cc33179 100644 --- a/fs/ufs/ialloc.c +++ b/fs/ufs/ialloc.c @@ -176,6 +176,7 @@ struct inode *ufs_new_inode(struct inode *dir, umode_t mode) struct ufs_cg_private_info * ucpi; struct ufs_cylinder_group * ucg; struct inode * inode; + struct timespec64 ts; unsigned cg, bit, i, j, start; struct ufs_inode_info *ufsi; int err = -ENOSPC; @@ -323,8 +324,9 @@ cg_found: lock_buffer(bh); ufs2_inode = (struct ufs2_inode *)bh->b_data; ufs2_inode += ufs_inotofsbo(inode->i_ino); - ufs2_inode->ui_birthtime = cpu_to_fs64(sb, CURRENT_TIME.tv_sec); - ufs2_inode->ui_birthnsec = cpu_to_fs32(sb, CURRENT_TIME.tv_nsec); + ktime_get_real_ts64(&ts); + ufs2_inode->ui_birthtime = cpu_to_fs64(sb, ts.tv_sec); + ufs2_inode->ui_birthnsec = cpu_to_fs32(sb, ts.tv_nsec); mark_buffer_dirty(bh); unlock_buffer(bh); if (sb->s_flags & MS_SYNCHRONOUS) -- cgit From 607a11ad947794d0f4f2c0f73c654876d1abb9b1 Mon Sep 17 00:00:00 2001 From: Deepa Dinamani Date: Mon, 8 May 2017 15:59:25 -0700 Subject: fs: ubifs: replace CURRENT_TIME_SEC with current_time CURRENT_TIME_SEC is not y2038 safe. current_time() will be transitioned to use 64 bit time along with vfs in a separate patch. There is no plan to transition CURRENT_TIME_SEC to use y2038 safe time interfaces. current_time() returns timestamps according to the granularities set in the inode's super_block. The granularity check to call current_fs_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to update inode timestamp. Use timespec_trunc during file system creation, before the first inode is created. 
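The granularity handling described above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel's timespec_trunc() itself: struct sketch_ts and sketch_trunc() are hypothetical names, and granularities outside the range of 1 ns to 1 s are simply left untouched here.

```c
/* Hypothetical stand-in for a seconds + nanoseconds timestamp. */
struct sketch_ts {
	long long tv_sec;
	long tv_nsec;
};

#define SKETCH_NSEC_PER_SEC 1000000000L

/*
 * Round a timestamp down to a multiple of `gran` nanoseconds.
 * Assumes 1 <= gran <= SKETCH_NSEC_PER_SEC; other values are ignored.
 */
static struct sketch_ts sketch_trunc(struct sketch_ts t, long gran)
{
	if (gran == SKETCH_NSEC_PER_SEC)
		t.tv_nsec = 0;                  /* second granularity: drop nanoseconds */
	else if (gran > 1 && gran < SKETCH_NSEC_PER_SEC)
		t.tv_nsec -= t.tv_nsec % gran;  /* round down to a multiple of gran */
	/* gran == 1: nanosecond granularity, nothing to truncate */
	return t;
}
```

With a one-second granularity the nanosecond part becomes zero, which matches what the removed CURRENT_TIME_SEC path produced; finer granularities keep the nanoseconds rounded down.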
Link: http://lkml.kernel.org/r/1491613030-11599-9-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann Cc: Richard Weinberger Cc: Artem Bityutskiy Cc: Adrian Hunter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ubifs/dir.c | 12 ++++++------ fs/ubifs/file.c | 12 ++++++------ fs/ubifs/ioctl.c | 2 +- fs/ubifs/misc.h | 10 ---------- fs/ubifs/sb.c | 14 ++++++++++---- fs/ubifs/xattr.c | 6 +++--- 6 files changed, 26 insertions(+), 30 deletions(-) diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index b777bddaa1dd..19fcc9a3364e 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -121,7 +121,7 @@ struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir, inode_init_owner(inode, dir, mode); inode->i_mtime = inode->i_atime = inode->i_ctime = - ubifs_current_time(inode); + current_time(inode); inode->i_mapping->nrpages = 0; switch (mode & S_IFMT) { @@ -755,7 +755,7 @@ static int ubifs_link(struct dentry *old_dentry, struct inode *dir, inc_nlink(inode); ihold(inode); - inode->i_ctime = ubifs_current_time(inode); + inode->i_ctime = current_time(inode); dir->i_size += sz_change; dir_ui->ui_size = dir->i_size; dir->i_mtime = dir->i_ctime = inode->i_ctime; @@ -830,7 +830,7 @@ static int ubifs_unlink(struct inode *dir, struct dentry *dentry) } lock_2_inodes(dir, inode); - inode->i_ctime = ubifs_current_time(dir); + inode->i_ctime = current_time(dir); drop_nlink(inode); dir->i_size -= sz_change; dir_ui->ui_size = dir->i_size; @@ -934,7 +934,7 @@ static int ubifs_rmdir(struct inode *dir, struct dentry *dentry) } lock_2_inodes(dir, inode); - inode->i_ctime = ubifs_current_time(dir); + inode->i_ctime = current_time(dir); clear_nlink(inode); drop_nlink(dir); dir->i_size -= sz_change; @@ -1411,7 +1411,7 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry, * Like most other Unix systems, set the @i_ctime for inodes on a * rename. 
*/ - time = ubifs_current_time(old_dir); + time = current_time(old_dir); old_inode->i_ctime = time; /* We must adjust parent link count when renaming directories */ @@ -1584,7 +1584,7 @@ static int ubifs_xrename(struct inode *old_dir, struct dentry *old_dentry, lock_4_inodes(old_dir, new_dir, NULL, NULL); - time = ubifs_current_time(old_dir); + time = current_time(old_dir); fst_inode->i_ctime = time; snd_inode->i_ctime = time; old_dir->i_mtime = old_dir->i_ctime = time; diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index d9ae86f96df7..2cda3d67e2d0 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1196,7 +1196,7 @@ static int do_truncation(struct ubifs_info *c, struct inode *inode, mutex_lock(&ui->ui_mutex); ui->ui_size = inode->i_size; /* Truncation changes inode [mc]time */ - inode->i_mtime = inode->i_ctime = ubifs_current_time(inode); + inode->i_mtime = inode->i_ctime = current_time(inode); /* Other attributes may be changed at the same time as well */ do_attr_changes(inode, attr); err = ubifs_jnl_truncate(c, inode, old_size, new_size); @@ -1243,7 +1243,7 @@ static int do_setattr(struct ubifs_info *c, struct inode *inode, mutex_lock(&ui->ui_mutex); if (attr->ia_valid & ATTR_SIZE) { /* Truncation changes inode [mc]time */ - inode->i_mtime = inode->i_ctime = ubifs_current_time(inode); + inode->i_mtime = inode->i_ctime = current_time(inode); /* 'truncate_setsize()' changed @i_size, update @ui_size */ ui->ui_size = inode->i_size; } @@ -1420,7 +1420,7 @@ int ubifs_update_time(struct inode *inode, struct timespec *time, */ static int update_mctime(struct inode *inode) { - struct timespec now = ubifs_current_time(inode); + struct timespec now = current_time(inode); struct ubifs_inode *ui = ubifs_inode(inode); struct ubifs_info *c = inode->i_sb->s_fs_info; @@ -1434,7 +1434,7 @@ static int update_mctime(struct inode *inode) return err; mutex_lock(&ui->ui_mutex); - inode->i_mtime = inode->i_ctime = ubifs_current_time(inode); + inode->i_mtime = inode->i_ctime = 
current_time(inode); release = ui->dirty; mark_inode_dirty_sync(inode); mutex_unlock(&ui->ui_mutex); @@ -1511,7 +1511,7 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf) struct page *page = vmf->page; struct inode *inode = file_inode(vmf->vma->vm_file); struct ubifs_info *c = inode->i_sb->s_fs_info; - struct timespec now = ubifs_current_time(inode); + struct timespec now = current_time(inode); struct ubifs_budget_req req = { .new_page = 1 }; int err, update_time; @@ -1579,7 +1579,7 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf) struct ubifs_inode *ui = ubifs_inode(inode); mutex_lock(&ui->ui_mutex); - inode->i_mtime = inode->i_ctime = ubifs_current_time(inode); + inode->i_mtime = inode->i_ctime = current_time(inode); release = ui->dirty; mark_inode_dirty_sync(inode); mutex_unlock(&ui->ui_mutex); diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c index da519ba205f6..12b9eb5005ff 100644 --- a/fs/ubifs/ioctl.c +++ b/fs/ubifs/ioctl.c @@ -126,7 +126,7 @@ static int setflags(struct inode *inode, int flags) ui->flags = ioctl2ubifs(flags); ubifs_set_inode_flags(inode); - inode->i_ctime = ubifs_current_time(inode); + inode->i_ctime = current_time(inode); release = ui->dirty; mark_inode_dirty_sync(inode); mutex_unlock(&ui->ui_mutex); diff --git a/fs/ubifs/misc.h b/fs/ubifs/misc.h index 8ece6ca58c0b..caf83d68fb38 100644 --- a/fs/ubifs/misc.h +++ b/fs/ubifs/misc.h @@ -224,16 +224,6 @@ static inline void *ubifs_idx_key(const struct ubifs_info *c, return (void *)((struct ubifs_branch *)idx->branches)->key; } -/** - * ubifs_current_time - round current time to time granularity. - * @inode: inode - */ -static inline struct timespec ubifs_current_time(struct inode *inode) -{ - return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ? - current_fs_time(inode->i_sb) : CURRENT_TIME_SEC; -} - /** * ubifs_tnc_lookup - look up a file-system node. 
* @c: UBIFS file-system description object diff --git a/fs/ubifs/sb.c b/fs/ubifs/sb.c index 7f1ead29e727..8c25081a5109 100644 --- a/fs/ubifs/sb.c +++ b/fs/ubifs/sb.c @@ -84,6 +84,8 @@ static int create_default_filesystem(struct ubifs_info *c) int min_leb_cnt = UBIFS_MIN_LEB_CNT; long long tmp64, main_bytes; __le64 tmp_le64; + __le32 tmp_le32; + struct timespec ts; /* Some functions called from here depend on the @c->key_len filed */ c->key_len = UBIFS_SK_LEN; @@ -298,13 +300,17 @@ static int create_default_filesystem(struct ubifs_info *c) ino->ch.node_type = UBIFS_INO_NODE; ino->creat_sqnum = cpu_to_le64(++c->max_sqnum); ino->nlink = cpu_to_le32(2); - tmp_le64 = cpu_to_le64(CURRENT_TIME_SEC.tv_sec); + + ktime_get_real_ts(&ts); + ts = timespec_trunc(ts, DEFAULT_TIME_GRAN); + tmp_le64 = cpu_to_le64(ts.tv_sec); ino->atime_sec = tmp_le64; ino->ctime_sec = tmp_le64; ino->mtime_sec = tmp_le64; - ino->atime_nsec = 0; - ino->ctime_nsec = 0; - ino->mtime_nsec = 0; + tmp_le32 = cpu_to_le32(ts.tv_nsec); + ino->atime_nsec = tmp_le32; + ino->ctime_nsec = tmp_le32; + ino->mtime_nsec = tmp_le32; ino->mode = cpu_to_le32(S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO); ino->size = cpu_to_le64(UBIFS_INO_NODE_SZ); diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c index efe00fcb8b75..3e53fdbf7997 100644 --- a/fs/ubifs/xattr.c +++ b/fs/ubifs/xattr.c @@ -152,7 +152,7 @@ static int create_xattr(struct ubifs_info *c, struct inode *host, ui->data_len = size; mutex_lock(&host_ui->ui_mutex); - host->i_ctime = ubifs_current_time(host); + host->i_ctime = current_time(host); host_ui->xattr_cnt += 1; host_ui->xattr_size += CALC_DENT_SIZE(fname_len(nm)); host_ui->xattr_size += CALC_XATTR_BYTES(size); @@ -234,7 +234,7 @@ static int change_xattr(struct ubifs_info *c, struct inode *host, mutex_unlock(&ui->ui_mutex); mutex_lock(&host_ui->ui_mutex); - host->i_ctime = ubifs_current_time(host); + host->i_ctime = current_time(host); host_ui->xattr_size -= CALC_XATTR_BYTES(old_size); host_ui->xattr_size += 
CALC_XATTR_BYTES(size); @@ -488,7 +488,7 @@ static int remove_xattr(struct ubifs_info *c, struct inode *host, return err; mutex_lock(&host_ui->ui_mutex); - host->i_ctime = ubifs_current_time(host); + host->i_ctime = current_time(host); host_ui->xattr_cnt -= 1; host_ui->xattr_size -= CALC_DENT_SIZE(fname_len(nm)); host_ui->xattr_size -= CALC_XATTR_BYTES(ui->data_len); -- cgit From 47f38c539e9a42344ff5a664942075bd4df93876 Mon Sep 17 00:00:00 2001 From: Deepa Dinamani Date: Mon, 8 May 2017 15:59:28 -0700 Subject: lustre: replace CURRENT_TIME macro CURRENT_TIME macro is not y2038 safe on 32 bit systems. The patch replaces all the uses of CURRENT_TIME by current_time() for filesystem times, and ktime_get_* functions for others. struct timespec is also not y2038 safe. Retain timespec for timestamp representation here as lustre uses it internally everywhere. These references will be changed to use struct timespec64 in a separate patch. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. current_time() is also planned to be transitioned to y2038 safe behavior along with this change. CURRENT_TIME macro will be deleted before merging the aforementioned change. 
Link: http://lkml.kernel.org/r/1491613030-11599-10-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani Cc: Oleg Drokin Cc: Andreas Dilger Cc: James Simmons Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/staging/lustre/lustre/llite/llite_lib.c | 6 +++--- drivers/staging/lustre/lustre/osc/osc_io.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c index 11b5a8d36415..ca5040c69217 100644 --- a/drivers/staging/lustre/lustre/llite/llite_lib.c +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c @@ -1454,17 +1454,17 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, bool hsm_import) /* We mark all of the fields "set" so MDS/OST does not re-set them */ if (attr->ia_valid & ATTR_CTIME) { - attr->ia_ctime = CURRENT_TIME; + attr->ia_ctime = current_time(inode); attr->ia_valid |= ATTR_CTIME_SET; } if (!(attr->ia_valid & ATTR_ATIME_SET) && (attr->ia_valid & ATTR_ATIME)) { - attr->ia_atime = CURRENT_TIME; + attr->ia_atime = current_time(inode); attr->ia_valid |= ATTR_ATIME_SET; } if (!(attr->ia_valid & ATTR_MTIME_SET) && (attr->ia_valid & ATTR_MTIME)) { - attr->ia_mtime = CURRENT_TIME; + attr->ia_mtime = current_time(inode); attr->ia_valid |= ATTR_MTIME_SET; } diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c index f991bee81b37..cbab80092442 100644 --- a/drivers/staging/lustre/lustre/osc/osc_io.c +++ b/drivers/staging/lustre/lustre/osc/osc_io.c @@ -216,7 +216,7 @@ static int osc_io_submit(const struct lu_env *env, struct cl_object *obj = ios->cis_obj; cl_object_attr_lock(obj); - attr->cat_mtime = LTIME_S(CURRENT_TIME); + attr->cat_mtime = ktime_get_real_seconds(); attr->cat_ctime = attr->cat_mtime; cl_object_attr_update(env, obj, attr, CAT_MTIME | CAT_CTIME); cl_object_attr_unlock(obj); @@ -256,7 +256,7 @@ static void osc_page_touch_at(const struct lu_env 
*env, kms > loi->loi_kms ? "" : "not ", loi->loi_kms, kms, loi->loi_lvb.lvb_size); - attr->cat_ctime = LTIME_S(CURRENT_TIME); + attr->cat_ctime = ktime_get_real_seconds(); attr->cat_mtime = attr->cat_ctime; valid = CAT_MTIME | CAT_CTIME; if (kms > loi->loi_kms) { -- cgit From 24d0d03c2edcd24906cf04fe4f41fa619f1fe632 Mon Sep 17 00:00:00 2001 From: Deepa Dinamani Date: Mon, 8 May 2017 15:59:31 -0700 Subject: apparmorfs: replace CURRENT_TIME with current_time() CURRENT_TIME macro is not y2038 safe on 32 bit systems. The patch replaces all the uses of CURRENT_TIME by current_time(). This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. current_time() is also planned to be transitioned to y2038 safe behavior along with this change. CURRENT_TIME macro will be deleted before merging the aforementioned change. Link: http://lkml.kernel.org/r/1491613030-11599-11-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani Acked-by: John Johansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- security/apparmor/apparmorfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c index be0b49897a67..4f6ac9dbc65d 100644 --- a/security/apparmor/apparmorfs.c +++ b/security/apparmor/apparmorfs.c @@ -1357,7 +1357,7 @@ static int aa_mk_null_file(struct dentry *parent) inode->i_ino = get_next_ino(); inode->i_mode = S_IFCHR | S_IRUGO | S_IWUGO; - inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode); init_special_inode(inode, S_IFCHR | S_IRUGO | S_IWUGO, MKDEV(MEM_MAJOR, 3)); d_instantiate(dentry, inode); -- cgit From b32c8c7648d2fa6ed689fc688ed74baa22f12ca0 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Mon, 8 May 2017 15:59:34 -0700 Subject: gfs2: replace CURRENT_TIME with current_time Link: 
http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/gfs2/bmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index 3814a60e0aea..4d810be532dd 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -1072,7 +1072,7 @@ out_unlock: /* Every transaction boundary, we rewrite the dinode to keep its di_blocks current in case of failure. */ ip->i_inode.i_mtime = ip->i_inode.i_ctime = - CURRENT_TIME; + current_time(&ip->i_inode); gfs2_trans_add_meta(ip->i_gl, dibh); gfs2_dinode_out(ip, dibh->b_data); up_write(&ip->i_rw_mutex); @@ -1293,7 +1293,7 @@ static int trunc_dealloc(struct gfs2_inode *ip, u64 newsize) gfs2_statfs_change(sdp, 0, +btotal, 0); gfs2_quota_change(ip, -(s64)btotal, ip->i_inode.i_uid, ip->i_inode.i_gid); - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = current_time(&ip->i_inode); gfs2_trans_add_meta(ip->i_gl, dibh); gfs2_dinode_out(ip, dibh->b_data); up_write(&ip->i_rw_mutex); -- cgit From bfe1c566453a0979c0b3cd3728d0de962272f034 Mon Sep 17 00:00:00 2001 From: Deepa Dinamani Date: Mon, 8 May 2017 15:59:37 -0700 Subject: time: delete CURRENT_TIME_SEC and CURRENT_TIME All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced by other time functions. These macros are also not y2038 safe. And, all their use cases can be fulfilled by y2038 safe ktime_get_* variants. 
Link: http://lkml.kernel.org/r/1491613030-11599-12-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani Reviewed-by: Arnd Bergmann Acked-by: John Stultz Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/time.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/include/linux/time.h b/include/linux/time.h index 23f0f5ce3090..c0543f5f25de 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -151,9 +151,6 @@ static inline bool timespec_inject_offset_valid(const struct timespec *ts) return true; } -#define CURRENT_TIME (current_kernel_time()) -#define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 }) - /* Some architectures do not supply their own clocksource. * This is mainly the case in architectures that get their * inter-tick times by reading the counter on their interval -- cgit From c14a6eb44d8a59337433961d181ca953fb20d083 Mon Sep 17 00:00:00 2001 From: Oliver O'Halloran Date: Mon, 8 May 2017 15:59:40 -0700 Subject: mm/huge_memory.c: use zap_deposited_table() more Depending on the flags of the PMD being zapped there may or may not be a deposited pgtable to be freed. In two of the three cases this is open coded while the third uses the zap_deposited_table() helper. This patch converts the others to use the helper to clean things up a bit. Link: http://lkml.kernel.org/r/20170411174233.21902-2-oohall@gmail.com Cc: Reza Arbab Cc: Balbir Singh Cc: linux-nvdimm@ml01.01.org Cc: Oliver O'Halloran Cc: Aneesh Kumar K.V Cc: "Kirill A. 
Shutemov" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/huge_memory.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b787c4cfda0e..aa01dd47cc65 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1615,8 +1615,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); } else if (is_huge_zero_pmd(orig_pmd)) { - pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd)); - atomic_long_dec(&tlb->mm->nr_ptes); + zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); } else { @@ -1625,10 +1624,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); if (PageAnon(page)) { - pgtable_t pgtable; - pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd); - pte_free(tlb->mm, pgtable); - atomic_long_dec(&tlb->mm->nr_ptes); + zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { if (arch_needs_pgtable_deposit()) -- cgit From 3b6521f53572d7fc1b40c93931948716a53a82ab Mon Sep 17 00:00:00 2001 From: Oliver O'Halloran Date: Mon, 8 May 2017 15:59:43 -0700 Subject: mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required Although all architectures use a deposited page table for THP on anonymous VMAs, some architectures (s390 and powerpc) require the deposited storage even for file backed VMAs due to quirks of their MMUs. This patch adds support for depositing a table in DAX PMD fault handling path for archs that require it. Other architectures should see no functional changes. 
Link: http://lkml.kernel.org/r/20170411174233.21902-3-oohall@gmail.com Signed-off-by: Oliver O'Halloran Cc: Reza Arbab Cc: Balbir Singh Cc: linux-nvdimm@ml01.01.org Cc: Oliver O'Halloran Cc: Aneesh Kumar K.V Cc: "Kirill A. Shutemov" Cc: Martin Schwidefsky Cc: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/huge_memory.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index aa01dd47cc65..a84909cf20d3 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -715,7 +715,8 @@ int do_huge_pmd_anonymous_page(struct vm_fault *vmf) } static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write) + pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write, + pgtable_t pgtable) { struct mm_struct *mm = vma->vm_mm; pmd_t entry; @@ -729,6 +730,12 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, entry = pmd_mkyoung(pmd_mkdirty(entry)); entry = maybe_pmd_mkwrite(entry, vma); } + + if (pgtable) { + pgtable_trans_huge_deposit(mm, pmd, pgtable); + atomic_long_inc(&mm->nr_ptes); + } + set_pmd_at(mm, addr, pmd, entry); update_mmu_cache_pmd(vma, addr, pmd); spin_unlock(ptl); @@ -738,6 +745,7 @@ int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, pfn_t pfn, bool write) { pgprot_t pgprot = vma->vm_page_prot; + pgtable_t pgtable = NULL; /* * If we had pmd_special, we could avoid all these restrictions, * but we need to be consistent with PTEs and architectures that @@ -752,9 +760,15 @@ int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, if (addr < vma->vm_start || addr >= vma->vm_end) return VM_FAULT_SIGBUS; + if (arch_needs_pgtable_deposit()) { + pgtable = pte_alloc_one(vma->vm_mm, addr); + if (!pgtable) + return VM_FAULT_OOM; + } + track_pfn_insert(vma, &pgprot, pfn); - insert_pfn_pmd(vma, addr, pmd, pfn, pgprot, write); + insert_pfn_pmd(vma, addr, 
pmd, pfn, pgprot, write, pgtable); return VM_FAULT_NOPAGE; } EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd); @@ -1611,6 +1625,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, tlb->fullmm); tlb_remove_pmd_tlb_entry(tlb, pmd, addr); if (vma_is_dax(vma)) { + if (arch_needs_pgtable_deposit()) + zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); -- cgit From 62be1511b1db8066220b18b7d4da2e6b9fdc69fb Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 8 May 2017 15:59:46 -0700 Subject: mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC Patch series "more robust PF_MEMALLOC handling" This series aims to unify the setting and clearing of PF_MEMALLOC, which prevents recursive reclaim. There are some places that clear the flag unconditionally from current->flags, which may result in clearing a pre-existing flag. This already resulted in a bug report that Patch 1 fixes (without the new helpers, to make backporting easier). Patch 2 introduces the new helpers, modelled after existing memalloc_noio_* and memalloc_nofs_* helpers, and converts mm core to use them. Patches 3 and 4 convert non-mm code. This patch (of 4): __alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock during page migration by lock_page() (see the comment in __unmap_and_move()). Then it unconditionally clears the flag, which can clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim. This was not a problem until commit a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath"), because direct compaction was called only after direct reclaim, which was skipped when PF_MEMALLOC flag was set.
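The hazard described above can be modelled in a few lines of userspace C. This is only a sketch: sketch_flags stands in for current->flags, SKETCH_PF_MEMALLOC is an illustrative bit value, and both function names are hypothetical. The first function mirrors the unconditional clear, the second mirrors the save-and-restore form.

```c
/* Illustrative stand-in for the PF_MEMALLOC bit (value is an assumption). */
#define SKETCH_PF_MEMALLOC 0x0800u

/* Stand-in for current->flags. */
static unsigned int sketch_flags;

/* Old pattern: set the bit, do the work, clear the bit unconditionally. */
static void sketch_section_unconditional(void)
{
	sketch_flags |= SKETCH_PF_MEMALLOC;
	/* ... work that must not recurse into reclaim ... */
	sketch_flags &= ~SKETCH_PF_MEMALLOC;   /* loses a bit the caller had set */
}

/* Fixed pattern: remember the bit's previous state and put it back. */
static void sketch_section_preserving(void)
{
	unsigned int saved = sketch_flags & SKETCH_PF_MEMALLOC;

	sketch_flags |= SKETCH_PF_MEMALLOC;
	/* ... work that must not recurse into reclaim ... */
	sketch_flags = (sketch_flags & ~SKETCH_PF_MEMALLOC) | saved;
}
```

If the caller enters with the bit already set, the unconditional variant returns with it cleared, while the preserving variant returns with the caller's state intact in both the set and the unset case.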
Even now it's only a theoretical issue, as the new callsite of __alloc_pages_direct_compact() is reached only for costly orders and when gfp_pfmemalloc_allowed() is true, which means either __GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true. There is no such known context, but let's play it safe and make __alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is already set. Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath") Link: http://lkml.kernel.org/r/20170405074700.29871-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka Reported-by: Andrey Ryabinin Acked-by: Michal Hocko Acked-by: Hillf Danton Cc: Mel Gorman Cc: Johannes Weiner Cc: Boris Brezillon Cc: Chris Leech Cc: "David S. Miller" Cc: Eric Dumazet Cc: Josef Bacik Cc: Lee Duncan Cc: Michal Hocko Cc: Richard Weinberger Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e7486afa7fa7..1daf509722c7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3283,6 +3283,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, enum compact_priority prio, enum compact_result *compact_result) { struct page *page; + unsigned int noreclaim_flag = current->flags & PF_MEMALLOC; if (!order) return NULL; @@ -3290,7 +3291,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, current->flags |= PF_MEMALLOC; *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, prio); - current->flags &= ~PF_MEMALLOC; + current->flags = (current->flags & ~PF_MEMALLOC) | noreclaim_flag; if (*compact_result <= COMPACT_INACTIVE) return NULL; -- cgit From 499118e966f1d2150bd66647c8932343c4e9a0b8 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 8 May 2017 15:59:50 -0700 Subject: mm: introduce memalloc_noreclaim_{save,restore} The previous patch ("mm: prevent potential recursive reclaim due to clearing 
PF_MEMALLOC") has shown that simply setting and clearing PF_MEMALLOC in current->flags can result in wrongly clearing a pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim. Let's introduce helpers that support proper nesting by saving the previous state of the flag, similar to the existing memalloc_noio_* and memalloc_nofs_* helpers. Convert existing setting/clearing of PF_MEMALLOC within mm to the new helpers. There are no known issues with the converted code, but the change makes it more robust. Link: http://lkml.kernel.org/r/20170405074700.29871-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka Suggested-by: Michal Hocko Acked-by: Michal Hocko Acked-by: Hillf Danton Cc: Mel Gorman Cc: Johannes Weiner Cc: Andrey Ryabinin Cc: Boris Brezillon Cc: Chris Leech Cc: "David S. Miller" Cc: Eric Dumazet Cc: Josef Bacik Cc: Lee Duncan Cc: Michal Hocko Cc: Richard Weinberger Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sched/mm.h | 12 ++++++++++++ mm/page_alloc.c | 11 ++++++----- mm/vmscan.c | 17 +++++++++++------ 3 files changed, 29 insertions(+), 11 deletions(-) diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 9daabe138c99..2b24a6974847 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -191,4 +191,16 @@ static inline void memalloc_nofs_restore(unsigned int flags) current->flags = (current->flags & ~PF_MEMALLOC_NOFS) | flags; } +static inline unsigned int memalloc_noreclaim_save(void) +{ + unsigned int flags = current->flags & PF_MEMALLOC; + current->flags |= PF_MEMALLOC; + return flags; +} + +static inline void memalloc_noreclaim_restore(unsigned int flags) +{ + current->flags = (current->flags & ~PF_MEMALLOC) | flags; +} + #endif /* _LINUX_SCHED_MM_H */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1daf509722c7..f9e450c6b6e4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3283,15 +3283,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, enum
compact_priority prio, enum compact_result *compact_result) { struct page *page; - unsigned int noreclaim_flag = current->flags & PF_MEMALLOC; + unsigned int noreclaim_flag; if (!order) return NULL; - current->flags |= PF_MEMALLOC; + noreclaim_flag = memalloc_noreclaim_save(); *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, prio); - current->flags = (current->flags & ~PF_MEMALLOC) | noreclaim_flag; + memalloc_noreclaim_restore(noreclaim_flag); if (*compact_result <= COMPACT_INACTIVE) return NULL; @@ -3438,12 +3438,13 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, { struct reclaim_state reclaim_state; int progress; + unsigned int noreclaim_flag; cond_resched(); /* We now go into synchronous reclaim */ cpuset_memory_pressure_bump(); - current->flags |= PF_MEMALLOC; + noreclaim_flag = memalloc_noreclaim_save(); lockdep_set_current_reclaim_state(gfp_mask); reclaim_state.reclaimed_slab = 0; current->reclaim_state = &reclaim_state; @@ -3453,7 +3454,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, current->reclaim_state = NULL; lockdep_clear_current_reclaim_state(); - current->flags &= ~PF_MEMALLOC; + memalloc_noreclaim_restore(noreclaim_flag); cond_resched(); diff --git a/mm/vmscan.c b/mm/vmscan.c index 4e7ed65842af..2f45c0520f43 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3036,6 +3036,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, struct zonelist *zonelist; unsigned long nr_reclaimed; int nid; + unsigned int noreclaim_flag; struct scan_control sc = { .nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX), .gfp_mask = (current_gfp_context(gfp_mask) & GFP_RECLAIM_MASK) | @@ -3062,9 +3063,9 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, sc.gfp_mask, sc.reclaim_idx); - current->flags |= PF_MEMALLOC; + noreclaim_flag = memalloc_noreclaim_save(); nr_reclaimed = do_try_to_free_pages(zonelist, &sc); - current->flags &= ~PF_MEMALLOC; + memalloc_noreclaim_restore(noreclaim_flag); 
trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed); @@ -3589,8 +3590,9 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim) struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask); struct task_struct *p = current; unsigned long nr_reclaimed; + unsigned int noreclaim_flag; - p->flags |= PF_MEMALLOC; + noreclaim_flag = memalloc_noreclaim_save(); lockdep_set_current_reclaim_state(sc.gfp_mask); reclaim_state.reclaimed_slab = 0; p->reclaim_state = &reclaim_state; @@ -3599,7 +3601,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim) p->reclaim_state = NULL; lockdep_clear_current_reclaim_state(); - p->flags &= ~PF_MEMALLOC; + memalloc_noreclaim_restore(noreclaim_flag); return nr_reclaimed; } @@ -3764,6 +3766,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in struct task_struct *p = current; struct reclaim_state reclaim_state; int classzone_idx = gfp_zone(gfp_mask); + unsigned int noreclaim_flag; struct scan_control sc = { .nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX), .gfp_mask = (gfp_mask = current_gfp_context(gfp_mask)), @@ -3781,7 +3784,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in * and we also need to be able to write out pages for RECLAIM_WRITE * and RECLAIM_UNMAP. 
*/ - p->flags |= PF_MEMALLOC | PF_SWAPWRITE; + noreclaim_flag = memalloc_noreclaim_save(); + p->flags |= PF_SWAPWRITE; lockdep_set_current_reclaim_state(gfp_mask); reclaim_state.reclaimed_slab = 0; p->reclaim_state = &reclaim_state; @@ -3797,7 +3801,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in } p->reclaim_state = NULL; - current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE); + current->flags &= ~PF_SWAPWRITE; + memalloc_noreclaim_restore(noreclaim_flag); lockdep_clear_current_reclaim_state(); return sc.nr_reclaimed >= nr_pages; } -- cgit From f108304872b8d987ceab195174ba41153fb70bf6 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 8 May 2017 15:59:53 -0700 Subject: treewide: convert PF_MEMALLOC manipulations to new helpers We now have memalloc_noreclaim_{save,restore} helpers for robust setting and clearing of PF_MEMALLOC. Let's convert the code which was using the generic tsk_restore_flags(). No functional change. [vbabka@suse.cz: in net/core/sock.c the hunk is missing] Link: http://lkml.kernel.org/r/20170405074700.29871-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka Acked-by: Michal Hocko Cc: Josef Bacik Cc: Lee Duncan Cc: Chris Leech Cc: "David S. 
Miller" Cc: Eric Dumazet Cc: Mel Gorman Cc: Johannes Weiner Cc: Andrey Ryabinin Cc: Boris Brezillon Cc: Michal Hocko Cc: Richard Weinberger Cc: Wouter Verhelst Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/nbd.c | 7 ++++--- drivers/scsi/iscsi_tcp.c | 7 ++++--- net/core/dev.c | 7 ++++--- net/core/sock.c | 7 ++++--- 4 files changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index e9e2a9e95a66..9a7bb2c29447 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -347,7 +348,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, int send, struct socket *sock = config->socks[index]->sock; int result; struct msghdr msg; - unsigned long pflags = current->flags; + unsigned int noreclaim_flag; if (unlikely(!sock)) { dev_err_ratelimited(disk_to_dev(nbd->disk), @@ -358,7 +359,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, int send, msg.msg_iter = *iter; - current->flags |= PF_MEMALLOC; + noreclaim_flag = memalloc_noreclaim_save(); do { sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC; msg.msg_name = NULL; @@ -381,7 +382,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, int send, *sent += result; } while (msg_data_left(&msg)); - current_restore_flags(pflags, PF_MEMALLOC); + memalloc_noreclaim_restore(noreclaim_flag); return result; } diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index bbea8eac9abb..4842fc0e809d 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -371,10 +372,10 @@ static inline int iscsi_sw_tcp_xmit_qlen(struct iscsi_conn *conn) static int iscsi_sw_tcp_pdu_xmit(struct iscsi_task *task) { struct iscsi_conn *conn = task->conn; - unsigned long pflags = current->flags; + unsigned int noreclaim_flag; int rc = 0; - current->flags |= PF_MEMALLOC; + 
noreclaim_flag = memalloc_noreclaim_save(); while (iscsi_sw_tcp_xmit_qlen(conn)) { rc = iscsi_sw_tcp_xmit(conn); @@ -387,7 +388,7 @@ static int iscsi_sw_tcp_pdu_xmit(struct iscsi_task *task) rc = 0; } - current_restore_flags(pflags, PF_MEMALLOC); + memalloc_noreclaim_restore(noreclaim_flag); return rc; } diff --git a/net/core/dev.c b/net/core/dev.c index 99924d16f2bd..96cf83da0d66 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -81,6 +81,7 @@ #include #include #include +#include #include #include #include @@ -4235,7 +4236,7 @@ static int __netif_receive_skb(struct sk_buff *skb) int ret; if (sk_memalloc_socks() && skb_pfmemalloc(skb)) { - unsigned long pflags = current->flags; + unsigned int noreclaim_flag; /* * PFMEMALLOC skbs are special, they should @@ -4246,9 +4247,9 @@ static int __netif_receive_skb(struct sk_buff *skb) * Use PF_MEMALLOC as this saves us from propagating the allocation * context down to all allocation sites. */ - current->flags |= PF_MEMALLOC; + noreclaim_flag = memalloc_noreclaim_save(); ret = __netif_receive_skb_core(skb, true); - current_restore_flags(pflags, PF_MEMALLOC); + memalloc_noreclaim_restore(noreclaim_flag); } else ret = __netif_receive_skb_core(skb, false); diff --git a/net/core/sock.c b/net/core/sock.c index b5baeb9cb0fb..79c6aee6af9b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -102,6 +102,7 @@ #include #include #include +#include #include #include #include @@ -372,14 +373,14 @@ EXPORT_SYMBOL_GPL(sk_clear_memalloc); int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb) { int ret; - unsigned long pflags = current->flags; + unsigned int noreclaim_flag; /* these should have been dropped before queueing */ BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); - current->flags |= PF_MEMALLOC; + noreclaim_flag = memalloc_noreclaim_save(); ret = sk->sk_backlog_rcv(sk, skb); - current_restore_flags(pflags, PF_MEMALLOC); + memalloc_noreclaim_restore(noreclaim_flag); return ret; } -- cgit From dcbe82149cc9d03dcdf7cd1a75d5541de7c14be1 
Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 8 May 2017 15:59:57 -0700 Subject: mtd: nand: nandsim: convert to memalloc_noreclaim_*() Nandsim has own functions set_memalloc() and clear_memalloc() for robust setting and clearing of PF_MEMALLOC. Replace them by the new generic helpers. No functional change. Link: http://lkml.kernel.org/r/20170405074700.29871-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka Cc: Boris Brezillon Cc: Richard Weinberger Cc: Michal Hocko Cc: Mel Gorman Cc: Johannes Weiner Cc: Andrey Ryabinin Cc: Chris Leech Cc: "David S. Miller" Cc: Eric Dumazet Cc: Josef Bacik Cc: Lee Duncan Cc: Michal Hocko Cc: Adrian Hunter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/mtd/nand/nandsim.c | 29 +++++++++-------------------- 1 file changed, 9 insertions(+), 20 deletions(-) diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c index c84742671a5f..092c9bd225be 100644 --- a/drivers/mtd/nand/nandsim.c +++ b/drivers/mtd/nand/nandsim.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include #include @@ -1368,31 +1369,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t return 0; } -static int set_memalloc(void) -{ - if (current->flags & PF_MEMALLOC) - return 0; - current->flags |= PF_MEMALLOC; - return 1; -} - -static void clear_memalloc(int memalloc) -{ - if (memalloc) - current->flags &= ~PF_MEMALLOC; -} - static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t pos) { ssize_t tx; - int err, memalloc; + int err; + unsigned int noreclaim_flag; err = get_pages(ns, file, count, pos); if (err) return err; - memalloc = set_memalloc(); + noreclaim_flag = memalloc_noreclaim_save(); tx = kernel_read(file, pos, buf, count); - clear_memalloc(memalloc); + memalloc_noreclaim_restore(noreclaim_flag); put_pages(ns); return tx; } @@ -1400,14 +1388,15 @@ static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_ 
static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t pos) { ssize_t tx; - int err, memalloc; + int err; + unsigned int noreclaim_flag; err = get_pages(ns, file, count, pos); if (err) return err; - memalloc = set_memalloc(); + noreclaim_flag = memalloc_noreclaim_save(); tx = kernel_write(file, buf, count, pos); - clear_memalloc(memalloc); + memalloc_noreclaim_restore(noreclaim_flag); put_pages(ns); return tx; } -- cgit From a9c42b33ed80968dd160e3be48c7e84ccf171cf9 Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Mon, 8 May 2017 16:00:00 -0700 Subject: dax: add tracepoints to dax_iomap_pte_fault() Patch series "second round of tracepoints for DAX". This second round of DAX tracepoint patches adds tracing to the PTE fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(), dax_insert_mapping()) and to the writeback path (dax_writeback_mapping_range(), dax_writeback_one()). The purpose of this tracing is to give us a high level view of what DAX is doing, whether faults are being serviced by PMDs or PTEs, and by real storage or by zero pages covering holes. I do have some patches nearly ready which also add tracing to grab_mapping_entry() and dax_insert_mapping_entry(). These are more targeted at logging how we are interacting with the radix tree, how we use empty entries for locking, whether we "downgrade" huge zero pages to 4k PTE sized allocations, etc. In the end it seemed to me that this might be too detailed to have as constantly present tracepoints, but if anyone sees value in having tracepoints like this in the DAX code permanently (Jan?), please let me know and I'll add those last two patches. All these tracepoints were done to be consistent with the style of the XFS tracepoints and with the existing DAX PMD tracepoints. This patch (of 6): Add tracepoints to dax_iomap_pte_fault(), following the same logging conventions as the rest of DAX. 
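The conversion of early returns into a single exit label is what lets the "done" event pair with every entry event, including on error paths. A minimal userspace sketch of that shape follows; trace_enter(), trace_done(), handle_fault() and the FAULT_* values are hypothetical stand-ins for illustration, not kernel APIs:

```c
#include <stdio.h>

/* Hypothetical result codes standing in for VM_FAULT_* values. */
#define FAULT_OK      0
#define FAULT_SIGBUS  2

/* Counters so a caller can check that entry and exit events pair up. */
static int nr_enter, nr_done;

/* Hypothetical stand-ins for the entry and exit tracepoints. */
static void trace_enter(long pos)
{
	nr_enter++;
	printf("fault: pos %ld\n", pos);
}

static void trace_done(long pos, int ret)
{
	nr_done++;
	printf("fault_done: pos %ld ret %d\n", pos, ret);
}

/*
 * Every exit goes through "out", so trace_done() fires exactly once
 * per call, including on the early error path.
 */
static int handle_fault(long pos, long size)
{
	int ret = FAULT_OK;

	trace_enter(pos);
	if (pos >= size) {
		ret = FAULT_SIGBUS;
		goto out;
	}
	/* ... the real work would happen here ... */
out:
	trace_done(pos, ret);
	return ret;
}
```

With a direct "return" in the error branch instead of "goto out", the entry event would be logged without a matching exit event, which makes traces harder to read.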
Here is an example fault that initially tries to be serviced by the PMD fault handler but which falls back to PTEs because the VMA isn't large enough to hold a PMD: small-1086 [005] .... 71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003 small-1086 [005] .... 71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 small-1086 [005] .... 71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK small-1086 [005] .... 71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 small-1086 [005] .... 71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Cc: Alexander Viro Cc: Dan Williams Cc: Ingo Molnar Cc: Matthew Wilcox Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dax.c | 15 +++++++++++---- include/trace/events/fs_dax.h | 41 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 52 insertions(+), 4 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 43bbd6d1037d..f6c32d831af6 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1150,13 +1150,16 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, int vmf_ret = 0; void *entry; + trace_dax_pte_fault(inode, vmf, vmf_ret); /* * Check whether offset isn't beyond end of file now. Caller is supposed * to hold locks serializing us with truncate / punch hole so this is * a reliable test. 
*/ - if (pos >= i_size_read(inode)) - return VM_FAULT_SIGBUS; + if (pos >= i_size_read(inode)) { + vmf_ret = VM_FAULT_SIGBUS; + goto out; + } if ((vmf->flags & FAULT_FLAG_WRITE) && !vmf->cow_page) flags |= IOMAP_WRITE; @@ -1167,8 +1170,10 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, * that we never have to deal with more than a single extent here. */ error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); - if (error) - return dax_fault_return(error); + if (error) { + vmf_ret = dax_fault_return(error); + goto out; + } if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) { vmf_ret = dax_fault_return(-EIO); /* fs corruption? */ goto finish_iomap; @@ -1252,6 +1257,8 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, */ ops->iomap_end(inode, pos, PAGE_SIZE, copied, flags, &iomap); } +out: + trace_dax_pte_fault_done(inode, vmf, vmf_ret); return vmf_ret; } diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h index c566ddc87f73..cbcd7d64a18d 100644 --- a/include/trace/events/fs_dax.h +++ b/include/trace/events/fs_dax.h @@ -150,6 +150,47 @@ DEFINE_EVENT(dax_pmd_insert_mapping_class, name, \ DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping); DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping_fallback); +DECLARE_EVENT_CLASS(dax_pte_fault_class, + TP_PROTO(struct inode *inode, struct vm_fault *vmf, int result), + TP_ARGS(inode, vmf, result), + TP_STRUCT__entry( + __field(unsigned long, ino) + __field(unsigned long, vm_flags) + __field(unsigned long, address) + __field(pgoff_t, pgoff) + __field(dev_t, dev) + __field(unsigned int, flags) + __field(int, result) + ), + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->vm_flags = vmf->vma->vm_flags; + __entry->address = vmf->address; + __entry->flags = vmf->flags; + __entry->pgoff = vmf->pgoff; + __entry->result = result; + ), + TP_printk("dev %d:%d ino %#lx %s %s address %#lx pgoff %#lx %s", + MAJOR(__entry->dev), + 
MINOR(__entry->dev), + __entry->ino, + __entry->vm_flags & VM_SHARED ? "shared" : "private", + __print_flags(__entry->flags, "|", FAULT_FLAG_TRACE), + __entry->address, + __entry->pgoff, + __print_flags(__entry->result, "|", VM_FAULT_RESULT_TRACE) + ) +) + +#define DEFINE_PTE_FAULT_EVENT(name) \ +DEFINE_EVENT(dax_pte_fault_class, name, \ + TP_PROTO(struct inode *inode, struct vm_fault *vmf, int result), \ + TP_ARGS(inode, vmf, result)) + +DEFINE_PTE_FAULT_EVENT(dax_pte_fault); +DEFINE_PTE_FAULT_EVENT(dax_pte_fault_done); + #endif /* _TRACE_FS_DAX_H */ /* This part must be outside protection */ -- cgit From c3ff68d7d1e6a24b7ad76d00ee583929858d4001 Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Mon, 8 May 2017 16:00:03 -0700 Subject: dax: add tracepoints to dax_pfn_mkwrite() Add tracepoints to dax_pfn_mkwrite(), following the same logging conventions as the rest of DAX. Here is an example PTE fault followed by a pfn_mkwrite: small_aligned-1094 [002] .... 374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 small_aligned-1094 [002] .... 374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE small_aligned-1094 [002] .... 
374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Cc: Alexander Viro Cc: Dan Williams Cc: Ingo Molnar Cc: Matthew Wilcox Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dax.c | 3 +++ include/trace/events/fs_dax.h | 2 ++ 2 files changed, 5 insertions(+) diff --git a/fs/dax.c b/fs/dax.c index f6c32d831af6..d10524ab7e55 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -927,6 +927,7 @@ int dax_pfn_mkwrite(struct vm_fault *vmf) { struct file *file = vmf->vma->vm_file; struct address_space *mapping = file->f_mapping; + struct inode *inode = mapping->host; void *entry, **slot; pgoff_t index = vmf->pgoff; @@ -936,6 +937,7 @@ int dax_pfn_mkwrite(struct vm_fault *vmf) if (entry) put_unlocked_mapping_entry(mapping, index, entry); spin_unlock_irq(&mapping->tree_lock); + trace_dax_pfn_mkwrite_no_entry(inode, vmf, VM_FAULT_NOPAGE); return VM_FAULT_NOPAGE; } radix_tree_tag_set(&mapping->page_tree, index, PAGECACHE_TAG_DIRTY); @@ -948,6 +950,7 @@ int dax_pfn_mkwrite(struct vm_fault *vmf) */ finish_mkwrite_fault(vmf); put_locked_mapping_entry(mapping, index, entry); + trace_dax_pfn_mkwrite(inode, vmf, VM_FAULT_NOPAGE); return VM_FAULT_NOPAGE; } EXPORT_SYMBOL_GPL(dax_pfn_mkwrite); diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h index cbcd7d64a18d..b5a520961f8d 100644 --- a/include/trace/events/fs_dax.h +++ b/include/trace/events/fs_dax.h @@ -190,6 +190,8 @@ DEFINE_EVENT(dax_pte_fault_class, name, \ DEFINE_PTE_FAULT_EVENT(dax_pte_fault); DEFINE_PTE_FAULT_EVENT(dax_pte_fault_done); +DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite_no_entry); +DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite); #endif /* _TRACE_FS_DAX_H */ -- cgit From 678c9fd0430a1431bd9901c76f41e04fcb3eac87 Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Mon, 8 May 
2017 16:00:07 -0700 Subject: dax: add tracepoints to dax_load_hole() Add tracepoints to dax_load_hole(), following the same logging conventions as the rest of DAX. Here is the logging generated by a PTE read from a hole: read-1075 [002] .... 62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 read-1075 [002] .... 62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE read-1075 [002] .... 62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Cc: Alexander Viro Cc: Dan Williams Cc: Ingo Molnar Cc: Matthew Wilcox Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dax.c | 16 +++++++++++----- include/trace/events/fs_dax.h | 1 + 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index d10524ab7e55..36fafff2e82f 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -509,21 +509,25 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, static int dax_load_hole(struct address_space *mapping, void **entry, struct vm_fault *vmf) { + struct inode *inode = mapping->host; struct page *page; int ret; /* Hole page already exists? Return it... 
*/ if (!radix_tree_exceptional_entry(*entry)) { page = *entry; - goto out; + goto finish_fault; } /* This will replace locked radix tree entry with a hole page */ page = find_or_create_page(mapping, vmf->pgoff, vmf->gfp_mask | __GFP_ZERO); - if (!page) - return VM_FAULT_OOM; - out: + if (!page) { + ret = VM_FAULT_OOM; + goto out; + } + +finish_fault: vmf->page = page; ret = finish_fault(vmf); vmf->page = NULL; @@ -531,8 +535,10 @@ static int dax_load_hole(struct address_space *mapping, void **entry, if (!ret) { /* Grab reference for PTE that is now referencing the page */ get_page(page); - return VM_FAULT_NOPAGE; + ret = VM_FAULT_NOPAGE; } +out: + trace_dax_load_hole(inode, vmf, ret); return ret; } diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h index b5a520961f8d..2f15dfea7fb1 100644 --- a/include/trace/events/fs_dax.h +++ b/include/trace/events/fs_dax.h @@ -192,6 +192,7 @@ DEFINE_PTE_FAULT_EVENT(dax_pte_fault); DEFINE_PTE_FAULT_EVENT(dax_pte_fault_done); DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite_no_entry); DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite); +DEFINE_PTE_FAULT_EVENT(dax_load_hole); #endif /* _TRACE_FS_DAX_H */ -- cgit From d14a3f48a152b75a1e690d443f509c07c7b06c0e Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Mon, 8 May 2017 16:00:10 -0700 Subject: dax: add tracepoints to dax_writeback_mapping_range() Add tracepoints to dax_writeback_mapping_range(), following the same logging conventions as the rest of DAX. Here is an example writeback call: msync-1085 [006] .... 200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff msync-1085 [006] .... 
200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff [ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()] Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Cc: Alexander Viro Cc: Dan Williams Cc: Ingo Molnar Cc: Matthew Wilcox Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dax.c | 12 +++++++----- include/trace/events/fs_dax.h | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 36fafff2e82f..7cf2686761e2 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -863,6 +863,8 @@ int dax_writeback_mapping_range(struct address_space *mapping, start_index = wbc->range_start >> PAGE_SHIFT; end_index = wbc->range_end >> PAGE_SHIFT; + trace_dax_writeback_range(inode, start_index, end_index); + tag_pages_for_writeback(mapping, start_index, end_index); pagevec_init(&pvec, 0); @@ -882,14 +884,14 @@ int dax_writeback_mapping_range(struct address_space *mapping, ret = dax_writeback_one(bdev, dax_dev, mapping, indices[i], pvec.pages[i]); - if (ret < 0) { - put_dax(dax_dev); - return ret; - } + if (ret < 0) + goto out; } } +out: put_dax(dax_dev); - return 0; + trace_dax_writeback_range_done(inode, start_index, end_index); + return (ret < 0 ? 
ret : 0); } EXPORT_SYMBOL_GPL(dax_writeback_mapping_range); diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h index 2f15dfea7fb1..9afe8c8f0bef 100644 --- a/include/trace/events/fs_dax.h +++ b/include/trace/events/fs_dax.h @@ -194,6 +194,38 @@ DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite_no_entry); DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite); DEFINE_PTE_FAULT_EVENT(dax_load_hole); +DECLARE_EVENT_CLASS(dax_writeback_range_class, + TP_PROTO(struct inode *inode, pgoff_t start_index, pgoff_t end_index), + TP_ARGS(inode, start_index, end_index), + TP_STRUCT__entry( + __field(unsigned long, ino) + __field(pgoff_t, start_index) + __field(pgoff_t, end_index) + __field(dev_t, dev) + ), + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->start_index = start_index; + __entry->end_index = end_index; + ), + TP_printk("dev %d:%d ino %#lx pgoff %#lx-%#lx", + MAJOR(__entry->dev), + MINOR(__entry->dev), + __entry->ino, + __entry->start_index, + __entry->end_index + ) +) + +#define DEFINE_WRITEBACK_RANGE_EVENT(name) \ +DEFINE_EVENT(dax_writeback_range_class, name, \ + TP_PROTO(struct inode *inode, pgoff_t start_index, pgoff_t end_index),\ + TP_ARGS(inode, start_index, end_index)) + +DEFINE_WRITEBACK_RANGE_EVENT(dax_writeback_range); +DEFINE_WRITEBACK_RANGE_EVENT(dax_writeback_range_done); + #endif /* _TRACE_FS_DAX_H */ /* This part must be outside protection */ -- cgit From f9bc3a07539bc80b4da9ff2f5d6c13d5c7a4f073 Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Mon, 8 May 2017 16:00:13 -0700 Subject: dax: add tracepoint to dax_writeback_one() Add a tracepoint to dax_writeback_one(), following the same logging conventions as the rest of DAX. Here is an example range writeback which ends up flushing one PMD and one PTE: test-1265 [003] .... 496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff test-1265 [003] .... 
496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200 test-1265 [003] .... 496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1 test-1265 [003] .... 496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff [akpm@linux-foundation.org: struct blk_dax_ctl has disappeared] Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Cc: Alexander Viro Cc: Dan Williams Cc: Ingo Molnar Cc: Matthew Wilcox Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dax.c | 1 + include/trace/events/fs_dax.h | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/fs/dax.c b/fs/dax.c index 7cf2686761e2..9bec30e06211 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -823,6 +823,7 @@ static int dax_writeback_one(struct block_device *bdev, spin_lock_irq(&mapping->tree_lock); radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY); spin_unlock_irq(&mapping->tree_lock); + trace_dax_writeback_one(mapping->host, index, size >> PAGE_SHIFT); dax_unlock: dax_read_unlock(id); put_locked_mapping_entry(mapping, index, entry); diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h index 9afe8c8f0bef..292a4719edd0 100644 --- a/include/trace/events/fs_dax.h +++ b/include/trace/events/fs_dax.h @@ -226,6 +226,30 @@ DEFINE_EVENT(dax_writeback_range_class, name, \ DEFINE_WRITEBACK_RANGE_EVENT(dax_writeback_range); DEFINE_WRITEBACK_RANGE_EVENT(dax_writeback_range_done); +TRACE_EVENT(dax_writeback_one, + TP_PROTO(struct inode *inode, pgoff_t pgoff, pgoff_t pglen), + TP_ARGS(inode, pgoff, pglen), + TP_STRUCT__entry( + __field(unsigned long, ino) + __field(pgoff_t, pgoff) + __field(pgoff_t, pglen) + __field(dev_t, dev) + ), + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->pgoff = pgoff; + __entry->pglen = pglen; + ), + TP_printk("dev %d:%d ino %#lx pgoff 
%#lx pglen %#lx", + MAJOR(__entry->dev), + MINOR(__entry->dev), + __entry->ino, + __entry->pgoff, + __entry->pglen + ) +) + #endif /* _TRACE_FS_DAX_H */ /* This part must be outside protection */ -- cgit From b4440734583c3addf80558e8fde2b61e2d76328c Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Mon, 8 May 2017 16:00:16 -0700 Subject: dax: add tracepoint to dax_insert_mapping() Add a tracepoint to dax_insert_mapping(), following the same logging conventions as the rest of DAX. This tracepoint, along with the one in dax_load_hole(), lets us know how a DAX PTE fault was serviced. Here is an example DAX fault that inserts a PTE mapping: small-1126 [007] .... 145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 small-1126 [007] .... 145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006 small-1126 [007] .... 145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Cc: Alexander Viro Cc: Dan Williams Cc: Ingo Molnar Cc: Matthew Wilcox Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dax.c | 1 + include/trace/events/fs_dax.h | 30 ++++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/fs/dax.c b/fs/dax.c index 9bec30e06211..66d79067eedf 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -925,6 +925,7 @@ static int dax_insert_mapping(struct address_space *mapping, return PTR_ERR(ret); *entryp = ret; + trace_dax_insert_mapping(mapping->host, vmf, ret); return vm_insert_mixed(vma, vaddr, pfn); } diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h index 292a4719edd0..08bb3ed18dcc 100644 --- a/include/trace/events/fs_dax.h +++ b/include/trace/events/fs_dax.h @@ -194,6 +194,36 @@ 
DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite_no_entry); DEFINE_PTE_FAULT_EVENT(dax_pfn_mkwrite); DEFINE_PTE_FAULT_EVENT(dax_load_hole); +TRACE_EVENT(dax_insert_mapping, + TP_PROTO(struct inode *inode, struct vm_fault *vmf, void *radix_entry), + TP_ARGS(inode, vmf, radix_entry), + TP_STRUCT__entry( + __field(unsigned long, ino) + __field(unsigned long, vm_flags) + __field(unsigned long, address) + __field(void *, radix_entry) + __field(dev_t, dev) + __field(int, write) + ), + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->vm_flags = vmf->vma->vm_flags; + __entry->address = vmf->address; + __entry->write = vmf->flags & FAULT_FLAG_WRITE; + __entry->radix_entry = radix_entry; + ), + TP_printk("dev %d:%d ino %#lx %s %s address %#lx radix_entry %#lx", + MAJOR(__entry->dev), + MINOR(__entry->dev), + __entry->ino, + __entry->vm_flags & VM_SHARED ? "shared" : "private", + __entry->write ? "write" : "read", + __entry->address, + (unsigned long)__entry->radix_entry + ) +) + DECLARE_EVENT_CLASS(dax_writeback_range_class, TP_PROTO(struct inode *inode, pgoff_t start_index, pgoff_t end_index), TP_ARGS(inode, start_index, end_index), -- cgit From 4e5ce33ceb3250f564656588da4d47f3eca7d2af Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 8 May 2017 16:00:19 -0700 Subject: selftests/vm: add a test for virtual address range mapping This verifies virtual address mapping below and above the 128TB range and makes sure that address returned are within the expected range depending upon the hint passed from the user space. 
Link: http://lkml.kernel.org/r/20170418095252.20533-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual Cc: Michal Suchanek Cc: Aneesh Kumar K.V Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/run_vmtests | 11 ++ tools/testing/selftests/vm/virtual_address_range.c | 122 +++++++++++++++++++++ 3 files changed, 134 insertions(+) create mode 100644 tools/testing/selftests/vm/virtual_address_range.c diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index dba889004ea1..cbb29e41ef2b 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -16,6 +16,7 @@ TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd TEST_GEN_FILES += mlock-random-test +TEST_GEN_FILES += virtual_address_range TEST_PROGS := run_vmtests diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests index 3214a6456d13..0640923ded7e 100755 --- a/tools/testing/selftests/vm/run_vmtests +++ b/tools/testing/selftests/vm/run_vmtests @@ -165,4 +165,15 @@ else echo "[PASS]" fi +echo "-----------------------------" +echo "running virtual_address_range" +echo "-----------------------------" +./virtual_address_range +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + exit $exitcode diff --git a/tools/testing/selftests/vm/virtual_address_range.c b/tools/testing/selftests/vm/virtual_address_range.c new file mode 100644 index 000000000000..3b02aa6eb9da --- /dev/null +++ b/tools/testing/selftests/vm/virtual_address_range.c @@ -0,0 +1,122 @@ +/* + * Copyright 2017, Anshuman Khandual, IBM Corp. + * Licensed under GPLv2. + * + * Works on architectures which support 128TB virtual + * address range and beyond. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Maximum address range mapped with a single mmap() + * call is little bit more than 16GB. Hence 16GB is + * chosen as the single chunk size for address space + * mapping. + */ +#define MAP_CHUNK_SIZE 17179869184UL /* 16GB */ + +/* + * Address space till 128TB is mapped without any hint + * and is enabled by default. Address space beyond 128TB + * till 512TB is obtained by passing hint address as the + * first argument into mmap() system call. + * + * The process heap address space is divided into two + * different areas one below 128TB and one above 128TB + * till it reaches 512TB. One with size 128TB and the + * other being 384TB. + */ +#define NR_CHUNKS_128TB 8192UL /* Number of 16GB chunks for 128TB */ +#define NR_CHUNKS_384TB 24576UL /* Number of 16GB chunks for 384TB */ + +#define ADDR_MARK_128TB (1UL << 47) /* First address beyond 128TB */ + +static char *hind_addr(void) +{ + int bits = 48 + rand() % 15; + + return (char *) (1UL << bits); +} + +static int validate_addr(char *ptr, int high_addr) +{ + unsigned long addr = (unsigned long) ptr; + + if (high_addr) { + if (addr < ADDR_MARK_128TB) { + printf("Bad address %lx\n", addr); + return 1; + } + return 0; + } + + if (addr > ADDR_MARK_128TB) { + printf("Bad address %lx\n", addr); + return 1; + } + return 0; +} + +static int validate_lower_address_hint(void) +{ + char *ptr; + + ptr = mmap((void *) (1UL << 45), MAP_CHUNK_SIZE, PROT_READ | + PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + + if (ptr == MAP_FAILED) + return 0; + + return 1; +} + +int main(int argc, char *argv[]) +{ + char *ptr[NR_CHUNKS_128TB]; + char *hptr[NR_CHUNKS_384TB]; + char *hint; + unsigned long i, lchunks, hchunks; + + for (i = 0; i < NR_CHUNKS_128TB; i++) { + ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + + if (ptr[i] == MAP_FAILED) { + if (validate_lower_address_hint()) + return 1; + 
break; + } + + if (validate_addr(ptr[i], 0)) + return 1; + } + lchunks = i; + + for (i = 0; i < NR_CHUNKS_384TB; i++) { + hint = hind_addr(); + hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + + if (hptr[i] == MAP_FAILED) + break; + + if (validate_addr(hptr[i], 1)) + return 1; + } + hchunks = i; + + for (i = 0; i < lchunks; i++) + munmap(ptr[i], MAP_CHUNK_SIZE); + + for (i = 0; i < hchunks; i++) + munmap(hptr[i], MAP_CHUNK_SIZE); + + return 0; +} -- cgit From 4d2b5bcab53f1c76a86279339561c9a36109a93b Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 8 May 2017 16:00:22 -0700 Subject: drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 drivers/staging/ccree/ssi_hash.c:1990: error: unknown field 'template_ahash' specified in initializer drivers/staging/ccree/ssi_hash.c:1991: error: unknown field 'init' specified in initializer drivers/staging/ccree/ssi_hash.c:1991: warning: missing braces around initializer drivers/staging/ccree/ssi_hash.c:1991: warning: (near initialization for 'driver_hash[0]..template_ahash') drivers/staging/ccree/ssi_hash.c:1992: error: unknown field 'update' specified in initializer drivers/staging/ccree/ssi_hash.c:1992: warning: excess elements in union initializer drivers/staging/ccree/ssi_hash.c:1992: warning: (near initialization for 'driver_hash[0].') drivers/staging/ccree/ssi_hash.c:1993: error: unknown field 'final' specified in initializer drivers/staging/ccree/ssi_hash.c:1993: warning: excess elements in union initializer drivers/staging/ccree/ssi_hash.c:1993: warning: (near initialization for 'driver_hash[0].') ... gcc-4.4.4 has issues with anon union initializers. Work around this. 
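The workaround wraps the anonymous union's initializer in its own pair of braces, so the compiler does not have to resolve a designator that names a member of the anonymous union directly. A minimal sketch with hypothetical types; struct tmpl, its fields and digest_size() are illustrative and are not the driver's definitions:

```c
/* Hypothetical template with an anonymous union, loosely shaped like the driver's. */
struct tmpl {
	const char *name;
	union {
		struct { int digestsize; } ahash;
		struct { int keysize; } cipher;
	};
	int mode;
};

static struct tmpl templates[] = {
	{
		.name = "sha1",
		/*
		 * Extra braces: a positional initializer for the anonymous
		 * union, which is the member that follows "name".
		 */
		{
			.ahash = { .digestsize = 20 },
		},
		.mode = 1,
	},
};

/* Returns the digest size stored in entry i. */
static int digest_size(int i)
{
	return templates[i].ahash.digestsize;
}
```

The braced form relies on member order: the positional initializer applies to whichever member follows the last designated one, so it has to sit exactly where the union is declared in the struct.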

Cc: Gilad Ben-Yossef
Cc: Greg Kroah-Hartman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 drivers/staging/ccree/ssi_hash.c | 236 +++++++++++++++++++++------------------
 1 file changed, 126 insertions(+), 110 deletions(-)

diff --git a/drivers/staging/ccree/ssi_hash.c b/drivers/staging/ccree/ssi_hash.c
index 8ff5d4ec9e5c..f99d4219b01e 100644
--- a/drivers/staging/ccree/ssi_hash.c
+++ b/drivers/staging/ccree/ssi_hash.c
@@ -1987,22 +1987,24 @@ static struct ssi_hash_template driver_hash[] = {
 		.hmac_driver_name = "hmac-sha1-dx",
 		.blocksize = SHA1_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_ahash_update,
-			.final = ssi_ahash_final,
-			.finup = ssi_ahash_finup,
-			.digest = ssi_ahash_digest,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_ahash_update,
+				.final = ssi_ahash_final,
+				.finup = ssi_ahash_finup,
+				.digest = ssi_ahash_digest,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.setkey = ssi_ahash_setkey,
-			.halg = {
-				.digestsize = SHA1_DIGEST_SIZE,
-				.statesize = sizeof(struct sha1_state),
+				.setkey = ssi_ahash_setkey,
+				.halg = {
+					.digestsize = SHA1_DIGEST_SIZE,
+					.statesize = sizeof(struct sha1_state),
+					},
 				},
-			},
+		},
 		.hash_mode = DRV_HASH_SHA1,
 		.hw_mode = DRV_HASH_HW_SHA1,
 		.inter_digestsize = SHA1_DIGEST_SIZE,
@@ -2014,22 +2016,24 @@ static struct ssi_hash_template driver_hash[] = {
 		.hmac_driver_name = "hmac-sha256-dx",
 		.blocksize = SHA256_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_ahash_update,
-			.final = ssi_ahash_final,
-			.finup = ssi_ahash_finup,
-			.digest = ssi_ahash_digest,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_ahash_update,
+				.final = ssi_ahash_final,
+				.finup = ssi_ahash_finup,
+				.digest = ssi_ahash_digest,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.setkey = ssi_ahash_setkey,
-			.halg = {
-				.digestsize = SHA256_DIGEST_SIZE,
-				.statesize = sizeof(struct sha256_state),
+				.setkey = ssi_ahash_setkey,
+				.halg = {
+					.digestsize = SHA256_DIGEST_SIZE,
+					.statesize = sizeof(struct sha256_state),
+					},
 				},
-			},
+		},
 		.hash_mode = DRV_HASH_SHA256,
 		.hw_mode = DRV_HASH_HW_SHA256,
 		.inter_digestsize = SHA256_DIGEST_SIZE,
@@ -2041,22 +2045,24 @@ static struct ssi_hash_template driver_hash[] = {
 		.hmac_driver_name = "hmac-sha224-dx",
 		.blocksize = SHA224_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_ahash_update,
-			.final = ssi_ahash_final,
-			.finup = ssi_ahash_finup,
-			.digest = ssi_ahash_digest,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_ahash_update,
+				.final = ssi_ahash_final,
+				.finup = ssi_ahash_finup,
+				.digest = ssi_ahash_digest,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.setkey = ssi_ahash_setkey,
-			.halg = {
-				.digestsize = SHA224_DIGEST_SIZE,
-				.statesize = sizeof(struct sha256_state),
+				.setkey = ssi_ahash_setkey,
+				.halg = {
+					.digestsize = SHA224_DIGEST_SIZE,
+					.statesize = sizeof(struct sha256_state),
+					},
 				},
-			},
+		},
 		.hash_mode = DRV_HASH_SHA224,
 		.hw_mode = DRV_HASH_HW_SHA256,
 		.inter_digestsize = SHA256_DIGEST_SIZE,
@@ -2069,22 +2075,24 @@ static struct ssi_hash_template driver_hash[] = {
 		.hmac_driver_name = "hmac-sha384-dx",
 		.blocksize = SHA384_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_ahash_update,
-			.final = ssi_ahash_final,
-			.finup = ssi_ahash_finup,
-			.digest = ssi_ahash_digest,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_ahash_update,
+				.final = ssi_ahash_final,
+				.finup = ssi_ahash_finup,
+				.digest = ssi_ahash_digest,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.setkey = ssi_ahash_setkey,
-			.halg = {
-				.digestsize = SHA384_DIGEST_SIZE,
-				.statesize = sizeof(struct sha512_state),
+				.setkey = ssi_ahash_setkey,
+				.halg = {
+					.digestsize = SHA384_DIGEST_SIZE,
+					.statesize = sizeof(struct sha512_state),
+					},
 				},
-			},
+		},
 		.hash_mode = DRV_HASH_SHA384,
 		.hw_mode = DRV_HASH_HW_SHA512,
 		.inter_digestsize = SHA512_DIGEST_SIZE,
@@ -2096,22 +2104,24 @@ static struct ssi_hash_template driver_hash[] = {
 		.hmac_driver_name = "hmac-sha512-dx",
 		.blocksize = SHA512_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_ahash_update,
-			.final = ssi_ahash_final,
-			.finup = ssi_ahash_finup,
-			.digest = ssi_ahash_digest,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_ahash_update,
+				.final = ssi_ahash_final,
+				.finup = ssi_ahash_finup,
+				.digest = ssi_ahash_digest,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.setkey = ssi_ahash_setkey,
-			.halg = {
-				.digestsize = SHA512_DIGEST_SIZE,
-				.statesize = sizeof(struct sha512_state),
+				.setkey = ssi_ahash_setkey,
+				.halg = {
+					.digestsize = SHA512_DIGEST_SIZE,
+					.statesize = sizeof(struct sha512_state),
+					},
 				},
-			},
+		},
 		.hash_mode = DRV_HASH_SHA512,
 		.hw_mode = DRV_HASH_HW_SHA512,
 		.inter_digestsize = SHA512_DIGEST_SIZE,
@@ -2124,22 +2134,24 @@ static struct ssi_hash_template driver_hash[] = {
 		.hmac_driver_name = "hmac-md5-dx",
 		.blocksize = MD5_HMAC_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_ahash_update,
-			.final = ssi_ahash_final,
-			.finup = ssi_ahash_finup,
-			.digest = ssi_ahash_digest,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_ahash_update,
+				.final = ssi_ahash_final,
+				.finup = ssi_ahash_finup,
+				.digest = ssi_ahash_digest,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.setkey = ssi_ahash_setkey,
-			.halg = {
-				.digestsize = MD5_DIGEST_SIZE,
-				.statesize = sizeof(struct md5_state),
+				.setkey = ssi_ahash_setkey,
+				.halg = {
+					.digestsize = MD5_DIGEST_SIZE,
+					.statesize = sizeof(struct md5_state),
+					},
 				},
-			},
+		},
 		.hash_mode = DRV_HASH_MD5,
 		.hw_mode = DRV_HASH_HW_MD5,
 		.inter_digestsize = MD5_DIGEST_SIZE,
@@ -2149,52 +2161,56 @@ static struct ssi_hash_template driver_hash[] = {
 		.driver_name = "xcbc-aes-dx",
 		.blocksize = AES_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_mac_update,
-			.final = ssi_mac_final,
-			.finup = ssi_mac_finup,
-			.digest = ssi_mac_digest,
-			.setkey = ssi_xcbc_setkey,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_mac_update,
+				.final = ssi_mac_final,
+				.finup = ssi_mac_finup,
+				.digest = ssi_mac_digest,
+				.setkey = ssi_xcbc_setkey,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.halg = {
-				.digestsize = AES_BLOCK_SIZE,
-				.statesize = sizeof(struct aeshash_state),
+				.halg = {
+					.digestsize = AES_BLOCK_SIZE,
+					.statesize = sizeof(struct aeshash_state),
+					},
 				},
-			},
-			.hash_mode = DRV_HASH_NULL,
-			.hw_mode = DRV_CIPHER_XCBC_MAC,
-			.inter_digestsize = AES_BLOCK_SIZE,
 		},
+		.hash_mode = DRV_HASH_NULL,
+		.hw_mode = DRV_CIPHER_XCBC_MAC,
+		.inter_digestsize = AES_BLOCK_SIZE,
+	},
 #if SSI_CC_HAS_CMAC
 	{
 		.name = "cmac(aes)",
 		.driver_name = "cmac-aes-dx",
 		.blocksize = AES_BLOCK_SIZE,
 		.synchronize = false,
-		.template_ahash = {
-			.init = ssi_ahash_init,
-			.update = ssi_mac_update,
-			.final = ssi_mac_final,
-			.finup = ssi_mac_finup,
-			.digest = ssi_mac_digest,
-			.setkey = ssi_cmac_setkey,
+		{
+			.template_ahash = {
+				.init = ssi_ahash_init,
+				.update = ssi_mac_update,
+				.final = ssi_mac_final,
+				.finup = ssi_mac_finup,
+				.digest = ssi_mac_digest,
+				.setkey = ssi_cmac_setkey,
 #ifdef EXPORT_FIXED
-			.export = ssi_ahash_export,
-			.import = ssi_ahash_import,
+				.export = ssi_ahash_export,
+				.import = ssi_ahash_import,
 #endif
-			.halg = {
-				.digestsize = AES_BLOCK_SIZE,
-				.statesize = sizeof(struct aeshash_state),
+				.halg = {
+					.digestsize = AES_BLOCK_SIZE,
+					.statesize = sizeof(struct aeshash_state),
+					},
 				},
-			},
-			.hash_mode = DRV_HASH_NULL,
-			.hw_mode = DRV_CIPHER_CMAC,
-			.inter_digestsize = AES_BLOCK_SIZE,
 		},
+		.hash_mode = DRV_HASH_NULL,
+		.hw_mode = DRV_CIPHER_CMAC,
+		.inter_digestsize = AES_BLOCK_SIZE,
+	},
 #endif
 };
--
cgit
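
Note on the workaround pattern used in the ssi_hash.c patch above: the hunks wrap the `.template_ahash` designator in an extra, undesignated pair of braces, which appears intended to initialize the anonymous union positionally instead of naming its member from the enclosing struct. The following is a minimal sketch of that pattern only. The `demo_*` types, fields, and values are hypothetical stand-ins invented for illustration, and the assumption that `struct ssi_hash_template` holds `template_ahash` in an anonymous union is inferred from the commit message, not taken from the driver source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the real ahash_alg/shash_alg definitions. */
struct demo_ahash {
	int (*init)(void);
	unsigned int digestsize;
};

struct demo_shash {
	int (*init)(void);
};

struct demo_template {
	const char *name;
	unsigned int blocksize;
	union {			/* anonymous union (assumed to mirror the driver) */
		struct demo_ahash template_ahash;
		struct demo_shash template_shash;
	};
	int hash_mode;
};

static int demo_init(void)
{
	return 0;
}

static const struct demo_template demo_hash[] = {
	{
		.name = "sha1",
		.blocksize = 64,
		{	/* undesignated braces: next member in order, the union */
			.template_ahash = {
				.init = demo_init,
				.digestsize = 20,
			},
		},
		.hash_mode = 1,
	},
	{
		.name = "md5",
		.blocksize = 64,
		{
			.template_ahash = {
				.init = demo_init,
				.digestsize = 16,
			},
		},
		.hash_mode = 2,
	},
};

/* Return the digest size registered under name, or 0 if it is absent. */
static unsigned int demo_digestsize(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_hash) / sizeof(demo_hash[0]); i++) {
		/* Every entry is expected to carry an init callback. */
		assert(demo_hash[i].template_ahash.init != NULL);
		if (strcmp(demo_hash[i].name, name) == 0)
			return demo_hash[i].template_ahash.digestsize;
	}
	return 0;
}
```

The extra brace level relies on the union being the next member after the last designated field (`.blocksize` here, `.synchronize` in the patch), so reordering the struct members would silently change which member the braces initialize.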