From 7153884c088a9500f9379aeec877f3d4d4ec4fba Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Thu, 30 Nov 2017 11:21:57 -0600 Subject: PCI: Remove PCI_REASSIGN_ALL_RSRC use on arm and arm64 On arm, PCI_REASSIGN_ALL_RSRC is used only in pcibios_assign_all_busses(), which helps decide whether to reconfigure bridge bus numbers. It has nothing to do with BAR assignments. On arm64 and powerpc, pcibios_assign_all_busses() tests PCI_REASSIGN_ALL_BUS, which makes more sense. Align arm with arm64 and powerpc, so they all use PCI_REASSIGN_ALL_BUS for pcibios_assign_all_busses(). Remove PCI_REASSIGN_ALL_RSRC from the generic, Tegra, Versatile, and R-Car drivers. These drivers are used only on arm or arm64, where PCI_REASSIGN_ALL_RSRC is not used after this change, so removing it should have no effect. No functional change intended. Signed-off-by: Bjorn Helgaas Reviewed-by: Manikanta Maddireddy Reviewed-by: Lorenzo Pieralisi --- arch/arm/include/asm/pci.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/pci.h b/arch/arm/include/asm/pci.h index 960d9dc4f380..1f0de808d111 100644 --- a/arch/arm/include/asm/pci.h +++ b/arch/arm/include/asm/pci.h @@ -10,10 +10,7 @@ extern unsigned long pcibios_min_io; extern unsigned long pcibios_min_mem; #define PCIBIOS_MIN_MEM pcibios_min_mem -static inline int pcibios_assign_all_busses(void) -{ - return pci_has_flag(PCI_REASSIGN_ALL_RSRC); -} +#define pcibios_assign_all_busses() pci_has_flag(PCI_REASSIGN_ALL_BUS) #ifdef CONFIG_PCI_DOMAINS static inline int pci_proc_domain(struct pci_bus *bus) -- cgit From 0ade34c37012ea5c516d9aa4d19a56e9f40a55ed Mon Sep 17 00:00:00 2001 From: Clement Courbet Date: Tue, 6 Feb 2018 15:38:34 -0800 Subject: lib: optimize cpumask_next_and() We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and(). It's essentially a joined iteration in search for a non-zero bit, which is currently implemented as a lookup join (find a nonzero bit on the lhs, lookup the rhs to see if it's set there). Implement a direct join (find a nonzero bit on the incrementally built join). Also add generic bitmap benchmarks in the new `test_find_bit` module for new function (see `find_next_and_bit` in [2] and [3] below). For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory usage. Note that on Arm, the new pure-C implementation still outperforms the old one that uses a mix of C and asm (`find_next_bit`) [3]. [1] Approximate benchmark code: ``` unsigned long src1p[nr_cpumask_longs] = {pattern1}; unsigned long src2p[nr_cpumask_longs] = {pattern2}; for (/*a bunch of repetitions*/) { for (int n = -1; n <= nr_cpu_ids; ++n) { asm volatile("" : "+rm"(src1p)); // prevent any optimization asm volatile("" : "+rm"(src2p)); unsigned long result = cpumask_next_and(n, src1p, src2p); asm volatile("" : "+rm"(result)); } } ``` Results: pattern1 pattern2 time_before/time_after 0x0000ffff 0x0000ffff 1.65 0x0000ffff 0x00005555 2.24 0x0000ffff 0x00001111 2.94 0x0000ffff 0x00000000 14.0 0x00005555 0x0000ffff 1.67 0x00005555 0x00005555 1.71 0x00005555 0x00001111 1.90 0x00005555 0x00000000 6.58 0x00001111 0x0000ffff 1.46 0x00001111 0x00005555 1.49 0x00001111 0x00001111 1.45 0x00001111 0x00000000 3.10 0x00000000 0x0000ffff 1.18 0x00000000 0x00005555 1.18 0x00000000 0x00001111 1.17 0x00000000 0x00000000 1.25 ----------------------------- geo.mean 2.06 [2] test_find_next_bit, X86 (skylake) [ 3913.477422] Start testing find_bit() with random-filled bitmap [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations [ 3913.480216] Start testing find_next_and_bit() with random-filled bitmap [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations [ 3913.481075] Start testing find_bit() with sparse bitmap [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations [3] test_find_next_bit, arm (v7 odroid XU3). [ 267.206928] Start testing find_bit() with random-filled bitmap [ 267.214752] find_next_bit: 4474 cycles, 16419 iterations [ 267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations [ 267.229294] find_last_bit: 4209 cycles, 16419 iterations [ 267.279131] find_first_bit: 1032991 cycles, 16420 iterations [ 267.286265] Start testing find_next_and_bit() with random-filled bitmap [ 267.302386] find_next_and_bit: 2290 cycles, 8140 iterations [ 267.309422] Start testing find_bit() with sparse bitmap [ 267.316054] find_next_bit: 191 cycles, 66 iterations [ 267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations [ 267.329803] find_last_bit: 84 cycles, 66 iterations [ 267.336169] find_first_bit: 4118 cycles, 66 iterations [ 267.342627] Start testing find_next_and_bit() with sparse bitmap [ 267.356919] find_next_and_bit: 91 cycles, 1 iterations [courbet@google.com: v6] Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com [geert@linux-m68k.org: m68k/bitops: always include ] Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com Signed-off-by: Clement Courbet Signed-off-by: Geert Uytterhoeven Cc: Yury Norov Cc: Geert Uytterhoeven Cc: Alexey Dobriyan Cc: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm/include/asm/bitops.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h index ce5ee762ed66..4cab9bb823fb 100644 --- a/arch/arm/include/asm/bitops.h +++ b/arch/arm/include/asm/bitops.h @@ -338,6 +338,7 @@ static inline int find_next_bit_le(const void *p, int size, int offset) #endif +#include #include /* -- cgit