Age | Commit message (Collapse) | Author |
|
This replaces the incorrectly spelled word "localtion" with "location"
in some power8 PMU event descriptions.
Fixes: 2a81fa3bb5ed ("perf vendor events: Add power8 PMU events")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20201012050205.328523-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For comparison, it now runs the benchmark twice - one if regular -b and
another for --buildid-all.
$ perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 21.002 msec (+- 0.172 msec)
Average time per event: 2.059 usec (+- 0.017 usec)
Average memory usage: 8169 KB (+- 0 KB)
Average build-id-all injection took: 19.543 msec (+- 0.124 msec)
Average time per event: 1.916 usec (+- 0.012 usec)
Average memory usage: 7348 KB (+- 0 KB)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201012070214.2074921-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like 'perf record', we can even more speedup build-id processing by just
using all DSOs. Then we don't need to look at all the sample events
anymore. The following patch will update 'perf bench' to show the result
of the --buildid-all option too.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Original-patch-by: Stephane Eranian <eranian@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
No need to load symbols in a DSO when injecting build-id. I guess the
reason was to check the DSO is a special file like anon files. Use some
helper functions in map.c to check them before reading build-id. Also
pass sample event's cpumode to a new build-id event.
It brought a speedup in the benchmark of 25 -> 21 msec on my laptop.
Also the memory usage (Max RSS) went down by ~200 KB.
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 21.389 msec (+- 0.138 msec)
Average time per event: 2.097 usec (+- 0.014 usec)
Average memory usage: 8225 KB (+- 0 KB)
Committer notes:
Before:
$ perf stat -r5 perf bench internals inject-build-id > /dev/null
Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
4,020.56 msec task-clock:u # 1.271 CPUs utilized ( +- 0.74% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
123,354 page-faults:u # 0.031 M/sec ( +- 0.81% )
7,119,951,568 cycles:u # 1.771 GHz ( +- 1.74% ) (83.27%)
230,086,969 stalled-cycles-frontend:u # 3.23% frontend cycles idle ( +- 1.97% ) (83.41%)
1,168,298,765 stalled-cycles-backend:u # 16.41% backend cycles idle ( +- 1.13% ) (83.44%)
11,173,083,669 instructions:u # 1.57 insn per cycle
# 0.10 stalled cycles per insn ( +- 1.58% ) (83.31%)
2,413,908,936 branches:u # 600.392 M/sec ( +- 1.69% ) (83.26%)
46,576,289 branch-misses:u # 1.93% of all branches ( +- 2.20% ) (83.31%)
3.1638 +- 0.0309 seconds time elapsed ( +- 0.98% )
$
After:
$ perf stat -r5 perf bench internals inject-build-id > /dev/null
Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
2,379.94 msec task-clock:u # 1.473 CPUs utilized ( +- 0.18% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
62,584 page-faults:u # 0.026 M/sec ( +- 0.07% )
2,372,389,668 cycles:u # 0.997 GHz ( +- 0.29% ) (83.14%)
106,937,862 stalled-cycles-frontend:u # 4.51% frontend cycles idle ( +- 4.89% ) (83.20%)
581,697,915 stalled-cycles-backend:u # 24.52% backend cycles idle ( +- 0.71% ) (83.47%)
3,659,692,199 instructions:u # 1.54 insn per cycle
# 0.16 stalled cycles per insn ( +- 0.10% ) (83.63%)
791,372,961 branches:u # 332.518 M/sec ( +- 0.27% ) (83.39%)
10,648,083 branch-misses:u # 1.35% of all branches ( +- 0.22% ) (83.16%)
1.61570 +- 0.00172 seconds time elapsed ( +- 0.11% )
$
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Original-patch-by: Stephane Eranian <eranian@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It should be in a proper mnt namespace when accessing the file.
I think this had no problem since the build-id was actually read from
map__load() -> dso__load() already. But I'd like to change it in the
following commit.
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201012070214.2074921-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
I found some events (like PERF_RECORD_CGROUP) are not copied by perf
inject due to the missing callbacks. Let's add them.
While at it, I've changed the order of the callbacks to match with
struct perf_tool so that we can compare them easily.
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201012070214.2074921-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes I can see that 'perf record' piped with 'perf inject' take a
long time processing build-ids.
So introduce a inject-build-id benchmark to the internals benchmark
suite to measure its overhead regularly.
It runs the 'perf inject' command internally and feeds the given number
of synthesized events (MMAP2 + SAMPLE basically).
Usage: perf bench internals inject-build-id <options>
-i, --iterations <n> Number of iterations used to compute average (default: 100)
-m, --nr-mmaps <n> Number of mmap events for each iteration (default: 100)
-n, --nr-samples <n> Number of sample events per mmap event (default: 100)
-v, --verbose be more verbose (show iteration count, DSO name, etc)
By default, it measures average processing time of 100 MMAP2 events
and 10000 SAMPLE events. Below is a result on my laptop.
$ perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 25.789 msec (+- 0.202 msec)
Average time per event: 2.528 usec (+- 0.020 usec)
Average memory usage: 8411 KB (+- 7 KB)
Committer testing:
$ perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks
$ perf bench internals
# List of available benchmarks for collection 'internals':
synthesize: Benchmark perf event synthesis
kallsyms-parse: Benchmark kallsyms parsing
inject-build-id: Benchmark build-id injection
$ perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.202 msec (+- 0.059 msec)
Average time per event: 1.392 usec (+- 0.006 usec)
Average memory usage: 12650 KB (+- 10 KB)
Average build-id-all injection took: 12.831 msec (+- 0.071 msec)
Average time per event: 1.258 usec (+- 0.007 usec)
Average memory usage: 11895 KB (+- 10 KB)
$
$ perf stat -r5 perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.380 msec (+- 0.056 msec)
Average time per event: 1.410 usec (+- 0.006 usec)
Average memory usage: 12608 KB (+- 11 KB)
Average build-id-all injection took: 11.889 msec (+- 0.064 msec)
Average time per event: 1.166 usec (+- 0.006 usec)
Average memory usage: 11838 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.246 msec (+- 0.065 msec)
Average time per event: 1.397 usec (+- 0.006 usec)
Average memory usage: 12744 KB (+- 10 KB)
Average build-id-all injection took: 12.019 msec (+- 0.066 msec)
Average time per event: 1.178 usec (+- 0.006 usec)
Average memory usage: 11963 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.321 msec (+- 0.067 msec)
Average time per event: 1.404 usec (+- 0.007 usec)
Average memory usage: 12690 KB (+- 10 KB)
Average build-id-all injection took: 11.909 msec (+- 0.041 msec)
Average time per event: 1.168 usec (+- 0.004 usec)
Average memory usage: 11938 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.287 msec (+- 0.059 msec)
Average time per event: 1.401 usec (+- 0.006 usec)
Average memory usage: 12864 KB (+- 10 KB)
Average build-id-all injection took: 11.862 msec (+- 0.058 msec)
Average time per event: 1.163 usec (+- 0.006 usec)
Average memory usage: 12103 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.402 msec (+- 0.053 msec)
Average time per event: 1.412 usec (+- 0.005 usec)
Average memory usage: 12876 KB (+- 10 KB)
Average build-id-all injection took: 11.826 msec (+- 0.061 msec)
Average time per event: 1.159 usec (+- 0.006 usec)
Average memory usage: 12111 KB (+- 10 KB)
Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
4,267.48 msec task-clock:u # 1.502 CPUs utilized ( +- 0.14% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
102,092 page-faults:u # 0.024 M/sec ( +- 0.08% )
3,894,589,578 cycles:u # 0.913 GHz ( +- 0.19% ) (83.49%)
140,078,421 stalled-cycles-frontend:u # 3.60% frontend cycles idle ( +- 0.77% ) (83.34%)
948,581,189 stalled-cycles-backend:u # 24.36% backend cycles idle ( +- 0.46% ) (83.25%)
5,835,587,719 instructions:u # 1.50 insn per cycle
# 0.16 stalled cycles per insn ( +- 0.21% ) (83.24%)
1,267,423,636 branches:u # 296.996 M/sec ( +- 0.22% ) (83.12%)
17,484,290 branch-misses:u # 1.38% of all branches ( +- 0.12% ) (83.55%)
2.84176 +- 0.00222 seconds time elapsed ( +- 0.08% )
$
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some hardware designs have USB typec connector attached to both
SoC and super speed mux. We need to use separate connector node for
such design.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://lore.kernel.org/r/20200920134905.4370-2-biju.das.jz@bp.renesas.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The 'sibs' variable would be shifted as a 32-bit integer, so if 'shift'
is more than 32, this is undefined behaviour. In practice, this doesn't
happen because the page cache is the only user and nobody uses 16TB pages.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
* pm-avs:
MAINTAINERS: drop myself from PM AVS drivers
PM: AVS: qcom-cpr: simplify the return expression of cpr_disable()
* powercap:
powercap: include header to fix -Wmissing-prototypes
|
|
* pm-core:
PM: runtime: Fix timer_expires data type on 32-bit arches
PM: runtime: Remove link state checks in rpm_get/put_supplier()
* pm-sleep:
ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()
ACPI: EC: PM: Flush EC work unconditionally after wakeup
PM: hibernate: remove the bogus call to get_gendisk() in software_resume()
PM: hibernate: Batch hibernate and resume IO requests
* pm-pci:
PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
* pm-domains:
PM: domains: Allow to abort power off when no ->power_off() callback
PM: domains: Rename power state enums for genpd
|
|
and 'acpi-pci'
* acpi-extlog:
ACPI / extlog: Check for RDMSR failure
* acpi-memhotplug:
ACPI: memhotplug: Remove 'state' from struct acpi_memory_device
* acpi-button:
ACPI: button: fix handling lid state changes when input device closed
* acpi-tools:
tools/power/acpi: Serialize Makefile
* acpi-pci:
ACPI: PCI: update kernel-doc line comments
|
|
* acpi-misc:
ACPI: Make acpi_evaluate_dsm() prototype consistent
ACPI: wakeup: Remove dead ACPICA debug code
ACPI: video: Remove leftover ACPICA debug code
ACPI: tiny-power-button: Remove dead ACPICA debug code
ACPI: processor: Remove dead ACPICA debug code
ACPI: proc: Remove dead ACPICA debug code
ACPI: PCI: Remove unused ACPICA debug code
ACPI: event: Remove leftover ACPICA debug code
ACPI: dock: Remove dead ACPICA debug code
ACPI: debugfs: Remove dead ACPICA debug code
ACPI: custom_method: Remove dead ACPICA debug code
ACPI: container: Remove leftover ACPICA debug functionality
ACPI: platform: Remove ACPI_MODULE_NAME()
ACPI: memhotplug: Remove leftover ACPICA debug functionality
ACPI: LPSS: Remove ACPI_MODULE_NAME()
ACPI: cmos_rtc: Remove leftover ACPI_MODULE_NAME()
ACPI: Remove three unused inline functions
|
|
* acpi-numa:
docs: mm: numaperf.rst Add brief description for access class 1.
node: Add access1 class to represent CPU to memory characteristics
ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
ACPI: Let ACPI know we support Generic Initiator Affinity Structures
x86: Support Generic Initiator only proximity domains
ACPI: Support Generic Initiator only domains
ACPI / NUMA: Add stub function for pxm_to_node()
irq-chip/gic-v3-its: Fix crash if ITS is in a proximity domain without processor or memory
ACPI: Remove side effect of partly creating a node in acpi_get_node()
ACPI: Rename acpi_map_pxm_to_online_node() to pxm_to_online_node()
ACPI: Remove side effect of partly creating a node in acpi_map_pxm_to_online_node()
ACPI: Do not create new NUMA domains from ACPI static tables that are not SRAT
ACPI: Add out of bounds and numa_off protections to pxm_to_node()
|
|
* acpi-video:
ACPI: video: use ACPI backlight for HP 635 Notebook
* acpi-battery:
ACPI: battery: include linux/power_supply.h
* acpi-config:
ACPI: configfs: Add missing config_item_put() to fix refcount leak
* acpi-scan:
ACPI: scan: Replace ACPI_DEBUG_PRINT() with pr_debug()
|
|
* acpi-tables:
ACPI: NFIT: Use kobj_to_dev() instead
* acpi-pmic:
MAINTAINERS: Use my kernel.org address for Intel PMIC work
ACPI / PMIC: Move TPS68470 OpRegion driver to drivers/acpi/pmic/
ACPI / PMIC: Split out Kconfig and Makefile specific for ACPI PMIC
* acpi-dptf:
ACPI: DPTF: Add PCH FIVR participant driver
* acpi-soc:
ACPI: APD: Clean up header file include statements
ACPI: APD: Remove unnecessary APD_ADDR() macro stub
ACPI: APD: Remove ACPI_MODULE_NAME()
ACPI: APD: Remove flags from struct apd_device_desc
ACPI: APD: Add kerneldoc for properties in struct apd_device_desc
|
|
fix the comment of radix_tree_next_slot():
interator --> iterator.
Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
xas_reload() was only checking that the head entry was still at the
head index. If the entry has been split, that's not enough as there
may be a different entry at the specified index now.
Solve this by checking the slot for the requested index instead of the
head index.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
Move the tricky bits of dealing with the XArray from the workingset
code to the XArray. Make it clear in the documentation that this is a
private interface, and only export it for the benefit of the test suite.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
* pm-cpuidle:
cpuidle: record state entry rejection statistics
cpuidle: psci: Allow PM domain to be initialized even if no OSI mode
firmware: psci: Extend psci_set_osi_mode() to allow reset to PC mode
ACPI: processor: Print more information when acpi_processor_evaluate_cst() fails
cpuidle: tegra: Correctly handle result of arm_cpuidle_simple_enter()
* pm-devfreq:
PM / devfreq: tegra30: Improve initial hardware resetting
PM / devfreq: event: Change prototype of devfreq_event_get_edev_by_phandle function
PM / devfreq: Change prototype of devfreq_get_devfreq_by_phandle function
PM / devfreq: Add devfreq_get_devfreq_by_node function
|
|
* pm-cpufreq: (30 commits)
cpufreq: stats: Fix string format specifier mismatch
arm: disable frequency invariance for CONFIG_BL_SWITCHER
cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()
cpufreq: stats: Add memory barrier to store_reset()
cpufreq: schedutil: Simplify sugov_fast_switch()
cpufreq: Move traces and update to policy->cur to cpufreq core
cpufreq: stats: Enable stats for fast-switch as well
cpufreq: stats: Mark few conditionals with unlikely()
cpufreq: stats: Remove locking
cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()
arch_topology, arm, arm64: define arch_scale_freq_invariant()
arch_topology, cpufreq: constify arch_* cpumasks
cpufreq: report whether cpufreq supports Frequency Invariance (FI)
cpufreq: move invariance setter calls in cpufreq core
arch_topology: validate input frequencies to arch_set_freq_scale()
cpufreq: qcom: Don't add frequencies without an OPP
cpufreq: qcom-hw: Add cpufreq support for SM8250 SoC
cpufreq: qcom-hw: Use of_device_get_match_data for offsets and row size
cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code
dt-bindings: cpufreq: cpufreq-qcom-hw: Document Qcom EPSS compatible
...
|
|
Compilation of ixp4xx_defconfig fails with:
arch/arm/mach-ixp4xx/common.c: In function 'ixp4xx_platform_notify_remove':
arch/arm/mach-ixp4xx/common.c:291:3: error: implicit declaration of function 'dmabounce_unregister_dev' [-Werror=implicit-function-declaration]
291 | dmabounce_unregister_dev(dev);
| ^~~~~~~~~~~~~~~~~~~~~~~~
arch/arm/mach-ixp4xx/common.c: In function 'ixp4xx_platform_notify':
arch/arm/mach-ixp4xx/common.c:307:3: error: implicit declaration of function 'dmabounce_register_dev' [-Werror=implicit-function-declaration]
307 | dmabounce_register_dev(dev, 2048, 4096, ixp4xx_needs_bounce);
Add a missing include that is needed because of the header reshuffle.
Fixes: 0a0f0d8be76d ("dma-mapping: split <linux/dma-mapping.h>")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Currently the BUILD_BUG() macro is expanded to the following:
do {
extern void __compiletime_assert_0(void)
__attribute__((error("BUILD_BUG failed")));
if (!(!(1)))
__compiletime_assert_0();
} while (0);
If used in a function body this would obviously produce build errors
with -Wnested-externs and -Werror.
To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf
includes in intel-pt-decoder, build perf without -Wnested-externs.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build tested
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours
|
|
Change doubled word "is" to "it is".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: linux-mips@vger.kernel.org
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
Some of these ralink devices come with an ancient u-boot which can't
extract LZMA properly when image gets too big.
Enable zboot support to get a self-extracting kernel instead of relying
on broken u-boot support.
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
While it is true that Ingenic SoCs support huge pages, we cannot use
them yet as PTEs don't have any single bit that is free. Right now,
having that symbol only causes build errors, so remove it until the
situation with PTEs is resolved.
Fixes: f0f4a753079c ("MIPS: generic: Add support for Ingenic SoCs")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
Currently, dw_pcie_msi_init() allocates and maps page for msi, then
program the PCIE_MSI_ADDR_LO and PCIE_MSI_ADDR_HI. The Root Complex
may lose power during suspend-to-RAM, so when we resume, we want to
redo the latter but not the former. If designware based driver (for
example, pcie-tegra194.c) calls dw_pcie_msi_init() in resume path, the
msi page will be leaked.
As pointed out by Rob and Ard, there's no need to allocate a page for
the MSI address, we could use an address in the driver data.
To avoid map the MSI msg again during resume, we move the map MSI msg
from dw_pcie_msi_init() to dw_pcie_host_init().
Suggested-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201009155505.5a580ef5@xhacker.debian
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
|
|
If MSI is disabled, there's no need to program PCIE_MSI_INTR0_MASK
and PCIE_MSI_INTR0_ENABLE registers.
Link: https://lore.kernel.org/r/20201009155436.27e67238@xhacker.debian
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
|
|
After applying "PCI: dwc: Add common iATU register support",
there is no need to set own iATU in the Keystone driver itself.
Suggested-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/1601444167-11316-5-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
|
|
This gets iATU register area from reg property that has reg-names "atu".
In Synopsys DWC version 4.80 or later, since iATU register area is
separated from core register area, this area is necessary to get from
DT independently.
Suggested-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/1601444167-11316-4-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
|
|
In the dt-bindings, "atu" reg-names is required to get the register space
for iATU in Synopsis DWC version 4.80 or later.
Link: https://lore.kernel.org/r/1601444167-11316-3-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
|
|
In the dt-bindings, "atu" reg-names is required to get the register space
for iATU in Synopsys DWC version 4.80 or later.
Link: https://lore.kernel.org/r/1601444167-11316-2-git-send-email-hayashi.kunihiko@socionext.com
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
|
|
platform_get_irq() returns -ERRNO on error. In such case casting to u32
and comparing to 0 would pass the check.
Fixes: 623a6143a845 ("mailbox: mediatek: Add Mediatek CMDQ driver")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
The MHU drives the signal using a 32-bit register, with all 32 bits
logically ORed together. The MHU provides a set of registers to enable
software to set, clear, and check the status of each of the bits of this
register independently. The use of 32 bits for each interrupt line
enables software to provide more information about the source of the
interrupt. For example, each bit of the register can be associated with
a type of event that can contribute to raising the interrupt.
This patch adds a separate the MHU controller driver for doorbel mode
of operation using the extended DT binding to add support the same.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
Since we will be soon adding a separate driver based on this ARM MHU
driver to support doorbell mode, let us add explicit check to match
the default compatible for this driver. This is needed as the probe
and match reuses the AMBA device ids currently and don't have any
explicit compatible check.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
The ARM MHU's reference manual states following:
"The MHU drives the signal using a 32-bit register, with all 32 bits
logically ORed together. The MHU provides a set of registers to enable
software to set, clear, and check the status of each of the bits of this
register independently. The use of 32 bits for each interrupt line
enables software to provide more information about the source of the
interrupt. For example, each bit of the register can be associated with
a type of event that can contribute to raising the interrupt."
This patch thus extends the MHU controller's DT binding to add support
for doorbell mode.
Though the same MHU hardware controller is used in the two modes, A new
compatible string is added here to represent the combination of the MHU
hardware and the firmware sitting on the other side (which expects each
bit to represent a different signal now).
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Co-developed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
Convert the DT binding over to Json-schema.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
|
|
Clang-11 shipped support for outputs to asm goto statments along the
fallthrough path. Double up some of the get_user() and related macros
to be able to take advantage of this extended GNU C extension. This
should help improve the generated code's performance for these accesses.
Cc: Bill Wendling <morbo@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Instead of inlining the stac/mov/clac sequence (which also requires
individual exception table entries and several asm instruction
alternatives entries), just generate "call __put_user_nocheck_X" for the
__put_user() cases, the same way we changed __get_user earlier.
Unlike the get_user() case, we didn't have the same nice infrastructure
to just generate the call with a single case, so this actually has to
change some of the infrastructure in order to do this. But that only
cleans up the code further.
So now, instead of using a case statement for the sizes, we just do the
same thing we've done on the get_user() side for a long time: use the
size as an immediate constant to the asm, and generate the asm that way
directly.
In order to handle the special case of 64-bit data on a 32-bit kernel, I
needed to change the calling convention slightly: the data is passed in
%eax[:%edx], the pointer in %ecx, and the return value is also returned
in %ecx. It used to be returned in %eax, but because of how %eax can
now be a double register input, we don't want mix that with a
single-register output.
The actual low-level asm is easier to handle: we'll just share the code
between the checking and non-checking case, with the non-checking case
jumping into the middle of the function. That may sound a bit too
special, but this code is all very very special anyway, so...
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Instead of inlining the whole stac/lfence/mov/clac sequence (which also
requires individual exception table entries and several asm instruction
alternatives entries), just generate "call __get_user_nocheck_X" for the
__get_user() cases.
We can use all the same infrastructure that we already do for the
regular "get_user()", and the end result is simpler source code, and
much simpler code generation.
It also means that when I introduce asm goto with input for
"unsafe_get_user()", there are no nasty interactions with the
__get_user() code.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking fix from Jeff Layton:
"Just a single patch to fix up some tracepoint output"
* tag 'locks-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: Remove extra "0x" in tracepoint format specifier
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat mount cleanups from Al Viro:
"The last remnants of mount(2) compat buried by Christoph.
Buried into NFS, that is.
Generally I'm less enthusiastic about "let's use in_compat_syscall()
deep in call chain" kind of approach than Christoph seems to be, but
in this case it's warranted - that had been an NFS-specific wart,
hopefully not to be repeated in any other filesystems (read: any new
filesystem introducing non-text mount options will get NAKed even if
it doesn't mess the layout up).
IOW, not worth trying to grow an infrastructure that would avoid that
use of in_compat_syscall()..."
* 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove compat_sys_mount
fs,nfs: lift compat nfs4 mount data handling into the nfs code
nfs: simplify nfs4_parse_monolithic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat quotactl cleanups from Al Viro:
"More Christoph's compat cleanups: quotactl(2)"
* 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
quota: simplify the quotactl compat handling
compat: add a compat_need_64bit_alignment_fixup() helper
compat: lift compat_s64 and compat_u64 to <asm-generic/compat.h>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat iovec cleanups from Al Viro:
"Christoph's series around import_iovec() and compat variant thereof"
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
security/keys: remove compat_keyctl_instantiate_key_iov
mm: remove compat_process_vm_{readv,writev}
fs: remove compat_sys_vmsplice
fs: remove the compat readv/writev syscalls
fs: remove various compat readv/writev helpers
iov_iter: transparently handle compat iovecs in import_iovec
iov_iter: refactor rw_copy_check_uvector and import_iovec
iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
compat.h: fix a spelling error in <linux/compat.h>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull copy_and_csum cleanups from Al Viro:
"Saner calling conventions for csum_and_copy_..._user() and friends"
[ Removing 800+ lines of code and cleaning stuff up is good - Linus ]
* 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ppc: propagate the calling conventions change down to csum_partial_copy_generic()
amd64: switch csum_partial_copy_generic() to new calling conventions
sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
mips: propagate the calling convention change down into __csum_partial_copy_..._user()
mips: __csum_partial_copy_kernel() has no users left
mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
i386: propagate the calling conventions change down to csum_partial_copy_generic()
sh: propage the calling conventions change down to csum_partial_copy_generic()
m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
arm: propagate the calling convention changes down to csum_partial_copy_from_user()
alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
saner calling conventions for csum_and_copy_..._user()
csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
csum_partial_copy_nocheck(): drop the last argument
unify generic instances of csum_partial_copy_nocheck()
icmp_push_reply(): reorder adding the checksum up
skb_copy_and_csum_bits(): don't bother with the last argument
|
|
Pull documentation updates from Jonathan Corbet:
"As hoped, things calmed down for docs this cycle; fewer changes and
almost no conflicts at all. This includes:
- A reworked and expanded user-mode Linux document
- Some simplifications and improvements for submitting-patches.rst
- An emergency fix for (some) problems with Sphinx 3.x
- Some welcome automarkup improvements to automatically generate
cross-references to struct definitions and other documents
- The usual collection of translation updates, typo fixes, etc"
* tag 'docs-5.10' of git://git.lwn.net/linux: (81 commits)
gpiolib: Update indentation in driver.rst for code excerpts
Documentation/admin-guide: tainted-kernels: Fix typo occured
Documentation: better locations for sysfs-pci, sysfs-tagging
docs: programming-languages: refresh blurb on clang support
Documentation: kvm: fix a typo
Documentation: Chinese translation of Documentation/arm64/amu.rst
doc: zh_CN: index files in arm64 subdirectory
mailmap: add entry for <mstarovoitov@marvell.com>
doc: seq_file: clarify role of *pos in ->next()
docs: trace: ring-buffer-design.rst: use the new SPDX tag
Documentation: kernel-parameters: clarify "module." parameters
Fix references to nommu-mmap.rst
docs: rewrite admin-guide/sysctl/abi.rst
docs: fb: Remove vesafb scrollback boot option
docs: fb: Remove sstfb scrollback boot option
docs: fb: Remove matroxfb scrollback boot option
docs: fb: Remove framebuffer scrollback boot option
docs: replace the old User Mode Linux HowTo with a new one
Documentation/admin-guide: blockdev/ramdisk: remove use of "rdev"
Documentation/admin-guide: README & svga: remove use of "rdev"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 updates from Tony Luck:
"Cleanups by Christoph:
- Switch to libata instead of legacy ide driver
- Drop ia64 perfmon (it's been broken for a while)"
* tag 'ia64_for_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
ia64: Use libata instead of the legacy ide driver in defconfigs
ia64: Remove perfmon
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-10-12
The main changes are:
1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.
2) libbpf relocation support for different size load/store, from Andrii.
3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.
4) BPF support for per-cpu variables, form Hao.
5) sockmap improvements, from John.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the TX data path, spot packets with xfrm stack IPsec offload
indication.
Fill Software-Parser segment in TX descriptor so that the hardware
may parse the ESP protocol, and perform TX checksum offload on the
inner payload.
Support GSO, by providing the trailer data and ICV placeholder
so HW can fill it post encryption operation.
Padding alignment cannot be performed in HW (ConnectX-6Dx) due to
a bug. Software can overcome this limitation by adding NETIF_F_HW_ESP to
the gso_partial_features field in netdev so the packets being
aligned by the stack.
l4_inner_checksum cannot be offloaded by HW for IPsec tunnel type packet.
Note that for GSO SKBs, the stack does not include an ESP trailer,
unlike the non-GSO case.
Below is the iperf3 performance report on two server of 24 cores
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with ConnectX6-DX.
All the bandwidth test uses iperf3 TCP traffic with packet size 128KB.
Each tunnel uses one iperf3 stream with one thread (option -P1).
TX crypto offload shows improvements on both bandwidth
and CPU utilization.
----------------------------------------------------------------------
Mode | Num tunnel | BW | Send CPU util | Recv CPU util
| | (Gbps) | (Average %) | (Average %)
----------------------------------------------------------------------
Cryto offload | | | |
(RX only) | 1 | 4.7 | 4.2 | 3.5
----------------------------------------------------------------------
Cryto offload | | | |
(RX only) | 24 | 15.6 | 20 | 10
----------------------------------------------------------------------
Non-offload | 1 | 4.6 | 4 | 5
----------------------------------------------------------------------
Non-offload | 24 | 11.9 | 16 | 12
----------------------------------------------------------------------
Cryto offload | | | |
(TX & RX) | 1 | 11.9 | 2.1 | 5.9
----------------------------------------------------------------------
Cryto offload | | | |
(TX & RX) | 24 | 38 | 9.5 | 27.5
----------------------------------------------------------------------
Cryto offload | | | |
(TX only) | 1 | 4.7 | 0.7 | 5
----------------------------------------------------------------------
Cryto offload | | | |
(TX only) | 24 | 14.5 | 6 | 20
Regression tests show no degradation on non-ipsec and
non-offload-ipsec traffics. The packet rate test uses pktgen UDP to
transmit on single CPU, the instructions and cycles are measured on
the transmit CPU.
before:
----------------------------------------------------------------------
Non-offload | 1 | 4.7 | 4.2 | 5.1
----------------------------------------------------------------------
Non-offload | 24 | 11.2 | 14 | 15
----------------------------------------------------------------------
Non-ipsec | 1 | 28 | 4 | 5.7
----------------------------------------------------------------------
Non-ipsec | 24 | 68.3 | 17.8 | 39.7
----------------------------------------------------------------------
Non-ipsec packet rate(BURST=1000 BC=5 NCPUS=1 SIZE=60)
13.56Mpps, 456 instructions/pkt, 191 cycles/pkt
after:
----------------------------------------------------------------------
Non-offload | 1 | 4.69 | 4.2 | 5
----------------------------------------------------------------------
Non-offload | 24 | 11.9 | 13.5 | 15.1
----------------------------------------------------------------------
Non-ipsec | 1 | 29 | 3.2 | 5.5
----------------------------------------------------------------------
Non-ipsec | 24 | 68.2 | 18.5 | 39.8
----------------------------------------------------------------------
Non-ipsec packet rate: 13.56Mpps, 472 instructions/pkt, 191 cycles/pkt
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|