summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-11-07s390/bpf: Use kvcalloc for addrs arrayIlya Leoshkevich
A BPF program may consist of 1m instructions, which means JIT instruction-address mapping can be as large as 4m. s390 has FORCE_MAX_ZONEORDER=9 (for memory hotplug reasons), which means maximum kmalloc size is 1m. This makes it impossible to JIT programs with more than 256k instructions. Fix by using kvcalloc, which falls back to vmalloc for larger allocations. An alternative would be to use a radix tree, but that is not supported by bpf_prog_fill_jited_linfo. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107141838.92202-1-iii@linux.ibm.com
2019-11-07tools, bpf_asm: Warn when jumps are out of rangeIlya Leoshkevich
When compiling larger programs with bpf_asm, it's possible to accidentally exceed jt/jf range, in which case it won't complain, but rather silently emit a truncated offset, leading to a "happy debugging" situation. Add a warning to help detecting such issues. It could be made an error instead, but this might break compilation of existing code (which might be working by accident). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191107100349.88976-1-iii@linux.ibm.com
2019-11-07x86/stacktrace: update kconfig help text for reliable unwindersJoe Lawrence
commit 6415b38bae26 ("x86/stacktrace: Enable HAVE_RELIABLE_STACKTRACE for the ORC unwinder") added the ORC unwinder as a "reliable" unwinder. Update the help text to reflect that change: the frame pointer unwinder is no longer the only one that can provide HAVE_RELIABLE_STACKTRACE. Link: http://lkml.kernel.org/r/20191107032958.14034-1-joe.lawrence@redhat.com To: linux-kernel@vger.kernel.org To: live-patching@vger.kernel.org Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-11-07ACPI: HMAT: don't mix pxm and nid when setting memory target processor_pxmBrice Goglin
On systems where PXMs and nids are in different order, memory initiators exposed in sysfs could be wrong: On dual-socket CLX with SNC enabled (4 nodes, 1 and 2 swapped between PXMs and nids), node1 would only get node2 as initiator, and node2 would only get node1. With this patch, we get node1 as the only initiator of itself, and node2 as the only initiator of itself, as expected. This should likely go to stable up to 5.2. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" deviceDan Williams
Memory that has been tagged EFI_MEMORY_SP, and has performance properties described by the ACPI HMAT is expected to have an application specific consumer. Those consumers may want 100% of the memory capacity to be reserved from any usage by the kernel. By default, with this enabling, a platform device is created to represent this differentiated resource. The device-dax "hmem" driver claims these devices by default and provides an mmap interface for the target application. If the administrator prefers, the hmem resource range can be made available to the core-mm via the device-dax hotplug facility, kmem, to online the memory with its own numa node. This was tested with an emulated HMAT produced by qemu (with the pending HMAT enabling patches), and "efi_fake_mem=8G@9G:0x40000" on the kernel command line to mark the memory ranges associated with node2 and node3 as EFI_MEMORY_SP. qemu numa configuration options: -numa node,mem=4G,cpus=0-19,nodeid=0 -numa node,mem=4G,cpus=20-39,nodeid=1 -numa node,mem=4G,nodeid=2 -numa node,mem=4G,nodeid=3 -numa dist,src=0,dst=0,val=10 -numa dist,src=0,dst=1,val=21 -numa dist,src=0,dst=2,val=21 -numa dist,src=0,dst=3,val=21 -numa dist,src=1,dst=0,val=21 -numa dist,src=1,dst=1,val=10 -numa dist,src=1,dst=2,val=21 -numa dist,src=1,dst=3,val=21 -numa dist,src=2,dst=0,val=21 -numa dist,src=2,dst=1,val=21 -numa dist,src=2,dst=2,val=10 -numa dist,src=2,dst=3,val=21 -numa dist,src=3,dst=0,val=21 -numa dist,src=3,dst=1,val=21 -numa dist,src=3,dst=2,val=21 -numa dist,src=3,dst=3,val=10 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15 -numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15 -numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20 -numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10 -numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5 -numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5 -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15 -numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15 -numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20 -numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20 Result: [ { "path":"\/platform\/hmem.1", "id":1, "size":"4.00 GiB (4.29 GB)", "align":2097152, "devices":[ { "chardev":"dax1.0", "size":"4.00 GiB (4.29 GB)" } ] }, { "path":"\/platform\/hmem.0", "id":0, "size":"4.00 GiB (4.29 GB)", "align":2097152, "devices":[ { "chardev":"dax0.0", "size":"4.00 GiB (4.29 GB)" } ] } ] [..] 240000000-43fffffff : Soft Reserved 240000000-33fffffff : hmem.0 240000000-33fffffff : dax0.0 340000000-43fffffff : hmem.1 340000000-43fffffff : dax1.0 Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07ACPI: NUMA: HMAT: Register HMAT at device_initcall levelDan Williams
In preparation for registering device-dax instances for accessing EFI specific-purpose memory, arrange for the HMAT registration to occur later in the init process. Critically HMAT initialization needs to occur after e820__reserve_resources_late() which is the point at which the iomem resource tree is populated with "Application Reserved" (IORES_DESC_APPLICATION_RESERVED). e820__reserve_resources_late() happens at subsys_initcall time. Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07device-dax: Add a driver for "hmem" devicesDan Williams
Platform firmware like EFI/ACPI may publish "hmem" platform devices. Such a device is a performance differentiated memory range likely reserved for an application specific use case. The driver gives access to 100% of the capacity via a device-dax mmap instance by default. However, if over-subscription and other kernel memory management is desired the resulting dax device can be assigned to the core-mm via the kmem driver. This consumes "hmem" devices the producer of "hmem" devices is saved for a follow-on patch so that it can reference the new CONFIG_DEV_DAX_HMEM symbol to gate performing the enumeration work. Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07dax: Fix alloc_dax_region() compile warningDan Williams
PFN flags are (unsigned long long), fix the alloc_dax_region() calling convention to fix warnings of the form: >> include/linux/pfn_t.h:18:17: warning: large integer implicitly truncated to unsigned type [-Woverflow] #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3)) Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07lib: Uplevel the pmem "region" ida to a global allocatorDan Williams
In preparation for handling platform differentiated memory types beyond persistent memory, uplevel the "region" identifier to a global number space. This enables a device-dax instance to be registered to any memory type with guaranteed unique names. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07x86/efi: Add efi_fake_mem support for EFI_MEMORY_SPDan Williams
Given that EFI_MEMORY_SP is platform BIOS policy decision for marking memory ranges as "reserved for a specific purpose" there will inevitably be scenarios where the BIOS omits the attribute in situations where it is desired. Unlike other attributes if the OS wants to reserve this memory from the kernel the reservation needs to happen early in init. So early, in fact, that it needs to happen before e820__memblock_setup() which is a pre-requisite for efi_fake_memmap() that wants to allocate memory for the updated table. Introduce an x86 specific efi_fake_memmap_early() that can search for attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table accordingly. The KASLR code that scans the command line looking for user-directed memory reservations also needs to be updated to consider "efi_fake_mem=nn@ss:0x40000" requests. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07arm/efi: EFI soft reservation to memblockDan Williams
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. For this patch, update the ARM paths that consider EFI_CONVENTIONAL_MEMORY to optionally take the EFI_MEMORY_SP attribute into account as a reservation indicator. Publish the soft reservation as IORES_DESC_SOFT_RESERVED memory, similar to x86. (Based on an original patch by Ard) Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07x86/efi: EFI soft reservation to E820 enumerationDan Williams
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. This patch introduces 2 new concepts at once given the entanglement between early boot enumeration relative to memory that can optionally be reserved from the kernel page allocator by default. The new concepts are: - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP attribute on EFI_CONVENTIONAL memory, update the E820 map with this new type. Only perform this classification if the CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as typical ram. - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for a device driver to search iomem resources for application specific memory. Teach the iomem code to identify such ranges as "Soft Reserved". Note that the comment for do_add_efi_memmap() needed refreshing since it seemed to imply that the efi map might overflow the e820 table, but that is not an issue as of commit 7b6e4ba3cb1f "x86/boot/e820: Clean up the E820_X_MAX definition" that removed the 128 entry limit for e820__range_add(). A follow-on change integrates parsing of the ACPI HMAT to identify the node and sub-range boundaries of EFI_MEMORY_SP designated memory. For now, just identify and reserve memory of this type. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07efi: Common enable/disable infrastructure for EFI soft reservationDan Williams
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. As for this patch, define the common helpers to determine if the EFI_MEMORY_SP attribute should be honored. The determination needs to be made early to prevent the kernel from being loaded into soft-reserved memory, or otherwise allowing early allocations to land there. Follow-on changes are needed per architecture to leverage these helpers in their respective mem-init paths. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07x86/efi: Push EFI_MEMMAP check into leaf routinesDan Williams
In preparation for adding another EFI_MEMMAP dependent call that needs to occur before e820__memblock_setup() fixup the existing efi calls to check for EFI_MEMMAP internally. This ends up being cleaner than the alternative of checking EFI_MEMMAP multiple times in setup_arch(). Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07efi: Enumerate EFI_MEMORY_SPDan Williams
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The intent of this bit is to allow the OS to identify precious or scarce memory resources and optionally manage it separately from EfiConventionalMemory. As defined older OSes that do not know about this attribute are permitted to ignore it and the memory will be handled according to the OS default policy for the given memory type. In other words, this "specific purpose" hint is deliberately weaker than EfiReservedMemoryType in that the system continues to operate if the OS takes no action on the attribute. The risk of taking no action is potentially unwanted / unmovable kernel allocations from the designated resource that prevent the full realization of the "specific purpose". For example, consider a system with a high-bandwidth memory pool. Older kernels are permitted to boot and consume that memory as conventional "System-RAM" newer kernels may arrange for that memory to be set aside (soft reserved) by the system administrator for a dedicated high-bandwidth memory aware application to consume. Specifically, this mechanism allows for the elimination of scenarios where platform firmware tries to game OS policy by lying about ACPI SLIT values, i.e. claiming that a precious memory resource has a high distance to trigger the OS to avoid it by default. This reservation hint allows platform-firmware to instead tell the truth about performance characteristics by indicate to OS memory management to put immovable allocations elsewhere. Implement simple detection of the bit for EFI memory table dumps and save the kernel policy for a follow-on change. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07ACPI: NUMA: Establish a new drivers/acpi/numa/ directoryDan Williams
Currently hmat.c lives under an "hmat" directory which does not enhance the description of the file. The initial motivation for giving hmat.c its own directory was to delineate it as mm functionality in contrast to ACPI device driver functionality. As ACPI continues to play an increasing role in conveying memory location and performance topology information to the OS take the opportunity to co-locate these NUMA relevant tables in a combined directory. numa.c is renamed to srat.c and moved to drivers/acpi/numa/ along with hmat.c. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07Bluetooth: btmtksdio: add MODULE_DEVICE_TABLE()Bartosz Golaszewski
This adds the missing MODULE_DEVICE_TABLE() for SDIO IDs. While certain platforms using this driver indeed have HW issues causing problems if the module is loaded too early - this should be handled from user-space by blacklisting it or delaying the loading. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-11-07scsi: sd_zbc: add zone open, close, and finish supportAjay Joshi
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH support to allow explicit control of zone states. Contains contributions from Matias Bjorling, Hans Holmberg, Keith Busch and Damien Le Moal. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07Merge branch 'fixes' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi into for-5.5/drivers-post * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi: scsi: core: Handle drivers which set sg_tablesize to zero scsi: qla2xxx: fix NPIV tear down process scsi: sd_zbc: Fix sd_zbc_complete()
2019-11-07Merge tag 'scsi-fixes' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi into for-5.5/drivers-post SCSI fixes on 20191101 Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4] and one core change in sd that fixes an I/O failure on DIF type 3 devices. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits) scsi: qla2xxx: stop timer in shutdown path scsi: sd: define variable dif as unsigned int instead of bool scsi: target: cxgbit: Fix cxgbit_fw4_ack() scsi: qla2xxx: Fix partial flash write of MBI scsi: qla2xxx: Initialized mailbox to prevent driver load failure scsi: lpfc: Honor module parameter lpfc_use_adisc scsi: ufs-bsg: Wake the device before sending raw upiu commands scsi: lpfc: Check queue pointer before use scsi: qla2xxx: fixup incorrect usage of host_byte scsi: lpfc: remove left-over BUILD_NVME defines scsi: core: try to get module before removing device scsi: hpsa: add missing hunks in reset-patch scsi: target: core: Do not overwrite CDB byte 1 scsi: ch: Make it possible to open a ch device multiple times again scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE scsi: sni_53c710: fix compilation error scsi: scsi_dh_alua: handle RTPG sense code correctly during state transitions scsi: qla2xxx: fix a potential NULL pointer dereference scsi: MAINTAINERS: Update qla2xxx driver scsi: zfcp: fix reaction on bit error threshold notification ...
2019-11-07null_blk: add zone open, close, and finish supportAjay Joshi
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH support to allow explicit control of zone states. Contains contributions from Matias Bjorling, Hans Holmberg, Keith Busch and Damien Le Moal. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07dm: add zone open, close and finish supportAjay Joshi
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH support to allow explicit control of zone states. Contains contributions from Matias Bjorling, Hans Holmberg and Damien Le Moal. Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07Merge branch 'for-5.5/block' into for-5.5/driversJens Axboe
Pull in dependencies for the new zoned open/close/finish support. * for-5.5/block: (32 commits) block: add zone open, close and finish ioctl support block: add zone open, close and finish operations block: Simplify REQ_OP_ZONE_RESET_ALL handling block: Remove REQ_OP_ZONE_RESET plugging block: Warn if elevator= parameter is used block: avoid blk_bio_segment_split for small I/O operations blk-mq: make sure that line break can be printed block: sed-opal: Introduce Opal Datastore UID block: sed-opal: Add support to read/write opal tables generically block: sed-opal: Generalizing write data to any opal table bdev: Refresh bdev size for disks without partitioning bdev: Factor out bdev revalidation into a common helper blk-mq: avoid sysfs buffer overflow with too many CPU cores blk-mq: Make blk_mq_run_hw_queue() return void fcntl: fix typo in RWH_WRITE_LIFE_NOT_SET r/w hint name blk-mq: fill header with kernel-doc blk-mq: remove needless goto from blk_mq_get_driver_tag block: reorder bio::__bi_remaining for better packing block: Reduce the amount of memory used for tag sets block: Reduce the amount of memory required per request queue ...
2019-11-07drivers: ipmi: Support for both IPMB Req and RespVijay Khemka
Removed check for request or response in IPMB packets coming from device as well as from host. Now it supports both way communication to device via IPMB. Both request and response will be passed to application. Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Message-Id: <20191106182921.1086795-1-vijaykhemka@fb.com> Reviewed-by: Asmaa Mnebhi <Asmaa@mellanox.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
2019-11-07block: add zone open, close and finish ioctl supportAjay Joshi
Introduce three new ioctl commands BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE to allow applications to control the condition of zones on a zoned block device through the execution of the REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH operations. Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07block: add zone open, close and finish operationsAjay Joshi
Zoned block devices (ZBC and ZAC devices) allow an explicit control over the condition (state) of zones. The operations allowed are: * Open a zone: Transition to open condition to indicate that a zone will actively be written * Close a zone: Transition to closed condition to release the drive resources used for writing to a zone * Finish a zone: Transition an open or closed zone to the full condition to prevent write operations To enable this control for in-kernel zoned block device users, define the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH as well as the generic function blkdev_zone_mgmt() for submitting these operations on a range of zones. This results in blkdev_reset_zones() removal and replacement with this new zone magement function. Users of blkdev_reset_zones() (f2fs and dm-zoned) are updated accordingly. Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07block: Simplify REQ_OP_ZONE_RESET_ALL handlingDamien Le Moal
There is no need for the function __blkdev_reset_all_zones() as REQ_OP_ZONE_RESET_ALL can be handled directly in blkdev_reset_zones() bio loop with an early break from the loop. This patch removes this function and modifies blkdev_reset_zones(), simplifying the code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07block: Remove REQ_OP_ZONE_RESET pluggingDamien Le Moal
REQ_OP_ZONE_RESET operations cannot be merged as these bios and requests do not have a size and are never sequential due to the zone start sector position required for their execution. As a result, there is no point in using a plug around blkdev_reset_zones() bio issuing loop. This patch removes this unnecessary plugging. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07perf report: Sort by sampled cycles percent per block for tuiJin Yao
Previous patch has implemented a new option "--total-cycles". But only stdio mode is supported. This patch supports the tui mode and support '--percent-limit'. For example, perf record -b ./div perf report --total-cycles --percent-limit 1 # Samples: 2753248 of event 'cycles' Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so -------------------------------------------------- v7: --- 1. Since we have used use_browser in report__browse_block_hists to support stdio mode, now we also add supporting for tui. 2. Move block tui browser code from ui/browsers/hists.c to block-info.c. v6: --- Create report__tui_browse_block_hists in block-info.c (codes are moved from builtin-report.c). v5: --- Fix a crash issue when running perf report without '--total-cycles'. The issue is because the internal flag is renamed from 'total_cycles' to 'total_cycles_mode' in previous patch but this patch still uses 'total_cycles' to check if the '--total-cycles' option is enabled, which causes the code to be inconsistent. v4: --- Since the block collection is moved out of printing in previous patch, this patch is updated accordingly for tui supporting. v3: --- Minor change since the function name is changed: block_total_cycles_percent -> block_info__total_cycles_percent Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf report: Support --percent-limit for --total-cyclesJin Yao
We have already supported the '--total-cycles' option in previous patch. It's also useful to show entries only above a threshold percent. This patch enables '--percent-limit' for not showing entries under that percent. For example: perf report --total-cycles --stdio --percent-limit 1 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................................. .................... # 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so Committer testing: From second exapmple onwards slightly edited for brevity: # perf report --total-cycles --percent-limit 2 --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] # # (Tip: Create an archive with symtabs to analyse on other machine: perf archive) # # perf report --total-cycles --percent-limit 1 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so # # perf report --total-cycles --percent-limit 0.7 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] # ------------------------------------------- It only shows the entries which 'Sampled Cycles%' > 1%. v7: --- No functional change. Only fix the conflict issue because previous patches are changed. v6: --- No functional change. Only fix the conflict issue because previous patches are changed. v5: --- No functional change. Only fix the conflict issue because previous patches are changed. v4: --- No functional change. Only fix the build issue because previous patches are changed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf report: Sort by sampled cycles percent per block for stdioJin Yao
It would be useful to support sorting for all blocks by the sampled cycles percent per block. This is useful to concentrate on the globally hottest blocks. This patch implements a new option "--total-cycles" which sorts all blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent: percent = block sampled cycles aggregation / total sampled cycles Note that, this patch only supports "--stdio" mode. For example, # perf record -b ./div # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................ ................. # 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so 0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so 0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms] 0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms] 0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms] 0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms] 0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms] 0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms] 0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms] 0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms] 0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms] Committer testing: This should provide material for hours of endless joy, both from looking for suspicious things in the implementation of this patch, such as the top one: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] As well from things that look legit: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] :-) Very short system wide taken branches session: # perf record -h -b Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -b, --branch-any sample any taken branches # # perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ] # # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux] 0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux] 0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux] 0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux] 0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux] 0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] 0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux] 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] 0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux] 0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux] 0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux] 0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux] 0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux] 0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux] 0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux] 0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux] 0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux] 0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux] 0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux] 0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux] 0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux] 0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] [kernel.vmlinux] ----------------------------------------------------------- v7: --- Use use_browser in report__browse_block_hists for supporting stdio and potential tui mode. v6: --- Create report__browse_block_hists in block-info.c (codes are moved from builtin-report.c). It's called from perf_evlist__tty_browse_hists. v5: --- 1. Move all block functions to block-info.c 2. Move the code of setting ms in block hist_entry to other patch. v4: --- 1. Use new option '--total-cycles' to replace '-s total_cycles' in v3. 2. Move block info collection out of block info printing. v3: --- 1. Use common function block_info__process_sym to process the blocks per symbol. 2. Remove the nasty hack for skipping calculation of column length 3. Some minor cleanup Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf hist: Support block formats with compare/sort/displayJin Yao
This patch provides helper routines to support new columns for block info output. The new columns are: Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object v5: --- 1. Move more block related functions from builtin-report.c to block-info.c 2. Set ms (map+sym) in block hist_entry. Because this info is needed for reporting the block range (i.e. source line) Committer notes: Remove unused set_fmt() function, some build were not completing with: util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function] static inline void set_fmt(struct block_fmt *block_fmt, ^ 1 error generated. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07spi: spi-mem: fallback to using transfers when CS gpios are usedChris Packham
Devices with chip selects driven via GPIO are not compatible with the spi-mem operations. Fallback to using standard spi transfers when the device is connected with a gpio CS. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Link: https://lore.kernel.org/r/20191107044235.4864-3-chris.packham@alliedtelesis.co.nz Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-07spi: bcm-qspi: Convert to use CS GPIO descriptorsChris Packham
Set use_gpio_descriptors to true and avoid asserting the native chip select if the spi core has done it for us. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Link: https://lore.kernel.org/r/20191107044235.4864-2-chris.packham@alliedtelesis.co.nz Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-07regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_idStephan Gerhold
Those regulators are not actually supported by the AB8500 regulator driver. There is no ab8500_regulator_info for them and no entry in ab8505_regulator_match. As such, they cannot be registered successfully, and looking them up in ab8505_regulator_match causes an out-of-bounds array read. Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505") Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20191106173125.14496-2-stephan@gerhold.net Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-07regulator: ab8500: Remove AB8505 USB regulatorStephan Gerhold
The USB regulator was removed for AB8500 in commit 41a06aa738ad ("regulator: ab8500: Remove USB regulator"). It was then added for AB8505 in commit 547f384f33db ("regulator: ab8500: add support for ab8505"). However, there was never an entry added for it in ab8505_regulator_match. This causes all regulators after it to be initialized with the wrong device tree data, eventually leading to an out-of-bounds array read. Given that it is not used anywhere in the kernel, it seems likely that similar arguments against supporting it exist for AB8505 (it is controlled by hardware). Therefore, simply remove it like for AB8500 instead of adding an entry in ab8505_regulator_match. Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505") Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20191106173125.14496-1-stephan@gerhold.net Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-07drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platformShaokun Zhang
For some HiSilicon platform, the originally designed SCCL_ID and CCL_ID are not satisfied with much rich topology when the MT is set, so we extend the SCCL_ID to MPIDR[aff3] and CCL_ID to MPIDR[aff2]. Let's update this for HiSilicon uncore PMU driver. Cc: John Garry <john.garry@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-11-07Merge tag 'linux-cpupower-5.5-rc1' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility updates for v5.5 from Shuah Khan: "This cpupower update for Linux 5.5-rc1 consists of bug fixes and improvements to make it more accurate by removing the userspace to kernel transition and read_msr initiated IPI delays." * tag 'linux-cpupower-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: cpupower: ToDo: Update ToDo with ideas for per_cpu_schedule handling cpupower: mperf_monitor: Update cpupower to use the RDPRU instruction cpupower: mperf_monitor: Introduce per_cpu_schedule flag cpupower: Move needs_root variable into a sub-struct cpupower : Handle set and info subcommands correctly tools/power/cpupower: Fix initializer override in hsw_ext_cstates
2019-11-07Merge tag 'asoc-fix-v5.4-rc6' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.4 These are a collection of fixes since v5.4-rc4 that have accumilated, they're all driver specific and there's nothing major in here so it's probably not essential to actually send them but I'll leave that call to you.
2019-11-07Merge tag 'devfreq-next-for-5.5' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux Pull devfreq updates for v5.5 from Chanwoo Choi: "1. Update devfreq core - Check NULL governor in available_governors_show sysfs in order to prevent showing wrong governor information. - Fix race condition between devfreq_update_status() and trans_stat_show() - Add new 'interrupt-driven' flag for devfreq goveroris. Each devfreq driver can add their own interrupt-driven governor (NIVIDIA Tegra30 ACTMON governor). It needs to use the following sysfs interface to get the new polling interval in order to change the NVIDIA Tegra30 hardware's polling interval. : /sys/class/devfreq/devfreqX/polling_interval So, if 'interrupt-driven' flag of devfreq governor is 1, the devfreq governor is able to use the 'polling_interval' sysfs interface to get the new polling interval value. But, the devfreq core doesn't schedule out the polling work for this governor like NVIDIA Tegra30 ACTMON governor. 2. Update devfreq drivers - For exynos-bus.c, remove unused property from dt-binding documentation - For tegra30-devfreq.c, update the internal behavior like fixing the overflow integer issue and clean-up code. 3. Update devfreq-event driver - For exynos-ppmu.c, add exynos_ppmu.h dt-binding file for 'event-data-type' filed. and update dt-binging documentation. Also, Fix minor coding style." * tag 'devfreq-next-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux: (25 commits) PM / devfreq: tegra30: Tune up MCCPU boost-down coefficient PM / devfreq: tegra30: Support variable polling interval PM / devfreq: Add new interrupt_driven flag for governors PM / devfreq: tegra30: Use kHz units for dependency threshold PM / devfreq: tegra30: Disable consecutive interrupts when appropriate PM / devfreq: tegra30: Don't enable already enabled consecutive interrupts PM / devfreq: tegra30: Include appropriate header PM / devfreq: tegra30: Constify structs PM / devfreq: tegra30: Don't enable consecutive-down interrupt on startup PM / devfreq: tegra30: Reset boosting on startup PM / devfreq: tegra30: Move clk-notifier's registration to governor's start PM / devfreq: tegra30: Use CPUFreq notifier PM / devfreq: tegra30: Use kHz units uniformly in the code PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out PM / devfreq: tegra30: Drop write-barrier PM / devfreq: tegra30: Handle possible round-rate error PM / devfreq: tegra30: Keep interrupt disabled while governor is stopped PM / devfreq: tegra30: Change irq type to unsigned int PM / devfreq: exynos-ppmu: remove useless assignment PM / devfreq: Lock devfreq in trans_stat_show ...
2019-11-07perf hist: Count the total cycles of all samplesJin Yao
We can get the per sample cycles by hist__account_cycles(). It's also useful to know the total cycles of all samples in order to get the cycles coverage for a single program block in further. For example: coverage = per block sampled cycles / total sampled cycles This patch creates a new argument 'total_cycles' in hist__account_cycles(), which will be added with the cycles of each sample. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf block: Cleanup and refactor block info functionsJin Yao
We have already implemented some block-info related functions. Now it's time to do some cleanup, refactoring and move the functions and structures to new block-info.h/block-info.c. v4: --- Move code for skipping column length calculation to patch: 'perf diff: Don't use hack to skip column length calculation' v3: --- 1. Rename the patch title 2. Rename from block.h/block.c to block-info.h/block-info.c 3. Move more common part to block-info, such as block_info__process_sym. 4. Remove the nasty hack for skipping calculation of column length Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf diff: Don't use hack to skip column length calculationJin Yao
Previously we use a nasty hack to skip the hists__calc_col_len for block since this function is not very suitable for block column length calculation. This patch removes the hack code and add a check at the entry of hists__calc_col_len to skip for block case. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf tests: Fix out of bounds memory accessLeo Yan
The test case 'Read backward ring buffer' failed on 32-bit architectures which were found by LKFT perf testing. The test failed on arm32 x15 device, qemu_arm32, qemu_i386, and found intermittent failure on i386; the failure log is as below: 50: Read backward ring buffer : --- start --- test child forked, pid 510 Using CPUID GenuineIntel-6-9E-9 mmap size 1052672B mmap size 8192B Finished reading overwrite ring buffer: rewind free(): invalid next size (fast) test child interrupted ---- end ---- Read backward ring buffer: FAILED! The log hints there have issue for memory usage, thus free() reports error 'invalid next size' and directly exit for the case. Finally, this issue is root caused as out of bounds memory access for the data array 'evsel->id'. The backward ring buffer test invokes do_test() twice. 'evsel->id' is allocated at the first call with the flow: test__backward_ring_buffer() `-> do_test() `-> evlist__mmap() `-> evlist__mmap_ex() `-> perf_evsel__alloc_id() So 'evsel->id' is allocated with one item, and it will be used in function perf_evlist__id_add(): evsel->id[0] = id evsel->ids = 1 At the second call for do_test(), it skips to initialize 'evsel->id' and reuses the array which is allocated in the first call. But 'evsel->ids' contains the stale value. Thus: evsel->id[1] = id -> out of bound access evsel->ids = 2 To fix this issue, we will use evlist__open() and evlist__close() pair functions to prepare and cleanup context for evlist; so 'evsel->id' and 'evsel->ids' can be initialized properly when invoke do_test() and avoid the out of bounds memory access. Fixes: ee74701ed8ad ("perf tests: Add test to check backward ring buffer") Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable@vger.kernel.org # v4.10+ Link: http://lore.kernel.org/lkml/20191107020244.2427-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf record: Add support for limit perf output file sizeJiwei Sun
The patch adds a new option to limit the output file size, then based on it, we can create a wrapper of the perf command that uses the option to avoid exhausting the disk space by the unconscious user. In order to make the perf.data parsable, we just limit the sample data size, since the perf.data consists of many headers and sample data and other data, the actual size of the recorded file will bigger than the setting value. Testing it: # ./perf record -a -g --max-size=10M Couldn't synthesize bpf events. [ perf record: perf size limit reached (10249 KB), stopping session ] [ perf record: Woken up 32 times to write data ] [ perf record: Captured and wrote 10.133 MB perf.data (71964 samples) ] # ls -lh perf.data -rw------- 1 root root 11M Oct 22 14:32 perf.data # ./perf record -a -g --max-size=10K [ perf record: perf size limit reached (10 KB), stopping session ] Couldn't synthesize bpf events. [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.546 MB perf.data (69 samples) ] # ls -l perf.data -rw------- 1 root root 1626952 Oct 22 14:36 perf.data Committer notes: Fixed the build in multiple distros by using PRIu64 to print u64 struct members, fixing this: builtin-record.c: In function 'record__write': builtin-record.c:150:5: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=] rec->bytes_written >> 10); ^ CC /tmp/build/pe Signed-off-by: Jiwei Sun <jiwei.sun@windriver.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Danter <richard.danter@windriver.com> Link: http://lore.kernel.org/lkml/20191022080901.3841-1-jiwei.sun@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf probe: Skip overlapped location on searching variablesMasami Hiramatsu
Since debuginfo__find_probes() callback function can be called with the location which already passed, the callback function must filter out such overlapped locations. add_probe_trace_event() has already done it by commit 1a375ae7659a ("perf probe: Skip same probe address for a given line"), but add_available_vars() doesn't. Thus perf probe -v shows same address repeatedly as below: # perf probe -V vfs_read:18 Available variables at vfs_read:18 @<vfs_read+217> char* buf loff_t* pos ssize_t ret struct file* file @<vfs_read+217> char* buf loff_t* pos ssize_t ret struct file* file @<vfs_read+226> char* buf loff_t* pos ssize_t ret struct file* file With this fix, perf probe -V shows it correctly: # perf probe -V vfs_read:18 Available variables at vfs_read:18 @<vfs_read+217> char* buf loff_t* pos ssize_t ret struct file* file @<vfs_read+226> char* buf loff_t* pos ssize_t ret struct file* file Fixes: cf6eb489e5c0 ("perf probe: Show accessible local variables") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241938927.32002.4026859017790562751.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf probe: Fix to show calling lines of inlined functionsMasami Hiramatsu
Fix to show calling lines of inlined functions (where an inline function is called). die_walk_lines() filtered out the lines inside inlined functions based on the address. However this also filtered out the lines which call those inlined functions from the target function. To solve this issue, check the call_file and call_line attributes and do not filter out if it matches to the line information. Without this fix, perf probe -L doesn't show some lines correctly. (don't see the lines after 17) # perf probe -L vfs_read <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0> 0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos) 1 { 2 ssize_t ret; 4 if (!(file->f_mode & FMODE_READ)) return -EBADF; 6 if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; 8 if (unlikely(!access_ok(buf, count))) return -EFAULT; 11 ret = rw_verify_area(READ, file, pos, count); 12 if (!ret) { 13 if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; 15 ret = __vfs_read(file, buf, count, pos); 16 if (ret > 0) { fsnotify_access(file); add_rchar(current, ret); } With this fix: # perf probe -L vfs_read <vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0> 0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos) 1 { 2 ssize_t ret; 4 if (!(file->f_mode & FMODE_READ)) return -EBADF; 6 if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; 8 if (unlikely(!access_ok(buf, count))) return -EFAULT; 11 ret = rw_verify_area(READ, file, pos, count); 12 if (!ret) { 13 if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; 15 ret = __vfs_read(file, buf, count, pos); 16 if (ret > 0) { 17 fsnotify_access(file); 18 add_rchar(current, ret); } 20 inc_syscr(current); } Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241937995.32002.17899884017011512577.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf probe: Filter out instances except for inlined subroutine and subprogramMasami Hiramatsu
Filter out instances except for inlined_subroutine and subprogram DIE in die_walk_instances() and die_is_func_instance(). This fixes an issue that perf probe sets some probes on calling address instead of a target function itself. When perf probe walks on instances of an abstruct origin (a kind of function prototype of inlined function), die_walk_instances() can also pass a GNU_call_site (a GNU extension for call site) to callback. Since it is not an inlined instance of target function, we have to filter out when searching a probe point. Without this patch, perf probe sets probes on call site address too.This can happen on some function which is marked "inlined", but has actual symbol. (I'm not sure why GCC mark it "inlined"): # perf probe -D vfs_read p:probe/vfs_read _text+2500017 p:probe/vfs_read_1 _text+2499468 p:probe/vfs_read_2 _text+2499563 p:probe/vfs_read_3 _text+2498876 p:probe/vfs_read_4 _text+2498512 p:probe/vfs_read_5 _text+2498627 With this patch: Slightly different results, similar tho: # perf probe -D vfs_read p:probe/vfs_read _text+2498512 Committer testing: # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux Before: # perf probe -D vfs_read p:probe/vfs_read _text+3131557 p:probe/vfs_read_1 _text+3130975 p:probe/vfs_read_2 _text+3131047 p:probe/vfs_read_3 _text+3130380 p:probe/vfs_read_4 _text+3130000 # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # After: # perf probe -D vfs_read p:probe/vfs_read _text+3130000 # Fixes: db0d2c6420ee ("perf probe: Search concrete out-of-line instances") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241937063.32002.11024544873990816590.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf probe: Skip end-of-sequence and non statement linesMasami Hiramatsu
Skip end-of-sequence and non-statement lines while walking through lines list. The "end-of-sequence" line information means: "the current address is that of the first byte after the end of a sequence of target machine instructions." (DWARF version 4 spec 6.2.2) This actually means out of scope and we can not probe on it. On the other hand, the statement lines (is_stmt) means: "the current instruction is a recommended breakpoint location. A recommended breakpoint location is intended to “represent” a line, a statement and/or a semantically distinct subpart of a statement." (DWARF version 4 spec 6.2.2) So, non-statement line info also should be skipped. These can reduce unneeded probe points and also avoid an error. E.g. without this patch: # perf probe -a "clear_tasks_mm_cpumask:1" Added new events: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1 # This puts 5 probes on one line, but acutally it's not inlined function. This is because there are many non statement instructions at the function prologue. With this patch: # perf probe -a "clear_tasks_mm_cpumask:1" Added new event: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1 # Now perf-probe skips unneeded addresses. Committer testing: Slightly different results, but similar: Before: # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # # perf probe -a "clear_tasks_mm_cpumask:1" Added new events: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1) probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1 # After: # perf probe -a "clear_tasks_mm_cpumask:1" Added new event: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1 # perf probe -l probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c) # Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157241936090.32002.12156347518596111660.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf probe: Return a better scope DIE if there is no best scopeMasami Hiramatsu
Make find_best_scope() returns innermost DIE at given address if there is no best matched scope DIE. Since Gcc sometimes generates intuitively strange line info which is out of inlined function address range, we need this fixup. Without this, sometimes perf probe failed to probe on a line inside an inlined function: # perf probe -D ksys_open:3 Failed to find scope of probe point. Error: Failed to add events. With this fix, 'perf probe' can probe it: # perf probe -D ksys_open:3 p:probe/ksys_open _text+25707308 p:probe/ksys_open_1 _text+25710596 p:probe/ksys_open_2 _text+25711114 p:probe/ksys_open_3 _text+25711343 p:probe/ksys_open_4 _text+25714058 p:probe/ksys_open_5 _text+2819653 p:probe/ksys_open_6 _text+2819701 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>