From da422ade5c879355f7410859069ef4f957c303d4 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 23 Jul 2019 14:22:03 +0100 Subject: Documentation/features/locking: update lists The locking feature lists don't match reality as of v5.3-rc1: * arm64 moved to queued spinlocks in commit: c11090474d70590170cf5fa6afe85864ab494b37 ("arm64: locking: Replace ticket lock implementation with qspinlock") * xtensa moved to queued spinlocks and rwlocks in commit: 579afe866f52adcd921272a224ab36733051059c ("xtensa: use generic spinlock/rwlock implementation") * architecture-specific rwsem support was removed in commit: 46ad0840b1584b92b5ff2cc3ed0b011dd6b8e0f1 ("locking/rwsem: Remove arch specific rwsem files") So update the feature lists accordingly, and remove the now redundant rwsem-optimized list. Signed-off-by: Mark Rutland Signed-off-by: Jonathan Corbet --- .../locking/queued-rwlocks/arch-support.txt | 2 +- .../locking/queued-spinlocks/arch-support.txt | 4 +-- .../locking/rwsem-optimized/arch-support.txt | 34 ---------------------- 3 files changed, 3 insertions(+), 37 deletions(-) delete mode 100644 Documentation/features/locking/rwsem-optimized/arch-support.txt (limited to 'Documentation') diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt b/Documentation/features/locking/queued-rwlocks/arch-support.txt index c683da198f31..ee922746a64c 100644 --- a/Documentation/features/locking/queued-rwlocks/arch-support.txt +++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt @@ -30,5 +30,5 @@ | um: | TODO | | unicore32: | TODO | | x86: | ok | - | xtensa: | TODO | + | xtensa: | ok | ----------------------- diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt index e3080b82aefd..c52116c1a049 100644 --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | TODO | | arm: | TODO | - | arm64: | TODO | + | arm64: | ok | | c6x: | TODO | | csky: | TODO | | h8300: | TODO | @@ -30,5 +30,5 @@ | um: | TODO | | unicore32: | TODO | | x86: | ok | - | xtensa: | TODO | + | xtensa: | ok | ----------------------- diff --git a/Documentation/features/locking/rwsem-optimized/arch-support.txt b/Documentation/features/locking/rwsem-optimized/arch-support.txt deleted file mode 100644 index 7521d7500fbe..000000000000 --- a/Documentation/features/locking/rwsem-optimized/arch-support.txt +++ /dev/null @@ -1,34 +0,0 @@ -# -# Feature name: rwsem-optimized -# Kconfig: !RWSEM_GENERIC_SPINLOCK -# description: arch provides optimized rwsem APIs -# - ----------------------- - | arch |status| - ----------------------- - | alpha: | ok | - | arc: | TODO | - | arm: | ok | - | arm64: | ok | - | c6x: | TODO | - | csky: | TODO | - | h8300: | TODO | - | hexagon: | TODO | - | ia64: | ok | - | m68k: | TODO | - | microblaze: | TODO | - | mips: | TODO | - | nds32: | TODO | - | nios2: | TODO | - | openrisc: | TODO | - | parisc: | TODO | - | powerpc: | TODO | - | riscv: | TODO | - | s390: | ok | - | sh: | ok | - | sparc: | ok | - | um: | ok | - | unicore32: | TODO | - | x86: | ok | - | xtensa: | ok | - ----------------------- -- cgit From 38a449ff533c9a21c254473a9f0cf59b6f420f50 Mon Sep 17 00:00:00 2001 From: Sheriff Esseson Date: Tue, 23 Jul 2019 12:48:13 +0100 Subject: Documentation: filesystem: fix "Removed Sysctls" table the "Removed Sysctls" section is a table - bring it alive with ReST. Signed-off-by: Sheriff Esseson Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/xfs.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst index e76665a8f2f2..fb5b39f73059 100644 --- a/Documentation/admin-guide/xfs.rst +++ b/Documentation/admin-guide/xfs.rst @@ -337,11 +337,12 @@ None at present. Removed Sysctls =============== +============================= ======= Name Removed - ---- ------- +============================= ======= fs.xfs.xfsbufd_centisec v4.0 fs.xfs.age_buffer_centisecs v4.0 - +============================= ======= Error handling ============== -- cgit From c6e0396124de72d6ab7f8c8e01af7f79ee1aa698 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 23 Jul 2019 19:57:50 +0300 Subject: coda: Fix typo in the struct CodaCred documentation Documentation mistakenly refers to a different type while explaining the contents of the struct CodaCred. Fix the typo in the struct CodaCred description in the documentation. Signed-off-by: Andy Shevchenko Signed-off-by: Jonathan Corbet --- Documentation/filesystems/coda.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/coda.txt b/Documentation/filesystems/coda.txt index 545262c167c3..1711ad48e38a 100644 --- a/Documentation/filesystems/coda.txt +++ b/Documentation/filesystems/coda.txt @@ -421,14 +421,14 @@ kernel support. The CodaCred structure defines a variety of user and group ids as - they are set for the calling process. The vuid_t and guid_t are 32 bit + they are set for the calling process. The vuid_t and vgid_t are 32 bit unsigned integers. It also defines group membership in an array. On Unix the CodaCred has proven sufficient to implement good security semantics for Coda but the structure may have to undergo modification for the Windows environment when these mature. struct CodaCred { - vuid_t cr_uid, cr_euid, cr_suid, cr_fsuid; /* Real, effective, set, fs uid*/ + vuid_t cr_uid, cr_euid, cr_suid, cr_fsuid; /* Real, effective, set, fs uid */ vgid_t cr_gid, cr_egid, cr_sgid, cr_fsgid; /* same for groups */ vgid_t cr_groups[NGROUPS]; /* Group membership for caller */ }; -- cgit From 257e26c6403c23714db69d6e21c58b0d92636c4e Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Wed, 31 Jul 2019 11:02:11 +0200 Subject: docs: arm: Remove orphan sh-mobile directory This directory is empty, except for a .gitignore file, listing an executable file that can no longer be built since commit c6535e1e0361157e ("Documentation: Remove ZBOOT MMC/SDHI utility and docs"). Signed-off-by: Geert Uytterhoeven Signed-off-by: Jonathan Corbet --- Documentation/arm/sh-mobile/.gitignore | 1 - 1 file changed, 1 deletion(-) delete mode 100644 Documentation/arm/sh-mobile/.gitignore (limited to 'Documentation') diff --git a/Documentation/arm/sh-mobile/.gitignore b/Documentation/arm/sh-mobile/.gitignore deleted file mode 100644 index c928dbf3cc88..000000000000 --- a/Documentation/arm/sh-mobile/.gitignore +++ /dev/null @@ -1 +0,0 @@ -vrl4 -- cgit From 803deeaaea10a87b7d92bd2eccbc637711390a04 Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Mon, 29 Jul 2019 23:58:56 +0200 Subject: doc:it_IT: align translation to mainline The patch translates the following patches in Italian: 1fb12b35e5ff kbuild: Raise the minimum required binutils version to 2.21 9c3c0c204814 isdn: remove isdn4linux Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- .../translations/it_IT/process/changes.rst | 22 ++++------------------ 1 file changed, 4 insertions(+), 18 deletions(-) (limited to 'Documentation') diff --git a/Documentation/translations/it_IT/process/changes.rst b/Documentation/translations/it_IT/process/changes.rst index d0874327f301..94a6499742ac 100644 --- a/Documentation/translations/it_IT/process/changes.rst +++ b/Documentation/translations/it_IT/process/changes.rst @@ -26,16 +26,15 @@ Prima di pensare d'avere trovato un baco, aggiornate i seguenti programmi usando, il comando indicato dovrebbe dirvelo. Questa lista presume che abbiate già un kernel Linux funzionante. In aggiunta, -non tutti gli strumenti sono necessari ovunque; ovviamente, se non avete un -modem ISDN, per esempio, probabilmente non dovreste preoccuparvi di -isdn4k-utils. +non tutti gli strumenti sono necessari ovunque; ovviamente, se non avete una +PC Card, per esempio, probabilmente non dovreste preoccuparvi di pcmciautils. ====================== ================= ======================================== Programma Versione minima Comando per verificare la versione ====================== ================= ======================================== GNU C 4.6 gcc --version GNU make 3.81 make --version -binutils 2.20 ld -v +binutils 2.21 ld -v flex 2.5.35 flex --version bison 2.0 bison --version util-linux 2.10o fdformat --version @@ -49,7 +48,6 @@ btrfs-progs 0.18 btrfsck pcmciautils 004 pccardctl -V quota-tools 3.09 quota -V PPP 2.4.0 pppd --version -isdn4k-utils 3.1pre1 isdnctrl 2>&1|grep version nfs-utils 1.0.5 showmount --version procps 3.2.0 ps --version oprofile 0.9 oprofiled --version @@ -81,10 +79,7 @@ Per compilare il kernel vi servirà GNU make 3.81 o successivo. Binutils -------- -Il sistema di compilazione, dalla versione 4.13, per la produzione dei passi -intermedi, si è convertito all'uso di *thin archive* (`ar T`) piuttosto che -all'uso del *linking* incrementale (`ld -r`). Questo richiede binutils 2.20 o -successivo. +Per generare il kernel è necessario avere Binutils 2.21 o superiore. pkg-config ---------- @@ -286,11 +281,6 @@ col seguente comando:: mknod /dev/ppp c 108 0 -Isdn4k-utils ------------- - -Per via della modifica del campo per il numero di telefono, il pacchetto -isdn4k-utils dev'essere ricompilato o (preferibilmente) aggiornato. NFS-utils --------- @@ -456,10 +446,6 @@ PPP - -Isdn4k-utils ------------- - -- NFS-utils --------- -- cgit From 23aa16489c06e6739c7c99e9fdccf723d2691a5d Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 26 Jul 2019 16:29:13 -0300 Subject: docs: cgroup-v1/blkio-controller.rst: remove a CFQ left over changeset fb5772cbfe48 ("blkio-controller.txt: Remove references to CFQ") removed cgroup references to CFQ, but it kept one left. Get rid of it. Fixes: fb5772cbfe48 ("blkio-controller.txt: Remove references to CFQ") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/cgroup-v1/blkio-controller.rst | 6 ------ 1 file changed, 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst index 1d7d962933be..36d43ae7dc13 100644 --- a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst +++ b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst @@ -130,12 +130,6 @@ Proportional weight policy files dev weight 8:16 300 -- blkio.leaf_weight[_device] - - Equivalents of blkio.weight[_device] for the purpose of - deciding how much weight tasks in the given cgroup has while - competing with the cgroup's child cgroups. For details, - please refer to Documentation/block/cfq-iosched.txt. - - blkio.time - disk time allocated to cgroup per device in milliseconds. First two fields specify the major and minor number of the device and -- cgit From 54bfe6feba0e9ead1172cf26381f7dce5df10e51 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 26 Jul 2019 16:29:14 -0300 Subject: docs: zh_CN: howto.rst: fix a broken reference There's a broken reference there pointing to the long gone DocBook dir. While I don't read chinese, Google translator translates it to: "The generated documentation will be placed in the Documentation/DocBook/ directory." Well, we know that the output will be Documentation/output dir. So, let's fix this one. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/translations/zh_CN/process/howto.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/translations/zh_CN/process/howto.rst b/Documentation/translations/zh_CN/process/howto.rst index 5b671178b17b..c4ff8356b88d 100644 --- a/Documentation/translations/zh_CN/process/howto.rst +++ b/Documentation/translations/zh_CN/process/howto.rst @@ -147,7 +147,7 @@ Linux内核代码中包含有大量的文档。这些文档对于学习如何与 关于补丁是什么以及如何将它打在不同内核开发分支上的好介绍 内核还拥有大量从代码自动生成的文档。它包含内核内部API的全面介绍以及如何 -妥善处理加锁的规则。生成的文档会放在 Documentation/DocBook/目录下。在内 +妥善处理加锁的规则。生成的文档会放在 Documentation/output/目录下。在内 核源码的主目录中使用以下不同命令将会分别生成PDF、Postscript、HTML和手册 页等不同格式的文档:: -- cgit From 638b642f82bb8ee81fe00bdbb70b5ff0885df1a5 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 26 Jul 2019 18:01:55 -0300 Subject: docs: riscv: convert boot-image-header.txt to ReST Convert this small file to ReST format by: - Using a proper markup for the document title; - marking a code block as such; - use tags for Author and date; - use tables for bit map fields. While here, fix a broken reference for a document with is planned but is not here yet. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Atish Patra Signed-off-by: Jonathan Corbet --- Documentation/riscv/boot-image-header.rst | 61 +++++++++++++++++++++++++++++++ Documentation/riscv/boot-image-header.txt | 50 ------------------------- Documentation/riscv/index.rst | 1 + 3 files changed, 62 insertions(+), 50 deletions(-) create mode 100644 Documentation/riscv/boot-image-header.rst delete mode 100644 Documentation/riscv/boot-image-header.txt (limited to 'Documentation') diff --git a/Documentation/riscv/boot-image-header.rst b/Documentation/riscv/boot-image-header.rst new file mode 100644 index 000000000000..43e9bd0731d5 --- /dev/null +++ b/Documentation/riscv/boot-image-header.rst @@ -0,0 +1,61 @@ +================================= +Boot image header in RISC-V Linux +================================= + +:Author: Atish Patra +:Date: 20 May 2019 + +This document only describes the boot image header details for RISC-V Linux. + +TODO: + Write a complete booting guide. + +The following 64-byte header is present in decompressed Linux kernel image:: + + u32 code0; /* Executable code */ + u32 code1; /* Executable code */ + u64 text_offset; /* Image load offset, little endian */ + u64 image_size; /* Effective Image size, little endian */ + u64 flags; /* kernel flags, little endian */ + u32 version; /* Version of this header */ + u32 res1 = 0; /* Reserved */ + u64 res2 = 0; /* Reserved */ + u64 magic = 0x5643534952; /* Magic number, little endian, "RISCV" */ + u32 res3; /* Reserved for additional RISC-V specific header */ + u32 res4; /* Reserved for PE COFF offset */ + +This header format is compliant with PE/COFF header and largely inspired from +ARM64 header. Thus, both ARM64 & RISC-V header can be combined into one common +header in future. + +Notes +===== + +- This header can also be reused to support EFI stub for RISC-V in future. EFI + specification needs PE/COFF image header in the beginning of the kernel image + in order to load it as an EFI application. In order to support EFI stub, + code0 should be replaced with "MZ" magic string and res5(at offset 0x3c) should + point to the rest of the PE/COFF header. + +- version field indicate header version number + + ========== ============= + Bits 0:15 Minor version + Bits 16:31 Major version + ========== ============= + + This preserves compatibility across newer and older version of the header. + The current version is defined as 0.1. + +- res3 is reserved for offset to any other additional fields. This makes the + header extendible in future. One example would be to accommodate ISA + extension for RISC-V in future. For current version, it is set to be zero. + +- In current header, the flag field has only one field. + + ===== ==================================== + Bit 0 Kernel endianness. 1 if BE, 0 if LE. + ===== ==================================== + +- Image size is mandatory for boot loader to load kernel image. Booting will + fail otherwise. diff --git a/Documentation/riscv/boot-image-header.txt b/Documentation/riscv/boot-image-header.txt deleted file mode 100644 index 1b73fea23b39..000000000000 --- a/Documentation/riscv/boot-image-header.txt +++ /dev/null @@ -1,50 +0,0 @@ - Boot image header in RISC-V Linux - ============================================= - -Author: Atish Patra -Date : 20 May 2019 - -This document only describes the boot image header details for RISC-V Linux. -The complete booting guide will be available at Documentation/riscv/booting.txt. - -The following 64-byte header is present in decompressed Linux kernel image. - - u32 code0; /* Executable code */ - u32 code1; /* Executable code */ - u64 text_offset; /* Image load offset, little endian */ - u64 image_size; /* Effective Image size, little endian */ - u64 flags; /* kernel flags, little endian */ - u32 version; /* Version of this header */ - u32 res1 = 0; /* Reserved */ - u64 res2 = 0; /* Reserved */ - u64 magic = 0x5643534952; /* Magic number, little endian, "RISCV" */ - u32 res3; /* Reserved for additional RISC-V specific header */ - u32 res4; /* Reserved for PE COFF offset */ - -This header format is compliant with PE/COFF header and largely inspired from -ARM64 header. Thus, both ARM64 & RISC-V header can be combined into one common -header in future. - -Notes: -- This header can also be reused to support EFI stub for RISC-V in future. EFI - specification needs PE/COFF image header in the beginning of the kernel image - in order to load it as an EFI application. In order to support EFI stub, - code0 should be replaced with "MZ" magic string and res5(at offset 0x3c) should - point to the rest of the PE/COFF header. - -- version field indicate header version number. - Bits 0:15 - Minor version - Bits 16:31 - Major version - - This preserves compatibility across newer and older version of the header. - The current version is defined as 0.1. - -- res3 is reserved for offset to any other additional fields. This makes the - header extendible in future. One example would be to accommodate ISA - extension for RISC-V in future. For current version, it is set to be zero. - -- In current header, the flag field has only one field. - Bit 0: Kernel endianness. 1 if BE, 0 if LE. - -- Image size is mandatory for boot loader to load kernel image. Booting will - fail otherwise. diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst index e3ca0922a8c2..215fd3c1f2d5 100644 --- a/Documentation/riscv/index.rst +++ b/Documentation/riscv/index.rst @@ -5,6 +5,7 @@ RISC-V architecture .. toctree:: :maxdepth: 1 + boot-image-header pmu .. only:: subproject and html -- cgit From e226b4f0e04f4cd5396041661a27eae5aa370bb3 Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Sat, 6 Jul 2019 23:01:00 +0200 Subject: doc: email-clients miscellaneous fixes Fixed some style inconsistencies and remove old statement referring to kmail missing feature (saving email from the view window is possible). Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- Documentation/process/email-clients.rst | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/process/email-clients.rst b/Documentation/process/email-clients.rst index 07faa5457bcb..5273d06c8ff6 100644 --- a/Documentation/process/email-clients.rst +++ b/Documentation/process/email-clients.rst @@ -40,7 +40,7 @@ Emailed patches should be in ASCII or UTF-8 encoding only. If you configure your email client to send emails with UTF-8 encoding, you avoid some possible charset problems. -Email clients should generate and maintain References: or In-Reply-To: +Email clients should generate and maintain "References:" or "In-Reply-To:" headers so that mail threading is not broken. Copy-and-paste (or cut-and-paste) usually does not work for patches @@ -89,7 +89,7 @@ Claws Mail (GUI) Works. Some people use this successfully for patches. -To insert a patch use :menuselection:`Message-->Insert` File (:kbd:`CTRL-I`) +To insert a patch use :menuselection:`Message-->Insert File` (:kbd:`CTRL-I`) or an external editor. If the inserted patch has to be edited in the Claws composition window @@ -132,8 +132,8 @@ wrapping. At the bottom of your email, put the commonly-used patch delimiter before inserting your patch: three hyphens (``---``). -Then from the :menuselection:`Message` menu item, select insert file and -choose your patch. +Then from the :menuselection:`Message` menu item, select +:menuselection:`insert file` and choose your patch. As an added bonus you can customise the message creation toolbar menu and put the :menuselection:`insert file` icon there. @@ -149,18 +149,16 @@ patches so do not GPG sign them. Signing patches that have been inserted as inlined text will make them tricky to extract from their 7-bit encoding. If you absolutely must send patches as attachments instead of inlining -them as text, right click on the attachment and select properties, and -highlight :menuselection:`Suggest automatic display` to make the attachment +them as text, right click on the attachment and select :menuselection:`properties`, +and highlight :menuselection:`Suggest automatic display` to make the attachment inlined to make it more viewable. When saving patches that are sent as inlined text, select the email that contains the patch from the message list pane, right click and select :menuselection:`save as`. You can use the whole email unmodified as a patch -if it was properly composed. There is no option currently to save the email -when you are actually viewing it in its own window -- there has been a request -filed at kmail's bugzilla and hopefully this will be addressed. Emails are -saved as read-write for user only so you will have to chmod them to make them -group and world readable if you copy them elsewhere. +if it was properly composed. Emails are saved as read-write for user only so +you will have to chmod them to make them group and world readable if you copy +them elsewhere. Lotus Notes (GUI) ***************** -- cgit From ac841c4e457c1fae6f661108f811b554c8581976 Mon Sep 17 00:00:00 2001 From: Shobhit Kukreti Date: Wed, 10 Jul 2019 08:29:01 -0700 Subject: Documentation: filesystems: Convert jfs.txt to This converts the plain text documentation of jfs.txt to reStructuredText format. Added to documentation build process and verified with make htmldocs Signed-off-by: Shobhit Kukreti Reviewed-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/index.rst | 1 + Documentation/admin-guide/jfs.rst | 66 +++++++++++++++++++++++++++++++++++++ Documentation/filesystems/jfs.txt | 52 ----------------------------- 3 files changed, 67 insertions(+), 52 deletions(-) create mode 100644 Documentation/admin-guide/jfs.rst delete mode 100644 Documentation/filesystems/jfs.txt (limited to 'Documentation') diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 33feab2f4084..2376cb91b83c 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -78,6 +78,7 @@ configure specific aspects of kernel behavior to your liking. ext4 binderfs xfs + jfs pm/index thunderbolt LSM/index diff --git a/Documentation/admin-guide/jfs.rst b/Documentation/admin-guide/jfs.rst new file mode 100644 index 000000000000..9e12d936bc90 --- /dev/null +++ b/Documentation/admin-guide/jfs.rst @@ -0,0 +1,66 @@ +=========================================== +IBM's Journaled File System (JFS) for Linux +=========================================== + +JFS Homepage: http://jfs.sourceforge.net/ + +The following mount options are supported: + +(*) == default + +iocharset=name + Character set to use for converting from Unicode to + ASCII. The default is to do no conversion. Use + iocharset=utf8 for UTF-8 translations. This requires + CONFIG_NLS_UTF8 to be set in the kernel .config file. + iocharset=none specifies the default behavior explicitly. + +resize=value + Resize the volume to blocks. JFS only supports + growing a volume, not shrinking it. This option is only + valid during a remount, when the volume is mounted + read-write. The resize keyword with no value will grow + the volume to the full size of the partition. + +nointegrity + Do not write to the journal. The primary use of this option + is to allow for higher performance when restoring a volume + from backup media. The integrity of the volume is not + guaranteed if the system abnormally abends. + +integrity(*) + Commit metadata changes to the journal. Use this option to + remount a volume where the nointegrity option was + previously specified in order to restore normal behavior. + +errors=continue + Keep going on a filesystem error. +errors=remount-ro(*) + Remount the filesystem read-only on an error. +errors=panic + Panic and halt the machine if an error occurs. + +uid=value + Override on-disk uid with specified value +gid=value + Override on-disk gid with specified value +umask=value + Override on-disk umask with specified octal value. For + directories, the execute bit will be set if the corresponding + read bit is set. + +discard=minlen, discard/nodiscard(*) + This enables/disables the use of discard/TRIM commands. + The discard/TRIM commands are sent to the underlying + block device when blocks are freed. This is useful for SSD + devices and sparse/thinly-provisioned LUNs. The FITRIM ioctl + command is also available together with the nodiscard option. + The value of minlen specifies the minimum blockcount, when + a TRIM command to the block device is considered useful. + When no value is given to the discard option, it defaults to + 64 blocks, which means 256KiB in JFS. + The minlen value of discard overrides the minlen value given + on an FITRIM ioctl(). + +The JFS mailing list can be subscribed to by using the link labeled +"Mail list Subscribe" at our web page http://jfs.sourceforge.net/ diff --git a/Documentation/filesystems/jfs.txt b/Documentation/filesystems/jfs.txt deleted file mode 100644 index 41fd757997b3..000000000000 --- a/Documentation/filesystems/jfs.txt +++ /dev/null @@ -1,52 +0,0 @@ -IBM's Journaled File System (JFS) for Linux - -JFS Homepage: http://jfs.sourceforge.net/ - -The following mount options are supported: -(*) == default - -iocharset=name Character set to use for converting from Unicode to - ASCII. The default is to do no conversion. Use - iocharset=utf8 for UTF-8 translations. This requires - CONFIG_NLS_UTF8 to be set in the kernel .config file. - iocharset=none specifies the default behavior explicitly. - -resize=value Resize the volume to blocks. JFS only supports - growing a volume, not shrinking it. This option is only - valid during a remount, when the volume is mounted - read-write. The resize keyword with no value will grow - the volume to the full size of the partition. - -nointegrity Do not write to the journal. The primary use of this option - is to allow for higher performance when restoring a volume - from backup media. The integrity of the volume is not - guaranteed if the system abnormally abends. - -integrity(*) Commit metadata changes to the journal. Use this option to - remount a volume where the nointegrity option was - previously specified in order to restore normal behavior. - -errors=continue Keep going on a filesystem error. -errors=remount-ro(*) Remount the filesystem read-only on an error. -errors=panic Panic and halt the machine if an error occurs. - -uid=value Override on-disk uid with specified value -gid=value Override on-disk gid with specified value -umask=value Override on-disk umask with specified octal value. For - directories, the execute bit will be set if the corresponding - read bit is set. - -discard=minlen This enables/disables the use of discard/TRIM commands. -discard The discard/TRIM commands are sent to the underlying -nodiscard(*) block device when blocks are freed. This is useful for SSD - devices and sparse/thinly-provisioned LUNs. The FITRIM ioctl - command is also available together with the nodiscard option. - The value of minlen specifies the minimum blockcount, when - a TRIM command to the block device is considered useful. - When no value is given to the discard option, it defaults to - 64 blocks, which means 256KiB in JFS. - The minlen value of discard overrides the minlen value given - on an FITRIM ioctl(). - -The JFS mailing list can be subscribed to by using the link labeled -"Mail list Subscribe" at our web page http://jfs.sourceforge.net/ -- cgit From 34d5f4f269a21c7b5d64b4db0391d36511f1ac1a Mon Sep 17 00:00:00 2001 From: Shobhit Kukreti Date: Wed, 10 Jul 2019 08:31:23 -0700 Subject: Documentation: filesystems: Convert ufs.txt to reStructuredText format This converts the plain text documentation of ufs.txt to reStructuredText format. Added to documentation build process and verified with make htmldocs Signed-off-by: Shobhit Kukreti Reviewed-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/index.rst | 1 + Documentation/admin-guide/ufs.rst | 68 +++++++++++++++++++++++++++++++++++++ Documentation/filesystems/ufs.txt | 60 -------------------------------- 3 files changed, 69 insertions(+), 60 deletions(-) create mode 100644 Documentation/admin-guide/ufs.rst delete mode 100644 Documentation/filesystems/ufs.txt (limited to 'Documentation') diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 2376cb91b83c..592107a3295f 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -79,6 +79,7 @@ configure specific aspects of kernel behavior to your liking. binderfs xfs jfs + ufs pm/index thunderbolt LSM/index diff --git a/Documentation/admin-guide/ufs.rst b/Documentation/admin-guide/ufs.rst new file mode 100644 index 000000000000..55d15297f8d7 --- /dev/null +++ b/Documentation/admin-guide/ufs.rst @@ -0,0 +1,68 @@ +========= +Using UFS +========= + +mount -t ufs -o ufstype=type_of_ufs device dir + + +UFS Options +=========== + +ufstype=type_of_ufs + UFS is a file system widely used in different operating systems. + The problem are differences among implementations. Features of + some implementations are undocumented, so its hard to recognize + type of ufs automatically. That's why user must specify type of + ufs manually by mount option ufstype. Possible values are: + + old + old format of ufs + default value, supported as read-only + + 44bsd + used in FreeBSD, NetBSD, OpenBSD + supported as read-write + + ufs2 + used in FreeBSD 5.x + supported as read-write + + 5xbsd + synonym for ufs2 + + sun + used in SunOS (Solaris) + supported as read-write + + sunx86 + used in SunOS for Intel (Solarisx86) + supported as read-write + + hp + used in HP-UX + supported as read-only + + nextstep + used in NextStep + supported as read-only + + nextstep-cd + used for NextStep CDROMs (block_size == 2048) + supported as read-only + + openstep + used in OpenStep + supported as read-only + + +Possible Problems +----------------- + +See next section, if you have any. + + +Bug Reports +----------- + +Any ufs bug report you can send to daniel.pirkl@email.cz or +to dushistov@mail.ru (do not send partition tables bug reports). diff --git a/Documentation/filesystems/ufs.txt b/Documentation/filesystems/ufs.txt deleted file mode 100644 index 7a602adeca2b..000000000000 --- a/Documentation/filesystems/ufs.txt +++ /dev/null @@ -1,60 +0,0 @@ -USING UFS -========= - -mount -t ufs -o ufstype=type_of_ufs device dir - - -UFS OPTIONS -=========== - -ufstype=type_of_ufs - UFS is a file system widely used in different operating systems. - The problem are differences among implementations. Features of - some implementations are undocumented, so its hard to recognize - type of ufs automatically. That's why user must specify type of - ufs manually by mount option ufstype. Possible values are: - - old old format of ufs - default value, supported as read-only - - 44bsd used in FreeBSD, NetBSD, OpenBSD - supported as read-write - - ufs2 used in FreeBSD 5.x - supported as read-write - - 5xbsd synonym for ufs2 - - sun used in SunOS (Solaris) - supported as read-write - - sunx86 used in SunOS for Intel (Solarisx86) - supported as read-write - - hp used in HP-UX - supported as read-only - - nextstep - used in NextStep - supported as read-only - - nextstep-cd - used for NextStep CDROMs (block_size == 2048) - supported as read-only - - openstep - used in OpenStep - supported as read-only - - -POSSIBLE PROBLEMS -================= - -See next section, if you have any. - - -BUG REPORTS -=========== - -Any ufs bug report you can send to daniel.pirkl@email.cz or -to dushistov@mail.ru (do not send partition tables bug reports). -- cgit From fe13225fdc3f7b79e2921869a13386f48b30bf79 Mon Sep 17 00:00:00 2001 From: Phong Tran Date: Thu, 11 Jul 2019 23:52:01 +0700 Subject: Documentation: coresight: convert txt to rst This changes from plain text to reStructuredText as suggestion in doc-guide [1] [1] https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html Some adaptations such as: literal block, ``inline literal`` and alignment text,... Signed-off-by: Phong Tran Reviewed-by: Mauro Carvalho Chehab Acked-by: Mathieu Poirier Signed-off-by: Jonathan Corbet --- Documentation/trace/coresight-cpu-debug.rst | 192 +++++++++++ Documentation/trace/coresight-cpu-debug.txt | 187 ----------- Documentation/trace/coresight.rst | 498 ++++++++++++++++++++++++++++ Documentation/trace/coresight.txt | 482 --------------------------- Documentation/trace/index.rst | 2 + 5 files changed, 692 insertions(+), 669 deletions(-) create mode 100644 Documentation/trace/coresight-cpu-debug.rst delete mode 100644 Documentation/trace/coresight-cpu-debug.txt create mode 100644 Documentation/trace/coresight.rst delete mode 100644 Documentation/trace/coresight.txt (limited to 'Documentation') diff --git a/Documentation/trace/coresight-cpu-debug.rst b/Documentation/trace/coresight-cpu-debug.rst new file mode 100644 index 000000000000..993dd294b81b --- /dev/null +++ b/Documentation/trace/coresight-cpu-debug.rst @@ -0,0 +1,192 @@ +========================== +Coresight CPU Debug Module +========================== + + :Author: Leo Yan + :Date: April 5th, 2017 + +Introduction +------------ + +Coresight CPU debug module is defined in ARMv8-a architecture reference manual +(ARM DDI 0487A.k) Chapter 'Part H: External debug', the CPU can integrate +debug module and it is mainly used for two modes: self-hosted debug and +external debug. Usually the external debug mode is well known as the external +debugger connects with SoC from JTAG port; on the other hand the program can +explore debugging method which rely on self-hosted debug mode, this document +is to focus on this part. + +The debug module provides sample-based profiling extension, which can be used +to sample CPU program counter, secure state and exception level, etc; usually +every CPU has one dedicated debug module to be connected. Based on self-hosted +debug mechanism, Linux kernel can access these related registers from mmio +region when the kernel panic happens. The callback notifier for kernel panic +will dump related registers for every CPU; finally this is good for assistant +analysis for panic. + + +Implementation +-------------- + +- During driver registration, it uses EDDEVID and EDDEVID1 - two device ID + registers to decide if sample-based profiling is implemented or not. On some + platforms this hardware feature is fully or partially implemented; and if + this feature is not supported then registration will fail. + +- At the time this documentation was written, the debug driver mainly relies on + information gathered by the kernel panic callback notifier from three + sampling registers: EDPCSR, EDVIDSR and EDCIDSR: from EDPCSR we can get + program counter; EDVIDSR has information for secure state, exception level, + bit width, etc; EDCIDSR is context ID value which contains the sampled value + of CONTEXTIDR_EL1. + +- The driver supports a CPU running in either AArch64 or AArch32 mode. The + registers naming convention is a bit different between them, AArch64 uses + 'ED' for register prefix (ARM DDI 0487A.k, chapter H9.1) and AArch32 uses + 'DBG' as prefix (ARM DDI 0487A.k, chapter G5.1). The driver is unified to + use AArch64 naming convention. + +- ARMv8-a (ARM DDI 0487A.k) and ARMv7-a (ARM DDI 0406C.b) have different + register bits definition. So the driver consolidates two difference: + + If PCSROffset=0b0000, on ARMv8-a the feature of EDPCSR is not implemented; + but ARMv7-a defines "PCSR samples are offset by a value that depends on the + instruction set state". For ARMv7-a, the driver checks furthermore if CPU + runs with ARM or thumb instruction set and calibrate PCSR value, the + detailed description for offset is in ARMv7-a ARM (ARM DDI 0406C.b) chapter + C11.11.34 "DBGPCSR, Program Counter Sampling Register". + + If PCSROffset=0b0010, ARMv8-a defines "EDPCSR implemented, and samples have + no offset applied and do not sample the instruction set state in AArch32 + state". So on ARMv8 if EDDEVID1.PCSROffset is 0b0010 and the CPU operates + in AArch32 state, EDPCSR is not sampled; when the CPU operates in AArch64 + state EDPCSR is sampled and no offset are applied. + + +Clock and power domain +---------------------- + +Before accessing debug registers, we should ensure the clock and power domain +have been enabled properly. In ARMv8-a ARM (ARM DDI 0487A.k) chapter 'H9.1 +Debug registers', the debug registers are spread into two domains: the debug +domain and the CPU domain. +:: + + +---------------+ + | | + | | + +----------+--+ | + dbg_clock -->| |**| |<-- cpu_clock + | Debug |**| CPU | + dbg_power_domain -->| |**| |<-- cpu_power_domain + +----------+--+ | + | | + | | + +---------------+ + +For debug domain, the user uses DT binding "clocks" and "power-domains" to +specify the corresponding clock source and power supply for the debug logic. +The driver calls the pm_runtime_{put|get} operations as needed to handle the +debug power domain. + +For CPU domain, the different SoC designs have different power management +schemes and finally this heavily impacts external debug module. So we can +divide into below cases: + +- On systems with a sane power controller which can behave correctly with + respect to CPU power domain, the CPU power domain can be controlled by + register EDPRCR in driver. The driver firstly writes bit EDPRCR.COREPURQ + to power up the CPU, and then writes bit EDPRCR.CORENPDRQ for emulation + of CPU power down. As result, this can ensure the CPU power domain is + powered on properly during the period when access debug related registers; + +- Some designs will power down an entire cluster if all CPUs on the cluster + are powered down - including the parts of the debug registers that should + remain powered in the debug power domain. The bits in EDPRCR are not + respected in these cases, so these designs do not support debug over + power down in the way that the CoreSight / Debug designers anticipated. + This means that even checking EDPRSR has the potential to cause a bus hang + if the target register is unpowered. + + In this case, accessing to the debug registers while they are not powered + is a recipe for disaster; so we need preventing CPU low power states at boot + time or when user enable module at the run time. Please see chapter + "How to use the module" for detailed usage info for this. + + +Device Tree Bindings +-------------------- + +See Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt for details. + + +How to use the module +--------------------- + +If you want to enable debugging functionality at boot time, you can add +"coresight_cpu_debug.enable=1" to the kernel command line parameter. + +The driver also can work as module, so can enable the debugging when insmod +module:: + + # insmod coresight_cpu_debug.ko debug=1 + +When boot time or insmod module you have not enabled the debugging, the driver +uses the debugfs file system to provide a knob to dynamically enable or disable +debugging: + +To enable it, write a '1' into /sys/kernel/debug/coresight_cpu_debug/enable:: + + # echo 1 > /sys/kernel/debug/coresight_cpu_debug/enable + +To disable it, write a '0' into /sys/kernel/debug/coresight_cpu_debug/enable:: + + # echo 0 > /sys/kernel/debug/coresight_cpu_debug/enable + +As explained in chapter "Clock and power domain", if you are working on one +platform which has idle states to power off debug logic and the power +controller cannot work well for the request from EDPRCR, then you should +firstly constraint CPU idle states before enable CPU debugging feature; so can +ensure the accessing to debug logic. + +If you want to limit idle states at boot time, you can use "nohlt" or +"cpuidle.off=1" in the kernel command line. + +At the runtime you can disable idle states with below methods: + +It is possible to disable CPU idle states by way of the PM QoS +subsystem, more specifically by using the "/dev/cpu_dma_latency" +interface (see Documentation/power/pm_qos_interface.rst for more +details). As specified in the PM QoS documentation the requested +parameter will stay in effect until the file descriptor is released. +For example:: + + # exec 3<> /dev/cpu_dma_latency; echo 0 >&3 + ... + Do some work... + ... + # exec 3<>- + +The same can also be done from an application program. + +Disable specific CPU's specific idle state from cpuidle sysfs (see +Documentation/admin-guide/pm/cpuidle.rst):: + + # echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable + +Output format +------------- + +Here is an example of the debugging output format:: + + ARM external debug module: + coresight-cpu-debug 850000.debug: CPU[0]: + coresight-cpu-debug 850000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) + coresight-cpu-debug 850000.debug: EDPCSR: handle_IPI+0x174/0x1d8 + coresight-cpu-debug 850000.debug: EDCIDSR: 00000000 + coresight-cpu-debug 850000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) + coresight-cpu-debug 852000.debug: CPU[1]: + coresight-cpu-debug 852000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) + coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358 + coresight-cpu-debug 852000.debug: EDCIDSR: 00000000 + coresight-cpu-debug 852000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) diff --git a/Documentation/trace/coresight-cpu-debug.txt b/Documentation/trace/coresight-cpu-debug.txt deleted file mode 100644 index 1a660a39e3c0..000000000000 --- a/Documentation/trace/coresight-cpu-debug.txt +++ /dev/null @@ -1,187 +0,0 @@ - Coresight CPU Debug Module - ========================== - - Author: Leo Yan - Date: April 5th, 2017 - -Introduction ------------- - -Coresight CPU debug module is defined in ARMv8-a architecture reference manual -(ARM DDI 0487A.k) Chapter 'Part H: External debug', the CPU can integrate -debug module and it is mainly used for two modes: self-hosted debug and -external debug. Usually the external debug mode is well known as the external -debugger connects with SoC from JTAG port; on the other hand the program can -explore debugging method which rely on self-hosted debug mode, this document -is to focus on this part. - -The debug module provides sample-based profiling extension, which can be used -to sample CPU program counter, secure state and exception level, etc; usually -every CPU has one dedicated debug module to be connected. Based on self-hosted -debug mechanism, Linux kernel can access these related registers from mmio -region when the kernel panic happens. The callback notifier for kernel panic -will dump related registers for every CPU; finally this is good for assistant -analysis for panic. - - -Implementation --------------- - -- During driver registration, it uses EDDEVID and EDDEVID1 - two device ID - registers to decide if sample-based profiling is implemented or not. On some - platforms this hardware feature is fully or partially implemented; and if - this feature is not supported then registration will fail. - -- At the time this documentation was written, the debug driver mainly relies on - information gathered by the kernel panic callback notifier from three - sampling registers: EDPCSR, EDVIDSR and EDCIDSR: from EDPCSR we can get - program counter; EDVIDSR has information for secure state, exception level, - bit width, etc; EDCIDSR is context ID value which contains the sampled value - of CONTEXTIDR_EL1. - -- The driver supports a CPU running in either AArch64 or AArch32 mode. The - registers naming convention is a bit different between them, AArch64 uses - 'ED' for register prefix (ARM DDI 0487A.k, chapter H9.1) and AArch32 uses - 'DBG' as prefix (ARM DDI 0487A.k, chapter G5.1). The driver is unified to - use AArch64 naming convention. - -- ARMv8-a (ARM DDI 0487A.k) and ARMv7-a (ARM DDI 0406C.b) have different - register bits definition. So the driver consolidates two difference: - - If PCSROffset=0b0000, on ARMv8-a the feature of EDPCSR is not implemented; - but ARMv7-a defines "PCSR samples are offset by a value that depends on the - instruction set state". For ARMv7-a, the driver checks furthermore if CPU - runs with ARM or thumb instruction set and calibrate PCSR value, the - detailed description for offset is in ARMv7-a ARM (ARM DDI 0406C.b) chapter - C11.11.34 "DBGPCSR, Program Counter Sampling Register". - - If PCSROffset=0b0010, ARMv8-a defines "EDPCSR implemented, and samples have - no offset applied and do not sample the instruction set state in AArch32 - state". So on ARMv8 if EDDEVID1.PCSROffset is 0b0010 and the CPU operates - in AArch32 state, EDPCSR is not sampled; when the CPU operates in AArch64 - state EDPCSR is sampled and no offset are applied. - - -Clock and power domain ----------------------- - -Before accessing debug registers, we should ensure the clock and power domain -have been enabled properly. In ARMv8-a ARM (ARM DDI 0487A.k) chapter 'H9.1 -Debug registers', the debug registers are spread into two domains: the debug -domain and the CPU domain. - - +---------------+ - | | - | | - +----------+--+ | - dbg_clock -->| |**| |<-- cpu_clock - | Debug |**| CPU | - dbg_power_domain -->| |**| |<-- cpu_power_domain - +----------+--+ | - | | - | | - +---------------+ - -For debug domain, the user uses DT binding "clocks" and "power-domains" to -specify the corresponding clock source and power supply for the debug logic. -The driver calls the pm_runtime_{put|get} operations as needed to handle the -debug power domain. - -For CPU domain, the different SoC designs have different power management -schemes and finally this heavily impacts external debug module. So we can -divide into below cases: - -- On systems with a sane power controller which can behave correctly with - respect to CPU power domain, the CPU power domain can be controlled by - register EDPRCR in driver. The driver firstly writes bit EDPRCR.COREPURQ - to power up the CPU, and then writes bit EDPRCR.CORENPDRQ for emulation - of CPU power down. As result, this can ensure the CPU power domain is - powered on properly during the period when access debug related registers; - -- Some designs will power down an entire cluster if all CPUs on the cluster - are powered down - including the parts of the debug registers that should - remain powered in the debug power domain. The bits in EDPRCR are not - respected in these cases, so these designs do not support debug over - power down in the way that the CoreSight / Debug designers anticipated. - This means that even checking EDPRSR has the potential to cause a bus hang - if the target register is unpowered. - - In this case, accessing to the debug registers while they are not powered - is a recipe for disaster; so we need preventing CPU low power states at boot - time or when user enable module at the run time. Please see chapter - "How to use the module" for detailed usage info for this. - - -Device Tree Bindings --------------------- - -See Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt for details. - - -How to use the module ---------------------- - -If you want to enable debugging functionality at boot time, you can add -"coresight_cpu_debug.enable=1" to the kernel command line parameter. - -The driver also can work as module, so can enable the debugging when insmod -module: -# insmod coresight_cpu_debug.ko debug=1 - -When boot time or insmod module you have not enabled the debugging, the driver -uses the debugfs file system to provide a knob to dynamically enable or disable -debugging: - -To enable it, write a '1' into /sys/kernel/debug/coresight_cpu_debug/enable: -# echo 1 > /sys/kernel/debug/coresight_cpu_debug/enable - -To disable it, write a '0' into /sys/kernel/debug/coresight_cpu_debug/enable: -# echo 0 > /sys/kernel/debug/coresight_cpu_debug/enable - -As explained in chapter "Clock and power domain", if you are working on one -platform which has idle states to power off debug logic and the power -controller cannot work well for the request from EDPRCR, then you should -firstly constraint CPU idle states before enable CPU debugging feature; so can -ensure the accessing to debug logic. - -If you want to limit idle states at boot time, you can use "nohlt" or -"cpuidle.off=1" in the kernel command line. - -At the runtime you can disable idle states with below methods: - -It is possible to disable CPU idle states by way of the PM QoS -subsystem, more specifically by using the "/dev/cpu_dma_latency" -interface (see Documentation/power/pm_qos_interface.rst for more -details). As specified in the PM QoS documentation the requested -parameter will stay in effect until the file descriptor is released. -For example: - -# exec 3<> /dev/cpu_dma_latency; echo 0 >&3 -... -Do some work... -... -# exec 3<>- - -The same can also be done from an application program. - -Disable specific CPU's specific idle state from cpuidle sysfs (see -Documentation/admin-guide/pm/cpuidle.rst): -# echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable - - -Output format -------------- - -Here is an example of the debugging output format: - -ARM external debug module: -coresight-cpu-debug 850000.debug: CPU[0]: -coresight-cpu-debug 850000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) -coresight-cpu-debug 850000.debug: EDPCSR: handle_IPI+0x174/0x1d8 -coresight-cpu-debug 850000.debug: EDCIDSR: 00000000 -coresight-cpu-debug 850000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) -coresight-cpu-debug 852000.debug: CPU[1]: -coresight-cpu-debug 852000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) -coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358 -coresight-cpu-debug 852000.debug: EDCIDSR: 00000000 -coresight-cpu-debug 852000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) diff --git a/Documentation/trace/coresight.rst b/Documentation/trace/coresight.rst new file mode 100644 index 000000000000..72f4b7ef1bad --- /dev/null +++ b/Documentation/trace/coresight.rst @@ -0,0 +1,498 @@ +====================================== +Coresight - HW Assisted Tracing on ARM +====================================== + + :Author: Mathieu Poirier + :Date: September 11th, 2014 + +Introduction +------------ + +Coresight is an umbrella of technologies allowing for the debugging of ARM +based SoC. It includes solutions for JTAG and HW assisted tracing. This +document is concerned with the latter. + +HW assisted tracing is becoming increasingly useful when dealing with systems +that have many SoCs and other components like GPU and DMA engines. ARM has +developed a HW assisted tracing solution by means of different components, each +being added to a design at synthesis time to cater to specific tracing needs. +Components are generally categorised as source, link and sinks and are +(usually) discovered using the AMBA bus. + +"Sources" generate a compressed stream representing the processor instruction +path based on tracing scenarios as configured by users. From there the stream +flows through the coresight system (via ATB bus) using links that are connecting +the emanating source to a sink(s). Sinks serve as endpoints to the coresight +implementation, either storing the compressed stream in a memory buffer or +creating an interface to the outside world where data can be transferred to a +host without fear of filling up the onboard coresight memory buffer. + +At typical coresight system would look like this:: + + ***************************************************************** + **************************** AMBA AXI ****************************===|| + ***************************************************************** || + ^ ^ | || + | | * ** + 0000000 ::::: 0000000 ::::: ::::: @@@@@@@ |||||||||||| + 0 CPU 0<-->: C : 0 CPU 0<-->: C : : C : @ STM @ || System || + |->0000000 : T : |->0000000 : T : : T :<--->@@@@@ || Memory || + | #######<-->: I : | #######<-->: I : : I : @@@<-| |||||||||||| + | # ETM # ::::: | # PTM # ::::: ::::: @ | + | ##### ^ ^ | ##### ^ ! ^ ! . | ||||||||| + | |->### | ! | |->### | ! | ! . | || DAP || + | | # | ! | | # | ! | ! . | ||||||||| + | | . | ! | | . | ! | ! . | | | + | | . | ! | | . | ! | ! . | | * + | | . | ! | | . | ! | ! . | | SWD/ + | | . | ! | | . | ! | ! . | | JTAG + *****************************************************************<-| + *************************** AMBA Debug APB ************************ + ***************************************************************** + | . ! . ! ! . | + | . * . * * . | + ***************************************************************** + ******************** Cross Trigger Matrix (CTM) ******************* + ***************************************************************** + | . ^ . . | + | * ! * * | + ***************************************************************** + ****************** AMBA Advanced Trace Bus (ATB) ****************** + ***************************************************************** + | ! =============== | + | * ===== F =====<---------| + | ::::::::: ==== U ==== + |-->:: CTI ::&& ETB &&<......II I ======= + | ! &&&&&&&&& II I . + | ! I I . + | ! I REP I<.......... + | ! I I + | !!>&&&&&&&&& II I *Source: ARM ltd. + |------>& TPIU &<......II I DAP = Debug Access Port + &&&&&&&&& IIIIIII ETM = Embedded Trace Macrocell + ; PTM = Program Trace Macrocell + ; CTI = Cross Trigger Interface + * ETB = Embedded Trace Buffer + To trace port TPIU= Trace Port Interface Unit + SWD = Serial Wire Debug + +While on target configuration of the components is done via the APB bus, +all trace data are carried out-of-band on the ATB bus. The CTM provides +a way to aggregate and distribute signals between CoreSight components. + +The coresight framework provides a central point to represent, configure and +manage coresight devices on a platform. This first implementation centers on +the basic tracing functionality, enabling components such ETM/PTM, funnel, +replicator, TMC, TPIU and ETB. Future work will enable more +intricate IP blocks such as STM and CTI. + + +Acronyms and Classification +--------------------------- + +Acronyms: + +PTM: + Program Trace Macrocell +ETM: + Embedded Trace Macrocell +STM: + System trace Macrocell +ETB: + Embedded Trace Buffer +ITM: + Instrumentation Trace Macrocell +TPIU: + Trace Port Interface Unit +TMC-ETR: + Trace Memory Controller, configured as Embedded Trace Router +TMC-ETF: + Trace Memory Controller, configured as Embedded Trace FIFO +CTI: + Cross Trigger Interface + +Classification: + +Source: + ETMv3.x ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM +Link: + Funnel, replicator (intelligent or not), TMC-ETR +Sinks: + ETBv1.0, ETB1.1, TPIU, TMC-ETF +Misc: + CTI + + +Device Tree Bindings +-------------------- + +See Documentation/devicetree/bindings/arm/coresight.txt for details. + +As of this writing drivers for ITM, STMs and CTIs are not provided but are +expected to be added as the solution matures. + + +Framework and implementation +---------------------------- + +The coresight framework provides a central point to represent, configure and +manage coresight devices on a platform. Any coresight compliant device can +register with the framework for as long as they use the right APIs: + +.. c:function:: struct coresight_device *coresight_register(struct coresight_desc *desc); +.. c:function:: void coresight_unregister(struct coresight_device *csdev); + +The registering function is taking a ``struct coresight_desc *desc`` and +register the device with the core framework. The unregister function takes +a reference to a ``struct coresight_device *csdev`` obtained at registration time. + +If everything goes well during the registration process the new devices will +show up under /sys/bus/coresight/devices, as showns here for a TC2 platform:: + + root:~# ls /sys/bus/coresight/devices/ + replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm + 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm + root:~# + +The functions take a ``struct coresight_device``, which looks like this:: + + struct coresight_desc { + enum coresight_dev_type type; + struct coresight_dev_subtype subtype; + const struct coresight_ops *ops; + struct coresight_platform_data *pdata; + struct device *dev; + const struct attribute_group **groups; + }; + + +The "coresight_dev_type" identifies what the device is, i.e, source link or +sink while the "coresight_dev_subtype" will characterise that type further. + +The ``struct coresight_ops`` is mandatory and will tell the framework how to +perform base operations related to the components, each component having +a different set of requirement. For that ``struct coresight_ops_sink``, +``struct coresight_ops_link`` and ``struct coresight_ops_source`` have been +provided. + +The next field ``struct coresight_platform_data *pdata`` is acquired by calling +``of_get_coresight_platform_data()``, as part of the driver's _probe routine and +``struct device *dev`` gets the device reference embedded in the ``amba_device``:: + + static int etm_probe(struct amba_device *adev, const struct amba_id *id) + { + ... + ... + drvdata->dev = &adev->dev; + ... + } + +Specific class of device (source, link, or sink) have generic operations +that can be performed on them (see ``struct coresight_ops``). The ``**groups`` +is a list of sysfs entries pertaining to operations +specific to that component only. "Implementation defined" customisations are +expected to be accessed and controlled using those entries. + +Device Naming scheme +-------------------- + +The devices that appear on the "coresight" bus were named the same as their +parent devices, i.e, the real devices that appears on AMBA bus or the platform bus. +Thus the names were based on the Linux Open Firmware layer naming convention, +which follows the base physical address of the device followed by the device +type. e.g:: + + root:~# ls /sys/bus/coresight/devices/ + 20010000.etf 20040000.funnel 20100000.stm 22040000.etm + 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu + 20070000.etr 20120000.replicator 220c0000.funnel + 23040000.etm 23140000.etm 23340000.etm + +However, with the introduction of ACPI support, the names of the real +devices are a bit cryptic and non-obvious. Thus, a new naming scheme was +introduced to use more generic names based on the type of the device. The +following rules apply:: + + 1) Devices that are bound to CPUs, are named based on the CPU logical + number. + + e.g, ETM bound to CPU0 is named "etm0" + + 2) All other devices follow a pattern, "N", where : + + - A prefix specific to the type of the device + N - a sequential number assigned based on the order + of probing. + + e.g, tmc_etf0, tmc_etr0, funnel0, funnel1 + +Thus, with the new scheme the devices could appear as :: + + root:~# ls /sys/bus/coresight/devices/ + etm0 etm1 etm2 etm3 etm4 etm5 funnel0 + funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0 + +Some of the examples below might refer to old naming scheme and some +to the newer scheme, to give a confirmation that what you see on your +system is not unexpected. One must use the "names" as they appear on +the system under specified locations. + +How to use the tracer modules +----------------------------- + +There are two ways to use the Coresight framework: + +1. using the perf cmd line tools. +2. interacting directly with the Coresight devices using the sysFS interface. + +Preference is given to the former as using the sysFS interface +requires a deep understanding of the Coresight HW. The following sections +provide details on using both methods. + +1) Using the sysFS interface: + +Before trace collection can start, a coresight sink needs to be identified. +There is no limit on the amount of sinks (nor sources) that can be enabled at +any given moment. As a generic operation, all device pertaining to the sink +class will have an "active" entry in sysfs:: + + root:/sys/bus/coresight/devices# ls + replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm + 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm + root:/sys/bus/coresight/devices# ls 20010000.etb + enable_sink status trigger_cntr + root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink + root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink + 1 + root:/sys/bus/coresight/devices# + +At boot time the current etm3x driver will configure the first address +comparator with "_stext" and "_etext", essentially tracing any instruction +that falls within that range. As such "enabling" a source will immediately +trigger a trace capture:: + + root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source + root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source + 1 + root:/sys/bus/coresight/devices# cat 20010000.etb/status + Depth: 0x2000 + Status: 0x1 + RAM read ptr: 0x0 + RAM wrt ptr: 0x19d3 <----- The write pointer is moving + Trigger cnt: 0x0 + Control: 0x1 + Flush status: 0x0 + Flush ctrl: 0x2001 + root:/sys/bus/coresight/devices# + +Trace collection is stopped the same way:: + + root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source + root:/sys/bus/coresight/devices# + +The content of the ETB buffer can be harvested directly from /dev:: + + root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \ + of=~/cstrace.bin + 64+0 records in + 64+0 records out + 32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s + root:/sys/bus/coresight/devices# + +The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32. + +Following is a DS-5 output of an experimental loop that increments a variable up +to a certain value. The example is simple and yet provides a glimpse of the +wealth of possibilities that coresight provides. +:: + + Info Tracing enabled + Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr} + Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc + Instruction 0 0x8026B544 E3A03000 false MOV r3,#0 + Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4] + Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4] + Instruction 0 0x8026B550 E3530004 false CMP r3,#4 + Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 + Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] + Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c + Timestamp Timestamp: 17106715833 + Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4] + Instruction 0 0x8026B550 E3530004 false CMP r3,#4 + Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 + Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] + Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c + Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4] + Instruction 0 0x8026B550 E3530004 false CMP r3,#4 + Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 + Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] + Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c + Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] + Instruction 0 0x8026B550 E3530004 false CMP r3,#4 + Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 + Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] + Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c + Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] + Instruction 0 0x8026B550 E3530004 false CMP r3,#4 + Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 + Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] + Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c + Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4] + Instruction 0 0x8026B550 E3530004 false CMP r3,#4 + Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 + Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] + Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c + Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1 + Instruction 0 0x8026B564 E1A0100D false MOV r1,sp + Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0 + Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f + Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4] + Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368 + Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc] + Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0] + Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4 + Info Tracing enabled + Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc + Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc} + Timestamp Timestamp: 17107041535 + +2) Using perf framework: + +Coresight tracers are represented using the Perf framework's Performance +Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of +controlling when tracing gets enabled based on when the process of interest is +scheduled. When configured in a system, Coresight PMUs will be listed when +queried by the perf command line tool: + + linaro@linaro-nano:~$ ./perf list pmu + + List of pre-defined events (to be used in -e): + + cs_etm// [Kernel PMU event] + + linaro@linaro-nano:~$ + +Regardless of the number of tracers available in a system (usually equal to the +amount of processor cores), the "cs_etm" PMU will be listed only once. + +A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is +listed along with configuration options within forward slashes '/'. Since a +Coresight system will typically have more than one sink, the name of the sink to +work with needs to be specified as an event option. +On newer kernels the available sinks are listed in sysFS under +($SYSFS)/bus/event_source/devices/cs_etm/sinks/:: + + root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls + tmc_etf0 tmc_etr0 tpiu0 + +On older kernels, this may need to be found from the list of coresight devices, +available under ($SYSFS)/bus/coresight/devices/:: + + root:~# ls /sys/bus/coresight/devices/ + etm0 etm1 etm2 etm3 etm4 etm5 funnel0 + funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0 + root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program + +As mentioned above in section "Device Naming scheme", the names of the devices could +look different from what is used in the example above. One must use the device names +as it appears under the sysFS. + +The syntax within the forward slashes '/' is important. The '@' character +tells the parser that a sink is about to be specified and that this is the sink +to use for the trace session. + +More information on the above and other example on how to use Coresight with +the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub +repository [#third]_. + +2.1) AutoFDO analysis using the perf tools: + +perf can be used to record and analyze trace of programs. + +Execution can be recorded using 'perf record' with the cs_etm event, +specifying the name of the sink to record to, e.g:: + + perf record -e cs_etm/@tmc_etr0/u --per-thread + +The 'perf report' and 'perf script' commands can be used to analyze execution, +synthesizing instruction and branch events from the instruction trace. +'perf inject' can be used to replace the trace data with the synthesized events. +The --itrace option controls the type and frequency of synthesized events +(see perf documentation). + +Note that only 64-bit programs are currently supported - further work is +required to support instruction decode of 32-bit Arm programs. + + +Generating coverage files for Feedback Directed Optimization: AutoFDO +--------------------------------------------------------------------- + +'perf inject' accepts the --itrace option in which case tracing data is +removed and replaced with the synthesized events. e.g. +:: + + perf inject --itrace --strip -i perf.data -o perf.data.new + +Below is an example of using ARM ETM for autoFDO. It requires autofdo +(https://github.com/google/autofdo) and gcc version 5. The bubble +sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial). +:: + + $ gcc-5 -O3 sort.c -o sort + $ taskset -c 2 ./sort + Bubble sorting array of 30000 elements + 5910 ms + + $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort + Bubble sorting array of 30000 elements + 12543 ms + [ perf record: Woken up 35 times to write data ] + [ perf record: Captured and wrote 69.640 MB perf.data ] + + $ perf inject -i perf.data -o inj.data --itrace=il64 --strip + $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1 + $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo + $ taskset -c 2 ./sort_autofdo + Bubble sorting array of 30000 elements + 5806 ms + + +How to use the STM module +------------------------- + +Using the System Trace Macrocell module is the same as the tracers - the only +difference is that clients are driving the trace capture rather +than the program flow through the code. + +As with any other CoreSight component, specifics about the STM tracer can be +found in sysfs with more information on each entry being found in [#first]_:: + + root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0 + enable_source hwevent_select port_enable subsystem uevent + hwevent_enable mgmt port_select traceid + root@genericarmv8:~# + +Like any other source a sink needs to be identified and the STM enabled before +being used:: + + root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink + root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source + +From there user space applications can request and use channels using the devfs +interface provided for that purpose by the generic STM API:: + + root@genericarmv8:~# ls -l /dev/stm0 + crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0 + root@genericarmv8:~# + +Details on how to use the generic STM API can be found here [#second]_. + +.. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm + +.. [#second] Documentation/trace/stm.rst + +.. [#third] https://github.com/Linaro/perf-opencsd diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt deleted file mode 100644 index b027d61b27a6..000000000000 --- a/Documentation/trace/coresight.txt +++ /dev/null @@ -1,482 +0,0 @@ - Coresight - HW Assisted Tracing on ARM - ====================================== - - Author: Mathieu Poirier - Date: September 11th, 2014 - -Introduction ------------- - -Coresight is an umbrella of technologies allowing for the debugging of ARM -based SoC. It includes solutions for JTAG and HW assisted tracing. This -document is concerned with the latter. - -HW assisted tracing is becoming increasingly useful when dealing with systems -that have many SoCs and other components like GPU and DMA engines. ARM has -developed a HW assisted tracing solution by means of different components, each -being added to a design at synthesis time to cater to specific tracing needs. -Components are generally categorised as source, link and sinks and are -(usually) discovered using the AMBA bus. - -"Sources" generate a compressed stream representing the processor instruction -path based on tracing scenarios as configured by users. From there the stream -flows through the coresight system (via ATB bus) using links that are connecting -the emanating source to a sink(s). Sinks serve as endpoints to the coresight -implementation, either storing the compressed stream in a memory buffer or -creating an interface to the outside world where data can be transferred to a -host without fear of filling up the onboard coresight memory buffer. - -At typical coresight system would look like this: - - ***************************************************************** - **************************** AMBA AXI ****************************===|| - ***************************************************************** || - ^ ^ | || - | | * ** - 0000000 ::::: 0000000 ::::: ::::: @@@@@@@ |||||||||||| - 0 CPU 0<-->: C : 0 CPU 0<-->: C : : C : @ STM @ || System || - |->0000000 : T : |->0000000 : T : : T :<--->@@@@@ || Memory || - | #######<-->: I : | #######<-->: I : : I : @@@<-| |||||||||||| - | # ETM # ::::: | # PTM # ::::: ::::: @ | - | ##### ^ ^ | ##### ^ ! ^ ! . | ||||||||| - | |->### | ! | |->### | ! | ! . | || DAP || - | | # | ! | | # | ! | ! . | ||||||||| - | | . | ! | | . | ! | ! . | | | - | | . | ! | | . | ! | ! . | | * - | | . | ! | | . | ! | ! . | | SWD/ - | | . | ! | | . | ! | ! . | | JTAG - *****************************************************************<-| - *************************** AMBA Debug APB ************************ - ***************************************************************** - | . ! . ! ! . | - | . * . * * . | - ***************************************************************** - ******************** Cross Trigger Matrix (CTM) ******************* - ***************************************************************** - | . ^ . . | - | * ! * * | - ***************************************************************** - ****************** AMBA Advanced Trace Bus (ATB) ****************** - ***************************************************************** - | ! =============== | - | * ===== F =====<---------| - | ::::::::: ==== U ==== - |-->:: CTI ::&& ETB &&<......II I ======= - | ! &&&&&&&&& II I . - | ! I I . - | ! I REP I<.......... - | ! I I - | !!>&&&&&&&&& II I *Source: ARM ltd. - |------>& TPIU &<......II I DAP = Debug Access Port - &&&&&&&&& IIIIIII ETM = Embedded Trace Macrocell - ; PTM = Program Trace Macrocell - ; CTI = Cross Trigger Interface - * ETB = Embedded Trace Buffer - To trace port TPIU= Trace Port Interface Unit - SWD = Serial Wire Debug - -While on target configuration of the components is done via the APB bus, -all trace data are carried out-of-band on the ATB bus. The CTM provides -a way to aggregate and distribute signals between CoreSight components. - -The coresight framework provides a central point to represent, configure and -manage coresight devices on a platform. This first implementation centers on -the basic tracing functionality, enabling components such ETM/PTM, funnel, -replicator, TMC, TPIU and ETB. Future work will enable more -intricate IP blocks such as STM and CTI. - - -Acronyms and Classification ---------------------------- - -Acronyms: - -PTM: Program Trace Macrocell -ETM: Embedded Trace Macrocell -STM: System trace Macrocell -ETB: Embedded Trace Buffer -ITM: Instrumentation Trace Macrocell -TPIU: Trace Port Interface Unit -TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router -TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO -CTI: Cross Trigger Interface - -Classification: - -Source: - ETMv3.x ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM -Link: - Funnel, replicator (intelligent or not), TMC-ETR -Sinks: - ETBv1.0, ETB1.1, TPIU, TMC-ETF -Misc: - CTI - - -Device Tree Bindings ----------------------- - -See Documentation/devicetree/bindings/arm/coresight.txt for details. - -As of this writing drivers for ITM, STMs and CTIs are not provided but are -expected to be added as the solution matures. - - -Framework and implementation ----------------------------- - -The coresight framework provides a central point to represent, configure and -manage coresight devices on a platform. Any coresight compliant device can -register with the framework for as long as they use the right APIs: - -struct coresight_device *coresight_register(struct coresight_desc *desc); -void coresight_unregister(struct coresight_device *csdev); - -The registering function is taking a "struct coresight_device *csdev" and -register the device with the core framework. The unregister function takes -a reference to a "struct coresight_device", obtained at registration time. - -If everything goes well during the registration process the new devices will -show up under /sys/bus/coresight/devices, as showns here for a TC2 platform: - -root:~# ls /sys/bus/coresight/devices/ -replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm -20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm -root:~# - -The functions take a "struct coresight_device", which looks like this: - -struct coresight_desc { - enum coresight_dev_type type; - struct coresight_dev_subtype subtype; - const struct coresight_ops *ops; - struct coresight_platform_data *pdata; - struct device *dev; - const struct attribute_group **groups; -}; - - -The "coresight_dev_type" identifies what the device is, i.e, source link or -sink while the "coresight_dev_subtype" will characterise that type further. - -The "struct coresight_ops" is mandatory and will tell the framework how to -perform base operations related to the components, each component having -a different set of requirement. For that "struct coresight_ops_sink", -"struct coresight_ops_link" and "struct coresight_ops_source" have been -provided. - -The next field, "struct coresight_platform_data *pdata" is acquired by calling -"of_get_coresight_platform_data()", as part of the driver's _probe routine and -"struct device *dev" gets the device reference embedded in the "amba_device": - -static int etm_probe(struct amba_device *adev, const struct amba_id *id) -{ - ... - ... - drvdata->dev = &adev->dev; - ... -} - -Specific class of device (source, link, or sink) have generic operations -that can be performed on them (see "struct coresight_ops"). The -"**groups" is a list of sysfs entries pertaining to operations -specific to that component only. "Implementation defined" customisations are -expected to be accessed and controlled using those entries. - - -Device Naming scheme ------------------------- -The devices that appear on the "coresight" bus were named the same as their -parent devices, i.e, the real devices that appears on AMBA bus or the platform bus. -Thus the names were based on the Linux Open Firmware layer naming convention, -which follows the base physical address of the device followed by the device -type. e.g: - -root:~# ls /sys/bus/coresight/devices/ - 20010000.etf 20040000.funnel 20100000.stm 22040000.etm - 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu - 20070000.etr 20120000.replicator 220c0000.funnel - 23040000.etm 23140000.etm 23340000.etm - -However, with the introduction of ACPI support, the names of the real -devices are a bit cryptic and non-obvious. Thus, a new naming scheme was -introduced to use more generic names based on the type of the device. The -following rules apply: - - 1) Devices that are bound to CPUs, are named based on the CPU logical - number. - - e.g, ETM bound to CPU0 is named "etm0" - - 2) All other devices follow a pattern, "N", where : - - - A prefix specific to the type of the device - N - a sequential number assigned based on the order - of probing. - - e.g, tmc_etf0, tmc_etr0, funnel0, funnel1 - -Thus, with the new scheme the devices could appear as : - -root:~# ls /sys/bus/coresight/devices/ - etm0 etm1 etm2 etm3 etm4 etm5 funnel0 - funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0 - -Some of the examples below might refer to old naming scheme and some -to the newer scheme, to give a confirmation that what you see on your -system is not unexpected. One must use the "names" as they appear on -the system under specified locations. - -How to use the tracer modules ------------------------------ - -There are two ways to use the Coresight framework: 1) using the perf cmd line -tools and 2) interacting directly with the Coresight devices using the sysFS -interface. Preference is given to the former as using the sysFS interface -requires a deep understanding of the Coresight HW. The following sections -provide details on using both methods. - -1) Using the sysFS interface: - -Before trace collection can start, a coresight sink needs to be identified. -There is no limit on the amount of sinks (nor sources) that can be enabled at -any given moment. As a generic operation, all device pertaining to the sink -class will have an "active" entry in sysfs: - -root:/sys/bus/coresight/devices# ls -replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm -20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm -root:/sys/bus/coresight/devices# ls 20010000.etb -enable_sink status trigger_cntr -root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink -root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink -1 -root:/sys/bus/coresight/devices# - -At boot time the current etm3x driver will configure the first address -comparator with "_stext" and "_etext", essentially tracing any instruction -that falls within that range. As such "enabling" a source will immediately -trigger a trace capture: - -root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source -root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source -1 -root:/sys/bus/coresight/devices# cat 20010000.etb/status -Depth: 0x2000 -Status: 0x1 -RAM read ptr: 0x0 -RAM wrt ptr: 0x19d3 <----- The write pointer is moving -Trigger cnt: 0x0 -Control: 0x1 -Flush status: 0x0 -Flush ctrl: 0x2001 -root:/sys/bus/coresight/devices# - -Trace collection is stopped the same way: - -root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source -root:/sys/bus/coresight/devices# - -The content of the ETB buffer can be harvested directly from /dev: - -root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \ -of=~/cstrace.bin - -64+0 records in -64+0 records out -32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s -root:/sys/bus/coresight/devices# - -The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32. - -Following is a DS-5 output of an experimental loop that increments a variable up -to a certain value. The example is simple and yet provides a glimpse of the -wealth of possibilities that coresight provides. - -Info Tracing enabled -Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr} -Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc -Instruction 0 0x8026B544 E3A03000 false MOV r3,#0 -Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4] -Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4] -Instruction 0 0x8026B550 E3530004 false CMP r3,#4 -Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 -Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] -Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c -Timestamp Timestamp: 17106715833 -Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4] -Instruction 0 0x8026B550 E3530004 false CMP r3,#4 -Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 -Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] -Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c -Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4] -Instruction 0 0x8026B550 E3530004 false CMP r3,#4 -Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 -Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] -Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c -Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] -Instruction 0 0x8026B550 E3530004 false CMP r3,#4 -Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 -Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] -Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c -Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] -Instruction 0 0x8026B550 E3530004 false CMP r3,#4 -Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 -Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] -Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c -Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4] -Instruction 0 0x8026B550 E3530004 false CMP r3,#4 -Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 -Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] -Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c -Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1 -Instruction 0 0x8026B564 E1A0100D false MOV r1,sp -Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0 -Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f -Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4] -Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368 -Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc] -Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0] -Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4 -Info Tracing enabled -Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc -Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc} -Timestamp Timestamp: 17107041535 - -2) Using perf framework: - -Coresight tracers are represented using the Perf framework's Performance -Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of -controlling when tracing gets enabled based on when the process of interest is -scheduled. When configured in a system, Coresight PMUs will be listed when -queried by the perf command line tool: - - linaro@linaro-nano:~$ ./perf list pmu - - List of pre-defined events (to be used in -e): - - cs_etm// [Kernel PMU event] - - linaro@linaro-nano:~$ - -Regardless of the number of tracers available in a system (usually equal to the -amount of processor cores), the "cs_etm" PMU will be listed only once. - -A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is -listed along with configuration options within forward slashes '/'. Since a -Coresight system will typically have more than one sink, the name of the sink to -work with needs to be specified as an event option. -On newer kernels the available sinks are listed in sysFS under: -($SYSFS)/bus/event_source/devices/cs_etm/sinks/ - - root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls - tmc_etf0 tmc_etr0 tpiu0 - -On older kernels, this may need to be found from the list of coresight devices, -available under ($SYSFS)/bus/coresight/devices/: - - root:~# ls /sys/bus/coresight/devices/ - etm0 etm1 etm2 etm3 etm4 etm5 funnel0 - funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0 - - root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program - -As mentioned above in section "Device Naming scheme", the names of the devices could -look different from what is used in the example above. One must use the device names -as it appears under the sysFS. - -The syntax within the forward slashes '/' is important. The '@' character -tells the parser that a sink is about to be specified and that this is the sink -to use for the trace session. - -More information on the above and other example on how to use Coresight with -the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub -repository [3]. - -2.1) AutoFDO analysis using the perf tools: - -perf can be used to record and analyze trace of programs. - -Execution can be recorded using 'perf record' with the cs_etm event, -specifying the name of the sink to record to, e.g: - - perf record -e cs_etm/@tmc_etr0/u --per-thread - -The 'perf report' and 'perf script' commands can be used to analyze execution, -synthesizing instruction and branch events from the instruction trace. -'perf inject' can be used to replace the trace data with the synthesized events. -The --itrace option controls the type and frequency of synthesized events -(see perf documentation). - -Note that only 64-bit programs are currently supported - further work is -required to support instruction decode of 32-bit Arm programs. - - -Generating coverage files for Feedback Directed Optimization: AutoFDO ---------------------------------------------------------------------- - -'perf inject' accepts the --itrace option in which case tracing data is -removed and replaced with the synthesized events. e.g. - - perf inject --itrace --strip -i perf.data -o perf.data.new - -Below is an example of using ARM ETM for autoFDO. It requires autofdo -(https://github.com/google/autofdo) and gcc version 5. The bubble -sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial). - - $ gcc-5 -O3 sort.c -o sort - $ taskset -c 2 ./sort - Bubble sorting array of 30000 elements - 5910 ms - - $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort - Bubble sorting array of 30000 elements - 12543 ms - [ perf record: Woken up 35 times to write data ] - [ perf record: Captured and wrote 69.640 MB perf.data ] - - $ perf inject -i perf.data -o inj.data --itrace=il64 --strip - $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1 - $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo - $ taskset -c 2 ./sort_autofdo - Bubble sorting array of 30000 elements - 5806 ms - - -How to use the STM module -------------------------- - -Using the System Trace Macrocell module is the same as the tracers - the only -difference is that clients are driving the trace capture rather -than the program flow through the code. - -As with any other CoreSight component, specifics about the STM tracer can be -found in sysfs with more information on each entry being found in [1]: - -root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0 -enable_source hwevent_select port_enable subsystem uevent -hwevent_enable mgmt port_select traceid -root@genericarmv8:~# - -Like any other source a sink needs to be identified and the STM enabled before -being used: - -root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink -root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source - -From there user space applications can request and use channels using the devfs -interface provided for that purpose by the generic STM API: - -root@genericarmv8:~# ls -l /dev/stm0 -crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0 -root@genericarmv8:~# - -Details on how to use the generic STM API can be found here [2]. - -[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm -[2]. Documentation/trace/stm.rst -[3]. https://github.com/Linaro/perf-opencsd diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index 6b4107cf4b98..b7891cb1ab4d 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -23,3 +23,5 @@ Linux Tracing Technologies intel_th stm sys-t + coresight + coresight-cpu-debug -- cgit From eaf7b46083a7e341a23ab3d6042e0ccc115b0914 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 26 Jul 2019 09:51:12 -0300 Subject: docs: thermal: add it to the driver API The file contents mostly describes driver internals. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/driver-api/index.rst | 1 + .../driver-api/thermal/cpu-cooling-api.rst | 107 +++ .../driver-api/thermal/exynos_thermal.rst | 90 +++ .../thermal/exynos_thermal_emulation.rst | 61 ++ Documentation/driver-api/thermal/index.rst | 18 + .../driver-api/thermal/intel_powerclamp.rst | 320 +++++++++ .../driver-api/thermal/nouveau_thermal.rst | 96 +++ .../driver-api/thermal/power_allocator.rst | 271 +++++++ Documentation/driver-api/thermal/sysfs-api.rst | 798 +++++++++++++++++++++ .../thermal/x86_pkg_temperature_thermal.rst | 55 ++ Documentation/thermal/cpu-cooling-api.rst | 107 --- Documentation/thermal/exynos_thermal.rst | 90 --- Documentation/thermal/exynos_thermal_emulation.rst | 61 -- Documentation/thermal/index.rst | 18 - Documentation/thermal/intel_powerclamp.rst | 320 --------- Documentation/thermal/nouveau_thermal.rst | 96 --- Documentation/thermal/power_allocator.rst | 271 ------- Documentation/thermal/sysfs-api.rst | 798 --------------------- .../thermal/x86_pkg_temperature_thermal.rst | 55 -- 19 files changed, 1817 insertions(+), 1816 deletions(-) create mode 100644 Documentation/driver-api/thermal/cpu-cooling-api.rst create mode 100644 Documentation/driver-api/thermal/exynos_thermal.rst create mode 100644 Documentation/driver-api/thermal/exynos_thermal_emulation.rst create mode 100644 Documentation/driver-api/thermal/index.rst create mode 100644 Documentation/driver-api/thermal/intel_powerclamp.rst create mode 100644 Documentation/driver-api/thermal/nouveau_thermal.rst create mode 100644 Documentation/driver-api/thermal/power_allocator.rst create mode 100644 Documentation/driver-api/thermal/sysfs-api.rst create mode 100644 Documentation/driver-api/thermal/x86_pkg_temperature_thermal.rst delete mode 100644 Documentation/thermal/cpu-cooling-api.rst delete mode 100644 Documentation/thermal/exynos_thermal.rst delete mode 100644 Documentation/thermal/exynos_thermal_emulation.rst delete mode 100644 Documentation/thermal/index.rst delete mode 100644 Documentation/thermal/intel_powerclamp.rst delete mode 100644 Documentation/thermal/nouveau_thermal.rst delete mode 100644 Documentation/thermal/power_allocator.rst delete mode 100644 Documentation/thermal/sysfs-api.rst delete mode 100644 Documentation/thermal/x86_pkg_temperature_thermal.rst (limited to 'Documentation') diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index d12a80f386a6..37ac052ded85 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -65,6 +65,7 @@ available subsections can be seen below. dmaengine/index slimbus soundwire/index + thermal/index fpga/index acpi/index backlight/lp855x-driver.rst diff --git a/Documentation/driver-api/thermal/cpu-cooling-api.rst b/Documentation/driver-api/thermal/cpu-cooling-api.rst new file mode 100644 index 000000000000..645d914c45a6 --- /dev/null +++ b/Documentation/driver-api/thermal/cpu-cooling-api.rst @@ -0,0 +1,107 @@ +======================= +CPU cooling APIs How To +======================= + +Written by Amit Daniel Kachhap + +Updated: 6 Jan 2015 + +Copyright (c) 2012 Samsung Electronics Co., Ltd(http://www.samsung.com) + +0. Introduction +=============== + +The generic cpu cooling(freq clipping) provides registration/unregistration APIs +to the caller. The binding of the cooling devices to the trip point is left for +the user. The registration APIs returns the cooling device pointer. + +1. cpu cooling APIs +=================== + +1.1 cpufreq registration/unregistration APIs +-------------------------------------------- + + :: + + struct thermal_cooling_device + *cpufreq_cooling_register(struct cpumask *clip_cpus) + + This interface function registers the cpufreq cooling device with the name + "thermal-cpufreq-%x". This api can support multiple instances of cpufreq + cooling devices. + + clip_cpus: + cpumask of cpus where the frequency constraints will happen. + + :: + + struct thermal_cooling_device + *of_cpufreq_cooling_register(struct cpufreq_policy *policy) + + This interface function registers the cpufreq cooling device with + the name "thermal-cpufreq-%x" linking it with a device tree node, in + order to bind it via the thermal DT code. This api can support multiple + instances of cpufreq cooling devices. + + policy: + CPUFreq policy. + + + :: + + void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) + + This interface function unregisters the "thermal-cpufreq-%x" cooling device. + + cdev: Cooling device pointer which has to be unregistered. + +2. Power models +=============== + +The power API registration functions provide a simple power model for +CPUs. The current power is calculated as dynamic power (static power isn't +supported currently). This power model requires that the operating-points of +the CPUs are registered using the kernel's opp library and the +`cpufreq_frequency_table` is assigned to the `struct device` of the +cpu. If you are using CONFIG_CPUFREQ_DT then the +`cpufreq_frequency_table` should already be assigned to the cpu +device. + +The dynamic power consumption of a processor depends on many factors. +For a given processor implementation the primary factors are: + +- The time the processor spends running, consuming dynamic power, as + compared to the time in idle states where dynamic consumption is + negligible. Herein we refer to this as 'utilisation'. +- The voltage and frequency levels as a result of DVFS. The DVFS + level is a dominant factor governing power consumption. +- In running time the 'execution' behaviour (instruction types, memory + access patterns and so forth) causes, in most cases, a second order + variation. In pathological cases this variation can be significant, + but typically it is of a much lesser impact than the factors above. + +A high level dynamic power consumption model may then be represented as:: + + Pdyn = f(run) * Voltage^2 * Frequency * Utilisation + +f(run) here represents the described execution behaviour and its +result has a units of Watts/Hz/Volt^2 (this often expressed in +mW/MHz/uVolt^2) + +The detailed behaviour for f(run) could be modelled on-line. However, +in practice, such an on-line model has dependencies on a number of +implementation specific processor support and characterisation +factors. Therefore, in initial implementation that contribution is +represented as a constant coefficient. This is a simplification +consistent with the relative contribution to overall power variation. + +In this simplified representation our model becomes:: + + Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation + +Where `capacitance` is a constant that represents an indicative +running time dynamic power coefficient in fundamental units of +mW/MHz/uVolt^2. Typical values for mobile CPUs might lie in range +from 100 to 500. For reference, the approximate values for the SoC in +ARM's Juno Development Platform are 530 for the Cortex-A57 cluster and +140 for the Cortex-A53 cluster. diff --git a/Documentation/driver-api/thermal/exynos_thermal.rst b/Documentation/driver-api/thermal/exynos_thermal.rst new file mode 100644 index 000000000000..5bd556566c70 --- /dev/null +++ b/Documentation/driver-api/thermal/exynos_thermal.rst @@ -0,0 +1,90 @@ +======================== +Kernel driver exynos_tmu +======================== + +Supported chips: + +* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC + + Datasheet: Not publicly available + +Authors: Donggeun Kim +Authors: Amit Daniel + +TMU controller Description: +--------------------------- + +This driver allows to read temperature inside SAMSUNG EXYNOS4/5 series of SoC. + +The chip only exposes the measured 8-bit temperature code value +through a register. +Temperature can be taken from the temperature code. +There are three equations converting from temperature to temperature code. + +The three equations are: + 1. Two point trimming:: + + Tc = (T - 25) * (TI2 - TI1) / (85 - 25) + TI1 + + 2. One point trimming:: + + Tc = T + TI1 - 25 + + 3. No trimming:: + + Tc = T + 50 + + Tc: + Temperature code, T: Temperature, + TI1: + Trimming info for 25 degree Celsius (stored at TRIMINFO register) + Temperature code measured at 25 degree Celsius which is unchanged + TI2: + Trimming info for 85 degree Celsius (stored at TRIMINFO register) + Temperature code measured at 85 degree Celsius which is unchanged + +TMU(Thermal Management Unit) in EXYNOS4/5 generates interrupt +when temperature exceeds pre-defined levels. +The maximum number of configurable threshold is five. +The threshold levels are defined as follows:: + + Level_0: current temperature > trigger_level_0 + threshold + Level_1: current temperature > trigger_level_1 + threshold + Level_2: current temperature > trigger_level_2 + threshold + Level_3: current temperature > trigger_level_3 + threshold + +The threshold and each trigger_level are set +through the corresponding registers. + +When an interrupt occurs, this driver notify kernel thermal framework +with the function exynos_report_trigger. +Although an interrupt condition for level_0 can be set, +it can be used to synchronize the cooling action. + +TMU driver description: +----------------------- + +The exynos thermal driver is structured as:: + + Kernel Core thermal framework + (thermal_core.c, step_wise.c, cpu_cooling.c) + ^ + | + | + TMU configuration data -----> TMU Driver <----> Exynos Core thermal wrapper + (exynos_tmu_data.c) (exynos_tmu.c) (exynos_thermal_common.c) + (exynos_tmu_data.h) (exynos_tmu.h) (exynos_thermal_common.h) + +a) TMU configuration data: + This consist of TMU register offsets/bitfields + described through structure exynos_tmu_registers. Also several + other platform data (struct exynos_tmu_platform_data) members + are used to configure the TMU. +b) TMU driver: + This component initialises the TMU controller and sets different + thresholds. It invokes core thermal implementation with the call + exynos_report_trigger. +c) Exynos Core thermal wrapper: + This provides 3 wrapper function to use the + Kernel core thermal framework. They are exynos_unregister_thermal, + exynos_register_thermal and exynos_report_trigger. diff --git a/Documentation/driver-api/thermal/exynos_thermal_emulation.rst b/Documentation/driver-api/thermal/exynos_thermal_emulation.rst new file mode 100644 index 000000000000..c21d10838bc5 --- /dev/null +++ b/Documentation/driver-api/thermal/exynos_thermal_emulation.rst @@ -0,0 +1,61 @@ +===================== +Exynos Emulation Mode +===================== + +Copyright (C) 2012 Samsung Electronics + +Written by Jonghwa Lee + +Description +----------- + +Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal +management unit. Thermal emulation mode supports software debug for +TMU's operation. User can set temperature manually with software code +and TMU will read current temperature from user value not from sensor's +value. + +Enabling CONFIG_THERMAL_EMULATION option will make this support +available. When it's enabled, sysfs node will be created as +/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp. + +The sysfs node, 'emul_node', will contain value 0 for the initial state. +When you input any temperature you want to update to sysfs node, it +automatically enable emulation mode and current temperature will be +changed into it. + +(Exynos also supports user changeable delay time which would be used to +delay of changing temperature. However, this node only uses same delay +of real sensing time, 938us.) + +Exynos emulation mode requires synchronous of value changing and +enabling. It means when you want to update the any value of delay or +next temperature, then you have to enable emulation mode at the same +time. (Or you have to keep the mode enabling.) If you don't, it fails to +change the value to updated one and just use last succeessful value +repeatedly. That's why this node gives users the right to change +termerpature only. Just one interface makes it more simply to use. + +Disabling emulation mode only requires writing value 0 to sysfs node. + +:: + + + TEMP 120 | + | + 100 | + | + 80 | + | +----------- + 60 | | | + | +-------------| | + 40 | | | | + | | | | + 20 | | | +---------- + | | | | | + 0 |______________|_____________|__________|__________|_________ + A A A A TIME + |<----->| |<----->| |<----->| | + | 938us | | | | | | + emulation : 0 50 | 70 | 20 | 0 + current temp: sensor 50 70 20 sensor diff --git a/Documentation/driver-api/thermal/index.rst b/Documentation/driver-api/thermal/index.rst new file mode 100644 index 000000000000..5ba61d19c6ae --- /dev/null +++ b/Documentation/driver-api/thermal/index.rst @@ -0,0 +1,18 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======= +Thermal +======= + +.. toctree:: + :maxdepth: 1 + + cpu-cooling-api + sysfs-api + power_allocator + + exynos_thermal + exynos_thermal_emulation + intel_powerclamp + nouveau_thermal + x86_pkg_temperature_thermal diff --git a/Documentation/driver-api/thermal/intel_powerclamp.rst b/Documentation/driver-api/thermal/intel_powerclamp.rst new file mode 100644 index 000000000000..3f6dfb0b3ea6 --- /dev/null +++ b/Documentation/driver-api/thermal/intel_powerclamp.rst @@ -0,0 +1,320 @@ +======================= +Intel Powerclamp Driver +======================= + +By: + - Arjan van de Ven + - Jacob Pan + +.. Contents: + + (*) Introduction + - Goals and Objectives + + (*) Theory of Operation + - Idle Injection + - Calibration + + (*) Performance Analysis + - Effectiveness and Limitations + - Power vs Performance + - Scalability + - Calibration + - Comparison with Alternative Techniques + + (*) Usage and Interfaces + - Generic Thermal Layer (sysfs) + - Kernel APIs (TBD) + +INTRODUCTION +============ + +Consider the situation where a system’s power consumption must be +reduced at runtime, due to power budget, thermal constraint, or noise +level, and where active cooling is not preferred. Software managed +passive power reduction must be performed to prevent the hardware +actions that are designed for catastrophic scenarios. + +Currently, P-states, T-states (clock modulation), and CPU offlining +are used for CPU throttling. + +On Intel CPUs, C-states provide effective power reduction, but so far +they’re only used opportunistically, based on workload. With the +development of intel_powerclamp driver, the method of synchronizing +idle injection across all online CPU threads was introduced. The goal +is to achieve forced and controllable C-state residency. + +Test/Analysis has been made in the areas of power, performance, +scalability, and user experience. In many cases, clear advantage is +shown over taking the CPU offline or modulating the CPU clock. + + +THEORY OF OPERATION +=================== + +Idle Injection +-------------- + +On modern Intel processors (Nehalem or later), package level C-state +residency is available in MSRs, thus also available to the kernel. + +These MSRs are:: + + #define MSR_PKG_C2_RESIDENCY 0x60D + #define MSR_PKG_C3_RESIDENCY 0x3F8 + #define MSR_PKG_C6_RESIDENCY 0x3F9 + #define MSR_PKG_C7_RESIDENCY 0x3FA + +If the kernel can also inject idle time to the system, then a +closed-loop control system can be established that manages package +level C-state. The intel_powerclamp driver is conceived as such a +control system, where the target set point is a user-selected idle +ratio (based on power reduction), and the error is the difference +between the actual package level C-state residency ratio and the target idle +ratio. + +Injection is controlled by high priority kernel threads, spawned for +each online CPU. + +These kernel threads, with SCHED_FIFO class, are created to perform +clamping actions of controlled duty ratio and duration. Each per-CPU +thread synchronizes its idle time and duration, based on the rounding +of jiffies, so accumulated errors can be prevented to avoid a jittery +effect. Threads are also bound to the CPU such that they cannot be +migrated, unless the CPU is taken offline. In this case, threads +belong to the offlined CPUs will be terminated immediately. + +Running as SCHED_FIFO and relatively high priority, also allows such +scheme to work for both preemptable and non-preemptable kernels. +Alignment of idle time around jiffies ensures scalability for HZ +values. This effect can be better visualized using a Perf timechart. +The following diagram shows the behavior of kernel thread +kidle_inject/cpu. During idle injection, it runs monitor/mwait idle +for a given "duration", then relinquishes the CPU to other tasks, +until the next time interval. + +The NOHZ schedule tick is disabled during idle time, but interrupts +are not masked. Tests show that the extra wakeups from scheduler tick +have a dramatic impact on the effectiveness of the powerclamp driver +on large scale systems (Westmere system with 80 processors). + +:: + + CPU0 + ____________ ____________ + kidle_inject/0 | sleep | mwait | sleep | + _________| |________| |_______ + duration + CPU1 + ____________ ____________ + kidle_inject/1 | sleep | mwait | sleep | + _________| |________| |_______ + ^ + | + | + roundup(jiffies, interval) + +Only one CPU is allowed to collect statistics and update global +control parameters. This CPU is referred to as the controlling CPU in +this document. The controlling CPU is elected at runtime, with a +policy that favors BSP, taking into account the possibility of a CPU +hot-plug. + +In terms of dynamics of the idle control system, package level idle +time is considered largely as a non-causal system where its behavior +cannot be based on the past or current input. Therefore, the +intel_powerclamp driver attempts to enforce the desired idle time +instantly as given input (target idle ratio). After injection, +powerclamp monitors the actual idle for a given time window and adjust +the next injection accordingly to avoid over/under correction. + +When used in a causal control system, such as a temperature control, +it is up to the user of this driver to implement algorithms where +past samples and outputs are included in the feedback. For example, a +PID-based thermal controller can use the powerclamp driver to +maintain a desired target temperature, based on integral and +derivative gains of the past samples. + + + +Calibration +----------- +During scalability testing, it is observed that synchronized actions +among CPUs become challenging as the number of cores grows. This is +also true for the ability of a system to enter package level C-states. + +To make sure the intel_powerclamp driver scales well, online +calibration is implemented. The goals for doing such a calibration +are: + +a) determine the effective range of idle injection ratio +b) determine the amount of compensation needed at each target ratio + +Compensation to each target ratio consists of two parts: + + a) steady state error compensation + This is to offset the error occurring when the system can + enter idle without extra wakeups (such as external interrupts). + + b) dynamic error compensation + When an excessive amount of wakeups occurs during idle, an + additional idle ratio can be added to quiet interrupts, by + slowing down CPU activities. + +A debugfs file is provided for the user to examine compensation +progress and results, such as on a Westmere system:: + + [jacob@nex01 ~]$ cat + /sys/kernel/debug/intel_powerclamp/powerclamp_calib + controlling cpu: 0 + pct confidence steady dynamic (compensation) + 0 0 0 0 + 1 1 0 0 + 2 1 1 0 + 3 3 1 0 + 4 3 1 0 + 5 3 1 0 + 6 3 1 0 + 7 3 1 0 + 8 3 1 0 + ... + 30 3 2 0 + 31 3 2 0 + 32 3 1 0 + 33 3 2 0 + 34 3 1 0 + 35 3 2 0 + 36 3 1 0 + 37 3 2 0 + 38 3 1 0 + 39 3 2 0 + 40 3 3 0 + 41 3 1 0 + 42 3 2 0 + 43 3 1 0 + 44 3 1 0 + 45 3 2 0 + 46 3 3 0 + 47 3 0 0 + 48 3 2 0 + 49 3 3 0 + +Calibration occurs during runtime. No offline method is available. +Steady state compensation is used only when confidence levels of all +adjacent ratios have reached satisfactory level. A confidence level +is accumulated based on clean data collected at runtime. Data +collected during a period without extra interrupts is considered +clean. + +To compensate for excessive amounts of wakeup during idle, additional +idle time is injected when such a condition is detected. Currently, +we have a simple algorithm to double the injection ratio. A possible +enhancement might be to throttle the offending IRQ, such as delaying +EOI for level triggered interrupts. But it is a challenge to be +non-intrusive to the scheduler or the IRQ core code. + + +CPU Online/Offline +------------------ +Per-CPU kernel threads are started/stopped upon receiving +notifications of CPU hotplug activities. The intel_powerclamp driver +keeps track of clamping kernel threads, even after they are migrated +to other CPUs, after a CPU offline event. + + +Performance Analysis +==================== +This section describes the general performance data collected on +multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P). + +Effectiveness and Limitations +----------------------------- +The maximum range that idle injection is allowed is capped at 50 +percent. As mentioned earlier, since interrupts are allowed during +forced idle time, excessive interrupts could result in less +effectiveness. The extreme case would be doing a ping -f to generated +flooded network interrupts without much CPU acknowledgement. In this +case, little can be done from the idle injection threads. In most +normal cases, such as scp a large file, applications can be throttled +by the powerclamp driver, since slowing down the CPU also slows down +network protocol processing, which in turn reduces interrupts. + +When control parameters change at runtime by the controlling CPU, it +may take an additional period for the rest of the CPUs to catch up +with the changes. During this time, idle injection is out of sync, +thus not able to enter package C- states at the expected ratio. But +this effect is minor, in that in most cases change to the target +ratio is updated much less frequently than the idle injection +frequency. + +Scalability +----------- +Tests also show a minor, but measurable, difference between the 4P/8P +Ivy Bridge system and the 80P Westmere server under 50% idle ratio. +More compensation is needed on Westmere for the same amount of +target idle ratio. The compensation also increases as the idle ratio +gets larger. The above reason constitutes the need for the +calibration code. + +On the IVB 8P system, compared to an offline CPU, powerclamp can +achieve up to 40% better performance per watt. (measured by a spin +counter summed over per CPU counting threads spawned for all running +CPUs). + +Usage and Interfaces +==================== +The powerclamp driver is registered to the generic thermal layer as a +cooling device. Currently, it’s not bound to any thermal zones:: + + jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . * + cur_state:0 + max_state:50 + type:intel_powerclamp + +cur_state allows user to set the desired idle percentage. Writing 0 to +cur_state will stop idle injection. Writing a value between 1 and +max_state will start the idle injection. Reading cur_state returns the +actual and current idle percentage. This may not be the same value +set by the user in that current idle percentage depends on workload +and includes natural idle. When idle injection is disabled, reading +cur_state returns value -1 instead of 0 which is to avoid confusing +100% busy state with the disabled state. + +Example usage: +- To inject 25% idle time:: + + $ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state + +If the system is not busy and has more than 25% idle time already, +then the powerclamp driver will not start idle injection. Using Top +will not show idle injection kernel threads. + +If the system is busy (spin test below) and has less than 25% natural +idle time, powerclamp kernel threads will do idle injection. Forced +idle time is accounted as normal idle in that common code path is +taken as the idle task. + +In this example, 24.1% idle is shown. This helps the system admin or +user determine the cause of slowdown, when a powerclamp driver is in action:: + + + Tasks: 197 total, 1 running, 196 sleeping, 0 stopped, 0 zombie + Cpu(s): 71.2%us, 4.7%sy, 0.0%ni, 24.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st + Mem: 3943228k total, 1689632k used, 2253596k free, 74960k buffers + Swap: 4087804k total, 0k used, 4087804k free, 945336k cached + + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + 3352 jacob 20 0 262m 644 428 S 286 0.0 0:17.16 spin + 3341 root -51 0 0 0 0 D 25 0.0 0:01.62 kidle_inject/0 + 3344 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/3 + 3342 root -51 0 0 0 0 D 25 0.0 0:01.61 kidle_inject/1 + 3343 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/2 + 2935 jacob 20 0 696m 125m 35m S 5 3.3 0:31.11 firefox + 1546 root 20 0 158m 20m 6640 S 3 0.5 0:26.97 Xorg + 2100 jacob 20 0 1223m 88m 30m S 3 2.3 0:23.68 compiz + +Tests have shown that by using the powerclamp driver as a cooling +device, a PID based userspace thermal controller can manage to +control CPU temperature effectively, when no other thermal influence +is added. For example, a UltraBook user can compile the kernel under +certain temperature (below most active trip points). diff --git a/Documentation/driver-api/thermal/nouveau_thermal.rst b/Documentation/driver-api/thermal/nouveau_thermal.rst new file mode 100644 index 000000000000..37255fd6735d --- /dev/null +++ b/Documentation/driver-api/thermal/nouveau_thermal.rst @@ -0,0 +1,96 @@ +===================== +Kernel driver nouveau +===================== + +Supported chips: + +* NV43+ + +Authors: Martin Peres (mupuf) + +Description +----------- + +This driver allows to read the GPU core temperature, drive the GPU fan and +set temperature alarms. + +Currently, due to the absence of in-kernel API to access HWMON drivers, Nouveau +cannot access any of the i2c external monitoring chips it may find. If you +have one of those, temperature and/or fan management through Nouveau's HWMON +interface is likely not to work. This document may then not cover your situation +entirely. + +Temperature management +---------------------- + +Temperature is exposed under as a read-only HWMON attribute temp1_input. + +In order to protect the GPU from overheating, Nouveau supports 4 configurable +temperature thresholds: + + * Fan_boost: + Fan speed is set to 100% when reaching this temperature; + * Downclock: + The GPU will be downclocked to reduce its power dissipation; + * Critical: + The GPU is put on hold to further lower power dissipation; + * Shutdown: + Shut the computer down to protect your GPU. + +WARNING: + Some of these thresholds may not be used by Nouveau depending + on your chipset. + +The default value for these thresholds comes from the GPU's vbios. These +thresholds can be configured thanks to the following HWMON attributes: + + * Fan_boost: temp1_auto_point1_temp and temp1_auto_point1_temp_hyst; + * Downclock: temp1_max and temp1_max_hyst; + * Critical: temp1_crit and temp1_crit_hyst; + * Shutdown: temp1_emergency and temp1_emergency_hyst. + +NOTE: Remember that the values are stored as milli degrees Celsius. Don't forget +to multiply! + +Fan management +-------------- + +Not all cards have a drivable fan. If you do, then the following HWMON +attributes should be available: + + * pwm1_enable: + Current fan management mode (NONE, MANUAL or AUTO); + * pwm1: + Current PWM value (power percentage); + * pwm1_min: + The minimum PWM speed allowed; + * pwm1_max: + The maximum PWM speed allowed (bypassed when hitting Fan_boost); + +You may also have the following attribute: + + * fan1_input: + Speed in RPM of your fan. + +Your fan can be driven in different modes: + + * 0: The fan is left untouched; + * 1: The fan can be driven in manual (use pwm1 to change the speed); + * 2; The fan is driven automatically depending on the temperature. + +NOTE: + Be sure to use the manual mode if you want to drive the fan speed manually + +NOTE2: + When operating in manual mode outside the vbios-defined + [PWM_min, PWM_max] range, the reported fan speed (RPM) may not be accurate + depending on your hardware. + +Bug reports +----------- + +Thermal management on Nouveau is new and may not work on all cards. If you have +inquiries, please ping mupuf on IRC (#nouveau, freenode). + +Bug reports should be filled on Freedesktop's bug tracker. Please follow +http://nouveau.freedesktop.org/wiki/Bugs diff --git a/Documentation/driver-api/thermal/power_allocator.rst b/Documentation/driver-api/thermal/power_allocator.rst new file mode 100644 index 000000000000..67b6a3297238 --- /dev/null +++ b/Documentation/driver-api/thermal/power_allocator.rst @@ -0,0 +1,271 @@ +================================= +Power allocator governor tunables +================================= + +Trip points +----------- + +The governor works optimally with the following two passive trip points: + +1. "switch on" trip point: temperature above which the governor + control loop starts operating. This is the first passive trip + point of the thermal zone. + +2. "desired temperature" trip point: it should be higher than the + "switch on" trip point. This the target temperature the governor + is controlling for. This is the last passive trip point of the + thermal zone. + +PID Controller +-------------- + +The power allocator governor implements a +Proportional-Integral-Derivative controller (PID controller) with +temperature as the control input and power as the controlled output: + + P_max = k_p * e + k_i * err_integral + k_d * diff_err + sustainable_power + +where + - e = desired_temperature - current_temperature + - err_integral is the sum of previous errors + - diff_err = e - previous_error + +It is similar to the one depicted below:: + + k_d + | + current_temp | + | v + | +----------+ +---+ + | +----->| diff_err |-->| X |------+ + | | +----------+ +---+ | + | | | tdp actor + | | k_i | | get_requested_power() + | | | | | | | + | | | | | | | ... + v | v v v v v + +---+ | +-------+ +---+ +---+ +---+ +----------+ + | S |-----+----->| sum e |----->| X |--->| S |-->| S |-->|power | + +---+ | +-------+ +---+ +---+ +---+ |allocation| + ^ | ^ +----------+ + | | | | | + | | +---+ | | | + | +------->| X |-------------------+ v v + | +---+ granted performance + desired_temperature ^ + | + | + k_po/k_pu + +Sustainable power +----------------- + +An estimate of the sustainable dissipatable power (in mW) should be +provided while registering the thermal zone. This estimates the +sustained power that can be dissipated at the desired control +temperature. This is the maximum sustained power for allocation at +the desired maximum temperature. The actual sustained power can vary +for a number of reasons. The closed loop controller will take care of +variations such as environmental conditions, and some factors related +to the speed-grade of the silicon. `sustainable_power` is therefore +simply an estimate, and may be tuned to affect the aggressiveness of +the thermal ramp. For reference, the sustainable power of a 4" phone +is typically 2000mW, while on a 10" tablet is around 4500mW (may vary +depending on screen size). + +If you are using device tree, do add it as a property of the +thermal-zone. For example:: + + thermal-zones { + soc_thermal { + polling-delay = <1000>; + polling-delay-passive = <100>; + sustainable-power = <2500>; + ... + +Instead, if the thermal zone is registered from the platform code, pass a +`thermal_zone_params` that has a `sustainable_power`. If no +`thermal_zone_params` were being passed, then something like below +will suffice:: + + static const struct thermal_zone_params tz_params = { + .sustainable_power = 3500, + }; + +and then pass `tz_params` as the 5th parameter to +`thermal_zone_device_register()` + +k_po and k_pu +------------- + +The implementation of the PID controller in the power allocator +thermal governor allows the configuration of two proportional term +constants: `k_po` and `k_pu`. `k_po` is the proportional term +constant during temperature overshoot periods (current temperature is +above "desired temperature" trip point). Conversely, `k_pu` is the +proportional term constant during temperature undershoot periods +(current temperature below "desired temperature" trip point). + +These controls are intended as the primary mechanism for configuring +the permitted thermal "ramp" of the system. For instance, a lower +`k_pu` value will provide a slower ramp, at the cost of capping +available capacity at a low temperature. On the other hand, a high +value of `k_pu` will result in the governor granting very high power +while temperature is low, and may lead to temperature overshooting. + +The default value for `k_pu` is:: + + 2 * sustainable_power / (desired_temperature - switch_on_temp) + +This means that at `switch_on_temp` the output of the controller's +proportional term will be 2 * `sustainable_power`. The default value +for `k_po` is:: + + sustainable_power / (desired_temperature - switch_on_temp) + +Focusing on the proportional and feed forward values of the PID +controller equation we have:: + + P_max = k_p * e + sustainable_power + +The proportional term is proportional to the difference between the +desired temperature and the current one. When the current temperature +is the desired one, then the proportional component is zero and +`P_max` = `sustainable_power`. That is, the system should operate in +thermal equilibrium under constant load. `sustainable_power` is only +an estimate, which is the reason for closed-loop control such as this. + +Expanding `k_pu` we get:: + + P_max = 2 * sustainable_power * (T_set - T) / (T_set - T_on) + + sustainable_power + +where: + + - T_set is the desired temperature + - T is the current temperature + - T_on is the switch on temperature + +When the current temperature is the switch_on temperature, the above +formula becomes:: + + P_max = 2 * sustainable_power * (T_set - T_on) / (T_set - T_on) + + sustainable_power = 2 * sustainable_power + sustainable_power = + 3 * sustainable_power + +Therefore, the proportional term alone linearly decreases power from +3 * `sustainable_power` to `sustainable_power` as the temperature +rises from the switch on temperature to the desired temperature. + +k_i and integral_cutoff +----------------------- + +`k_i` configures the PID loop's integral term constant. This term +allows the PID controller to compensate for long term drift and for +the quantized nature of the output control: cooling devices can't set +the exact power that the governor requests. When the temperature +error is below `integral_cutoff`, errors are accumulated in the +integral term. This term is then multiplied by `k_i` and the result +added to the output of the controller. Typically `k_i` is set low (1 +or 2) and `integral_cutoff` is 0. + +k_d +--- + +`k_d` configures the PID loop's derivative term constant. It's +recommended to leave it as the default: 0. + +Cooling device power API +======================== + +Cooling devices controlled by this governor must supply the additional +"power" API in their `cooling_device_ops`. It consists on three ops: + +1. :: + + int get_requested_power(struct thermal_cooling_device *cdev, + struct thermal_zone_device *tz, u32 *power); + + +@cdev: + The `struct thermal_cooling_device` pointer +@tz: + thermal zone in which we are currently operating +@power: + pointer in which to store the calculated power + +`get_requested_power()` calculates the power requested by the device +in milliwatts and stores it in @power . It should return 0 on +success, -E* on failure. This is currently used by the power +allocator governor to calculate how much power to give to each cooling +device. + +2. :: + + int state2power(struct thermal_cooling_device *cdev, struct + thermal_zone_device *tz, unsigned long state, + u32 *power); + +@cdev: + The `struct thermal_cooling_device` pointer +@tz: + thermal zone in which we are currently operating +@state: + A cooling device state +@power: + pointer in which to store the equivalent power + +Convert cooling device state @state into power consumption in +milliwatts and store it in @power. It should return 0 on success, -E* +on failure. This is currently used by thermal core to calculate the +maximum power that an actor can consume. + +3. :: + + int power2state(struct thermal_cooling_device *cdev, u32 power, + unsigned long *state); + +@cdev: + The `struct thermal_cooling_device` pointer +@power: + power in milliwatts +@state: + pointer in which to store the resulting state + +Calculate a cooling device state that would make the device consume at +most @power mW and store it in @state. It should return 0 on success, +-E* on failure. This is currently used by the thermal core to convert +a given power set by the power allocator governor to a state that the +cooling device can set. It is a function because this conversion may +depend on external factors that may change so this function should the +best conversion given "current circumstances". + +Cooling device weights +---------------------- + +Weights are a mechanism to bias the allocation among cooling +devices. They express the relative power efficiency of different +cooling devices. Higher weight can be used to express higher power +efficiency. Weighting is relative such that if each cooling device +has a weight of one they are considered equal. This is particularly +useful in heterogeneous systems where two cooling devices may perform +the same kind of compute, but with different efficiency. For example, +a system with two different types of processors. + +If the thermal zone is registered using +`thermal_zone_device_register()` (i.e., platform code), then weights +are passed as part of the thermal zone's `thermal_bind_parameters`. +If the platform is registered using device tree, then they are passed +as the `contribution` property of each map in the `cooling-maps` node. + +Limitations of the power allocator governor +=========================================== + +The power allocator governor's PID controller works best if there is a +periodic tick. If you have a driver that calls +`thermal_zone_device_update()` (or anything that ends up calling the +governor's `throttle()` function) repetitively, the governor response +won't be very good. Note that this is not particular to this +governor, step-wise will also misbehave if you call its throttle() +faster than the normal thermal framework tick (due to interrupts for +example) as it will overreact. diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst new file mode 100644 index 000000000000..fab2c9b36d08 --- /dev/null +++ b/Documentation/driver-api/thermal/sysfs-api.rst @@ -0,0 +1,798 @@ +=================================== +Generic Thermal Sysfs driver How To +=================================== + +Written by Sujith Thomas , Zhang Rui + +Updated: 2 January 2008 + +Copyright (c) 2008 Intel Corporation + + +0. Introduction +=============== + +The generic thermal sysfs provides a set of interfaces for thermal zone +devices (sensors) and thermal cooling devices (fan, processor...) to register +with the thermal management solution and to be a part of it. + +This how-to focuses on enabling new thermal zone and cooling devices to +participate in thermal management. +This solution is platform independent and any type of thermal zone devices +and cooling devices should be able to make use of the infrastructure. + +The main task of the thermal sysfs driver is to expose thermal zone attributes +as well as cooling device attributes to the user space. +An intelligent thermal management application can make decisions based on +inputs from thermal zone attributes (the current temperature and trip point +temperature) and throttle appropriate devices. + +- `[0-*]` denotes any positive number starting from 0 +- `[1-*]` denotes any positive number starting from 1 + +1. thermal sysfs driver interface functions +=========================================== + +1.1 thermal zone device interface +--------------------------------- + + :: + + struct thermal_zone_device + *thermal_zone_device_register(char *type, + int trips, int mask, void *devdata, + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp, + int passive_delay, int polling_delay)) + + This interface function adds a new thermal zone device (sensor) to + /sys/class/thermal folder as `thermal_zone[0-*]`. It tries to bind all the + thermal cooling devices registered at the same time. + + type: + the thermal zone type. + trips: + the total number of trip points this thermal zone supports. + mask: + Bit string: If 'n'th bit is set, then trip point 'n' is writeable. + devdata: + device private data + ops: + thermal zone device call-backs. + + .bind: + bind the thermal zone device with a thermal cooling device. + .unbind: + unbind the thermal zone device with a thermal cooling device. + .get_temp: + get the current temperature of the thermal zone. + .set_trips: + set the trip points window. Whenever the current temperature + is updated, the trip points immediately below and above the + current temperature are found. + .get_mode: + get the current mode (enabled/disabled) of the thermal zone. + + - "enabled" means the kernel thermal management is + enabled. + - "disabled" will prevent kernel thermal driver action + upon trip points so that user applications can take + charge of thermal management. + .set_mode: + set the mode (enabled/disabled) of the thermal zone. + .get_trip_type: + get the type of certain trip point. + .get_trip_temp: + get the temperature above which the certain trip point + will be fired. + .set_emul_temp: + set the emulation temperature which helps in debugging + different threshold temperature points. + tzp: + thermal zone platform parameters. + passive_delay: + number of milliseconds to wait between polls when + performing passive cooling. + polling_delay: + number of milliseconds to wait between polls when checking + whether trip points have been crossed (0 for interrupt driven systems). + + :: + + void thermal_zone_device_unregister(struct thermal_zone_device *tz) + + This interface function removes the thermal zone device. + It deletes the corresponding entry from /sys/class/thermal folder and + unbinds all the thermal cooling devices it uses. + + :: + + struct thermal_zone_device + *thermal_zone_of_sensor_register(struct device *dev, int sensor_id, + void *data, + const struct thermal_zone_of_device_ops *ops) + + This interface adds a new sensor to a DT thermal zone. + This function will search the list of thermal zones described in + device tree and look for the zone that refer to the sensor device + pointed by dev->of_node as temperature providers. For the zone + pointing to the sensor node, the sensor will be added to the DT + thermal zone device. + + The parameters for this interface are: + + dev: + Device node of sensor containing valid node pointer in + dev->of_node. + sensor_id: + a sensor identifier, in case the sensor IP has more + than one sensors + data: + a private pointer (owned by the caller) that will be + passed back, when a temperature reading is needed. + ops: + `struct thermal_zone_of_device_ops *`. + + ============== ======================================= + get_temp a pointer to a function that reads the + sensor temperature. This is mandatory + callback provided by sensor driver. + set_trips a pointer to a function that sets a + temperature window. When this window is + left the driver must inform the thermal + core via thermal_zone_device_update. + get_trend a pointer to a function that reads the + sensor temperature trend. + set_emul_temp a pointer to a function that sets + sensor emulated temperature. + ============== ======================================= + + The thermal zone temperature is provided by the get_temp() function + pointer of thermal_zone_of_device_ops. When called, it will + have the private pointer @data back. + + It returns error pointer if fails otherwise valid thermal zone device + handle. Caller should check the return handle with IS_ERR() for finding + whether success or not. + + :: + + void thermal_zone_of_sensor_unregister(struct device *dev, + struct thermal_zone_device *tzd) + + This interface unregisters a sensor from a DT thermal zone which was + successfully added by interface thermal_zone_of_sensor_register(). + This function removes the sensor callbacks and private data from the + thermal zone device registered with thermal_zone_of_sensor_register() + interface. It will also silent the zone by remove the .get_temp() and + get_trend() thermal zone device callbacks. + + :: + + struct thermal_zone_device + *devm_thermal_zone_of_sensor_register(struct device *dev, + int sensor_id, + void *data, + const struct thermal_zone_of_device_ops *ops) + + This interface is resource managed version of + thermal_zone_of_sensor_register(). + + All details of thermal_zone_of_sensor_register() described in + section 1.1.3 is applicable here. + + The benefit of using this interface to register sensor is that it + is not require to explicitly call thermal_zone_of_sensor_unregister() + in error path or during driver unbinding as this is done by driver + resource manager. + + :: + + void devm_thermal_zone_of_sensor_unregister(struct device *dev, + struct thermal_zone_device *tzd) + + This interface is resource managed version of + thermal_zone_of_sensor_unregister(). + All details of thermal_zone_of_sensor_unregister() described in + section 1.1.4 is applicable here. + Normally this function will not need to be called and the resource + management code will ensure that the resource is freed. + + :: + + int thermal_zone_get_slope(struct thermal_zone_device *tz) + + This interface is used to read the slope attribute value + for the thermal zone device, which might be useful for platform + drivers for temperature calculations. + + :: + + int thermal_zone_get_offset(struct thermal_zone_device *tz) + + This interface is used to read the offset attribute value + for the thermal zone device, which might be useful for platform + drivers for temperature calculations. + +1.2 thermal cooling device interface +------------------------------------ + + + :: + + struct thermal_cooling_device + *thermal_cooling_device_register(char *name, + void *devdata, struct thermal_cooling_device_ops *) + + This interface function adds a new thermal cooling device (fan/processor/...) + to /sys/class/thermal/ folder as `cooling_device[0-*]`. It tries to bind itself + to all the thermal zone devices registered at the same time. + + name: + the cooling device name. + devdata: + device private data. + ops: + thermal cooling devices call-backs. + + .get_max_state: + get the Maximum throttle state of the cooling device. + .get_cur_state: + get the Currently requested throttle state of the + cooling device. + .set_cur_state: + set the Current throttle state of the cooling device. + + :: + + void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev) + + This interface function removes the thermal cooling device. + It deletes the corresponding entry from /sys/class/thermal folder and + unbinds itself from all the thermal zone devices using it. + +1.3 interface for binding a thermal zone device with a thermal cooling device +----------------------------------------------------------------------------- + + :: + + int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, + int trip, struct thermal_cooling_device *cdev, + unsigned long upper, unsigned long lower, unsigned int weight); + + This interface function binds a thermal cooling device to a particular trip + point of a thermal zone device. + + This function is usually called in the thermal zone device .bind callback. + + tz: + the thermal zone device + cdev: + thermal cooling device + trip: + indicates which trip point in this thermal zone the cooling device + is associated with. + upper: + the Maximum cooling state for this trip point. + THERMAL_NO_LIMIT means no upper limit, + and the cooling device can be in max_state. + lower: + the Minimum cooling state can be used for this trip point. + THERMAL_NO_LIMIT means no lower limit, + and the cooling device can be in cooling state 0. + weight: + the influence of this cooling device in this thermal + zone. See 1.4.1 below for more information. + + :: + + int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz, + int trip, struct thermal_cooling_device *cdev); + + This interface function unbinds a thermal cooling device from a particular + trip point of a thermal zone device. This function is usually called in + the thermal zone device .unbind callback. + + tz: + the thermal zone device + cdev: + thermal cooling device + trip: + indicates which trip point in this thermal zone the cooling device + is associated with. + +1.4 Thermal Zone Parameters +--------------------------- + + :: + + struct thermal_bind_params + + This structure defines the following parameters that are used to bind + a zone with a cooling device for a particular trip point. + + .cdev: + The cooling device pointer + .weight: + The 'influence' of a particular cooling device on this + zone. This is relative to the rest of the cooling + devices. For example, if all cooling devices have a + weight of 1, then they all contribute the same. You can + use percentages if you want, but it's not mandatory. A + weight of 0 means that this cooling device doesn't + contribute to the cooling of this zone unless all cooling + devices have a weight of 0. If all weights are 0, then + they all contribute the same. + .trip_mask: + This is a bit mask that gives the binding relation between + this thermal zone and cdev, for a particular trip point. + If nth bit is set, then the cdev and thermal zone are bound + for trip point n. + .binding_limits: + This is an array of cooling state limits. Must have + exactly 2 * thermal_zone.number_of_trip_points. It is an + array consisting of tuples of + state limits. Each trip will be associated with one state + limit tuple when binding. A NULL pointer means + on all trips. + These limits are used when binding a cdev to a trip point. + .match: + This call back returns success(0) if the 'tz and cdev' need to + be bound, as per platform data. + + :: + + struct thermal_zone_params + + This structure defines the platform level parameters for a thermal zone. + This data, for each thermal zone should come from the platform layer. + This is an optional feature where some platforms can choose not to + provide this data. + + .governor_name: + Name of the thermal governor used for this zone + .no_hwmon: + a boolean to indicate if the thermal to hwmon sysfs interface + is required. when no_hwmon == false, a hwmon sysfs interface + will be created. when no_hwmon == true, nothing will be done. + In case the thermal_zone_params is NULL, the hwmon interface + will be created (for backward compatibility). + .num_tbps: + Number of thermal_bind_params entries for this zone + .tbp: + thermal_bind_params entries + +2. sysfs attributes structure +============================= + +== ================ +RO read only value +WO write only value +RW read/write value +== ================ + +Thermal sysfs attributes will be represented under /sys/class/thermal. +Hwmon sysfs I/F extension is also available under /sys/class/hwmon +if hwmon is compiled in or built as a module. + +Thermal zone device sys I/F, created once it's registered:: + + /sys/class/thermal/thermal_zone[0-*]: + |---type: Type of the thermal zone + |---temp: Current temperature + |---mode: Working mode of the thermal zone + |---policy: Thermal governor used for this zone + |---available_policies: Available thermal governors for this zone + |---trip_point_[0-*]_temp: Trip point temperature + |---trip_point_[0-*]_type: Trip point type + |---trip_point_[0-*]_hyst: Hysteresis value for this trip point + |---emul_temp: Emulated temperature set node + |---sustainable_power: Sustainable dissipatable power + |---k_po: Proportional term during temperature overshoot + |---k_pu: Proportional term during temperature undershoot + |---k_i: PID's integral term in the power allocator gov + |---k_d: PID's derivative term in the power allocator + |---integral_cutoff: Offset above which errors are accumulated + |---slope: Slope constant applied as linear extrapolation + |---offset: Offset constant applied as linear extrapolation + +Thermal cooling device sys I/F, created once it's registered:: + + /sys/class/thermal/cooling_device[0-*]: + |---type: Type of the cooling device(processor/fan/...) + |---max_state: Maximum cooling state of the cooling device + |---cur_state: Current cooling state of the cooling device + |---stats: Directory containing cooling device's statistics + |---stats/reset: Writing any value resets the statistics + |---stats/time_in_state_ms: Time (msec) spent in various cooling states + |---stats/total_trans: Total number of times cooling state is changed + |---stats/trans_table: Cooing state transition table + + +Then next two dynamic attributes are created/removed in pairs. They represent +the relationship between a thermal zone and its associated cooling device. +They are created/removed for each successful execution of +thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device. + +:: + + /sys/class/thermal/thermal_zone[0-*]: + |---cdev[0-*]: [0-*]th cooling device in current thermal zone + |---cdev[0-*]_trip_point: Trip point that cdev[0-*] is associated with + |---cdev[0-*]_weight: Influence of the cooling device in + this thermal zone + +Besides the thermal zone device sysfs I/F and cooling device sysfs I/F, +the generic thermal driver also creates a hwmon sysfs I/F for each _type_ +of thermal zone device. E.g. the generic thermal driver registers one hwmon +class device and build the associated hwmon sysfs I/F for all the registered +ACPI thermal zones. + +:: + + /sys/class/hwmon/hwmon[0-*]: + |---name: The type of the thermal zone devices + |---temp[1-*]_input: The current temperature of thermal zone [1-*] + |---temp[1-*]_critical: The critical trip point of thermal zone [1-*] + +Please read Documentation/hwmon/sysfs-interface.rst for additional information. + +Thermal zone attributes +----------------------- + +type + Strings which represent the thermal zone type. + This is given by thermal zone driver as part of registration. + E.g: "acpitz" indicates it's an ACPI thermal device. + In order to keep it consistent with hwmon sys attribute; this should + be a short, lowercase string, not containing spaces nor dashes. + RO, Required + +temp + Current temperature as reported by thermal zone (sensor). + Unit: millidegree Celsius + RO, Required + +mode + One of the predefined values in [enabled, disabled]. + This file gives information about the algorithm that is currently + managing the thermal zone. It can be either default kernel based + algorithm or user space application. + + enabled + enable Kernel Thermal management. + disabled + Preventing kernel thermal zone driver actions upon + trip points so that user application can take full + charge of the thermal management. + + RW, Optional + +policy + One of the various thermal governors used for a particular zone. + + RW, Required + +available_policies + Available thermal governors which can be used for a particular zone. + + RO, Required + +`trip_point_[0-*]_temp` + The temperature above which trip point will be fired. + + Unit: millidegree Celsius + + RO, Optional + +`trip_point_[0-*]_type` + Strings which indicate the type of the trip point. + + E.g. it can be one of critical, hot, passive, `active[0-*]` for ACPI + thermal zone. + + RO, Optional + +`trip_point_[0-*]_hyst` + The hysteresis value for a trip point, represented as an integer + Unit: Celsius + RW, Optional + +`cdev[0-*]` + Sysfs link to the thermal cooling device node where the sys I/F + for cooling device throttling control represents. + + RO, Optional + +`cdev[0-*]_trip_point` + The trip point in this thermal zone which `cdev[0-*]` is associated + with; -1 means the cooling device is not associated with any trip + point. + + RO, Optional + +`cdev[0-*]_weight` + The influence of `cdev[0-*]` in this thermal zone. This value + is relative to the rest of cooling devices in the thermal + zone. For example, if a cooling device has a weight double + than that of other, it's twice as effective in cooling the + thermal zone. + + RW, Optional + +passive + Attribute is only present for zones in which the passive cooling + policy is not supported by native thermal driver. Default is zero + and can be set to a temperature (in millidegrees) to enable a + passive trip point for the zone. Activation is done by polling with + an interval of 1 second. + + Unit: millidegrees Celsius + + Valid values: 0 (disabled) or greater than 1000 + + RW, Optional + +emul_temp + Interface to set the emulated temperature method in thermal zone + (sensor). After setting this temperature, the thermal zone may pass + this temperature to platform emulation function if registered or + cache it locally. This is useful in debugging different temperature + threshold and its associated cooling action. This is write only node + and writing 0 on this node should disable emulation. + Unit: millidegree Celsius + + WO, Optional + + WARNING: + Be careful while enabling this option on production systems, + because userland can easily disable the thermal policy by simply + flooding this sysfs node with low temperature values. + +sustainable_power + An estimate of the sustained power that can be dissipated by + the thermal zone. Used by the power allocator governor. For + more information see Documentation/driver-api/thermal/power_allocator.rst + + Unit: milliwatts + + RW, Optional + +k_po + The proportional term of the power allocator governor's PID + controller during temperature overshoot. Temperature overshoot + is when the current temperature is above the "desired + temperature" trip point. For more information see + Documentation/driver-api/thermal/power_allocator.rst + + RW, Optional + +k_pu + The proportional term of the power allocator governor's PID + controller during temperature undershoot. Temperature undershoot + is when the current temperature is below the "desired + temperature" trip point. For more information see + Documentation/driver-api/thermal/power_allocator.rst + + RW, Optional + +k_i + The integral term of the power allocator governor's PID + controller. This term allows the PID controller to compensate + for long term drift. For more information see + Documentation/driver-api/thermal/power_allocator.rst + + RW, Optional + +k_d + The derivative term of the power allocator governor's PID + controller. For more information see + Documentation/driver-api/thermal/power_allocator.rst + + RW, Optional + +integral_cutoff + Temperature offset from the desired temperature trip point + above which the integral term of the power allocator + governor's PID controller starts accumulating errors. For + example, if integral_cutoff is 0, then the integral term only + accumulates error when temperature is above the desired + temperature trip point. For more information see + Documentation/driver-api/thermal/power_allocator.rst + + Unit: millidegree Celsius + + RW, Optional + +slope + The slope constant used in a linear extrapolation model + to determine a hotspot temperature based off the sensor's + raw readings. It is up to the device driver to determine + the usage of these values. + + RW, Optional + +offset + The offset constant used in a linear extrapolation model + to determine a hotspot temperature based off the sensor's + raw readings. It is up to the device driver to determine + the usage of these values. + + RW, Optional + +Cooling device attributes +------------------------- + +type + String which represents the type of device, e.g: + + - for generic ACPI: should be "Fan", "Processor" or "LCD" + - for memory controller device on intel_menlow platform: + should be "Memory controller". + + RO, Required + +max_state + The maximum permissible cooling state of this cooling device. + + RO, Required + +cur_state + The current cooling state of this cooling device. + The value can any integer numbers between 0 and max_state: + + - cur_state == 0 means no cooling + - cur_state == max_state means the maximum cooling. + + RW, Required + +stats/reset + Writing any value resets the cooling device's statistics. + WO, Required + +stats/time_in_state_ms: + The amount of time spent by the cooling device in various cooling + states. The output will have "