Age | Commit message | Author
2025-01-27 | genksyms: fix syntax error for attribute before init-declarator | Masahiro Yamada

A longstanding issue with genksyms is that it has hidden syntax errors.

For example, genksyms fails to parse the following valid code:

    int x, __attribute__((__section__(".init.data")))y;

Here, only 'y' is annotated by the attribute, although I am not aware of actual uses of this pattern in the kernel tree.

When a syntax error occurs, yyerror() is called. However, error_with_pos() is a no-op unless the -w option is provided.

You can observe syntax errors by manually passing the -w option.

    $ echo 'int x, __attribute__((__section__(".init.data")))y;' | scripts/genksyms/genksyms -w
    <stdin>:1: syntax error

This commit allows attributes to be placed between a comma and init_declarator.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
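The claim that only 'y' is annotated can be checked from C. The sketch below is illustrative only: it swaps the section attribute for `__aligned__(16)`, which is easier to observe, and the helper name is hypothetical. GCC or Clang is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* The attribute sits between the comma and the second declarator,
 * so under GNU C attribute syntax it applies to 'y' only. */
int x, __attribute__((__aligned__(16))) y;

/* Returns 1 when 'y' carries the requested alignment and 'x' does not. */
static int only_y_is_aligned(void)
{
	return __alignof__(y) == 16 && __alignof__(x) < 16;
}
```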
2025-01-27 | genksyms: fix syntax error for builtin (u)int*x*_t types | Masahiro Yamada

A longstanding issue with genksyms is that it has hidden syntax errors.

When a syntax error occurs, yyerror() is called. However, error_with_pos() is a no-op unless the -w option is provided.

You can observe syntax errors by manually passing the -w option.

For example, genksyms fails to parse the following code in arch/arm64/lib/xor-neon.c:

    static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
    {
            [ snip ]
    }

The syntax error occurs because genksyms does not recognize the uint64x2_t keyword.

This commit adds support for builtin types described in Arm Neon Intrinsics Reference.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-27 | genksyms: fix syntax error for attribute after 'union' | Masahiro Yamada

A longstanding issue with genksyms is that it has hidden syntax errors.

When a syntax error occurs, yyerror() is called. However, error_with_pos() is a no-op unless the -w option is provided.

You can observe syntax errors by manually passing the -w option.

For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:

    $ make -s KCFLAGS=-D__GENKSYMS__ fs/lockd/svc.i
    $ cat fs/lockd/svc.i | scripts/genksyms/genksyms -w
        [ snip ]
    ./include/net/addrconf.h:35: syntax error

The syntax error occurs in the following code in include/net/addrconf.h:

    union __packed {
            [ snip ]
    };

The issue arises from __packed, which is defined as __attribute__((__packed__)), immediately after the 'union' keyword.

This commit allows the 'union' keyword to be followed by attributes.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
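To show why code places an attribute directly after 'union', here is a minimal sketch with a hypothetical struct (not the one in addrconf.h). With GCC or Clang, packing the anonymous union drops its alignment to 1, so no padding is inserted after the leading byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a one-byte tag followed by a packed anonymous union. */
struct sample {
	char tag;
	union __attribute__((__packed__)) {
		uint32_t word;
		uint8_t bytes[4];
	};
};

/* Offset of the union's first member inside struct sample. */
static size_t word_offset(void)
{
	return offsetof(struct sample, word);
}
```

Without the attribute, the union would typically be aligned to 4 bytes and the struct would grow by three bytes of padding.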
2025-01-18 | genksyms: fix syntax error for attribute after 'struct' | Masahiro Yamada

A longstanding issue with genksyms is that it has hidden syntax errors.

When a syntax error occurs, yyerror() is called. However, error_with_pos() is a no-op unless the -w option is provided.

You can observe syntax errors by manually passing the -w option.

For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:

    $ make -s KCFLAGS=-D__GENKSYMS__ arch/x86/kernel/cpu/mshyperv.i
    $ cat arch/x86/kernel/cpu/mshyperv.i | scripts/genksyms/genksyms -w
        [ snip ]
    ./arch/x86/include/asm/svm.h:122: syntax error

The syntax error occurs in the following code in arch/x86/include/asm/svm.h:

    struct __attribute__ ((__packed__)) vmcb_control_area {
            [ snip ]
    };

The issue arises from __attribute__ immediately after the 'struct' keyword.

This commit allows the 'struct' keyword to be followed by attributes.

The lexer must be adjusted because dont_want_brace_phase should not be decremented while processing attributes.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: fix syntax error for attribute after abstract_declarator | Masahiro Yamada

A longstanding issue with genksyms is that it has hidden syntax errors.

When a syntax error occurs, yyerror() is called. However, error_with_pos() is a no-op unless the -w option is provided.

You can observe syntax errors by manually passing the -w option.

For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:

    $ make -s KCFLAGS=-D__GENKSYMS__ kernel/module/main.i
    $ cat kernel/module/main.i | scripts/genksyms/genksyms -w
        [ snip ]
    kernel/module/main.c:97: syntax error

The syntax error occurs in the following code in kernel/module/main.c:

    static void __mod_update_bounds(enum mod_mem_type type __maybe_unused, void *base,
                                    unsigned int size, struct mod_tree_root *tree)
    {
            [ snip ]
    }

The issue arises from __maybe_unused, which is defined as __attribute__((__unused__)).

This commit allows direct_abstract_declarator to be followed by attributes.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: fix syntax error for attribute before nested_declarator | Masahiro Yamada

A longstanding issue with genksyms is that it has hidden syntax errors.

When a syntax error occurs, yyerror() is called. However, error_with_pos() is a no-op unless the -w option is provided.

You can observe syntax errors by manually passing the -w option.

For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:

    $ make -s KCFLAGS=-D__GENKSYMS__ drivers/acpi/prmt.i
    $ cat drivers/acpi/prmt.i | scripts/genksyms/genksyms -w
        [ snip ]
    drivers/acpi/prmt.c:56: syntax error

The syntax error occurs in the following code in drivers/acpi/prmt.c:

    struct prm_handler_info {
            [ snip ]
            efi_status_t (__efiapi *handler_addr)(u64, void *);
            [ snip ]
    };

The issue arises from __efiapi, which is defined as either __attribute__((ms_abi)) or __attribute__((regparm(0))).

This commit allows nested_declarator to be prefixed with attributes.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: fix syntax error for attribute before abstract_declarator | Masahiro Yamada

A longstanding issue with genksyms is that it has hidden syntax errors.

When a syntax error occurs, yyerror() is called. However, error_with_pos() is a no-op unless the -w option is provided.

You can observe syntax errors by manually passing the -w option.

For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:

    $ make -s KCFLAGS=-D__GENKSYMS__ init/main.i
    $ cat init/main.i | scripts/genksyms/genksyms -w
        [ snip ]
    ./include/linux/efi.h:1225: syntax error

The syntax error occurs in the following code in include/linux/efi.h:

    efi_status_t
    efi_call_acpi_prm_handler(efi_status_t (__efiapi *handler_addr)(u64, void *),
                              u64 param_buffer_addr, void *context);

The issue arises from __efiapi, which is defined as either __attribute__((ms_abi)) or __attribute__((regparm(0))).

This commit allows abstract_declarator to be prefixed with attributes.

To avoid conflicts, I tweaked the rule for decl_specifier_seq. Due to this change, a standalone attribute cannot become decl_specifier_seq. Otherwise, I do not know how to resolve the conflicts.

The following code, which was previously accepted by genksyms, will now result in a syntax error:

    void my_func(__attribute__((unused))x);

I do not think it is a big deal because GCC also fails to parse it.

    $ echo 'void my_func(__attribute__((unused))x);' | gcc -c -x c -
    <stdin>:1:37: error: unknown type name 'x'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
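The two __efiapi placements (inside a struct member and inside a parameter list) can be reproduced outside the kernel. In this sketch, DEMO_ABI is a hypothetical stand-in for __efiapi, and all other names are illustrative. It assumes GCC or Clang; on non-x86-64 targets the macro expands to nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __efiapi; the real macro is kernel-specific. */
#if defined(__x86_64__) && defined(__GNUC__)
#define DEMO_ABI __attribute__((ms_abi))
#else
#define DEMO_ABI
#endif

/* Attribute before a nested declarator, as in struct prm_handler_info. */
struct demo_handler_info {
	uint64_t (DEMO_ABI *handler_addr)(uint64_t, void *);
};

/* Attribute before the declarator of a function-pointer parameter. */
uint64_t demo_call(uint64_t (DEMO_ABI *handler_addr)(uint64_t, void *),
		   uint64_t param, void *context);

/* A handler using the same calling convention as the pointer type. */
static uint64_t DEMO_ABI add_context(uint64_t value, void *context)
{
	return value + *(uint64_t *)context;
}

uint64_t demo_call(uint64_t (DEMO_ABI *handler_addr)(uint64_t, void *),
		   uint64_t param, void *context)
{
	return handler_addr(param, context);
}
```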
2025-01-18 | genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier | Masahiro Yamada

The __attribute__ keyword can appear in more contexts than 'const' or 'volatile'.

To avoid grammatical conflicts with future changes, ATTRIBUTE_PHRASE should not be reduced into type_qualifier.

No functional changes are intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: record attributes consistently for init-declarator | Masahiro Yamada

I believe the missing action here is a bug.

For rules with no explicit action, the following default is used:

    { $$ = $1; }

However, in this case, $1 is the value of attribute_opt itself. As a result, the value of attribute_opt is always NULL.

The following test code demonstrates inconsistent behavior.

    int x __attribute__((__aligned__(4)));
    int y __attribute__((__aligned__(4))) = 0;

The attribute is recorded only when followed by an initializer.

This commit adds the correct action to propagate the value of the ATTRIBUTE_PHRASE token.

With this change, the attribute in the example above is consistently recorded for both 'x' and 'y'.

[Before]

    $ cat <<EOF | scripts/genksyms/genksyms -d
    int x __attribute__((__aligned__(4)));
    int y __attribute__((__aligned__(4))) = 0;
    EOF
    Defn for type0 x == <int x >
    Defn for type0 y == <int y __attribute__ ( ( __aligned__ ( 4 ) ) ) >
    Hash table occupancy 2/4096 = 0.000488281

[After]

    $ cat <<EOF | scripts/genksyms/genksyms -d
    int x __attribute__((__aligned__(4)));
    int y __attribute__((__aligned__(4))) = 0;
    EOF
    Defn for type0 x == <int x __attribute__ ( ( __aligned__ ( 4 ) ) ) >
    Defn for type0 y == <int y __attribute__ ( ( __aligned__ ( 4 ) ) ) >
    Hash table occupancy 2/4096 = 0.000488281

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: restrict direct-declarator to take one parameter-type-list | Masahiro Yamada

Similar to the previous commit, this change makes the parser logic a little more accurate.

Currently, genksyms accepts the following invalid code:

    struct foo {
            int (*callback)(int)(int)(int);
    };

A direct-declarator should not recursively absorb multiple ( parameter-type-list ) constructs.

In the example above, (*callback) should be followed by at most one (int).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
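For contrast with the rejected form, the sketch below shows declarators that a C compiler does accept. Each parenthesized declarator is followed by exactly one parameter list, and further parameter lists require another level of nesting. All names are illustrative.

```c
#include <assert.h>

static int twice(int v)
{
	return 2 * v;
}

/* A function returning a function pointer: the inner declarator
 * 'pick(int unused)' and the outer '(*...)(int)' each take one list. */
static int (*pick(int unused))(int)
{
	(void)unused;
	return twice;
}

struct foo {
	int (*callback)(int);		/* valid: one parameter-type-list */
	int (*(*factory)(int))(int);	/* valid: nested declarator */
	/* int (*bad)(int)(int)(int);	   invalid: function returning a function */
};
```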
2025-01-18 | genksyms: restrict direct-abstract-declarator to take one parameter-type-list | Masahiro Yamada

While there is no more grammatical ambiguity in genksyms, the parser logic is still inaccurate.

For example, genksyms accepts the following invalid C code:

    void my_func(int ()(int));

This should result in a syntax error because () cannot be reduced to <direct-abstract-declarator>.

( <abstract-declarator> ) can be reduced, but <abstract-declarator> must not be empty in the following grammar from K&R [1]:

    <direct-abstract-declarator> ::= ( <abstract-declarator> )
                                   | {<direct-abstract-declarator>}? [ {<constant-expression>}? ]
                                   | {<direct-abstract-declarator>}? ( {<parameter-type-list>}? )

Furthermore, genksyms accepts the following weird code:

    void my_func(int (*callback)(int)(int)(int));

The parser allows <direct-abstract-declarator> to recursively absorb multiple ( {<parameter-type-list>}? ), but this behavior is incorrect.

In the example above, (*callback) should be followed by at most one (int).

[1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: remove Makefile hack | Masahiro Yamada

This workaround was introduced for suppressing the reduce/reduce conflict warnings because the %expect-rr directive, which is applicable only to GLR parsers, cannot be used for genksyms.

Since there are no longer any conflicts, this Makefile hack is now unnecessary.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: fix last 3 shift/reduce conflicts | Masahiro Yamada

The genksyms parser has ambiguities in its grammar, which are currently suppressed by a workaround in scripts/genksyms/Makefile.

Building genksyms with W=1 generates the following warnings:

      YACC    scripts/genksyms/parse.tab.[ch]
    scripts/genksyms/parse.y: warning: 3 shift/reduce conflicts [-Wconflicts-sr]
    scripts/genksyms/parse.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

The ambiguity arises when decl_specifier_seq is followed by '(' because the following two interpretations are possible:

  - decl_specifier_seq direct_abstract_declarator '(' parameter_declaration_clause ')'
  - decl_specifier_seq '(' abstract_declarator ')'

This issue occurs because the current parser allows an empty string to be reduced to direct_abstract_declarator, which is incorrect.

K&R [1] explains the correct grammar:

    <parameter-declaration> ::= {<declaration-specifier>}+ <declarator>
                              | {<declaration-specifier>}+ <abstract-declarator>
                              | {<declaration-specifier>}+

    <abstract-declarator> ::= <pointer>
                            | <pointer> <direct-abstract-declarator>
                            | <direct-abstract-declarator>

    <direct-abstract-declarator> ::= ( <abstract-declarator> )
                                   | {<direct-abstract-declarator>}? [ {<constant-expression>}? ]
                                   | {<direct-abstract-declarator>}? ( {<parameter-type-list>}? )

This commit resolves all remaining conflicts.

We need to consider the difference between the following two examples:

[Example 1] ( <abstract-declarator> ) can become <direct-abstract-declarator>

    void my_func(int (foo));

... is equivalent to:

    void my_func(int foo);

[Example 2] ( <parameter-type-list> ) can become <direct-abstract-declarator>

    typedef int foo;
    void my_func(int (foo));

... is equivalent to:

    void my_func(int (*callback)(int));

Please note that the function declaration is identical in both examples, but the preceding typedef creates the distinction. I introduced a new term, open_paren, to enable the type lookup immediately after the '(' token. Without this, we cannot distinguish between [Example 1] and [Example 2].

[1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
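The difference between [Example 1] and [Example 2] can be checked with a C compiler. In this sketch (function names are illustrative), each definition compiles only if it is compatible with the earlier declaration, so a successful build confirms both equivalences.

```c
#include <assert.h>

/* No typedef named 'foo' is visible yet, so '(foo)' is a parenthesized
 * parameter name: this declares by_value(int). */
int by_value(int (foo));

typedef int foo;

/* Now 'foo' is a typedef name, so '(foo)' is a parameter list: the
 * parameter is a function taking int and returning int, which is
 * adjusted to a pointer to such a function. */
int by_callback(int (foo));

/* Compatible redeclaration of [Example 1]. */
int by_value(int v)
{
	return v + 1;
}

/* Compatible redeclaration of [Example 2]. */
int by_callback(int (*cb)(int))
{
	return cb(20);
}

static int twice(int v)
{
	return 2 * v;
}
```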
2025-01-18 | genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts | Masahiro Yamada

The genksyms parser has ambiguities in its grammar, which are currently suppressed by a workaround in scripts/genksyms/Makefile.

Building genksyms with W=1 generates the following warnings:

      YACC    scripts/genksyms/parse.tab.[ch]
    scripts/genksyms/parse.y: warning: 9 shift/reduce conflicts [-Wconflicts-sr]
    scripts/genksyms/parse.y: warning: 5 reduce/reduce conflicts [-Wconflicts-rr]
    scripts/genksyms/parse.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

The comment in the parser describes the current problem:

    /* This wasn't really a typedef name but an identifier that shadows one. */

Consider the following simple C code:

    typedef int foo;
    void my_func(foo foo) {}

In the function parameter list (foo foo), the first 'foo' is a type specifier (typedef'ed as 'int'), while the second 'foo' is an identifier.

However, the lexer cannot distinguish between the two. Since 'foo' is already typedef'ed, the lexer returns TYPE for both instances, instead of returning IDENT for the second one.

To support shadowed identifiers, TYPE can be reduced to either a simple_type_specifier or a direct_abstract_declarator, which creates a grammatical ambiguity.

Without analyzing the grammar context, it is very difficult to resolve this correctly.

This commit introduces a flag, dont_want_type_specifier, which allows the parser to inform the lexer whether an identifier is expected. When dont_want_type_specifier is true, the type lookup is suppressed, and the lexer returns IDENT regardless of any preceding typedef.

After this commit, only 3 shift/reduce conflicts will remain.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
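The effect of the flag can be sketched as a small classification function. This is not the genksyms lexer: the token names, the typedef table, and the function names are hypothetical. Only the flag name dont_want_type_specifier comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_token { DEMO_IDENT, DEMO_TYPE };

/* Hypothetical table of names introduced by earlier typedefs. */
static const char *const known_typedefs[] = { "foo" };

static bool is_typedef_name(const char *name)
{
	size_t count = sizeof(known_typedefs) / sizeof(known_typedefs[0]);

	for (size_t i = 0; i < count; i++) {
		if (strcmp(known_typedefs[i], name) == 0)
			return true;
	}
	return false;
}

/* When the parser signals that a type specifier is not wanted, skip
 * the typedef lookup and report a plain identifier. */
static enum demo_token classify(const char *name, bool dont_want_type_specifier)
{
	assert(name != NULL);
	if (!dont_want_type_specifier && is_typedef_name(name))
		return DEMO_TYPE;
	return DEMO_IDENT;
}
```

For `my_func(foo foo)`, the first 'foo' would be classified with the flag clear and the second with the flag set.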
2025-01-18 | genksyms: reduce type_qualifier directly to decl_specifier | Masahiro Yamada

A type_qualifier (const, volatile, etc.) is not a type_specifier.

According to K&R [1], a type-qualifier should be directly reduced to a declaration-specifier.

    <declaration-specifier> ::= <storage-class-specifier>
                              | <type-specifier>
                              | <type-qualifier>

[1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: rename cvar_qualifier to type_qualifier | Masahiro Yamada

I believe "cvar" stands for "Const, Volatile, Attribute, or Restrict".

This is called "type-qualifier" in K&R. [1]

Adopt this more generic naming.

No functional changes are intended.

[1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | genksyms: rename m_abstract_declarator to abstract_declarator | Masahiro Yamada

This is called "abstract-declarator" in K&R. [1]

I am not sure what "m_" stands for, but the name is clear enough without it.

No functional changes are intended.

[1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
2025-01-18 | kbuild: Fix signing issue for external modules | Torsten Hilbrich

When the sign script runs, the working directory is the source directory of the external module. This causes failures when the kernel uses relative paths, for example:

    make[5]: Entering directory '/build/client/devel/kernel/work/linux-2.6'
    make[6]: Entering directory '/build/client/devel/addmodules/vtx/work/vtx'
      INSTALL /build/client/devel/addmodules/vtx/_/lib/modules/6.13.0-devel+/extra/vtx.ko
      SIGN    /build/client/devel/addmodules/vtx/_/lib/modules/6.13.0-devel+/extra/vtx.ko
    /bin/sh: 1: scripts/sign-file: not found
      DEPMOD  /build/client/devel/addmodules/vtx/_/lib/modules/6.13.0-devel+

Work around it by using absolute paths here.

Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=")
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-18 | ARC: migrate to the generic rule for built-in DTB | Masahiro Yamada

Commit 654102df2ac2 ("kbuild: add generic support for built-in boot DTBs") introduced generic support for built-in DTBs.

Select GENERIC_BUILTIN_DTB to use the generic rule.

To keep consistency across architectures, this commit also renames CONFIG_ARC_BUILTIN_DTB_NAME to CONFIG_BUILTIN_DTB_NAME.

Now, "nsim_700" is the default value for CONFIG_BUILTIN_DTB_NAME, rather than a fallback in case it is empty.

Acked-by: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-18 | rust: Use gendwarfksyms + extended modversions for CONFIG_MODVERSIONS | Sami Tolvanen

Previously, two things stopped Rust from using MODVERSIONS:

1. Rust symbols are occasionally too long to be represented in the original versions table
2. Rust types cannot be properly hashed by the existing genksyms approach because:
   * Looking up type definitions in Rust is more complex than C
   * Type layout is potentially dependent on the compiler in Rust, not just the source type declaration.

CONFIG_EXTENDED_MODVERSIONS addresses the first point, and CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow it to do so by selecting both features.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Co-developed-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | Documentation/kbuild: Document storage of symbol information | Matthew Maurer

Document where exported and imported symbols are kept, format options, and limitations.

Signed-off-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | modpost: Allow extended modversions without basic MODVERSIONS | Matthew Maurer

If you know that your kernel modules will only ever be loaded by a newer kernel, you can disable BASIC_MODVERSIONS to save space. This also allows easy creation of test modules to see how tooling will respond to modules that only have the new format.

Signed-off-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | modpost: Produce extended MODVERSIONS information | Matthew Maurer

Generate both the existing modversions format and the new extended one when running modpost. Presence of this metadata in the final .ko is guarded by CONFIG_EXTENDED_MODVERSIONS.

We no longer generate an error on long symbols in modpost if CONFIG_EXTENDED_MODVERSIONS is set, as they can now be appropriately encoded in the extended section. These symbols will be skipped in the previous encoding. An error will still be generated if CONFIG_EXTENDED_MODVERSIONS is not set.

Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | modules: Support extended MODVERSIONS info | Matthew Maurer

Add a new format for MODVERSIONS which stores each field in a separate ELF section. This initially adds support for variable-length names, but could later be used to add additional fields to MODVERSIONS in a backwards-compatible way if needed. Any new fields will be ignored by old user tooling, unlike the current format, where user tooling cannot tolerate adjustments to the format (for example, making the name field longer).

Since PPC munges its version records to strip leading dots, we reproduce the munging for the new format. Other architectures do not appear to have architecture-specific usage of this information.

Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
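The idea of one section per field can be sketched in user space: CRCs live in one array and names are stored back to back as NUL-terminated strings, so a name can be any length. The struct and function below are hypothetical and do not mirror the kernel's actual data structures or section names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of two parallel "sections". */
struct demo_ext_versions {
	const uint32_t *crcs;	/* one CRC per symbol */
	const char *names;	/* NUL-terminated names, back to back */
	size_t count;		/* number of symbols */
};

/* Walk both sections in lockstep and look up the CRC for a symbol. */
static bool find_crc(const struct demo_ext_versions *v, const char *symbol,
		     uint32_t *crc_out)
{
	const char *name = v->names;

	assert(symbol != NULL && crc_out != NULL);
	for (size_t i = 0; i < v->count; i++) {
		if (strcmp(name, symbol) == 0) {
			*crc_out = v->crcs[i];
			return true;
		}
		name += strlen(name) + 1;	/* skip to the next name */
	}
	return false;
}
```

Because each field has its own array, a reader that only understands names and CRCs could skip any additional parallel section it does not recognize.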
2025-01-11 | Documentation/kbuild: Add DWARF module versioning | Sami Tolvanen

Add documentation for gendwarfksyms changes, and the kABI stability features that can be useful for distributions even though they're not used in mainline kernels.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | kbuild: Add gendwarfksyms as an alternative to genksyms | Sami Tolvanen

When MODVERSIONS is enabled, allow selecting gendwarfksyms as the implementation, but default to genksyms.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | export: Add __gendwarfksyms_ptr_ references to exported symbols | Sami Tolvanen

With gendwarfksyms, we need each TU where the EXPORT_SYMBOL() macro is used to also contain DWARF type information for the symbols it exports. However, as a TU can also export external symbols and compilers may choose not to emit debugging information for symbols not defined in the current TU, the missing types will result in missing symbol versions. Stand-alone assembly code also doesn't contain type information for exported symbols, so we need to compile a temporary object file with asm-prototypes.h instead, and similarly need to ensure the DWARF in the temporary object file contains the necessary types.

To always emit type information for external exports, add explicit __gendwarfksyms_ptr_<symbol> references to them in EXPORT_SYMBOL(). gendwarfksyms will use the type information for __gendwarfksyms_ptr_* if needed. Discard the pointers from the final binary to avoid further bloat.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | gendwarfksyms: Add support for symbol type pointers | Sami Tolvanen

The compiler may choose not to emit type information in DWARF for external symbols. Clang, for example, does this for symbols not defined in the current TU.

To provide a way to work around this issue, add support for __gendwarfksyms_ptr_<symbol> pointers that force the compiler to emit the necessary type information in DWARF also for the missing symbols.

Example usage:

    #define GENDWARFKSYMS_PTR(sym) \
        static typeof(sym) *__gendwarfksyms_ptr_##sym __used  \
            __section(".discard.gendwarfksyms") = &sym;

    extern int external_symbol(void);
    GENDWARFKSYMS_PTR(external_symbol);

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | gendwarfksyms: Add support for reserved and ignored fields | Sami Tolvanen

Distributions that want to maintain a stable kABI need the ability to make ABI compatible changes to kernel data structures without affecting symbol versions, either because of LTS updates or backports.

With genksyms, developers would typically hide these changes from version calculation with #ifndef __GENKSYMS__, which would result in the symbol version not changing even though the actual type has changed. When we process precompiled object files, this isn't an option.

Change union processing to recognize field name prefixes that allow the user to ignore the union completely during symbol versioning with a __kabi_ignored prefix in a field name, or to replace the type of a placeholder field using a __kabi_reserved field name prefix.

For example, assume we want to add a new field to an existing alignment hole in a data structure, and ignore the new field when calculating symbol versions:

    struct struct1 {
        int a;
        /* a 4-byte alignment hole */
        unsigned long b;
    };

To add `int n` to the alignment hole, we can add a union that includes a __kabi_ignored field that causes gendwarfksyms to ignore the entire union:

    struct struct1 {
        int a;
        union {
            char __kabi_ignored_0;
            int n;
        };
        unsigned long b;
    };

With --stable, both structs produce the same symbol version.

Alternatively, when a distribution expects future modification to a data structure, they can explicitly add reserved fields:

    struct struct2 {
        long a;
        long __kabi_reserved_0; /* reserved for future use */
    };

To take the field into use, we can again replace it with a union, with one of the fields keeping the __kabi_reserved name prefix to indicate the original type:

    struct struct2 {
        long a;
        union {
            long __kabi_reserved_0;
            struct {
                int b;
                int v;
            };
        };
    };

Here gendwarfksyms --stable replaces the union with the type of the placeholder field when calculating versions.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
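Both transformations are meant to leave the binary layout untouched, which can be checked with sizeof and offsetof. The sketch below renames the "before" and "after" variants so they can coexist in one file, and the helper names are hypothetical. It assumes an LP64 target (8-byte long, 4-byte int); on ILP32 there is no alignment hole and the checks would not hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct struct1_before {
	int a;
	/* a 4-byte alignment hole on LP64 */
	unsigned long b;
};

struct struct1_after {
	int a;
	union {
		char __kabi_ignored_0;
		int n;
	};
	unsigned long b;
};

struct struct2_before {
	long a;
	long __kabi_reserved_0;
};

struct struct2_after {
	long a;
	union {
		long __kabi_reserved_0;
		struct {
			int b;
			int v;
		};
	};
};

/* True when filling the hole moved nothing and added no bytes. */
static bool struct1_layout_unchanged(void)
{
	return sizeof(struct struct1_before) == sizeof(struct struct1_after) &&
	       offsetof(struct struct1_before, b) == offsetof(struct struct1_after, b);
}

/* True when the union occupies exactly the reserved slot. */
static bool struct2_layout_unchanged(void)
{
	return sizeof(struct struct2_before) == sizeof(struct struct2_after) &&
	       offsetof(struct struct2_before, __kabi_reserved_0) ==
	       offsetof(struct struct2_after, __kabi_reserved_0);
}
```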
2025-01-11 | gendwarfksyms: Add support for kABI rules | Sami Tolvanen

Distributions that want to maintain a stable kABI need the ability to make ABI compatible changes to the kernel without affecting symbol versions, either because of LTS updates or backports.

With genksyms, developers would typically hide these changes from version calculation with #ifndef __GENKSYMS__, which would result in the symbol version not changing even though the actual type has changed. When we process precompiled object files, this isn't an option.

To support this use case, add a --stable command line flag that gates kABI stability features that are not needed in mainline kernels, but can be useful for distributions, and add support for kABI rules, which can be used to restrict gendwarfksyms output.

The rules are specified as a set of null-terminated strings stored in the .discard.gendwarfksyms.kabi_rules section. Each rule consists of four strings as follows:

    "version\0type\0target\0value"

The version string ensures the structure can be changed in a backwards compatible way. The type string indicates the type of the rule, and target and value strings contain rule-specific data.

Initially support three simple rules:

1. Declaration-only types

   A type declaration can change into a full definition when additional includes are pulled in to the TU, which changes the versions of any symbol that references the type. Add support for defining declaration-only types whose definition is not expanded during versioning.

2. Ignored enumerators

   It's possible to add new enum fields without changing the ABI, but as the fields are included in symbol versioning, this would change the versions. Add support for ignoring specific fields.

3. Overridden enumerator values

   Add support for overriding enumerator values when calculating versions. This may be needed when the last field of the enum is used as a sentinel and new fields must be added before it.

Add examples for using the rules under the examples/ directory.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
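The four-string rule layout can be read with a short loop over NUL terminators. This is a sketch under stated assumptions, not the gendwarfksyms parser: the struct, the function names, and the sample strings used to exercise it are hypothetical. Only the "version\0type\0target\0value" layout comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one rule. */
struct demo_kabi_rule {
	const char *version;
	const char *type;
	const char *target;
	const char *value;
};

/* Return the string at *cursor and advance past its NUL terminator,
 * or return NULL if no terminated string remains before 'end'. */
static const char *next_string(const char **cursor, const char *end)
{
	const char *start = *cursor;
	const char *nul;

	if (start >= end)
		return NULL;
	nul = memchr(start, '\0', (size_t)(end - start));
	if (!nul)
		return NULL;
	*cursor = nul + 1;
	return start;
}

/* Read four consecutive strings into 'rule'; false if data runs out. */
static bool parse_rule(const char **cursor, const char *end,
		       struct demo_kabi_rule *rule)
{
	const char **fields[] = {
		&rule->version, &rule->type, &rule->target, &rule->value,
	};

	assert(cursor != NULL && rule != NULL);
	for (size_t i = 0; i < 4; i++) {
		*fields[i] = next_string(cursor, end);
		if (!*fields[i])
			return false;
	}
	return true;
}
```

Calling parse_rule() repeatedly until it returns false walks every rule in a buffer that holds the section contents.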
2025-01-11 | gendwarfksyms: Add symbol versioning | Sami Tolvanen

Calculate symbol versions from the fully expanded type strings in type_map, and output the versions in a genksyms-compatible format.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | gendwarfksyms: Add symtypes output | Sami Tolvanen

Add support for producing genksyms-style symtypes files. Process die_map to find the longest expansions for each type, and use symtypes references in type definitions. The basic file format is similar to genksyms, with two notable exceptions:

  1. Type names with spaces (common with Rust) in references are wrapped in single quotes. E.g.:

         s#'core::result::Result<u8, core::num::error::ParseIntError>'

  2. The actual type definition is the simple parsed DWARF format we output with --dump-dies, not the preprocessed C-style format genksyms produces.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
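The quoting rule for references can be sketched as a small formatter: a one-character kind prefix, '#', and the type name, wrapped in single quotes only when the name contains a space. The function name and signature are hypothetical and not taken from gendwarfksyms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write a symtypes-style reference such as s#name into buf.
 * Returns the value of snprintf(), i.e. the untruncated length. */
static int format_type_ref(char *buf, size_t cap, char kind, const char *name)
{
	const char *quote;

	assert(buf != NULL && name != NULL);
	quote = strchr(name, ' ') ? "'" : "";
	return snprintf(buf, cap, "%c#%s%s%s", kind, quote, name, quote);
}
```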
2025-01-11 | gendwarfksyms: Add die_map debugging | Sami Tolvanen

Debugging the DWARF processing can be somewhat challenging, so add more detailed debugging output for die_map operations. Add the --dump-die-map flag, which adds color coded tags to the output for die_map changes.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | gendwarfksyms: Limit structure expansion | Sami Tolvanen

Expand each structure type only once per exported symbol. This is necessary to support self-referential structures, which would otherwise result in infinite recursion, and it's sufficient for catching ABI changes.

Types defined in .c files are opaque to external users and thus cannot affect the ABI. Consider type definitions in .c files to be declarations to prevent opaque types from changing symbol versions.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
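The expand-once rule can be illustrated with a toy type graph: the first visit to a structure prints its members, and later visits print only a reference, which ends the recursion for self-referential types. The data model and output format below are hypothetical and much simpler than what gendwarfksyms does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory type graph; not the gendwarfksyms data model. */
struct demo_type {
	const char *name;
	struct demo_type *members[4];
	size_t member_count;
	bool expanded;	/* would be reset for each exported symbol */
};

/* Append s to the NUL-terminated string in buf, truncating at cap. */
static void append(char *buf, size_t cap, const char *s)
{
	size_t len = strlen(buf);

	if (len < cap)
		snprintf(buf + len, cap - len, "%s", s);
}

/* Emit the type name, and its members only on the first visit. */
static void expand(struct demo_type *type, char *buf, size_t cap)
{
	assert(type != NULL && buf != NULL);
	append(buf, cap, "struct ");
	append(buf, cap, type->name);
	if (type->expanded)
		return;	/* already expanded: keep just the reference */
	type->expanded = true;
	append(buf, cap, " { ");
	for (size_t i = 0; i < type->member_count; i++) {
		expand(type->members[i], buf, cap);
		append(buf, cap, "; ");
	}
	append(buf, cap, "}");
}
```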
2025-01-11 | gendwarfksyms: Expand structure types | Sami Tolvanen

Recursively expand DWARF structure types, i.e. structs, unions, and enums. Also include relevant DWARF attributes in type strings to encode structure layout, for example.

Example output with --dump-dies:

    subprogram (
      formal_parameter structure_type &str {
        member pointer_type {
          base_type u8 byte_size(1) encoding(7)
        } data_ptr data_member_location(0) ,
        member base_type usize byte_size(8) encoding(7) length data_member_location(8)
      } byte_size(16) alignment(8) msg
    )
    -> base_type void

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | gendwarfksyms: Expand array_type | Sami Tolvanen

Add support for expanding DW_TAG_array_type, and the subrange type indicating array size.

Example source code:

    const char *s[34];

Output with --dump-dies:

    variable array_type[34] {
      pointer_type {
        const_type {
          base_type char byte_size(1) encoding(6)
        }
      } byte_size(8)
    }

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11 | gendwarfksyms: Expand subroutine_type | Sami Tolvanen

Add support for expanding DW_TAG_subroutine_type and the parameters in DW_TAG_formal_parameter. Use this to also expand subprograms.

Example output with --dump-dies:

    subprogram (
      formal_parameter pointer_type {
        const_type {
          base_type char byte_size(1) encoding(6)
        }
      }
    )
    -> base_type unsigned long byte_size(8) encoding(7)

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11  gendwarfksyms: Expand type modifiers and typedefs  Sami Tolvanen
Add support for expanding DWARF type modifiers, such as pointers,
const values etc., and typedefs. These types all have DW_AT_type
attribute pointing to the underlying type, and thus produce similar
output.

Also add linebreaks and indentation to debugging output to make it
more readable.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11  gendwarfksyms: Add a cache for processed DIEs  Sami Tolvanen
Basic types in DWARF repeat frequently and traversing the DIEs using
libdw is relatively slow. Add a simple hashtable based cache for the
processed DIEs.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11  gendwarfksyms: Expand base_type  Sami Tolvanen
Start making gendwarfksyms more useful by adding support for
expanding DW_TAG_base_type types and basic DWARF attributes.

Example:

  $ echo loops_per_jiffy | \
      scripts/gendwarfksyms/gendwarfksyms \
        --debug --dump-dies vmlinux.o
  ...
  gendwarfksyms: process_symbol: loops_per_jiffy
  variable base_type unsigned long byte_size(8) encoding(7)
  ...

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11  gendwarfksyms: Add address matching  Sami Tolvanen
The compiler may choose not to emit type information in DWARF for all
aliases, but it's possible for each alias to be exported separately.
To ensure we find type information for the aliases as well, read
{section, address} tuples from the symbol table and match symbols also
by address.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-11  tools: Add gendwarfksyms  Sami Tolvanen
Add a basic DWARF parser, which uses libdw to traverse the debugging
information in an object file and looks for functions and variables.
In follow-up patches, this will be expanded to produce symbol versions
for CONFIG_MODVERSIONS from DWARF.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-10  genksyms: use uint32_t instead of unsigned long for calculating CRC  Masahiro Yamada
Currently, 'unsigned long' is used for intermediate variables when
calculating CRCs.

The size of 'long' differs depending on the architecture: it is 32 bits
on 32-bit architectures and 64 bits on 64-bit architectures.

The CRC values generated by genksyms represent the compatibility of
exported symbols. Therefore, reproducibility is important. In other
words, we need to ensure that the output is the same when the kernel
source is identical, regardless of whether genksyms is running on a
32-bit or 64-bit build machine.

Fortunately, the output from genksyms is not affected by the build
machine's architecture because only the lower 32 bits of the
'unsigned long' variables are used.

To make it even clearer that the CRC calculation is independent of
the build machine's architecture, this commit explicitly uses the
fixed-width type, uint32_t.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
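The claim that a wider accumulator does not change the result can be checked with a short sketch. The code below is a generic bitwise CRC-32 (reflected polynomial 0xEDB88320), not the table-driven code in genksyms; the function names are made up for this example, and `uint64_t` stands in for a 64-bit 'unsigned long' so the comparison behaves the same on any build host.

```c
#include <assert.h>
#include <stdint.h>

/* One input byte of a reflected CRC-32, using fixed-width 32-bit state. */
static uint32_t crc32_byte_u32(uint32_t crc, unsigned char c)
{
    crc ^= c;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (UINT32_C(0xEDB88320) & (UINT32_C(0) - (crc & 1u)));
    return crc;
}

/* Same step with 64-bit state, standing in for a 64-bit 'unsigned long'. */
static uint64_t crc32_byte_u64(uint64_t crc, unsigned char c)
{
    crc ^= c;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (UINT64_C(0xEDB88320) & (UINT64_C(0) - (crc & 1u)));
    return crc;
}

static uint32_t crc32_u32(const char *s)
{
    uint32_t crc = UINT32_C(0xffffffff);

    while (*s)
        crc = crc32_byte_u32(crc, (unsigned char)*s++);
    return crc ^ UINT32_C(0xffffffff);
}

static uint64_t crc32_u64(const char *s)
{
    uint64_t crc = UINT64_C(0xffffffff);

    while (*s)
        crc = crc32_byte_u64(crc, (unsigned char)*s++);
    return crc ^ UINT64_C(0xffffffff);
}

/* Return 1 if the 32-bit and 64-bit computations agree for s. */
static int widths_agree(const char *s)
{
    uint64_t wide = crc32_u64(s);

    /* The upper half never becomes non-zero: only right shifts and
     * XORs with 32-bit constants touch the state. */
    assert((wide >> 32) == 0);
    return (uint32_t)wide == crc32_u32(s);
}
```

Because the state starts as a 32-bit value and is only shifted right or XORed with 32-bit constants, the upper bits of the wider variable stay zero, which is why switching to uint32_t documents the existing behavior rather than changing it.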
2025-01-10  genksyms: use generic macros for hash table implementation  Masahiro Yamada
Use macros provided by hashtable.h.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-10  genksyms: refactor the return points in the for-loop in __add_symbol()  Masahiro Yamada
free_list() must be called before returning from this for-loop.

Swap 'break' and the combination of free_list() and 'return'.

This reduces the code and minimizes the risk of introducing memory
leaks in future changes.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-10  genksyms: reduce the indentation in the for-loop in __add_symbol()  Masahiro Yamada
To improve readability, reduce the indentation as follows:

 - Use 'continue' earlier when the symbol does not match
 - flip !sym->is_declared to flatten the if-else chain

No functional changes are intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-10  genksyms: fix memory leak when the same symbol is read from *.symref file  Masahiro Yamada
When a symbol that is already registered is read again from *.symref
file, __add_symbol() removes the previous one from the hash table without
freeing it.

[Test Case]

  $ cat foo.c
  #include <linux/export.h>
  void foo(void);
  void foo(void) {}
  EXPORT_SYMBOL(foo);

  $ cat foo.symref
  foo void foo ( void )
  foo void foo ( void )

When a symbol is removed from the hash table, it must be freed along
with its ->name and ->defn members. However, sym->name cannot be freed
because it is sometimes shared with node->string, but not always. If
sym->name and node->string share the same memory, free(sym->name) could
lead to a double-free bug.

To resolve this issue, always assign a strdup'ed string to sym->name.

Fixes: 64e6c1e12372 ("genksyms: track symbol checksum changes")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
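The ownership rule described above (the symbol always owns a private copy of its name) can be sketched as follows. The `struct sym` layout and the `sym_new()` and `sym_free()` helpers are hypothetical and much simpler than the real genksyms structures; the point is only that once the name is always duplicated, freeing a symbol can never touch memory that a parser node still owns.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced symbol record for illustration only. */
struct sym {
    char *name; /* always a private copy, never shared with a node */
    char *defn;
};

/* Create a symbol that owns copies of both strings. */
static struct sym *sym_new(const char *name, const char *defn)
{
    struct sym *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->name = strdup(name);
    s->defn = strdup(defn);
    if (!s->name || !s->defn) {
        free(s->name);
        free(s->defn);
        free(s);
        return NULL;
    }
    return s;
}

/* Safe to call unconditionally: the symbol owns everything it frees. */
static void sym_free(struct sym *s)
{
    if (!s)
        return;
    free(s->name);
    free(s->defn);
    free(s);
}
```

A caller can pass a string it still owns, for example the text held by a parser node, free the symbol later, and keep using its own string, since the two never alias.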
2025-01-10  genksyms: fix memory leak when the same symbol is added from source  Masahiro Yamada
When a symbol that is already registered is added again, __add_symbol()
returns without freeing the symbol definition, making it unreachable.

The following test cases demonstrate different memory leak points.

[Test Case 1]

Forward declaration with exactly the same definition

  $ cat foo.c
  #include <linux/export.h>
  void foo(void);
  void foo(void) {}
  EXPORT_SYMBOL(foo);

[Test Case 2]

Forward declaration with a different definition (e.g. attribute)

  $ cat foo.c
  #include <linux/export.h>
  void foo(void);
  __attribute__((__section__(".ref.text"))) void foo(void) {}
  EXPORT_SYMBOL(foo);

[Test Case 3]

Preserving an overridden symbol (compile with KBUILD_PRESERVE=1)

  $ cat foo.c
  #include <linux/export.h>
  void foo(void);
  void foo(void) { }
  EXPORT_SYMBOL(foo);

  $ cat foo.symref
  override foo void foo ( int )

The memory leaks in Test Case 1 and 2 have existed since the introduction
of genksyms into the kernel tree. [1]

The memory leak in Test Case 3 was introduced by commit 5dae9a550a74
("genksyms: allow to ignore symbol checksum changes").

When multiple init_declarators are reduced to an init_declarator_list,
the decl_spec must be duplicated. Otherwise, the following Test Case 4
would result in a double-free bug.

[Test Case 4]

  $ cat foo.c
  #include <linux/export.h>

  extern int foo, bar;

  int foo, bar;
  EXPORT_SYMBOL(foo);

In this case, 'foo' and 'bar' share the same decl_spec, 'int'. It must
be unshared before being passed to add_symbol().

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=46bd1da672d66ccd8a639d3c1f8a166048cca608
Fixes: 5dae9a550a74 ("genksyms: allow to ignore symbol checksum changes")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-10  modpost: zero-pad CRC values in modversion_info array  Masahiro Yamada
I do not think the '#' flag is useful here because adding the explicit
'0x' is clearer. Add the '0' flag to zero-pad the CRC values.

This change gives better alignment in the generated *.mod.c files.
There is no impact to the compiled modules.

[Before]

  $ grep -A5 modversion_info fs/efivarfs/efivarfs.mod.c
  static const struct modversion_info ____versions[]
  __used __section("__versions") = {
          { 0x907d14d, "blocking_notifier_chain_register" },
          { 0x53d3b64, "simple_inode_init_ts" },
          { 0x65487097, "__x86_indirect_thunk_rax" },
          { 0x122c3a7e, "_printk" },

[After]

  $ grep -A5 modversion_info fs/efivarfs/efivarfs.mod.c
  static const struct modversion_info ____versions[]
  __used __section("__versions") = {
          { 0x0907d14d, "blocking_notifier_chain_register" },
          { 0x053d3b64, "simple_inode_init_ts" },
          { 0x65487097, "__x86_indirect_thunk_rax" },
          { 0x122c3a7e, "_printk" },

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
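The difference between the two printf styles can be reproduced in isolation. The format strings below are illustrative and are not copied from modpost; `fmt_alt()`, `fmt_padded()`, and `check_formats()` are names invented for this sketch. It formats one CRC from the [Before] listing both ways.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* '#' flag: printf adds the "0x" prefix but does not zero-pad the digits. */
static void fmt_alt(char *buf, size_t len, unsigned int crc)
{
    snprintf(buf, len, "%#x", crc);
}

/* Explicit "0x" plus the '0' flag with width 8: always eight hex digits. */
static void fmt_padded(char *buf, size_t len, unsigned int crc)
{
    snprintf(buf, len, "0x%08x", crc);
}

/* Format a CRC whose top hex digit is zero with both styles. */
static void check_formats(void)
{
    char alt[16], padded[16];

    fmt_alt(alt, sizeof(alt), 0x907d14d);
    fmt_padded(padded, sizeof(padded), 0x907d14d);
    assert(strcmp(alt, "0x907d14d") == 0);
    assert(strcmp(padded, "0x0907d14d") == 0);
}
```

With the padded form every entry has the same width, so the columns in a generated table line up regardless of leading zero digits in the value.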
2025-01-10  module: get symbol CRC back to unsigned  Masahiro Yamada
Commit 71810db27c1c ("modversions: treat symbol CRCs as 32 bit
quantities") changed the CRC fields to s32 because the __kcrctab and
__kcrctab_gpl sections contained relative references to the actual
CRC values stored in the .rodata section when CONFIG_MODULE_REL_CRCS=y.

Commit 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing
CONFIG_MODULE_REL_CRCS") removed this complexity. Now, the __kcrctab
and __kcrctab_gpl sections directly contain the CRC values in all cases.

The genksyms tool outputs unsigned 32-bit CRC values, so u32 is preferred
over s32.

No functional changes are intended.

Regardless of this change, the CRC value is assigned to the u32 variable
'crcval' before the comparison, as seen in kernel/module/version.c:

    crcval = *crc;

It was previously mandatory (but now optional) in order to avoid sign
extension because the following line previously compared 'unsigned long'
and 's32':

    if (versions[i].crc == crcval)
            return 1;

versions[i].crc is still 'unsigned long' for backward compatibility.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
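The sign-extension hazard mentioned above can be shown with a minimal sketch. It is not kernel code: `uint64_t` stands in for a 64-bit 'unsigned long', `int32_t` for 's32', and the helper names are invented. A CRC with the top bit set compares unequal when the signed value is widened directly, and equal once it has first been stored in an unsigned 32-bit variable.

```c
#include <assert.h>
#include <stdint.h>

/* Compare directly: the signed CRC is sign-extended to 64 bits first. */
static int match_direct(uint64_t stored, int32_t crc)
{
    return stored == (uint64_t)crc;
}

/* Compare via an unsigned 32-bit temporary, like 'crcval' in the message. */
static int match_via_u32(uint64_t stored, int32_t crc)
{
    uint32_t crcval = (uint32_t)crc; /* keeps the 32 bits, drops the sign */

    return stored == crcval;
}

/* A CRC with bit 31 set is negative when held in a signed 32-bit field. */
static void check_sign_extension(void)
{
    uint64_t stored = UINT64_C(0xdeadbeef);
    int32_t crc = (int32_t)UINT32_C(0xdeadbeef);

    assert(!match_direct(stored, crc)); /* 0xffffffffdeadbeef != 0xdeadbeef */
    assert(match_via_u32(stored, crc));
}
```

With the fields declared as unsigned 32-bit values, the direct comparison is already zero-extended, which matches the commit's note that the temporary is now optional.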