author Alvin Šipraga <alsi@bang-olufsen.dk> 2023-12-19 02:25:14 +0100
committer Linus Torvalds <torvalds@linux-foundation.org> 2023-12-31 10:57:42 -0800
commit 9c334eb9ce886247567573074b13c5ac29d1a41a
tree 2528354c28d955a023dcc2a8f25fc2cde798b2e2 /scripts/show_delta
parent 453f5db0619e2ad64076aab16ff5a00e0f7c53a2
get_maintainer: correctly parse UTF-8 encoded names in files
While the script correctly extracts UTF-8 encoded names from the MAINTAINERS file, the regular expressions damage my name when parsing from .yaml files. Fix this by replacing the Latin-1-compatible regular expressions with the Unicode property matcher \p{L}, which matches any letter according to the Unicode General Category of letters.

The proposed solution only works if the script uses proper string encoding from the outset, so instruct Perl to unconditionally open all files with UTF-8 encoding. This should be safe, as the entire source tree is either UTF-8 or ASCII encoded anyway. See [1] for a detailed analysis.

Furthermore, to prevent the \w expression from matching non-ASCII characters when checking whether a name should be escaped with quotes, add the /a flag to the regular expression. The escaping logic was duplicated in two places, so it has been factored out into its own function.

The original issue was also identified on the tools mailing list [2]. This should solve the observed side effects there as well.

Link: https://lore.kernel.org/all/dzn6uco4c45oaa3ia4u37uo5mlt33obecv7gghj2l756fr4hdh@mt3cprft3tmq/ [1]
Link: https://lore.kernel.org/tools/20230726-gush-slouching-a5cd41@meerkat/ [2]
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
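A minimal Python sketch of the three regular-expression behaviours described in the commit message may help; it is not the Perl code from this commit. The character classes, the sample name and the `escape_name` helper below are illustrative assumptions that only mirror the described logic. Python's `re` module has no `\p{L}`, so `[^\W\d_]` is used as an approximation of it, and the `re.ASCII` flag plays the role of Perl's `/a` modifier.

```python
import re

# Sample name from the commit; 'Š' is U+0160 (UTF-8 bytes 0xC5 0xA0).
NAME = "Alvin Šipraga"

# Illustrative Latin-1-style class (an assumption, not copied from
# get_maintainer.pl): ASCII letters plus U+00C0..U+00FF only.
LATIN1_RUN = re.compile(r"[A-Za-z\u00C0-\u00FF]+")

# Approximation of Perl's \p{L}: word characters that are neither
# digits nor underscores. Unlike \p{L} it also admits some numeric
# symbols (e.g. Roman numerals), which is harmless for this demo.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def letter_runs(pattern, text):
    """Return the runs of characters in text that pattern accepts."""
    return pattern.findall(text)


def undecoded_runs(raw):
    """Apply the Latin-1-style class to raw UTF-8 bytes, i.e. to a
    file that was read without being decoded first."""
    return re.findall(rb"[A-Za-z\xC0-\xFF]+", raw)


def escape_name(name):
    """Hypothetical port of the quoting rule: wrap the name in double
    quotes if it contains anything other than ASCII word characters,
    spaces or hyphens. re.ASCII mirrors Perl's /a modifier, so a
    non-ASCII letter such as 'Š' forces quoting."""
    if re.search(r"[^\w \-]", name, re.ASCII):
        # Escape embedded double quotes not already preceded by '\'.
        name = re.sub(r'(?<!\\)"', r'\\"', name)
        name = f'"{name}"'
    return name
```

For example, `letter_runs(LATIN1_RUN, NAME)` returns `['Alvin', 'ipraga']`, dropping the `Š` because U+0160 lies outside U+00C0..U+00FF, while `letter_runs(LETTER_RUN, NAME)` returns `['Alvin', 'Šipraga']`. Applied to undecoded UTF-8 bytes, the Latin-1 class yields `[b'Alvin', b'\xc5', b'ipraga']`: `Š` is encoded as `0xC5 0xA0` and only the first byte falls in the range, which is why decoding files up front matters. Finally, `escape_name(NAME)` returns the name wrapped in double quotes; without `re.ASCII`, the Unicode-aware `\w` would accept `Š` and the name would be left unquoted.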
Diffstat (limited to 'scripts/show_delta')
0 files changed, 0 insertions, 0 deletions