summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-09-22xfrm: add extack support to xfrm_init_replaySabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to __xfrm_init_stateSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to attach_*Sabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack support to xfrm_dev_state_addSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to verify_one_alg, verify_auth_trunc, verify_aeadSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to verify_replaySabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack support to verify_newsa_infoSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22Merge tag 'drm-intel-fixes-2022-09-21' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes 2 gem context related fixes: - to avoid a general protection failure when using perf/OA (Chris) - to avoid kernel warnings on driver release (Janusz) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Yyt1CV+YIjKQZZMB@intel.com
2022-09-21fscrypt: work on block_devices instead of request_queuesChristoph Hellwig
request_queues are a block layer implementation detail that should not leak into file systems. Change the fscrypt inline crypto code to retrieve block devices instead of request_queues from the file system. As part of that, clean up the interaction with multi-device file systems by returning both the number of devices and the actual device array in a single method call. Signed-off-by: Christoph Hellwig <hch@lst.de> [ebiggers: bug fixes and minor tweaks] Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-4-ebiggers@kernel.org
2022-09-21fscrypt: stop holding extra request_queue referencesEric Biggers
Now that the fscrypt_master_key lifetime has been reworked to not be subject to the quirks of the keyrings subsystem, blk_crypto_evict_key() no longer gets called after the filesystem has already been unmounted. Therefore, there is no longer any need to hold extra references to the filesystem's request_queue(s). (And these references didn't always do their intended job anyway, as pinning a request_queue doesn't necessarily pin the corresponding blk_crypto_profile.) Stop taking these extra references. Instead, just pass the super_block to fscrypt_destroy_inline_crypt_key(), and use it to get the list of block devices the key needs to be evicted from. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-3-ebiggers@kernel.org
2022-09-21fscrypt: stop using keyrings subsystem for fscrypt_master_keyEric Biggers
The approach of fs/crypto/ internally managing the fscrypt_master_key structs as the payloads of "struct key" objects contained in a "struct key" keyring has outlived its usefulness. The original idea was to simplify the code by reusing code from the keyrings subsystem. However, several issues have arisen that can't easily be resolved: - When a master key struct is destroyed, blk_crypto_evict_key() must be called on any per-mode keys embedded in it. (This started being the case when inline encryption support was added.) Yet, the keyrings subsystem can arbitrarily delay the destruction of keys, even past the time the filesystem was unmounted. Therefore, currently there is no easy way to call blk_crypto_evict_key() when a master key is destroyed. Currently, this is worked around by holding an extra reference to the filesystem's request_queue(s). But it was overlooked that the request_queue reference is *not* guaranteed to pin the corresponding blk_crypto_profile too; for device-mapper devices that support inline crypto, it doesn't. This can cause a use-after-free. - When the last inode that was using an incompletely-removed master key is evicted, the master key removal is completed by removing the key struct from the keyring. Currently this is done via key_invalidate(). Yet, key_invalidate() takes the key semaphore. This can deadlock when called from the shrinker, since in fscrypt_ioctl_add_key(), memory is allocated with GFP_KERNEL under the same semaphore. - More generally, the fact that the keyrings subsystem can arbitrarily delay the destruction of keys (via garbage collection delay, or via random processes getting temporary key references) is undesirable, as it means we can't strictly guarantee that all secrets are ever wiped. - Doing the master key lookups via the keyrings subsystem results in the key_permission LSM hook being called. fscrypt doesn't want this, as all access control for encrypted files is designed to happen via the files themselves, like any other files. The workaround which SELinux users are using is to change their SELinux policy to grant key search access to all domains. This works, but it is an odd extra step that shouldn't really have to be done. The fix for all these issues is to change the implementation to what I should have done originally: don't use the keyrings subsystem to keep track of the filesystem's fscrypt_master_key structs. Instead, just store them in a regular kernel data structure, and rework the reference counting, locking, and lifetime accordingly. Retain support for RCU-mode key lookups by using a hash table. Replace fscrypt_sb_free() with fscrypt_sb_delete(), which releases the keys synchronously and runs a bit earlier during unmount, so that block devices are still available. A side effect of this patch is that neither the master keys themselves nor the filesystem keyrings will be listed in /proc/keys anymore. ("Master key users" and the master key users keyrings will still be listed.) However, this was mostly an implementation detail, and it was intended just for debugging purposes. I don't know of anyone using it. This patch does *not* change how "master key users" (->mk_users) works; that still uses the keyrings subsystem. That is still needed for key quotas, and changing that isn't necessary to solve the issues listed above. If we decide to change that too, it would be a separate patch. I've marked this as fixing the original commit that added the fscrypt keyring, but as noted above the most important issue that this patch fixes wasn't introduced until the addition of inline encryption support. Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl") Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-2-ebiggers@kernel.org
2022-09-21ARM: decompressor: Include .data.rel.ro.localKees Cook
The .data.rel.ro.local section has the same semantics as .data.rel.ro here, so include it in the .rodata section of the decompressor. Additionally since the .printk_index section isn't usable outside of the core kernel, discard it in the decompressor. Avoids these warnings: arm-linux-gnueabi-ld: warning: orphan section `.data.rel.ro.local' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.data.rel.ro.local' arm-linux-gnueabi-ld: warning: orphan section `.printk_index' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.printk_index' Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/linux-mm/202209080545.qMIVj7YM-lkp@intel.com Cc: Russell King <linux@armlinux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Kees Cook <keescook@chromium.org>
2022-09-21Merge branch 'veristat: CSV output, comparison mode, filtering'Alexei Starovoitov
Andrii Nakryiko says: ==================== Add three more critical features to veristat tool, which make it sufficient for a practical work on BPF verifier: - CSV output, which allows easier programmatic post-processing of stats; - building upon CSV output, veristat now supports comparison mode, in which two previously captured CSV outputs from veristat are compared with each other in a convenient form; - flexible allow/deny filtering using globs for BPF object files and programs, allowing to narrow down target BPF programs to be verified. See individual patches for more details and examples. v1->v2: - split out double-free fix into patch #1 (Yonghong); - fixed typo in verbose flag (Quentin); - baseline and comparison stats were reversed in output table, fixed that. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21selftests/bpf: add ability to filter programs in veristatAndrii Nakryiko
Add -f (--filter) argument which accepts glob-based filters for narrowing down what BPF object files and programs within them should be processed by veristat. This filtering applies both to comparison and main (verification) mode. Filter can be of two forms: - file (object) filter: 'strobemeta*'; in this case all the programs within matching files are implicitly allowed (or denied, depending if it's positive or negative rule, see below); - file and prog filter: 'strobemeta*/*unroll*' will further filter programs within matching files to only allow those program names that match '*unroll*' glob. As mentioned, filters can be positive (allowlisting) and negative (denylisting). Negative filters should start with '!': '!strobemeta*' will deny any filename which basename starts with "strobemeta". Further, one extra special syntax is supported to allow more convenient use in practice. Instead of specifying rule on the command line, veristat allows to specify file that contains rules, both positive and negative, one line per one filter. This is achieved with -f @<filepath> use, where <filepath> points to a text file containing rules (negative and positive rules can be mixed). For convenience empty lines and lines starting with '#' are ignored. This feature is useful to have some pre-canned list of object files and program names that are tested repeatedly, allowing to check in a list of rules and quickly specify them on the command line. As a demonstration (and a short cut for nearest future), create a small list of "interesting" BPF object files from selftests/bpf and commit it as veristat.cfg. It currently includes 73 programs, most of which are the most complex and largest BPF programs in selftests, as judged by total verified instruction count and verifier states total. If there is overlap between positive or negative filters, negative filter takes precedence (denylisting is stronger than allowlisting). If no allow filter is specified, veristat implicitly assumes '*/*' rule. If no deny rule is specified, veristat (logically) assumes no negative filters. Also note that -f (just like -e and -s) can be specified multiple times and their effect is cumulative. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220921164254.3630690-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21selftests/bpf: add comparison mode to veristatAndrii Nakryiko
Add ability to compare and contrast two veristat runs, previously recorded with veristat using CSV output format. When veristat is called with -C (--compare) flag, veristat expects exactly two input files specified, both should be in CSV format. Expectation is that it's output from previous veristat runs, but as long as column names and formats match, it should just work. First CSV file is designated as a "baseline" provided, and the second one is comparison (experiment) data set. Establishing baseline matters later when calculating difference percentages, see below. Veristat parses these two CSV files and "reconstructs" verifier stats (it could be just a subset of all possible stats). File and program names are mandatory as they are used as joining key (these two "stats" are designated as "key stats" in the code). Veristat currently enforces that the set of stats recorded in both CSV has to exactly match, down to exact order. This is just a simplifying condition which can be lifted with a bit of additional pre-processing to reorded stat specs internally, which I didn't bother doing, yet. For all the non-key stats, veristat will output three columns: one for baseline data, one for comparison data, and one with an absolute and relative percentage difference. If either baseline or comparison values are missing (that is, respective CSV file doesn't have a row with *exactly* matching file and program name), those values are assumed to be empty or zero. In such case relative percentages are forced to +100% or -100% output, for consistency with a typical case. Veristat's -e (--emit) and -s (--sort) specs still apply, so even if CSV contains lots of stats, user can request to compare only a subset of them (and specify desired column order as well). Similarly, both CSV and human-readable table output is honored. Note that input is currently always expected to be CSV. Here's an example shell session, recording data for biosnoop tool on two different kernels and comparing them afterwards, outputting data in table format. # on slightly older production kernel $ sudo ./veristat biosnoop_bpf.o File Program Verdict Duration (us) Total insns Total states Peak states -------------- ------------------------ ------- ------------- ----------- ------------ ----------- biosnoop_bpf.o blk_account_io_merge_bio success 37 24 1 1 biosnoop_bpf.o blk_account_io_start failure 0 0 0 0 biosnoop_bpf.o block_rq_complete success 76 104 6 6 biosnoop_bpf.o block_rq_insert success 83 85 7 7 biosnoop_bpf.o block_rq_issue success 79 85 7 7 -------------- ------------------------ ------- ------------- ----------- ------------ ----------- Done. Processed 1 object files, 5 programs. $ sudo ./veristat ~/local/tmp/fbcode-bpf-objs/biosnoop_bpf.o -o csv > baseline.csv $ cat baseline.csv file_name,prog_name,verdict,duration,total_insns,total_states,peak_states biosnoop_bpf.o,blk_account_io_merge_bio,success,36,24,1,1 biosnoop_bpf.o,blk_account_io_start,failure,0,0,0,0 biosnoop_bpf.o,block_rq_complete,success,82,104,6,6 biosnoop_bpf.o,block_rq_insert,success,78,85,7,7 biosnoop_bpf.o,block_rq_issue,success,74,85,7,7 # on latest bpf-next kernel $ sudo ./veristat biosnoop_bpf.o File Program Verdict Duration (us) Total insns Total states Peak states -------------- ------------------------ ------- ------------- ----------- ------------ ----------- biosnoop_bpf.o blk_account_io_merge_bio success 31 24 1 1 biosnoop_bpf.o blk_account_io_start failure 0 0 0 0 biosnoop_bpf.o block_rq_complete success 76 104 6 6 biosnoop_bpf.o block_rq_insert success 83 91 7 7 biosnoop_bpf.o block_rq_issue success 74 91 7 7 -------------- ------------------------ ------- ------------- ----------- ------------ ----------- Done. Processed 1 object files, 5 programs. $ sudo ./veristat biosnoop_bpf.o -o csv > comparison.csv $ cat comparison.csv file_name,prog_name,verdict,duration,total_insns,total_states,peak_states biosnoop_bpf.o,blk_account_io_merge_bio,success,71,24,1,1 biosnoop_bpf.o,blk_account_io_start,failure,0,0,0,0 biosnoop_bpf.o,block_rq_complete,success,82,104,6,6 biosnoop_bpf.o,block_rq_insert,success,83,91,7,7 biosnoop_bpf.o,block_rq_issue,success,87,91,7,7 # now let's compare with human-readable output (note that no sudo needed) # we also ignore verification duration in this case to shortned output $ ./veristat -C baseline.csv comparison.csv -e file,prog,verdict,insns File Program Verdict (A) Verdict (B) Verdict (DIFF) Total insns (A) Total insns (B) Total insns (DIFF) -------------- ------------------------ ----------- ----------- -------------- --------------- --------------- ------------------ biosnoop_bpf.o blk_account_io_merge_bio success success MATCH 24 24 +0 (+0.00%) biosnoop_bpf.o blk_account_io_start failure failure MATCH 0 0 +0 (+100.00%) biosnoop_bpf.o block_rq_complete success success MATCH 104 104 +0 (+0.00%) biosnoop_bpf.o block_rq_insert success success MATCH 91 85 -6 (-6.59%) biosnoop_bpf.o block_rq_issue success success MATCH 91 85 -6 (-6.59%) -------------- ------------------------ ----------- ----------- -------------- --------------- --------------- ------------------ While not particularly exciting example (it turned out to be kind of hard to quickly find a nice example with significant difference just because of kernel version bump), it should demonstrate main features. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220921164254.3630690-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21selftests/bpf: add CSV output mode for veristatAndrii Nakryiko
Teach veristat to output results as CSV table for easier programmatic processing. Change what was --output/-o argument to now be --emit/-e. And then use --output-format/-o <fmt> to specify output format. Currently "table" and "csv" is supported, table being default. For CSV output mode veristat is using spec identifiers as column names. E.g., instead of "Total states" veristat uses "total_states" as a CSV header name. Internally veristat recognizes three formats, one of them (RESFMT_TABLE_CALCLEN) is a special format instructing veristat to calculate column widths for table output. This felt a bit cleaner and more uniform than either creating separate functions just for this. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220921164254.3630690-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21selftests/bpf: fix double bpf_object__close() in veristateAndrii Nakryiko
bpf_object__close(obj) is called twice for BPF object files with single BPF program in it. This causes crash. Fix this by not calling bpf_object__close() unnecessarily. Fixes: c8bc5e050976 ("selftests/bpf: Add veristat tool for mass-verifying BPF object files") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220921164254.3630690-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21Merge branch 'Introduce bpf_ct_set_nat_info kfunc helper'Alexei Starovoitov
Lorenzo Bianconi says: ==================== Introduce bpf_ct_set_nat_info kfunc helper in order to set source and destination nat addresses/ports in a new allocated ct entry not inserted in the connection tracking table yet. Introduce support for per-parameter trusted args. Changes since v2: - use int instead of a pointer for port in bpf_ct_set_nat_info signature - modify KF_TRUSTED_ARGS definition in order to referenced pointer constraint just for PTR_TO_BTF_ID - drop patch 2/4 Changes since v1: - enable CONFIG_NF_NAT in tools/testing/selftests/bpf/config Kumar Kartikeya Dwivedi (1): bpf: Tweak definition of KF_TRUSTED_ARGS ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21selftests/bpf: add tests for bpf_ct_set_nat_info kfuncLorenzo Bianconi
Introduce self-tests for bpf_ct_set_nat_info kfunc used to set the source or destination nat addresses/ports. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/803e33294e247744d466943105879414344d3235.1663778601.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21net: netfilter: add bpf_ct_set_nat_info kfunc helperLorenzo Bianconi
Introduce bpf_ct_set_nat_info kfunc helper in order to set source and destination nat addresses/ports in a new allocated ct entry not inserted in the connection tracking table yet. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/9567db2fdfa5bebe7b7cc5870f7a34549418b4fc.1663778601.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21bpf: Tweak definition of KF_TRUSTED_ARGSKumar Kartikeya Dwivedi
Instead of forcing all arguments to be referenced pointers with non-zero reg->ref_obj_id, tweak the definition of KF_TRUSTED_ARGS to mean that only PTR_TO_BTF_ID (and socket types translated to PTR_TO_BTF_ID) have that constraint, and require their offset to be set to 0. The rest of pointer types are also accomodated in this definition of trusted pointers, but with more relaxed rules regarding offsets. The inherent meaning of setting this flag is that all kfunc pointer arguments have a guranteed lifetime, and kernel object pointers (PTR_TO_BTF_ID, PTR_TO_CTX) are passed in their unmodified form (with offset 0). In general, this is not true for PTR_TO_BTF_ID as it can be obtained using pointer walks. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/cdede0043c47ed7a357f0a915d16f9ce06a1d589.1663778601.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21ext4: use buckets for cr 1 block scan instead of rbtreeJan Kara
Using rbtree for sorting groups by average fragment size is relatively expensive (needs rbtree update on every block freeing or allocation) and leads to wide spreading of allocations because selection of block group is very sentitive both to changes in free space and amount of blocks allocated. Furthermore selecting group with the best matching average fragment size is not necessary anyway, even more so because the variability of fragment sizes within a group is likely large so average is not telling much. We just need a group with large enough average fragment size so that we have high probability of finding large enough free extent and we don't want average fragment size to be too big so that we are likely to find free extent only somewhat larger than what we need. So instead of maintaing rbtree of groups sorted by fragment size keep bins (lists) or groups where average fragment size is in the interval [2^i, 2^(i+1)). This structure requires less updates on block allocation / freeing, generally avoids chaotic spreading of allocations into block groups, and still is able to quickly (even faster that the rbtree) provide a block group which is likely to have a suitably sized free space extent. This patch reduces number of block groups used when untarring archive with medium sized files (size somewhat above 64k which is default mballoc limit for avoiding locality group preallocation) to about half and thus improves write speeds for eMMC flash significantly. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@kernel.org Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/ Link: https://lore.kernel.org/r/20220908092136.11770-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-21ext4: use locality group preallocation for small closed filesJan Kara
Curently we don't use any preallocation when a file is already closed when allocating blocks (from writeback code when converting delayed allocation). However for small files, using locality group preallocation is actually desirable as that is not specific to a particular file. Rather it is a method to pack small files together to reduce fragmentation and for that the fact the file is closed is actually even stronger hint the file would benefit from packing. So change the logic to allow locality group preallocation in this case. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@kernel.org Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/ Link: https://lore.kernel.org/r/20220908092136.11770-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-21ext4: make directory inode spreading reflect flexbg sizeJan Kara
Currently the Orlov inode allocator searches for free inodes for a directory only in flex block groups with at most inodes_per_group/16 more directory inodes than average per flex block group. However with growing size of flex block group this becomes unnecessarily strict. Scale allowed difference from average directory count per flex block group with flex block group size as we do with other metrics. Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Cc: stable@kernel.org Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/ Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220908092136.11770-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-21ext4: avoid unnecessary spreading of allocations among groupsJan Kara
mb_set_largest_free_order() updates lists containing groups with largest chunk of free space of given order. The way it updates it leads to always moving the group to the tail of the list. Thus allocations looking for free space of given order effectively end up cycling through all groups (and due to initialization in last to first order). This spreads allocations among block groups which reduces performance for rotating disks or low-end flash media. Change mb_set_largest_free_order() to only update lists if the order of the largest free chunk in the group changed. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@kernel.org Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/ Link: https://lore.kernel.org/r/20220908092136.11770-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-21ext4: make mballoc try target group first even with mb_optimize_scanJan Kara
One of the side-effects of mb_optimize_scan was that the optimized functions to select next group to try were called even before we tried the goal group. As a result we no longer allocate files close to corresponding inodes as well as we don't try to expand currently allocated extent in the same group. This results in reaim regression with workfile.disk workload of upto 8% with many clients on my test machine: baseline mb_optimize_scan Hmean disk-1 2114.16 ( 0.00%) 2099.37 ( -0.70%) Hmean disk-41 87794.43 ( 0.00%) 83787.47 * -4.56%* Hmean disk-81 148170.73 ( 0.00%) 135527.05 * -8.53%* Hmean disk-121 177506.11 ( 0.00%) 166284.93 * -6.32%* Hmean disk-161 220951.51 ( 0.00%) 207563.39 * -6.06%* Hmean disk-201 208722.74 ( 0.00%) 203235.59 ( -2.63%) Hmean disk-241 222051.60 ( 0.00%) 217705.51 ( -1.96%) Hmean disk-281 252244.17 ( 0.00%) 241132.72 * -4.41%* Hmean disk-321 255844.84 ( 0.00%) 245412.84 * -4.08%* Also this is causing huge regression (time increased by a factor of 5 or so) when untarring archive with lots of small files on some eMMC storage cards. Fix the problem by making sure we try goal group first. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@kernel.org Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/ Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/ Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220908092136.11770-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-21block/blk-rq-qos: delete useless enmu RQ_QOS_IOPRIOLi Jinlin
Since blk-ioprio handing was converted from a rqos policy to a direct call, RQ_QOS_IOPRIO is not used anymore, just delete it. Signed-off-by: Li Jinlin <lijinlin3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220916023241.32926-1-lijinlin3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21block: aoe: use DEFINE_SHOW_ATTRIBUTE to simplify aoe_debugfsLiu Shixin
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Link: https://lore.kernel.org/r/20220915023424.3198940-1-liushixin2@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21block: move from strlcpy with unused retval to strscpyWolfram Sang
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Acked-by: Geoff Levand <geoff@infradead.org> Link: https://lore.kernel.org/r/20220818205958.6552-1-wsa+renesas@sang-engineering.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21block/drbd: remove useless comments in receive_DataReply()Gaosheng Cui
All implementations of req->collision, _req_may_be_done and drbd_fail_pending_reads have been removed, so remove the comments in receive_DataReply() that provide no useful information. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220920015216.782190-3-cuigaosheng1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21drbd: remove orphan _req_may_be_done() declarationGaosheng Cui
The _req_may_be_done() has been removed by commit 6870ca6d463e ("drbd: factor out master_bio completion and drbd_request destruction paths"), so remove the orphan declaration. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220920015216.782190-2-cuigaosheng1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21Merge branch 'master' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter patches for net-next Remove GPL license copypastry in uapi files, those have SPDX tags. From Christophe Jaillet. Remove unused variable in rpfilter, from Guillaume Nault. Rework gc resched delay computation in conntrack, from Antoine Tenart. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: rpfilter: Remove unused variable 'ret'. headers: Remove some left-over license text in include/uapi/linux/netfilter/ netfilter: conntrack: revisit the gc initial rescheduling bias netfilter: conntrack: fix the gc rescheduling delay ==================== Link: https://lore.kernel.org/r/20220921095000.29569-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-20 (ice) Michal re-sets TC configuration when changing number of queues. Mateusz moves the check and call for link-down-on-close to the specific path for downing/closing the interface. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix interface being down after reset with link-down-on-close flag on ice: config netdev tc before setting queues number ==================== Link: https://lore.kernel.org/r/20220920205344.1860934-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21io_uring: ensure local task_work marks task as runningJens Axboe
io_uring will run task_work from contexts that have been prepared for waiting, and in doing so it'll implicitly set the task running again to avoid issues with blocking conditions. The new deferred local task_work doesn't do that, which can result in spews on this being an invalid condition: 

[ 112.917576] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000ad64af64>] prepare_to_wait_exclusive+0x3f/0xd0 [ 112.983088] WARNING: CPU: 1 PID: 190 at kernel/sched/core.c:9819 __might_sleep+0x5a/0x60 [ 112.987240] Modules linked in: [ 112.990504] CPU: 1 PID: 190 Comm: io_uring Not tainted 6.0.0-rc6+ #1617 [ 113.053136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 113.133650] RIP: 0010:__might_sleep+0x5a/0x60 [ 113.136507] Code: ee 48 89 df 5b 31 d2 5d e9 33 ff ff ff 48 8b 90 30 0b 00 00 48 c7 c7 90 de 45 82 c6 05 20 8b 79 01 01 48 89 d1 e8 3a 49 77 00 <0f> 0b eb d1 66 90 0f 1f 44 00 00 9c 58 f6 c4 02 74 35 65 8b 05 ed [ 113.223940] RSP: 0018:ffffc90000537ca0 EFLAGS: 00010286 [ 113.232903] RAX: 0000000000000000 RBX: ffffffff8246782c RCX: ffffffff8270bcc8 IOPS=133.15K, BW=520MiB/s, IOS/call=32/31 [ 113.353457] RDX: ffffc90000537b50 RSI: 00000000ffffdfff RDI: 0000000000000001 [ 113.358970] RBP: 00000000000003bc R08: 0000000000000000 R09: c0000000ffffdfff [ 113.361746] R10: 0000000000000001 R11: ffffc90000537b48 R12: ffff888103f97280 [ 113.424038] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 [ 113.428009] FS: 00007f67ae7fc700(0000) GS:ffff88842fc80000(0000) knlGS:0000000000000000 [ 113.432794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 113.503186] CR2: 00007f67b8b9b3b0 CR3: 0000000102b9b005 CR4: 0000000000770ee0 [ 113.507291] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 113.512669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 113.574374] PKRU: 55555554 [ 113.576800] Call Trace: [ 113.578325] <TASK> [ 113.579799] set_page_dirty_lock+0x1b/0x90 [ 113.582411] __bio_release_pages+0x141/0x160 [ 113.673078] ? set_next_entity+0xd7/0x190 [ 113.675632] blk_rq_unmap_user+0xaa/0x210 [ 113.678398] ? timerqueue_del+0x2a/0x40 [ 113.679578] nvme_uring_task_cb+0x94/0xb0 [ 113.683025] __io_run_local_work+0x8a/0x150 [ 113.743724] ? io_cqring_wait+0x33d/0x500 [ 113.746091] io_run_local_work.part.76+0x2e/0x60 [ 113.750091] io_cqring_wait+0x2e7/0x500 [ 113.752395] ? trace_event_raw_event_io_uring_req_failed+0x180/0x180 [ 113.823533] __x64_sys_io_uring_enter+0x131/0x3c0 [ 113.827382] ? switch_fpu_return+0x49/0xc0 [ 113.830753] do_syscall_64+0x34/0x80 [ 113.832620] entry_SYSCALL_64_after_hwframe+0x5e/0xc8 Ensure that we mark current as TASK_RUNNING for deferred task_work as well. Fixes: c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Reported-by: Stefan Roesch <shr@fb.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21ethernet: tundra: Drop forward declaration of static functionsUwe Kleine-König
Usually it's not necessary to declare static functions if the symbols are in the right order. Moving the definition of tsi_eth_driver down in the compilation unit allows to drop two such declarations. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20220919131515.885361-1-u.kleine-koenig@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21flow_dissector: Do not count vlan tags inside tunnel payloadQingqing Yang
We've met the problem that when there is a vlan tag inside GRE encapsulation, the match of num_of_vlans fails. It is caused by the vlan tag inside GRE payload has been counted into num_of_vlans, which is not expected. One example packet is like this: Ethernet II, Src: Broadcom_68:56:07 (00:10:18:68:56:07) Dst: Broadcom_68:56:08 (00:10:18:68:56:08) 802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100 Internet Protocol Version 4, Src: 192.168.1.4, Dst: 192.168.1.200 Generic Routing Encapsulation (Transparent Ethernet bridging) Ethernet II, Src: Broadcom_68:58:07 (00:10:18:68:58:07) Dst: Broadcom_68:58:08 (00:10:18:68:58:08) 802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 200 ... It should match the (num_of_vlans 1) rule, but it matches the (num_of_vlans 2) rule. The vlan tags inside the GRE or other tunnel encapsulated payload should not be taken into num_of_vlans. The fix is to stop counting the vlan number when the encapsulation bit is set. Fixes: 34951fcf26c5 ("flow_dissector: Add number of vlan tags dissector") Signed-off-by: Qingqing Yang <qingqing.yang@broadcom.com> Reviewed-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Link: https://lore.kernel.org/r/20220919074808.136640-1-qingqing.yang@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: sched: remove unused tcf_result extensionJamal Hadi Salim
Added by: commit e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible") but no longer useful. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220919130627.3551233-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21Merge branch 'clean-up-ocelot_reset-routine'Jakub Kicinski
Colin Foster says: ==================== clean up ocelot_reset() routine ocelot_reset() will soon be exported to a common library to be used by the ocelot_ext system. This will make error values from regmap calls possible, so they must be checked. Additionally, readx_poll_timeout() can be substituted for the custom loop, as a simple cleanup. I don't have hardware to verify this set directly, but there shouldn't be any functional changes. ==================== Link: https://lore.kernel.org/r/20220917175127.161504-1-colin.foster@in-advantage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: mscc: ocelot: check return values of writes during resetColin Foster
The ocelot_reset() function utilizes regmap_field_write() but wasn't checking return values. While this won't cause issues for the current MMIO regmaps, it could be an issue for externally controlled interfaces. Add checks for these return values. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: mscc: ocelot: utilize readx_poll_timeout() for chip resetColin Foster
Clean up the reset code by utilizing readx_poll_timeout instead of a custom loop. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21Merge branch 'net-ll_temac-cleanup-for-clearing-static-warnings'Jakub Kicinski
Haoyue Xu says: ==================== net: ll_temac: Cleanup for clearing static warnings Most static warnings are detected by Checkpatch.pl, mainly about: (1) #1: About the comments. (2) #2: About function name in a string. (3) #3: About the open parenthesis. (4) #4: About the else branch. (6) #6: About trailing statements. (7) #5,#7: About blank lines and spaces. ==================== Link: https://lore.kernel.org/r/20220917103843.526877-1-xuhaoyue1@hisilicon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: ll_temac: axienet: delete unnecessary blank lines and spaceshuangjunxian
Cleaning some static warnings of indent. Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: ll_temac: move trailing statements to next linehuangjunxian
Cleaning some static warnings of trailing statements. Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: ll_temac: fix the missing spaces around '='huangjunxian
Cleaning some static warnings of missing spaces around '='. Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: ll_temac: delete unnecessary else branchhuangjunxian
Cleaning some static warnings of unnecessary else branch. Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: ll_temac: axienet: align with open parenthesishuangjunxian
Cleaning some static warnings of open parenthesis. Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: ll_temac: Cleanup for function name in a stringHaoyue Xu
As Checkpatch.pl warns, prefer using '"%s...", __func__' to using 'temac_device_reset', this function's name, in a string. Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: ll_temac: fix the format of block commentshuangjunxian
Cleaning some static warnings of block comments. Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: macvtap: add __init/__exit annotations to module init/exit funcsXiu Jianfeng
Add missing __init/__exit annotations to module init/exit funcs. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Link: https://lore.kernel.org/r/20220917083535.20040-1-xiujianfeng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: hns3: add __init/__exit annotations to module init/exit funcsXiu Jianfeng
Add missing __init/__exit annotations to module init/exit funcs. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Link: https://lore.kernel.org/r/20220917082118.7971-1-xiujianfeng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>