path: root/kernel
2023-03-10  bpf: btf: Remove unused btf_field_info_type enum  (Dave Marchevsky)
This enum was added and used in commit aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record"). Later refactoring in commit db559117828d ("bpf: Consolidate spin_lock, timer management into btf_record") resulted in the enum values no longer being used anywhere. Let's remove them. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230309180111.1618459-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: verifier: Rename kernel_type_name helper to btf_type_name  (Dave Marchevsky)
kernel_type_name was introduced in commit 9e15db66136a ("bpf: Implement accurate raw_tp context access via BTF") with the type signature: const char *kernel_type_name(u32 id) At that time the function used the global btf_vmlinux BTF for all id lookups. Later, in commit 22dc4a0f5ed1 ("bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier"), the type signature was changed to: static const char *kernel_type_name(const struct btf* btf, u32 id), with the btf parameter used for lookups instead of the global btf_vmlinux. The helper will function as expected for type name lookup using non-kernel BTFs, and will be used as such in further patches in the series. Let's rename it to avoid incorrect assumptions that might arise when seeing the current name. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230309180111.1618459-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Add bpf_local_storage_free()  (Martin KaFai Lau)
This patch refactors the local_storage freeing logic into bpf_local_storage_free(). It is preparation work for a later patch that uses bpf_mem_cache_alloc/free. The other kfree(local_storage) cases are also changed to bpf_local_storage_free(..., reuse_now = true). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-12-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Add bpf_local_storage_rcu callback  (Martin KaFai Lau)
The existing bpf_local_storage_free_rcu is renamed to bpf_local_storage_free_trace_rcu. A new bpf_local_storage_rcu callback is added to do the kfree instead of using kfree_rcu. It is preparation work for a later patch using bpf_mem_cache_alloc/free. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-11-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Add bpf_selem_free()  (Martin KaFai Lau)
This patch refactors the selem freeing logic into bpf_selem_free(). It is preparation work for a later patch using bpf_mem_cache_alloc/free. The other kfree(selem) cases are also changed to bpf_selem_free(..., reuse_now = true). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-10-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Add bpf_selem_free_rcu callback  (Martin KaFai Lau)
Add a bpf_selem_free_rcu() callback to do the kfree() instead of using kfree_rcu. It is preparation work for using bpf_mem_cache_alloc/free in a later patch. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-9-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Remove bpf_selem_free_fields*_rcu  (Martin KaFai Lau)
This patch removes the bpf_selem_free_fields*_rcu helpers. The bpf_obj_free_fields() call can be done before the call_rcu_tasks_trace() and kfree_rcu(). It is needed when a later patch uses bpf_mem_cache_alloc/free. In the bpf hashtab, bpf_obj_free_fields() is also called before calling bpf_mem_cache_free. The discussion can be found in https://lore.kernel.org/bpf/f67021ee-21d9-bfae-6134-4ca542fab843@linux.dev/ Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-8-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage  (Martin KaFai Lau)
This patch repurposes use_trace_rcu to mean whether the freed memory can be reused immediately or not, and renames it to reuse_now. Other than the boolean test being reversed, it should be a no-op. The following explains the reason for the rename and how it will be used in a later patch. In a later patch, bpf_mem_cache_alloc/free will be used in bpf_local_storage, and the bpf mem allocator will reuse the freed memory immediately. Some of the free paths in bpf_local_storage do not support memory being reused immediately. These paths are the "delete" elem cases from the bpf_*_storage_delete() helper and the map_delete_elem() syscall. Note that deleting an elem before the owner's (sk/task/cgrp/inode) lifetime has ended is not the common usage for local storage. The common free path, bpf_local_storage_destroy(), can reuse the memory immediately; this common path means the storage stays with its owner until the owner is destroyed. The above-mentioned "delete" elem paths that cannot reuse memory immediately always have 'use_trace_rcu == true', while the cases that are safe for immediate reuse always have 'use_trace_rcu == false'. Instead of adding another arg in a later patch, this patch repurposes this arg as reuse_now and reverses the test logic. In a later patch, 'reuse_now == true' will free to bpf_mem_cache_free(), where the memory can be reused immediately, while 'reuse_now == false' will go through call_rcu_tasks_trace(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-7-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
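A minimal sketch of the resulting free path (simplified from kernel/bpf/bpf_local_storage.c; the callback name and exact signature here are illustrative, not the literal diff):

    static void bpf_selem_free(struct bpf_local_storage_elem *selem,
                               struct bpf_local_storage_map *smap,
                               bool reuse_now)
    {
            if (!reuse_now) {
                    /* "delete" elem path: sleepable BPF programs may still
                     * hold a pointer, so go through tasks-trace RCU first.
                     */
                    call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
            } else {
                    /* Owner-destroy path: the memory may be reused
                     * immediately (bpf_mem_cache_free() in a later patch).
                     */
                    kfree_rcu(selem, rcu);
            }
    }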
2023-03-10  bpf: Remember smap in bpf_local_storage  (Martin KaFai Lau)
This patch remembers which smap triggered the allocation of a 'struct bpf_local_storage' object. The local_storage is allocated when the very first selem is added to the owner. The smap pointer is needed when using bpf_mem_cache_free in a later patch, because the object needs to be freed to the correct smap's bpf_mem_alloc. When a selem is being removed, it needs to check whether it is the selem that triggered the creation of the local_storage. If it is, the local_storage->smap pointer is reset to NULL. This NULL reset is done under the local_storage->lock in bpf_selem_unlink_storage_nolock() when a selem is being removed. Also note that the local_storage may not go away even when local_storage->smap is NULL, because there may be other selems still stored in the local_storage. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-6-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Remove the preceding __ from __bpf_selem_unlink_storage  (Martin KaFai Lau)
__bpf_selem_unlink_storage takes the spin lock, and there is no name collision either. Having the preceding '__' is confusing when reviewing the later patch. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Remove __bpf_local_storage_map_alloc  (Martin KaFai Lau)
bpf_local_storage_map_alloc() is the only caller of __bpf_local_storage_map_alloc(). The remaining logic in bpf_local_storage_map_alloc() is only a one-liner setting smap->cache_idx. Remove __bpf_local_storage_map_alloc() to simplify the code. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-4-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Refactor codes into bpf_local_storage_destroy  (Martin KaFai Lau)
This patch first renames bpf_local_storage_unlink_nolock to bpf_local_storage_destroy(). The new name better reflects that it is only used when the storage's owner (sk/task/cgrp/inode) is being freed. All of bpf_local_storage_destroy()'s callers take the spin lock and then free the storage. This patch also moves these two steps into bpf_local_storage_destroy(). This is preparation work for a later patch that uses bpf_mem_cache_alloc/free in bpf_local_storage. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: Move a few bpf_local_storage functions to static scope  (Martin KaFai Lau)
This patch makes bpf_local_storage_free_rcu() and bpf_selem_unlink_map() static because they are not used outside of bpf_local_storage.c. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10  bpf: take into account liveness when propagating precision  (Andrii Nakryiko)
When doing state comparison, if the old state has a register that is not marked as REG_LIVE_READ, then we just skip the comparison, regardless of the state of the corresponding register in the current state. This is because a register that is not REG_LIVE_READ is irrelevant for further program execution and correctness. All good here. But when we get to precision propagation, after two states have been declared equivalent, we don't take the old register's liveness into account, and thus attempt to propagate precision for a register in the current state even if that register in the old state was no longer REG_LIVE_READ. This is bad, because the register in the current state could be anything at all, and this could cause -EFAULT due to internal logic bugs. Fix by taking the REG_LIVE_READ liveness mark into account, keeping the precision propagation logic in sync with the state comparison. Fixes: a3ce685dd01a ("bpf: fix precision tracking") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230309224131.57449-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
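The shape of the fix, as a sketch over the usual verifier structures (not the literal diff):

    static int propagate_precision_sketch(struct bpf_verifier_env *env,
                                          struct bpf_verifier_state *old)
    {
            struct bpf_func_state *state = old->frame[old->curframe];
            int i, err;

            for (i = 0; i < BPF_REG_FP; i++) {
                    struct bpf_reg_state *reg = &state->regs[i];

                    /* Skip registers the old state never read: they are
                     * irrelevant for further execution, exactly as in the
                     * state comparison path.
                     */
                    if (!(reg->live & REG_LIVE_READ))
                            continue;
                    if (reg->type != SCALAR_VALUE || !reg->precise)
                            continue;
                    err = mark_chain_precision(env, i);
                    if (err < 0)
                            return err;
            }
            return 0;
    }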
2023-03-10  bpf: ensure state checkpointing at iter_next() call sites  (Andrii Nakryiko)
The state equivalence check and checkpointing performed in is_state_visited() employ certain heuristics to try to save memory, avoiding state checkpoints if not enough jumps and instructions have happened since the last checkpoint. This makes it unpredictable whether a particular instruction will be checkpointed, and how regularly. While normally this does not cause many problems (except inconvenience for predictable verifier tests, which we overcome with the BPF_F_TEST_STATE_FREQ flag), it turns out that is not the case for open-coded iterators. Checking and saving state checkpoints at iter_next() call sites is crucial for fast convergence of the open-coded iterator loop logic, so we need to force it. If we don't, is_state_visited() might skip saving a checkpoint, causing an unnecessarily long sequence of non-checkpointed instructions and jumps, leading to exhaustion of the jump history buffer and potentially other undesired outcomes. It is expected that with correct open-coded iterators convergence will happen quickly, so we don't run the risk of exhausting memory. This patch adds, in addition to the prune and jump instruction marks, a "forced checkpoint" mark, and makes sure that any iter_next() call instruction is marked as such. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230310060149.625887-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-09  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Conflicts: Documentation/bpf/bpf_devel_QA.rst b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info") d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target") https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09  ftrace: Fix invalid address access in lookup_rec() when index is 0  (Chen Zhongjin)
KASAN reported the following problem: BUG: KASAN: use-after-free in lookup_rec Read of size 8 at addr ffff000199270ff0 by task modprobe CPU: 2 Comm: modprobe Call trace: kasan_report __asan_load8 lookup_rec ftrace_location arch_check_ftrace_location check_kprobe_address_safe register_kprobe When checking pg->records[pg->index - 1].ip in lookup_rec(), we can get a pg which was newly added to ftrace_pages_start in ftrace_process_locs(). Before the first pg->index++, index is 0, and accessing pg->records[-1].ip causes this problem. Don't check the ip when pg->index is 0. Link: https://lore.kernel.org/linux-trace-kernel/20230309080230.36064-1-chenzhongjin@huawei.com Cc: stable@vger.kernel.org Fixes: 9644302e3315 ("ftrace: Speed up search by skipping pages by address") Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
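The guard, simplified from kernel/trace/ftrace.c (the bsearch-based body is kept only in outline):

    static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
    {
            struct ftrace_page *pg;
            struct dyn_ftrace *rec = NULL;
            struct dyn_ftrace key;

            key.ip = start;
            key.flags = end;        /* flags is overloaded as the range end */

            for (pg = ftrace_pages_start; pg; pg = pg->next) {
                    /* The fix: a pg freshly added by ftrace_process_locs()
                     * may still have index == 0, so records[index - 1] must
                     * not be dereferenced in that case.
                     */
                    if (pg->index == 0 ||
                        end < pg->records[0].ip ||
                        start >= (pg->records[pg->index - 1].ip + MCOUNT_INSN_SIZE))
                            continue;
                    rec = bsearch(&key, pg->records, pg->index,
                                  sizeof(struct dyn_ftrace), ftrace_cmp_recs);
                    if (rec)
                            break;
            }
            return rec;
    }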
2023-03-09  tracing: Check field value in hist_field_name()  (Steven Rostedt (Google))
The function hist_field_name() cannot handle being passed a NULL field parameter. It should never be NULL, but due to a previous bug, NULL was passed to the function and the kernel crashed due to a NULL dereference. Mark Rutland reported this to me on IRC. The bug was fixed, but to prevent future bugs from crashing the kernel, check the field and add a WARN_ON() if it is NULL. Link: https://lkml.kernel.org/r/20230302020810.762384440@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: c6afad49d127f ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers") Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
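A minimal sketch of the added check, assuming the surrounding context of hist_field_name() in kernel/trace/trace_events_hist.c:

    static const char *hist_field_name(struct hist_field *field,
                                       unsigned int level)
    {
            const char *field_name = "";

            /* field should never be NULL; if a future bug passes one in,
             * warn once and fail gracefully instead of dereferencing it.
             */
            if (WARN_ON_ONCE(!field))
                    return field_name;

            /* ... existing name-resolution logic ... */
            return field_name;
    }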
2023-03-09  tracing: Do not let histogram values have some modifiers  (Steven Rostedt (Google))
Histogram values can not be strings, stacktraces, graphs, symbols, syscalls, or grouped in buckets or log. Give an error if a value is set to do so. Note, the histogram code was not prepared to handle these modifiers for histograms and caused a bug. Mark Rutland reported: # echo 'p:copy_to_user __arch_copy_to_user n=$arg2' >> /sys/kernel/tracing/kprobe_events # echo 'hist:keys=n:vals=hitcount.buckets=8:sort=hitcount' > /sys/kernel/tracing/events/kprobes/copy_to_user/trigger # cat /sys/kernel/tracing/events/kprobes/copy_to_user/hist [ 143.694628] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 143.695190] Mem abort info: [ 143.695362] ESR = 0x0000000096000004 [ 143.695604] EC = 0x25: DABT (current EL), IL = 32 bits [ 143.695889] SET = 0, FnV = 0 [ 143.696077] EA = 0, S1PTW = 0 [ 143.696302] FSC = 0x04: level 0 translation fault [ 143.702381] Data abort info: [ 143.702614] ISV = 0, ISS = 0x00000004 [ 143.702832] CM = 0, WnR = 0 [ 143.703087] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000448f9000 [ 143.703407] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 [ 143.704137] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 143.704714] Modules linked in: [ 143.705273] CPU: 0 PID: 133 Comm: cat Not tainted 6.2.0-00003-g6fc512c10a7c #3 [ 143.706138] Hardware name: linux,dummy-virt (DT) [ 143.706723] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 143.707120] pc : hist_field_name.part.0+0x14/0x140 [ 143.707504] lr : hist_field_name.part.0+0x104/0x140 [ 143.707774] sp : ffff800008333a30 [ 143.707952] x29: ffff800008333a30 x28: 0000000000000001 x27: 0000000000400cc0 [ 143.708429] x26: ffffd7a653b20260 x25: 0000000000000000 x24: ffff10d303ee5800 [ 143.708776] x23: ffffd7a6539b27b0 x22: ffff10d303fb8c00 x21: 0000000000000001 [ 143.709127] x20: ffff10d303ec2000 x19: 0000000000000000 x18: 0000000000000000 [ 143.709478] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 143.709824] x14: 0000000000000000 x13: 203a6f666e692072 x12: 6567676972742023 [ 143.710179] x11: 0a230a6d6172676f x10: 000000000000002c x9 : ffffd7a6521e018c [ 143.710584] x8 : 000000000000002c x7 : 7f7f7f7f7f7f7f7f x6 : 000000000000002c [ 143.710915] x5 : ffff10d303b0103e x4 : ffffd7a653b20261 x3 : 000000000000003d [ 143.711239] x2 : 0000000000020001 x1 : 0000000000000001 x0 : 0000000000000000 [ 143.711746] Call trace: [ 143.712115] hist_field_name.part.0+0x14/0x140 [ 143.712642] hist_field_name.part.0+0x104/0x140 [ 143.712925] hist_field_print+0x28/0x140 [ 143.713125] event_hist_trigger_print+0x174/0x4d0 [ 143.713348] hist_show+0xf8/0x980 [ 143.713521] seq_read_iter+0x1bc/0x4b0 [ 143.713711] seq_read+0x8c/0xc4 [ 143.713876] vfs_read+0xc8/0x2a4 [ 143.714043] ksys_read+0x70/0xfc [ 143.714218] __arm64_sys_read+0x24/0x30 [ 143.714400] invoke_syscall+0x50/0x120 [ 143.714587] el0_svc_common.constprop.0+0x4c/0x100 [ 143.714807] do_el0_svc+0x44/0xd0 [ 143.714970] el0_svc+0x2c/0x84 [ 143.715134] el0t_64_sync_handler+0xbc/0x140 [ 143.715334] el0t_64_sync+0x190/0x194 [ 143.715742] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (f9400000) [ 143.716510] ---[ end trace 0000000000000000 ]--- Segmentation fault Link: https://lkml.kernel.org/r/20230302020810.559462599@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: c6afad49d127f ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers") Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland 
<mark.rutland@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-09  dyndbg: use the module notifier callbacks  (Jason Baron)
Bring dynamic debug in line with other subsystems by using the module notifier callbacks. This results in a net decrease in core module code. Additionally, Jim Cromie has a new dynamic debug classmap feature, which requires that jump labels be initialized prior to dynamic debug. Specifically, the new feature toggles a jump label from the existing dynamic_debug_setup() function. However, this does not currently work properly, because jump labels are initialized via the 'module_notify_list' notifier chain, which is invoked after the current call to dynamic_debug_setup(). Thus, this patch ensures that jump labels are initialized prior to dynamic debug by setting the dynamic debug notifier priority to 0, while jump labels have the higher priority of 1. Tested by Jim using his new test case, and I've verified the correct printing via: # modprobe test_dynamic_debug dyndbg. Link: https://lore.kernel.org/lkml/20230113193016.749791-21-jim.cromie@gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302190427.9iIK2NfJ-lkp@intel.com/ Tested-by: Jim Cromie <jim.cromie@gmail.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> CC: Jim Cromie <jim.cromie@gmail.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-09  module: make module_ktype structure constant  (Thomas Weißschuh)
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
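The change itself is a one-word constification; roughly (assuming the object's shape in kernel/params.c):

    const struct kobj_type module_ktype = {
            .release   = module_kobj_release,
            .sysfs_ops = &module_sysfs_ops,
    };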
2023-03-09  module: Remove the unused function within  (Jiapeng Chong)
The function within() is defined in the main.c file but is not called anywhere else, so remove this unused function. This routine became unused after commit ("module: replace module_layout with module_memory"). kernel/module/main.c:3007:19: warning: unused function 'within'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4035 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> [mcgrof: adjust commit log to explain why this change is needed] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-09  module: replace module_layout with module_memory  (Song Liu)
module_layout manages different types of memory (text, data, rodata, etc.) in one allocation, which is problematic for several reasons: 1. It is hard to enable CONFIG_STRICT_MODULE_RWX. 2. It is hard to use huge pages in modules (and not break strict rwx). 3. Many archs use module_layout for arch-specific data, but it is not obvious how these data are used (are they RO, RX, or RW?) Improve the situation by replacing the 2 (or 3) module_layout per module with up to 7 module_memory per module: MOD_TEXT, MOD_DATA, MOD_RODATA, MOD_RO_AFTER_INIT, MOD_INIT_TEXT, MOD_INIT_DATA, MOD_INIT_RODATA, and allocating them separately. This adds slightly more entries to mod_tree (from up to 3 entries per module to up to 7 entries per module). However, this at most adds a small constant overhead to __module_address(), which is expected to be fast. Various archs use module_layout for different data. These data are put into different module_memory based on their location in module_layout. IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT; data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc. module_memory simplifies quite a bit of the module code. For example, ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a different allocator for the data. kernel/module/strict_rwx.c is also much cleaner with module_memory. Signed-off-by: Song Liu <song@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
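A sketch of the new per-type layout, following the description above (names mirror the enum this commit introduces in include/linux/module.h; arch-specific details omitted):

    enum mod_mem_type {
            MOD_TEXT,
            MOD_DATA,
            MOD_RODATA,
            MOD_RO_AFTER_INIT,
            MOD_INIT_TEXT,
            MOD_INIT_DATA,
            MOD_INIT_RODATA,
            MOD_MEM_NUM_TYPES,
    };

    struct module_memory {
            void *base;             /* each type is now allocated separately */
            unsigned int size;
            /* arch-specific fields and the mod_tree node are omitted */
    };

    /* struct module then carries one region per type instead of a few
     * multiplexed module_layout blobs:
     *         struct module_memory mem[MOD_MEM_NUM_TYPES];
     */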
2023-03-09  Merge tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - core: avoid skb end_offset change in __skb_unclone_keeptruesize() - sched: - act_connmark: handle errno on tcf_idr_check_alloc - flower: fix fl_change() error recovery path - ieee802154: prevent user from crashing the host Current release - new code bugs: - eth: bnxt_en: fix the double free during device removal - tools: ynl: - fix enum-as-flags in the generic CLI - fully inherit attrs in subsets - re-license uniformly under GPL-2.0 or BSD-3-clause Previous releases - regressions: - core: use indirect calls helpers for sk_exit_memory_pressure() - tls: - fix return value for async crypto - avoid hanging tasks on the tx_lock - eth: ice: copy last block omitted in ice_get_module_eeprom() Previous releases - always broken: - core: avoid double iput when sock_alloc_file fails - af_unix: fix struct pid leaks in OOB support - tls: - fix possible race condition - fix device-offloaded sendpage straddling records - bpf: - sockmap: fix an infinite loop error - test_run: fix &xdp_frame misplacement for LIVE_FRAMES - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR - netfilter: tproxy: fix deadlock due to missing BH disable - phylib: get rid of unnecessary locking - eth: bgmac: fix *initial* chip reset to support BCM5358 - eth: nfp: fix csum for ipsec offload - eth: mtk_eth_soc: fix RX data corruption issue Misc: - usb: qmi_wwan: add telit 0x1080 composition" * tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) tools: ynl: fix enum-as-flags in the generic CLI tools: ynl: move the enum classes to shared code net: avoid double iput when sock_alloc_file fails af_unix: fix struct pid leaks in OOB support eth: fealnx: bring back this old driver net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC net: microchip: sparx5: fix deletion of existing DSCP mappings octeontx2-af: Unlock contexts in the queue context cache in case of fault detection net/smc: fix fallback failed while sendmsg with fastopen ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause mailmap: update entries for Stephen Hemminger mailmap: add entry for Maxim Mikityanskiy nfc: change order inside nfc_se_io error path ethernet: ice: avoid gcc-9 integer overflow warning ice: don't ignore return codes in VSI related code ice: Fix DSCP PFC TLV creation net: usb: qmi_wwan: add Telit 0x1080 composition net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990 netfilter: conntrack: adopt safer max chain length net: tls: fix device-offloaded sendpage straddling records ...
2023-03-09  livepatch: Make kobj_type structures constant  (Thomas Weißschuh)
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230217-kobj_type-livepatch-v1-1-06ded292e897@weissschuh.net
2023-03-09  livepatch: fix ELF typos  (Alexey Dobriyan)
ELF is an acronym. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/Y/3vWjQ/SBA5a0i5@p183
2023-03-08  bpf: implement numbers iterator  (Andrii Nakryiko)
Implement the first open-coded iterator type over a range of integers. Its public API consists of: - bpf_iter_num_new() constructor, which accepts a [start, end) range (that is, start is inclusive, end is exclusive). - bpf_iter_num_next(), which keeps returning a read-only pointer to int until the range is exhausted, at which point NULL is returned. If bpf_iter_num_next() keeps being called after this, NULL keeps being returned. - bpf_iter_num_destroy() destructor, which needs to be called at some point to clean up the iterator state. The BPF verifier enforces that the iterator destructor is called at some point before the BPF program exits. Note that `start = end = X` is a valid combination to set up an empty iterator. bpf_iter_num_new() will return 0 (success) for any such combination. If bpf_iter_num_new() detects an invalid combination of input arguments, it returns an error and resets the iterator state to, effectively, an empty iterator, so any subsequent call to bpf_iter_num_next() will keep returning NULL. The BPF verifier has no knowledge that the returned integers are in the [start, end) value range, as both `start` and `end` are not statically known or enforced: they are runtime values. While the implementation is pretty trivial, some care needs to be taken to avoid overflows and underflows. Subsequent selftests will validate correctness of the [start, end) semantics, especially around extremes (INT_MIN and INT_MAX). Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS iterations can be specified. bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded BPF loops and the bpf_loop() helper, and is the basis for implementing ergonomic BPF loops with no statically known or verified bounds. Subsequent patches implement the bpf_for() macro, demonstrating how this can be wrapped into something that works and feels like a normal for() loop in the C language. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
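For illustration, here is how the API is consumed from a BPF program (kfunc declarations in the selftests style; exact kernel-side prototypes may differ slightly):

    extern int bpf_iter_num_new(struct bpf_iter_num *it,
                                int start, int end) __ksym;   /* [start, end) */
    extern int *bpf_iter_num_next(struct bpf_iter_num *it) __ksym;
    extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __ksym;

    /* Sum the integers 0..9. */
    static int sum_first_ten(void)
    {
            struct bpf_iter_num it;
            int *v, sum = 0;

            bpf_iter_num_new(&it, 0, 10);
            while ((v = bpf_iter_num_next(&it)))
                    sum += *v;
            bpf_iter_num_destroy(&it);      /* mandatory before program exit */
            return sum;
    }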
2023-03-08  bpf: add support for open-coded iterator loops  (Andrii Nakryiko)
Teach the verifier about the concept of open-coded (or inline) iterators. This patch adds generic iterator loop verification logic, a new STACK_ITER stack slot type to contain iterator state, and the necessary kfunc plumbing for an iterator's constructor, destructor and next methods. The next patch implements the first specific iterator (a numbers iterator for implementing for() loop logic). Such a split allows more focused commits for the verifier logic and a separate commit that we could point to later to demonstrate what it takes to add a new kind of iterator. Each kind of iterator has its own associated struct bpf_iter_<type>, where <type> denotes a specific type of iterator. struct bpf_iter_<type> state is supposed to live on the BPF program stack, so there will be no way to change its size later on without breaking backwards compatibility, so choose wisely! But given this struct is specific to a given <type> of iterator, this allows a lot of flexibility: simple iterators could be fine with just one stack slot (8 bytes), like the numbers iterator in the next patch, while some other more complicated iterators might need way more to keep their iterator state. Either way, such a design allows avoiding runtime memory allocations, which would otherwise be necessary if we fixed the on-the-stack size and it turned out to be too small for a given iterator implementation. The way the BPF verifier logic is implemented, there are no artificial restrictions on the number of active iterators; it should work correctly with multiple active iterators at the same time. This also means you can have multiple nested iteration loops. A struct bpf_iter_<type> reference can be safely passed to subprograms as well. The general flow is easiest to demonstrate with a simple example using the numbers iterator implemented in the next patch. Here's the simplest possible loop: struct bpf_iter_num it; int *v; bpf_iter_num_new(&it, 2, 5); while ((v = bpf_iter_num_next(&it))) { bpf_printk("X = %d", *v); } bpf_iter_num_destroy(&it); The above snippet should output "X = 2", "X = 3", "X = 4". Note that 5 is exclusive and is not returned. This matches similar APIs (e.g., slices in Go or Rust) that implement a range of elements, where the end index is non-inclusive. In the above example, we see a trio of functions: - the constructor, bpf_iter_num_new(), which initializes the iterator state (struct bpf_iter_num it) on the stack. If any of the input arguments are invalid, the constructor should make sure to still initialize it such that subsequent bpf_iter_num_next() calls will return NULL. I.e., on error, return the error and construct an empty iterator. - the next method, bpf_iter_num_next(), which accepts a pointer to the iterator state and produces an element. The next method should always return a pointer. The contract with the BPF verifier is that the next method will always eventually return NULL when the elements are exhausted. Once NULL is returned, subsequent next calls should keep returning NULL. In the case of the numbers iterator, bpf_iter_num_next() returns a pointer to an int (storage for this integer is inside the iterator state itself), which can be dereferenced after the corresponding NULL check. - once done with the iterator, it's mandated that the user cleans up its state with a call to the destructor, bpf_iter_num_destroy() in this case. The destructor frees up any resources and marks the stack space used by struct bpf_iter_num as usable for something else. Any other iterator implementation will have to implement at least these three methods.
It is enforced that for any given type of iterator only the applicable constructor/destructor/next methods are callable. I.e., the verifier ensures you can't pass a number iterator state into, say, the cgroup iterator's next method. It is important to keep the naming pattern consistent to be able to create generic macros to help with BPF iter usability. E.g., one of the follow-up patches adds a generic bpf_for_each() macro to bpf_misc.h in selftests, which allows utilizing the iterator "trio" nicely without having to code the above somewhat tedious loop explicitly every time. This is enforced at kfunc registration point by one of the previous patches in this series. At the implementation level, iterator state tracking for verification purposes is very similar to dynptr. We add a STACK_ITER stack slot type, reserve the necessary number of slots, depending on sizeof(struct bpf_iter_<type>), and keep track of the necessary extra state in the "main" slot, which is marked with a non-zero ref_obj_id. Other slots are also marked as STACK_ITER, but have a zero ref_obj_id. This is simpler than having a separate "is_first_slot" flag. Another big distinction is that STACK_ITER is *always refcounted*, which simplifies the implementation without sacrificing usability. So no need for an extra "iter_id", no need to anticipate reuse of STACK_ITER slots for new constructors, etc. Keeping it simple here. As far as the verification logic goes, there are two extensive comments: in process_iter_next_call() and iter_active_depths_differ(), explaining some important and sometimes subtle aspects. Please refer to them for details. But from a 10,000-foot point of view, next methods are the points of forking a verification state, which are conceptually similar to what the verifier is doing when validating a conditional jump. We branch out at a `call bpf_iter_<type>_next` instruction and simulate two outcomes: NULL (iteration is done) and non-NULL (a new element is returned). NULL is simulated first and is supposed to reach exit without looping. After that the non-NULL case is validated, and it either reaches exit (for trivial examples with no real loop), or reaches another `call bpf_iter_<type>_next` instruction with a state equivalent to an already (partially) validated one. State equivalence at that point means we technically are going to be looping forever without "breaking out" of the established "state envelope" (i.e., subsequent iterations don't add any new knowledge or constraints to the verifier state, so running 1, 2, 10, or a million of them doesn't matter). But taking into account the contract stating that the iterator next method *has to* return NULL eventually, we can conclude that the loop body is safe and will eventually terminate. Given we validated the logic outside of the loop (the NULL case), and concluded that the loop body is safe (though potentially looping many times), the verifier can claim safety of the overall program logic. The rest of the patch is the necessary plumbing for state tracking, marking, and validation, plus further kfunc plumbing to allow implementing iterator constructor, destructor, and next methods. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08  bpf: add iterator kfuncs registration and validation logic  (Andrii Nakryiko)
Add the ability to register kfuncs that implement the BPF open-coded iterator contract, and enforce naming and function proto conventions. Enforcement happens at the time of kfunc registration and significantly simplifies the rest of the iterators logic in the verifier. More details follow in subsequent patches, but we enforce the following conditions. All kfuncs (constructor, next, destructor) have to be named consistently as bpf_iter_<type>_{new,next,destroy}(), respectively. <type> represents the iterator type, and the iterator state should be represented as a matching `struct bpf_iter_<type>` state type. Also, all iter kfuncs should have a pointer to this `struct bpf_iter_<type>` as the very first argument. Additionally: - The constructor, i.e., bpf_iter_<type>_new(), can have an arbitrary number of extra arguments. The return type is not enforced either. - The next method, i.e., bpf_iter_<type>_next(), has to return a pointer type and should have exactly one argument: `struct bpf_iter_<type> *` (const/volatile/restrict and typedefs are ignored). - The destructor, i.e., bpf_iter_<type>_destroy(), should return void and should have exactly one argument, similar to the next method. - struct bpf_iter_<type> size is enforced to be positive and a multiple of 8 bytes (to fit stack slots correctly). Such strictness and consistency allow building generic helpers abstracting important, but boilerplate, details, making it possible to use open-coded iterators effectively and ergonomically (see bpf_for_each() in subsequent patches). It also simplifies the verifier logic in some places. At the same time, this doesn't hurt the generality of possible iterator implementations. Win-win. The constructor kfunc is marked with a new KF_ITER_NEW flag, the next method is marked with KF_ITER_NEXT (and should also have KF_RET_NULL, of course), while the destructor kfunc is marked as KF_ITER_DESTROY. Additionally, we add a trivial kfunc name validation: it should be a valid non-NULL and non-empty string. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
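Using the numbers iterator from the next patch as an example, registration looks roughly like this (the existing BTF kfunc-set pattern, as in kernel/bpf/helpers.c):

    BTF_SET8_START(common_btf_ids)
    BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
    BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
    BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
    BTF_SET8_END(common_btf_ids)

The naming and proto checks described above run when such a set is registered, so a mis-declared iterator kfunc is rejected before any BPF program can use it.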
2023-03-08  bpf: factor out fetching basic kfunc metadata  (Andrii Nakryiko)
Factor out the logic to fetch basic kfunc metadata based on struct bpf_insn. This is not exactly short or trivial code to just copy/paste, and this information is sometimes necessary in other parts of the verifier logic. Subsequent patches will rely on this to determine if an instruction is a kfunc call to an iterator's next method. No functional changes intended, including the verbose() warning behavior when a kfunc is not allowed for a particular program type. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08  sched/idle: Mark arch_cpu_idle_dead() __noreturn  (Josh Poimboeuf)
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead() return"), in Xen, when a previously offlined CPU was brought back online, it unexpectedly resumed execution where it left off in the middle of the idle loop. There were some hacks to make that work, but the behavior was surprising as do_idle() doesn't expect an offlined CPU to return from the dead (in arch_cpu_idle_dead()). Now that Xen has been fixed, and the arch-specific implementations of arch_cpu_idle_dead() also don't return, give it a __noreturn attribute. This will cause the compiler to complain if an arch-specific implementation might return. It also improves code generation for both caller and callee. Also fixes the following warning: vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction Reported-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08  sched/idle: Make sure weak version of arch_cpu_idle_dead() doesn't return  (Josh Poimboeuf)
arch_cpu_idle_dead() should never return. Make it so. Link: https://lore.kernel.org/r/cf5ad95eef50f7704bb30e7770c59bfe23372af7.1676358308.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
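Taken together, the two sched/idle commits above amount to roughly the following (the exact body of the weak stub is a sketch, not the literal diff):

    /* include/linux/cpu.h: the prototype now promises no return */
    void __noreturn arch_cpu_idle_dead(void);

    /* kernel/sched/idle.c: the weak default must not return either */
    void __weak arch_cpu_idle_dead(void)
    {
            /* A dead CPU must never resume the idle loop. */
            while (1)
                    cpu_relax();
    }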
2023-03-08  fork: allow CLONE_NEWTIME in clone3 flags  (Tobias Klauser)
Currently, calling clone3() with CLONE_NEWTIME in clone_args->flags fails with -EINVAL. This is because CLONE_NEWTIME intersects with CSIGNAL. However, CSIGNAL was deprecated when clone3 was introduced in commit 7f192e3cd316 ("fork: add clone3"), allowing re-use of that part of the clone flags. Fix this by explicitly allowing CLONE_NEWTIME in clone3_args_valid. This is also in line with the respective check in check_unshare_flags, which allows CLONE_NEWTIME for unshare(). Fixes: 769071ac9f20 ("ns: Introduce Time Namespace") Cc: Andrey Vagin <avagin@openvz.org> Cc: Christian Brauner <brauner@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
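The fix's shape in clone3_args_valid() (simplified; remaining checks elided):

    static bool clone3_args_valid(struct kernel_clone_args *kargs)
    {
            /* Verify that no unknown flags are passed along; CLONE_NEWTIME
             * is explicitly permitted even though it overlaps CSIGNAL,
             * which clone3() never accepted anyway.
             */
            if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND |
                                 CLONE_INTO_CGROUP | CLONE_NEWTIME))
                    return false;

            /* ... remaining checks unchanged ... */
            return true;
    }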
2023-03-08  watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths  (David Disseldorp)
The watch_queue_set_size() allocation error paths return the ret value set via the prior pipe_resize_ring() call, which will always be zero. As a result, IOC_WATCH_QUEUE_SET_SIZE callers such as "keyctl watch" fail to detect kernel wqueue->notes allocation failures and proceed to KEYCTL_WATCH_KEY, with any notifications subsequently lost. Fixes: c73be61cede58 ("pipe: Add general notification queue support") Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
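A sketch of the corrected error path (simplified from kernel/watch_queue.c; the helper name here is illustrative):

    static long watch_queue_set_size_sketch(struct pipe_inode_info *pipe,
                                            unsigned int nr_notes)
    {
            struct page **pages;
            long ret;

            ret = pipe_resize_ring(pipe, roundup_pow_of_two(nr_notes));
            if (ret < 0)
                    goto error;

            ret = -ENOMEM;  /* the fix: don't fall through with the stale 0 */
            pages = kcalloc(DIV_ROUND_UP(nr_notes, WATCH_QUEUE_NOTES_PER_PAGE),
                            sizeof(struct page *), GFP_KERNEL);
            if (!pages)
                    goto error;

            /* ... populate pages and hand them to the watch queue ... */
            return 0;
    error:
            return ret;
    }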
2023-03-07  locktorture: Add raw_spinlock* torture tests for PREEMPT_RT kernels  (Zqiang)
In PREEMPT_RT kernels, both spin_lock() and spin_lock_irq() are converted to sleepable rt_spin_lock(). This means that the interrupt-related suffixes for spin_lock/unlock(_irq, irqsave/irqrestore) do not affect the CPU's interrupt state. This commit therefore adds raw spin-lock torture tests. This in turn permits pure spin locks to be tested in PREEMPT_RT kernels. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Suggested-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07  locktorture: With nested locks, occasionally skip main lock  (John Stultz)
If we're using nested locking to stress things, occasionally skip taking the main lock, so that we can get some different contention patterns between the writers (to hopefully get two disjoint blocked trees). Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07  locktorture: Add nested locking to rtmutex torture tests  (John Stultz)
This patch adds randomized nested locking to the rtmutex torture tests. Additionally, it adds LOCK09 config files for testing rtmutexes with nested locking. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07  locktorture: Add nested locking to mutex torture tests  (John Stultz)
This patch adds randomized nested locking to the mutex torture tests, as well as new LOCK08 config files for testing mutexes with nested locking. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07  locktorture: Add nested_[un]lock() hooks and nlocks parameter  (John Stultz)
In order to extend locktorture to support lock nesting, add nested_lock() and nested_unlock() hooks to the torture ops. These take a 32-bit lockset mask which is generated at random, so some number of locks will be taken before the main lock is taken, and released afterwards. Additionally, add a nested_locks module parameter to allow specifying the number of nested locks to be used. This has been helpful to uncover issues in the proxy-exec series development. This was inspired by locktorture extensions originally implemented by Connor O'Brien, for stress testing the proxy-execution series: https://lore.kernel.org/lkml/20221003214501.2050087-12-connoro@google.com/ Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
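A sketch of the extended ops and how a writer might use them (hook signatures follow the commit text; surrounding fields and context simplified):

    struct lock_torture_ops {
            int  (*nested_lock)(int tid, u32 lockset);
            int  (*writelock)(int tid);
            void (*writeunlock)(int tid);
            void (*nested_unlock)(int tid, u32 lockset);
            /* ... init, readlock/readunlock, name, ... */
    };

    /* Inside the writer loop, with struct torture_random_state rand in
     * scope: pick a random 32-bit mask, take the selected nested locks,
     * take the main lock, then release everything in reverse order.
     */
    u32 lockset_mask = torture_random(&rand) & ((1U << nested_locks) - 1);

    if (cxt.cur_ops->nested_lock)
            cxt.cur_ops->nested_lock(tid, lockset_mask);
    cxt.cur_ops->writelock(tid);
    /* ... critical section and delay ... */
    cxt.cur_ops->writeunlock(tid);
    if (cxt.cur_ops->nested_unlock)
            cxt.cur_ops->nested_unlock(tid, lockset_mask);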
2023-03-07  bpf: enforce all maps having memory usage callback  (Yafang Shao)
We have implemented the memory usage callback for all maps, and we enforce that any newly added map has a callback as well. We check for this callback at map creation time; if it is missing, we return -EINVAL. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-19-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
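The creation-time check amounts to a hunk of this shape in map_create() (simplified):

    /* in map_create(), after resolving ops from attr->map_type */
    if (!ops->map_mem_usage)
            return -EINVAL; /* every map type must report its memory usage */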
2023-03-07  bpf: offload map memory usage  (Yafang Shao)
A new helper is introduced to calculate offload map memory usage. However, the memory dynamically allocated in netdev dev_ops, like in nsim_map_update_elem, is currently not counted. Let's just put it aside for now. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-18-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf, net: bpf_local_storage memory usage  (Yafang Shao)
A new helper is introduced into the bpf_local_storage map to calculate the memory usage. This helper is also used by other maps like bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage and so on. Note that currently the dynamically allocated storage elements are not counted in the usage, since it would take extra runtime overhead in the element update or delete path. So let's put it aside for now, and implement it in the future when someone really needs it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf: local_storage memory usage  (Yafang Shao)
A new helper is introduced to calculate local_storage map memory usage. Currently the dynamically allocated elements are not counted, since it would take runtime overhead in the element update or delete path. So let's put it aside for now, and implement it in the future if users really need it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-14-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf: bpf_struct_ops memory usage  (Yafang Shao)
A new helper is introduced to calculate bpf_struct_ops memory usage. The result is as follows: - before 1: struct_ops name count_map flags 0x0 key 4B value 256B max_entries 1 memlock 4096B btf_id 73 - after 1: struct_ops name count_map flags 0x0 key 4B value 256B max_entries 1 memlock 5016B btf_id 73 Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-13-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf: queue_stack_maps memory usage  (Yafang Shao)
A new helper is introduced to calculate queue_stack_maps memory usage. The result is as follows: - before 20: queue name count_map flags 0x0 key 0B value 4B max_entries 65536 memlock 266240B 21: stack name count_map flags 0x0 key 0B value 4B max_entries 65536 memlock 266240B - after 20: queue name count_map flags 0x0 key 0B value 4B max_entries 65536 memlock 524288B 21: stack name count_map flags 0x0 key 0B value 4B max_entries 65536 memlock 524288B Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-12-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf: devmap memory usage  (Yafang Shao)
A new helper is introduced to calculate the memory usage of devmap and devmap_hash. The number of dynamically allocated elements is already recorded for devmap_hash, but not for devmap. To track the memory size of dynamically allocated elements, this patch also counts them for devmap. The result is as follows: - before 40: devmap name count_map flags 0x80 key 4B value 4B max_entries 65536 memlock 524288B 41: devmap_hash name count_map flags 0x80 key 4B value 4B max_entries 65536 memlock 524288B - after 40: devmap name count_map flags 0x80 <<<< no elements key 4B value 4B max_entries 65536 memlock 524608B 41: devmap_hash name count_map flags 0x80 <<<< no elements key 4B value 4B max_entries 65536 memlock 524608B Note that the number of buckets is the same as max_entries for devmap_hash in this case. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-11-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf: cpumap memory usage  (Yafang Shao)
A new helper is introduced to calculate cpumap memory usage. The size of cpu_entries can change dynamically when we update or delete a cpumap element, but this patch doesn't include the memory size of cpu_entry yet. We could dynamically calculate the memory usage when we alloc or free a cpu_entry, but it would take extra runtime overhead, so let's just put it aside for now. Note that the sizes of different cpu_entries may differ as well. The result is as follows: - before 48: cpumap name count_map flags 0x4 key 4B value 4B max_entries 64 memlock 4096B - after 48: cpumap name count_map flags 0x4 key 4B value 4B max_entries 64 memlock 832B Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-10-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf: bloom_filter memory usage  (Yafang Shao)
Introduce a new helper to calculate the bloom_filter memory usage. The result is as follows: - before 16: bloom_filter flags 0x0 key 0B value 8B max_entries 65536 memlock 524288B - after 16: bloom_filter flags 0x0 key 0B value 8B max_entries 65536 memlock 65856B Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-9-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
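What such a helper computes, per the numbers above, is the struct plus the actual bitset rather than a key/value-based estimate; a sketch (simplified from kernel/bpf/bloom_filter.c):

    static u64 bloom_map_mem_usage(const struct bpf_map *map)
    {
            struct bpf_bloom_filter *bloom;
            u64 bitset_bytes;

            bloom = container_of(map, struct bpf_bloom_filter, map);
            bitset_bytes = BITS_TO_BYTES((u64)bloom->bitset_mask + 1);
            bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
            return sizeof(*bloom) + bitset_bytes;
    }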
2023-03-07  bpf: ringbuf memory usage  (Yafang Shao)
A new helper ringbuf_map_mem_usage() is introduced to calculate ringbuf memory usage. The result is as follows: - before 15: ringbuf name count_map flags 0x0 key 0B value 0B max_entries 65536 memlock 0B - after 15: ringbuf name count_map flags 0x0 key 0B value 0B max_entries 65536 memlock 78424B Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230305124615.12358-8-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07  bpf: reuseport_array memory usage  (Yafang Shao)
A new helper is introduced to calculate reuseport_array memory usage. The result is as follows: - before 14: reuseport_sockarray name count_map flags 0x0 key 4B value 8B max_entries 65536 memlock 1048576B - after 14: reuseport_sockarray name count_map flags 0x0 key 4B value 8B max_entries 65536 memlock 524544B Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-7-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>