summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-01-24genirq: Remove unneeded forward declarationDawei Li
The protoype of irq_flow_handler_t is independent of irq_data, so remove unneeded forward declaration. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240122085716.2999875-4-dawei.li@shingroup.cn
2024-01-24irqchip/gic(v3): Replace gic_irq() with irqd_to_hwirq()Dawei Li
GIC & GIC-v3 share same gic_irq() implementations, both of which serve exact same purpose as irqd_to_hwirq(). irqd_to_hwirq() is a generic and top level API of the interrupt subsystem, it's independent of any chip implementation. Replace gic_irq() with irqd_to_hwirq() and convert struct irq_data::hwirq to irq_hw_number_t explicitly. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240122085716.2999875-3-dawei.li@shingroup.cn
2024-01-24RAS: Introduce AMD Address Translation LibraryYazen Ghannam
AMD Zen-based systems report memory errors through Machine Check banks representing Unified Memory Controllers (UMCs). The address value reported for DRAM ECC errors is a "normalized address" that is relative to the UMC. This normalized address must be converted to a system physical address to be usable by the OS. Support for this address translation was introduced to the MCA subsystem with Zen1 systems. The code was later moved to the AMD64 EDAC module, since this was the only user of the code at the time. However, there are uses for this translation outside of EDAC. The system physical address can be used in MCA for preemptive page offlining as done in some MCA notifier functions. Also, this translation is needed as the basis of similar functionality needed for some CXL configurations on AMD systems. Introduce a common address translation library that can be used for multiple subsystems including MCA, EDAC, and CXL. Include support for UMC normalized to system physical address translation for current CPU systems. The Data Fabric Indirect register access offsets and one of the register fields were changed. Default to the current offsets and register field definition. And fallback to the older values if running on a "legacy" system. Provide built-in code to facilitate the loading and unloading of the library module without affecting other modules or built-in code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240123041401.79812-2-yazen.ghannam@amd.com
2024-01-24x86/entry/ia32: Ensure s32 is sign extended to s64Richard Palethorpe
Presently ia32 registers stored in ptregs are unconditionally cast to unsigned int by the ia32 stub. They are then cast to long when passed to __se_sys*, but will not be sign extended. This takes the sign of the syscall argument into account in the ia32 stub. It still casts to unsigned int to avoid implementation specific behavior. However then casts to int or unsigned int as necessary. So that the following cast to long sign extends the value. This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently the systemcall io_pgetevents_time64() unexpectedly accepts -1 for the maximum number of events. It doesn't appear other systemcalls with signed arguments are effected because they all have compat variants defined and wired up. Fixes: ebeb8c82ffaf ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/
2024-01-24net/mlx5: Bridge, fix multicast packets sent to uplinkMoshe Shemesh
To enable multicast packets which are offloaded in bridge multicast offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should be set. Add this bit to FTE for the bridge multicast offload rules. Fixes: 18c2916cee12 ("net/mlx5: Bridge, snoop igmp/mld packets") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24net/mlx5: Fix query of sd_group fieldTariq Toukan
The sd_group field moved in the HW spec from the MPIR register to the vport context. Align the query accordingly. Fixes: f5e956329960 ("net/mlx5: Expose Management PCIe Index Register (MPIR)") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24net/mlx5e: Use the correct lag ports number when creating TISesSaeed Mahameed
The cited commit moved the code of mlx5e_create_tises() and changed the loop to create TISes over MLX5_MAX_PORTS constant value, instead of getting the correct lag ports supported by the device, which can cause FW errors on devices with less than MLX5_MAX_PORTS ports. Change that back to mlx5e_get_num_lag_ports(mdev). Also IPoIB interfaces create there own TISes, they don't use the eth TISes, pass a flag to indicate that. This fixes the following errors that might appear in kernel log: mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22) mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22 Fixes: b25bd37c859f ("net/mlx5: Move TISes from priv to mdev HW resources") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-23scsi: ufs: mcq: Add definition for REG_UFS_MEM_CFG registerChanWoo Lee
Instead of hardcoding the register field, add the proper definition. While at it, let's also use ufshcd_rmwl() to simplify updating this register. Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: ChanWoo Lee <cw9316.lee@samsung.com> Link: https://lore.kernel.org/r/20240102014222.23351-1-cw9316.lee@samsung.com Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-01-23scsi: core: Move autosuspend timer delay to Scsi_HostPeter Wang
The runtime suspend timer delay is a const value in scsi_host_template which a host driver cannot modify at runtime. Move the delay to Scsi_Host to allow a driver to update it. Signed-off-by: Peter Wang <peter.wang@mediatek.com> Link: https://lore.kernel.org/r/20240109124015.31359-2-peter.wang@mediatek.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-01-23scsi: ufs: core: Add CPU latency QoS support for UFS driverMaramaina Naresh
Register UFS driver to CPU latency PM QoS framework to improve UFS device random I/O performance. PM QoS initialization will insert new QoS request into the CPU latency QoS list with the maximum latency PM_QOS_DEFAULT_VALUE value. The UFS driver will vote for performance mode on scale up and power save mode for scale down. If clock scaling feature is not enabled then voting will be based on clock on or off condition. Also provide a sysfs interface to enable/disable PM QoS feature. tiotest benchmark tool I/O performance results on sm8550 platform: 1. Without PM QoS support Type (Speed in) | Average of 18 iterations Random Write(IPOS) | 41065.13 Random Read(IPOS) | 37101.3 2. With PM QoS support Type (Speed in) | Average of 18 iterations Random Write(IPOS) | 46784.9 Random Read(IPOS) | 42943.4 (Improvement with PM QoS = ~15%). Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Co-developed-by: Nitin Rawat <quic_nitirawa@quicinc.com> Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com> Co-developed-by: Naveen Kumar Goud Arepalli <quic_narepall@quicinc.com> Signed-off-by: Naveen Kumar Goud Arepalli <quic_narepall@quicinc.com> Signed-off-by: Maramaina Naresh <quic_mnaresh@quicinc.com> Link: https://lore.kernel.org/r/20231219123706.6463-2-quic_mnaresh@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-01-24net/sched: flower: Fix chain template offloadIdo Schimmel
When a qdisc is deleted from a net device the stack instructs the underlying driver to remove its flow offload callback from the associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack then continues to replay the removal of the filters in the block for this driver by iterating over the chains in the block and invoking the 'reoffload' operation of the classifier being used. In turn, the classifier in its 'reoffload' operation prepares and emits a 'FLOW_CLS_DESTROY' command for each filter. However, the stack does not do the same for chain templates and the underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when a qdisc is deleted. This results in a memory leak [1] which can be reproduced using [2]. Fix by introducing a 'tmplt_reoffload' operation and have the stack invoke it with the appropriate arguments as part of the replay. Implement the operation in the sole classifier that supports chain templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}' command based on whether a flow offload callback is being bound to a filter block or being unbound from one. As far as I can tell, the issue happens since cited commit which reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains() in __tcf_block_put(). The order cannot be reversed as the filter block is expected to be freed after flushing all the chains. [1] unreferenced object 0xffff888107e28800 (size 2048): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[...... 01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................ backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab374e>] __kmalloc+0x4e/0x90 [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0 [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0 [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0 unreferenced object 0xffff88816d2c0400 (size 1024): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): 40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8..... 10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m.... backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90 [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0 [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460 [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0 [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [2] # tc qdisc add dev swp1 clsact # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32 # tc qdisc del dev swp1 clsact # devlink dev reload pci/0000:06:00.0 Fixes: bbf73830cd48 ("net: sched: traverse chains in block with tcf_get_next_chain()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-23net/ipv6: resolve warning in ip6_fib.cBreno Leitao
In some configurations, the 'iter' variable in function fib6_repair_tree() is unused, resulting the following warning when compiled with W=1. net/ipv6/ip6_fib.c:1781:6: warning: variable 'iter' set but not used [-Wunused-but-set-variable] 1781 | int iter = 0; | ^ It is unclear what is the advantage of this RT6_TRACE() macro[1], since users can control pr_debug() in runtime, which is better than at compilation time. pr_debug() has no overhead when disabled. Remove the RT6_TRACE() in favor of simple pr_debug() helpers. [1] Link: https://lore.kernel.org/all/ZZwSEJv2HgI0cD4J@gmail.com/ Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240122181955.2391676-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23bpf, net: switch to dynamic registrationKui-Feng Lee
Replace the static list of struct_ops types with per-btf struct_ops_tab to enable dynamic registration. Both bpf_dummy_ops and bpf_tcp_ca now utilize the registration function instead of being listed in bpf_struct_ops_types.h. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-12-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf: validate value_typeKui-Feng Lee
A value_type should consist of three components: refcnt, state, and data. refcnt and state has been move to struct bpf_struct_ops_common_value to make it easier to check the value type. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-11-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf: hold module refcnt in bpf_struct_ops map creation and prog verification.Kui-Feng Lee
To ensure that a module remains accessible whenever a struct_ops object of a struct_ops type provided by the module is still in use. struct bpf_struct_ops_map doesn't hold a refcnt to btf anymore since a module will hold a refcnt to it's btf already. But, struct_ops programs are different. They hold their associated btf, not the module since they need only btf to assure their types (signatures). However, verifier holds the refcnt of the associated module of a struct_ops type temporarily when verify a struct_ops prog. Verifier needs the help from the verifier operators (struct bpf_verifier_ops) provided by the owner module to verify data access of a prog, provide information, and generate code. This patch also add a count of links (links_cnt) to bpf_struct_ops_map. It avoids bpf_struct_ops_map_put_progs() from accessing btf after calling module_put() in bpf_struct_ops_map_free(). Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-10-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf: pass attached BTF to the bpf_struct_ops subsystemKui-Feng Lee
Pass the fd of a btf from the userspace to the bpf() syscall, and then convert the fd into a btf. The btf is generated from the module that defines the target BPF struct_ops type. In order to inform the kernel about the module that defines the target struct_ops type, the userspace program needs to provide a btf fd for the respective module's btf. This btf contains essential information on the types defined within the module, including the target struct_ops type. A btf fd must be provided to the kernel for struct_ops maps and for the bpf programs attached to those maps. In the case of the bpf programs, the attach_btf_obj_fd parameter is passed as part of the bpf_attr and is converted into a btf. This btf is then stored in the prog->aux->attach_btf field. Here, it just let the verifier access attach_btf directly. In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added for map_flags to indicate that the value of value_type_btf_obj_fd is set. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf: lookup struct_ops types from a given module BTF.Kui-Feng Lee
This is a preparation for searching for struct_ops types from a specified module. BTF is always btf_vmlinux now. This patch passes a pointer of BTF to bpf_struct_ops_find_value() and bpf_struct_ops_find(). Once the new registration API of struct_ops types is used, other BTFs besides btf_vmlinux can also be passed to them. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-8-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf: pass btf object id in bpf_map_info.Kui-Feng Lee
Include btf object id (btf_obj_id) in bpf_map_info so that tools (ex: bpftools struct_ops dump) know the correct btf from the kernel to look up type information of struct_ops types. Since struct_ops types can be defined and registered in a module. The type information of a struct_ops type are defined in the btf of the module defining it. The userspace tools need to know which btf is for the module defining a struct_ops type. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf, net: introduce bpf_struct_ops_desc.Kui-Feng Lee
Move some of members of bpf_struct_ops to bpf_struct_ops_desc. type_id is unavailabe in bpf_struct_ops anymore. Modules should get it from the btf received by kmod's init function. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf: refactory struct_ops type initialization to a function.Kui-Feng Lee
Move the majority of the code to bpf_struct_ops_init_one(), which can then be utilized for the initialization of newly registered dynamically allocated struct_ops types in the following patches. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23bpf: Store cookies in kprobe_multi bpf_link_info dataJiri Olsa
Storing cookies in kprobe_multi bpf_link_info data. The cookies field is optional and if provided it needs to be an array of __u64 with kprobe_multi.count length. Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: Add cookie to perf_event bpf_link_info recordsJiri Olsa
At the moment we don't store cookie for perf_event probes, while we do that for the rest of the probes. Adding cookie fields to struct bpf_link_info perf event probe records: perf_event.uprobe perf_event.kprobe perf_event.tracepoint perf_event.perf_event And the code to store that in bpf_link_info struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20240119110505.400573-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: Define struct bpf_tcp_req_attrs when CONFIG_SYN_COOKIES=n.Kuniyuki Iwashima
kernel test robot reported the warning below: >> net/core/filter.c:11842:13: warning: declaration of 'struct bpf_tcp_req_attrs' will not be visible outside of this function [-Wvisibility] 11842 | struct bpf_tcp_req_attrs *attrs, int attrs__sz) | ^ 1 warning generated. struct bpf_tcp_req_attrs is defined under CONFIG_SYN_COOKIES but used in kfunc without the config. Let's move struct bpf_tcp_req_attrs definition outside of CONFIG_SYN_COOKIES guard. Fixes: e472f88891ab ("bpf: tcp: Support arbitrary SYN Cookie.") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401180418.CUVc0hxF-lkp@intel.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240118211751.25790-1-kuniyu@amazon.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf, docs: Fix bpf_redirect_peer header docVictor Stewart
Amend the bpf_redirect_peer() header documentation to also mention support for the netkit device type. Signed-off-by: Victor Stewart <v@nametag.social> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240116202952.241009-1-v@nametag.social Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: tcp: Support arbitrary SYN Cookie.Kuniyuki Iwashima
This patch adds a new kfunc available at TC hook to support arbitrary SYN Cookie. The basic usage is as follows: struct bpf_tcp_req_attrs attrs = { .mss = mss, .wscale_ok = wscale_ok, .rcv_wscale = rcv_wscale, /* Server's WScale < 15 */ .snd_wscale = snd_wscale, /* Client's WScale < 15 */ .tstamp_ok = tstamp_ok, .rcv_tsval = tsval, .rcv_tsecr = tsecr, /* Server's Initial TSval */ .usec_ts_ok = usec_ts_ok, .sack_ok = sack_ok, .ecn_ok = ecn_ok, } skc = bpf_skc_lookup_tcp(...); sk = (struct sock *)bpf_skc_to_tcp_sock(skc); bpf_sk_assign_tcp_reqsk(skb, sk, attrs, sizeof(attrs)); bpf_sk_release(skc); bpf_sk_assign_tcp_reqsk() takes skb, a listener sk, and struct bpf_tcp_req_attrs and allocates reqsk and configures it. Then, bpf_sk_assign_tcp_reqsk() links reqsk with skb and the listener. The notable thing here is that we do not hold refcnt for both reqsk and listener. To differentiate that, we mark reqsk->syncookie, which is only used in TX for now. So, if reqsk->syncookie is 1 in RX, it means that the reqsk is allocated by kfunc. When skb is freed, sock_pfree() checks if reqsk->syncookie is 1, and in that case, we set NULL to reqsk->rsk_listener before calling reqsk_free() as reqsk does not hold a refcnt of the listener. When the TCP stack looks up a socket from the skb, we steal the listener from the reqsk in skb_steal_sock() and create a full sk in cookie_v[46]_check(). The refcnt of reqsk will finally be set to 1 in tcp_get_cookie_sock() after creating a full sk. Note that we can extend struct bpf_tcp_req_attrs in the future when we add a new attribute that is determined in 3WHS. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240115205514.68364-6-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().Kuniyuki Iwashima
We will support arbitrary SYN Cookie with BPF in the following patch. If BPF prog validates ACK and kfunc allocates a reqsk, it will be carried to cookie_[46]_check() as skb->sk. If skb->sk is not NULL, we call cookie_bpf_check(). Then, we clear skb->sk and skb->destructor, which are needed not to hold refcnt for reqsk and the listener. See the following patch for details. After that, we finish initialisation for the remaining fields with cookie_tcp_reqsk_init(). Note that the server side WScale is set only for non-BPF SYN Cookie. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240115205514.68364-5-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().Kuniyuki Iwashima
We will support arbitrary SYN Cookie with BPF. If BPF prog validates ACK and kfunc allocates a reqsk, it will be carried to TCP stack as skb->sk with req->syncookie 1. Also, the reqsk has its listener as req->rsk_listener with no refcnt taken. When the TCP stack looks up a socket from the skb, we steal inet_reqsk(skb->sk)->rsk_listener in skb_steal_sock() so that the skb will be processed in cookie_v[46]_check() with the listener. Note that we do not clear skb->sk and skb->destructor so that we can carry the reqsk to cookie_v[46]_check(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240115205514.68364-4-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23tcp: Move skb_steal_sock() to request_sock.hKuniyuki Iwashima
We will support arbitrary SYN Cookie with BPF. If BPF prog validates ACK and kfunc allocates a reqsk, it will be carried to TCP stack as skb->sk with req->syncookie 1. In skb_steal_sock(), we need to check inet_reqsk(sk)->syncookie to see if the reqsk is created by kfunc. However, inet_reqsk() is not available in sock.h. Let's move skb_steal_sock() to request_sock.h. While at it, we refactor skb_steal_sock() so it returns early if skb->sk is NULL to minimise the following patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240115205514.68364-3-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23tcp: Move tcp_ns_to_ts() to tcp.hKuniyuki Iwashima
We will support arbitrary SYN Cookie with BPF. When BPF prog validates ACK and kfunc allocates a reqsk, we need to call tcp_ns_to_ts() to calculate an offset of TSval for later use: time t0 : Send SYN+ACK -> tsval = Initial TSval (Random Number) t1 : Recv ACK of 3WHS -> tsoff = TSecr - tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns()) = Initial TSval - t1 t2 : Send ACK -> tsval = t2 + tsoff = Initial TSval + (t2 - t1) = Initial TSval + Time Delta (x) (x) Note that the time delta does not include the initial RTT from t0 to t1. Let's move tcp_ns_to_ts() to tcp.h. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240115205514.68364-2-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: Make bpf_for_each_spilled_reg consider narrow spillsMaxim Mikityanskiy
Adjust the check in bpf_get_spilled_reg to take into account spilled registers narrower than 64 bits. That allows find_equal_scalars to properly adjust the range of all spilled registers that have the same ID. Before this change, it was possible for a register and a spilled register to have the same IDs but different ranges if the spill was narrower than 64 bits and a range check was performed on the register. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-5-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: support multiple tags per argumentAndrii Nakryiko
Add ability to iterate multiple decl_tag types pointed to the same function argument. Use this to support multiple __arg_xxx tags per global subprog argument. We leave btf_find_decl_tag_value() intact, but change its implementation to use a new btf_find_next_decl_tag() which can be straightforwardly used to find next BTF type ID of a matching btf_decl_tag type. btf_prepare_func_args() is switched from btf_find_decl_tag_value() to btf_find_next_decl_tag() to gain multiple tags per argument support. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23bpf: Support inlining bpf_kptr_xchg() helperHou Tao
The motivation of inlining bpf_kptr_xchg() comes from the performance profiling of bpf memory allocator benchmark. The benchmark uses bpf_kptr_xchg() to stash the allocated objects and to pop the stashed objects for free. After inling bpf_kptr_xchg(), the performance for object free on 8-CPUs VM increases about 2%~10%. The inline also has downside: both the kasan and kcsan checks on the pointer will be unavailable. bpf_kptr_xchg() can be inlined by converting the calling of bpf_kptr_xchg() into an atomic_xchg() instruction. But the conversion depends on two conditions: 1) JIT backend supports atomic_xchg() on pointer-sized word 2) For the specific arch, the implementation of xchg is the same as atomic_xchg() on pointer-sized words. It seems most 64-bit JIT backends satisfies these two conditions. But as a precaution, defining a weak function bpf_jit_supports_ptr_xchg() to state whether such conversion is safe and only supporting inline for 64-bit host. For x86-64, it supports BPF_XCHG atomic operation and both xchg() and atomic_xchg() use arch_xchg() to implement the exchange, so enabling the inline of bpf_kptr_xchg() on x86-64 first. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20240105104819.3916743-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23Compiler Attributes: counted_by: fixup clang URLSergey Senozhatsky
The URL in question 404 now, fix it up (and switch to github). Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/b7babeb9c5b14af9189f0d6225673e6e9a8f4ad3.1704855496.git.senozhatsky@chromium.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-23Compiler Attributes: counted_by: bump min gcc versionSergey Senozhatsky
GCC is expected to implement this feature in version 15, so bump the version. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/e1c27b64ae7abe2ebe647be11b71cf1bca84f677.1704855495.git.senozhatsky@chromium.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-23Merge tag 'exportfs-6.9' of ↵Christian Brauner
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux Merge exportfs fixes from Chuck Lever: * tag 'exportfs-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux: fs: Create a generic is_dot_dotdot() utility exportfs: fix the fallback implementation of the get_name export operation Link: https://lore.kernel.org/r/BDC2AEB4-7085-4A7C-8DE8-A659FE1DBA6A@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-23nvmet: unify aer type enumGuixin Liu
The host and target use two definition of aer type, unify them into a single one. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-01-23fs: Create a generic is_dot_dotdot() utilityChuck Lever
De-duplicate the same functionality in several places by hoisting the is_dot_dotdot() utility function into linux/fs.h. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-23sock_diag: allow concurrent operation in sock_diag_rcv_msg()Eric Dumazet
TCPDIAG_GETSOCK and DCCPDIAG_GETSOCK diag are serialized on sock_diag_table_mutex. This is to make sure inet_diag module is not unloaded while diag was ongoing. It is time to get rid of this mutex and use RCU protection, allowing full parallelism. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-23sock_diag: add module pointer to "struct sock_diag_handler"Eric Dumazet
Following patch is going to use RCU instead of sock_diag_table_mutex acquisition. This patch is a preparation, no change of behavior yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-23inet_diag: add module pointer to "struct inet_diag_handler"Eric Dumazet
Following patch is going to use RCU instead of inet_diag_table_mutex acquisition. This patch is a preparation, no change of behavior yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-23dt-bindings: clock: exynos850: Add PDMA clocksSam Protsenko
Add constants for Peripheral DMA (PDMA) clocks in CMU_CORE controller: - PDMA_ACLK: clock for PDMA0 (regular DMA) - SPDMA_ACLK: clock for PDMA1 (secure DMA) Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Link: https://lore.kernel.org/r/20240120012948.8836-2-semen.protsenko@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2024-01-23dt-bindings: clock: google,gs101-clock: add PERIC0 clock management unitTudor Ambarus
Add dt-schema documentation for the Connectivity Peripheral 0 (PERIC0) clock management unit. Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org> Reviewed-by: Peter Griffin <peter.griffin@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://lore.kernel.org/r/20240119111132.1290455-2-tudor.ambarus@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2024-01-23drm/etnaviv: Expose a few more chipspecs to userspaceTomeu Vizoso
These ones will be needed to make use fo the NN and TP units in the NPUs based on Vivante IP. Also fix the number of NN cores in the VIPNano-qi. Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Acked-by: Christian Gmeiner <cgmeiner@igalia.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2024-01-23video/nomodeset: Select nomodeset= parameter with CONFIG_VIDEOThomas Zimmermann
Enable support for nomodeset= parameter via CONFIG_VIDEO. Both, DRM and fbdev, already select this option. Remove the existing option CONFIG_VIDEO_NOMODESET. Simplifies the Kconfig rules. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240118090721.7995-4-tzimmermann@suse.de
2024-01-23video/cmdline: Hide __video_get_options() behind CONFIG_FB_COREThomas Zimmermann
The function __video_get_options() only exists for compatibility with old fbdev drivers that cannot be refactored easily. Hide it behind CONFIG_FB_CORE. v2: * support CONFIG_FB_CORE=m via IS_ENABLED() (kernel test robot) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240118090721.7995-3-tzimmermann@suse.de
2024-01-23video/cmdline: Introduce CONFIG_VIDEO for video= parameterThomas Zimmermann
Add CONFIG_VIDEO for common code in drivers/video/. Use the option to select helpers for the video= parameter. Replaces CONFIG_VIDEO_CMDLINE. Other common code in drivers/video/ can be moved behind CONFIG_VIDEO, which will simplify the Kconfig rules. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240118090721.7995-2-tzimmermann@suse.de
2024-01-23of: Add for_each_reserved_child_of_node()Kuninori Morimoto
We would like to use for_each loop for status = "reserved" nodes. Add for_each_reserved_child_of_node() for it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Tested-by: Yusuke Goda <yusuke.goda.sx@renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/87a5pegfau.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2024-01-22Merge drm/drm-next into drm-xe-nextLucas De Marchi
Sync to v6.8-rc1. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-01-22afs: Fix error handling with lookup via FS.InlineBulkStatusDavid Howells
When afs does a lookup, it tries to use FS.InlineBulkStatus to preemptively look up a bunch of files in the parent directory and cache this locally, on the basis that we might want to look at them too (for example if someone does an ls on a directory, they may want want to then stat every file listed). FS.InlineBulkStatus can be considered a compound op with the normal abort code applying to the compound as a whole. Each status fetch within the compound is then given its own individual abort code - but assuming no error that prevents the bulk fetch from returning the compound result will be 0, even if all the constituent status fetches failed. At the conclusion of afs_do_lookup(), we should use the abort code from the appropriate status to determine the error to return, if any - but instead it is assumed that we were successful if the op as a whole succeeded and we return an incompletely initialised inode, resulting in ENOENT, no matter the actual reason. In the particular instance reported, a vnode with no permission granted to be accessed is being given a UAEACCES abort code which should be reported as EACCES, but is instead being reported as ENOENT. Fix this by abandoning the inode (which will be cleaned up with the op) if file[1] has an abort code indicated and turn that abort code into an error instead. Whilst we're at it, add a tracepoint so that the abort codes of the individual subrequests of FS.InlineBulkStatus can be logged. At the moment only the container abort code can be 0. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-22Merge tag 'for-6.8-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - zoned mode fixes: - fix slowdown when writing large file sequentially by looking up block groups with enough space faster - locking fixes when activating a zone - new mount API fixes: - preserve mount options for a ro/rw mount of the same subvolume - scrub fixes: - fix use-after-free in case the chunk length is not aligned to 64K, this does not happen normally but has been reported on images converted from ext4 - similar alignment check was missing with raid-stripe-tree - subvolume deletion fixes: - prevent calling ioctl on already deleted subvolume - properly track flag tracking a deleted subvolume - in subpage mode, fix decompression of an inline extent (zlib, lzo, zstd) - fix crash when starting writeback on a folio, after integration with recent MM changes this needs to be started conditionally - reject unknown flags in defrag ioctl - error handling, API fixes, minor warning fixes * tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: limit RST scrub to chunk boundary btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned btrfs: don't unconditionally call folio_start_writeback in subpage btrfs: use the original mount's mount options for the legacy reconfigure btrfs: don't warn if discard range is not aligned to sector btrfs: tree-checker: fix inline ref size in error messages btrfs: zstd: fix and simplify the inline extent decompression btrfs: lzo: fix and simplify the inline extent decompression btrfs: zlib: fix and simplify the inline extent decompression btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted btrfs: don't abort filesystem when attempting to snapshot deleted subvolume btrfs: zoned: fix lock ordering in btrfs_zone_activate() btrfs: fix unbalanced unlock of mapping_tree_lock btrfs: ref-verify: free ref cache before clearing mount opt btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send() btrfs: zoned: optimize hint byte for zoned allocator btrfs: zoned: factor out prepare_allocation_zoned()