summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-20net: sched: Make tc-related drop reason more flexible for remaining qdiscsVictor Nogueira
Incrementing on Daniel's patch[1], make tc-related drop reason more flexible for remaining qdiscs - that is, all qdiscs aside from clsact. In essence, the drop reason will be set by cls_api and act_api in case any error occurred in the data path. With that, we can give the user more detailed information so that they can distinguish between a policy drop or an error drop. [1] https://lore.kernel.org/all/20231009092655.22025-1-daniel@iogearbox.net Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20net: sched: Move drop_reason to struct tc_skb_cbVictor Nogueira
Move drop_reason from struct tcf_result to skb cb - more specifically to struct tc_skb_cb. With that, we'll be able to also set the drop reason for the remaining qdiscs (aside from clsact) that do not have access to tcf_result when time comes to set the skb drop reason. Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20r8169: add support for LED's on RTL8168/RTL8101Heiner Kallweit
This adds support for the LED's on most chip versions. Excluded are the old non-PCIe versions and RTL8125. RTL8125 has a different LED register layout, support for it will follow later. LED's can be controlled from userspace using the netdev LED trigger. Tested on RTL8168h. Note: The driver can't know which LED's are actually physically wired. Therefore not every LED device may represent a physically available LED. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20gfs2: use is_subdir()Al Viro
... instead of reimplementing it with misguiding name (is_ancestor(x, y) would normally imply "x is an ancestor of y", not the other way round). With races, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-20gfs2: d_obtain_alias(ERR_PTR(...)) will do the right thingAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-20Merge branch 'bridge-mdb-bulk-delete'David S. Miller
Ido Schimmel says: ==================== Add MDB bulk deletion support This patchset adds MDB bulk deletion support, allowing user space to request the deletion of matching entries instead of dumping the entire MDB and issuing a separate deletion request for each matching entry. Support is added in both the bridge and VXLAN drivers in a similar fashion to the existing FDB bulk deletion support. The parameters according to which bulk deletion can be performed are similar to the FDB ones, namely: Destination port, VLAN ID, state (e.g., "permanent"), routing protocol, source / destination VNI, destination IP and UDP port. Flushing based on flags (e.g., "offload", "fast_leave", "added_by_star_ex", "blocked") is not currently supported, but can be added in the future, if a use case arises. Patch #1 adds a new uAPI attribute to allow specifying the state mask according to which bulk deletion will be performed, if any. Patch #2 adds a new policy according to which bulk deletion requests (with 'NLM_F_BULK' flag set) will be parsed. Patches #3-#4 add a new NDO for MDB bulk deletion and invoke it from the rtnetlink code when a bulk deletion request is made. Patches #5-#6 implement the MDB bulk deletion NDO in the bridge and VXLAN drivers, respectively. Patch #7 allows user space to issue MDB bulk deletion requests by no longer rejecting the 'NLM_F_BULK' flag when it is set in 'RTM_DELMDB' requests. Patches #8-#9 add selftests for both drivers, for both good and bad flows. iproute2 changes can be found here [1]. https://github.com/idosch/iproute2/tree/submit/mdb_flush_v1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20selftests: vxlan_mdb: Add MDB bulk deletion testIdo Schimmel
Add test cases to verify the behavior of the MDB bulk deletion functionality in the VXLAN driver. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20selftests: bridge_mdb: Add MDB bulk deletion testIdo Schimmel
Add test cases to verify the behavior of the MDB bulk deletion functionality in the bridge driver. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20rtnetlink: bridge: Enable MDB bulk deletionIdo Schimmel
Now that both the common code as well as individual drivers support MDB bulk deletion, allow user space to make such requests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20vxlan: mdb: Add MDB bulk deletion supportIdo Schimmel
Implement MDB bulk deletion support in the VXLAN driver, allowing MDB entries to be deleted in bulk according to provided parameters. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20bridge: mdb: Add MDB bulk deletion supportIdo Schimmel
Implement MDB bulk deletion support in the bridge driver, allowing MDB entries to be deleted in bulk according to provided parameters. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20rtnetlink: bridge: Invoke MDB bulk deletion when neededIdo Schimmel
Invoke the new MDB bulk deletion device operation when the 'NLM_F_BULK' flag is set in the netlink message header. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20net: Add MDB bulk deletion device operationIdo Schimmel
Add MDB net device operation that will be invoked by rtnetlink code in response to received 'RTM_DELMDB' messages with the 'NLM_F_BULK' flag set. Subsequent patches will implement the operation in the bridge and VXLAN drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20rtnetlink: bridge: Use a different policy for MDB bulk deleteIdo Schimmel
For MDB bulk delete we will need to validate 'MDBA_SET_ENTRY' differently compared to regular delete. Specifically, allow the ifindex to be zero (in case not filtering on bridge port) and force the address to be zero as bulk delete based on address is not supported. Do that by introducing a new policy and choosing the correct policy based on the presence of the 'NLM_F_BULK' flag in the netlink message header. Use nlmsg_parse() for strict validation. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20bridge: add MDB state mask uAPI attributeIdo Schimmel
Currently, the 'state' field in 'struct br_port_msg' can be set to 1 if the MDB entry is permanent or 0 if it is temporary. Additional states might be added in the future. In a similar fashion to 'NDA_NDM_STATE_MASK', add an MDB state mask uAPI attribute that will allow the upcoming bulk deletion API to bulk delete MDB entries with a certain state or any state. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20net: stmmac: fix incorrect flag check in timestamp interruptLai Peter Jun Ann
The driver should continue get the timestamp if STMMAC_FLAG_EXT_SNAPSHOT_EN flag is set. Fixes: aa5513f5d95f ("net: stmmac: replace the ext_snapshot_en field with a flag") Cc: <stable@vger.kernel.org> # 6.6 Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20octeontx2-af: insert space after includeWang Jinchao
Maintain Consistent Formatting: Insert Space after #include Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20Merge tag 'for-net-2023-12-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Add encryption key size check when acting as peripheral - Shut up false-positive build warning - Send reject if L2CAP command request is corrupted - Fix Use-After-Free in bt_sock_recvmsg - Fix not notifying when connection encryption changes - Fix not checking if HCI_OP_INQUIRY has been sent - Fix address type send over to the MGMT interface - Fix deadlock in vhci_send_frame ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20x86/asm: Add DB flag to 32-bit percpu GDT entryVegard Nossum
The D/B size flag for the 32-bit percpu GDT entry was not set. The Intel manual (vol 3, section 3.4.5) only specifies the meaning of this flag for three cases: 1) code segments used for %cs -- doesn't apply here 2) stack segments used for %ss -- doesn't apply 3) expand-down data segments -- but we don't have the expand-down flag set, so it also doesn't apply here The flag likely doesn't do anything here, although the manual does also say: "This flag should always be set to 1 for 32-bit code and data segments [...]" so we should probably do it anyway. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-6-vegard.nossum@oracle.com
2023-12-20x86/asm: Always set A (accessed) flag in GDT descriptorsVegard Nossum
We have no known use for having the CPU track whether GDT descriptors have been accessed or not. Simplify the code by adding the flag to the common flags and removing it everywhere else. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-5-vegard.nossum@oracle.com
2023-12-20x86/asm: Replace magic numbers in GDT descriptors, script-generated changeVegard Nossum
Actually replace the numeric values by the new symbolic values. I used this to find all the existing users of the GDT_ENTRY*() macros: $ git grep -P 'GDT_ENTRY(_INIT)?\(' Some of the lines will exceed 80 characters, but some of them will be shorter again in the next couple of patches. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-4-vegard.nossum@oracle.com
2023-12-20x86/asm: Replace magic numbers in GDT descriptors, preparationsVegard Nossum
We'd like to replace all the magic numbers in various GDT descriptors with new, semantically meaningful, symbolic values. In order to be able to verify that the change doesn't cause any actual changes to the compiled binary code, I've split the change into two patches: - Part 1 (this commit): everything _but_ actually replacing the numbers - Part 2 (the following commit): _only_ replacing the numbers The reason we need this split for verification is that including new headers causes some spurious changes to the object files, mostly line number changes in the debug info but occasionally other subtle codegen changes. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-3-vegard.nossum@oracle.com
2023-12-20x86/asm: Provide new infrastructure for GDT descriptorsVegard Nossum
Linus suggested replacing the magic numbers in the GDT descriptors using preprocessor macros. Designing the interface properly is actually pretty hard -- there are several constraints: - you want the final expressions to be readable at a glance; something like GDT_ENTRY_FLAGS(5, 1, 0, 1, 0, 1, 1, 0) isn't because you need to visit the definition to understand what each parameter represents and then match up parameters in the user and the definition (which is hard when there are so many of them) - you want the final expressions to be fairly short/information-dense; something like GDT_ENTRY_PRESENT | GDT_ENTRY_DATA_WRITABLE | GDT_ENTRY_SYSTEM | GDT_ENTRY_DB | GDT_ENTRY_GRANULARITY_4K is a bit too verbose to write out every time and is actually hard to read as well because of all the repetition - you may want to assume defaults for some things (e.g. entries are DPL-0 a.k.a. kernel segments by default) and allow the user to override the default -- but this works best if you can OR in the override; if you want DPL-3 by default and override with DPL-0 you would need to start masking off bits instead of OR-ing them in and that just becomes harder to read - you may want to parameterize some things (e.g. CODE vs. DATA or KERNEL vs. USER) since both values are used and you don't really want prefer either one by default -- or DPL, which is always some value that is always specified This patch tries to balance these requirements and has two layers of definitions -- low-level and high-level: - the low-level defines are the mapping between human-readable names and the actual bit numbers - the high-level defines are the mapping from high-level intent to combinations of low-level flags, representing roughly a tuple (data/code/tss, 64/32/16-bits) plus an override for DPL-3 (= USER), since that's relatively rare but still very important to mark properly for those segments. - we have *_BIOS variants for 32-bit code and data segments that don't have the G flag set and give the limit in terms of bytes instead of pages [ mingo: Improved readability bit more. ] Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-2-vegard.nossum@oracle.com
2023-12-20netfilter: nf_tables: set transport offset from mac header for netdev/egressPablo Neira Ayuso
Before this patch, transport offset (pkt->thoff) provides an offset relative to the network header. This is fine for the inet families because skb->data points to the network header in such case. However, from netdev/egress, skb->data points to the mac header (if available), thus, pkt->thoff is missing the mac header length. Add skb_network_offset() to the transport offset (pkt->thoff) for netdev, so transport header mangling works as expected. Adjust payload fast eval function to use skb->data now that pkt->thoff provides an absolute offset. This explains why users report that matching on egress/netdev works but payload mangling does not. This patch implicitly fixes payload mangling for IPv4 packets in netdev/egress given skb_store_bits() requires an offset from skb->data to reach the transport header. I suspect that nft_exthdr and the trace infra were also broken from netdev/egress because they also take skb->data as start, and pkt->thoff was not correct. Note that IPv6 is fine because ipv6_find_hdr() already provides a transport offset starting from skb->data, which includes skb_network_offset(). The bridge family also uses nft_set_pktinfo_ipv4_validate(), but there skb_network_offset() is zero, so the update in this patch does not alter the existing behaviour. Fixes: 42df6e1d221d ("netfilter: Introduce egress hook") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-20mtd: rawnand: s3c2410: fix Excess struct member description kernel-doc warningsRandy Dunlap
Delete 2 lines to prevent warnings from scripts/kernel-doc: s3c2410.c:117: warning: Excess struct member 'mtd' description in 's3c2410_nand_mtd' s3c2410.c:168: warning: Excess struct member 'freq_transition' description in 's3c2410_nand_info' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312150611.EZBAQYqf-lkp@intel.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: linux-mtd@lists.infradead.org Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Alim Akhtar <alim.akhtar@samsung.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20231216044146.18645-1-rdunlap@infradead.org
2023-12-20MAINTAINERS: change my mail to the kernel.org oneMichael Walle
As I'm doing more and more work professionally, move away from my private mail address. Signed-off-by: Michael Walle <michael@walle.cc> Link: https://lore.kernel.org/r/20231219091218.2846297-1-michael@walle.cc Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
2023-12-20mtd: spi-nor: sfdp: get the 1-1-8 and 1-8-8 protocol from SFDPJaimeLiao
BFPT 17th DWORD contains the information about 1-1-8 and 1-8-8. Parse BFPT DWORD[17] instruction to determine whether flash supports 1-1-8 and 1-8-8, and set its dummy cycles accordingly. Validated only the 1-1-8 read using a macronix flash with Xilinx board zynq-picozed. Signed-off-by: JaimeLiao <jaimeliao@mxic.com.tw> Reviewed-by: Michael Walle <mwalle@kernel.org> Link: https://lore.kernel.org/r/20231219102103.92738-2-jaimeliao.tw@gmail.com [ta: update commit message, get rid of extra dereference] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
2023-12-19selftests/bpf: Close cgrp fd before calling cleanup_cgroup_environment()Hou Tao
There is error log when htab-mem benchmark completes. The error log looks as follows: $ ./bench htab-mem -d1 Setting up benchmark 'htab-mem'... Benchmark 'htab-mem' started. ...... (cgroup_helpers.c:353: errno: Device or resource busy) umount cgroup2 Fix it by closing cgrp fd before invoking cleanup_cgroup_environment(). Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231219135727.2661527-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19Merge branch 'enhance-bpf-global-subprogs-with-argument-tags'Alexei Starovoitov
Andrii Nakryiko says: ==================== Enhance BPF global subprogs with argument tags This patch set adds verifier support for annotating user's global BPF subprog arguments with few commonly requested annotations, to improve global subprog verification experience. These tags are: - ability to annotate a special PTR_TO_CTX argument; - ability to annotate a generic PTR_TO_MEM as non-null. We utilize btf_decl_tag attribute for this and provide two helper macros as part of bpf_helpers.h in libbpf (patch #8). Besides this we also add abilit to pass a pointer to dynptr into global subprog. This is done based on type name match (struct bpf_dynptr *). This allows to pass dynptrs into global subprogs, for use cases that deal with variable-sized generic memory pointers. Big chunk of the patch set (patches #1 through #5) are various refactorings to make verifier internals around global subprog validation logic easier to extend and support long term, eliminating BTF parsing logic duplication, factoring out argument expectation definitions from BTF parsing, etc. New functionality is added in patch #6 (ctx and non-null) and patch #7 (dynptr), extending global subprog checks with awareness for arg tags. Patch #9 adds simple tests validating each of the added tags and dynptr argument passing. Patch #10 adds a simple negative case for freplace programs to make sure that target BPF programs with "unreliable" BTF func proto cannot be freplaced. v2->v3: - patch #10 improved by checking expected verifier error (Eduard); v1->v2: - dropped packet args for now (Eduard); - added back unreliable=true detection for entry BPF programs (Eduard); - improved subprog arg validation (Eduard); - switched dynptr arg from tag to just type name based check (Eduard). ==================== Link: https://lore.kernel.org/r/20231215011334.2307144-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19selftests/bpf: add freplace of BTF-unreliable main prog testAndrii Nakryiko
Add a test validating that freplace'ing another main (entry) BPF program fails if the target BPF program doesn't have valid/expected func proto BTF. We extend fexit_bpf2bpf test to allow to specify expected log message for negative test cases (where freplace program is expected to fail to load). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19selftests/bpf: add global subprog annotation testsAndrii Nakryiko
Add test cases to validate semantics of global subprog argument annotations: - non-null pointers; - context argument; - const dynptr passing; - packet pointers (data, metadata, end). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19libbpf: add __arg_xxx macros for annotating global func argsAndrii Nakryiko
Add a set of __arg_xxx macros which can be used to augment BPF global subprogs/functions with extra information for use by BPF verifier. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: add support for passing dynptr pointer to global subprogAndrii Nakryiko
Add ability to pass a pointer to dynptr into global functions. This allows to have global subprogs that accept and work with generic dynptrs that are created by caller. Dynptr argument is detected based on the name of a struct type, if it's "bpf_dynptr", it's assumed to be a proper dynptr pointer. Both actual struct and forward struct declaration types are supported. This is conceptually exactly the same semantics as bpf_user_ringbuf_drain()'s use of dynptr to pass a variable-sized pointer to ringbuf record. So we heavily rely on CONST_PTR_TO_DYNPTR bits of already existing logic in the verifier. During global subprog validation, we mark such CONST_PTR_TO_DYNPTR as having LOCAL type, as that's the most unassuming type of dynptr and it doesn't have any special helpers that can try to free or acquire extra references (unlike skb, xdp, or ringbuf dynptr). So that seems like a safe "choice" to make from correctness standpoint. It's still possible to pass any type of dynptr to such subprog, though, because generic dynptr helpers, like getting data/slice pointers, read/write memory copying routines, dynptr adjustment and getter routines all work correctly with any type of dynptr. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: support 'arg:xxx' btf_decl_tag-based hints for global subprog argsAndrii Nakryiko
Add support for annotating global BPF subprog arguments to provide more information about expected semantics of the argument. Currently, verifier relies purely on argument's BTF type information, and supports three general use cases: scalar, pointer-to-context, and pointer-to-fixed-size-memory. Scalar and pointer-to-fixed-mem work well in practice and are quite natural to use. But pointer-to-context is a bit problematic, as typical BPF users don't realize that they need to use a special type name to signal to verifier that argument is not just some pointer, but actually a PTR_TO_CTX. Further, even if users do know which type to use, it is limiting in situations where the same BPF program logic is used across few different program types. Common case is kprobes, tracepoints, and perf_event programs having a helper to send some data over BPF perf buffer. bpf_perf_event_output() requires `ctx` argument, and so it's quite cumbersome to share such global subprog across few BPF programs of different types, necessitating extra static subprog that is context type-agnostic. Long story short, there is a need to go beyond types and allow users to add hints to global subprog arguments to define expectations. This patch adds such support for two initial special tags: - pointer to context; - non-null qualifier for generic pointer arguments. All of the above came up in practice already and seem generally useful additions. Non-null qualifier is an often requested feature, which currently has to be worked around by having unnecessary NULL checks inside subprogs even if we know that arguments are never NULL. Pointer to context was discussed earlier. As for implementation, we utilize btf_decl_tag attribute and set up an "arg:xxx" convention to specify argument hint. As such: - btf_decl_tag("arg:ctx") is a PTR_TO_CTX hint; - btf_decl_tag("arg:nonnull") marks pointer argument as not allowed to be NULL, making NULL check inside global subprog unnecessary. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: reuse subprog argument parsing logic for subprog call checksAndrii Nakryiko
Remove duplicated BTF parsing logic when it comes to subprog call check. Instead, use (potentially cached) results of btf_prepare_func_args() to abstract away expectations of each subprog argument in generic terms (e.g., "this is pointer to context", or "this is a pointer to memory of size X"), and then use those simple high-level argument type expectations to validate actual register states to check if they match expectations. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: move subprog call logic back to verifier.cAndrii Nakryiko
Subprog call logic in btf_check_subprog_call() currently has both a lot of BTF parsing logic (which is, presumably, what justified putting it into btf.c), but also a bunch of register state checks, some of each utilize deep verifier logic helpers, necessarily exported from verifier.c: check_ptr_off_reg(), check_func_arg_reg_off(), and check_mem_reg(). Going forward, btf_check_subprog_call() will have a minimum of BTF-related logic, but will get more internal verifier logic related to register state manipulation. So move it into verifier.c to minimize amount of verifier-specific logic exposed to btf.c. We do this move before refactoring btf_check_func_arg_match() to preserve as much history post-refactoring as possible. No functional changes. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: prepare btf_prepare_func_args() for handling static subprogsAndrii Nakryiko
Generalize btf_prepare_func_args() to support both global and static subprogs. We are going to utilize this property in the next patch, reusing btf_prepare_func_args() for subprog call logic instead of reparsing BTF information in a completely separate implementation. btf_prepare_func_args() now detects whether subprog is global or static makes slight logic adjustments for static func cases, like not failing fatally (-EFAULT) for conditions that are allowable for static subprogs. Somewhat subtle (but major!) difference is the handling of pointer arguments. Both global and static functions need to handle special context arguments (which are pointers to predefined type names), but static subprogs give up on any other pointers, falling back to marking subprog as "unreliable", disabling the use of BTF type information altogether. For global functions, though, we are assuming that such pointers to unrecognized types are just pointers to fixed-sized memory region (or error out if size cannot be established, like for `void *` pointers). This patch accommodates these small differences and sets up a stage for refactoring in the next patch, eliminating a separate BTF-based parsing logic in btf_check_func_arg_match(). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: reuse btf_prepare_func_args() check for main program BTF validationAndrii Nakryiko
Instead of btf_check_subprog_arg_match(), use btf_prepare_func_args() logic to validate "trustworthiness" of main BPF program's BTF information, if it is present. We ignored results of original BTF check anyway, often times producing confusing and ominously-sounding "reg type unsupported for arg#0 function" message, which has no apparent effect on program correctness and verification process. All the -EFAULT returning sanity checks are already performed in check_btf_info_early(), so there is zero reason to have this duplication of logic between btf_check_subprog_call() and btf_check_subprog_arg_match(). Dropping btf_check_subprog_arg_match() simplifies btf_check_func_arg_match() further removing `bool processing_call` flag. One subtle bit that was done by btf_check_subprog_arg_match() was potentially marking main program's BTF as unreliable. We do this explicitly now with a dedicated simple check, preserving the original behavior, but now based on well factored btf_prepare_func_args() logic. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: abstract away global subprog arg preparation logic from reg state setupAndrii Nakryiko
btf_prepare_func_args() is used to understand expectations and restrictions on global subprog arguments. But current implementation is hard to extend, as it intermixes BTF-based func prototype parsing and interpretation logic with setting up register state at subprog entry. Worse still, those registers are not completely set up inside btf_prepare_func_args(), requiring some more logic later in do_check_common(). Like calling mark_reg_unknown() and similar initialization operations. This intermixing of BTF interpretation and register state setup is problematic. First, it causes duplication of BTF parsing logic for global subprog verification (to set up initial state of global subprog) and global subprog call sites analysis (when we need to check that whatever is being passed into global subprog matches expectations), performed in btf_check_subprog_call(). Given we want to extend global func argument with tags later, this duplication is problematic. So refactor btf_prepare_func_args() to do only BTF-based func proto and args parsing, returning high-level argument "expectations" only, with no regard to specifics of register state. I.e., if it's a context argument, instead of setting register state to PTR_TO_CTX, we return ARG_PTR_TO_CTX enum for that argument as "an argument specification" for further processing inside do_check_common(). Similarly for SCALAR arguments, PTR_TO_MEM, etc. This allows to reuse btf_prepare_func_args() in following patches at global subprog call site analysis time. It also keeps register setup code consistently in one place, do_check_common(). Besides all this, we cache this argument specs information inside env->subprog_info, eliminating the need to redo these potentially expensive BTF traversals, especially if BPF program's BTF is big and/or there are lots of global subprog calls. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19Merge branch 'bpf-support-to-track-bpf_jne'Alexei Starovoitov
Menglong Dong says: ==================== bpf: support to track BPF_JNE For now, the reg bounds is not handled for BPF_JNE case, which can cause the failure of following case: /* The type of "a" is u32 */ if (a > 0 && a < 100) { /* the range of the register for a is [0, 99], not [1, 99], * and will cause the following error: * * invalid zero-sized read * * as a can be 0. */ bpf_skb_store_bytes(skb, xx, xx, a, 0); } In the code above, "a > 0" will be compiled to "if a == 0 goto xxx". In the TRUE branch, the dst_reg will be marked as known to 0. However, in the fallthrough(FALSE) branch, the dst_reg will not be handled, which makes the [min, max] for a is [0, 99], not [1, 99]. In the 1st patch, we reduce the range of the dst reg if the src reg is a const and is exactly the edge of the dst reg For BPF_JNE. In the 2nd patch, we remove reduplicated s32 casting in "crafted_cases". In the 3rd patch, we just activate the test case for this logic in range_cond(), which is committed by Andrii in the commit 8863238993e2 ("selftests/bpf: BPF register range bounds tester"). In the 4th patch, we convert the case above to a testcase and add it to verifier_bounds.c. Changes since v4: - add the 2nd patch - add "{U32, U32, {0, U32_MAX}, {U32_MAX, U32_MAX}}" that we missed in the 3rd patch - add some comments to the function that we add in the 4th patch - add reg_not_equal_const() in the 4th patch Changes since v3: - do some adjustment to the crafted cases that we added in the 2nd patch - add the 3rd patch Changes since v2: - fix a typo in the subject of the 1st patch - add some comments to the 1st patch, as Eduard advised - add some cases to the "crafted_cases" Changes since v1: - simplify the code in the 1st patch - introduce the 2nd patch for the testing ==================== Link: https://lore.kernel.org/r/20231219134800.1550388-1-menglong8.dong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19selftests/bpf: add testcase to verifier_bounds.c for BPF_JNEMenglong Dong
Add testcase for the logic that the verifier tracks the BPF_JNE for regs. The assembly function "reg_not_equal_const()" and "reg_equal_const" that we add is exactly converted from the following case: u32 a = bpf_get_prandom_u32(); u64 b = 0; a %= 8; /* the "a > 0" here will be optimized to "a != 0" */ if (a > 0) { /* now the range of a should be [1, 7] */ bpf_skb_store_bytes(skb, 0, &b, a, 0); } Signed-off-by: Menglong Dong <menglong8.dong@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231219134800.1550388-5-menglong8.dong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19selftests/bpf: activate the OP_NE logic in range_cond()Menglong Dong
The edge range checking for the registers is supported by the verifier now, so we can activate the extended logic in tools/testing/selftests/bpf/prog_tests/reg_bounds.c/range_cond() to test such logic. Besides, I added some cases to the "crafted_cases" array for this logic. These cases are mainly used to test the edge of the src reg and dst reg. All reg bounds testings has passed in the SLOW_TESTS mode: $ export SLOW_TESTS=1 && ./test_progs -t reg_bounds -j Summary: 65/18959832 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Menglong Dong <menglong8.dong@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231219134800.1550388-4-menglong8.dong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19selftests/bpf: remove reduplicated s32 casting in "crafted_cases"Menglong Dong
The "S32_MIN" is already defined with s32 casting, so there is no need to do it again. Signed-off-by: Menglong Dong <menglong8.dong@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231219134800.1550388-3-menglong8.dong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bpf: make the verifier tracks the "not equal" for regsMenglong Dong
We can derive some new information for BPF_JNE in regs_refine_cond_op(). Take following code for example: /* The type of "a" is u32 */ if (a > 0 && a < 100) { /* the range of the register for a is [0, 99], not [1, 99], * and will cause the following error: * * invalid zero-sized read * * as a can be 0. */ bpf_skb_store_bytes(skb, xx, xx, a, 0); } In the code above, "a > 0" will be compiled to "jmp xxx if a == 0". In the TRUE branch, the dst_reg will be marked as known to 0. However, in the fallthrough(FALSE) branch, the dst_reg will not be handled, which makes the [min, max] for a is [0, 99], not [1, 99]. For BPF_JNE, we can reduce the range of the dst reg if the src reg is a const and is exactly the edge of the dst reg. Signed-off-by: Menglong Dong <menglong8.dong@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/r/20231219134800.1550388-2-menglong8.dong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19bcachefs: fix BCH_FSCK_ERR enumKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19bcachefs: Fix bch2_alloc_sectors_start_trans() error handlingKent Overstreet
When we fail to allocate because of insufficient open buckets, we don't want to retry from the full set of devices - we just want to retry in blocking mode. But if the retry in blocking mode fails with a different error code, we end up squashing the -BCH_ERR_open_buckets_empty error with an error that makes us thing we won't be able to allocate (insufficient_devices) - which is incorrect when we didn't try to allocate from the full set of devices, and causes the write to fail. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19module: Remove redundant TASK_UNINTERRUPTIBLEKevin Hao
TASK_KILLABLE already includes TASK_UNINTERRUPTIBLE, so there is no need to add a separate TASK_UNINTERRUPTIBLE. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-12-19bcachefs; guard against overflow in btree node splitKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19bcachefs: btree_node_u64s_with_format() takes nr keysKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19i2c: aspeed: Handle the coalesced stop conditions with the start conditions.Quan Nguyen
Some masters may drive the transfers with low enough latency between the nak/stop phase of the current command and the start/address phase of the following command that the interrupts are coalesced by the time we process them. Handle the stop conditions before processing SLAVE_MATCH to fix the complaints that sometimes occur below. "aspeed-i2c-bus 1e78a040.i2c-bus: irq handled != irq. Expected 0x00000086, but was 0x00000084" Fixes: f9eb91350bb2 ("i2c: aspeed: added slave support for Aspeed I2C driver") Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com> Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>