summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-29btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()Zhihao Cheng
Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable@vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-29selftests/bpf: Add test for trie_get_next_key()Byeonguk Jeong
Add a test for out-of-bounds write in trie_get_next_key() when a full path from root to leaf exists and bpf_map_get_next_key() is called with the leaf node. It may crashes the kernel on failure, so please run in a VM. Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/Zxx4ep78tsbeWPVM@localhost.localdomain Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-29bpf: Fix out-of-bounds write in trie_get_next_key()Byeonguk Jeong
trie_get_next_key() allocates a node stack with size trie->max_prefixlen, while it writes (trie->max_prefixlen + 1) nodes to the stack when it has full paths from the root to leaves. For example, consider a trie with max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ... 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with .prefixlen = 8 make 9 nodes be written on the node stack with size 8. Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map") Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org> Tested-by: Hou Tao <houtao1@huawei.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/Zxx384ZfdlFYnz6J@localhost.localdomain Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-29io_uring/rsrc: kill io_charge_rsrc_node()Jens Axboe
It's only used from __io_req_set_rsrc_node(), and it takes both the ctx and node itself, while never using the ctx. Just open-code the basic refs++ in __io_req_set_rsrc_node() instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/splice: open code 2nd direct file assignmentJens Axboe
In preparation for not pinning the whole registered file table, open code the second potential direct file assignment. This will be handled by appropriate helpers in the future, for now just do it manually. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: specify freeptr usage for SLAB_TYPESAFE_BY_RCU io_kiocb cacheJens Axboe
Doesn't matter right now as there's still some bytes left for it, but let's prepare for the io_kiocb potentially growing and add a specific freeptr offset for it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/rsrc: move struct io_fixed_file to rsrc.h headerJens Axboe
There's no need for this internal structure to be visible, move it to the private rsrc.h header instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/nop: add support for testing registered files and buffersJens Axboe
Useful for testing performance/efficiency impact of registered files and buffers, vs (particularly) non-registered files. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: add support for fixed wait regionsJens Axboe
Generally applications have 1 or a few waits of waiting, yet they pass in a struct io_uring_getevents_arg every time. This needs to get copied and, in turn, the timeout value needs to get copied. Rather than do this for every invocation, allow the application to register a fixed set of wait regions that can simply be indexed when asking the kernel to wait on events. At ring setup time, the application can register a number of these wait regions and initialize region/index 0 upfront: struct io_uring_reg_wait *reg; reg = io_uring_setup_reg_wait(ring, nr_regions, &ret); /* set timeout and mark as set, sigmask/sigmask_sz as needed */ reg->ts.tv_sec = 0; reg->ts.tv_nsec = 100000; reg->flags = IORING_REG_WAIT_TS; where nr_regions >= 1 && nr_regions <= PAGE_SIZE / sizeof(*reg). The above initializes index 0, but 63 other regions can be initialized, if needed. Now, instead of doing: struct __kernel_timespec timeout = { .tv_nsec = 100000, }; io_uring_submit_and_wait_timeout(ring, &cqe, nr, &t, NULL); to wait for events for each submit_and_wait, or just wait, operation, it can just reference the above region at offset 0 and do: io_uring_submit_and_wait_reg(ring, &cqe, nr, 0); to achieve the same goal of waiting 100usec without needing to copy both struct io_uring_getevents_arg (24b) and struct __kernel_timeout (16b) for each invocation. Struct io_uring_reg_wait looks as follows: struct io_uring_reg_wait { struct __kernel_timespec ts; __u32 min_wait_usec; __u32 flags; __u64 sigmask; __u32 sigmask_sz; __u32 pad[3]; __u64 pad2[2]; }; embedding the timeout itself in the region, rather than passing it as a pointer as well. Note that the signal mask is still passed as a pointer, both for compatability reasons, but also because there doesn't seem to be a lot of high frequency waits scenarios that involve setting and resetting the signal mask for each wait. The application is free to modify any region before a wait call, or it can use keep multiple regions with different settings to avoid needing to modify the same one for wait calls. Up to a page size of regions is mapped by default, allowing PAGE_SIZE / 64 available regions for use. The registered region must fit within a page. On a 4kb page size system, that allows for 64 wait regions if a full page is used, as the size of struct io_uring_reg_wait is 64b. The region registered must be aligned to io_uring_reg_wait in size. It's valid to register less than 64 entries. In network performance testing with zero-copy, this reduced the time spent waiting on the TX side from 3.12% to 0.3% and the RX side from 4.4% to 0.3%. Wait regions are fixed for the lifetime of the ring - once registered, they are persistent until the ring is torn down. The regions support minimum wait timeout as well as the regular waits. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: change io_get_ext_arg() to use uaccess begin + endJens Axboe
In scenarios where a high frequency of wait events are seen, the copy of the struct io_uring_getevents_arg is quite noticeable in the profiles in terms of time spent. It can be seen as up to 3.5-4.5%. Rewrite the copy-in logic, saving about 0.5% of the time. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: switch struct ext_arg from __kernel_timespec to timespec64Jens Axboe
This avoids intermediate storage for turning a __kernel_timespec user pointer into an on-stack struct timespec64, only then to turn it into a ktime_t. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/sqpoll: wait on sqd->wait for thread parkingJens Axboe
io_sqd_handle_event() just does a mutex unlock/lock dance when it's supposed to park, somewhat relying on full ordering with the thread trying to park it which does a similar unlock/lock dance on sqd->lock. However, with adaptive spinning on mutexes, this can waste an awful lot of time. Normally this isn't very noticeable, as parking and unparking the thread isn't a common (or fast path) occurence. However, in testing ring resizing, it's testing exactly that, as each resize will require the SQPOLL to safely park and unpark. Have io_sq_thread_park() explicitly wait on sqd->park_pending being zero before attempting to grab the sqd->lock again. In a resize test, this brings the runtime of SQPOLL down from about 60 seconds to a few seconds, just like the !SQPOLL tests. And saves a ton of spinning time on the mutex, on both sides. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/register: add IORING_REGISTER_RESIZE_RINGSJens Axboe
Once a ring has been created, the size of the CQ and SQ rings are fixed. Usually this isn't a problem on the SQ ring side, as it merely controls the available number of requests that can be submitted in a single system call, and there's rarely a need to change that. For the CQ ring, it's a different story. For most efficient use of io_uring, it's important that the CQ ring never overflows. This means that applications must size it for the worst case scenario, which can be wasteful. Add IORING_REGISTER_RESIZE_RINGS, which allows an application to resize the existing rings. It takes a struct io_uring_params argument, the same one which is used to setup the ring initially, and resizes rings according to the sizes given. Certain properties are always inherited from the original ring setup, like SQE128/CQE32 and other setup options. The implementation only allows flag associated with how the CQ ring is sized and clamped. Existing unconsumed SQE and CQE entries are copied as part of the process. If either the SQ or CQ resized destination ring cannot hold the entries already present in the source rings, then the operation is failed with -EOVERFLOW. Any register op holds ->uring_lock, which prevents new submissions, and the internal mapping holds the completion lock as well across moving CQ ring state. To prevent races between mmap and ring resizing, add a mutex that's solely used to serialize ring resize and mmap. mmap_sem can't be used here, as as fork'ed process may be doing mmaps on the ring as well. The ctx->resize_lock is held across mmap operations, and the resize will grab it before swapping out the already mapped new data. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/memmap: explicitly return -EFAULT for mmap on NULL ringsJens Axboe
The later mapping will actually check this too, but in terms of code clarify, explicitly check for whether or not the rings and sqes are valid during validation. That makes it explicit that if they are non-NULL, they are valid and can get mapped. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: abstract out a bit of the ring filling logicJens Axboe
Abstract out a io_uring_fill_params() helper, which fills out the necessary bits of struct io_uring_params. Add it to io_uring.h as well, in preparation for having another internal user of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: move max entry definition and ring sizing into headerJens Axboe
In preparation for needing this somewhere else, move the definitions for the maximum CQ and SQ ring size into io_uring.h. Make the rings_size() helper available as well, and have it take just the setup flags argument rather than the fill ring pointer. That's all that is needed. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/net: clean up io_msg_copy_hdrPavel Begunkov
Put sr->umsg into a local variable, so it doesn't repeat "sr->umsg->" for every field. It looks nicer, and likely without the patch it compiles into a bunch of umsg memory reads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/26c2f30b491ea7998bfdb5bb290662572a61064d.1729607201.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/net: don't alias send user pointer readsPavel Begunkov
We keep user pointers in an union, which could be a user buffer or a user pointer to msghdr. What is confusing is that it potenitally reads and assigns sqe->addr as one type but then uses it as another via the union. Even more, it's not even consistent across copy and zerocopy versions. Make send and sendmsg setup helpers read sqe->addr and treat it as the right type from the beginning. The end goal would be to get rid of the use of struct io_sr_msg::umsg for send requests as we only need it at the prep side. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/685d788605f5d78af18802fcabf61ba65cfd8002.1729607201.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/net: don't store send address ptrPavel Begunkov
For non "msg" requests we copy the address at the prep stage and there is no need to store the address user pointer long term. Pass the SQE into io_send_setup(), let it parse it, and remove struct io_sr_msg addr addr_len fields. It saves some space and also less confusing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/db3dce544e17ca9d4b17d2506fbbac1da8a87824.1729607201.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/net: split send and sendmsg prep helpersPavel Begunkov
A preparation patch splitting io_sendmsg_prep_setup into two separate helpers for send and sendmsg variants. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1a2319471ba040e053b7f1d22f4af510d1118eca.1729607201.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: kill 'imu' from struct io_kiocbJens Axboe
It's no longer being used, remove it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/net: move send zc fixed buffer import to issue pathJens Axboe
Let's keep it close with the actual import, there's no reason to do this on the prep side. With that, we can drop one of the branches checking for whether or not IORING_RECVSEND_FIXED_BUF is set. As a side-effect, get rid of req->imu usage. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: remove 'issue_flags' argument for io_req_set_rsrc_node()Jens Axboe
All callers already hold the ring lock and hence are passing '0', remove the argument and the conditional locking that it controlled. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/rw: get rid of using req->imuJens Axboe
It's assigned in the same function that it's being used, get rid of it. A local variable will do just fine. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/uring_cmd: get rid of using req->imuJens Axboe
It's pretty pointless to use io_kiocb as intermediate storage for this, so split the validity check and the actual usage. The resource node is assigned upfront at prep time, to prevent it from going away. The actual import is never called with the ctx->uring_lock held, so grab it for the import. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/rsrc: don't assign bvec twice in io_import_fixed()Jens Axboe
iter->bvec is already set to imu->bvec - remove the one dead assignment and turn the other one into an addition instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: clean up cqe trace pointsPavel Begunkov
We have too many helpers posting CQEs, instead of tracing completion events before filling in a CQE and thus having to pass all the data, set the CQE first, pass it to the tracing helper and let it extract everything it needs. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b83c1ca9ee5aed2df0f3bb743bf5ed699cce4c86.1729267437.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: static_key for !IORING_SETUP_NO_SQARRAYPavel Begunkov
IORING_SETUP_NO_SQARRAY should be preferred and used by default by liburing, optimise flag checking in io_get_sqe() with a static key. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c164a48542fbb080115e2377ecf160c758562742.1729264988.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: kill io_llist_xchgPavel Begunkov
io_llist_xchg is only used to set the list to NULL, which can also be done with llist_del_all(). Use the latter and kill io_llist_xchg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d6765112680d2e86a58b76166b7513391ff4e5d7.1729264960.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring: move cancel hash tables to kvmalloc/kvfreeJens Axboe
Convert to using kvmalloc/kfree() for the hash tables, and while at it, make it handle low memory situations better. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/cancel: get rid of init_hash_table() helperJens Axboe
All it does is initialize the lists, just move the INIT_HLIST_HEAD() into the one caller. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/poll: get rid of per-hashtable bucket locksJens Axboe
Any access to the table is protected by ctx->uring_lock now anyway, the per-bucket locking doesn't buy us anything. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/poll: get rid of io_poll_tw_hash_eject()Jens Axboe
It serves no purposes anymore, all it does is delete the hash list entry. task_work always has the ring locked. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/poll: get rid of unlocked cancel hashJens Axboe
io_uring maintains two hash lists of inflight requests: 1) ctx->cancel_table_locked. This is used when the caller has the ctx->uring_lock held already. This is only an issue side parameter, as removal or task_work will always have it held. 2) ctx->cancel_table. This is used when the issuer does NOT have the ctx->uring_lock held, and relies on the table spinlocks for access. However, it's pretty trivial to simply grab the lock in the one spot where we care about it, for insertion. With that, we can kill the unlocked table (and get rid of the _locked postfix for the other one). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/poll: remove 'ctx' argument from io_poll_req_delete()Jens Axboe
It's always req->ctx being used anyway, having this as a separate argument (that is then not even used) just makes it more confusing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/msg_ring: add support for sending a sync messageJens Axboe
Normally MSG_RING requires both a source and a destination ring. But some users don't always have a ring avilable to send a message from, yet they still need to notify a target ring. Add support for using io_uring_register(2) without having a source ring, using a file descriptor of -1 for that. Internally those are called blind registration opcodes. Implement IORING_REGISTER_SEND_MSG_RING as a blind opcode, which simply takes an sqe that the application can put on the stack and use the normal liburing helpers to initialize it. Then the app can call: io_uring_register(-1, IORING_REGISTER_SEND_MSG_RING, &sqe, 1); and get the same behavior in terms of the target, where a CQE is posted with the details given in the sqe. For now this takes a single sqe pointer argument, and hence arg must be set to that, and nr_args must be 1. Could easily be extended to take an array of sqes, but for now let's keep it simple. Link: https://lore.kernel.org/r/20240924115932.116167-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/msg_ring: refactor a few helper functionsJens Axboe
Mostly just to skip them taking an io_kiocb, rather just pass in the ctx and io_msg directly. In preparation for being able to issue a MSG_RING request without having an io_kiocb. No functional changes in this patch. Link: https://lore.kernel.org/r/20240924115932.116167-2-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/eventfd: move ctx->evfd_last_cq_tail into io_ev_fdJens Axboe
Everything else about the io_uring eventfd support is nicely kept private to that code, except the cached_cq_tail tracking. With everything else in place, move io_eventfd_flush_signal() to using the ev_fd grab+release helpers, which then enables the direct use of io_ev_fd for this tracking too. Link: https://lore.kernel.org/r/20240921080307.185186-7-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/eventfd: abstract out ev_fd grab + release helpersJens Axboe
In preparation for needing the ev_fd grabbing (and releasing) from another path, abstract out two helpers for that. Link: https://lore.kernel.org/r/20240921080307.185186-6-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/eventfd: move trigger check into a helperJens Axboe
It's a bit hard to read what guards the triggering, move it into a helper and add a comment explaining it too. This additionally moves the ev_fd == NULL check in there as well. Link: https://lore.kernel.org/r/20240921080307.185186-5-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/eventfd: move actual signaling part into separate helperJens Axboe
In preparation for using this from multiple spots, move the signaling into a helper. Link: https://lore.kernel.org/r/20240921080307.185186-4-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/eventfd: check for the need to async notifier earlierJens Axboe
It's not necessary to do this post grabbing a reference. With that, we can drop the out goto path as well. Link: https://lore.kernel.org/r/20240921080307.185186-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29io_uring/eventfd: abstract out ev_fd put helperJens Axboe
We call this in two spot, have a helper for it. In preparation for extending this part. Link: https://lore.kernel.org/r/20240921080307.185186-2-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29NFSD: Initialize struct nfsd4_copy earlierChuck Lever
Ensure the refcount and async_copies fields are initialized early. cleanup_async_copy() will reference these fields if an error occurs in nfsd4_copy(). If they are not correctly initialized, at the very least, a refcount underflow occurs. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-29wcd937x codec fixesMark Brown
Merge series from Alexey Klimov <alexey.klimov@linaro.org>: This sent as RFC because of the following: - regarding the LO switch patch. I've got info about that from two persons independently hence not sure what tags to put there and who should be the author. Please let me know if that needs to be corrected. - the wcd937x pdm watchdog is a problem for audio playback and needs to be fixed. The minimal fix would be to at least increase timeout value but it will still trigger in case of plenty of dbg messages or other delay-generating things. Unfortunately, I can't test HPHL/R outputs hence the patch is only for AUX. The other options would be introducing module parameter for debugging and using HOLD_OFF bit for that or adding Kconfig option. Alexey Klimov (2): ASoC: codecs: wcd937x: add missing LO Switch control ASoC: codecs: wcd937x: relax the AUX PDM watchdog sound/soc/codecs/wcd937x.c | 12 ++++++++++-- sound/soc/codecs/wcd937x.h | 4 ++++ 2 files changed, 14 insertions(+), 2 deletions(-) -- 2.45.2
2024-10-29net: usb: qmi_wwan: add Quectel RG650VBenoît Monin
Add support for Quectel RG650V which is based on Qualcomm SDX65 chip. The composition is DIAG / NMEA / AT / AT / QMI. T: Bus=02 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 4 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1 P: Vendor=2c7c ProdID=0122 Rev=05.15 S: Manufacturer=Quectel S: Product=RG650V-EU S: SerialNumber=xxxxxxx C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=9ms I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=9ms I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=9ms Signed-off-by: Benoît Monin <benoit.monin@gmx.fr> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241024151113.53203-1-benoit.monin@gmx.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29net/sched: sch_api: fix xa_insert() error path in tcf_block_get_ext()Vladimir Oltean
This command: $ tc qdisc replace dev eth0 ingress_block 1 egress_block 1 clsact Error: block dev insert failed: -EBUSY. fails because user space requests the same block index to be set for both ingress and egress. [ side note, I don't think it even failed prior to commit 913b47d3424e ("net/sched: Introduce tc block netdev tracking infra"), because this is a command from an old set of notes of mine which used to work, but alas, I did not scientifically bisect this ] The problem is not that it fails, but rather, that the second time around, it fails differently (and irrecoverably): $ tc qdisc replace dev eth0 ingress_block 1 egress_block 1 clsact Error: dsa_core: Flow block cb is busy. [ another note: the extack is added by me for illustration purposes. the context of the problem is that clsact_init() obtains the same &q->ingress_block pointer as &q->egress_block, and since we call tcf_block_get_ext() on both of them, "dev" will be added to the block->ports xarray twice, thus failing the operation: once through the ingress block pointer, and once again through the egress block pointer. the problem itself is that when xa_insert() fails, we have emitted a FLOW_BLOCK_BIND command through ndo_setup_tc(), but the offload never sees a corresponding FLOW_BLOCK_UNBIND. ] Even correcting the bad user input, we still cannot recover: $ tc qdisc replace dev swp3 ingress_block 1 egress_block 2 clsact Error: dsa_core: Flow block cb is busy. Basically the only way to recover is to reboot the system, or unbind and rebind the net device driver. To fix the bug, we need to fill the correct error teardown path which was missed during code movement, and call tcf_block_offload_unbind() when xa_insert() fails. [ last note, fundamentally I blame the label naming convention in tcf_block_get_ext() for the bug. The labels should be named after what they do, not after the error path that jumps to them. This way, it is obviously wrong that two labels pointing to the same code mean something is wrong, and checking the code correctness at the goto site is also easier ] Fixes: 94e2557d086a ("net: sched: move block device tracking into tcf_block_get/put_ext()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20241023100541.974362-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29netdevsim: Add trailing zero to terminate the string in ↵Zichen Xie
nsim_nexthop_bucket_activity_write() This was found by a static analyzer. We should not forget the trailing zero after copy_from_user() if we will further do some string operations, sscanf() in this case. Adding a trailing zero will ensure that the function performs properly. Fixes: c6385c0b67c5 ("netdevsim: Allow reporting activity on nexthop buckets") Signed-off-by: Zichen Xie <zichenxie0106@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20241022171907.8606-1-zichenxie0106@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29selftests/bpf: Test with a very short loopEduard Zingerman
The test added is a simplified reproducer from syzbot report [1]. If verifier does not insert checkpoint somewhere inside the loop, verification of the program would take a very long time. This would happen because mark_chain_precision() for register r7 would constantly trace jump history of the loop back, processing many iterations for each mark_chain_precision() call. [1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/ Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241029172641.1042523-2-eddyz87@gmail.com
2024-10-29bpf: Force checkpoint when jmp history is too longEduard Zingerman
A specifically crafted program might trick verifier into growing very long jump history within a single bpf_verifier_state instance. Very long jump history makes mark_chain_precision() unreasonably slow, especially in case if verifier processes a loop. Mitigate this by forcing new state in is_state_visited() in case if current state's jump history is too long. Use same constant as in `skip_inf_loop_check`, but multiply it by arbitrarily chosen value 2 to account for jump history containing not only information about jumps, but also information about stack access. For an example of problematic program consider the code below, w/o this patch the example is processed by verifier for ~15 minutes, before failing to allocate big-enough chunk for jmp_history. 0: r7 = *(u16 *)(r1 +0);" 1: r7 += 0x1ab064b9;" 2: if r7 & 0x702000 goto 1b; 3: r7 &= 0x1ee60e;" 4: r7 += r1;" 5: if r7 s> 0x37d2 goto +0;" 6: r0 = 0;" 7: exit;" Perf profiling shows that most of the time is spent in mark_chain_precision() ~95%. The easiest way to explain why this program causes problems is to apply the following patch: diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0c216e71cec7..4b4823961abe 100644 \--- a/include/linux/bpf.h \+++ b/include/linux/bpf.h \@@ -1926,7 +1926,7 @@ struct bpf_array { }; }; -#define BPF_COMPLEXITY_LIMIT_INSNS 1000000 /* yes. 1M insns */ +#define BPF_COMPLEXITY_LIMIT_INSNS 256 /* yes. 1M insns */ #define MAX_TAIL_CALL_CNT 33 /* Maximum number of loops for bpf_loop and bpf_iter_num. diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f514247ba8ba..75e88be3bb3e 100644 \--- a/kernel/bpf/verifier.c \+++ b/kernel/bpf/verifier.c \@@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) skip_inf_loop_check: if (!force_new_state && env->jmps_processed - env->prev_jmps_processed < 20 && - env->insn_processed - env->prev_insn_processed < 100) + env->insn_processed - env->prev_insn_processed < 100) { + verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur->jmp_history_cnt is %d\n", + env->insn_idx, + env->jmps_processed - env->prev_jmps_processed, + cur->jmp_history_cnt); add_new_state = false; + } goto miss; } /* If sl->state is a part of a loop and this loop's entry is a part of \@@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) if (!add_new_state) return 0; + verbose(env, "is_state_visited: new checkpoint at %d, resetting env->jmps_processed\n", + env->insn_idx); + /* There were no equivalent states, remember the current one. * Technically the current state is not proven to be safe yet, * but it will either reach outer most bpf_exit (which means it's safe) And observe verification log: ... is_state_visited: new checkpoint at 5, resetting env->jmps_processed 5: R1=ctx() R7=ctx(...) 5: (65) if r7 s> 0x37d2 goto pc+0 ; R7=ctx(...) 6: (b7) r0 = 0 ; R0_w=0 7: (95) exit from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0 6: R1=ctx() R7=ctx(...) R10=fp0 6: (b7) r0 = 0 ; R0_w=0 7: (95) exit is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74 from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0 1: R1=ctx() R7_w=scalar(...) R10=fp0 1: (07) r7 += 447767737 is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75 2: R7_w=scalar(...) 2: (45) if r7 & 0x702000 goto pc-2 ... mark_precise 152 steps for r7 ... 2: R7_w=scalar(...) is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75 1: (07) r7 += 447767737 is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76 2: R7_w=scalar(...) 2: (45) if r7 & 0x702000 goto pc-2 ... BPF program is too large. Processed 257 insn The log output shows that checkpoint at label (1) is never created, because it is suppressed by `skip_inf_loop_check` logic: a. When 'if' at (2) is processed it pushes a state with insn_idx (1) onto stack and proceeds to (3); b. At (5) checkpoint is created, and this resets env->{jmps,insns}_processed. c. Verification proceeds and reaches `exit`; d. State saved at step (a) is popped from stack and is_state_visited() considers if checkpoint needs to be added, but because env->{jmps,insns}_processed had been just reset at step (b) the `skip_inf_loop_check` logic forces `add_new_state` to false. e. Verifier proceeds with current state, which slowly accumulates more and more entries in the jump history. The accumulation of entries in the jump history is a problem because of two factors: - it eventually exhausts memory available for kmalloc() allocation; - mark_chain_precision() traverses the jump history of a state, meaning that if `r7` is marked precise, verifier would iterate ever growing jump history until parent state boundary is reached. (note: the log also shows a REG INVARIANTS VIOLATION warning upon jset processing, but that's another bug to fix). With this patch applied, the example above is rejected by verifier under 1s of time, reaching 1M instructions limit. The program is a simplified reproducer from syzbot report. Previous discussion could be found at [1]. The patch does not cause any changes in verification performance, when tested on selftests from veristat.cfg and cilium programs taken from [2]. [1] https://lore.kernel.org/bpf/20241009021254.2805446-1-eddyz87@gmail.com/ [2] https://github.com/anakryiko/cilium Changelog: - v1 -> v2: - moved patch to bpf tree; - moved force_new_state variable initialization after declaration and shortened the comment. v1: https://lore.kernel.org/bpf/20241018020307.1766906-1-eddyz87@gmail.com/ Fixes: 2589726d12a1 ("bpf: introduce bounded loops") Reported-by: syzbot+7e46cdef14bf496a3ab4@syzkaller.appspotmail.com Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241029172641.1042523-1-eddyz87@gmail.com Closes: https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/