summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-24io_uring: banish non-hot data to end of io_ring_ctxPavel Begunkov
Let's move all slow path, setup/init and so on fields to the end of io_ring_ctx, that makes ctx reorganisation later easier. That includes, page arrays used only on tear down, CQ overflow list, old provided buffer caches and used by io-wq poll hashes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fc471b63925a0bf90a34943c4d36163c523cfb43.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: move non aligned field to the endPavel Begunkov
Move not cache aligned fields down in io_ring_ctx, should change anything, but makes further refactoring easier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/518e95d7888e9d481b2c5968dcf3f23db9ea47a5.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: add option to remove SQ indirectionPavel Begunkov
Not many aware, but io_uring submission queue has two levels. The first level usually appears as sq_array and stores indexes into the actual SQ. To my knowledge, no one has ever seriously used it, nor liburing exposes it to users. Add IORING_SETUP_NO_SQARRAY, when set we don't bother creating and using the sq_array and SQ heads/tails will be pointing directly into the SQ. Improves memory footprint, in term of both allocations as well as cache usage, and also should make io_get_sqe() less branchy in the end. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0ffa3268a5ef61d326201ff43a233315c96312e0.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: compact SQ/CQ heads/tailsPavel Begunkov
Queues heads and tails cache line aligned. That makes sq, cq taking 4 lines or 5 lines if we include the rest of struct io_rings (e.g. sq_flags is frequently accessed). Since modern io_uring is mostly single threaded, it doesn't make much send to spread them as such, it wastes space and puts additional pressure on caches. Put them all into a single line. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9c8deddf9a7ed32069235a530d1e117fb460bc4c.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: force inline io_fill_cqe_reqPavel Begunkov
There are only 2 callers of io_fill_cqe_req left, and one of them is extremely hot. Force inline the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ffce4fc5e3521966def848a4d930586dfe33ae11.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: merge iopoll and normal completion pathsPavel Begunkov
io_do_iopoll() and io_submit_flush_completions() are pretty similar, both filling CQEs and then free a list of requests. Don't duplicate it and make iopoll use __io_submit_flush_completions(), which also helps with inlining and other optimisations. For that, we need to first find all completed iopoll requests and splice them from the iopoll list and then pass it down. This adds one extra list traversal, which should be fine as requests will stay hot in cache. CQ locking is already conditional, introduce ->lockless_cq and skip locking for IOPOLL as it's protected by ->uring_lock. We also add a wakeup optimisation for IOPOLL to __io_cq_unlock_post(), so it works just like io_cqring_ev_posted_iopoll(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3840473f5e8a960de35b77292026691880f6bdbc.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: reorder cqring_flush and wakeupsPavel Begunkov
Unlike in the past, io_commit_cqring_flush() doesn't do anything that may need io_cqring_wake() to be issued after, all requests it completes will go via task_work. Do io_commit_cqring_flush() after io_cqring_wake() to clean up __io_cq_unlock_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ed32dcfeec47e6c97bd6b18c152ddce5b218403f.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: optimise extra io_get_cqe null checkPavel Begunkov
If the cached cqe check passes in io_get_cqe*() it already means that the cqe we return is valid and non-zero, however the compiler is unable to optimise null checks like in io_fill_cqe_req(). Do a bit of trickery, return success/fail boolean from io_get_cqe*() and store cqe in the cqe parameter. That makes it do the right thing, erasing the check together with the introduced indirection. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/322ea4d3377d3d4efd8ae90ab8ed28a99f518210.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: refactor __io_get_cqe()Pavel Begunkov
Make __io_get_cqe simpler by not grabbing the cqe from refilled cached, but letting io_get_cqe() do it for us. That's cleaner and removes some duplication. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/74dc8fdf2657e438b2e05e1d478a3596924604e9.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: simplify big_cqe handlingPavel Begunkov
Don't keep big_cqe bits of req in a union with hash_node, find a separate space for it. It's bit safer, but also if we keep it always initialised, we can get rid of ugly REQ_F_CQE32_INIT handling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/447aa1b2968978c99e655ba88db536e903df0fe9.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: cqe init hardeningPavel Begunkov
io_kiocb::cqe stores the completion info which we'll memcpy to userspace, and we rely on callbacks and other later steps to populate it with right values. We have never had problems with that, but it would still be safer to zero it on allocation. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b16a3b64dde678686460d3c3792c3ba6d3d1bc7a.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-24io_uring: improve cqe !tracing hot pathPavel Begunkov
While looking at io_fill_cqe_req()'s asm I stumbled on our trace points turning into the chunk below: trace_io_uring_complete(req->ctx, req, req->cqe.user_data, req->cqe.res, req->cqe.flags, req->extra1, req->extra2); io_uring/io_uring.c:898: trace_io_uring_complete(req->ctx, req, req->cqe.user_data, movq 232(%rbx), %rdi # req_44(D)->big_cqe.extra2, _5 movq 224(%rbx), %rdx # req_44(D)->big_cqe.extra1, _6 movl 84(%rbx), %r9d # req_44(D)->cqe.D.81184.flags, _7 movl 80(%rbx), %r8d # req_44(D)->cqe.res, _8 movq 72(%rbx), %rcx # req_44(D)->cqe.user_data, _9 movq 88(%rbx), %rsi # req_44(D)->ctx, _10 ./arch/x86/include/asm/jump_label.h:27: asm_volatile_goto("1:" 1:jmp .L1772 # objtool NOPs this # ... It does a jump_label for actual tracing, but those 6 moves will stay there in the hottest io_uring path. As an optimisation, add a trace_io_uring_complete_enabled() check, which is also uses jump_labels, it tricks the compiler into behaving. It removes the junk without changing anything else int the hot path. Note: apparently, it's not only me noticing it, and people are also working it around. We should remove the check when it's solved generically or rework tracing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/555d8312644b3776f4be7e23f9b92943875c4bc7.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-25Merge tag 'drm-intel-fixes-2023-08-24' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix power consumption at s2idle on DG2 (Anshuman) - Fix documentation build warning (Jani) - Fix Display HPD (Imre) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZOdPRFSJpo0ErPX/@intel.com
2023-08-24regulator: userspace-consumer: Drop event support for this cycleMark Brown
Drop commit 22475bcc2083 ("regulator: userspace-consumer: Add regulator event support") since Zev Weiss points out that it leaks the constants we use for notifications out as ABI which isn't ideal, we should have something more abstracted there. There's a definite need for this feature but it needs some more work on the interface. Signed-off-by: Mark Brown <broonie@kernel.org Link: https://lore.kernel.org/r/20230824-regulator-remove-status-sysfs-v1-1-554956e8c1ca@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org
2023-08-24merge mm-hotfixes-stable into mm-stable to pick up depended-upon changesAndrew Morton
2023-08-24shmem: fix smaps BUG sleeping while atomicHugh Dickins
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if need_resched(): "BUG: sleeping function called from invalid context". Since shmem_partial_swap_usage() is designed to count across a range, but smaps_pte_hole_lookup() only calls it for a single page slot, just break out of the loop on the last or only page, before checking need_resched(). Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com Fixes: 230100321518 ("mm/smaps: simplify shmem handling of pte holes") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24selftests: cachestat: catch failing fsync test on tmpfsAndre Przywara
The cachestat kselftest runs a test on a normal file, which is created temporarily in the current directory. Among the tests it runs there is a call to fsync(), which is expected to clean all dirty pages used by the file. However the tmpfs filesystem implements fsync() as noop_fsync(), so the call will not even attempt to clean anything when this test file happens to live on a tmpfs instance. This happens in an initramfs, or when the current directory is in /dev/shm or sometimes /tmp. To avoid this test failing wrongly, use statfs() to check which filesystem the test file lives on. If that is "tmpfs", we skip the fsync() test. Since the fsync test is only one part of the "normal file" test, we now execute this twice, skipping the fsync part on the first call. This way only the second test, including the fsync part, would be skipped. Link: https://lkml.kernel.org/r/20230821160534.3414911-3-andre.przywara@arm.com Signed-off-by: Andre Przywara <andre.przywara@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24selftests: cachestat: test for cachestat availabilityAndre Przywara
Patch series "selftests: cachestat: fix run on older kernels", v2. I ran all kernel selftests on some test machine, and stumbled upon cachestat failing (among others). These patches fix the run on older kernels and when the current directory is on a tmpfs instance. This patch (of 2): As cachestat is a new syscall, it won't be available on older kernels, for instance those running on a development machine. At the moment the test reports all tests as "not ok" in this case. Test for the cachestat syscall availability first, before doing further tests, and bail out early with a TAP SKIP comment. This also uses the opportunity to add the proper TAP headers, and add one check for proper error handling (illegal file descriptor). Link: https://lkml.kernel.org/r/20230821160534.3414911-1-andre.przywara@arm.com Link: https://lkml.kernel.org/r/20230821160534.3414911-2-andre.przywara@arm.com Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24maple_tree: disable mas_wr_append() when other readers are possibleLiam R. Howlett
The current implementation of append may cause duplicate data and/or incorrect ranges to be returned to a reader during an update. Although this has not been reported or seen, disable the append write operation while the tree is in rcu mode out of an abundance of caution. During the analysis of the mas_next_slot() the following was artificially created by separating the writer and reader code: Writer: reader: mas_wr_append set end pivot updates end metata Detects write to last slot last slot write is to start of slot store current contents in slot overwrite old end pivot mas_next_slot(): read end metadata read old end pivot return with incorrect range store new value Alternatively: Writer: reader: mas_wr_append set end pivot updates end metata Detects write to last slot last lost write to end of slot store value mas_next_slot(): read end metadata read old end pivot read new end pivot return with incorrect range set old end pivot There may be other accesses that are not safe since we are now updating both metadata and pointers, so disabling append if there could be rcu readers is the safest action. Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24madvise:madvise_free_pte_range(): don't use mapcount() against large folio ↵Yin Fengwei
for sharing check Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio") replaced the page_mapcount() with folio_mapcount() to check whether the folio is shared by other mapping. It's not correct for large folios. folio_mapcount() returns the total mapcount of large folio which is not suitable to detect whether the folio is shared. Use folio_estimated_sharers() which returns a estimated number of shares. That means it's not 100% correct. It should be OK for madvise case here. User-visible effects is that the THP is skipped when user call madvise. But the correct behavior is THP should be split and processed then. NOTE: this change is a temporary fix to reduce the user-visible effects before the long term fix from David is ready. Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio") Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Reviewed-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio ↵Yin Fengwei
for sharing check Commit fc986a38b670 ("mm: huge_memory: convert madvise_free_huge_pmd to use a folio") replaced the page_mapcount() with folio_mapcount() to check whether the folio is shared by other mapping. It's not correct for large folios. folio_mapcount() returns the total mapcount of large folio which is not suitable to detect whether the folio is shared. Use folio_estimated_sharers() which returns a estimated number of shares. That means it's not 100% correct. It should be OK for madvise case here. User-visible effects is that the THP is skipped when user call madvise. But the correct behavior is THP should be split and processed then. NOTE: this change is a temporary fix to reduce the user-visible effects before the long term fix from David is ready. Link: https://lkml.kernel.org/r/20230808020917.2230692-3-fengwei.yin@intel.com Fixes: fc986a38b670 ("mm: huge_memory: convert madvise_free_huge_pmd to use a folio") Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Reviewed-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against ↵Yin Fengwei
large folio for sharing check Patch series "don't use mapcount() to check large folio sharing", v2. In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(), folio_mapcount() is used to check whether the folio is shared. But it's not correct as folio_mapcount() returns total mapcount of large folio. Use folio_estimated_sharers() here as the estimated number is enough. This patchset will fix the cases: User space application call madvise() with MADV_FREE, MADV_COLD and MADV_PAGEOUT for specific address range. There are THP mapped to the range. Without the patchset, the THP is skipped. With the patch, the THP will be split and handled accordingly. David reported the cow self test skip some cases because of MADV_PAGEOUT skip THP: https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel.com/T/#mbf0f2ec7fbe45da47526de1d7036183981691e81 and I confirmed this patchset make it work again. This patch (of 3): Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios") replaced the page_mapcount() with folio_mapcount() to check whether the folio is shared by other mapping. It's not correct for large folio. folio_mapcount() returns the total mapcount of large folio which is not suitable to detect whether the folio is shared. Use folio_estimated_sharers() which returns a estimated number of shares. That means it's not 100% correct. It should be OK for madvise case here. User-visible effects is that the THP is skipped when user call madvise. But the correct behavior is THP should be split and processed then. NOTE: this change is a temporary fix to reduce the user-visible effects before the long term fix from David is ready. Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios") Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Reviewed-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24kallsyms: Add more debug output for selftestKees Cook
While debugging a recent kallsyms_selftest failure[1], I needed more details on what specifically was failing. This adds those details for each failure state that is checked. [1] https://lore.kernel.org/all/202308232200.1c932a90-oliver.sang@intel.com/ Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Yonghong Song <yhs@meta.com> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-24Merge tag 'nfsd-6.5-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Two last-minute one-liners for v6.5-rc. One got lost in the shuffle, and the other was reported just this morning" - Close race window when handling FREE_STATEID operations - Fix regression in /proc/fs/nfsd/v4_end_grace introduced in v6.5-rc" * tag 'nfsd-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix a thinko introduced by recent trace point changes nfsd: Fix race to FREE_STATEID and cl_revoked
2023-08-24Merge tag 'spi-fix-v6.5-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple more small driver specific fixes for v6.5. The device mode for Cadence had been broken by some recent updates done for host mode and large transfers for multi-byte words on stm32 had been broken by an API update in what I think was a rebasing incident" * tag 'spi-fix-v6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-cadence: Fix data corruption issues in slave mode spi: stm32: fix accidential revert to byte-sized transfer splitting
2023-08-24regulator: aw37503: Switch back to use struct i2c_driver's .probe()Uwe Kleine-König
struct i2c_driver::probe_new is about to go away. Switch the driver to use the probe callback with the same prototype. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de Link: https://lore.kernel.org/r/20230824195617.8888-1-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org
2023-08-24e1000e: Add support for the next LOM generationSasha Neftin
Add devices IDs for the next LOM generations that will be available on the next Intel Client platforms. This patch provides the initial support for these devices. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24igc: Decrease PTM short interval from 10 us to 1 usSasha Neftin
With the 10us interval, we were seeing PTM transactions take around 12us. Hardware team suggested this interval could be lowered to 1us which was confirmed with PCIe sniffer. With the 1us interval, PTM dialogs took around 2us. Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24igc: Add support for multiple in-flight TX timestampsVinicius Costa Gomes
Add support for using the four sets of timestamping registers that i225/i226 have available for TX. In some workloads, where multiple applications request hardware transmission timestamps, it was possible that some of those requests were denied because the only in use register was already occupied. This is also in preparation to future support for hardware timestamping with multiple PTP domains. With multiple domains chances of multiple TX timestamps being requested at the same time increase. Before: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% +1 +41 +73 13 1500 150 0.00% 0.00% 0.00% 100.00% +9 +49 +87 15 2250 225 0.00% 0.00% 0.00% 100.00% +9 +42 +79 13 3375 337 0.00% 0.00% 0.00% 100.00% +11 +46 +81 13 5062 506 0.00% 0.00% 0.00% 100.00% +7 +44 +80 13 7593 759 0.00% 0.00% 0.00% 100.00% +9 +44 +79 12 11389 1138 0.00% 0.00% 0.00% 100.00% +14 +51 +87 13 17083 1708 0.00% 0.00% 0.00% 100.00% +1 +41 +80 14 25624 2562 0.00% 0.00% 0.00% 100.00% +11 +50 +5107 51 38436 3843 0.00% 0.00% 0.00% 100.00% -2 +36 +7843 38 57654 5765 0.00% 0.00% 0.00% 100.00% +4 +42 +10503 69 86481 8648 0.00% 0.00% 0.00% 100.00% +11 +54 +5492 65 129721 12972 0.00% 0.00% 0.00% 100.00% +31 +2680 +6942 2606 194581 16384 16.79% 0.00% 0.87% 82.34% +73 +4444 +15879 3116 291871 16384 35.05% 0.00% 1.53% 63.42% +188 +5381 +17019 3035 437806 16384 54.95% 0.00% 2.55% 42.50% +233 +6302 +13885 2846 After: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% -20 +12 +43 13 1500 150 0.00% 0.00% 0.00% 100.00% -23 +18 +57 14 2250 225 0.00% 0.00% 0.00% 100.00% -2 +33 +67 13 3375 337 0.00% 0.00% 0.00% 100.00% +1 +38 +76 13 5062 506 0.00% 0.00% 0.00% 100.00% +9 +52 +93 14 7593 759 0.00% 0.00% 0.00% 100.00% +11 +47 +82 13 11389 1138 0.00% 0.00% 0.00% 100.00% -9 +27 +74 13 17083 1708 0.00% 0.00% 0.00% 100.00% -13 +25 +66 14 25624 2562 0.00% 0.00% 0.00% 100.00% -8 +28 +65 13 38436 3843 0.00% 0.00% 0.00% 100.00% -13 +28 +69 13 57654 5765 0.00% 0.00% 0.00% 100.00% -11 +32 +71 14 86481 8648 0.00% 0.00% 0.00% 100.00% +2 +44 +83 14 129721 12972 15.36% 0.00% 0.35% 84.29% -2 +2248 +22907 4252 194581 16384 42.98% 0.00% 1.98% 55.04% -4 +5278 +65039 5856 291871 16384 54.33% 0.00% 2.21% 43.46% -3 +6306 +22608 5665 We can see that with 4 registers, as expected, we are able to handle a increasing number of requests more consistently, but as soon as all registers are in use, the decrease in quality of service happens in a sharp step. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24riscv: Fix build errors using binutils2.37 toolchainsMingzheng Xing
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0, the following error occurs: Assembler messages: Error: cannot find default versions of the ISA extension `zicsr' Error: cannot find default versions of the ISA extension `zifencei' The above error originated from this commit of binutils[0], which has been resolved and backported by GCC-12.1.0[1] and GCC-11.3.0[2]. So fix this by change the GCC version in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0. Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1995608fbf6648fcee4e9e0c [0] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ca2bbb88f999f4d3cc40e89bc1aba712505dd598 [1] Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e35ae9fd6671 [2] Fixes: ca09f772ccca ("riscv: Handle zicsr/zifencei issue between gcc and binutils") Reported-by: Conor Dooley <conor.dooley@microchip.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn> Link: https://lore.kernel.org/r/20230824190852.45470-1-xingmingzheng@iscas.ac.cn Closes: https://lore.kernel.org/all/20230823-captive-abdomen-befd942a4a73@wendy/ Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-24sched/eevdf/doc: Modify the documented knob to base_slice_ns as wellShrikanth Hegde
After committing the scheduler to EEVDF, we renamed the 'min_granularity_ns' sysctl to 'base_slice_ns': e4ec3318a17f ("sched/debug: Rename sysctl_sched_min_granularity to sysctl_sched_base_slice") ... but we forgot to rename it in the documentation. Do that now. Fixes: e4ec3318a17f ("sched/debug: Rename sysctl_sched_min_granularity to sysctl_sched_base_slice") Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230824080342.543396-1-sshegde@linux.vnet.ibm.com
2023-08-24perf/x86/uncore: Remove unnecessary ?: operator around ↵Ilpo Järvinen
pcibios_err_to_errno() call If err == 0, pcibios_err_to_errno(err) returns 0 so the ?: construct can be removed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230824132832.78705-15-ilpo.jarvinen@linux.intel.com
2023-08-24Bluetooth: btusb: Do not call kfree_skb() under spin_lock_irqsave()Jinjie Ruan
It is not allowed to call kfree_skb() from hardware interrupt context or with hardware interrupts being disabled. So replace kfree_skb() with dev_kfree_skb_irq() under spin_lock_irqsave(). Compile tested only. Fixes: baac6276c0a9 ("Bluetooth: btusb: handle mSBC audio over USB Endpoints") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: btusb: Fix quirks table namingBastien Nocera
The quirks table was named "blacklist_table" which isn't a good description for that table as devices detected using it weren't ignored by the driver. Rename the table to match what it actually does. Signed-off-by: Bastien Nocera <hadess@hadess.net> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODEDLuiz Augusto von Dentz
This introduces HCI_QUIRK_BROKEN_LE_CODED which is used to indicate that LE Coded PHY shall not be used, it is then set for some Intel models that claim to support it but when used causes many problems. Cc: stable@vger.kernel.org # 6.4.y+ Link: https://github.com/bluez/bluez/issues/577 Link: https://github.com/bluez/bluez/issues/582 Link: https://lore.kernel.org/linux-bluetooth/CABBYNZKco-v7wkjHHexxQbgwwSz-S=GZ=dZKbRE1qxT1h4fFbQ@mail.gmail.com/T/# Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: btintel: Send new command for PPAGLokendra Singh
Added support for the new command opcode FE0B (HCI Intel PPAG Enable). btmon log: < HCI Command: Intel PPAG Enable (0x3f|0x020b) plen 4 Enable: 0x00000002 > HCI Event: Command Complete (0x0e) plen 4 Intel PPAG Enable (0x3f|0x020b) ncmd 1 Status: Success (0x00) Signed-off-by: Seema Sreemantha <seema.sreemantha@intel.com> Signed-off-by: Lokendra Singh <lokendra.singh@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: ISO: Add support for periodic adv reports processingClaudia Draghicescu
In the case of a Periodic Synchronized Receiver, the PA report received from a Broadcaster contains the BASE, which has information about codec and other parameters of a BIG. This isnformation is stored and the application can retrieve it using getsockopt(BT_ISO_BASE). Signed-off-by: Claudia Draghicescu <claudia.rosu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24x86/platform/uv: Refactor code using deprecated strncpy() interface to use ↵Justin Stitt
strscpy() `strncpy` is deprecated for use on NUL-terminated destination strings [1]. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on its destination buffer argument which is _not_ the case for `strncpy`! In this case, it means we can drop the `...-1` from: | strncpy(to, from, len-1); as well as remove the comment mentioning NUL-termination as `strscpy` implicitly grants us this behavior. There should be no functional change as I don't believe the padding from `strncpy` is needed here. If it turns out that the padding is necessary we should use `strscpy_pad` as a direct replacement. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Dimitri Sivanich <sivanich@hpe.com> Link: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings[1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Link: https://lore.kernel.org/r/20230822-strncpy-arch-x86-kernel-apic-x2apic_uv_x-v1-1-91d681d0b3f3@google.com
2023-08-24x86/hpet: Refactor code using deprecated strncpy() interface to use strscpy()Justin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on its destination buffer argument which is _not_ the case for `strncpy`! In this case, it is a simple swap from `strncpy` to `strscpy`. There is one slight difference, though. If NUL-padding is a functional requirement here we should opt for `strscpy_pad`. It seems like this shouldn't be needed as I see no obvious signs of any padding being required. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings[1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Link: https://lore.kernel.org/r/20230822-strncpy-arch-x86-kernel-hpet-v1-1-2c7d3be86f4a@google.com
2023-08-24Bluetooth: hci_conn: fail SCO/ISO via hci_conn_failed if ACL gone earlyPauli Virtanen
Not calling hci_(dis)connect_cfm before deleting conn referred to by a socket generally results to use-after-free. When cleaning up SCO connections when the parent ACL is deleted too early, use hci_conn_failed to do the connection cleanup properly. We also need to clean up ISO connections in a similar situation when connecting has started but LE Create CIS is not yet sent, so do it too here. Fixes: ca1fd42e7dbf ("Bluetooth: Fix potential double free caused by hci_conn_unlink") Reported-by: syzbot+cf54c1da6574b6c1b049@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-bluetooth/00000000000013b93805fbbadc50@google.com/ Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24x86/platform/uv: Refactor code using deprecated strcpy()/strncpy() ↵Justin Stitt
interfaces to use strscpy() Both `strncpy` and `strcpy` are deprecated for use on NUL-terminated destination strings [1]. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on its destination buffer argument which is _not_ the case for `strncpy` or `strcpy`! In this case, we can drop both the forced NUL-termination and the `... -1` from: | strncpy(arg, val, ACTION_LEN - 1); as `strscpy` implicitly has this behavior. Also include slight refactor to code removing possible new-line chars as per Yang Yang's work at [3]. This reduces code size and complexity by using more robust and better understood interfaces. Co-developed-by: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Dimitri Sivanich <sivanich@hpe.com> Link: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings[1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://lore.kernel.org/all/202212091545310085328@zte.com.cn/ [3] Link: https://github.com/KSPP/linux/issues/90 Link: https://lore.kernel.org/r/20230824-strncpy-arch-x86-platform-uv-uv_nmi-v2-1-e16d9a3ec570@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-08-24Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTHLuiz Augusto von Dentz
There a few instances still using HCI_MAX_AD_LENGTH instead of using max_adv_len which takes care of detecting what is the actual maximum length depending on if the controller supports EA or not. Fixes: 112b5090c219 ("Bluetooth: MGMT: Fix always using HCI_MAX_AD_LENGTH") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: ISO: Use defer setup to separate PA sync and BIG syncIulia Tanasescu
This commit implements defer setup support for the Broadcast Sink scenario: By setting defer setup on a broadcast socket before calling listen, the user is able to trigger the PA sync and BIG sync procedures separately. This is useful if the user first wants to synchronize to the periodic advertising transmitted by a Broadcast Source, and trigger the BIG sync procedure later on. If defer setup is set, once a PA sync established event arrives, a new hcon is created and notified to the ISO layer. A child socket associated with the PA sync connection will be added to the accept queue of the listening socket. Once the accept call returns the fd for the PA sync child socket, the user should call read on that fd. This will trigger the BIG create sync procedure, and the PA sync socket will become a listening socket itself. When the BIG sync established event is notified to the ISO layer, the bis connections will be added to the accept queue of the PA sync parent. The user should call accept on the PA sync socket to get the final bis connections. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: qca: add support for WCN7850Neil Armstrong
Add support for the WCN7850 Bluetooth chipset. Tested on the SM8550 QRD platform. Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: qca: use switch case for soc type behaviorNeil Armstrong
Use switch/case to handle soc type specific behaviour, the permit dropping the qca_is_xxx() inline functions and make the code clearer and easier to update for new SoCs. Suggested-by: Konrad Dybcio <konrad.dybcio@linaro.org> Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24dt-bindings: net: bluetooth: qualcomm: document WCN7850 chipsetNeil Armstrong
Document the WCN7850 Bluetooth chipset. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: hci_conn: Fix sending BT_HCI_CMD_LE_CREATE_CONN_CANCELLuiz Augusto von Dentz
This fixes sending BT_HCI_CMD_LE_CREATE_CONN_CANCEL when hci_le_create_conn_sync has not been called because HCI_CONN_SCANNING has been clear too early before its cmd_sync callback has been run. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_syncLuiz Augusto von Dentz
Use-after-free can occur in hci_disconnect_all_sync if a connection is deleted by concurrent processing of a controller event. To prevent this the code now tries to iterate over the list backwards to ensure the links are cleanup before its parents, also it no longer relies on a cursor, instead it always uses the last element since hci_abort_conn_sync is guaranteed to call hci_conn_del. UAF crash log: ================================================================== BUG: KASAN: slab-use-after-free in hci_set_powered_sync (net/bluetooth/hci_sync.c:5424) [bluetooth] Read of size 8 at addr ffff888009d9c000 by task kworker/u9:0/124 CPU: 0 PID: 124 Comm: kworker/u9:0 Tainted: G W 6.5.0-rc1+ #10 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 Workqueue: hci0 hci_cmd_sync_work [bluetooth] Call Trace: <TASK> dump_stack_lvl+0x5b/0x90 print_report+0xcf/0x670 ? __virt_addr_valid+0xdd/0x160 ? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth] kasan_report+0xa6/0xe0 ? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth] ? __pfx_set_powered_sync+0x10/0x10 [bluetooth] hci_set_powered_sync+0x2c9/0x4a0 [bluetooth] ? __pfx_hci_set_powered_sync+0x10/0x10 [bluetooth] ? __pfx_lock_release+0x10/0x10 ? __pfx_set_powered_sync+0x10/0x10 [bluetooth] hci_cmd_sync_work+0x137/0x220 [bluetooth] process_one_work+0x526/0x9d0 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? mark_held_locks+0x1a/0x90 worker_thread+0x92/0x630 ? __pfx_worker_thread+0x10/0x10 kthread+0x196/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Allocated by task 1782: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 __kasan_kmalloc+0x8f/0xa0 hci_conn_add+0xa5/0xa80 [bluetooth] hci_bind_cis+0x881/0x9b0 [bluetooth] iso_connect_cis+0x121/0x520 [bluetooth] iso_sock_connect+0x3f6/0x790 [bluetooth] __sys_connect+0x109/0x130 __x64_sys_connect+0x40/0x50 do_syscall_64+0x60/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 695: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x50 __kasan_slab_free+0x10a/0x180 __kmem_cache_free+0x14d/0x2e0 device_release+0x5d/0xf0 kobject_put+0xdf/0x270 hci_disconn_complete_evt+0x274/0x3a0 [bluetooth] hci_event_packet+0x579/0x7e0 [bluetooth] hci_rx_work+0x287/0xaa0 [bluetooth] process_one_work+0x526/0x9d0 worker_thread+0x92/0x630 kthread+0x196/0x1e0 ret_from_fork+0x2c/0x50 ================================================================== Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: btnxpuart: Improve inband Independent Reset handlingNeeraj Sanjay Kale
This improves the inband IR command handling for NXP BT chipsets. When the IR vendor command is received, the driver injects a HW error event, which causes a reset sequence in hci_error_reset(). The vendor IR command is sent to the controller while hci dev is been closed, and FW is re-downloaded when nxp_setup() is called during hci_dev_do_open(). The HCI_SETUP flag is set in nxp_hw_err() to make sure that nxp_setup() is been called during hci_dev_do_open(). This also makes the nxp_setup() and power save functions more generic. Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-24Bluetooth: btnxpuart: Add support for IW624 chipsetNeeraj Sanjay Kale
This adds support for NXP IW624 chipset in btnxpuart driver by adding FW name and bootloader signature. Based on the loader version bits 7:6 of the bootloader signature, the driver can choose between selecting secure and non-secure FW files. For cmd5 payload during FW download, this chip has addresses of few registers offset by 1, so added boot_reg_offset to handle the chip specific offset. Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>