summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-04net/mlx5e: Check the number of elements before walk TC rhashtableJianbo Liu
After IPSec TX tables are destroyed, the flow rules in TC rhashtable, which have the destination to IPSec, are restored to the original one, the uplink. However, when the device is in switchdev mode and unload driver with IPSec rules configured, TC rhashtable cleanup is done before IPSec cleanup, which means tc_ht->tbl is already freed when walking TC rhashtable, in order to restore the destination. So add the checking before walking to avoid unexpected behavior. Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Reduce eswitch mode_lock protection contextJianbo Liu
Currently eswitch mode_lock is so heavy, for example, it's locked during the whole process of the mode change, which may need to hold other locks. As the mode_lock is also used by IPSec to block mode and encap change now, it is easy to cause lock dependency. Since some of protections are also done by devlink lock, the eswitch mode_lock is not needed at those places, and thus the possibility of lockdep issue is reduced. Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Tidy up IPsec NAT-T SA discoveryLeon Romanovsky
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones. In case they arrive to RX, the SPI and ESP are located in inner header, while the check was performed on outer header instead. That wrong check caused to the situation where received rekeying request was missed and caused to rekey timeout, which "compensated" this failure by completing rekeying. Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Add IPsec and ASO syndromes check in HWPatrisious Haddad
After IPsec decryption it isn't enough to only check the IPsec syndrome but need to also check the ASO syndrome in order to verify that the operation was actually successful. Verify that both syndromes are actually zero and in case not drop the packet and increment the appropriate flow counter for the drop reason. Fixes: 6b5c45e16e43 ("net/mlx5e: Configure IPsec packet offload flow steering") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Remove exposure of IPsec RX flow steering structLeon Romanovsky
After previous commit, which unified various IPsec creation modes, there is no need to have struct mlx5e_ipsec_rx exposed in global IPsec header. Move it to ipsec_fs.c to be placed together with already existing struct mlx5e_ipsec_tx. Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Unify esw and normal IPsec status table creation/destructionPatrisious Haddad
Change normal IPsec flow to use the same creation/destruction functions for status flow table as that of ESW, which first of all refines the code to have less code duplication. And more importantly, the ESW status table handles IPsec syndrome checks at steering by HW, which is more efficient than the previous behaviour we had where it was copied to WQE meta data and checked by the driver. Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Ensure that IPsec sequence packet number starts from 1Leon Romanovsky
According to RFC4303, section "3.3.3. Sequence Number Generation", the first packet sent using a given SA will contain a sequence number of 1. However if user didn't set seq/oseq, the HW used zero as first sequence packet number. Such misconfiguration causes to drop of first packet if replay window protection was enabled in SA. To fix it, set sequence number to be at least 1. Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Honor user choice of IPsec replay window sizeLeon Romanovsky
Users can configure IPsec replay window size, but mlx5 driver didn't honor their choice and set always 32bits. Fix assignment logic to configure right size from the beginning. Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-05powerpc/ftrace: Fix stack teardown in ftrace_no_traceNaveen N Rao
Commit 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind") added use of a new stack frame on ftrace entry to fix stack unwind. However, the commit missed updating the offset used while tearing down the ftrace stack when ftrace is disabled. Fix the same. In addition, the commit missed saving the correct stack pointer in pt_regs. Update the same. Fixes: 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind") Cc: stable@vger.kernel.org # v6.5+ Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231130065947.2188860-1-naveen@kernel.org
2023-12-04net: stmmac: fix FPE events losingJianheng Zhang
The status bits of register MAC_FPE_CTRL_STS are clear on read. Using 32-bit read for MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and dwmac5_fpe_send_mpacket() clear the status bits. Then the stmmac interrupt handler missing FPE event status and leads to FPE handshaking failure and retries. To avoid clear status bits of MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and dwmac5_fpe_send_mpacket(), add fpe_csr to stmmac_fpe_cfg structure to cache the control bits of MAC_FPE_CTRL_STS and to avoid reading MAC_FPE_CTRL_STS in those methods. Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure") Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jianheng Zhang <Jianheng.Zhang@synopsys.com> Link: https://lore.kernel.org/r/CY5PR12MB637225A7CF529D5BE0FBE59CBF81A@CY5PR12MB6372.namprd12.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04octeontx2-pf: consider both Rx and Tx packet stats for adaptive interrupt ↵Naveen Mamindlapalli
coalescing The current adaptive interrupt coalescing code updates only rx packet stats for dim algorithm. This patch also updates tx packet stats which will be useful when there is only tx traffic. Also moved configuring hardware adaptive interrupt setting to driver dim callback. Fixes: 6e144b47f560 ("octeontx2-pf: Add support for adaptive interrupt coalescing") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://lore.kernel.org/r/20231201053330.3903694-1-sumang@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05Merge tag 'drm-intel-fixes-2023-12-01-1' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v6.7-rc4 #2: - d21a3962d304 ("drm/i915: Call intel_pre_plane_updates() also for pipes getting enabled") in the previous fixes pull depends on a change that wasn't included. Pick it up. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87fs0m48ol.fsf@intel.com
2023-12-04arcnet: restoring support for multiple Sohard Arcnet cardsThomas Reichinger
Probe of Sohard Arcnet cards fails, if 2 or more cards are installed in a system. See kernel log: [ 2.759203] arcnet: arcnet loaded [ 2.763648] arcnet:com20020: COM20020 chipset support (by David Woodhouse et al.) [ 2.770585] arcnet:com20020_pci: COM20020 PCI support [ 2.772295] com20020 0000:02:00.0: enabling device (0000 -> 0003) [ 2.772354] (unnamed net_device) (uninitialized): PLX-PCI Controls ... [ 3.071301] com20020 0000:02:00.0 arc0-0 (uninitialized): PCI COM20020: station FFh found at F080h, IRQ 101. [ 3.071305] com20020 0000:02:00.0 arc0-0 (uninitialized): Using CKP 64 - data rate 2.5 Mb/s [ 3.071534] com20020 0000:07:00.0: enabling device (0000 -> 0003) [ 3.071581] (unnamed net_device) (uninitialized): PLX-PCI Controls ... [ 3.369501] com20020 0000:07:00.0: Led pci:green:tx:0-0 renamed to pci:green:tx:0-0_1 due to name collision [ 3.369535] com20020 0000:07:00.0: Led pci:red:recon:0-0 renamed to pci:red:recon:0-0_1 due to name collision [ 3.370586] com20020 0000:07:00.0 arc0-0 (uninitialized): PCI COM20020: station E1h found at C000h, IRQ 35. [ 3.370589] com20020 0000:07:00.0 arc0-0 (uninitialized): Using CKP 64 - data rate 2.5 Mb/s [ 3.370608] com20020: probe of 0000:07:00.0 failed with error -5 commit 5ef216c1f848 ("arcnet: com20020-pci: add rotary index support") changes the device name of all COM20020 based PCI cards, even if only some cards support this: snprintf(dev->name, sizeof(dev->name), "arc%d-%d", dev->dev_id, i); The error happens because all Sohard Arcnet cards would be called arc0-0, since the Sohard Arcnet cards don't have a PLX rotary coder. I.e. EAE Arcnet cards have a PLX rotary coder, which sets the first decimal, ensuring unique devices names. This patch adds two new card feature flags to indicate which cards support LEDs and the PLX rotary coder. For EAE based cards the names still depend on the PLX rotary coder (untested, since missing EAE hardware). For Sohard based cards, this patch will result in devices being called arc0, arc1, ... (tested). Signed-off-by: Thomas Reichinger <thomas.reichinger@sohard.de> Fixes: 5ef216c1f848 ("arcnet: com20020-pci: add rotary index support") Link: https://lore.kernel.org/r/20231130113503.6812-1-thomas.reichinger@sohard.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04kernel/resource: Increment by align value in get_free_mem_region()Alison Schofield
Currently get_free_mem_region() searches for available capacity in increments equal to the region size being requested. This can cause the search to take giant steps through the resource leaving needless gaps and missing available space. Specifically 'cxl create-region' fails with ERANGE even though capacity of the given size and CXL's expected 256M x InterleaveWays alignment can be satisfied. Replace the total-request-size increment with a next alignment increment so that the next possible address is always examined for availability. Fixes: 14b80582c43e ("resource: Introduce alloc_free_mem_region()") Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com> Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231113221324.1118092-1-alison.schofield@intel.com Cc: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-04cxl: Add cxl_num_decoders_committed() usage to cxl_testDave Jiang
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") missed the conversion for cxl_test. Add usage of cxl_num_decoders_committed() to replace the open coding. Suggested-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/169929160525.824083.11813222229025394254.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-05Revert "greybus: gb-beagleplay: Ensure le for values in transport"Greg Kroah-Hartman
This reverts commit 52eb67861ebeb2110318bd9fe33d85ddcf92aac7. Turns out to not be correct, a new version will be generated later. Link: https://lore.kernel.org/r/20231204131008.384583-1-ayushdevel1325@gmail.com Cc: Ayush Singh <ayushdevel1325@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-04RDMA/irdma: Avoid free the non-cqp_request scratchShifeng Li
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04RDMA/irdma: Fix support for 64k pagesMike Marciniszyn
Virtual QP and CQ require a 4K HW page size but the driver passes PAGE_SIZE to ib_umem_find_best_pgsz() instead. Fix this by using the appropriate 4k value in the bitmap passed to ib_umem_find_best_pgsz(). Fixes: 693a5386eff0 ("RDMA/irdma: Split mr alloc and free into new functions") Link: https://lore.kernel.org/r/20231129202143.1434-4-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04RDMA/irdma: Ensure iWarp QP queue memory is OS paged alignedMike Marciniszyn
The SQ is shared for between kernel and used by storing the kernel page pointer and passing that to a kmap_atomic(). This then requires that the alignment is PAGE_SIZE aligned. Fix by adding an iWarp specific alignment check. Fixes: e965ef0e7b2c ("RDMA/irdma: Split QP handler into irdma_reg_user_mr_type_qp") Link: https://lore.kernel.org/r/20231129202143.1434-3-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgszMike Marciniszyn
64k pages introduce the situation in this diagram when the HCA 4k page size is being used: +-------------------------------------------+ <--- 64k aligned VA | | | HCA 4k page | | | +-------------------------------------------+ | o | | | | o | | | | o | +-------------------------------------------+ | | | HCA 4k page | | | +-------------------------------------------+ <--- Live HCA page |OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO| <--- offset | | <--- VA | MR data | +-------------------------------------------+ | | | HCA 4k page | | | +-------------------------------------------+ | o | | | | o | | | | o | +-------------------------------------------+ | | | HCA 4k page | | | +-------------------------------------------+ The VA addresses are coming from rdma-core in this diagram can be arbitrary, but for 64k pages, the VA may be offset by some number of HCA 4k pages and followed by some number of HCA 4k pages. The current iterator doesn't account for either the preceding 4k pages or the following 4k pages. Fix the issue by extending the ib_block_iter to contain the number of DMA pages like comment [1] says and by using __sg_advance to start the iterator at the first live HCA page. The changes are contained in a parallel set of iterator start and next functions that are umem aware and specific to umem since there is one user of the rdma_for_each_block() without umem. These two fixes prevents the extra pages before and after the user MR data. Fix the preceding pages by using the __sq_advance field to start at the first 4k page containing MR data. Fix the following pages by saving the number of pgsz blocks in the iterator state and downcounting on each next. This fix allows for the elimination of the small page crutch noted in the Fixes. Fixes: 10c75ccb54e4 ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()") Link: https://lore.kernel.org/r/20231129202143.1434-2-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-12-04packet: Move reference count in packet_sock to atomic_long_tDaniel Borkmann
In some potential instances the reference count on struct packet_sock could be saturated and cause overflows which gets the kernel a bit confused. To prevent this, move to a 64-bit atomic reference count on 64-bit architectures to prevent the possibility of this type to overflow. Because we can not handle saturation, using refcount_t is not possible in this place. Maybe someday in the future if it changes it could be used. Also, instead of using plain atomic64_t, use atomic_long_t instead. 32-bit machines tend to be memory-limited (i.e. anything that increases a reference uses so much memory that you can't actually get to 2**32 references). 32-bit architectures also tend to have serious problems with 64-bit atomics. Hence, atomic_long_t is the more natural solution. Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231201131021.19999-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: - A small fix for the dirty tracking self test to fail correctly if the code is buggy - Fix a tricky syzkaller race UAF with object reference counting * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Do not UAF during iommufd_put_object() iommufd: Add iommufd_ctx to iommufd_put_object() iommufd/selftest: Fix _test_mock_dirty_bitmaps()
2023-12-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull vdpa fixes from Michael Tsirkin: "Fixes in mlx5 and pds drivers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: pds_vdpa: set features order pds_vdpa: clear config callback when status goes to 0 pds_vdpa: fix up format-truncation complaint vdpa/mlx5: preserve CVQ vringh index
2023-12-04ASoC: qcom: Limit Digital gains on speakerMark Brown
Merge series from srinivas.kandagatla@linaro.org: Limit the speaker digital gains to 0dB so that the users will not damage them. Currently there is a limit in UCM, but this does not stop the user form changing the digital gains from command line. So limit this in driver which makes the speakers more safer without active speaker protection in place. Apart from this there is also a range check fix in snd_soc_limit_volume to allow setting this limit correctly. Tested on Lenovo X13s.
2023-12-04bcachefs: Don't run indirect extent trigger unless inserting/deletingKent Overstreet
This fixes a transaction path overflow reported in the snapshot deletion path, when moving extents to the correct snapshot. The root of the issue is that creating/deleting a reflink pointer can generate an unbounded number of updates, if it is allowed to reference an unbounded number of indirect extents; to prevent this, merging of reflink pointers has been disabled. But there's a hole, which is that copygc/rebalance may fragment existing extents in the course of moving them around, and if an indirect extent becomes too fragmented we'll then become unable to delete the reflink pointer. The eventual solution is going to be to tweak trigger handling so that we can process large reflink pointers incrementally when necessary, and notice that trigger updates don't need to be run for the part of the reflink pointer not changing. That is going to be a bigger project though, for another patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04bcachefs: Convert compression_stats to for_each_btree_key2Kent Overstreet
for_each_btree_key2() runs each loop iteration in a btree transaction, and thus does not cause SRCU lock hold time problems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04bcachefs: Fix bch2_extent_drop_ptrs() callKent Overstreet
Also, make bch2_extent_drop_ptrs() safer, so it works with extents and non-extents iterators. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04bcachefs: Fix a journal deadlock in replayKent Overstreet
Recently, journal pre-reservations were removed. They were for reserving space ahead of time in the journal for operations that are required for journal reclaim, e.g. btree key cache flushing and interior node btree updates. Instead we have watermarks - only operations for journal reclaim are allowed when the journal is low on space, and in general we're quite good about doing operations in the order that will free up space in the journal quickest when we're low on space. If we're doing a journal reclaim operation out of order, we usually do it in nonblocking mode if it's not freeing up space at the end of the journal. There's an exceptino though - interior btree node update operations have to be BCH_WATERMARK_reclaim - once they've been started, and they can't be nonblocking. Generally this is fine because they'll only be a very small fraction of transaction commits - but there's an exception, which is during journal replay. Journal replay does many btree operations, but doesn't need to commit them to the journal since they're already in the journal. So killing off of pre-reservation, plus another change to make journal replay more efficient by initially doing the replay in sorted btree order, made it possible for the interior update operations replay generates to fill and deadlock the journal. Fix this by introducing a new check on journal space at the _start_ of an interior update operation. This causes us to block if necessary in exactly the same way as we used to when interior updates took a journal pre-reservaiton, but without all the expensive accounting pre-reservations required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04bcachefs; Don't use btree write buffer until journal replay is finishedKent Overstreet
The keys being replayed by journal replay have to be synchronized with updates by other threads that overwrite them. We rely on btree node locks for synchronizing - but since btree write buffer updates take no btree locks, that won't work. Instead, simply disable using the btree write buffer until journal replay is finished. This fixes a rare backpointers error in the merge_torture_flakey test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04arm64: dts: rockchip: Fix PCI node addresses on rk3399-gruRob Herring
The rk3399-gru PCI node addresses are wrong. In rk3399-gru-scarlet, the bus number in the address should be 0. This is because bus number assignment is dynamic and not known up front. For FDT, the bus number is simply ignored. In rk3399-gru-chromebook, the addresses are simply invalid. The first "reg" entry must be the configuration space for the device. The entry should be all 0s except for device/slot and function numbers. The existing 64-bit memory space (0x83000000) entries are not valid because they must have the BAR address in the lower byte of the first cell. Warnings for these are enabled by adding the missing 'device_type = "pci"' for the root port node. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231130191830.2424361-1-robh@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-04cifs: Fix flushing, invalidation and file size with FICLONEDavid Howells
Fix a number of issues in the cifs filesystem implementation of the FICLONE ioctl in cifs_remap_file_range(). This is analogous to the previously fixed bug in cifs_file_copychunk_range() and can share the helper functions. Firstly, the invalidation of the destination range is handled incorrectly: We shouldn't just invalidate the whole file as dirty data in the file may get lost and we can't just call truncate_inode_pages_range() to invalidate the destination range as that will erase parts of a partial folio at each end whilst invalidating and discarding all the folios in the middle. We need to force all the folios covering the range to be reloaded, but we mustn't lose dirty data in them that's not in the destination range. Further, we shouldn't simply round out the range to PAGE_SIZE at each end as cifs should move to support multipage folios. Secondly, there's an issue whereby a write may have extended the file locally, but not have been written back yet. This can leaves the local idea of the EOF at a later point than the server's EOF. If a clone request is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE (which gets translated to -EIO locally) if the clone source extends past the server's EOF. Fix this by: (0) Flush the source region (already done). The flush does nothing and the EOF isn't moved if the source region has no dirty data. (1) Move the EOF to the end of the source region if it isn't already at least at this point. If we can't do this, for instance if the server doesn't support it, just flush the entire source file. (2) Find the folio (if present) at each end of the range, flushing it and increasing the region-to-be-invalidated to cover those in their entirety. (3) Fully discard all the folios covering the range as we want them to be reloaded. (4) Then perform the extent duplication. Thirdly, set i_size after doing the duplicate_extents operation as this value may be used by various things internally. stat() hides the issue because setting ->time to 0 causes cifs_getatr() to revalidate the attributes. These were causing the cifs/001 xfstest to fail. Fixes: 04b38d601239 ("vfs: pull btrfs clone API to vfs layer") Signed-off-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org cc: Christoph Hellwig <hch@lst.de> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-04cifs: Fix flushing, invalidation and file size with copy_file_range()David Howells
Fix a number of issues in the cifs filesystem implementation of the copy_file_range() syscall in cifs_file_copychunk_range(). Firstly, the invalidation of the destination range is handled incorrectly: We shouldn't just invalidate the whole file as dirty data in the file may get lost and we can't just call truncate_inode_pages_range() to invalidate the destination range as that will erase parts of a partial folio at each end whilst invalidating and discarding all the folios in the middle. We need to force all the folios covering the range to be reloaded, but we mustn't lose dirty data in them that's not in the destination range. Further, we shouldn't simply round out the range to PAGE_SIZE at each end as cifs should move to support multipage folios. Secondly, there's an issue whereby a write may have extended the file locally, but not have been written back yet. This can leaves the local idea of the EOF at a later point than the server's EOF. If a copy request is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE (which gets translated to -EIO locally) if the copy source extends past the server's EOF. Fix this by: (0) Flush the source region (already done). The flush does nothing and the EOF isn't moved if the source region has no dirty data. (1) Move the EOF to the end of the source region if it isn't already at least at this point. If we can't do this, for instance if the server doesn't support it, just flush the entire source file. (2) Find the folio (if present) at each end of the range, flushing it and increasing the region-to-be-invalidated to cover those in their entirety. (3) Fully discard all the folios covering the range as we want them to be reloaded. (4) Then perform the copy. Thirdly, set i_size after doing the copychunk_range operation as this value may be used by various things internally. stat() hides the issue because setting ->time to 0 causes cifs_getatr() to revalidate the attributes. These were causing the generic/075 xfstest to fail. Fixes: 620d8745b35d ("Introduce cifs_copy_file_range()") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-04arm64: dts: rockchip: drop interrupt-names property from rk3588s dfiHeiko Stuebner
The dfi binding does not specify interrupt names, with the interrupts just specifying channels 0-x. So drop the unspecified property. Fixes: 5a6976b1040a ("arm64: dts: rockchip: Add DFI to rk3588s") Reported-by: Jagan Teki <jagan@edgeble.ai> Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de> Link: https://lore.kernel.org/r/20231201134859.322491-1-heiko@sntech.de
2023-12-04Merge patch series "riscv: Fix issues with module loading"Palmer Dabbelt
Charlie Jenkins <charlie@rivosinc.com> says: Module loading did not account for multiple threads concurrently loading modules. This patch fixes that issue. There is also a small patch to fix the type of a __le16 variable. * b4-shazam-merge: riscv: Correct type casting in module loading riscv: Safely remove entries from relocation list Link: https://lore.kernel.org/r/20231127-module_linking_freeing-v4-0-a2ca1d7027d0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-04riscv: Correct type casting in module loadingCharlie Jenkins
Use __le16 with le16_to_cpu. Fixes: 8fd6c5142395 ("riscv: Add remaining module relocations") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20231127-module_linking_freeing-v4-2-a2ca1d7027d0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-04riscv: Safely remove entries from relocation listCharlie Jenkins
Use the safe versions of list and hlist iteration to safely remove entries from the module relocation lists. To allow mutliple threads to load modules concurrently, move relocation list pointers onto the stack rather than using global variables. Fixes: 8fd6c5142395 ("riscv: Add remaining module relocations") Reported-by: Ron Economos <re@w6rz.net> Closes: https://lore.kernel.org/linux-riscv/444de86a-7e7c-4de7-5d1d-c1c40eefa4ba@w6rz.net Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20231127-module_linking_freeing-v4-1-a2ca1d7027d0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-04nvme: fix deadlock between reset and scanBitao Hu
If controller reset occurs when allocating namespace, both nvme_reset_work and nvme_scan_work will hang, as shown below. Test Scripts: for ((t=1;t<=128;t++)) do nsid=`nvme create-ns /dev/nvme1 -s 14537724 -c 14537724 -f 0 -m 0 \ -d 0 | awk -F: '{print($NF);}'` nvme attach-ns /dev/nvme1 -n $nsid -c 0 done nvme reset /dev/nvme1 We will find that both nvme_reset_work and nvme_scan_work hung: INFO: task kworker/u249:4:17848 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u249:4 state:D stack: 0 pid:17848 ppid: 2 flags:0x00000028 Workqueue: nvme-reset-wq nvme_reset_work [nvme] Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 blk_mq_freeze_queue_wait+0x84/0xc0 nvme_wait_freeze+0x40/0x64 [nvme_core] nvme_reset_work+0x1c0/0x5cc [nvme] process_one_work+0x1d8/0x4b0 worker_thread+0x230/0x440 kthread+0x114/0x120 INFO: task kworker/u249:3:22404 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u249:3 state:D stack: 0 pid:22404 ppid: 2 flags:0x00000028 Workqueue: nvme-wq nvme_scan_work [nvme_core] Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 rwsem_down_write_slowpath+0x32c/0x98c down_write+0x70/0x80 nvme_alloc_ns+0x1ac/0x38c [nvme_core] nvme_validate_or_alloc_ns+0xbc/0x150 [nvme_core] nvme_scan_ns_list+0xe8/0x2e4 [nvme_core] nvme_scan_work+0x60/0x500 [nvme_core] process_one_work+0x1d8/0x4b0 worker_thread+0x260/0x440 kthread+0x114/0x120 INFO: task nvme:28428 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:nvme state:D stack: 0 pid:28428 ppid: 27119 flags:0x00000000 Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 schedule_timeout+0x160/0x194 do_wait_for_common+0xac/0x1d0 __wait_for_common+0x78/0x100 wait_for_completion+0x24/0x30 __flush_work.isra.0+0x74/0x90 flush_work+0x14/0x20 nvme_reset_ctrl_sync+0x50/0x74 [nvme_core] nvme_dev_ioctl+0x1b0/0x250 [nvme_core] __arm64_sys_ioctl+0xa8/0xf0 el0_svc_common+0x88/0x234 do_el0_svc+0x7c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 The reason for the hang is that nvme_reset_work occurs while nvme_scan_work is still running. nvme_scan_work may add new ns into ctrl->namespaces list after nvme_reset_work frozen all ns->q in ctrl->namespaces list. The newly added ns is not frozen, so nvme_wait_freeze will wait forever. Unfortunately, ctrl->namespaces_rwsem is held by nvme_reset_work, so nvme_scan_work will also wait forever. Now we are deadlocked! PROCESS1 PROCESS2 ============== ============== nvme_scan_work ... nvme_reset_work nvme_validate_or_alloc_ns nvme_dev_disable nvme_alloc_ns nvme_start_freeze down_write ... nvme_ns_add_to_ctrl_list ... up_write nvme_wait_freeze ... down_read nvme_alloc_ns blk_mq_freeze_queue_wait down_write Fix by marking the ctrl with say NVME_CTRL_FROZEN flag set in nvme_start_freeze and cleared in nvme_unfreeze. Then the scan can check it before adding the new namespace (under the namespaces_rwsem). Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com> Reviewed-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04nvme: prevent potential spectre v1 gadgetNitesh Shetty
This patch fixes the smatch warning, "nvmet_ns_ana_grpid_store() warn: potential spectre issue 'nvmet_ana_group_enabled' [w] (local cap)" Prevent the contents of kernel memory from being leaked to user space via speculative execution by using array_index_nospec. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptionsShin'ichiro Kawasaki
Currently two similar config options NVME_HOST_AUTH and NVME_TARGET_AUTH have almost same descriptions. It is confusing to choose them in menuconfig. Improve the descriptions to distinguish them. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04nvme-ioctl: move capable() admin check to the endKeith Busch
This can be an expensive call on some kernel configs. Move it to the end after checking the cheaper ways to determine if the command is allowed. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04nvme: ensure reset state check orderingKeith Busch
A different CPU may be setting the ctrl->state value, so ensure proper barriers to prevent optimizing to a stale state. Normally it isn't a problem to observe the wrong state as it is merely advisory to take a quicker path during initialization and error recovery, but seeing an old state can report unexpected ENETRESET errors when a reset request was in fact successful. Reported-by: Minh Hoang <mh2022@meta.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.de>
2023-12-04nvme: introduce helper function to get ctrl stateKeith Busch
The controller state is typically written by another CPU, so reading it should ensure no optimizations are taken. This is a repeated pattern in the driver, so start with adding a convenience function that returns the controller state with READ_ONCE(). Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04ASoC: qcom: sc8280xp: Limit speaker digital volumesSrinivas Kandagatla
Limit the speaker digital gains to 0dB so that the users will not damage them. Currently there is a limit in UCM, but this does not stop the user form changing the digital gains from command line. So limit this in driver which makes the speakers more safer without active speaker protection in place. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20231204124736.132185-3-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2023-12-04ASoC: ops: add correct range check for limiting volumeSrinivas Kandagatla
Volume can have ranges that start with negative values, ex: -84dB to +40dB. Apply correct range check in snd_soc_limit_volume before setting the platform_max. Without this patch, for example setting a 0dB limit on a volume range of -84dB to +40dB would fail. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20231204124736.132185-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2023-12-04HID: Add quirk for Labtec/ODDOR/aikeec handbrakeSebastian Parschauer
This device needs ALWAYS_POLL quirk, otherwise it keeps reconnecting indefinitely. It is a handbrake for sim racing detected as joystick. Reported and tested by GitHub user N0th1ngM4tt3rs. Link: https://github.com/sriemer/fix-linux-mouse issue 22 Signed-off-by: Sebastian Parschauer <s.parschauer@gmx.de> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-04HID: i2c-hid: Add IDEA5002 to i2c_hid_acpi_blacklist[]Mario Limonciello
Users have reported problems with recent Lenovo laptops that contain an IDEA5002 I2C HID device. Reports include fans turning on and running even at idle and spurious wakeups from suspend. Presumably in the Windows ecosystem there is an application that uses the HID device. Maybe that puts it into a lower power state so it doesn't cause spurious events. This device doesn't serve any functional purpose in Linux as nothing interacts with it so blacklist it from being probed. This will prevent the GPIO driver from setting up the GPIO and the spurious interrupts and wake events will not occur. Cc: stable@vger.kernel.org # 6.1 Reported-and-tested-by: Marcus Aram <marcus+oss@oxar.nl> Reported-and-tested-by: Mark Herbert <mark.herbert42@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2812 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-04pinctrl: amd: Mask non-wake source pins with interrupt enabled at suspendMario Limonciello
If a pin isn't marked as a wake source processing any interrupts is just going to destroy battery life. The APU may wake up from a hardware sleep state to process the interrupt but not return control to the OS. Mask interrupt for all non-wake source pins at suspend. They'll be re-enabled at resume. Reported-and-tested-by: Marcus Aram <marcus+oss@oxar.nl> Reported-and-tested-by: Mark Herbert <mark.herbert42@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2812 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20231203032431.30277-3-mario.limonciello@amd.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-04virtio_blk: fix snprintf truncation compiler warningStefan Hajnoczi
Commit 4e0400525691 ("virtio-blk: support polling I/O") triggers the following gcc 13 W=1 warnings: drivers/block/virtio_blk.c: In function ‘init_vq’: drivers/block/virtio_blk.c:1077:68: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 7 [-Wformat-truncation=] 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~ drivers/block/virtio_blk.c:1077:58: note: directive argument in the range [-2147483648, 65534] 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~~~~~~~~~~~~ drivers/block/virtio_blk.c:1077:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a false positive because the lower bound -2147483648 is incorrect. The true range of i is [0, num_vqs - 1] where 0 < num_vqs < 65536. The code mixes int, unsigned short, and unsigned int types in addition to using "%d" for an unsigned value. Use unsigned short and "%u" consistently to solve the compiler warning. Cc: Suwan Kim <suwan.kim027@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312041509.DIyvEt9h-lkp@intel.com/ Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20231204140743.1487843-1-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-04ALSA: hda/realtek: Enable headset on Lenovo M90 Gen5Bin Li
Lenovo M90 Gen5 is equipped with ALC897, and it needs ALC897_FIXUP_HEADSET_MIC_PIN quirk to make its headset mic work. Signed-off-by: Bin Li <bin.li@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231204100450.642783-1-bin.li@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-04ALSA: hda/realtek: fix speakers on XPS 9530 (2023)Aleksandrs Vinarskis
XPS 9530 has 2 tweeters and 2 subwoofers powered by CS35L41 amplifier, SPI connected. For subwoofers to work, it requires both to enable amplifier support, and to enable output to subwoofers via 0x17 quirk (similalry to XPS 9510/9520). Signed-off-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231203233006.100558-1-alex.vinarskis@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>