summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-06net: axiemac: use a phandle to reference pcs_phyAndy Chiu
In some SGMII use cases where both a fixed link external PHY and the internal PCS/PMA PHY need to be configured, we should explicitly use a phandle "pcs-phy" to get the reference to the PCS/PMA PHY. Otherwise, the driver would use "phy-handle" in the DT as the reference to both the external and the internal PCS/PMA PHY. In other cases where the core is connected to a SFP cage, we could still point phy-handle to the intenal PCS/PMA PHY, and let the driver connect to the SFP module, if exist, via phylink. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06dt-bindings: net: add pcs-handle attributeAndy Chiu
Document the new pcs-handle attribute to support connecting to an external PHY. For Xilinx's AXI Ethernet, this is used when the core operates in SGMII or 1000Base-X modes and links through the internal PCS/PMA PHY. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: axienet: factor out phy_node in struct axienet_localAndy Chiu
the struct member `phy_node` of struct axienet_local is not used by the driver anymore after initialization. It might be a remnent of old code and could be removed. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: axienet: setup mdio unconditionallyAndy Chiu
The call to axienet_mdio_setup should not depend on whether "phy-node" pressents on the DT. Besides, since `lp->phy_node` is used if PHY is in SGMII or 100Base-X modes, move it into the if statement. And the next patch will remove `lp->phy_node` from driver's private structure and do an of_node_put on it right away after use since it is not used elsewhere. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: sfc: fix using uninitialized xdp tx_queueTaehee Yoo
In some cases, xdp tx_queue can get used before initialization. 1. interface up/down 2. ring buffer size change When CPU cores are lower than maximum number of channels of sfc driver, it creates new channels only for XDP. When an interface is up or ring buffer size is changed, all channels are initialized. But xdp channels are always initialized later. So, the below scenario is possible. Packets are received to rx queue of normal channels and it is acted XDP_TX and tx_queue of xdp channels get used. But these tx_queues are not initialized yet. If so, TX DMA or queue error occurs. In order to avoid this problem. 1. initializes xdp tx_queues earlier than other rx_queue in efx_start_channels(). 2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers(). Splat looks like: sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250 sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL) sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22 (raw=22) arg=789 sfc 0000:08:00.1 enp8s0f1np1: has been disabled Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)") Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06rxrpc: fix a race in rxrpc_exit_net()Eric Dumazet
Current code can lead to the following race: CPU0 CPU1 rxrpc_exit_net() rxrpc_peer_keepalive_worker() if (rxnet->live) rxnet->live = false; del_timer_sync(&rxnet->peer_keepalive_timer); timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay); cancel_work_sync(&rxnet->peer_keepalive_work); rxrpc_exit_net() exits while peer_keepalive_timer is still armed, leading to use-after-free. syzbot report was: ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0 WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0 R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __debug_check_no_obj_freed lib/debugobjects.c:992 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023 kfree+0xd6/0x310 mm/slab.c:3809 ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176 ops_free_list net/core/net_namespace.c:174 [inline] cleanup_net+0x591/0xb00 net/core/net_namespace.c:598 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 </TASK> Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: linux-afs@lists.infradead.org Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: openvswitch: fix leak of nested actionsIlya Maximets
While parsing user-provided actions, openvswitch module may dynamically allocate memory and store pointers in the internal copy of the actions. So this memory has to be freed while destroying the actions. Currently there are only two such actions: ct() and set(). However, there are many actions that can hold nested lists of actions and ovs_nla_free_flow_actions() just jumps over them leaking the memory. For example, removal of the flow with the following actions will lead to a leak of the memory allocated by nf_ct_tmpl_alloc(): actions:clone(ct(commit),0) Non-freed set() action may also leak the 'dst' structure for the tunnel info including device references. Under certain conditions with a high rate of flow rotation that may cause significant memory leak problem (2MB per second in reporter's case). The problem is also hard to mitigate, because the user doesn't have direct control over the datapath flows generated by OVS. Fix that by iterating over all the nested actions and freeing everything that needs to be freed recursively. New build time assertion should protect us from this problem if new actions will be added in the future. Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all attributes has to be explicitly checked. sample() and clone() actions are mixing extra attributes into the user-provided action list. That prevents some code generalization too. Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entrySteve Capper
tlb_remove_huge_tlb_entry only considers PMD_SIZE and PUD_SIZE when updating the mmu_gather structure. Unfortunately on arm64 there are two additional huge page sizes that need to be covered: CONT_PTE_SIZE and CONT_PMD_SIZE. Where an end-user attempts to employ contiguous huge pages, a VM_BUG_ON can be experienced due to the fact that the tlb structure hasn't been correctly updated by the relevant tlb_flush_p.._range() call from tlb_remove_huge_tlb_entry. This patch adds inequality logic to the generic implementation of tlb_remove_huge_tlb_entry s.t. CONT_PTE_SIZE and CONT_PMD_SIZE are effectively covered on arm64. Also, as well as ptes, pmds and puds; p4ds are now considered too. Reported-by: David Hildenbrand <david@redhat.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-mm/811c5c8e-b3a2-85d2-049c-717f17c3a03a@redhat.com/ Signed-off-by: Steve Capper <steve.capper@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220330112543.863-1-steve.capper@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-06arm64: alternatives: mark patch_alternative() as `noinstr`Joey Gouly
The alternatives code must be `noinstr` such that it does not patch itself, as the cache invalidation is only performed after all the alternatives have been applied. Mark patch_alternative() as `noinstr`. Mark branch_insn_requires_update() and get_alt_insn() with `__always_inline` since they are both only called through patch_alternative(). Booting a kernel in QEMU TCG with KCSAN=y and ARM64_USE_LSE_ATOMICS=y caused a boot hang: [ 0.241121] CPU: All CPU(s) started at EL2 The alternatives code was patching the atomics in __tsan_read4() from LL/SC atomics to LSE atomics. The following fragment is using LL/SC atomics in the .text section: | <__tsan_unaligned_read4+304>: ldxr x6, [x2] | <__tsan_unaligned_read4+308>: add x6, x6, x5 | <__tsan_unaligned_read4+312>: stxr w7, x6, [x2] | <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304> This LL/SC atomic sequence was to be replaced with LSE atomics. However since the alternatives code was instrumentable, __tsan_read4() was being called after only the first instruction was replaced, which led to the following code in memory: | <__tsan_unaligned_read4+304>: ldadd x5, x6, [x2] | <__tsan_unaligned_read4+308>: add x6, x6, x5 | <__tsan_unaligned_read4+312>: stxr w7, x6, [x2] | <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304> This caused an infinite loop as the `stxr` instruction never completed successfully, so `w7` was always 0. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220405104733.11476-1-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-06ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item backMario Limonciello
CONFIG_SATA_LPM_MOBILE_POLICY was renamed to CONFIG_SATA_LPM_POLICY in commit 4dd4d3deb502 ("ata: ahci: Rename CONFIG_SATA_LPM_MOBILE_POLICY configuration item"). This can potentially cause problems as users would invisibly lose configuration policy defaults when they built the new kernel. To avoid such problems, switch back to the old name (even if it's wrong). Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-04-05net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()Andrew Lunn
There is often not a MAC address available in an EEPROM accessible by Linux with Marvell devices. Instead the bootload has the MAC address and directly programs it into the hardware. So don't consider an error from of_get_mac_address() has fatal. However, the check was added for the case where there is a MAC address in an the EEPROM, but the EEPROM has not probed yet, and -EPROBE_DEFER is returned. In that case the error should be returned. So make the check specific to this error code. Cc: Mauri Sandberg <maukka@ext.kapsi.fi> Reported-by: Thomas Walther <walther-it@gmx.de> Fixes: 42404d8f1c01 ("net: mv643xx_eth: process retval from of_get_mac_address") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220405000404.3374734-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05net: openvswitch: don't send internal clone attribute to the userspace.Ilya Maximets
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for performance optimization inside the kernel. It's added by the kernel while parsing user-provided actions and should not be sent during the flow dump as it's not part of the uAPI. The issue doesn't cause any significant problems to the ovs-vswitchd process, because reported actions are not really used in the application lifecycle and only supposed to be shown to a human via ovs-dpctl flow dump. However, the action list is still incorrect and causes the following error if the user wants to look at the datapath flows: # ovs-dpctl add-dp system@ovs-system # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)" # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(bad length 4, expected -1 for: action0(01 00 00 00), ct(commit),0) With the fix: # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(ct(commit),0) Additionally fixed an incorrect attribute name in the comment. Fixes: b233504033db ("openvswitch: kernel datapath clone action") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05net: micrel: Fix KS8851 KconfigHoratiu Vultur
KS8851 selects MICREL_PHY, which depends on PTP_1588_CLOCK_OPTIONAL, so make KS8851 also depend on PTP_1588_CLOCK_OPTIONAL. Fixes kconfig warning and build errors: WARNING: unmet direct dependencies detected for MICREL_PHY Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && PTP_1588_CLOCK_OPTIONAL [=m] Selected by [y]: - KS8851 [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICREL [=y] && SPI [=y] ld.lld: error: undefined symbol: ptp_clock_register referenced by micrel.c net/phy/micrel.o:(lan8814_probe) in archive drivers/built-in.a ld.lld: error: undefined symbol: ptp_clock_index referenced by micrel.c net/phy/micrel.o:(lan8814_ts_info) in archive drivers/built-in.a Reported-by: kernel test robot <lkp@intel.com> Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20220405065936.4105272-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Incorrect comparison in bitmask .reduce, from Jeremy Sowden. 2) Missing GFP_KERNEL_ACCOUNT for dynamically allocated objects, from Vasily Averin. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: memcg accounting for dynamically allocated objects netfilter: bitwise: fix reduce comparisons ==================== Link: https://lore.kernel.org/r/20220405100923.7231-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05ACPI: bus: Eliminate acpi_bus_get_device()Rafael J. Wysocki
Replace the last instance of acpi_bus_get_device(), added recently by commit 87e59b36e5e2 ("spi: Support selection of the index of the ACPI Spi Resource before alloc"), with acpi_fetch_acpi_dev() and finally drop acpi_bus_get_device() that has no more users. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Mark Brown <broonie@kernel.org>
2022-04-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Fixes and cleanups: - A couple of mlx5 fixes related to cvq - A couple of reverts dropping useless code (code that used it got reverted earlier)" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa: mlx5: synchronize driver status with CVQ vdpa: mlx5: prevent cvq work from hogging CPU Revert "virtio_config: introduce a new .enable_cbs method" Revert "virtio: use virtio_device_ready() in virtio_device_restore()"
2022-04-05x86/speculation: Restore speculation related MSRs during S3 resumePawan Gupta
After resuming from suspend-to-RAM, the MSRs that control CPU's speculative execution behavior are not being restored on the boot CPU. These MSRs are used to mitigate speculative execution vulnerabilities. Not restoring them correctly may leave the CPU vulnerable. Secondary CPU's MSRs are correctly being restored at S3 resume by identify_secondary_cpu(). During S3 resume, restore these MSRs for boot CPU when restoring its processor state. Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS") Reported-by: Neelima Krishnan <neelima.krishnan@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-05x86/pm: Save the MSR validity status at context setupPawan Gupta
The mechanism to save/restore MSRs during S3 suspend/resume checks for the MSR validity during suspend, and only restores the MSR if its a valid MSR. This is not optimal, as an invalid MSR will unnecessarily throw an exception for every suspend cycle. The more invalid MSRs, higher the impact will be. Check and save the MSR validity at setup. This ensures that only valid MSRs that are guaranteed to not throw an exception will be attempted during suspend. Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-05ice: clear cmd_type_offset_bsz for TX ringsMaciej Fijalkowski
Currently when XDP rings are created, each descriptor gets its DD bit set, which turns out to be the wrong approach as it can lead to a situation where more descriptors get cleaned than it was supposed to, e.g. when AF_XDP busy poll is run with a large batch size. In this situation, the driver would request for more buffers than it is able to handle. Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They should be initialized to zero instead. Fixes: 9610bd988df9 ("ice: optimize XDP_TX workloads") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05ice: xsk: fix VSI state check in ice_xsk_wakeup()Maciej Fijalkowski
ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on vsi->state in ice_xsk_wakeup(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05ice: synchronize_rcu() when terminating ringsMaciej Fijalkowski
Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use. This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05Merge tag 'for-5.18-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - prevent deleting subvolume with active swapfile - fix qgroup reserve limit calculation overflow - remove device count in superblock and its item in one transaction so they cant't get out of sync - skip defragmenting an isolated sector, this could cause some extra IO - unify handling of mtime/permissions in hole punch with fallocate - zoned mode fixes: - remove assert checking for only single mode, we have the DUP mode implemented - fix potential lockdep warning while traversing devices when checking for zone activation * tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: prevent subvol with swapfile from being deleted btrfs: do not warn for free space inode in cow_file_range btrfs: avoid defragging extents whose next extents are not targets btrfs: fix fallocate to use file_modified to update permissions consistently btrfs: remove device item and update super block in the same transaction btrfs: fix qgroup reserve overflow the qgroup limit btrfs: zoned: remove left over ASSERT checking for single profile btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
2022-04-05random: opportunistically initialize on /dev/urandom readsJason A. Donenfeld
In 6f98a4bfee72 ("random: block in /dev/urandom"), we tried to make a successful try_to_generate_entropy() call *required* if the RNG was not already initialized. Unfortunately, weird architectures and old userspaces combined in TCG test harnesses, making that change still not realistic, so it was reverted in 0313bc278dac ("Revert "random: block in /dev/urandom""). However, rather than making a successful try_to_generate_entropy() call *required*, we can instead make it *best-effort*. If try_to_generate_entropy() fails, it fails, and nothing changes from the current behavior. If it succeeds, then /dev/urandom becomes safe to use for free. This way, we don't risk the regression potential that led to us reverting the required-try_to_generate_entropy() call before. Practically speaking, this means that at least on x86, /dev/urandom becomes safe. Probably other architectures with working cycle counters will also become safe. And architectures with slow or broken cycle counters at least won't be affected at all by this change. So it may not be the glorious "all things are unified!" change we were hoping for initially, but practically speaking, it makes a positive impact. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-05ipv6: Fix stats accounting in ip6_pkt_dropDavid Ahern
VRF devices are the loopbacks for VRFs, and a loopback can not be assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should be '||' not '&&'. Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach") Reported-by: Pudak, Filip <Filip.Pudak@windriver.com> Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com> Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05Merge branch 'ice-bug-fixes'Paolo Abeni
Tony Nguyen says: ==================== ice bug fixes Alice Michael says: There were a couple of bugs that have been found and fixed by Anatolii in the ice driver. First he fixed a bug on ring creation by setting the default value for the teid. Anatolli also fixed a bug with deleting queues in ice_vc_dis_qs_msg based on their enablement. --- v2: Remove empty lines between tags The following are changes since commit 458f5d92df4807e2a7c803ed928369129996bf96: sfc: Do not free an empty page_ring and are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE ==================== Link: https://lore.kernel.org/r/20220404183548.3422851-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05ice: Do not skip not enabled queues in ice_vc_dis_qs_msgAnatolii Gerasymenko
Disable check for queue being enabled in ice_vc_dis_qs_msg, because there could be a case when queues were created, but were not enabled. We still need to delete those queues. Normal workflow for VF looks like: Enable path: VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10) VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6) VIRTCHNL_OP_ENABLE_QUEUES (opcode 8) Disable path: VIRTCHNL_OP_DISABLE_QUEUES (opcode 9) VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11) The issue appears only in stress conditions when VF is enabled and disabled very fast. Eventually there will be a case, when queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because there is a check whether queues are enabled in ice_vc_dis_qs_msg. When we bring up the VF again, we will see the "Failed to set LAN Tx queue context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because old 16 queues were not deleted and VF requests to create 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to find a parent node for first newly requested queue (because all nodes are allocated to 16 old queues). Testing Hints: Just enable and disable VF fast enough, so it would be disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES. while true; do ip link set dev ens785f0v0 up sleep 0.065 # adjust delay value for you machine ip link set dev ens785f0v0 down done Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05ice: Set txq_teid to ICE_INVAL_TEID on ring creationAnatolii Gerasymenko
When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below. The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF). Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5 The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware. As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues: node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue; But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware. Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probeMiaoqian Lin
This node pointer is returned by of_find_compatible_node() with refcount incremented. Calling of_node_put() to aovid the refcount leak. Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220404125336.13427-1-linmq006@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05netfilter: nf_tables: memcg accounting for dynamically allocated objectsVasily Averin
nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to use __GFP_ACCOUNT flag for objects that are dynamically allocated from the packet path. Such objects are allocated inside nft_expr_ops->init() callbacks executed in task context while processing netlink messages. In addition, this patch adds accounting to nft_set_elem_expr_clone() used for the same purposes. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-05Merge drm-misc/drm-misc-next-fixes into drm-misc-fixesMaxime Ripard
There were a few patches left in drm-misc-next-fixes, let's bring them into drm-misc-fixes. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2022-04-05Merge drm/drm-fixes into drm-misc-fixesMaxime Ripard
Let's start the 5.18 fixes cycle. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2022-04-05dt-bindings: display: bridge: Drop requirement on input port for DSI devicesMaxime Ripard
MIPI-DSI devices, if they are controlled through the bus itself, have to be described as a child node of the controller they are attached to. Thus, there's no requirement on the controller having an OF-Graph output port to model the data stream: it's assumed that it would go from the parent to the child. However, some bridges controlled through the DSI bus still require an input OF-Graph port, thus requiring a controller with an OF-Graph output port. This prevents those bridges from being used with the controllers that do not have one without any particular reason to. Let's drop that requirement. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20220323154823.839469-1-maxime@cerno.tech
2022-04-05sctp: count singleton chunks in assoc user statsJamie Bainbridge
Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN- COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks" counter available to the assoc owner. These are all control chunks so they should be counted as such. Add counting of singleton chunks so they are properly accounted for. Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.1649029663.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-04cifs: update internal module numberSteve French
To 2.36 Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04cifs: force new session setup and tcon for dfsPaulo Alcantara
Do not reuse existing sessions and tcons in DFS failover as it might connect to different servers and shares. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-05drm/nouveau/pmu: Add missing callbacks for Tegra devicesKarol Herbst
Fixes a crash booting on those platforms with nouveau. Fixes: 4cdd2450bf73 ("drm/nouveau/pmu/gm200-: use alternate falcon reset sequence") Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.17+ Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220322124800.2605463-1-kherbst@redhat.com
2022-04-04selftests/harness: Pass variant to teardownWillem de Bruijn
FIXTURE_VARIANT data is passed to FIXTURE_SETUP and TEST_F as "variant". In some cases, the variant will change the setup, such that expectations also change on teardown. Also pass variant to FIXTURE_TEARDOWN. The new FIXTURE_TEARDOWN logic is identical to that in FIXTURE_SETUP, right above. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201210231010.420298-1-willemdebruijn.kernel@gmail.com Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04selftests/harness: Run TEARDOWN for ASSERT failuresKees Cook
The kselftest test harness has traditionally not run the registered TEARDOWN handler when a test encountered an ASSERT. This creates unexpected situations and tests need to be very careful about using ASSERT, which seems a needless hurdle for test writers. Because of the harness's design for optional failure handlers, the original implementation of ASSERT used an abort() to immediately stop execution, but that meant the context for running teardown was lost. Instead, use setjmp/longjmp so that teardown can be done. Failed SETUP routines continue to not be followed by TEARDOWN, though. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: linux-kselftest@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04selftests: fix an unused variable warning in pidfd selftestAxel Rasmussen
I fixed a few warnings like this in commit e2aa5e650b07 ("selftests: fixup build warnings in pidfd / clone3 tests"), but I missed this one by mistake. Since this variable is unused, remove it. Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04selftests: fix header dependency for pid_namespace selftestsAxel Rasmussen
The way the test target was defined before, when building with clang we get a command line like this: clang -Wall -Werror -g -I../../../../usr/include/ \ regression_enomem.c ../pidfd/pidfd.h -o regression_enomem This yields an error, because clang thinks we want to produce both a *.o file, as well as a precompiled header: clang: error: cannot specify -o when generating multiple output files gcc, for whatever reason, doesn't exhibit the same behavior which I suspect is why the problem wasn't noticed before. This can be fixed simply by using the LOCAL_HDRS infrastructure the selftests lib.mk provides. It does the right think and marks the target as depending on the header (so if the header changes, we rebuild), but it filters the header out of the compiler command line, so we don't get the error described above. Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04selftests: x86: add 32bit build warnings for SUSEGeliang Tang
In order to successfully build all these 32bit tests, these 32bit gcc and glibc packages, named gcc-32bit and glibc-devel-static-32bit on SUSE, need to be installed. This patch added this information in warn_32bit_failure. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04selftests/proc: fix array_size.cocci warningGuo Zhengkui
Fix the following coccicheck warning: tools/testing/selftests/proc/proc-pid-vm.c:371:26-27: WARNING: Use ARRAY_SIZE tools/testing/selftests/proc/proc-pid-vm.c:420:26-27: WARNING: Use ARRAY_SIZE It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04selftests/vDSO: fix array_size.cocci warningGuo Zhengkui
Fix the following coccicheck warning: tools/testing/selftests/vDSO/vdso_test_correctness.c:309:46-47: WARNING: Use ARRAY_SIZE tools/testing/selftests/vDSO/vdso_test_correctness.c:373:46-47: WARNING: Use ARRAY_SIZE It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04Revert "ACPI: processor: idle: Only flush cache on entering C3"Akihiko Odaki
Revert commit 87ebbb8c612b ("ACPI: processor: idle: Only flush cache on entering C3") that broke the assumptions of the acpi_idle_play_dead() callers. Namely, the CPU cache must always be flushed in acpi_idle_play_dead(), regardless of the target C-state that is going to be requested, because this is likely to be part of a CPU offline procedure or preparation for entering a system-wide sleep state and the lack of synchronization between the CPU cache and RAM may lead to problems going forward, for example when the CPU is brought back online. In particular, it breaks resume from suspend-to-RAM on Lenovo ThinkPad C13 which fails occasionally until the problematic commit is reverted. Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-04Documentation: kunit: fix path to .kunitconfig in start.rstDaniel Latypov
Commit ddbd60c779b4 ("kunit: use --build_dir=.kunit as default") changed the default --build_dir, which had the side effect of making `.kunitconfig` move to `.kunit/.kunitconfig`. However, the first few lines of kunit/start.rst never got updated, oops. Fix this by telling people to run kunit.py first, which will automatically generate the .kunit directory and .kunitconfig file, and then edit the file manually as desired. Reported-by: Yifan Yuan <alpc_metic@live.com> Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-04-04IB/rdmavt: add lock to call to rvt_error_qp to prevent a race conditionNiels Dossche
The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. However, the commit I referenced in Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no longer covered by r_lock. This results in the lockdep assertion failing and also possibly in a race condition. Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error") Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-04random: do not split fast init input in add_hwgenerator_randomness()Jan Varho
add_hwgenerator_randomness() tries to only use the required amount of input for fast init, but credits all the entropy, rather than a fraction of it. Since it's hard to determine how much entropy is left over out of a non-unformly random sample, either give it all to fast init or credit it, but don't attempt to do both. In the process, we can clean up the injection code to no longer need to return a value. Signed-off-by: Jan Varho <jan.varho@gmail.com> [Jason: expanded commit message] Fixes: 73c7733f122e ("random: do not throw away excess input to crng_fast_load") Cc: stable@vger.kernel.org # 5.17+, requires af704c856e88 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-04cifs: remove check of list iterator against head past the loop bodyJakob Koschel
When list_for_each_entry() completes the iteration over the whole list without breaking the loop, the iterator value will be a bogus pointer computed based on the head element. While it is safe to use the pointer to determine if it was computed based on the head element, either with list_entry_is_head() or &pos->member == head, using the iterator variable after the loop should be avoided. In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04cifs: fix potential race with cifsd threadPaulo Alcantara
To avoid racing with demultiplex thread while it is handling data on socket, use cifs_signal_cifsd_for_reconnect() helper for marking current server to reconnect and let the demultiplex thread handle the rest. Fixes: dca65818c80c ("cifs: use a different reconnect helper for non-cifsd threads") Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVDMark Zhang
On the passive side when the disconnectReq event comes, if the current state is MRA_REP_RCVD, it needs to cancel the MAD before entering the DREQ_RCVD and TIMEWAIT states, otherwise the destroy_id may block until this mad will reach timeout. Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/75261c00c1d82128b1d981af9ff46e994186e621.1649062436.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>