summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-11mm/hugetlb: fix uffd-wp handling for migration entries in ↵David Hildenbrand
hugetlb_change_protection() We have to update the uffd-wp SWP PTE bit independent of the type of migration entry. Currently, if we're unlucky and we want to install/clear the uffd-wp bit just while we're migrating a read-only mapped hugetlb page, we would miss to set/clear the uffd-wp bit. Further, if we're processing a readable-exclusive migration entry and neither want to set or clear the uffd-wp bit, we could currently end up losing the uffd-wp bit. Note that the same would hold for writable migrating entries, however, having a writable migration entry with the uffd-wp bit set would already mean that something went wrong. Note that the change from !is_readable_migration_entry -> writable_migration_entry is harmless and actually cleaner, as raised by Miaohe Lin and discussed in [1]. [1] https://lkml.kernel.org/r/90dd6a93-4500-e0de-2bf0-bf522c311b0c@huawei.com Link: https://lkml.kernel.org/r/20221222205511.675832-3-david@redhat.com Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-11mm/hugetlb: fix PTE marker handling in hugetlb_change_protection()David Hildenbrand
Patch series "mm/hugetlb: uffd-wp fixes for hugetlb_change_protection()". Playing with virtio-mem and background snapshots (using uffd-wp) on hugetlb in QEMU, I managed to trigger a VM_BUG_ON(). Looking into the details, hugetlb_change_protection() seems to not handle uffd-wp correctly in all cases. Patch #1 fixes my test case. I don't have reproducers for patch #2, as it requires running into migration entries. I did not yet check in detail yet if !hugetlb code requires similar care. This patch (of 2): There are two problematic cases when stumbling over a PTE marker in hugetlb_change_protection(): (1) We protect an uffd-wp PTE marker a second time using uffd-wp: we will end up in the "!huge_pte_none(pte)" case and mess up the PTE marker. (2) We unprotect a uffd-wp PTE marker: we will similarly end up in the "!huge_pte_none(pte)" case even though we cleared the PTE, because the "pte" variable is stale. We'll mess up the PTE marker. For example, if we later stumble over such a "wrongly modified" PTE marker, we'll treat it like a present PTE that maps some garbage page. This can, for example, be triggered by mapping a memfd backed by huge pages, registering uffd-wp, uffd-wp'ing an unmapped page and (a) uffd-wp'ing it a second time; or (b) uffd-unprotecting it; or (c) unregistering uffd-wp. Then, ff we trigger fallocate(FALLOC_FL_PUNCH_HOLE) on that file range, we will run into a VM_BUG_ON: [ 195.039560] page:00000000ba1f2987 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0 [ 195.039565] flags: 0x7ffffc0001000(reserved|node=0|zone=0|lastcpupid=0x1fffff) [ 195.039568] raw: 0007ffffc0001000 ffffe742c0000008 ffffe742c0000008 0000000000000000 [ 195.039569] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 195.039569] page dumped because: VM_BUG_ON_PAGE(compound && !PageHead(page)) [ 195.039573] ------------[ cut here ]------------ [ 195.039574] kernel BUG at mm/rmap.c:1346! [ 195.039579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 195.039581] CPU: 7 PID: 4777 Comm: qemu-system-x86 Not tainted 6.0.12-200.fc36.x86_64 #1 [ 195.039583] Hardware name: LENOVO 20WNS1F81N/20WNS1F81N, BIOS N35ET50W (1.50 ) 09/15/2022 [ 195.039584] RIP: 0010:page_remove_rmap+0x45b/0x550 [ 195.039588] Code: [...] [ 195.039589] RSP: 0018:ffffbc03c3633ba8 EFLAGS: 00010292 [ 195.039591] RAX: 0000000000000040 RBX: ffffe742c0000000 RCX: 0000000000000000 [ 195.039592] RDX: 0000000000000002 RSI: ffffffff8e7aac1a RDI: 00000000ffffffff [ 195.039592] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffbc03c3633a08 [ 195.039593] R10: 0000000000000003 R11: ffffffff8f146328 R12: ffff9b04c42754b0 [ 195.039594] R13: ffffffff8fcc6328 R14: ffffbc03c3633c80 R15: ffff9b0484ab9100 [ 195.039595] FS: 00007fc7aaf68640(0000) GS:ffff9b0bbf7c0000(0000) knlGS:0000000000000000 [ 195.039596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 195.039597] CR2: 000055d402c49110 CR3: 0000000159392003 CR4: 0000000000772ee0 [ 195.039598] PKRU: 55555554 [ 195.039599] Call Trace: [ 195.039600] <TASK> [ 195.039602] __unmap_hugepage_range+0x33b/0x7d0 [ 195.039605] unmap_hugepage_range+0x55/0x70 [ 195.039608] hugetlb_vmdelete_list+0x77/0xa0 [ 195.039611] hugetlbfs_fallocate+0x410/0x550 [ 195.039612] ? _raw_spin_unlock_irqrestore+0x23/0x40 [ 195.039616] vfs_fallocate+0x12e/0x360 [ 195.039618] __x64_sys_fallocate+0x40/0x70 [ 195.039620] do_syscall_64+0x58/0x80 [ 195.039623] ? syscall_exit_to_user_mode+0x17/0x40 [ 195.039624] ? do_syscall_64+0x67/0x80 [ 195.039626] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 195.039628] RIP: 0033:0x7fc7b590651f [ 195.039653] Code: [...] [ 195.039654] RSP: 002b:00007fc7aaf66e70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d [ 195.039655] RAX: ffffffffffffffda RBX: 0000558ef4b7f370 RCX: 00007fc7b590651f [ 195.039656] RDX: 0000000018000000 RSI: 0000000000000003 RDI: 000000000000000c [ 195.039657] RBP: 0000000008000000 R08: 0000000000000000 R09: 0000000000000073 [ 195.039658] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000018000000 [ 195.039658] R13: 00007fb8bbe00000 R14: 000000000000000c R15: 0000000000001000 [ 195.039661] </TASK> Fix it by not going into the "!huge_pte_none(pte)" case if we stumble over an exclusive marker. spin_unlock() + continue would get the job done. However, instead, make it clearer that there are no fall-through statements: we process each case (hwpoison, migration, marker, !none, none) and then unlock the page table to continue with the next PTE. Let's avoid "continue" statements and use a single spin_unlock() at the end. Link: https://lkml.kernel.org/r/20221222205511.675832-1-david@redhat.com Link: https://lkml.kernel.org/r/20221222205511.675832-2-david@redhat.com Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-12powerpc/64s/hash: Make stress_hpt_timer_fn() staticYang Yingliang
stress_hpt_timer_fn() is only used in hash_utils.c, make it static. Fixes: 6b34a099faa1 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221228093603.3166599-1-yangyingliang@huawei.com
2023-01-11Merge tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Make 'perf kmem' cope with the removal of some kmem:kmem_cache_alloc_node and kmem:kmalloc_node in the 11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of tracepoints") commit, making sure it works with Linux >= 6.2 as well as with older kernels where those tracepoints are present. - Also make it handle the new "node" kmem:kmalloc and kmem:kmem_cache_alloc tracepoint field introduced in that same commit. - Fix hardware tracing PMU address filter duplicate symbol selection, that was preventing to match with static functions with the same name present in different object files. - Fix regression on what linux/types.h file gets used to build the "BPF prologue" 'perf test' entry, the system one lacks the fmode_t definition used in this test, so provide that type in the test itself. - Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1. If the user asks for linking with the libbpf package provided by the distro, then it has to be >= 0.8.0. Using the libbpf supplied with the kernel would be a fallback in that case. - Fix the build when libbpf isn't available or explicitly disabled via NO_LIBBPF=1. - Don't try to install libtraceevent plugins as its not anymore in the kernel sources and will thus always fail. * tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf auxtrace: Fix address filter duplicate symbol selection perf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1 perf build: Fix build error when NO_LIBBPF=1 perf tools: Don't install libtraceevent plugins as its not anymore in the kernel sources perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring perf kmem: Support legacy tracepoints perf build: Properly guard libbpf includes perf tests bpf prologue: Fix bpf-script-test-prologue test compile issue with clang
2023-01-11KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lockDavid Woodhouse
In commit 14243b387137a ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery") the clever version of me left some helpful notes for those who would come after him: /* * For the irqfd workqueue, using the main kvm->lock mutex is * fine since this function is invoked from kvm_set_irq() with * no other lock held, no srcu. In future if it will be called * directly from a vCPU thread (e.g. on hypercall for an IPI) * then it may need to switch to using a leaf-node mutex for * serializing the shared_info mapping. */ mutex_lock(&kvm->lock); In commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") the other version of me ran straight past that comment without reading it, and introduced a potential deadlock by taking vcpu->mutex and kvm->lock in the wrong order. Solve this as originally suggested, by adding a leaf-node lock in the Xen state rather than using kvm->lock for it. Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20230111180651.14394-4-dwmw2@infradead.org> [Rebase, add docs. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11x86/pci: Simplify is_mmconf_reserved() messagesBjorn Helgaas
is_mmconf_reserved() takes a "with_e820" parameter that only determines the message logged if it finds the MMCONFIG region is reserved. Pass the message directly, which will simplify a future patch that adds a new way of looking for that reservation. No functional change intended. Link: https://lore.kernel.org/r/20230110180243.1590045-2-helgaas@kernel.org Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
2023-01-11docs/conf.py: Use about.html only in sidebar of alabaster themeAkira Yokosawa
"about.html" is available only for the alabaster theme [1]. Unconditionally putting it to html_sidebars prevents us from using other themes which respect html_sidebars. Remove about.html from the initialization and insert it at the front for the alabaster theme. Link: [1] https://alabaster.readthedocs.io/en/latest/installation.html#sidebars Fixes: d5389d3145ef ("docs: Switch the default HTML theme to alabaster") Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/4b162dbe-2a7f-1710-93e0-754cf8680aae@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-11s390: update defconfigsHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11btrfs: zoned: enable metadata over-commit for non-ZNS setupNaohiro Aota
The commit 79417d040f4f ("btrfs: zoned: disable metadata overcommit for zoned") disabled the metadata over-commit to track active zones properly. However, it also introduced a heavy overhead by allocating new metadata block groups and/or flushing dirty buffers to release the space reservations. Specifically, a workload (write only without any sync operations) worsen its performance from 343.77 MB/sec (v5.19) to 182.89 MB/sec (v6.0). The performance is still bad on current misc-next which is 187.95 MB/sec. And, with this patch applied, it improves back to 326.70 MB/sec (+73.82%). This patch introduces a new fs_info->flag BTRFS_FS_NO_OVERCOMMIT to indicate it needs to disable the metadata over-commit. The flag is enabled when a device with max active zones limit is loaded into a file-system. Fixes: 79417d040f4f ("btrfs: zoned: disable metadata overcommit for zoned") CC: stable@vger.kernel.org # 6.0+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-11btrfs: qgroup: do not warn on record without old_roots populatedQu Wenruo
[BUG] There are some reports from the mailing list that since v6.1 kernel, the WARN_ON() inside btrfs_qgroup_account_extent() gets triggered during rescan: WARNING: CPU: 3 PID: 6424 at fs/btrfs/qgroup.c:2756 btrfs_qgroup_account_extents+0x1ae/0x260 [btrfs] CPU: 3 PID: 6424 Comm: snapperd Tainted: P OE 6.1.2-1-default #1 openSUSE Tumbleweed 05c7a1b1b61d5627475528f71f50444637b5aad7 RIP: 0010:btrfs_qgroup_account_extents+0x1ae/0x260 [btrfs] Call Trace: <TASK> btrfs_commit_transaction+0x30c/0xb40 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6] ? start_transaction+0xc3/0x5b0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6] btrfs_qgroup_rescan+0x42/0xc0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6] btrfs_ioctl+0x1ab9/0x25c0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6] ? __rseq_handle_notify_resume+0xa9/0x4a0 ? mntput_no_expire+0x4a/0x240 ? __seccomp_filter+0x319/0x4d0 __x64_sys_ioctl+0x90/0xd0 do_syscall_64+0x5b/0x80 ? syscall_exit_to_user_mode+0x17/0x40 ? do_syscall_64+0x67/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fd9b790d9bf </TASK> [CAUSE] Since commit e15e9f43c7ca ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting"), if our qgroup is already in inconsistent state, we will no longer do the time-consuming backref walk. This can leave some qgroup records without a valid old_roots ulist. Normally this is fine, as btrfs_qgroup_account_extents() would also skip those records if we have NO_ACCOUNTING flag set. But there is a small window, if we have NO_ACCOUNTING flag set, and inserted some qgroup_record without a old_roots ulist, but then the user triggered a qgroup rescan. During btrfs_qgroup_rescan(), we firstly clear NO_ACCOUNTING flag, then commit current transaction. And since we have a qgroup_record with old_roots = NULL, we trigger the WARN_ON() during btrfs_qgroup_account_extents(). [FIX] Unfortunately due to the introduction of NO_ACCOUNTING flag, the assumption that every qgroup_record would have its old_roots populated is no longer correct. Fix the false alerts and drop the WARN_ON(). Reported-by: Lukas Straub <lukasstraub2@web.de> Reported-by: HanatoK <summersnow9403@gmail.com> Fixes: e15e9f43c7ca ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting") CC: stable@vger.kernel.org # 6.1 Link: https://lore.kernel.org/linux-btrfs/2403c697-ddaf-58ad-3829-0335fc89df09@gmail.com/ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-11btrfs: add extra error messages to cover non-ENOMEM errors from ↵Qu Wenruo
device_add_list() [BUG] When test case btrfs/219 (aka, mount a registered device but with a lower generation) failed, there is not any useful information for the end user to find out what's going wrong. The mount failure just looks like this: # mount -o loop /tmp/219.img2 /mnt/btrfs/ mount: /mnt/btrfs: mount(2) system call failed: File exists. dmesg(1) may have more information after failed mount system call. While the dmesg contains nothing but the loop device change: loop1: detected capacity change from 0 to 524288 [CAUSE] In device_list_add() we have a lot of extra checks to reject invalid cases. That function also contains the regular device scan result like the following prompt: BTRFS: device fsid 6222333e-f9f1-47e6-b306-55ddd4dcaef4 devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (3027) But unfortunately not all errors have their own error messages, thus if we hit something wrong in device_add_list(), there may be no error messages at all. [FIX] Add errors message for all non-ENOMEM errors. For ENOMEM, I'd say we're in a much worse situation, and there should be some OOM messages way before our call sites. CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-11KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering ruleDavid Woodhouse
Documentation/virt/kvm/locking.rst tells us that kvm->lock is taken outside vcpu->mutex. But that doesn't actually happen very often; it's only in some esoteric cases like migration with AMD SEV. This means that lockdep usually doesn't notice, and doesn't do its job of keeping us honest. Ensure that lockdep *always* knows about the ordering of these two locks, by briefly taking vcpu->mutex in kvm_vm_ioctl_create_vcpu() while kvm->lock is held. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20230111180651.14394-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()David Woodhouse
The kvm_xen_update_runstate_guest() function can be called when the vCPU is being scheduled out, from a preempt notifier. It *opportunistically* updates the runstate area in the guest memory, if the gfn_to_pfn_cache which caches the appropriate address is still valid. If there is *contention* when it attempts to obtain gpc->lock, then locking inside the priority inheritance checks may cause a deadlock. Lockdep reports: [13890.148997] Chain exists of: &gpc->lock --> &p->pi_lock --> &rq->__lock [13890.149002] Possible unsafe locking scenario: [13890.149003] CPU0 CPU1 [13890.149004] ---- ---- [13890.149005] lock(&rq->__lock); [13890.149007] lock(&p->pi_lock); [13890.149009] lock(&rq->__lock); [13890.149011] lock(&gpc->lock); [13890.149013] *** DEADLOCK *** In the general case, if there's contention for a read lock on gpc->lock, that's going to be because something else is either invalidating or revalidating the cache. Either way, we've raced with seeing it in an invalid state, in which case we would have aborted the opportunistic update anyway. So in the 'atomic' case when called from the preempt notifier, just switch to using read_trylock() and avoid the PI handling altogether. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20230111180651.14394-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11KVM: x86/xen: Fix lockdep warning on "recursive" gpc lockingDavid Woodhouse
In commit 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate area") we declared it safe to obtain two gfn_to_pfn_cache locks at the same time: /* * The guest's runstate_info is split across two pages and we * need to hold and validate both GPCs simultaneously. We can * declare a lock ordering GPC1 > GPC2 because nothing else * takes them more than one at a time. */ However, we forgot to tell lockdep. Do so, by setting a subclass on the first lock before taking the second. Fixes: 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate area") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20230111180651.14394-1-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11Merge tag 'kvmarm-fixes-6.2-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 6.2, take #1 - Fix the PMCR_EL0 reset value after the PMU rework - Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots - Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot - Put the Apple M2 on the naughty step for not being able to correctly implement the vgic SEIS feature, just liek the M1 before it - Reviewer updates: Alex is stepping down, replaced by Zenghui
2023-01-11Documentation: kvm: fix SRCU locking order docsPaolo Bonzini
kvm->srcu is taken in KVM_RUN and several other vCPU ioctls, therefore vcpu->mutex is susceptible to the same deadlock that is documented for kvm->slots_lock. The same holds for kvm->lock, since kvm->lock is held outside vcpu->mutex. Fix the documentation and rearrange it to highlight the difference between these locks and kvm->slots_arch_lock, and how kvm->slots_arch_lock can be useful while processing a vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bitsPablo Neira Ayuso
If the offset + length goes over the ethernet + vlan header, then the length is adjusted to copy the bytes that are within the boundaries of the vlan_ethhdr scratchpad area. The remaining bytes beyond ethernet + vlan header are copied directly from the skbuff data area. Fix incorrect arithmetic operator: subtract, not add, the size of the vlan header in case of double-tagged packets to adjust the length accordingly to address CVE-2023-0179. Reported-by: Davide Ornaghi <d.ornaghi97@gmail.com> Fixes: f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-11netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function.Gavrilov Ilia
When first_ip is 0, last_ip is 0xFFFFFFFF, and netmask is 31, the value of an arithmetic expression 2 << (netmask - mask_bits - 1) is subject to overflow due to a failure casting operands to a larger data type before performing the arithmetic. Note that it's harmless since the value will be checked at the next step. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b9fed748185a ("netfilter: ipset: Check and reject crazy /0 input parameters") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-11selftests: netfilter: fix transaction test script timeout handlingFlorian Westphal
The kselftest framework uses a default timeout of 45 seconds for all test scripts. Increase the timeout to two minutes for the netfilter tests, this should hopefully be enough, Make sure that, should the script be canceled, the net namespace and the spawned ping instances are removed. Fixes: 25d8bcedbf43 ("selftests: add script to stress-test nft packet path vs. control plane") Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-11Merge branch ↵Jakub Kicinski
'dt-bindings-first-batch-of-dt-schema-conversions-for-amlogic-meson-bindings' Neil Armstrong says: ==================== dt-bindings: first batch of dt-schema conversions for Amlogic Meson bindings Batch conversion of the following bindings: [...] - mdio-mux-meson-g12a.txt ==================== Link: https://lore.kernel.org/r/20221117-b4-amlogic-bindings-convert-v2-0-36ad050bb625@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11dt-bindings: net: convert mdio-mux-meson-g12a.txt to dt-schemaNeil Armstrong
Convert MDIO bus multiplexer/glue of Amlogic G12a SoC family bindings to dt-schema. Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11perf auxtrace: Fix address filter duplicate symbol selectionAdrian Hunter
When a match has been made to the nth duplicate symbol, return success not error. Example: Before: $ cat file.c cat: file.c: No such file or directory $ cat file1.c #include <stdio.h> static void func(void) { printf("First func\n"); } void other(void); int main() { func(); other(); return 0; } $ cat file2.c #include <stdio.h> static void func(void) { printf("Second func\n"); } void other(void) { func(); } $ gcc -Wall -Wextra -o test file1.c file2.c $ perf record -e intel_pt//u --filter 'filter func @ ./test' -- ./test Multiple symbols with name 'func' #1 0x1149 l func which is near main #2 0x1179 l func which is near other Disambiguate symbol name by inserting #n after the name e.g. func #2 Or select a global symbol by inserting #0 or #g or #G Failed to parse address filter: 'filter func @ ./test' Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>] Where multiple filters are separated by space or comma. $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test Failed to parse address filter: 'filter func #2 @ ./test' Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>] Where multiple filters are separated by space or comma. After: $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test First func Second func [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data ] $ perf script --itrace=b -Ftime,flags,ip,sym,addr --ns 1231062.526977619: tr strt 0 [unknown] => 558495708179 func 1231062.526977619: tr end call 558495708188 func => 558495708050 _init 1231062.526979286: tr strt 0 [unknown] => 55849570818d func 1231062.526979286: tr end return 55849570818f func => 55849570819d other Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters") Reported-by: Dmitrii Dolgov <9erthalion6@gmail.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230110185659.15979-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-11drm/i915/gt: Cover rest of SVG unit MCR registersGustavo Sousa
CHICKEN_RASTER_{1,2} got overlooked with the move done in commit a9e69428b1b4 ("drm/i915: Define MCR registers explicitly"). Registers from the SVG unit became multicast as of Xe_HP graphics. BSpec: 66534 Fixes: a9e69428b1b4 ("drm/i915: Define MCR registers explicitly") Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230105133701.19556-1-gustavo.sousa@intel.com (cherry picked from commit 10903b0a0f4d4964b352fa3df12d3d2ef5fb7a3b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-01-11KVM: s390: interrupt: use READ_ONCE() before cmpxchg()Heiko Carstens
Use READ_ONCE() before cmpxchg() to prevent that the compiler generates code that fetches the to be compared old value several times from memory. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20230109145456.2895385-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()Heiko Carstens
Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only dereferenced once by using READ_ONCE(). Otherwise the compiler could generate incorrect code. Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11s390/cpum_sf: add READ_ONCE() semantics to compare and swap loopsHeiko Carstens
The current cmpxchg_double() loops within the perf hw sampling code do not have READ_ONCE() semantics to read the old value from memory. This allows the compiler to generate code which reads the "old" value several times from memory, which again allows for inconsistencies. For example: /* Reset trailer (using compare-double-and-swap) */ do { te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK; te_flags |= SDB_TE_ALERT_REQ_MASK; } while (!cmpxchg_double(&te->flags, &te->overflow, te->flags, te->overflow, te_flags, 0ULL)); The compiler could generate code where te->flags used within the cmpxchg_double() call may be refetched from memory and which is not necessarily identical to the previous read version which was used to generate te_flags. Which in turn means that an incorrect update could happen. Fix this by adding READ_ONCE() semantics to all cmpxchg_double() loops. Given that READ_ONCE() cannot generate code on s390 which atomically reads 16 bytes, use a private compare-and-swap-double implementation to achieve that. Also replace cmpxchg_double() with the private implementation to be able to re-use the old value within the loops. As a side effect this converts the whole code to only use bit fields to read and modify bits within the hws trailer header. Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.ibm.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/linux-s390/Y71QJBhNTIatvxUT@osiris/T/#ma14e2a5f7aa8ed4b94b6f9576799b3ad9c60f333 Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-11spi: Merge rename of spi-cs-setup-ns DT propertyMark Brown
The newly added spi-cs-setup-ns doesn't really fit with the existing property names for delays, rename it so that it does before it makes it into a release and becomes ABI.
2023-01-11spi: spidev: remove debug messages that access spidev->spi without lockingBartosz Golaszewski
The two debug messages in spidev_open() dereference spidev->spi without taking the lock and without checking if it's not null. This can lead to a crash. Drop the messages as they're not needed - the user-space will get informed about ENOMEM with the syscall return value. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230106100719.196243-2-brgl@bgdev.pl Signed-off-by: Mark Brown <broonie@kernel.org>
2023-01-11spi: spidev: fix a race condition when accessing spidev->spiBartosz Golaszewski
There's a spinlock in place that is taken in file_operations callbacks whenever we check if spidev->spi is still alive (not null). It's also taken when spidev->spi is set to NULL in remove(). This however doesn't protect the code against driver unbind event while one of the syscalls is still in progress. To that end we need a lock taken continuously as long as we may still access spidev->spi. As both the file ops and the remove callback are never called from interrupt context, we can replace the spinlock with a mutex. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230106100719.196243-1-brgl@bgdev.pl Signed-off-by: Mark Brown <broonie@kernel.org>
2023-01-11Merge tag 'mlx5-fixes-2023-01-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-fixes-2023-01-09
2023-01-11ipv6: raw: Deduct extension header length in rawv6_push_pending_framesHerbert Xu
The total cork length created by ip6_append_data includes extension headers, so we must exclude them when comparing them against the IPV6_CHECKSUM offset which does not include extension headers. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11drm/nouveau: Remove file nouveau_fbcon.cThomas Zimmermann
Commit 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers") converted nouveau to generic fbdev emulation. The driver's internal implementation later got accidentally restored during a merge commit. Remove the file from the driver. No functional changes. v2: * point Fixes tag to merge commit (Alex) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Fixes: 4e291f2f5853 ("Merge tag 'drm-misc-next-2022-11-10-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next") Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20230110123526.28770-1-tzimmermann@suse.de
2023-01-11Merge tag 'mlx5-updates-2023-01-10' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2023-01-10 1) From Gal: Add debugfs entries for netdev nic driver - ktls, flow steering and hairpin info - useful for debug and performance analysis - e.g hairpin queue attributes, dump ktls tx pool size, etc 2) From Maher: Update shared buffer configuration on PFC commands 2.1) For every change of buffer's headroom, recalculate the size of shared buffer to be equal to "total_buffer_size" - "new_headroom_size". The new shared buffer size will be split in ratio of 3:1 between lossy and lossless pools, respectively. 2.2) For each port buffer change, count the number of lossless buffers. If there is only one lossless buffer, then set its lossless pool usage threshold to be infinite. Otherwise, if there is more than one lossless buffer, set a usage threshold for each lossless buffer. While at it, add more verbosity to debug prints when handling user commands, to assist in future debug. 3) From Tariq: Throttle high rate FW commands 4) From Shay: Properly initialize management PF 5) Various cleanup patches
2023-01-11Merge branch 'NCN26000-PLCA-RS-support'David S. Miller
Piergiorgio Beruto says: ==================== net: add PLCA RS support and onsemi NCN26000 This patchset adds support for getting/setting the Physical Layer Collision Avoidace (PLCA) Reconciliation Sublayer (RS) configuration and status on Ethernet PHYs that supports it. PLCA is a feature that provides improved media-access performance in terms of throughput, latency and fairness for multi-drop (P2MP) half-duplex PHYs. PLCA is defined in Clause 148 of the IEEE802.3 specifications as amended by 802.3cg-2019. Currently, PLCA is supported by the 10BASE-T1S single-pair Ethernet PHY defined in the same standard and related amendments. The OPEN Alliance SIG TC14 defines additional specifications for the 10BASE-T1S PHY, including a standard register map for PHYs that embeds the PLCA RS (see PLCA management registers at https://www.opensig.org/about/specifications/). The changes proposed herein add the appropriate ethtool netlink interface for configuring the PLCA RS on PHYs that supports it. A separate patchset further modifies the ethtool userspace program to show and modify the configuration/status of the PLCA RS. Additionally, this patchset adds support for the onsemi NCN26000 Industrial Ethernet 10BASE-T1S PHY that uses the newly added PLCA infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11drivers/net/phy: add driver for the onsemi NCN26000 10BASE-T1S PHYPiergiorgio Beruto
This patch adds support for the onsemi NCN26000 10BASE-T1S industrial Ethernet PHY. The driver supports Point-to-Multipoint operation without auto-negotiation and with link control handling. The PHY also features PLCA for improving performance in P2MP mode. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11drivers/net/phy: add helpers to get/set PLCA configurationPiergiorgio Beruto
This patch adds support in phylib to read/write PLCA configuration for Ethernet PHYs that support the OPEN Alliance "10BASE-T1S PLCA Management Registers" specifications. These can be found at https://www.opensig.org/about/specifications/ Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11drivers/net/phy: add connection between ethtool and phylib for PLCAPiergiorgio Beruto
This patch adds the required connection between netlink ethtool and phylib to resolve PLCA get/set config and get status messages. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHYPiergiorgio Beruto
This patch adds the link modes for the IEEE 802.3cg Clause 147 10BASE-T1S Ethernet PHY. According to the specifications, the 10BASE-T1S supports Point-To-Point Full-Duplex, Point-To-Point Half-Duplex and/or Point-To-Multipoint (AKA Multi-Drop) Half-Duplex operations. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11net/ethtool: add netlink interface for the PLCA RSPiergiorgio Beruto
Add support for configuring the PLCA Reconciliation Sublayer on multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g., 10BASE-T1S). This patch adds the appropriate netlink interface to ethtool. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()Clément Léger
If ptp was not enabled due to missing IRQ for instance, lan966x_ptp_deinit() will dereference NULL pointers. Fixes: d096459494a8 ("net: lan966x: Add support for ptp clocks") Signed-off-by: Clément Léger <clement.leger@bootlin.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11powerpc/imc-pmu: Fix use of mutex in IRQs disabled sectionKajol Jain
Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event. Command to trigger the warning: # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5 Performance counter stats for 'sleep 5': 0 thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ 5.002117947 seconds time elapsed 0.000131000 seconds user 0.001063000 seconds sys Below is snippet of the warning in dmesg: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec preempt_count: 2, expected: 0 4 locks held by perf-exec/2869: #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90 #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0 #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510 #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510 irq event stamp: 4806 hardirqs last enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0 hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510 softirqs last enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61 Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV Call Trace: dump_stack_lvl+0x98/0xe0 (unreliable) __might_resched+0x2f8/0x310 __mutex_lock+0x6c/0x13f0 thread_imc_event_add+0xf4/0x1b0 event_sched_in+0xe0/0x210 merge_sched_in+0x1f0/0x600 visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0 ctx_flexible_sched_in+0xcc/0x140 ctx_sched_in+0x20c/0x2a0 ctx_resched+0x104/0x1c0 perf_event_exec+0x340/0x510 begin_new_exec+0x730/0xef0 load_elf_binary+0x3f8/0x1e10 ... do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0 WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0 CPU: 36 PID: 2869 Comm: sleep Tainted: G W 6.2.0-rc2-00011-g1247637727f2 #61 Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV NIP: c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670 REGS: c00000004d2134e0 TRAP: 0700 Tainted: G W (6.2.0-rc2-00011-g1247637727f2) MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 48002824 XER: 00000000 CFAR: c00000000013fb64 IRQMASK: 1 The above warning triggered because the current imc-pmu code uses mutex lock in interrupt disabled sections. The function mutex_lock() internally calls __might_resched(), which will check if IRQs are disabled and in case IRQs are disabled, it will trigger the warning. Fix the issue by changing the mutex lock to spinlock. Fixes: 8f95faaac56c ("powerpc/powernv: Detect and create IMC device") Reported-by: Michael Petlan <mpetlan@redhat.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Fix comments, trim oops in change log, add reported-by tags] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
2023-01-11powerpc/boot: Fix incorrect version calculation issue in ld_versionOjaswin Mujoo
The ld_version() function computes the wrong version value for certain ld versions such as the following: $ ld --version GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-150100.7.37 For input 2.37.20211103, the value computed is 202348030000 which is higher than the value for a later version like 2.39.0, which is 23900000. This issue was highlighted because with the above ld version, the powerpc kernel build started failing with ld error: "unrecognized option --no-warn-rwx-segments". This was caused due to the recent commit 579aee9fc594 ("powerpc: suppress some linker warnings in recent linker versions") which added the --no-warn-rwx-segments linker flag if the ld version is greater than 2.39. Due to the bug in ld_version(), ld version 2.37.20111103 is wrongly calculated to be greater than 2.39 and the unsupported flag is added. To fix it, if version is of the form x.y.z and length(z) == 8, then most probably it is a date [yyyymmdd] commonly used for release snapshots and not an actual new version. Hence, ignore the date part replacing it with 0. Fixes: 579aee9fc594 ("powerpc: suppress some linker warnings in recent linker versions") Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> [mpe: Tweak change log wording/formatting, add Fixes tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230104202437.90039-1-ojaswin@linux.ibm.com
2023-01-10net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create()YueHaibing
'accel_tcp' is allocted by kvzalloc() now, which is a small chunk. Use kzalloc() directly instead of kvzalloc(). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10net/mlx5: remove redundant ret variablezhang songyi
Return value from mlx5dr_send_postsend_action() directly instead of taking this in another redundant variable. Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10net/mlx5e: Replace 0-length array with flexible arrayKees Cook
Zero-length arrays are deprecated[1]. Replace struct mlx5e_rx_wqe_cyc's "data" 0-length array with a flexible array. Detected with GCC 13, using -fstrict-flex-arrays=3: drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_alloc_rq': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:827:42: warning: array subscript f is outside array bounds of 'struct mlx5_wqe_data_seg[0]' [-Warray-bounds=] 827 | wqe->data[f].byte_count = 0; | ~~~~~~~~~^~~ In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h:11, from drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:48, from drivers/net/ethernet/mellanox/mlx5/core/en_main.c:42: drivers/net/ethernet/mellanox/mlx5/core/en.h:250:39: note: while referencing 'data' 250 | struct mlx5_wqe_data_seg data[0]; | ^~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10net/mlx5e: Replace zero-length array with flexible-array memberGustavo A. R. Silva
Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace zero-length array declaration in struct mlx5e_flow_meter_aso_obj with flex-array member. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [2]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Link: https://github.com/KSPP/linux/issues/78 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10net/mlx5: Prevent high-rate FW commands from populating all slotsTariq Toukan
Certain connection-based device-offload protocols (like TLS) use per-connection HW objects to track the state, maintain the context, and perform the offload properly. Some of these objects are created, modified, and destroyed via FW commands. Under high connection rate, this type of FW commands might continuously populate all slots of the FW command interface and throttle it, while starving other critical control FW commands. Limit these throttle commands to using only up to a portion (half) of the FW command interface slots. FW commands maximal rate is not hit, and the same high rate is still reached when applying this limitation. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10net/mlx5: Introduce and use opcode getter in command interfaceTariq Toukan
Introduce an opcode getter in the FW command interface, and use it. Initialize the entry's opcode field early in cmd_alloc_ent() and use it when possible. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10net/mlx5: Enable management PF initializationShay Drory
Enable initialization of DPU Management PF, which is a new loopback PF designed for communication with BMC. For now Management PF doesn't support nor require most upper layer protocols so avoid them. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10net/mlx5e: Add hairpin debugfs filesGal Pressman
We refer to a TC NIC rule that involves forwarding as "hairpin". Hairpin queues are mlx5 hardware specific implementation for hardware forwarding of such packets. For debug purposes, introduce debugfs files which: * Expose the number of active hairpins * Dump the hairpin table * Allow control over the number and size of the hairpin queues instead of the hard-coded values. This allows us to get visibility of the feature in order to improve it for next generation hardware. Add debugfs files: fs/tc/hairpin_num_active fs/tc/hairpin_num_queues fs/tc/hairpin_queue_size fs/tc/hairpin_table_dump Note that the new values will only take effect on the next queues creation, it does not affect existing queues. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>