summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-06mm/slub: Convert print_page_info() to print_slab_info()Matthew Wilcox (Oracle)
Improve the type safety and prepare for further conversion. For flags access, convert to folio internally. [ vbabka@suse.cz: access flags via folio_flags() ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm/slub: Convert __slab_lock() and __slab_unlock() to struct slabVlastimil Babka
These functions operate on the PG_locked page flag, but make them accept struct slab to encapsulate this implementation detail. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slub: Convert kfree() to use a struct slabMatthew Wilcox (Oracle)
Convert kfree(), kmem_cache_free() and ___cache_free() to resolve object addresses to struct slab, using folio as intermediate step where needed. Keep passing the result as struct page for now in preparation for mass conversion of internal functions. [ vbabka@suse.cz: Use folio as intermediate step when checking for large kmalloc pages, and when freeing them - rename free_nonslab_page() to free_large_kmalloc() that takes struct folio ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slub: Convert detached_freelist to use a struct slabMatthew Wilcox (Oracle)
This gives us a little bit of extra typesafety as we know that nobody called virt_to_page() instead of virt_to_head_page(). [ vbabka@suse.cz: Use folio as intermediate step when filtering out large kmalloc pages ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Convert check_heap_object() to use struct slabMatthew Wilcox (Oracle)
Ensure that we're not seeing a tail page inside __check_heap_object() by converting to a slab instead of a page. Take the opportunity to mark the slab as const since we're not modifying it. Also move the declaration of __check_heap_object() to mm/slab.h so it's not available to the wider kernel. [ vbabka@suse.cz: in check_heap_object() only convert to struct slab for actual PageSlab pages; use folio as intermediate step instead of page ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Use struct slab in kmem_obj_info()Matthew Wilcox (Oracle)
All three implementations of slab support kmem_obj_info() which reports details of an object allocated from the slab allocator. By using the slab type instead of the page type, we make it obvious that this can only be called for slabs. [ vbabka@suse.cz: also convert the related kmem_valid_obj() to folios ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Convert __ksize() to struct slabMatthew Wilcox (Oracle)
In SLUB, use folios, and struct slab to access slab_cache field. In SLOB, use folios to properly resolve pointers beyond PAGE_SIZE offset of the object. [ vbabka@suse.cz: use folios, and only convert folio_test_slab() == true folios to struct slab ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Convert virt_to_cache() to use struct slabMatthew Wilcox (Oracle)
This function is entirely self-contained, so can be converted from page to slab. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm: Convert [un]account_slab_page() to struct slabMatthew Wilcox (Oracle)
Convert the parameter of these functions to struct slab instead of struct page and drop _page from the names. For now their callers just convert page to slab. [ vbabka@suse.cz: replace existing functions instead of calling them ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Split slab into its own typeMatthew Wilcox (Oracle)
Make struct slab independent of struct page. It still uses the underlying memory in struct page for storing slab-specific data, but slab and slub can now be weaned off using struct page directly. Some of the wrapper functions (slab_address() and slab_order()) still need to cast to struct folio, but this is a significant disentanglement. [ vbabka@suse.cz: Rebase on folios, use folio instead of page where possible. Do not duplicate flags field in struct slab, instead make the related accessors go through slab_folio(). For testing pfmemalloc use the folio_*_active flag accessors directly so the PageSlabPfmemalloc wrappers can be removed later. Make folio_slab() expect only folio_test_slab() == true folios and virt_to_slab() return NULL when folio_test_slab() == false. Move struct slab to mm/slab.h. Don't represent with struct slab pages that are not true slab pages, but just a compound page obtained directly rom page allocator (with large kmalloc() for SLUB and SLOB). ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slub: Make object_err() staticVlastimil Babka
There are no callers outside of mm/slub.c anymore. Move freelist_corrupted() that calls object_err() to avoid a need for forward declaration. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slab: Dissolve slab_map_pages() in its callerVlastimil Babka
The function no longer does what its name and comment suggests, and just sets two struct page fields, which can be done directly in its sole caller. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06netfilter: nft_set_pipapo: allocate pcpu scratch maps on cloneFlorian Westphal
This is needed in case a new transaction is made that doesn't insert any new elements into an already existing set. Else, after second 'nft -f ruleset.txt', lookups in such a set will fail because ->lookup() encounters raw_cpu_ptr(m->scratch) == NULL. For the initial rule load, insertion of elements takes care of the allocation, but for rule reloads this isn't guaranteed: we might not have additions to the set. Fixes: 3c4287f62044a90e ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: etkaar <lists.netfilter.org@prvy.eu> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-06netfilter: nft_payload: do not update layer 4 checksum when mangling fragmentsPablo Neira Ayuso
IP fragments do not come with the transport header, hence skip bogus layer 4 checksum updates. Fixes: 1814096980bb ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields") Reported-and-tested-by: Steffen Weinreich <steve@weinreich.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-06selftests: netfilter: switch to socat for tests using -q optionHangbin Liu
The nc cmd(nmap-ncat) that distributed with Fedora/Red Hat does not have option -q. This make some tests failed with: nc: invalid option -- 'q' Let's switch to socat which is far more dependable. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-06MAINTAIERS/printk: Add link to printk gitPetr Mladek
It might also help to avoid confusion with the historic pmladek/printk.git that has got obsoleted by printk/linux.git in February 2020. Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220105094157.26216-3-pmladek@suse.com
2022-01-06MAINTAINERS/vsprintf: Update link to printk git treePetr Mladek
printk git tree has moved to printk/linux.git in February 2020. Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220105094157.26216-2-pmladek@suse.com
2022-01-06dt-bindings: xen: Clarify "reg" purposeOleksandr Tyshchenko
Xen on Arm has gained new support recently to calculate and report extended regions (unused address space) safe to use for external mappings. These regions are reported via "reg" property under "hypervisor" node in the guest device-tree. As region 0 is reserved for grant table space (always present), the indexes for extended regions are 1...N. No device-tree bindings update is needed (except clarifying the text) as guest infers the presence of extended regions from the number of regions in "reg" property. While at it, remove the following sentence: "This property is unnecessary when booting Dom0 using ACPI." for "reg" and "interrupts" properties as the initialization is not done via device-tree "hypervisor" node in that case anyway. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/1639080336-26573-7-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06arm/xen: Read extended regions from DT and init Xen resourceOleksandr Tyshchenko
This patch implements arch_xen_unpopulated_init() on Arm where the extended regions (if any) are gathered from DT and inserted into specific Xen resource to be used as unused address space for Xen scratch pages by unpopulated-alloc code. The extended region (safe range) is a region of guest physical address space which is unused and could be safely used to create grant/foreign mappings instead of wasting real RAM pages from the domain memory for establishing these mappings. The extended regions are chosen by the hypervisor at the domain creation time and advertised to it via "reg" property under hypervisor node in the guest device-tree. As region 0 is reserved for grant table space (always present), the indexes for extended regions are 1...N. If arch_xen_unpopulated_init() fails for some reason the default behaviour will be restored (allocate xenballooned pages). This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1639080336-26573-6-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06xen/unpopulated-alloc: Add mechanism to use Xen resourceOleksandr Tyshchenko
The main reason of this change is that unpopulated-alloc code cannot be used in its current form on Arm, but there is a desire to reuse it to avoid wasting real RAM pages for the grant/foreign mappings. The problem is that system "iomem_resource" is used for the address space allocation, but the really unallocated space can't be figured out precisely by the domain on Arm without hypervisor involvement. For example, not all device I/O regions are known by the time domain starts creating grant/foreign mappings. And following the advise from "iomem_resource" we might end up reusing these regions by a mistake. So, the hypervisor which maintains the P2M for the domain is in the best position to provide unused regions of guest physical address space which could be safely used to create grant/foreign mappings. Introduce new helper arch_xen_unpopulated_init() which purpose is to create specific Xen resource based on the memory regions provided by the hypervisor to be used as unused space for Xen scratch pages. If arch doesn't define arch_xen_unpopulated_init() the default "iomem_resource" will be used. Update the arguments list of allocate_resource() in fill_list() to always allocate a region from the hotpluggable range (maximum possible addressable physical memory range for which the linear mapping could be created). If arch doesn't define arch_get_mappable_range() the default range (0,-1) will be used. The behaviour on x86 won't be changed by current patch as both arch_xen_unpopulated_init() and arch_get_mappable_range() are not implemented for it. Also fallback to allocate xenballooned pages (balloon out RAM pages) if we do not have any suitable resource to work with (target_resource is invalid) and as the result we won't be able to provide unpopulated pages on a request. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1639080336-26573-5-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06xen/balloon: Bring alloc(free)_xenballooned_pages helpers backOleksandr Tyshchenko
This patch rolls back some of the changes introduced by commit 121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages" in order to make possible to still allocate xenballooned pages if CONFIG_XEN_UNPOPULATED_ALLOC is enabled. On Arm the unpopulated pages will be allocated on top of extended regions provided by Xen via device-tree (the subsequent patches will add required bits to support unpopulated-alloc feature on Arm). The problem is that extended regions feature has been introduced into Xen quite recently (during 4.16 release cycle). So this effectively means that Linux must only use unpopulated-alloc on Arm if it is running on "new Xen" which advertises these regions. But, it will only be known after parsing the "hypervisor" node at boot time, so before doing that we cannot assume anything. In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not advertised (Linux is running on "old Xen", etc) we need the fallback to alloc_xenballooned_pages(). This way we wouldn't reduce the amount of memory usable (wasting RAM pages) for any of the external mappings anymore (and eliminate XSA-300) with "new Xen", but would be still functional ballooning out RAM pages with "old Xen". Also rename alloc(free)_xenballooned_pages to xen_alloc(free)_ballooned_pages and make xen_alloc(free)_unpopulated_pages static inline in xen.h if CONFIG_XEN_UNPOPULATED_ALLOC is disabled. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1639080336-26573-4-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06arm/xen: Switch to use gnttab_setup_auto_xlat_frames() for DTOleksandr Tyshchenko
Read the start address of the grant table space from DT (region 0). This patch mostly restores behaviour before commit 3cf4095d7446 ("arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table") but trying not to break the ACPI support added after that commit. So the patch touches DT part only and leaves the ACPI part with xen_xlate_map_ballooned_pages(). Also in order to make a code more resilient use a fallback to xen_xlate_map_ballooned_pages() if grant table region wasn't found. This is a preparation for using Xen extended region feature where unused regions of guest physical address space (provided by the hypervisor) will be used to create grant/foreign/whatever mappings instead of wasting real RAM pages from the domain memory for establishing these mappings. The immediate benefit of this change: - Avoid superpage shattering in Xen P2M when establishing stage-2 mapping (GFN <-> MFN) for the grant table space - Avoid wasting real RAM pages (reducing the amount of memory usuable) for mapping grant table space - The grant table space is always mapped at the exact same place (region 0 is reserved for the grant table) Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1639080336-26573-3-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06xen/unpopulated-alloc: Drop check for virt_addr_valid() in fill_list()Oleksandr Tyshchenko
If memremap_pages() succeeds the range is guaranteed to have proper page table, there is no need for an additional virt_addr_valid() check. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/1639080336-26573-2-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06MAINTAINERS: update PCMCIA treeTom Saeger
Update location of PCMCIA tree. Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2022-01-06pcmcia: use sysfs_emit{,_at} for sysfs outputDominik Brodowski
Convert the PCMCIA core and yenta_socket.c to use sysfs_emit or sysfs_emit_at when providing output in sysfs. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2022-01-06xen/x86: obtain upper 32 bits of video frame buffer address for Dom0Jan Beulich
The hypervisor has been supplying this information for a couple of major releases. Make use of it. The need to set a flag in the capabilities field also points out that the prior setting of that field from the hypervisor interface's gbl_caps one was wrong, so that code gets deleted (there's also no equivalent of this in native boot code). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/a3df8bf3-d044-b7bb-3383-cd5239d6d4af@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-06xen/gntdev: fix unmap notification orderOleksandr Andrushchenko
While working with Xen's libxenvchan library I have faced an issue with unmap notifications sent in wrong order if both UNMAP_NOTIFY_SEND_EVENT and UNMAP_NOTIFY_CLEAR_BYTE were requested: first we send an event channel notification and then clear the notification byte which renders in the below inconsistency (cli_live is the byte which was requested to be cleared on unmap): [ 444.514243] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6 libxenvchan_is_open cli_live 1 [ 444.515239] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14 Thus it is not possible to reliably implement the checks like - wait for the notification (UNMAP_NOTIFY_SEND_EVENT) - check the variable (UNMAP_NOTIFY_CLEAR_BYTE) because it is possible that the variable gets checked before it is cleared by the kernel. To fix that we need to re-order the notifications, so the variable is first gets cleared and then the event channel notification is sent. With this fix I can see the correct order of execution: [ 54.522611] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14 [ 54.537966] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6 libxenvchan_is_open cli_live 0 Cc: stable@vger.kernel.org Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211210092817.580718-1-andr2000@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-05xdp: Add xdp_do_redirect_frame() for pre-computed xdp_framesToke Høiland-Jørgensen
Add an xdp_do_redirect_frame() variant which supports pre-computed xdp_frame structures. This will be used in bpf_prog_run() to avoid having to write to the xdp_frame structure when the XDP program doesn't modify the frame boundaries. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103150812.87914-6-toke@redhat.com
2022-01-05xdp: Move conversion to xdp_frame out of map functionsToke Høiland-Jørgensen
All map redirect functions except XSK maps convert xdp_buff to xdp_frame before enqueueing it. So move this conversion of out the map functions and into xdp_do_redirect(). This removes a bit of duplicated code, but more importantly it makes it possible to support caller-allocated xdp_frame structures, which will be added in a subsequent commit. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103150812.87914-5-toke@redhat.com
2022-01-05page_pool: Store the XDP mem idToke Høiland-Jørgensen
Store the XDP mem ID inside the page_pool struct so it can be retrieved later for use in bpf_prog_run(). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20220103150812.87914-4-toke@redhat.com
2022-01-05page_pool: Add callback to init pages when they are allocatedToke Høiland-Jørgensen
Add a new callback function to page_pool that, if set, will be called every time a new page is allocated. This will be used from bpf_test_run() to initialise the page data with the data provided by userspace when running XDP programs with redirect turned on. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20220103150812.87914-3-toke@redhat.com
2022-01-05xdp: Allow registering memory model without rxq referenceToke Høiland-Jørgensen
The functions that register an XDP memory model take a struct xdp_rxq as parameter, but the RXQ is not actually used for anything other than pulling out the struct xdp_mem_info that it embeds. So refactor the register functions and export variants that just take a pointer to the xdp_mem_info. This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a page_pool instance that is not connected to any network device. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com
2022-01-05Merge branch 'samples/bpf: xdpsock app enhancements'Alexei Starovoitov
Ong Boon says: ==================== First of all, sorry for taking more time to get back to this series and thanks to all valuble feedback in series-1 at [1] from Jesper and Song Liu. Since then I have looked into what Jesper suggested in [2] and worked on revising the patch series into several patches for ease of review: v1->v2: 1/7: [No change]. Add VLAN tag (ID & Priority) to the generated Tx-Only frames. 2/7: [No change]. Add DMAC and SMAC setting to the generated Tx-Only frames. If parameters are not set, previous DMAC and SMAC are used. 3/7: [New]. Add support for selecting different CLOCK for clock_gettime() used in get_nsecs. 4/7: [New]. This is a total rework from series-1 3/4-patch [3]. It uses clock_nanosleep() suggested by Jesper. In addition, added statistic for Tx schedule variance under application stat (-a|--app-stats). Make the cyclic Tx operation and --poll mode to be mutually- exclusive. Still, the ability to specify TX cycle time and used together with batch size and packet count remain the same. 5/7: [New]. Add the support for TX process schedule policy and priority setting. By default, SCHED_OTHER policy is used. This too is matching the schedule policy setting in [2]. 6/7: [Change]. This is update from series-1 4/4-patch [4]. Added TX clean process time-out in 1s granularity with configurable retries count (-O|--retries). 7/7: [New]. Added timestamp for TX packet following pktgen_hdr format matching the implementation in [2]. However, the sequence ID remains the same as it is instead of process schedule diff in [2]. To summarize on what program options have been added with v2 series using an example below:- DMAC (-G) = fa:8d:f1:e2:0b:e8 SMAC (-H) = ce:17:07:17:3e:3a VLAN tagged (-V) VLAN ID (-J) = 12 VLAN Pri (-K) = 3 Tx Queue (-q) = 3 Cycle Time in us (-T) = 1000 Batch (-b) = 2 Packet Count = 6 Tx schedule policy (-W) = FIFO Tx schedule priority (-U) = 50 Clock selection (-w) = REALTIME Tx timeout retries(-O) = 5 Tx timestamp (-y) Cyclic Tx schedule stat (-a) Note: xdpsock sets UDP dest-port and src-port to 0x1000 as default. Sending Board ============= $ xdpsock -i eth0 -t -N -z -H ce:17:07:17:3e:3a -G fa:8d:f1:e2:0b:e8 \ -V -J 12 -K 3 -q 3 \ -T 1000 -b 2 -C 6 -W FIFO -U 50 -w REALTIME \ -O 5 -y -a sock0@eth0:3 txonly xdp-drv pps pkts 0.00 rx 0 0 tx 0 6 calls/s count rx empty polls 0 0 fill fail polls 0 0 copy tx sendtos 0 0 tx wakeup sendtos 0 5 opt polls 0 0 period min ave max cycle Cyclic TX 1000000 31033 32009 33397 3 Receiving Board =============== $ tcpdump -nei eth0 udp port 0x1000 -vv -Q in -X \ --time-stamp-precision nano tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes 03:46:40.520111580 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e997 be9b e955 ...............U 0x0020: 0000 0000 61cd 2ba1 0006 987c ....a.+....| 03:46:40.520112163 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e996 be9b e955 ...............U 0x0020: 0000 0001 61cd 2ba1 0006 987c ....a.+....| 03:46:40.521066860 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e5af be9b e955 ...............U 0x0020: 0000 0002 61cd 2ba1 0006 9c62 ....a.+....b 03:46:40.521067012 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e5ae be9b e955 ...............U 0x0020: 0000 0003 61cd 2ba1 0006 9c62 ....a.+....b 03:46:40.522061935 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e1c5 be9b e955 ...............U 0x0020: 0000 0004 61cd 2ba1 0006 a04a ....a.+....J 03:46:40.522062173 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44) 10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16 0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~.... 0x0010: 0a0a 0a20 1000 1000 0018 e1c4 be9b e955 ...............U 0x0020: 0000 0005 61cd 2ba1 0006 a04a ....a.+....J I have tested the above with both tagged and untagged packet format and based on the timestamp in tcpdump found that the timing of the batch cyclic transmission is correct. Appreciate if community can give the patch series v2 a try and point out any gap. Thanks Boon Leong [1] https://patchwork.kernel.org/project/netdevbpf/cover/20211124091821.3916046-1-boon.leong.ong@intel.com/ [2] https://github.com/netoptimizer/network-testing/blob/master/src/udp_pacer.c [3] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-4-boon.leong.ong@intel.com/ [4] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-5-boon.leong.ong@intel.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-05samples/bpf: xdpsock: Add timestamp for Tx-only operationOng Boon Leong
It may be useful to add timestamp for Tx packets for continuous or cyclic transmit operation. The timestamp and sequence ID of a Tx packet are stored according to pktgen header format. To enable per-packet timestamp, use -y|--tstamp option. If timestamp is off, pktgen header is not included in the UDP payload. This means receiving side can use the magic number for pktgen for differentiation. The implementation supports both VLAN tagged and untagged option. By default, the minimum packet size is set at 64B. However, if VLAN tagged is on (-V), the minimum packet size is increased to 66B just so to fit the pktgen_hdr size. Added hex_dump() into the code path just for future cross-checking. As before, simply change to "#define DEBUG_HEXDUMP 1" to inspect the accuracy of TX packet. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-8-boon.leong.ong@intel.com
2022-01-05samples/bpf: xdpsock: Add time-out for cleaning TxOng Boon Leong
When user sets tx-pkt-count and in case where there are invalid Tx frame, the complete_tx_only_all() process polls indefinitely. So, this patch adds a time-out mechanism into the process so that the application can terminate automatically after it retries 3*polling interval duration. v1->v2: Thanks to Jesper's and Song Liu's suggestion. - clean-up git message to remove polling log - make the Tx time-out retries configurable with 1s granularity Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-7-boon.leong.ong@intel.com
2022-01-05samples/bpf: xdpsock: Add sched policy and priority supportOng Boon Leong
By default, TX schedule policy is SCHED_OTHER (round-robin time-sharing). To improve TX cyclic scheduling, we add SCHED_FIFO policy and its priority by using -W FIFO or --policy=FIFO and -U <PRIO> or --schpri=<PRIO>. A) From xdpsock --app-stats, for SCHED_OTHER policy: $ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a period min ave max cycle Cyclic TX 1000000 53507 75334 712642 6250 B) For SCHED_FIFO policy and schpri=50: $ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a -W FIFO -U 50 period min ave max cycle Cyclic TX 1000000 3699 24859 54397 6250 Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-6-boon.leong.ong@intel.com
2022-01-05samples/bpf: xdpsock: Add cyclic TX operation capabilityOng Boon Leong
Tx cycle time is in micro-seconds unit. By combining the batch size (-b M) and Tx cycle time (-T|--tx-cycle N), xdpsock now can transmit batch-size of packets every N-us periodically. Cyclic TX operation is not applicable if --poll mode is used. To transmit 16 packets every 1ms cycle time for total of 100000 packets silently: $ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000 To print cyclic TX schedule variance stats, use --app-stats|-a: $ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000 -a sock0@eth0:0 txonly xdp-drv pps pkts 0.00 rx 0 0 tx 0 100000 calls/s count rx empty polls 0 0 fill fail polls 0 0 copy tx sendtos 0 0 tx wakeup sendtos 0 6254 opt polls 0 0 period min ave max cycle Cyclic TX 1000000 53507 75334 712642 6250 Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-5-boon.leong.ong@intel.com
2022-01-05samples/bpf: xdpsock: Add clockid selection supportOng Boon Leong
User specifies the clock selection by using -w CLOCK or --clock=CLOCK where CLOCK=[REALTIME, TAI, BOOTTIME, MONOTONIC]. The default CLOCK selection is MONOTONIC. The implementation of clock selection parsing is borrowed from iproute2/tc/q_taprio.c Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230035447.523177-4-boon.leong.ong@intel.com
2022-01-05samples/bpf: xdpsock: Add Dest and Src MAC setting for Tx-only operationOng Boon Leong
To set Dest MAC address (-G|--tx-dmac) only: $ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff To set Source MAC address (-H|--tx-smac) only: $ xdpsock -i eth0 -t -N -z -H 11:22:33:44:55:66 To set both Dest and Source MAC address: $ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff \ -H 11:22:33:44:55:66 The default Dest and Source MAC address remain the same as before. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20211230035447.523177-3-boon.leong.ong@intel.com
2022-01-05samples/bpf: xdpsock: Add VLAN support for Tx-only operationOng Boon Leong
In multi-queue environment testing, the support for VLAN-tag based steering is useful. So, this patch adds the capability to add VLAN tag (VLAN ID and Priority) to the generated Tx frame. To set the VLAN ID=10 and Priority=2 for Tx only through TxQ=3: $ xdpsock -i eth0 -t -N -z -q 3 -V -J 10 -K 2 If VLAN ID (-J) and Priority (-K) is set, it default to VLAN ID = 1 VLAN Priority = 0. For example, VLAN-tagged Tx only, xdp copy mode through TxQ=1: $ xdpsock -i eth0 -t -N -c -q 1 -V Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211230035447.523177-2-boon.leong.ong@intel.com
2022-01-05clk: Drop unused COMMON_CLK_STM32MP157_SCMI configSudeep Holla
Commit 21e743300dd0 ("clk: stm32mp1: new compatible for secure RCC support") introduced a new Kconfig option COMMON_CLK_STM32MP157_SCMI which is not used anywhere. Further, it looks like this Kconfig option is just to select bunch of other options which doesn't sound correct to me. There is no need for another SCMI firmware based clock driver and hence the same applies for the config option too. Let us just drop the unused COMMON_CLK_STM32MP157_SCMI before it gives someone idea to write a specific clock driver for this SoC/platform. Cc: Etienne Carriere <etienne.carriere@foss.st.com> Cc: Gabriel Fernandez <gabriel.fernandez@foss.st.com> Cc: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20211015150043.140793-1-sudeep.holla@arm.com Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-01-05clk: st: clkgen-mux: search reg within node or parentAlain Volmat
In order to avoid having duplicated addresses within the DT, only have one unit-address per clockgen and each driver within the clockgen should look at the parent node (overall clockgen) to figure out the reg property. Such behavior is already in place in other STi platform clock drivers such as clk-flexgen and clkgen-pll. Keep backward compatibility by first looking at reg within the node before looking into the parent node. Signed-off-by: Alain Volmat <avolmat@me.com> Link: https://lore.kernel.org/r/20211218211157.188214-3-avolmat@me.com Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-01-05clk: st: clkgen-fsyn: search reg within node or parentAlain Volmat
In order to avoid having duplicated addresses within the DT, only have one unit-address per clockgen and each driver within the clockgen should look at the parent node (overall clockgen) to figure out the reg property. Such behavior is already in place in other STi platform clock drivers such as clk-flexgen and clkgen-pll. Keep backward compatibility by first looking at reg within the node before looking into the parent node. Signed-off-by: Alain Volmat <avolmat@me.com> Link: https://lore.kernel.org/r/20211218211157.188214-2-avolmat@me.com Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-01-05clk: Enable/Disable runtime PM for clk_summaryTaniya Das
The registers for some clocks in the SOC area, which are under the power domain are required to be enabled before accessing them. During the clk_summary if the power-domains are not enabled they could result into NoC errors. Thus ensure the register access of the clock controller is done with pm_untime_get/put functions. Signed-off-by: Taniya Das <tdas@codeaurora.org> Link: https://lore.kernel.org/r/1640018638-19436-3-git-send-email-tdas@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-01-05Merge branch 'net-lantiq_xrx200-improve-ethernet-performance'Jakub Kicinski
Aleksander Jan Bajkowski says: ==================== net: lantiq_xrx200: improve ethernet performance This patchset improves Ethernet performance by 15%. NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 539 Mbps 599 Mbps After 624 Mbps 695 Mbps ==================== Link: https://lore.kernel.org/r/20220104151144.181736-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05net: lantiq_xrx200: convert to build_skbAleksander Jan Bajkowski
We can increase the efficiency of rx path by using buffers to receive packets then build SKBs around them just before passing into the network stack. In contrast, preallocating SKBs too early reduces CPU cache efficiency. NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 577 Mbps 648 Mbps After 624 Mbps 695 Mbps Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05net: lantiq_xrx200: increase napi poll weigthAleksander Jan Bajkowski
NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 545 Mbps 625 Mbps After 577 Mbps 648 Mbps Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05MIPS: lantiq: dma: increase descritor countAleksander Jan Bajkowski
NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500): Down Up Before 539 Mbps 599 Mbps After 545 Mbps 625 Mbps Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05MAINTAINERS: Add entries for Toshiba Visconti PLL and clock controllerNobuhiro Iwamatsu
Add entries for Toshiba Visconti PLL and clock controller binding and driver. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Link: https://lore.kernel.org/r/20211025031038.4180686-5-nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-01-05clk: visconti: Add support common clock driver and reset driverNobuhiro Iwamatsu
Add support for common interface of the common clock and reset driver for Toshiba Visconti5 and its SoC, TMPV7708. The PIPLLCT provides the PLL, and the PISMU provides clock and reset functionality. Each drivers are provided in this patch. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Link: https://lore.kernel.org/r/20211025031038.4180686-4-nobuhiro1.iwamatsu@toshiba.co.jp [sboyd@kernel.org: Add bitfield.h include to pll.c] Signed-off-by: Stephen Boyd <sboyd@kernel.org>