Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can.
Still no major regressions, most of the changes are still due to data
races fixes, plus the usual bunch of drivers fixes.
Previous releases - regressions:
- tcp/udp: make early_demux back namespacified.
- dsa: fix issues with vlan_filtering_is_global
Previous releases - always broken:
- ip: fix data-races around ipv4_net_table (round 2, 3 & 4)
- amt: fix validation and synchronization bugs
- can: fix detection of mcp251863
- eth: iavf: fix handling of dummy receive descriptors
- eth: lan966x: fix issues with MAC table
- eth: stmmac: dwmac-mediatek: fix clock issue
Misc:
- dsa: update documentation"
* tag 'net-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication
net/sched: cls_api: Fix flow action initialization
tcp: Fix data-races around sysctl_tcp_max_reordering.
tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.
tcp: Fix a data-race around sysctl_tcp_rfc1337.
tcp: Fix a data-race around sysctl_tcp_stdurg.
tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
tcp: Fix data-races around sysctl_tcp_recovery.
tcp: Fix a data-race around sysctl_tcp_early_retrans.
tcp: Fix data-races around sysctl knobs related to SYN option.
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
ip: Fix data-races around sysctl_ip_prot_sock.
ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
can: rcar_canfd: Add missing of_node_put() in rcar_canfd_probe()
can: mcp251xfd: fix detection of mcp251863
Documentation: fix udp_wmem_min in ip-sysctl.rst
...
|
|
Recall that CXL capable address ranges, on ACPI platforms, are published
in the CEDT.CFMWS (CXL Early Discovery Table: CXL Fixed Memory Window
Structures). These windows represent both the actively mapped capacity
and the potential address space that can be dynamically assigned to a
new CXL decode configuration (region / interleave-set).
CXL endpoints like DDR DIMMs can be mapped at any physical address
including 0 and legacy ranges.
There is an expectation and requirement that the /proc/iomem interface
and the iomem_resource tree in the kernel reflect the full set of
platform address ranges. I.e. that every address range that platform
firmware and bus drivers enumerate be reflected as an iomem_resource
entry. The hard requirement to do this for CXL arises from the fact that
facilities like CONFIG_DEVICE_PRIVATE expect to be able to treat empty
iomem_resource ranges as free for software to use as proxy address
space. Without CXL publishing its potential address ranges in
iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently
steal capacity reserved for runtime provisioning of new CXL regions.
So, iomem_resource needs to know about both active and potential CXL
resource ranges. The active CXL resources might already be reflected in
iomem_resource as "System RAM". insert_resource_expand_to_fit() handles
re-parenting "System RAM" underneath a CXL window.
The "_expand_to_fit()" behavior handles cases where a CXL window is not
a strict superset of an existing entry in the iomem_resource tree. The
"_expand_to_fit()" behavior is acceptable from the perspective of
resource allocation. The expansion happens because a conflicting
resource range is already populated, which means the resource boundary
expansion does not result in any additional free CXL address space being
made available. CXL address space allocation is always bounded by the
orginal unexpanded address range.
However, the potential for expansion does mean that something like
walk_iomem_res_desc(IORES_DESC_CXL...) can only return fuzzy answers on
corner case platforms that cause the resource tree to expand a CXL
window resource over a range that is not decoded by CXL. This would be
an odd platform configuration, but if it becomes a problem in practice
the CXL subsytem could just publish an API that returns definitive
answers.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/165784325943.1758207.5310344844375305118.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The peripheral may have several FIFOs, but some case just select
some FIFOs from them for data transfer, which means FIFO0 and FIFO2
may be selected. So add FIFO address stride support, 0 means all FIFOs
are continuous, 1 means 1 word stride between FIFOs. All stride between
FIFOs should be same.
Another option words_per_fifo means how many audio channel data copied
to one FIFO one time, 1 means one channel per FIFO, 2 means 2 channels
per FIFO.
If 'n_fifos_src = 4' and 'words_per_fifo = 2', it means the first two
words(channels) fetch from FIFO0 and then jump to FIFO1 for next two words,
and so on after the last FIFO3 fetched, roll back to FIFO0.
Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/1657162829-9273-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
DisplayLink ethernet devices require NTB buffers larger then 32kb
in order to run with highest performance.
This patch is changing upper limit of the rx and tx buffers.
Those buffers are initialized with CDC_NCM_NTB_DEF_SIZE_RX and
CDC_NCM_NTB_DEF_SIZE_TX which is 16kb so by default no device is
affected by increased limit.
Rx and tx buffer is increased under two conditions:
- Device need to advertise that it supports higher buffer size in
dwNtbMaxInMaxSize and dwNtbMaxOutMaxSize.
- cdc_ncm/rx_max and cdc_ncm/tx_max driver parameters must be adjusted
with udev rule or ethtool.
Summary of testing and performance results:
Tests were performed on following devices:
- DisplayLink DL-3xxx family device
- DisplayLink DL-6xxx family device
- ASUS USB-C2500 2.5G USB3 ethernet adapter
- Plugable USB3 1G USB3 ethernet adapter
- EDIMAX EU-4307 USB-C ethernet adapter
- Dell DBQBCBC064 USB-C ethernet adapter
Performance measurements were done with:
- iperf3 between two linux boxes
- http://openspeedtest.com/ instance running on local test machine
Insights from tests results:
- All except one from third party usb adapters were not affected by
increased buffer size to their advertised dwNtbOutMaxSize and
dwNtbInMaxSize.
Devices were generally reaching 912-940Mbps both download and upload.
Only EDIMAX adapter experienced decreased download size from
929Mbps to 827Mbps with iper3, with openspeedtest decrease was from
968Mbps to 886Mbps.
- DisplayLink DL-3xxx family devices experienced performance increase
with iperf3 download from 300Mbps to 870Mbps and
upload from 782Mbps to 844Mbps.
With openspeedtest download increased from 556Mbps to 873Mbps
and upload from 727Mbps to 973Mbps
- DiplayLink DL-6xxx family devices are not affected by
increased buffer size.
Signed-off-by: Łukasz Spintzyk <lukasz.spintzyk@synaptics.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220720060518.541-2-lukasz.spintzyk@synaptics.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote:
> Some changes to the way invalid MSR accesses are reported by the
> kernel is causing some problems with messages printed on the
> console.
>
> We have seen several cases of ex_handler_msr() printing invalid MSR
> accesses once but the callstack multiple times causing confusion on
> the console.
> The problem here is that another earlier commit (5.13):
>
> a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
>
> Modifies all the pr_*_once() calls to always return true claiming
> that no caller is ever checking the return value of the functions.
>
> This is why we are seeing the callstack printed without the
> associated printk() msg.
Extract the ONCE_IF(cond) part into __ONCE_LTE_IF() and use that to
implement DO_ONCE_LITE_IF() and fix the extable code.
Fixes: a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/r/YqyVFsbviKjVGGZ9@worktop.programming.kicks-ass.net
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Simplify nf_ct_get_tuple(), from Jackie Liu.
2) Add format to request_module() call, from Bill Wendling.
3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending
hardware offload objects to be processed, from Vlad Buslov.
4) Missing rcu annotation and accessors in the netfilter tree,
from Florian Westphal.
5) Merge h323 conntrack helper nat hooks into single object,
also from Florian.
6) A batch of update to fix sparse warnings treewide,
from Florian Westphal.
7) Move nft_cmp_fast_mask() where it used, from Florian.
8) Missing const in nf_nat_initialized(), from James Yonan.
9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet.
10) Use refcount_inc instead of _inc_not_zero in flowtable,
from Florian Westphal.
11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: xt_TPROXY: remove pr_debug invocations
netfilter: flowtable: prefer refcount_inc
netfilter: ipvs: Use the bitmap API to allocate bitmaps
netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
netfilter: nf_tables: move nft_cmp_fast_mask to where its used
netfilter: nf_tables: use correct integer types
netfilter: nf_tables: add and use BE register load-store helpers
netfilter: nf_tables: use the correct get/put helpers
netfilter: x_tables: use correct integer types
netfilter: nfnetlink: add missing __be16 cast
netfilter: nft_set_bitmap: Fix spelling mistake
netfilter: h323: merge nat hook pointers into one
netfilter: nf_conntrack: use rcu accessors where needed
netfilter: nf_conntrack: add missing __rcu annotations
netfilter: nf_flow_table: count pending offload workqueue tasks
net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
netfilter: conntrack: use correct format characters
netfilter: conntrack: use fallthrough to cleanup
====================
Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-07-17
1) Add resiliency for lost completions for PTP TX port timestamp
2) Report Header-data split state via ethtool
3) Decouple HTB code from main regular TX code
* tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: CT: Remove warning of ignore_flow_level support for non PF
net/mlx5e: Add resiliency for PTP TX port timestamp
net/mlx5: Expose ts_cqe_metadata_size2wqe_counter
net/mlx5e: HTB, move htb functions to a new file
net/mlx5e: HTB, change functions name to follow convention
net/mlx5e: HTB, remove priv from htb function calls
net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure
net/mlx5e: HTB, move stats and max_sqs to priv
net/mlx5e: HTB, move section comment to the right place
net/mlx5e: HTB, move ids to selq_params struct
net/mlx5e: HTB, reduce visibility of htb functions
net/mlx5e: Fix mqprio_rl handling on devlink reload
net/mlx5e: Report header-data split state through ethtool
====================
Link: https://lore.kernel.org/r/20220719203529.51151-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
They were updated in kernel/bpf/trampoline.c to fix another build
issue. We should to do the same for include/linux/bpf.h header.
Fixes: 3908fcddc65d ("bpf: fix lsm_cgroup build errors on esoteric configs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220720155220.4087433-1-sdf@google.com
|
|
Instead of bouncing the function call to the driver op through a blocking
notifier just have the iommu layer call it directly.
Register each device that is being attached to the iommu with the lower
driver which then threads them on a linked list and calls the appropriate
driver op at the right time.
Currently the only use is if dma_unmap() is defined.
Also, fully lock all the debugging tests on the pinning path that a
dma_unmap is registered.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Instead of having drivers register the notifier with explicit code just
have them provide a dma_unmap callback op in their driver ops and rely on
the core code to wire it up.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Add support for SMN communication on family 17h model A0h and family 19h
models 60h-70h.
[ bp: Merge into a single patch. ]
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220719195256.1516-1-mario.limonciello@amd.com
|
|
* irq/loongarch:
: .
: Merge the long awaited IRQ support for the LoongArch architecture.
:
: From the cover letter:
:
: "Currently, LoongArch based processors (e.g. Loongson-3A5000)
: can only work together with LS7A chipsets. The irq chips in
: LoongArch computers include CPUINTC (CPU Core Interrupt
: Controller), LIOINTC (Legacy I/O Interrupt Controller),
: EIOINTC (Extended I/O Interrupt Controller), PCH-PIC (Main
: Interrupt Controller in LS7A chipset), PCH-LPC (LPC Interrupt
: Controller in LS7A chipset) and PCH-MSI (MSI Interrupt Controller)."
:
: Note that this comes with non-official, arch private ACPICA
: definitions until the official ACPICA update is realeased.
: .
irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch
irqchip: Add LoongArch CPU interrupt controller support
irqchip: Add Loongson Extended I/O interrupt controller support
irqchip/loongson-liointc: Add ACPI init support
irqchip/loongson-pch-msi: Add ACPI init support
irqchip/loongson-pch-pic: Add ACPI init support
irqchip: Add Loongson PCH LPC controller support
LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain
LoongArch: Use ACPI_GENERIC_GSI for gsi handling
genirq/generic_chip: Export irq_unmap_generic_chip
ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback
APCI: irq: Add support for multiple GSI domains
LoongArch: Provisionally add ACPICA data structures
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
For LoongArch, ACPI_IRQ_MODEL_LPIC is introduced, and then the
callback acpi_get_gsi_domain_id and acpi_gsi_to_irq_fallback are
implemented.
The acpi_get_gsi_domain_id callback returns related fwnode handle
of irqdomain for different GSI range.
The acpi_gsi_to_irq_fallback will create new mapping for gsi when
the mapping of it is not found.
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1658314292-35346-14-git-send-email-lvjianmin@loongson.cn
|
|
EIOINTC stands for "Extended I/O Interrupts" that described in Section
11.2 of "Loongson 3A5000 Processor Reference Manual". For more
information please refer Documentation/loongarch/irq-chip-model.rst.
Loongson-3A5000 has 4 cores per NUMA node, and each NUMA node has an
EIOINTC; while Loongson-3C5000 has 16 cores per NUMA node, and each NUMA
node has 4 EIOINTCs. In other words, 16 cores of one NUMA node in
Loongson-3C5000 are organized in 4 groups, each group connects to an
EIOINTC. We call the "group" here as an EIOINTC node, so each EIOINTC
node always includes 4 cores (both in Loongson-3A5000 and Loongson-
3C5000).
Co-developed-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1658314292-35346-12-git-send-email-lvjianmin@loongson.cn
|
|
Some irq controllers have to re-implement a private version for
irq_generic_chip_ops, because they have a different xlate to translate
hwirq. Export irq_unmap_generic_chip to allow reusing in drivers.
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1658314292-35346-5-git-send-email-lvjianmin@loongson.cn
|
|
It appears that the generic version of acpi_gsi_to_irq() doesn't
fallback to establishing a mapping if there is no pre-existing
one while the x86 version does.
While arm64 seems unaffected by it, LoongArch is relying on the x86
behaviour. In an effort to prevent new architectures from reinventing
the proverbial wheel, provide an optional callback that the arch code
can set to restore the x86 behaviour.
Hopefully we can eventually get rid of this in the future once
the expected behaviour has been clarified.
Reported-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/1658314292-35346-4-git-send-email-lvjianmin@loongson.cn
|
|
In an unfortunate departure from the ACPI spec, the LoongArch
architecture split its GSI space across multiple interrupt
controllers.
In order to be able to reuse the core code and prevent
architectures from reinventing an already square wheel, offer
the arch code the ability to register a dispatcher function
that will return the domain fwnode for a given GSI.
The ARM GIC drivers are updated to support this (with a single
domain, as intended).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/1658314292-35346-3-git-send-email-lvjianmin@loongson.cn
|
|
THP_SWAP has been proven to improve the swap throughput significantly
on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
splitting THP after swapped out").
As long as arm64 uses 4K page size, it is quite similar with x86_64
by having 2MB PMD THP. THP_SWAP is architecture-independent, thus,
enabling it on arm64 will benefit arm64 as well.
A corner case is that MTE has an assumption that only base pages
can be swapped. We won't enable THP_SWAP for ARM64 hardware with
MTE support until MTE is reworked to coexist with THP_SWAP.
A micro-benchmark is written to measure thp swapout throughput as
below,
unsigned long long tv_to_ms(struct timeval tv)
{
return tv.tv_sec * 1000 + tv.tv_usec / 1000;
}
main()
{
struct timeval tv_b, tv_e;;
#define SIZE 400*1024*1024
volatile void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (!p) {
perror("fail to get memory");
exit(-1);
}
madvise(p, SIZE, MADV_HUGEPAGE);
memset(p, 0x11, SIZE); /* write to get mem */
gettimeofday(&tv_b, NULL);
madvise(p, SIZE, MADV_PAGEOUT);
gettimeofday(&tv_e, NULL);
printf("swp out bandwidth: %ld bytes/ms\n",
SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b)));
}
Testing is done on rk3568 64bit Quad Core Cortex-A55 platform -
ROCK 3A.
thp swp throughput w/o patch: 2734bytes/ms (mean of 10 tests)
thp swp throughput w/ patch: 3331bytes/ms (mean of 10 tests)
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Link: https://lore.kernel.org/r/20220720093737.133375-1-21cnbao@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Many binary attributes need to limit access to CAP_SYS_ADMIN only; ie
many binary attributes specify is_visible with 0400 or 0600.
Make setting the permissions of such attributes more explicit by
defining BIN_ATTR_ADMIN_{RO,RW}.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-6-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Introduced in a PCIe r6.0, sec 6.30, DOE provides a config space based
mailbox with standard protocol discovery. Each mailbox is accessed
through a DOE Extended Capability.
Each DOE mailbox must support the DOE discovery protocol in addition to
any number of additional protocols.
Define core PCIe functionality to manage a single PCIe DOE mailbox at a
defined config space offset. Functionality includes iterating,
creating, query of supported protocol, and task submission. Destruction
of the mailboxes is device managed.
Cc: "Li, Ming" <ming4.li@intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-4-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
This ID is used in DOE headers to identify protocols that are defined
within the PCI Express Base Specification, PCIe r6.0, sec 6.30.1.1 table
6-32.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220719205249.566684-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux
Pavel Begunkov says:
====================
io_uring zerocopy send
The patchset implements io_uring zerocopy send. It works with both registered
and normal buffers, mixing is allowed but not recommended. Apart from usual
request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
the userspace when buffers are freed and can be reused (see API design below),
which is delivered into io_uring's Completion Queue. Those "buffer-free"
notifications are not necessarily per request, but the userspace has control
over it and should explicitly attaching a number of requests to a single
notification. The series also adds some internal optimisations when used with
registered buffers like removing page referencing.
From the kernel networking perspective there are two main changes. The first
one is passing ubuf_info into the network layer from io_uring (inside of an
in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
caching on the io_uring side, but also helps to avoid cross-referencing
and synchronisation problems. The second part is an optional optimisation
removing page referencing for requests with registered buffers.
Benchmarking UDP with an optimised version of the selftest (see [1]), which
sends a bunch of requests, waits for completions and repeats. "+ flush" column
posts one additional "buffer-free" notification per request, and just "zc"
doesn't post buffer notifications at all.
NIC (requests / second):
IO size | non-zc | zc | zc + flush
4000 | 495134 | 606420 (+22%) | 558971 (+12%)
1500 | 551808 | 577116 (+4.5%) | 565803 (+2.5%)
1000 | 584677 | 592088 (+1.2%) | 560885 (-4%)
600 | 596292 | 598550 (+0.4%) | 555366 (-6.7%)
dummy (requests / second):
IO size | non-zc | zc | zc + flush
8000 | 1299916 | 2396600 (+84%) | 2224219 (+71%)
4000 | 1869230 | 2344146 (+25%) | 2170069 (+16%)
1200 | 2071617 | 2361960 (+14%) | 2203052 (+6%)
600 | 2106794 | 2381527 (+13%) | 2195295 (+4%)
Previously it also brought a massive performance speedup compared to the
msg_zerocopy tool (see [3]), which is probably not super interesting. There
is also an additional bunch of refcounting optimisations that was omitted from
the series for simplicity and as they don't change the picture drastically,
they will be sent as follow up, as well as flushing optimisations closing the
performance gap b/w two last columns.
For TCP on localhost (with hacks enabling localhost zerocopy) and including
additional overhead for receive:
IO size | non-zc | zc
1200 | 4174 | 4148
4096 | 7597 | 11228
Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the
omitted optimisations will somewhat help, should look better for 4000,
but couldn't test properly because of setup problems.
Links:
liburing (benchmark + tests):
[1] https://github.com/isilence/liburing/tree/zc_v4
kernel repo:
[2] https://github.com/isilence/linux/tree/zc_v4
RFC v1:
[3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/
RFC v2:
https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/
Net patches based:
git@github.com:isilence/linux.git zc_v4-net-base
or
https://github.com/isilence/linux/tree/zc_v4-net-base
API design overview:
The series introduces an io_uring concept of notifactors. From the userspace
perspective it's an entity to which it can bind one or more requests and then
requesting to flush it. Flushing a notifier makes it impossible to attach new
requests to it, and instructs the notifier to post a completion once all
requests attached to it are completed and the kernel doesn't need the buffers
anymore.
Notifications are stored in notification slots, which should be registered as
an array in io_uring. Each slot stores only one notifier at any particular
moment. Flushing removes it from the slot and the slot automatically replaces
it with a new notifier. All operations with notifiers are done by specifying
an index of a slot it's currently in.
When registering a notification the userspace specifies a u64 tag for each
slot, which will be copied in notification completion entries as
cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32
sequence number counting notifiers of a slot.
====================
Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Managed pages contain pinned userspace pages and controlled by upper
layers, there is no need in tracking skb->pfmemalloc for them. Introduce
a helper for filling frags but ignoring page tracking, it'll be needed
later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some users like io_uring can do page pinning more efficiently, so we
want a way to delegate referencing to other subsystems. For that add
a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold
page references and upper layers are responsivle to managing page
lifetime.
It's allowed to convert skbs from managed to normal by calling
skb_zcopy_downgrade_managed(). The function will take all needed
page references and clear the flag. It's needed, for instance,
to avoid mixing managed modes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for custom iov_iter handling to msghdr. The idea is that
in-kernel subsystems want control over how an SG is split.
Signed-off-by: David Ahern <dsahern@kernel.org>
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make possible for network in-kernel callers like io_uring to pass in a
custom ubuf_info by setting it in a new field of struct msghdr.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add capability field which indicates the mask for wqe_counter which
connects between loopback CQE and the original WQE. With this connection
the driver can identify lost of the loopback CQE and reply PTP
synchronization with timestamp given in the original CQE.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When running KASAN with Tiny RCU (e.g. under ARCH=um, where
a working KASAN patch is now available), we don't get any
information on the original kfree_rcu() (or similar) caller
when a problem is reported, as Tiny RCU doesn't record this.
Add the recording, which required pulling kvfree_call_rcu()
out of line for the KASAN case since the recording function
(kasan_record_aux_stack_noalloc) is neither exported, nor
can we include kasan.h into rcutiny.h.
without KASAN, the patch has no size impact (ARCH=um kernel):
text data bss dec hex filename
6151515 4423154 33148520 43723189 29b29b5 linux
6151515 4423154 33148520 43723189 29b29b5 linux + patch
with KASAN, the impact on my build was minimal:
text data bss dec hex filename
13915539 7388050 33282304 54585893 340ea25 linux
13911266 7392114 33282304 54585684 340e954 linux + patch
-4273 +4064 +-0 -209
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
skbs, that is, the flow->head is null.
The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
run a bpf prog which redirects empty skbs.
So we should determine whether the length of the packet modified by bpf
prog or others like bpf_prog_test is valid before forwarding it directly.
LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html
Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a dedicated helper to handle the setgid bit when creating a new file
in a setgid directory. This is a preparatory patch for moving setgid
stripping into the vfs. The patch contains no functional changes.
Currently the setgid stripping logic is open-coded directly in
inode_init_owner() and the individual filesystems are responsible for
handling setgid inheritance. Since this has proven to be brittle as
evidenced by old issues we uncovered over the last months (see [1] to
[3] below) we will try to move this logic into the vfs.
Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1]
Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2]
Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3]
Link: https://lore.kernel.org/r/1657779088-2242-1-git-send-email-xuyang2018.jy@fujitsu.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
This reverts commit 28a6ed8e39f77f6ac613ec9b7461aa75e85fa79a.
The chrome platform driver changes need to come in through the platform
tree due to some api changes that showed up there that cause build
errors in linux-next
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20220719160821.5e68e30b@oak.ozlabs.ibm.com
Cc: Prashant Malani <pmalani@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
* irq/renesas-irqc:
: .
: New Renesas RZ/G2L IRQC driver from Lad Prabhakar, equipped with
: its companion GPIO driver.
: .
dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/V2L SoC
gpio: thunderx: Don't directly include asm-generic/msi.h
pinctrl: renesas: pinctrl-rzg2l: Add IRQ domain to handle GPIO interrupt
dt-bindings: pinctrl: renesas,rzg2l-pinctrl: Document the properties to handle GPIO IRQ
gpio: gpiolib: Allow free() callback to be overridden
irqchip: Add RZ/G2L IA55 Interrupt Controller driver
dt-bindings: interrupt-controller: Add Renesas RZ/G2L Interrupt Controller
gpio: Remove dynamic allocation from populate_parent_alloc_arg()
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add support for the MT6331 PMIC with MT6332 Companion PMIC, found
in MT6795 Helio X10 smartphone platforms.
This combo has support for multiple devices but, for a start,
only the following have been implemented:
- Regulators (two instances, one in MT6331, one in MT6332)
- RTC (MT6331)
- Keys (MT6331)
- Interrupts (MT6331 also dispatches MT6332's interrupts)
There's more to be implemented, especially for MT6332, which
will come at a later stage.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220627123954.64299-1-angelogioacchino.delregno@collabora.com
|
|
Change 'receieved' to 'received' and 'recieve' to 'receive'.
Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220623073333.5675-1-jiaming@nfschina.com
|
|
All implementations return 0, so simplify accordingly.
This is a preparation for making platform remove callbacks return void.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220619082655.53728-1-u.kleine-koenig@pengutronix.de
|
|
There is no in-tree machine that provides a struct twl4030_platform_data
since commit e92fc4f04a34 ("ARM: OMAP2+: Drop legacy board file for
LDP"). So assume dev_get_platdata() returns NULL in twl_probe() and
simplify accordingly.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220614152148.252820-1-u.kleine-koenig@pengutronix.de
|
|
MT8195 System Companion Processors(SCP) is a dual-core RISC-V MCU.
Add a new CrOS feature ID to represent the SCP's 2nd core.
The 1st core is referred to as 'core 0', and the 2nd core is referred
to as 'core 1'.
Signed-off-by: Tinghan Shen <tinghan.shen@mediatek.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220601112201.15510-16-tinghan.shen@mediatek.com
|
|
Add MT6357 PMIC IRQ support.
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220531124959.202787-6-fparent@baylibre.com
|
|
Adds support for PMIC keys, Regulator, and RTC for the MT6357 PMIC.
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220531124959.202787-5-fparent@baylibre.com
|
|
The driver never calls the disable callback, so drop the member from
the platform struct and all callbacks from the actual platform datas.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220530192430.2108217-4-u.kleine-koenig@pengutronix.de
|
|
None of the in-tree instantiations of struct t7l66xb_platform_data
provides a disable callback. So better don't dereference this function
pointer unconditionally. As there is no user, drop it completely instead
of calling it conditional.
This is a preparation for making platform remove callbacks return void.
Fixes: 1f192015ca5b ("mfd: driver for the T7L66XB TMIO SoC")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220530192430.2108217-3-u.kleine-koenig@pengutronix.de
|
|
My Bootlin address is preferred from now on.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220603155727.1232061-4-luca@lucaceresoli.net
|
|
'ib-mfd-edac-i2c-leds-pinctrl-platform-watchdog-5.20' and 'ib-mfd-soc-bcm-5.20' into ibs-for-mfd-merged
|
|
On top of looking at PULL_UP and PULL_DOWN flags, also look at
PULL_DISABLE and set the appropriate GPIO flag. The GPIO core will then
pass down this to controllers that support it.
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
This change prepares the gpio core to look at firmware flags and set
'FLAG_BIAS_DISABLE' if necessary. It works in similar way to
'GPIO_PULL_DOWN' and 'GPIO_PULL_UP'.
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
There is no user of these callbacks. The motivation for this change is
to stop returning an error code from the remove callback.
This is a preparation for making platform remove callbacks return void.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
There is no machine providing a teardown callback, so drop the unused
code.
This is a preparation for making platform remove callbacks return void.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
The last user, which in fact was a dead code, has gone a year ago,
previous one 3 years ago. On top of that we want to drop away the
legacy GPIO APIs in the kernel, so take a chance to get rid of
unused devm_gpio_free() and accompanying stuff.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.
Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mapping which don't exceed it.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|