Age | Commit message (Collapse) | Author |
|
BT core driver should addtionally check LMP feature bit
"Erroneous Data Reporting" instead of quirk
HCI_QUIRK_BROKEN_ERR_DATA_REPORTING set by BT device driver to decide if
HCI commands HCI_Read|Write_Default_Erroneous_Data_Reporting are broken.
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 2, Part C | page 587
This feature indicates whether the device is able to support the
Packet_Status_Flag and the HCI commands HCI_Write_Default_-
Erroneous_Data_Reporting and HCI_Read_Default_Erroneous_-
Data_Reporting.
the quirk was introduced by 'commit cde1a8a99287 ("Bluetooth: btusb: Fix
and detect most of the Chinese Bluetooth controllers")' to mark HCI
commands HCI_Read|Write_Default_Erroneous_Data_Reporting broken by BT
device driver, but the reason why these two HCI commands are broken is
that feature "Erroneous Data Reporting" is not enabled by firmware, this
scenario is illustrated by below log of QCA controllers with USB I/F:
@ RAW Open: hcitool (privileged) version 2.22
< HCI Command: Read Local Supported Commands (0x04|0x0002) plen 0
> HCI Event: Command Complete (0x0e) plen 68
Read Local Supported Commands (0x04|0x0002) ncmd 1
Status: Success (0x00)
Commands: 288 entries
......
Read Default Erroneous Data Reporting (Octet 18 - Bit 2)
Write Default Erroneous Data Reporting (Octet 18 - Bit 3)
......
< HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0
> HCI Event: Command Complete (0x0e) plen 4
Read Default Erroneous Data Reporting (0x03|0x005a) ncmd 1
Status: Unknown HCI Command (0x01)
< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0
> HCI Event: Command Complete (0x0e) plen 12
Read Local Supported Features (0x04|0x0003) ncmd 1
Status: Success (0x00)
Features: 0xff 0xfe 0x0f 0xfe 0xd8 0x3f 0x5b 0x87
3 slot packets
......
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Tested-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The bt_skb_sendmsg() function can't return NULL so there is no need to
check for that. Several of these checks were removed previously but
this one was missed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The scan response and advertising data needs to be tracked on a per
instance (adv_info) since when these instaces are removed so are their
data, to fix that new flags are introduced which is used to mark when
the data changes and then checked to confirm when the data needs to be
synced with the controller.
Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
When HCI_USERCHANNEL is used, unregister the suspend notifier when
binding and register when releasing. The userchannel socket should be
left alone after open is completed.
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
The HCI command, event, and data packet processing workqueue is drained
to avoid deadlock in commit
76727c02c1e1 ("Bluetooth: Call drain_workqueue() before resetting state").
There is another delayed work, which will queue command to this drained
workqueue. Which results in the following error report:
Bluetooth: hci2: command 0x040f tx timeout
WARNING: CPU: 1 PID: 18374 at kernel/workqueue.c:1438 __queue_work+0xdad/0x1140
Workqueue: events hci_cmd_timeout
RIP: 0010:__queue_work+0xdad/0x1140
RSP: 0000:ffffc90002cffc60 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff8880b9d3ec00 RCX: 0000000000000000
RDX: ffff888024ba0000 RSI: ffffffff814e048d RDI: ffff8880b9d3ec08
RBP: 0000000000000008 R08: 0000000000000000 R09: 00000000b9d39700
R10: ffffffff814f73c6 R11: 0000000000000000 R12: ffff88807cce4c60
R13: 0000000000000000 R14: ffff8880796d8800 R15: ffff8880796d8800
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0174b4000 CR3: 000000007cae9000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? queue_work_on+0xcb/0x110
? lockdep_hardirqs_off+0x90/0xd0
queue_work_on+0xee/0x110
process_one_work+0x996/0x1610
? pwq_dec_nr_in_flight+0x2a0/0x2a0
? rwlock_bug.part.0+0x90/0x90
? _raw_spin_lock_irq+0x41/0x50
worker_thread+0x665/0x1080
? process_one_work+0x1610/0x1610
kthread+0x2e9/0x3a0
? kthread_complete_and_exit+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
To fix this, we can add a new HCI_DRAIN_WQ flag, and don't queue the
timeout workqueue while command workqueue is draining.
Fixes: 76727c02c1e1 ("Bluetooth: Call drain_workqueue() before resetting state")
Reported-by: syzbot+63bed493aebbf6872647@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can.
Still no major regressions, most of the changes are still due to data
races fixes, plus the usual bunch of drivers fixes.
Previous releases - regressions:
- tcp/udp: make early_demux back namespacified.
- dsa: fix issues with vlan_filtering_is_global
Previous releases - always broken:
- ip: fix data-races around ipv4_net_table (round 2, 3 & 4)
- amt: fix validation and synchronization bugs
- can: fix detection of mcp251863
- eth: iavf: fix handling of dummy receive descriptors
- eth: lan966x: fix issues with MAC table
- eth: stmmac: dwmac-mediatek: fix clock issue
Misc:
- dsa: update documentation"
* tag 'net-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication
net/sched: cls_api: Fix flow action initialization
tcp: Fix data-races around sysctl_tcp_max_reordering.
tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.
tcp: Fix a data-race around sysctl_tcp_rfc1337.
tcp: Fix a data-race around sysctl_tcp_stdurg.
tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
tcp: Fix data-races around sysctl_tcp_recovery.
tcp: Fix a data-race around sysctl_tcp_early_retrans.
tcp: Fix data-races around sysctl knobs related to SYN option.
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
ip: Fix data-races around sysctl_ip_prot_sock.
ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
can: rcar_canfd: Add missing of_node_put() in rcar_canfd_probe()
can: mcp251xfd: fix detection of mcp251863
Documentation: fix udp_wmem_min in ip-sysctl.rst
...
|
|
Jann reported a race between munmap() and unmap_mapping_range(), where
unmap_mapping_range() will no-op once unmap_vmas() has unlinked the
VMA; however munmap() will not yet have invalidated the TLBs.
Therefore unmap_mapping_range() will complete while there are still
(stale) TLB entries for the specified range.
Mitigate this by force flushing TLBs for VM_PFNMAP ranges.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now that architectures are no longer allowed to override
tlb_{start,end}_vma() re-arrange code so that there is only one
implementation for each of these functions.
This much simplifies trying to figure out what they actually do.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.
- MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
but does *NOT* want to call it for each VMA.
- MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
invalidate across multiple VMAs if possible.
With these it is possible to capture the three forms:
1) empty stubs;
select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS
2) start: flush_cache_range(), end: empty;
select MMU_GATHER_MERGE_VMAS
3) start: flush_cache_range(), end: flush_tlb_range();
default
Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
REQ_PREFLUSH is turned into REQ_OP_FLUSH by the flush state machine
and thus never seen by a blk-mq based driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220721130916.1869719-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Recall that CXL capable address ranges, on ACPI platforms, are published
in the CEDT.CFMWS (CXL Early Discovery Table: CXL Fixed Memory Window
Structures). These windows represent both the actively mapped capacity
and the potential address space that can be dynamically assigned to a
new CXL decode configuration (region / interleave-set).
CXL endpoints like DDR DIMMs can be mapped at any physical address
including 0 and legacy ranges.
There is an expectation and requirement that the /proc/iomem interface
and the iomem_resource tree in the kernel reflect the full set of
platform address ranges. I.e. that every address range that platform
firmware and bus drivers enumerate be reflected as an iomem_resource
entry. The hard requirement to do this for CXL arises from the fact that
facilities like CONFIG_DEVICE_PRIVATE expect to be able to treat empty
iomem_resource ranges as free for software to use as proxy address
space. Without CXL publishing its potential address ranges in
iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently
steal capacity reserved for runtime provisioning of new CXL regions.
So, iomem_resource needs to know about both active and potential CXL
resource ranges. The active CXL resources might already be reflected in
iomem_resource as "System RAM". insert_resource_expand_to_fit() handles
re-parenting "System RAM" underneath a CXL window.
The "_expand_to_fit()" behavior handles cases where a CXL window is not
a strict superset of an existing entry in the iomem_resource tree. The
"_expand_to_fit()" behavior is acceptable from the perspective of
resource allocation. The expansion happens because a conflicting
resource range is already populated, which means the resource boundary
expansion does not result in any additional free CXL address space being
made available. CXL address space allocation is always bounded by the
orginal unexpanded address range.
However, the potential for expansion does mean that something like
walk_iomem_res_desc(IORES_DESC_CXL...) can only return fuzzy answers on
corner case platforms that cause the resource tree to expand a CXL
window resource over a range that is not decoded by CXL. This would be
an odd platform configuration, but if it becomes a problem in practice
the CXL subsytem could just publish an API that returns definitive
answers.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/165784325943.1758207.5310344844375305118.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers
More Qualcomm driver changes for v5.20
This adds support for booting secondary cores, SPM, SMD-RPM and RPM
power-domain support for the MSM8909 platform.
It drops an unnecessary print in icc-bwmon, corrects SA8540P entries in
socinfo and a Kconfig build dependency for QCOM_RPMPD.
Lastly it continues to clean up up the Devicetree bindings for the
Qualcomm drivers.
* tag 'qcom-drivers-for-5.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
dt-bindings: soc: qcom: qcom,smd-rpm: add power-controller
dt-bindings: soc: qcom: aoss: document qcom,sm8450-aoss-qmp
dt-bindings: soc: qcom,rpmh-rsc: simplify qcom,tcs-config
ARM: mach-qcom: Add support for MSM8909
dt-bindings: arm: cpus: Document "qcom,msm8909-smp" enable-method
soc: qcom: spm: Add CPU data for MSM8909
dt-bindings: soc: qcom: spm: Add MSM8909 CPU compatible
soc: qcom: rpmpd: Add compatible for MSM8909
dt-bindings: power: qcom-rpmpd: Add MSM8909 power domains
soc: qcom: smd-rpm: Add compatible for MSM8909
dt-bindings: soc: qcom: smd-rpm: Add MSM8909
soc: qcom: icc-bwmon: Remove unnecessary print function dev_err()
soc: qcom: socinfo: Fix the id of SA8540P SoC
soc: qcom: Make QCOM_RPMPD depend on PM
Link: https://lore.kernel.org/r/20220720230648.2113609-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The peripheral may have several FIFOs, but some case just select
some FIFOs from them for data transfer, which means FIFO0 and FIFO2
may be selected. So add FIFO address stride support, 0 means all FIFOs
are continuous, 1 means 1 word stride between FIFOs. All stride between
FIFOs should be same.
Another option words_per_fifo means how many audio channel data copied
to one FIFO one time, 1 means one channel per FIFO, 2 means 2 channels
per FIFO.
If 'n_fifos_src = 4' and 'words_per_fifo = 2', it means the first two
words(channels) fetch from FIFO0 and then jump to FIFO1 for next two words,
and so on after the last FIFO3 fetched, roll back to FIFO0.
Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/1657162829-9273-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Some IAX operation code nomenclatures are misleading or don't match with
others:
1. Operation code 0x4c is Zero Compress 32. IAX_OPCODE_DECOMP_32 is a
misleading name. Change it to IAX_OPCODE_ZERO_COMP_32.
2. Operation code 0x4d is Zero Compress 16. IAX_OPCODE_DECOMP_16 is a
misleading name. Change it to IAX_OPCODE_ZERO_COMP_16.
3. IAX_OPCDE_FIND_UNIQUE is corrected to match with other nomenclatures.
Co-developed-by: Li Zhang <li4.zhang@intel.com>
Signed-off-by: Li Zhang <li4.zhang@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20220707002052.1546361-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
DisplayLink ethernet devices require NTB buffers larger then 32kb
in order to run with highest performance.
This patch is changing upper limit of the rx and tx buffers.
Those buffers are initialized with CDC_NCM_NTB_DEF_SIZE_RX and
CDC_NCM_NTB_DEF_SIZE_TX which is 16kb so by default no device is
affected by increased limit.
Rx and tx buffer is increased under two conditions:
- Device need to advertise that it supports higher buffer size in
dwNtbMaxInMaxSize and dwNtbMaxOutMaxSize.
- cdc_ncm/rx_max and cdc_ncm/tx_max driver parameters must be adjusted
with udev rule or ethtool.
Summary of testing and performance results:
Tests were performed on following devices:
- DisplayLink DL-3xxx family device
- DisplayLink DL-6xxx family device
- ASUS USB-C2500 2.5G USB3 ethernet adapter
- Plugable USB3 1G USB3 ethernet adapter
- EDIMAX EU-4307 USB-C ethernet adapter
- Dell DBQBCBC064 USB-C ethernet adapter
Performance measurements were done with:
- iperf3 between two linux boxes
- http://openspeedtest.com/ instance running on local test machine
Insights from tests results:
- All except one from third party usb adapters were not affected by
increased buffer size to their advertised dwNtbOutMaxSize and
dwNtbInMaxSize.
Devices were generally reaching 912-940Mbps both download and upload.
Only EDIMAX adapter experienced decreased download size from
929Mbps to 827Mbps with iper3, with openspeedtest decrease was from
968Mbps to 886Mbps.
- DisplayLink DL-3xxx family devices experienced performance increase
with iperf3 download from 300Mbps to 870Mbps and
upload from 782Mbps to 844Mbps.
With openspeedtest download increased from 556Mbps to 873Mbps
and upload from 727Mbps to 973Mbps
- DiplayLink DL-6xxx family devices are not affected by
increased buffer size.
Signed-off-by: Łukasz Spintzyk <lukasz.spintzyk@synaptics.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220720060518.541-2-lukasz.spintzyk@synaptics.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote:
> Some changes to the way invalid MSR accesses are reported by the
> kernel is causing some problems with messages printed on the
> console.
>
> We have seen several cases of ex_handler_msr() printing invalid MSR
> accesses once but the callstack multiple times causing confusion on
> the console.
> The problem here is that another earlier commit (5.13):
>
> a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
>
> Modifies all the pr_*_once() calls to always return true claiming
> that no caller is ever checking the return value of the functions.
>
> This is why we are seeing the callstack printed without the
> associated printk() msg.
Extract the ONCE_IF(cond) part into __ONCE_LTE_IF() and use that to
implement DO_ONCE_LITE_IF() and fix the extable code.
Fixes: a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/r/YqyVFsbviKjVGGZ9@worktop.programming.kicks-ass.net
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Simplify nf_ct_get_tuple(), from Jackie Liu.
2) Add format to request_module() call, from Bill Wendling.
3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending
hardware offload objects to be processed, from Vlad Buslov.
4) Missing rcu annotation and accessors in the netfilter tree,
from Florian Westphal.
5) Merge h323 conntrack helper nat hooks into single object,
also from Florian.
6) A batch of update to fix sparse warnings treewide,
from Florian Westphal.
7) Move nft_cmp_fast_mask() where it used, from Florian.
8) Missing const in nf_nat_initialized(), from James Yonan.
9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet.
10) Use refcount_inc instead of _inc_not_zero in flowtable,
from Florian Westphal.
11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: xt_TPROXY: remove pr_debug invocations
netfilter: flowtable: prefer refcount_inc
netfilter: ipvs: Use the bitmap API to allocate bitmaps
netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
netfilter: nf_tables: move nft_cmp_fast_mask to where its used
netfilter: nf_tables: use correct integer types
netfilter: nf_tables: add and use BE register load-store helpers
netfilter: nf_tables: use the correct get/put helpers
netfilter: x_tables: use correct integer types
netfilter: nfnetlink: add missing __be16 cast
netfilter: nft_set_bitmap: Fix spelling mistake
netfilter: h323: merge nat hook pointers into one
netfilter: nf_conntrack: use rcu accessors where needed
netfilter: nf_conntrack: add missing __rcu annotations
netfilter: nf_flow_table: count pending offload workqueue tasks
net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
netfilter: conntrack: use correct format characters
netfilter: conntrack: use fallthrough to cleanup
====================
Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-07-17
1) Add resiliency for lost completions for PTP TX port timestamp
2) Report Header-data split state via ethtool
3) Decouple HTB code from main regular TX code
* tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: CT: Remove warning of ignore_flow_level support for non PF
net/mlx5e: Add resiliency for PTP TX port timestamp
net/mlx5: Expose ts_cqe_metadata_size2wqe_counter
net/mlx5e: HTB, move htb functions to a new file
net/mlx5e: HTB, change functions name to follow convention
net/mlx5e: HTB, remove priv from htb function calls
net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure
net/mlx5e: HTB, move stats and max_sqs to priv
net/mlx5e: HTB, move section comment to the right place
net/mlx5e: HTB, move ids to selq_params struct
net/mlx5e: HTB, reduce visibility of htb functions
net/mlx5e: Fix mqprio_rl handling on devlink reload
net/mlx5e: Report header-data split state through ethtool
====================
Link: https://lore.kernel.org/r/20220719203529.51151-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
They were updated in kernel/bpf/trampoline.c to fix another build
issue. We should to do the same for include/linux/bpf.h header.
Fixes: 3908fcddc65d ("bpf: fix lsm_cgroup build errors on esoteric configs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220720155220.4087433-1-sdf@google.com
|
|
Instead of bouncing the function call to the driver op through a blocking
notifier just have the iommu layer call it directly.
Register each device that is being attached to the iommu with the lower
driver which then threads them on a linked list and calls the appropriate
driver op at the right time.
Currently the only use is if dma_unmap() is defined.
Also, fully lock all the debugging tests on the pinning path that a
dma_unmap is registered.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Instead of having drivers register the notifier with explicit code just
have them provide a dma_unmap callback op in their driver ops and rely on
the core code to wire it up.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Add support for SMN communication on family 17h model A0h and family 19h
models 60h-70h.
[ bp: Merge into a single patch. ]
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220719195256.1516-1-mario.limonciello@amd.com
|
|
* irq/loongarch:
: .
: Merge the long awaited IRQ support for the LoongArch architecture.
:
: From the cover letter:
:
: "Currently, LoongArch based processors (e.g. Loongson-3A5000)
: can only work together with LS7A chipsets. The irq chips in
: LoongArch computers include CPUINTC (CPU Core Interrupt
: Controller), LIOINTC (Legacy I/O Interrupt Controller),
: EIOINTC (Extended I/O Interrupt Controller), PCH-PIC (Main
: Interrupt Controller in LS7A chipset), PCH-LPC (LPC Interrupt
: Controller in LS7A chipset) and PCH-MSI (MSI Interrupt Controller)."
:
: Note that this comes with non-official, arch private ACPICA
: definitions until the official ACPICA update is realeased.
: .
irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch
irqchip: Add LoongArch CPU interrupt controller support
irqchip: Add Loongson Extended I/O interrupt controller support
irqchip/loongson-liointc: Add ACPI init support
irqchip/loongson-pch-msi: Add ACPI init support
irqchip/loongson-pch-pic: Add ACPI init support
irqchip: Add Loongson PCH LPC controller support
LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain
LoongArch: Use ACPI_GENERIC_GSI for gsi handling
genirq/generic_chip: Export irq_unmap_generic_chip
ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback
APCI: irq: Add support for multiple GSI domains
LoongArch: Provisionally add ACPICA data structures
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
For LoongArch, ACPI_IRQ_MODEL_LPIC is introduced, and then the
callback acpi_get_gsi_domain_id and acpi_gsi_to_irq_fallback are
implemented.
The acpi_get_gsi_domain_id callback returns related fwnode handle
of irqdomain for different GSI range.
The acpi_gsi_to_irq_fallback will create new mapping for gsi when
the mapping of it is not found.
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1658314292-35346-14-git-send-email-lvjianmin@loongson.cn
|
|
EIOINTC stands for "Extended I/O Interrupts" that described in Section
11.2 of "Loongson 3A5000 Processor Reference Manual". For more
information please refer Documentation/loongarch/irq-chip-model.rst.
Loongson-3A5000 has 4 cores per NUMA node, and each NUMA node has an
EIOINTC; while Loongson-3C5000 has 16 cores per NUMA node, and each NUMA
node has 4 EIOINTCs. In other words, 16 cores of one NUMA node in
Loongson-3C5000 are organized in 4 groups, each group connects to an
EIOINTC. We call the "group" here as an EIOINTC node, so each EIOINTC
node always includes 4 cores (both in Loongson-3A5000 and Loongson-
3C5000).
Co-developed-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1658314292-35346-12-git-send-email-lvjianmin@loongson.cn
|
|
Some irq controllers have to re-implement a private version for
irq_generic_chip_ops, because they have a different xlate to translate
hwirq. Export irq_unmap_generic_chip to allow reusing in drivers.
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1658314292-35346-5-git-send-email-lvjianmin@loongson.cn
|
|
It appears that the generic version of acpi_gsi_to_irq() doesn't
fallback to establishing a mapping if there is no pre-existing
one while the x86 version does.
While arm64 seems unaffected by it, LoongArch is relying on the x86
behaviour. In an effort to prevent new architectures from reinventing
the proverbial wheel, provide an optional callback that the arch code
can set to restore the x86 behaviour.
Hopefully we can eventually get rid of this in the future once
the expected behaviour has been clarified.
Reported-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/1658314292-35346-4-git-send-email-lvjianmin@loongson.cn
|
|
In an unfortunate departure from the ACPI spec, the LoongArch
architecture split its GSI space across multiple interrupt
controllers.
In order to be able to reuse the core code and prevent
architectures from reinventing an already square wheel, offer
the arch code the ability to register a dispatcher function
that will return the domain fwnode for a given GSI.
The ARM GIC drivers are updated to support this (with a single
domain, as intended).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/1658314292-35346-3-git-send-email-lvjianmin@loongson.cn
|
|
Building perf for MIPS failed after 9f79b8b72339 ("uapi: simplify
__ARCH_FLOCK{,64}_PAD a little") with the following error:
CC
/home/fainelli/work/buildroot/output/bmips/build/linux-custom/tools/perf/trace/beauty/fcntl.o
In file included from
../../../../host/mipsel-buildroot-linux-gnu/sysroot/usr/include/asm/fcntl.h:77,
from ../include/uapi/linux/fcntl.h:5,
from trace/beauty/fcntl.c:10:
../include/uapi/asm-generic/fcntl.h:188:8: error: redefinition of
'struct flock'
struct flock {
^~~~~
In file included from ../include/uapi/linux/fcntl.h:5,
from trace/beauty/fcntl.c:10:
../../../../host/mipsel-buildroot-linux-gnu/sysroot/usr/include/asm/fcntl.h:63:8:
note: originally defined here
struct flock {
^~~~~
This is due to the local copy under
tools/include/uapi/asm-generic/fcntl.h including the toolchain's kernel
headers which already define 'struct flock' and define
HAVE_ARCH_STRUCT_FLOCK to future inclusions make a decision as to
whether re-defining 'struct flock' is appropriate or not.
Make sure what do not re-define 'struct flock'
when HAVE_ARCH_STRUCT_FLOCK is already defined.
Fixes: 9f79b8b72339 ("uapi: simplify __ARCH_FLOCK{,64}_PAD a little")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[arnd: sync with include/uapi/asm-generic/fcntl.h as well]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
THP_SWAP has been proven to improve the swap throughput significantly
on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
splitting THP after swapped out").
As long as arm64 uses 4K page size, it is quite similar with x86_64
by having 2MB PMD THP. THP_SWAP is architecture-independent, thus,
enabling it on arm64 will benefit arm64 as well.
A corner case is that MTE has an assumption that only base pages
can be swapped. We won't enable THP_SWAP for ARM64 hardware with
MTE support until MTE is reworked to coexist with THP_SWAP.
A micro-benchmark is written to measure thp swapout throughput as
below,
unsigned long long tv_to_ms(struct timeval tv)
{
return tv.tv_sec * 1000 + tv.tv_usec / 1000;
}
main()
{
struct timeval tv_b, tv_e;;
#define SIZE 400*1024*1024
volatile void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (!p) {
perror("fail to get memory");
exit(-1);
}
madvise(p, SIZE, MADV_HUGEPAGE);
memset(p, 0x11, SIZE); /* write to get mem */
gettimeofday(&tv_b, NULL);
madvise(p, SIZE, MADV_PAGEOUT);
gettimeofday(&tv_e, NULL);
printf("swp out bandwidth: %ld bytes/ms\n",
SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b)));
}
Testing is done on rk3568 64bit Quad Core Cortex-A55 platform -
ROCK 3A.
thp swp throughput w/o patch: 2734bytes/ms (mean of 10 tests)
thp swp throughput w/ patch: 3331bytes/ms (mean of 10 tests)
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Link: https://lore.kernel.org/r/20220720093737.133375-1-21cnbao@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
this is a pull request of 29 patches for net-next/master.
The first 6 patches target the slcan driver. Dan Carpenter contributes
a hardening patch, followed by 5 cleanup patches.
Biju Das contributes 5 patches to prepare the sja1000 driver to
support the Renesas RZ/N1 SJA1000 CAN controller.
Dario Binacchi's patch for the slcan driver fixes a sleep with held
spin lock.
Another patch by Dario Binacchi fixes a wrong comment in the c_can
driver.
Pavel Pisa updates the CTU CAN FD IP core registers.
Stephane Grosjean contributes 3 patches to the peak_usb driver for
cleanups and support of a new MCU.
The last 12 patches are by Vincent Mailhol, they fix and improve the
txerr and rxerr reporting in all CAN drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While reading sysctl_tcp_slow_start_after_idle, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 35089bb203f4 ("[TCP]: Add tcp_slow_start_after_idle sysctl.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During a subsystem reset the Topology-Change-Report is cleared.
Let's give userland the possibility to clear the MTCR in the case
of a subsystem reset.
To migrate the MTCR, we give userland the possibility to
query the MTCR state.
We indicate KVM support for the CPU topology facility with a new
KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20220714194334.127812-1-pmorel@linux.ibm.com>
Link: https://lore.kernel.org/all/20220714194334.127812-1-pmorel@linux.ibm.com/
[frankja@linux.ibm.com: Simple conflict resolution in Documentation/virt/kvm/api.rst]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
Currently, drivers are using magic numbers to derive the CAN error
states from the error counter. Add three macro declarations to
remediate this.
For reference, the error-active, error-passive and bus-off are defined
in ISO 11898, section 12.1.4.2 "Error counting". Although ISO 11898
does not define error-warning state, this extra value is also commonly
used and is thus also added.
Link: https://lore.kernel.org/all/20220719143550.3681-13-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Add a dedicated flag in uapi/linux/can/error.h to notify the userland
that fields data[6] and data[7] of the CAN error frame were
respectively populated with the tx and rx error counters.
For all driver tree-wide, set up this flags whenever needed.
Link: https://lore.kernel.org/all/20220719143550.3681-12-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Currently, data[5..7] of struct can_frame, when used as a CAN error
frame, are defined as being "controller specific". Device specific
behaviours are problematic because it prevents someone from writing
code which is portable between devices.
As a matter of fact, data[5] is never used, data[6] is always used to
report TX error counter and data[7] is always used to report RX error
counter. can-utils also relies on this.
This patch updates the comment in the uapi header to specify that
data[5] is reserved (and thus should not be used) and that data[6..7]
are used for error counters.
Fixes: 0d66548a10cb ("[CAN]: Add PF_CAN core module")
Link: https://lore.kernel.org/all/20220719143550.3681-11-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
the last caller has been removed with commit 96f5e66e8a79 ("mac80211: fix
aggregation for hardware with ampdu queues"), so it's safe to remove this
function.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/703d549e3088367651d92a059743f1be848d74b7.1658133689.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Many binary attributes need to limit access to CAP_SYS_ADMIN only; ie
many binary attributes specify is_visible with 0400 or 0600.
Make setting the permissions of such attributes more explicit by
defining BIN_ATTR_ADMIN_{RO,RW}.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-6-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Introduced in a PCIe r6.0, sec 6.30, DOE provides a config space based
mailbox with standard protocol discovery. Each mailbox is accessed
through a DOE Extended Capability.
Each DOE mailbox must support the DOE discovery protocol in addition to
any number of additional protocols.
Define core PCIe functionality to manage a single PCIe DOE mailbox at a
defined config space offset. Functionality includes iterating,
creating, query of supported protocol, and task submission. Destruction
of the mailboxes is device managed.
Cc: "Li, Ming" <ming4.li@intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-4-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
This ID is used in DOE headers to identify protocols that are defined
within the PCI Express Base Specification, PCIe r6.0, sec 6.30.1.1 table
6-32.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220719205249.566684-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux
Pavel Begunkov says:
====================
io_uring zerocopy send
The patchset implements io_uring zerocopy send. It works with both registered
and normal buffers, mixing is allowed but not recommended. Apart from usual
request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
the userspace when buffers are freed and can be reused (see API design below),
which is delivered into io_uring's Completion Queue. Those "buffer-free"
notifications are not necessarily per request, but the userspace has control
over it and should explicitly attaching a number of requests to a single
notification. The series also adds some internal optimisations when used with
registered buffers like removing page referencing.
From the kernel networking perspective there are two main changes. The first
one is passing ubuf_info into the network layer from io_uring (inside of an
in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
caching on the io_uring side, but also helps to avoid cross-referencing
and synchronisation problems. The second part is an optional optimisation
removing page referencing for requests with registered buffers.
Benchmarking UDP with an optimised version of the selftest (see [1]), which
sends a bunch of requests, waits for completions and repeats. "+ flush" column
posts one additional "buffer-free" notification per request, and just "zc"
doesn't post buffer notifications at all.
NIC (requests / second):
IO size | non-zc | zc | zc + flush
4000 | 495134 | 606420 (+22%) | 558971 (+12%)
1500 | 551808 | 577116 (+4.5%) | 565803 (+2.5%)
1000 | 584677 | 592088 (+1.2%) | 560885 (-4%)
600 | 596292 | 598550 (+0.4%) | 555366 (-6.7%)
dummy (requests / second):
IO size | non-zc | zc | zc + flush
8000 | 1299916 | 2396600 (+84%) | 2224219 (+71%)
4000 | 1869230 | 2344146 (+25%) | 2170069 (+16%)
1200 | 2071617 | 2361960 (+14%) | 2203052 (+6%)
600 | 2106794 | 2381527 (+13%) | 2195295 (+4%)
Previously it also brought a massive performance speedup compared to the
msg_zerocopy tool (see [3]), which is probably not super interesting. There
is also an additional bunch of refcounting optimisations that was omitted from
the series for simplicity and as they don't change the picture drastically,
they will be sent as follow up, as well as flushing optimisations closing the
performance gap b/w two last columns.
For TCP on localhost (with hacks enabling localhost zerocopy) and including
additional overhead for receive:
IO size | non-zc | zc
1200 | 4174 | 4148
4096 | 7597 | 11228
Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the
omitted optimisations will somewhat help, should look better for 4000,
but couldn't test properly because of setup problems.
Links:
liburing (benchmark + tests):
[1] https://github.com/isilence/liburing/tree/zc_v4
kernel repo:
[2] https://github.com/isilence/linux/tree/zc_v4
RFC v1:
[3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/
RFC v2:
https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/
Net patches based:
git@github.com:isilence/linux.git zc_v4-net-base
or
https://github.com/isilence/linux/tree/zc_v4-net-base
API design overview:
The series introduces an io_uring concept of notifactors. From the userspace
perspective it's an entity to which it can bind one or more requests and then
requesting to flush it. Flushing a notifier makes it impossible to attach new
requests to it, and instructs the notifier to post a completion once all
requests attached to it are completed and the kernel doesn't need the buffers
anymore.
Notifications are stored in notification slots, which should be registered as
an array in io_uring. Each slot stores only one notifier at any particular
moment. Flushing removes it from the slot and the slot automatically replaces
it with a new notifier. All operations with notifiers are done by specifying
an index of a slot it's currently in.
When registering a notification the userspace specifies a u64 tag for each
slot, which will be copied in notification completion entries as
cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32
sequence number counting notifiers of a slot.
====================
Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Managed pages contain pinned userspace pages and controlled by upper
layers, there is no need in tracking skb->pfmemalloc for them. Introduce
a helper for filling frags but ignoring page tracking, it'll be needed
later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some users like io_uring can do page pinning more efficiently, so we
want a way to delegate referencing to other subsystems. For that add
a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold
page references and upper layers are responsivle to managing page
lifetime.
It's allowed to convert skbs from managed to normal by calling
skb_zcopy_downgrade_managed(). The function will take all needed
page references and clear the flag. It's needed, for instance,
to avoid mixing managed modes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for custom iov_iter handling to msghdr. The idea is that
in-kernel subsystems want control over how an SG is split.
Signed-off-by: David Ahern <dsahern@kernel.org>
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make possible for network in-kernel callers like io_uring to pass in a
custom ubuf_info by setting it in a new field of struct msghdr.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add capability field which indicates the mask for wqe_counter which
connects between loopback CQE and the original WQE. With this connection
the driver can identify lost of the loopback CQE and reply PTP
synchronization with timestamp given in the original CQE.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When running KASAN with Tiny RCU (e.g. under ARCH=um, where
a working KASAN patch is now available), we don't get any
information on the original kfree_rcu() (or similar) caller
when a problem is reported, as Tiny RCU doesn't record this.
Add the recording, which required pulling kvfree_call_rcu()
out of line for the KASAN case since the recording function
(kasan_record_aux_stack_noalloc) is neither exported, nor
can we include kasan.h into rcutiny.h.
without KASAN, the patch has no size impact (ARCH=um kernel):
text data bss dec hex filename
6151515 4423154 33148520 43723189 29b29b5 linux
6151515 4423154 33148520 43723189 29b29b5 linux + patch
with KASAN, the impact on my build was minimal:
text data bss dec hex filename
13915539 7388050 33282304 54585893 340ea25 linux
13911266 7392114 33282304 54585684 340e954 linux + patch
-4273 +4064 +-0 -209
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|