posix_timer_add() tries to allocate a posix timer ID by starting from the
cached ID which was stored by the last successful allocation.
This is done in a loop searching the ID space for a free slot one by
one. The loop has to terminate when the search wrapped around to the
starting point.
But that's racy vs. establishing the starting point: the starting point
is read out locklessly, which leads to the following problem:
CPU0                                    CPU1
posix_timer_add()
  start = sig->posix_timer_id;
  lock(hash_lock);
  ...                                   posix_timer_add()
  if (++sig->posix_timer_id < 0)
                                          start = sig->posix_timer_id;
      sig->posix_timer_id = 0;
So CPU1 can observe a negative start value, i.e. -1, and the loop break
never happens because the condition can never be true:
if (sig->posix_timer_id == start)
break;
While this is unlikely to ever turn into an endless loop as the ID space is
huge (INT_MAX), the racy read of the start value caught the attention of
KCSAN and Dmitry unearthed that incorrectness.
Rewrite it so that all id operations are under the hash lock.
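A minimal sketch of the reworked allocation, with the search and the
cached-ID handling both under the lock (identifiers such as
next_posix_timer_id, hash() and __posix_timers_find() are assumptions
for illustration, not necessarily the final code):

    static int posix_timer_add(struct k_itimer *timer)
    {
            struct signal_struct *sig = current->signal;
            struct hlist_head *head;
            unsigned int cnt;
            int id;

            /* Give up after a full wrap of the huge ID space. */
            for (cnt = 0; cnt <= INT_MAX; cnt++) {
                    spin_lock(&hash_lock);
                    /* Cached start point, now read under the lock. */
                    id = sig->next_posix_timer_id;
                    /* Advance and wrap within [0, INT_MAX]. */
                    sig->next_posix_timer_id = (id + 1) & INT_MAX;

                    head = &posix_timers_hashtable[hash(sig, id)];
                    if (!__posix_timers_find(head, sig, id)) {
                            hlist_add_head_rcu(&timer->t_hash, head);
                            spin_unlock(&hash_lock);
                            return id;
                    }
                    spin_unlock(&hash_lock);
            }
            return -EBUSY;
    }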
Reported-by: syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
mlx5-updates-2023-06-16
1) Added a new event handler to firmware sync reset, which is used to
support the firmware sync reset flow on smart NICs. Adding this new
stage to the flow enables the firmware to ensure that host PFs unload
before ECPFs unload, to avoid races during PF recovery.
2) Debugfs for mlx5 eswitch bridge offloads
3) Added two new counters for vport stats
4) Minor Fixups and cleanups for net-next branch
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fix from Damien Le Moal:
- Avoid deadlocks on resume from sleep by delaying scsi rescan until
the scsi device is also fully resumed.
* tag 'ata-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-scsi: Avoid deadlock on rescan after device resume
|
|
We implement a KVM device interface for the in-kernel AIA irqchip so
that user space can use KVM device ioctls to create, configure, and
destroy the in-kernel AIA irqchip.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Per-VMA locking allows us to lock a struct vm_area_struct without
taking the process-wide mmap lock in read mode.
Consider a process workload where the mmap lock is taken constantly in
write mode. In this scenario, all zerocopy receives are blocked for
the duration of those writes - though in principle, the memory
ranges being used by TCP are not touched by the operations that need
the mmap write lock. This results in performance degradation.
Now consider another workload where the mmap lock is never taken in
write mode, but there are many TCP connections using receive zerocopy
that are concurrently receiving. These connections all take the mmap
lock in read mode, but this does induce a lot of contention and atomic
ops for this process-wide lock. This results in additional CPU
overhead caused by contending on the cache line for this lock.
However, with per-vma locking, both of these problems can be avoided.
As a test, I ran an RPC-style request/response workload with 4KB
payloads and receive zerocopy enabled, with 100 simultaneous TCP
connections. I measured perf cycles within the
find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
without per-vma locking enabled.
When using process-wide mmap semaphore read locking, about 1% of
measured perf cycles were within this path. With per-VMA locking, this
value dropped to about 0.45%.
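A sketch of the lookup this enables, assuming lock_vma_under_rcu() as
the per-VMA locking primitive and a helper shaped like the
find_tcp_vma() mentioned above:

    static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
                                               unsigned long address,
                                               bool *mmap_locked)
    {
            struct vm_area_struct *vma;

            /* Fast path: lock only this VMA, no process-wide lock. */
            vma = lock_vma_under_rcu(mm, address);
            if (vma) {
                    *mmap_locked = false;
                    return vma;
            }

            /* Fallback: VMA contended or being modified concurrently. */
            mmap_read_lock(mm);
            vma = vma_lookup(mm, address);
            *mmap_locked = true;
            return vma;
    }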
Signed-off-by: Arjun Roy <arjunroy@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is part of the effort to remove the empty element at the end of
ctl_table structs. "child" was a deprecated member of this struct and
was being used to differentiate between two types of ctl_tables:
"normal" and "permanently empty".
What changed?:
* Replace "child" with an enumeration that will have two values: the
default (0) and the permanently empty (1). The latter is left at zero
so when struct ctl_table is created with kzalloc or in a local
context, it will have the zero value by default. We document the
new enum with kdoc.
* Remove the "empty child" check from sysctl_check_table
* Remove count_subheaders function as there is no longer a need to
calculate how many headers there are for every child
* Remove the recursive call to unregister_sysctl_table as there is no
need to traverse down the child tree any longer
* Add a new SYSCTL_PERM_EMPTY_DIR binary flag
* Remove the last remnant of "child" from parport/procfs.c
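A sketch of the enumeration from the first item above (the identifier
names are illustrative assumptions):

    enum ctl_table_type {
            SYSCTL_TABLE_TYPE_DEFAULT = 0,           /* "normal" table */
            SYSCTL_TABLE_TYPE_PERMANENTLY_EMPTY = 1, /* never gets entries */
    };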
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
When an ATA port is resumed from sleep, a power management request is
issued to libata EH to reset the port and rescan the device(s)
attached to it. Device rescanning is done by scheduling an
ata_scsi_dev_rescan() work, which will execute
scsi_rescan_device().
However, scsi_rescan_device() takes the generic device lock, which is
also taken by dpm_resume() when the SCSI device is resumed as well. If
a device rescan execution starts before the completion of the SCSI
device resume, the rcu locking used to refresh the cached VPD pages of
the device, combined with the generic device locking from
scsi_rescan_device() and from dpm_resume() can cause a deadlock.
Avoid this situation by changing struct ata_port scsi_rescan_task to be
a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is
modified to check if the SCSI device associated with the ATA device that
must be rescanned is still suspended. If it is, ata_scsi_dev_rescan()
returns early and reschedules itself for execution after an arbitrary
delay of 5 ms.
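A condensed sketch of the rescheduling logic (locking and error
handling omitted; the power.is_suspended test is an assumption about
how "still suspended" is detected):

    static void ata_scsi_dev_rescan(struct work_struct *work)
    {
            struct ata_port *ap =
                    container_of(work, struct ata_port, scsi_rescan_task.work);
            struct ata_link *link;
            struct ata_device *dev;

            ata_for_each_link(link, ap, EDGE) {
                    ata_for_each_dev(dev, link, ENABLED) {
                            struct scsi_device *sdev = dev->sdev;

                            if (!sdev)
                                    continue;
                            /* SCSI device not fully resumed yet? Retry in 5 ms. */
                            if (sdev->sdev_gendev.power.is_suspended) {
                                    schedule_delayed_work(&ap->scsi_rescan_task,
                                                          msecs_to_jiffies(5));
                                    return;
                            }
                            scsi_rescan_device(&sdev->sdev_gendev);
                    }
            }
    }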
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Joe Breuer <linux-kernel@jmbreuer.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Joe Breuer <linux-kernel@jmbreuer.net>
|
|
These commits
a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
because that memory will be correctly marked as decrypted or encrypted
for all VM types (CoCo or normal). But problems ensue when CPUs in the
VM go online or offline after virtual PCI devices have been configured.
When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
may call the virtual PCI driver and fault trying to use the as yet
uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
VM if the MMIO read and write hypercalls are used from state
CPUHP_AP_IRQ_AFFINITY_ONLINE.
When a CPU is taken offline, IRQs may be reassigned in state
CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
use the hyperv_pcpu_input_arg that has already been freed by a
higher state.
Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
early enough.
Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
a CPU goes offline. Retain the allocated memory, and reuse it if
the CPU comes back online later.
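A sketch of both parts of the fix (simplified; the callback names and
allocation details are assumptions):

    static int hv_cpu_init(unsigned int cpu)
    {
            void **inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);

            /* Offlining no longer frees this; reuse a retained page. */
            if (*inputarg)
                    return 0;
            *inputarg = kmalloc(PAGE_SIZE, GFP_KERNEL);
            return *inputarg ? 0 : -ENOMEM;
    }

    /* Registered at the new state, which runs before
     * CPUHP_AP_IRQ_AFFINITY_ONLINE can reach the PCI driver: */
    cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
                      hv_cpu_init, hv_cpu_die);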
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
This event brackets the svcrdma_post_* trace points. If this trace
event is enabled but does not appear as expected, that indicates a
chunk_ctxt leak.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since no in-tree driver supports it, sht3x_platform_data has been
removed and the relevant properties have been moved to sht3x_data.
Signed-off-by: JuenKit Yip <JuenKit_Yip@hotmail.com>
Link: https://lore.kernel.org/r/DB4PR10MB626126FB7226D5AF341197449258A@DB4PR10MB6261.EURPRD10.PROD.OUTLOOK.COM
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Under certain circumstances, the tcp receive buffer memory limit
set by autotuning (sk_rcvbuf) is increased due to incoming data
packets as a result of the window not closing when it should be.
This can result in the receive buffer growing all the way up to
tcp_rmem[2], even for tcp sessions with a low BDP.
To reproduce: Connect a TCP session with the receiver doing
nothing and the sender sending small packets (an infinite loop
of socket send() with 4 bytes of payload with a sleep of 1 ms
in between each send()). This will cause the tcp receive buffer
to grow all the way up to tcp_rmem[2].
As a result, a host can have individual tcp sessions with receive
buffers of size tcp_rmem[2], and the host itself can reach tcp_mem
limits, causing the host to go into tcp memory pressure mode.
The fundamental issue is the relationship between the granularity
of the window scaling factor and the number of bytes ACKed back
to the sender. This problem has previously been identified in
RFC 7323, appendix F [1].
The Linux kernel currently adheres to never shrinking the window.
In addition to the overallocation of memory mentioned above, the
current behavior is functionally incorrect, because once tcp_rmem[2]
is reached when no remediations remain (i.e. tcp collapse fails to
free up any more memory and there are no packets to prune from the
out-of-order queue), the receiver will drop in-window packets
resulting in retransmissions and an eventual timeout of the tcp
session. A receive buffer full condition should instead result
in a zero window and an indefinite wait.
In practice, this problem is largely hidden for most flows. It
is not applicable to mice flows. Elephant flows can send data
fast enough to "overrun" the sk_rcvbuf limit (in a single ACK),
triggering a zero window.
But this problem does show up for other types of flows. Examples
are websockets and other flows that send small amounts of
data spaced apart slightly in time. In these cases, we directly
encounter the problem described in [1].
RFC 7323, section 2.4 [2], says there are instances when a retracted
window can be offered, and that TCP implementations MUST ensure
that they handle a shrinking window, as specified in RFC 1122,
section 4.2.2.16 [3]. All prior RFCs on the topic of tcp window
management have made clear that a sender must accept a shrunk window
from the receiver, including RFC 793 [4] and RFC 1323 [5].
This patch implements the functionality to shrink the tcp window
when necessary to keep the right edge within the memory limit set
by autotuning (sk_rcvbuf). This new functionality is enabled with
the new sysctl: net.ipv4.tcp_shrink_window
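Presumably toggled like any other networking sysctl, e.g.:

    $ echo 1 > /proc/sys/net/ipv4/tcp_shrink_window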
Additional information can be found at:
https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
[4] https://www.rfc-editor.org/rfc/rfc793
[5] https://www.rfc-editor.org/rfc/rfc1323
Signed-off-by: Mike Freemon <mfreemon@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This has been unused since the switch to psched_l2t_ns().
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230615124810.34020-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
iort_pmsi_get_dev_id() has a __weak definition in the driver, and
an override in arm64 specific code, but the declaration is conditional
and not always seen when the copy in the driver gets built:
drivers/irqchip/irq-gic-v3-its-platform-msi.c:41:12: error: no previous prototype for 'iort_pmsi_get_dev_id' [-Werror=missing-prototypes]
Move the existing declaration out of the #ifdef block to ensure
it can be seen in all configurations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230516200516.554663-5-arnd@kernel.org
|
|
Building with "W=1" warns about missing declarations for
two functions in the mmp irqchip driver:
drivers/irqchip/irq-mmp.c:248:13: error: no previous prototype for 'icu_init_irq'
drivers/irqchip/irq-mmp.c:271:13: error: no previous prototype for 'mmp2_init_icu'
The declarations are present in an unused header, but since there is no
caller, it's best to just remove the functions and the header completely,
making the driver DT-only to match the state of the platform.
Fixes: 77acc85ce797 ("ARM: mmp: remove device definitions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230516200516.554663-2-arnd@kernel.org
|
|
As described in commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between
shrink_slab and dm_pool_abort_metadata"), ABBA deadlocks will be
triggered because shrinker_rwsem currently needs to be held by
dm_pool_abort_metadata() as a side-effect of thin-pool metadata
operation failure.
The following three problem scenarios have been noticed:
1) Described by commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between
shrink_slab and dm_pool_abort_metadata")
2) shrinker_rwsem and throttle->lock
       P1(drop cache)                       P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&shrinker_rwsem) - LOCK A
   do_shrink_slab
    super_cache_scan
     prune_icache_sb
      dispose_list
       evict
        ext4_evict_inode
         ext4_clear_inode
          ext4_discard_preallocations
           ext4_mb_load_buddy_gfp
            ext4_mb_init_cache
             ext4_wait_block_bitmap
              __ext4_error
               ext4_handle_error
                ext4_commit_super
                 ...
                 dm_submit_bio
                                            do_worker
                                             throttle_work_update
                                              down_write(&t->lock) - LOCK B
                                             process_deferred_bios
                                              commit
                                               metadata_operation_failed
                                                dm_pool_abort_metadata
                                                 dm_block_manager_create
                                                  dm_bufio_client_create
                                                   register_shrinker
                                                    down_write(&shrinker_rwsem)
                                                    - LOCK A
                 thin_map
                  thin_bio_map
                   thin_defer_bio_with_throttle
                    throttle_lock
                     down_read(&t->lock) - LOCK B
3) shrinker_rwsem and wait_on_buffer
       P1(drop cache)                       P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&shrinker_rwsem) - LOCK A
   do_shrink_slab
   ...
    ext4_wait_block_bitmap
     __ext4_error
      ext4_handle_error
       jbd2_journal_abort
        jbd2_journal_update_sb_errno
         jbd2_write_superblock
          submit_bh
          // LOCK B
          // RELEASE B
                                            do_worker
                                             throttle_work_update
                                              down_write(&t->lock) - LOCK B
                                             process_deferred_bios
                                              process_bio
                                               commit
                                                metadata_operation_failed
                                                 dm_pool_abort_metadata
                                                  dm_block_manager_create
                                                   dm_bufio_client_create
                                                    register_shrinker
                                                     register_shrinker_prepared
                                                      down_write(&shrinker_rwsem)
                                                      - LOCK A
                                            bio_endio
          wait_on_buffer
           __wait_on_buffer
Fix these by resetting dm_bufio_client without holding shrinker_rwsem.
Fixes: 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata")
Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Add a tracepoint for when a CSD is queued to a remote CPU's
call_single_queue. This allows finding exactly which CPU queued a given CSD
when looking at a csd_function_{entry,exit} event, and also enables us to
accurately measure IPI delivery time with e.g. a synthetic event:
$ echo 'hist:keys=cpu,csd.hex:ts=common_timestamp.usecs' >\
/sys/kernel/tracing/events/smp/csd_queue_cpu/trigger
$ echo 'csd_latency unsigned int dst_cpu; unsigned long csd; u64 time' >\
/sys/kernel/tracing/synthetic_events
$ echo \
'hist:keys=common_cpu,csd.hex:'\
'time=common_timestamp.usecs-$ts:'\
'onmatch(smp.csd_queue_cpu).trace(csd_latency,common_cpu,csd,$time)' >\
/sys/kernel/tracing/events/smp/csd_function_entry/trigger
$ trace-cmd record -e 'synthetic:csd_latency' hackbench
$ trace-cmd report
<...>-467 [001] 21.824263: csd_queue_cpu: cpu=0 callsite=try_to_wake_up+0x2ea func=sched_ttwu_pending csd=0xffff8880076148b8
<...>-467 [001] 21.824280: ipi_send_cpu: cpu=0 callsite=try_to_wake_up+0x2ea callback=generic_smp_call_function_single_interrupt+0x0
<...>-489 [000] 21.824299: csd_function_entry: func=sched_ttwu_pending csd=0xffff8880076148b8
<...>-489 [000] 21.824320: csd_latency: dst_cpu=0, csd=18446612682193848504, time=36
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615065944.188876-7-leobras@redhat.com
|
|
The recently added ipi_send_{cpu,cpumask} tracepoints allow finding sources
of IPIs targeting CPUs running latency-sensitive applications.
For NOHZ_FULL CPUs, all IPIs are interference, and those tracepoints are
sufficient to find them and work on getting rid of them. In some setups
however, not *all* IPIs are to be suppressed, but long-running IPI
callbacks can still be problematic.
Add a pair of tracepoints to mark the start and end of processing a CSD IPI
callback, similar to what exists for softirq, workqueue or timer callbacks.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615065944.188876-5-leobras@redhat.com
|
|
Add needed HW bits for querying local loopback counter and the
HCA capability for it.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Added a new event handler to firmware sync reset, which is used to
support the firmware sync reset flow on smart NICs. Adding this new
stage to the flow enables the firmware to ensure that host PFs unload
before ECPFs unload, to avoid races during PF recovery.
If the firmware sends a sync_reset_unload event to the driver, the
driver should unload and close all HW resources of the function. Once
the driver finishes the unload part, it can't get any more events from
the firmware as the event queues are closed, so it polls the reset
state field to know when to continue to the next stage of the sync
reset flow.
Added a capability bit for supporting the sync_reset_unload event.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Expose a new timeout in the Default Timeouts Register to be used in the
sync reset flow running on smart NICs. In this flow the driver should
know how much time to wait from getting the unload request until the
firmware asks the PF to continue to the next stage of the flow.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This fixes a spinlock-initialization regression in SRCU that causes
the SRCU notifier to fail.
The fix simply adds the initialization, but introduces a #ifdef
because there is no spinlock to initialize for the Tiny SRCU used in
!SMP builds.
Yes, it would be nice to abstract this somehow in order to hide it in
SRCU, but I still don't see a good way of doing this"
* tag 'urgent-rcu.2023.06.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
notifier: Initialize new struct srcu_usage field
|
|
vfio-cdx driver enables IOCTLs for user space to query
MMIO regions for CDX devices and mmap them. This change
also adds support for reset of CDX devices. With VFIO
enabled on CDX devices, user-space applications can also
exercise DMA securely via IOMMU on these devices.
This change adds the VFIO CDX driver and enables the following
ioctls for CDX devices (exercised in the sketch below):
- VFIO_DEVICE_GET_INFO
- VFIO_DEVICE_GET_REGION_INFO
- VFIO_DEVICE_RESET
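A hedged userspace sketch of exercising these ioctls (the device path
is illustrative only; real setups obtain the device fd through a VFIO
group or cdev):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    int main(void)
    {
            struct vfio_device_info info = { .argsz = sizeof(info) };
            /* hypothetical path */
            int fd = open("/dev/vfio/devices/vfio0", O_RDWR);

            if (fd < 0 || ioctl(fd, VFIO_DEVICE_GET_INFO, &info) < 0)
                    return 1;
            printf("regions: %u, irqs: %u\n", info.num_regions, info.num_irqs);
            if (info.flags & VFIO_DEVICE_FLAGS_RESET)
                    ioctl(fd, VFIO_DEVICE_RESET);
            return 0;
    }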
Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Tested-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20230531124557.11009-1-nipun.gupta@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Test and enable PCIe AtomicOp completer support of various widths and
report via device-info capability to userspace.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Robin Voetter <robin@streamhpc.com>
Tested-by: Robin Voetter <robin@streamhpc.com>
Link: https://lore.kernel.org/r/20230519214748.402003-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
scsi_target_block rework"
Martin Wilck <mwilck@suse.com> says:
This patch series addresses some issues we saw in a test setup with a
large number of SCSI LUNs. The first two patches simply increase the
number of available sg and bsg devices. Patches 3-5 fix a large delay
we encountered between blocking a Fibre Channel remote port and the
dev_loss_tmo. Patch 6 renames scsi_target_block() to
scsi_block_targets(), and makes additional changes to this API, as
suggested in the review of the v2 series. Patch 7 improves a warning
message.
Link: https://lore.kernel.org/r/20230614103616.31857-1-mwilck@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
All callers (fc_remote_port_delete(), __iscsi_block_session(),
__srp_start_tl_fail_timers(), srp_reconnect_rport(), snic_tgt_del()) pass
parent devices of scsi_target devices to scsi_target_block().
Rename the function to scsi_block_targets(), and simplify it by assuming
that it is always passed a parent device. Also, have callers pass the
Scsi_Host pointer to scsi_block_targets(), as every caller has this pointer
readily available.
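The resulting interface, per the description (a sketch, not necessarily
the exact prototype):

    /* Before: void scsi_target_block(struct device *dev);
     * After: always takes the parent device, plus the host pointer
     * every caller already has at hand. */
    void scsi_block_targets(struct Scsi_Host *shost, struct device *dev);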
Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-7-mwilck@suse.com
Cc: Karan Tilak Kumar <kartilak@cisco.com>
Cc: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Now that the direct I/O helpers have switched to use
iov_iter_extract_pages, these helpers are unused.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230614140341.521331-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now that all block direct I/O helpers use page pinning, this flag is
unused.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230614140341.521331-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some hosts do not implement the SQ Run Time Command (SQRTC) register,
so we need this quirk to skip the related flow.
Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com>
Link: https://lore.kernel.org/r/20230612085817.12275-3-powen.kao@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR is introduced for hosts that implement a
different interrupt topology from the UFSHCI 4.0 spec. Some hosts raise
a per-hw-queue interrupt in addition to the CQES (traditional) one when
ESI is disabled.
Enabling this quirk disables CQES and uses only the per-hw-queue
interrupts.
Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com>
Link: https://lore.kernel.org/r/20230612085817.12275-2-powen.kao@mediatek.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In MCQ mode, when a device command uses a hardware queue shared with other
commands, a race condition may occur in the following scenario:
1. A device command is completed in CQx with CQE entry "e".
2. The interrupt handler copies the "cqe" pointer to "hba->dev_cmd.cqe"
and completes "hba->dev_cmd.complete".
3. The "ufshcd_wait_for_dev_cmd()" function is awakened and retrieves the
OCS value from "hba->dev_cmd.cqe".
However, there is a possibility that the CQE entry "e" will be overwritten
by newly completed commands in CQx, resulting in an incorrect OCS value
being received by "ufshcd_wait_for_dev_cmd()".
To avoid this race condition, the OCS value should be immediately copied to
the struct "lrb" of the device command. Then "ufshcd_wait_for_dev_cmd()"
can retrieve the OCS value from the struct "lrb".
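A sketch of the idea in the completion path (the cqe->status and
header.dword_2 field names are assumptions):

    /* Latch the OCS out of the CQE into the command's lrb before the
     * CQE slot can be overwritten by newer completions: */
    if (cqe) {
            u32 ocs = le32_to_cpu(cqe->status) & MASK_OCS;

            lrbp->utr_descriptor_ptr->header.dword_2 = cpu_to_le32(ocs);
    }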
Fixes: 57b1c0ef89ac ("scsi: ufs: core: mcq: Add support to allocate multiple queues")
Suggested-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Link: https://lore.kernel.org/r/20230610021553.1213-2-powen.kao@mediatek.com
Tested-by: Po-Wen Kao <powen.kao@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The introduction of the macro IOPRIO_PRIO_LEVEL() in commit eca2040972b4
("scsi: block: ioprio: Clean up interface definition") results in an
I/O priority level always being masked using the macro
IOPRIO_LEVEL_MASK, and thus in the kernel always seeing an acceptable
value for an I/O priority level when checked in ioprio_check_cap().
Before this patch, this function would return an error for some (but
not all) level values outside the valid range of [0..7].
Restore and improve the detection of invalid priority levels by introducing
the inline function ioprio_value() to check an ioprio class, level and hint
value before combining these fields into a single value to be used with
ioprio_set() or AIOs. If an invalid value for the class, level or hint of
an ioprio is detected, ioprio_value() returns an ioprio using the class
IOPRIO_CLASS_INVALID, indicating an invalid value and causing
ioprio_check_cap() to return -EINVAL.
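A sketch of the helper (bounds macros such as IOPRIO_NR_LEVELS and
IOPRIO_NR_HINTS are assumed here):

    static inline u16 ioprio_value(int prioclass, int priolevel, int priohint)
    {
            if (prioclass < 0 || prioclass >= IOPRIO_NR_CLASSES ||
                priolevel < 0 || priolevel >= IOPRIO_NR_LEVELS ||
                priohint < 0 || priohint >= IOPRIO_NR_HINTS)
                    return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;

            return (prioclass << IOPRIO_CLASS_SHIFT) |
                   (priohint << IOPRIO_HINT_SHIFT) | priolevel;
    }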
Fixes: 6c913257226a ("scsi: block: Introduce ioprio hints")
Fixes: eca2040972b4 ("scsi: block: ioprio: Clean up interface definition")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230608095556.124001-1-dlemoal@kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
for-6.5/block
Pull NVMe updates from Keith:
"nvme updates for Linux 6.5
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner)"
* tag 'nvme-6.5-2023-06-16' of git://git.infradead.org/nvme: (27 commits)
nvme: forward port sysfs delete fix
nvme: skip optional id ctrl csi if it failed
nvme-core: use nvme_ns_head_multipath instead of ns->head->disk
nvmet-fcloop: Do not wait on completion when unregister fails
nvme-fabrics: open code __nvmf_host_find()
nvme-fabrics: error out to unlock the mutex
nvme: Increase block size variable size to 32-bit
nvme-fcloop: no need to return from void function
nvmet-auth: remove unnecessary break after goto
nvmet-auth: remove some dead code
nvme-core: remove redundant check from nvme_init_ns_head
nvme: move sysfs code to a dedicated sysfs.c file
nvme-fabrics: prevent overriding of existing host
nvme-fabrics: check hostid using uuid_equal
nvme-fabrics: unify common code in admin and io queue connect
nvmet: reorder fields in 'struct nvmefc_fcp_req'
nvmet: reorder fields in 'struct nvme_dhchap_queue_context'
nvmet: reorder fields in 'struct nvmf_ctrl_options'
nvme: reorder fields in 'struct nvme_ctrl'
nvmet: reorder fields in 'struct nvmet_sq'
...
|
|
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change.
It suffices to store a version identifier in the vmlinux and kernel
module ELF files (to use when parsing ORC sections from ELF), and in
kernel memory (to use when parsing ORC from a core dump+symbol table).
Rather than hard-coding a version number that needs to be manually
bumped, Peterz suggested hashing the definitions from orc_types.h. If
there is a format change that isn't caught by this, the hashing script
can be updated.
This patch adds an allocated ELF section, .orc_header, containing the
20-byte hash to vmlinux and kernel modules, along with the corresponding
__start_orc_header and __stop_orc_header symbols in vmlinux.
1: https://github.com/osandov/drgn
2: https://github.com/osandov/drgn/issues/303
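A tool can then dump and compare the hash before trusting its ORC
parser, e.g.:

    $ readelf -x .orc_header vmlinux

and verify that the 20 bytes match the hash the tool was built against.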
Fixes: ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata")
Fixes: fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com
|
|
kthread_park and wait_woken have a race similar to the one that
kthread_stop and wait_woken used to have before it was fixed in
commit cb6538e740d7 ("sched/wait: Fix a kthread race with
wait_woken()"). Extend that fix to also cover kthread_park.
[jstultz: Made changes suggested by Peter to optimize
memory loads]
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20230602212350.535358-1-jstultz@google.com
|
|
All callers of set_sched_topology() are within __init section. Mark
it __init too.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20230603073645.1173332-1-linmiaohe@huawei.com
|
|
Reiji reports that the arm64 implementation of arch_perf_update_userpage()
is now ignored and replaced by the dummy stub in core code.
This seems to happen since the PMUv3 driver was moved to drivers/perf.
As it turns out, dropping the __weak attribute from the *prototype*
of the function solves the problem. You're right, this doesn't seem
to make much sense. And yet... It appears that both symbols get
flagged as weak, and that the first one to appear in the link order
wins:
$ nm drivers/perf/arm_pmuv3.o|grep arch_perf_update_userpage
0000000000001db0 W arch_perf_update_userpage
Dropping the attribute from the prototype restores the expected
behaviour, and arm64 is able to enjoy arch_perf_update_userpage()
again.
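A minimal reproduction of the gotcha, independent of perf (hypothetical
files):

    /* shared.h: the attribute on the prototype taints every definition */
    __attribute__((weak)) void foo(void);

    /* core.c: dummy stub, meant to be overridable */
    void foo(void) { }

    /* arch.c: intended strong override - but it is emitted weak too,
     * so whichever object the linker sees first wins */
    void foo(void) { }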
Fixes: 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf")
Fixes: f1ec3a517b43 ("kernel/events: Add a missing prototype for arch_perf_update_userpage()")
Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Reiji Watanabe <reijiw@google.com>
Link: https://lkml.kernel.org/r/20230616114831.3186980-1-maz@kernel.org
|
|
The ${atomic}_dec_if_positive() ops are unlike all the other conditional
atomic ops. Rather than returning a boolean success value, these return
the value that the atomic variable would be updated to, even when no
update is performed.
We missed this when adding kerneldoc comments, and the documentation for
${atomic}_dec_if_positive() erroneously states:
| Return: @true if @v was updated, @false otherwise.
Ideally we'd clean this up by aligning ${atomic}_dec_if_positive() with
the usual atomic op conventions: with ${atomic}_fetch_dec_if_positive()
for those who care about the value of the variable, and
${atomic}_dec_if_positive() returning a boolean success value.
In the mean time, align the documentation with the current reality.
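For reference, usage matching the semantics described above:

    atomic_t v = ATOMIC_INIT(1);

    atomic_dec_if_positive(&v); /* returns 0, v is updated to 0 */
    atomic_dec_if_positive(&v); /* returns -1, v stays 0 (no update) */

    /* So the success check is on the sign, not on a boolean: */
    if (atomic_dec_if_positive(&v) >= 0) {
            /* we decremented, e.g. we own a slot or reference */
    }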
Fixes: ad8110706f381170 ("locking/atomic: scripts: generate kerneldoc comments")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20230615132734.1119765-1-mark.rutland@arm.com
|
|
So we can apply the tlv320aic3xxx DT conversion.
|
|
There is a class of interrupt controllers out there that, once they
have signalled a given interrupt number, will still signal incoming
instances of the *same* interrupt despite the original interrupt
not having been EOIed yet.
As long as the new interrupt reaches the *same* CPU, nothing bad
happens, as that CPU still has its interrupts globally disabled,
and we will only take the new interrupt once the interrupt has
been EOIed.
However, things become more "interesting" if an affinity change comes
in while the interrupt is being handled. More specifically, while
the per-irq lock is being dropped. This results in the affinity change
taking place immediately. At this point, there is nothing that prevents
the interrupt from firing on the new target CPU. We end-up with the
interrupt running concurrently on two CPUs, which isn't a good thing.
And that's where things become worse: the new CPU notices that the
interrupt handling is in progress (irq_may_run() returns false), and
*drops the interrupt on the floor*.
The whole race looks like this:
           CPU 0                   |          CPU 1
-----------------------------------|-----------------------------------
interrupt start                    |
  handle_fasteoi_irq               | set_affinity(CPU 1)
    handler                        |
    ...                            | interrupt start
    ...                            |   handle_fasteoi_irq -> early out
  handle_fasteoi_irq return        | interrupt end
interrupt end                      |
If the interrupt was an edge, too bad. The interrupt is lost, and
the system will eventually die one way or another. Not great.
A way to avoid this situation is to detect this problem at the point
we handle the interrupt on the new target. Instead of dropping the
interrupt, use the resend mechanism to force it to be replayed.
Also, in order to limit the impact of this workaround to the pathetic
architectures that require it, gate it behind a new irq flag aptly
named IRQD_RESEND_WHEN_IN_PROGRESS.
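A sketch of the handling on the new target CPU (the accessor name for
the new flag is an assumption):

    /* In handle_fasteoi_irq(), instead of silently dropping: */
    if (!irq_may_run(desc)) {
            /* Handler still running on the previous target CPU. Mark
             * the interrupt pending so the resend machinery replays it
             * rather than losing it. */
            if (irqd_needs_resend_when_in_progress(&desc->irq_data))
                    desc->istate |= IRQS_PENDING;
            raw_spin_unlock(&desc->lock);
            return;
    }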
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
[maz: reworded commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-3-jgowans@amazon.com
|
|
As we're about to use the last bit available in the IRQD_* state
flags, rewrite these flags with BIT(), which ensures that these
constants do not represent a signed value.
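The style of the rewrite, on an illustrative subset of the flags:

    enum {
            IRQD_TRIGGER_MASK               = GENMASK(3, 0),
            IRQD_SETAFFINITY_PENDING        = BIT(8),
            IRQD_ACTIVATED                  = BIT(9),
            IRQD_NO_BALANCING               = BIT(10),
    };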
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
'slab/for-6.5/slab-deprecate' into slab/for-next
Merge the feature branches scheduled for 6.5:
- replace the usage of weak PRNGs, by David Keisar Schmidt
- introduce the SLAB_NO_MERGE kmem_cache flag, by Jesper Dangaard Brouer
- deprecate CONFIG_SLAB, with a planned removal, by myself
|
|
Everything is converted over to arch_cpu_finalize_init(). Remove the
check_bugs() leftovers including the empty stubs in asm-generic, alpha,
parisc, powerpc and xtensa.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20230613224545.553215951@linutronix.de
|
|
check_bugs() has become a dumping ground for all sorts of activities to
finalize the CPU initialization before running the rest of the init code.
Most are empty, a few do actual bug checks, some do alternative patching
and some cobble a CPU advertisement string together....
Aside of that the current implementation requires duplicated function
declaration and mostly empty header files for them.
Provide a new function arch_cpu_finalize_init(). Provide a generic
declaration if CONFIG_ARCH_HAS_CPU_FINALIZE_INIT is selected and a stub
inline otherwise.
This requires a temporary #ifdef in start_kernel() which will be removed
along with check_bugs() once the architectures are converted over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224544.957805717@linutronix.de
|
|
USB4 v2 bumps the per-lane speed up to 40 Gb/s. Also the lanes are
always bonded, which gives an 80 Gb/s symmetric link (and 120/40 Gb/s
asymmetric). This updates the speed and width of routers and XDomain
connections to support the Gen 4 link. For now we keep the link as is
even if it is already asymmetric.
While there, make tb_port_set_link_width() static.
Signed-off-by: Gil Fine <gil.fine@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Most of the ioctls to net protocols operate directly on the userspace
argument (arg), usually doing get_user()/put_user() directly in the
ioctl callback. This is not flexible, because it is hard to reuse these
functions without passing userspace buffers.
Change the "struct proto" ioctls to avoid touching userspace memory and
operate on kernel buffers, i.e., all protocols' ioctl callbacks are
adapted to operate on kernel memory rather than userspace (so, no
more {put,get}_user() and friends being called in the ioctl callback).
This changes the "struct proto" ioctl format in the following way:
     int (*ioctl)(struct sock *sk, int cmd,
-                 unsigned long arg);
+                 int *karg);
(Important to say that this patch does not touch the "struct proto_ops"
protocols)
So, the "karg" argument, which is passed to the ioctl callback, is a
pointer allocated to kernel space memory (inside a function wrapper).
This buffer (karg) may contain input argument (copied from userspace in
a prep function) and it might return a value/buffer, which is copied
back to userspace if necessary. There is no one-size-fits-all format
(which is why I am using 'may' above), but basically there are three
types of ioctls:
1) Do not read from userspace, returns a result to userspace
2) Read an input parameter from userspace, and does not return anything
to userspace
3) Read an input from userspace, and return a buffer to userspace.
The default case (1) (where no input parameter is given, and an "int" is
returned to userspace) encompasses more than 90% of the cases; the two
other formats cover the exceptions. Here is the list of exceptions:
* Protocol RAW:
* cmd = SIOCGETVIFCNT:
* input and output = struct sioc_vif_req
* cmd = SIOCGETSGCNT
* input and output = struct sioc_sg_req
* Explanation: for the SIOCGETVIFCNT case, userspace passes the input
argument, which is struct sioc_vif_req. Then the callback populates
the struct, which is copied back to userspace.
* Protocol RAW6:
* cmd = SIOCGETMIFCNT_IN6
* input and output = struct sioc_mif_req6
* cmd = SIOCGETSGCNT_IN6
* input and output = struct sioc_sg_req6
* Protocol PHONET:
* cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
* input int (4 bytes)
* Nothing is copied back to userspace.
For the exception cases, the function sock_sk_ioctl_inout() copies the
userspace input into kernel space and copies the result back to
userspace.
The wrapper that prepares the buffer and puts the buffer back to user
is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl() directly,
callers now call sk_ioctl(), which handles all cases.
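A sketch of the wrapper for the default case (1); the exception cases
branch off to sock_sk_ioctl_inout() as described (details simplified):

    int sk_ioctl(struct sock *sk, int cmd, void __user *arg)
    {
            int karg = 0, ret;

            /* Exception cases (SIOCGETVIFCNT etc.) would dispatch to
             * sock_sk_ioctl_inout() here instead. */
            ret = sk->sk_prot->ioctl(sk, cmd, &karg);
            if (ret)
                    return ret;

            /* Default case: hand the kernel int back to userspace. */
            return put_user(karg, (int __user *)arg);
    }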
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/mlx5/driver.h
617f5db1a626 ("RDMA/mlx5: Fix affinity assignment")
dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/
tools/testing/selftests/net/mptcp/mptcp_join.sh
47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported")
425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not")
45b1a1227a7a ("mptcp: introduces more address related mibs")
0639fa230a21 ("selftests: mptcp: add explicit check for new mibs")
https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and netfilter.
Selftests excluded - we have 58 patches and diff of +442/-199, which
isn't really small but perhaps with the exception of the WiFi locking
change it's old(ish) bugs.
We have no known problems with v6.4.
The selftest changes are rather large as MPTCP folks try to apply
Greg's guidance that selftest from torvalds/linux should be able to
run against stable kernels.
Last thing I should call out is the DCCP/UDP-lite deprecation notices.
We are fairly sure those are dead, but if we're wrong reverting them
back in won't be fun.
Current release - regressions:
- wifi:
- cfg80211: fix double lock bug in reg_wdev_chan_valid()
- iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
Current release - new code bugs:
- handshake: remove fput() that causes use-after-free
Previous releases - regressions:
- sched: cls_u32: fix reference counter leak leading to overflow
- sched: cls_api: fix lockup on flushing explicitly created chain
Previous releases - always broken:
- nf_tables: integrate pipapo into commit protocol
- nf_tables: incorrect error path handling with NFT_MSG_NEWRULE, fix
dangling pointer on failure
- ping6: fix send to link-local addresses with VRF
- sched: act_pedit: parse L3 header for L4 offset, the skb may not
have the offset saved
- sched: act_ct: fix promotion of offloaded unreplied tuple
- sched: refuse to destroy an ingress and clsact Qdiscs if there are
lockless change operations in flight
- wifi: mac80211: fix handful of bugs in multi-link operation
- ipvlan: fix bound dev checking for IPv6 l3s mode
- eth: enetc: correct the indexes of highest and 2nd highest TCs
- eth: ice: fix XDP memory leak when NIC is brought up and down
Misc:
- add deprecation notices for UDP-lite and DCCP
- selftests: mptcp: skip tests not supported by old kernels
- sctp: handle invalid error codes without calling BUG()"
* tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
dccp: Print deprecation notice.
udplite: Print deprecation notice.
octeon_ep: Add missing check for ioremap
selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
net: tipc: resize nlattr array to correct size
sfc: fix XDP queues mode with legacy IRQ
net: macsec: fix double free of percpu stats
net: lapbether: only support ethernet devices
MAINTAINERS: add reviewers for SMC Sockets
s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
net/sched: cls_api: Fix lockup on flushing explicitly created chain
ice: Fix ice module unload
net/handshake: remove fput() that causes use-after-free
selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
...
|
|
Pull rdma fixes from Jason Gunthorpe:
"This is an unusually large bunch of bug fixes for the later rc cycle,
rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
itself, and mlx5 is fixing a bunch of "queue counters" related bugs.
There is one highly notable bug fix regarding the qkey. This small
security check was missed in the original 2005 implementation and it
allows some significant issues.
Summary:
- Two rtrs bug fixes for error unwind bugs
- Several rxe bug fixes:
* Incorrect Rx packet validation
* Using memory without a refcount
* Syzkaller found use before initialization
* Regression fix for missing locking with the tasklet conversion
from this merge window
- Have bnxt report the correct link properties to userspace, this was
a regression in v6.3
- Several mlx5 bug fixes:
* Kernel crash triggerable by userspace for the RAW ethernet
profile
* Defend against steering refcounting issues created by userspace
* Incorrect change of QP port affinity parameters in some LAG
configurations
- Fix mlx5 Q counters:
* Do not over allocate Q counters to allow userspace to use the
full port capacity
* Kernel crash triggered by eswitch due to mis-use of Q counters
* Incorrect mlx5_device for Q counters in some LAG configurations
- Properly implement the IBA spec restricting privileged qkeys to
root
- Always an error when reading from a disassociated device's event
queue
- isert bug fixes:
* Avoid a deadlock with the CM handler and CM ID destruction
* Correct list corruption due to incorrect locking
* Fix a use after free around connection tear down"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix rxe_cq_post
IB/isert: Fix incorrect release of isert connection
IB/isert: Fix possible list corruption in CMA handler
IB/isert: Fix dead lock in ib_isert
RDMA/mlx5: Fix affinity assignment
IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
RDMA/uverbs: Restrict usage of privileged QKEYs
RDMA/cma: Always set static rate to 0 for RoCE
RDMA/mlx5: Fix Q-counters query in LAG mode
RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
RDMA/mlx5: Fix Q-counters per vport allocation
RDMA/mlx5: Create an indirect flow table for steering anchor
RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
RDMA/rxe: Fix the use-before-initialization error of resp_pkts
RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
RDMA/rxe: Fix ref count error in check_rkey()
RDMA/rxe: Fix packet length checks
RDMA/rtrs: Fix rxe_dealloc_pd warning
RDMA/rtrs: Fix the last iu->buf leak in err path
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A fix for dvb-core to avoid a race condition during DVB board
registration"
* tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "media: dvb-core: Fix use-after-free on race condition at dvb_frontend"
|
|
[Why]
The sequence for collecting down_reply from the source's perspective
should be:
Request_n -> repeat {read partial reply of Request_n; clear the message
ready flag to ack DPRX that the message is received} until all partial
replies for Request_n are received -> new Request_n+1.
Now there is a chance that drm_dp_mst_hpd_irq() will fire a new down
request into the tx queue while the down reply is still incomplete. The
source is not allowed to generate interleaved message transactions, so
we should avoid that.
Also, while assembling partial reply packets, reading out DPCD DOWN_REP
Sideband MSG buffer + clearing DOWN_REP_MSG_RDY flag should be
wrapped up as a complete operation for reading out a reply packet.
Kicking off a new request before clearing DOWN_REP_MSG_RDY flag might
be risky. e.g. If the reply of the new request has overwritten the
DPRX DOWN_REP Sideband MSG buffer before source writing one to clear
DOWN_REP_MSG_RDY flag, source then unintentionally flushes the reply
for the new request. Should handle the up request in the same way.
[How]
Separate drm_dp_mst_hpd_irq() into 2 steps. After acking the MST IRQ
event, the driver calls drm_dp_mst_hpd_irq_send_new_request(), which
might trigger drm_dp_mst_kick_tx() only when there is no ongoing
message transaction (a usage sketch follows the changelog below).
Changes since v1:
* Reworked on review comments received
-> Adjust the fix to let driver explicitly kick off new down request
when mst irq event is handled and acked
-> Adjust the commit message
Changes since v2:
* Adjust the commit message
* Adjust the naming of the divided 2 functions and add a new input
parameter "ack".
* Adjust code flow as per review comments.
Changes since v3:
* Update the function description of drm_dp_mst_hpd_irq_handle_event
Changes since v4:
* Change ack of drm_dp_mst_hpd_irq_handle_event() to be an array
matching the size of esi[]
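A hedged sketch of the expected driver-side flow (prototypes
approximated from the description; not necessarily the exact
signatures):

    bool handled;
    u8 esi[4], ack[4];

    /* Step 1: process the ESI and collect which bits to ack. */
    drm_dp_mst_hpd_irq_handle_event(mgr, esi, ack, &handled);

    /* ... the driver writes "ack" back to the DPCD, clearing
     * DOWN_REP_MSG_RDY/UP_REQ_MSG_RDY, before ... */

    /* Step 2: only now kick off any new down request. */
    drm_dp_mst_hpd_irq_send_new_request(mgr);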
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|