Age | Commit message (Collapse) | Author |
|
Check of the pointer exists and we are actually on AC power.
v2: fix error message to reflect AC/DC mode.
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1043
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
PMFW may boots those ASICs with DC mode. Need to set it back
to AC mode.
v2: split from Evan's original patch (Alex)
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1043
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The smu_v11_0 version works for navi1x.
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1043
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a common smu11 helper to set the AC/DC power source.
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1043
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This is needed to tell the SMU firmware what state is in
in certain cases. DC mode does not allow overclocking
for example.
v2: split Evan's original patch (Alex)
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1043
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There is one one corner case at dma_fence_signal_locked
which will raise the NULL pointer problem just like below.
->dma_fence_signal
->dma_fence_signal_locked
->test_and_set_bit
here trigger dma_fence_release happen due to the zero of fence refcount.
->dma_fence_put
->dma_fence_release
->drm_sched_fence_release_scheduled
->call_rcu
here make the union fled “cb_list” at finished fence
to NULL because struct rcu_head contains two pointer
which is same as struct list_head cb_list
Therefore, to hold the reference of finished fence at drm_sched_process_job
to prevent the null pointer during finished fence dma_fence_signal
[ 732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 732.914815] #PF: supervisor write access in kernel mode
[ 732.915731] #PF: error_code(0x0002) - not-present page
[ 732.916621] PGD 0 P4D 0
[ 732.917072] Oops: 0002 [#1] SMP PTI
[ 732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.0-rc7 #1
[ 732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
[ 732.938569] Call Trace:
[ 732.939003] <IRQ>
[ 732.939364] dma_fence_signal+0x29/0x50
[ 732.940036] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[ 732.940996] drm_sched_process_job+0x34/0xa0 [gpu_sched]
[ 732.941910] dma_fence_signal_locked+0x85/0x100
[ 732.942692] dma_fence_signal+0x29/0x50
[ 732.943457] amdgpu_fence_process+0x99/0x120 [amdgpu]
[ 732.944393] sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]
v2: hold the finished fence at drm_sched_process_job instead of
amdgpu_fence_process
v3: resume the blank line
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Set ComputePGMRSRC1.VGPRS as 0x3f to clear all ArcVGPRs.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Commit '16f17eda8bad ("drm/amd/display: Send vblank and user
events at vsartup for DCN")' introduces a new way of pageflip
completion handling for DCN, and some trouble.
The current implementation introduces a race condition, which
can cause pageflip completion events to be sent out one vblank
too early, thereby confusing userspace and causing flicker:
prepare_flip_isr():
1. Pageflip programming takes the ddev->event_lock.
2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED
3. Releases ddev->event_lock.
--> Deadline for surface address regs double-buffering passes on
target pipe.
4. dc_commit_updates_for_stream() MMIO programs the new pageflip
into hw, but too late for current vblank.
=> pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete
in current vblank due to missing the double-buffering deadline
by a tiny bit.
5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires,
dm_dcn_crtc_high_irq() gets called.
6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the
pageflip has been completed/will complete in this vblank and
sends out pageflip completion event to userspace and resets
pflip_status = AMDGPU_FLIP_NONE.
=> Flip completion event sent out one vblank too early.
This behaviour has been observed during my testing with measurement
hardware a couple of time.
The commit message says that the extra flip event code was added to
dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events
in case the pflip irq doesn't fire, because the "DCH HUBP" component
is clock gated and doesn't fire pflip irqs in that state. Also that
this clock gating may happen if no planes are active. This suggests
that the problem addressed by that commit can't happen if planes
are active.
The proposed solution is therefore to only execute the extra pflip
completion code iff the count of active planes is zero and otherwise
leave pflip completion handling to the pflip irq handler, for a
more race-free experience.
Note that i don't know if this fixes the problem the original commit
tried to address, as i don't know what the test scenario was. It
does fix the observed too early pageflip events though and points
out the problem introduced.
Fixes: 16f17eda8bad ("drm/amd/display: Send vblank and user events at vsartup for DCN")
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Works stable without the overrides.
Signed-off-by: Yassine Oudjana <y.oudjana@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts commit 29e2501f8a64fa2fa8f6fe4be53cce5a5a4fe79f.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pull networking fixes from David Miller:
1) Fix deadlock in bpf_send_signal() from Yonghong Song.
2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.
3) Add missing locking in iwlwifi mvm code, from Avraham Stern.
4) Fix MSG_WAITALL handling in rxrpc, from David Howells.
5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
Wang.
6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.
7) cls_route removes the wrong filter during change operations, from
Cong Wang.
8) Reject unrecognized request flags in ethtool netlink code, from
Michal Kubecek.
9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
Doug Berger.
10) Don't leak ct zone template in act_ct during replace, from Paul
Blakey.
11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
Blakey.
12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
Lakkireddy.
13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric
Dumazet.
14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well,
also from Eric Dumazet.
15) Restrict macsec to ethernet devices, from Willem de Bruijn.
16) Fix reference leak in some ethtool *_SET handlers, from Michal
Kubecek.
17) Fix accidental disabling of MSI for some r8169 chips, from Heiner
Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
net: ena: Add PCI shutdown handler to allow safe kexec
selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
selftests/net: add missing tests to Makefile
r8169: re-enable MSI on RTL8168c
net: phy: mdio-bcm-unimac: Fix clock handling
cxgb4/ptp: pass the sign of offset delta in FW CMD
net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
net: cbs: Fix software cbs to consider packet sending time
net/mlx5e: Do not recover from a non-fatal syndrome
net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
net/mlx5e: Enhance ICOSQ WQE info fields
net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
selftests: netfilter: add nfqueue test case
netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
netfilter: nft_fwd_netdev: validate family and chain type
netfilter: nft_set_rbtree: Detect partial overlaps on insertion
netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
- One core quirk by myself to fix the .irq_disable() semantics when the
gpiolib core takes over this callback.
- The rest is an elaborate series of four patches fixing Intel laptop
ACPI wakeup quirks.
* tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
gpiolib: Fix irq_disable() semantics
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.6
Fourth, and last, set of fixes for v5.6. Just two important fixes to
iwlwifi regressions.
iwlwifi
* fix GEO_TX_POWER_LIMIT command on certain devices which caused
firmware to crash during initialisation
* add back device ids for three devices which were accidentally
removed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lift the common namespace identifier reporting between the shared
namespace and new nshead cases into common code. This also means
one less lock is held while doing I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
There is no non __-prefixed version, so make the name a little more
readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Move the handling of an error into the function from the caller, and
only do it for an actual error on the admin command itself, not the
command parsing, as that should be enough to deal with devices claiming
a bogus version compliance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The transition to LIVE state should not fail in case of a new controller.
Moving to DELETING state before nvme_tcp_create_ctrl() allocates all the
resources may leads to NULL dereference at teardown flow (e.g., IO tagset,
admin_q, connect_q).
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The transition to LIVE state should not fail in case of a new controller.
Moving to DELETING state before nvme_tcp_create_ctrl() allocates all the
resources may leads to NULL dereference at teardown flow (e.g., IO tagset,
admin_q, connect_q).
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Calling nvme_sysfs_delete() when the controller is in the middle of
creation may cause several bugs. If the controller is in NEW state we
remove delete_controller file and don't delete the controller. The user
will not be able to use nvme disconnect command on that controller again,
although the controller may be active. Other bugs may happen if the
controller is in the middle of create_ctrl callback and
nvme_do_delete_ctrl() starts. For example, freeing I/O tagset at
nvme_do_delete_ctrl() before it was allocated at create_ctrl callback.
To fix all those races don't allow the user to delete the controller
before it was fully created.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Put the ctrl reference count at nvme_uninit_ctrl as opposed to
nvme_init_ctrl which takes it. This decrease the reference count at the
core layer instead of decreasing it on each transport separately.
Also move the call of nvme_uninit_ctrl at PCI driver after calling to
nvme_release_prp_pools and nvme_dev_unmap, in order to put the reference
count after using the dev. This is safe because those functions use
nvme_dev which is freed only later at nvme_pci_free_ctrl.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
In case nvme_sysfs_delete() is called by the user before taking the ctrl
reference count, the ctrl may be freed during the creation and cause the
bug. Take the reference as soon as the controller is externally visible,
which is done by cdev_device_add() in nvme_init_ctrl(). Also take the
reference count at the core layer instead of taking it on each transport
separately.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Destroy the resources in the same order like in nvme_probe error flow to
improve code readability.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The return code of nvme_delete_ctrl_sync is never used, so change it to
void.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Improve code readability.
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
ida instances allocate some internal memory in addition to the base
'struct ida'. Use ida_destroy() to release that memory at module_exit().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Currently 32 bit application gets ENOTTY when it calls
compat_ioctl with NVME_IOCTL_SUBMIT_IO in 64 bit kernel.
The cause is that the results of sizeof(struct nvme_user_io),
which is used to define NVME_IOCTL_SUBMIT_IO,
are not same between 32 bit compiler and 64 bit compiler.
* 32 bit: the result of sizeof nvme_user_io is 44.
* 64 bit: the result of sizeof nvme_user_io is 48.
64 bit compiler seems to add 32 bit padding for multiple of 8 bytes.
This patch adds a compat_ioctl handler.
The handler replaces NVME_IOCTL_SUBMIT_IO32 with NVME_IOCTL_SUBMIT_IO
in case 32 bit application calls compat_ioctl for submit in 64 bit kernel.
Then, it calls nvme_ioctl as usual.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada (KIOXIA) <masahiro31.yamada@kioxia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
If we have a 4-byte data digest to send to the wire, but we
have more data to send, set MSG_MORE to tell the stack
that more is coming.
Reviewed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The nvme multipath error handling defaults to controller reset if the
error is unknown. There are, however, no existing nvme status codes that
indicate a reset should be used, and resetting causes unnecessary
disruption to the rest of IO.
Change nvme's error handling to first check if failover should happen.
If not, let the normal error handling take over rather than reset the
controller.
Based-on-a-patch-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Meneghini <johnm@netapp.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Current nvmet-rdma code allocates MR pool budget based on queue size,
assuming both host and target use the same "max_pages_per_mr" count.
After limiting the mdts value for RDMA controllers, we know the factor
of maximum MR's per IO operation. Thus, make sure MR pool will be
sufficient for the required IO depth and IO size.
That is, say host's SQ size is 100, then the MR pool budget allocated
currently at target will also be 100 MRs. But 100 IO WRITE Requests
with 256 sg_count(IO size above 1MB) require 200 MRs when target's
"max_pages_per_mr" is 128.
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
|
|
Set the maximal data transfer size to be 1MB (currently mdts is
unlimited). This will allow calculating the amount of MR's that
one ctrl should allocate to fulfill it's capabilities.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
|
|
Some transports, such as RDMA, would like to set the Maximum Data
Transfer Size (MDTS) according to device/port/ctrl characteristics.
This will enable the transport to set the optimal MDTS according to
controller needs and device capabilities. Add a new nvmet transport
op that is called during ctrl identification. This will not effect
transports that don't implement this option. The return value of the new
op is according to the NVMe spec definition for MDTS.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
|
|
Align PCI address print with fabrics address that is printed with
newline character.
Before:
[root@server40 linux]# cat /sys/class/nvme/nvme2/address
0000:0b:00.0[root@server40 linux]#
After:
[root@server40 linux]# cat /sys/class/nvme/nvme2/address
0000:0b:00.0
[root@server40 linux]#
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
|
|
If we failed to receive data from the socket, don't try
to further process it, we will for sure be handling a queue
error at this point. While no issue was seen with the
current behavior thus far, its safer to cease socket processing
if we detected an error.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Consolidate the request failure handling code to where
it is being fetched (nvme_tcp_try_send).
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
MAXH2CDATA is not zero based. Also no reason to limit ourselves to
1M transfers as we can do more easily. Make this an arbitrary limit
of 16M.
Reported-by: Wenhua Liu <liuw@vmware.com>
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Currently, queue io_cpu assignment is done sequentially for default,
read and poll queues based on queue id. This causes miss-alignment between
context of CPU initiating I/O and the I/O worker thread processing
queued requests or completions.
Change to modify queue io_cpu assignment to take into account queue
maps offset. Each queue io_cpu will start at zero for each queue map.
This essentially aligns read/poll queues to start over the same range as
default queues.
Testing performed by Mark with:
- ram device (nvmet)
- single CPU core (pinned)
- 100% 4k reads
- engine io_uring (not using sq_thread option)
- hipri flag set
Micro-benchmark results show a net gain of:
- increase of 18%-29% in IOPs
- reduction of 16%-22% in average latency
- reduction of 7%-23% in 99.99% latency
Baseline:
========
QDepth/Batch | IOPs [k] | Avg. Lat [us] | 99.99% Lat [us]
-----------------------------------------------------------------
1/1 | 32.4 | 30.11 | 50.94
32/8 | 179 | 168.20 | 371
CPU alignment:
=============
QDepth/Batch | IOPs [k] | Avg. Lat [us] | 99.99% Lat [us]
-----------------------------------------------------------------
1/1 | 38.5 | 25.18 | 39.16
32/8 | 231 | 130.75 | 343
Reported-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The timeout handler can use the existing nvme_poll() if it needs to
check a polled queue, allowing nvme_poll_irqdisable() to handle only
irq driven queues for the remaining callers.
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Completion handling had been done in two steps: find all new completions
under a lock, then handle those completions outside the lock. This was
done to make the locked section as short as possible so that other
threads using the same lock wait less time.
The driver no longer shares locks during completion, and is in fact
lockless for interrupt driven queues, so the optimization no longer
serves its original purpose. Replace the two-pass completion queue
handler with a single pass that completes entries immediately.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The only user for tagged completion was for timeout handling. That user,
though, really only cares if the timed out command is completed, which
we can safely check within the timeout handler.
Remove the tag check to simplify completion handling.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Update CQ head with pre-increment operator. This saves subtraction of 1
and a few registers.
Also update phase with "^= 1". This generates only one RMW instruction.
ffffffff815ba150 <nvme_update_cq_head>:
ffffffff815ba150: 0f b7 47 70 movzx eax,WORD PTR [rdi+0x70]
ffffffff815ba154: 83 c0 01 add eax,0x1
ffffffff815ba157: 66 89 47 70 mov WORD PTR [rdi+0x70],ax
ffffffff815ba15b: 66 3b 47 68 cmp ax,WORD PTR [rdi+0x68]
ffffffff815ba15f: 74 01 je ffffffff815ba162 <nvme_update_cq_head+0x12>
ffffffff815ba161: c3 ret
ffffffff815ba162: 31 c0 xor eax,eax
ffffffff815ba164: 80 77 74 01 ===> xor BYTE PTR [rdi+0x74],0x1
ffffffff815ba168: 66 89 47 70 mov WORD PTR [rdi+0x70],ax
ffffffff815ba16c: c3 ret
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-119 (-119)
Function old new delta
nvme_poll 690 678 -12
nvme_dev_disable 1230 1177 -53
nvme_irq 613 559 -54
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
|
For set feature command when setting up NVME_FEAT_NUM_QUEUES, check
Number of I/O Completion Queues Requested (NCQR) and Number of I/O
Submission Queues Requested (NSQR) before we proceed, for invalid values
(i.e. 65535) return an appropriate NVMe invalid field status.
Signed-off-by: Amit Engel <Amit.Engel@dell.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
After initialization, nvme_wait_ready checks for readiness every 100ms,
even though the drive may be ready far sooner than that. This delays
system boot by hundreds of milliseconds. Reduce the delay, checking for
readiness every millisecond instead.
Boot-time tests on an AWS c5.12xlarge:
Before:
[ 0.546936] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
...
[ 0.764178] nvme nvme0: 2/0/0 default/read/poll queues
[ 0.768424] nvme0n1: p1
[ 0.774132] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 0.774146] VFS: Mounted root (ext4 filesystem) on device 259:1.
...
[ 0.788141] Run /sbin/init as init process
After:
[ 0.537088] initcall nvme_init+0x0/0x5b returned 0 after 37 usecs
...
[ 0.543457] nvme nvme0: 2/0/0 default/read/poll queues
[ 0.548473] nvme0n1: p1
[ 0.554339] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 0.554344] VFS: Mounted root (ext4 filesystem) on device 259:1.
...
[ 0.567931] Run /sbin/init as init process
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Log the controller status to know more about issue if it
lies within kernel nvme subsytem or controller is unhealthy.
Signed-off-by: Rupesh Girase <rgirase@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulakrni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The function nvme_identify_ns_desc() has 3 levels of nesting which make
error message to exceeded > 80 char per line which is not aligned with
the kernel code standards and rest of the NVMe subsystem code.
Add a helper function to move the processing of the log when the
command is successful by reducing the nesting and keeping the
code < 80 char per line.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
I see no good reason for the "If unsure, say N" advice in the description
of the NVME_HWMON configuration option. It is not dangerous, it does
not select any other option, and has a fairly low overhead.
As the option is already not enabled by default, further suggesting
hesitant users to not enable it is not useful anyway. Unlike some other
options where the description alone may not be sufficient for users to
make a decision, NVME_HWMON is pretty simple to grasp in my opinion,
so just let the user do what they want.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
We allow userspace to connect with a custom hostid which is useful for
certain use-cases. However there is is no way to tell what is the hostid
used to connect to a given controller.
Expose this so userspace can correlate controllers based on hostid.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
We allow userspace to connect with a custom hostnqn which is useful for
certain use-cases. However there is no way to tell what is the hostnqn
used to connect to a given controller.
Expose this so userspace can correlate controllers based on hostnqn.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
pkt->skb->tc_redirected = 1;
^~
net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
pkt->skb->tc_from_ingress = 1;
^~
To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.
This patch adds skb_set_redirected() that sets on the redirected bit
on the skbuff, it specifies if the packet was redirect from ingress
and resets the timestamp (timestamp reset was originally missing in the
netfilter bugfix).
Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit adds support to catch any bits set in SGE_INT_CAUSE5 for Parity Errors.
F_ERR_T_RXCRC flag is used to ignore that particular bit as it is not considered as fatal.
So, we clear out the bit before looking for error.
This patch now read and report separately all three registers(Cause1, Cause2, Cause5).
Also, checks for errors if any.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|