Age | Commit message (Collapse) | Author |
|
Fill those namespace fields from the block device format for adding
metadata (T10-PI) over fabric support with block devices.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
protection information passthrough and validation for NVMe over RDMA
transport. Metadata offload support was implemented over the new RDMA
signature verbs API and it is enabled for capable controllers.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Remove first_sgl pointer from struct nvme_rdma_request and use pointer
arithmetic instead. The inline scatterlist, if exists, will be located
right after the nvme_rdma_request. This patch is needed as a preparation
for adding PI support.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
SGL size of metadata is usually small. Thus, 1 inline sg should cover
most cases. The macro will be used for pre-allocate a single SGL entry
for metadata. The preallocation of small inline SGLs depends on SG_CHAIN
capability so if the ARCH doesn't support SG_CHAIN, use the runtime
allocation for the SGL. This patch is a preparation for adding metadata
(T10-PI) over fabric support.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
An extended LBA is a larger LBA that is created when metadata associated
with the LBA is transferred contiguously with the LBA data (AKA
interleaved). The metadata may be either transferred as part of the LBA
(creating an extended LBA) or it may be transferred as a separate
contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
supports only an Extended LBA format. Fail revalidation in case we have a
spec violation. Also add a flag that will imply on capable transports and
controllers as part of a preparation for allowing end-to-end protection
information for fabric controllers.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This patch doesn't change any logic, and is needed as a preparation
for adding PI support for fabrics drivers that will use an extended
LBA format for metadata and will support more than 1 integrity segment.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Move the nvme_ns_has_pi() inline from core.c to the nvme.h header.
This allows use by the transports.
Signed-off-by: James Smart <jsmart2021@gmail.com>
[maxg: added a comment for nvme_ns_has_pi()]
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is a preparation for adding support for metadata in fabric
controllers. New flag will imply that NVMe namespace supports getting
metadata that was originally generated by host's block layer.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Replace the specific ext boolean (that implies on extended LBA format)
with a feature in the new namespace features flag. This is a preparation
for adding more namespace features (such as metadata specific features).
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This will reduce the amount of ifdefs inside the source code for various
drivers and also will reduce the amount of stub functions that were
created for the !CONFIG_BLK_DEV_INTEGRITY case.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add a new attribute "revalidate_size" for the namespace which allows
user to revalidate and generate the AEN if needed. This attribute is
needed so that we can install userspace rules with systemd service based
on inotify/fsnotify/uevent. The registered callback for such a service
will end up writing to this attribute to generate AEN if needed.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The newly added function nvmet_ns_revalidate() does update the ns size
in the identify namespace in-core target data structure when host issues
id-ns command. This can lead to host having inconsistencies between size
of the namespace present in the id-ns command result and size of the
corresponding block device until host scans the namespaces explicitly.
To avoid this scenario generate AEN if old size is not same as the new
one in nvmet_ns_revalidate().
This will allow automatic AEN generation when host calls id-ns command
and also allows target to install userspace rules so that it can trigger
nvmet_ns_revalidate() (using configfs interface with the help of next
patch) resulting in appropriate AEN generation when underlying namespace
size change is detected.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This patch adds a wrapper helper to indicate size change in the bdev &
file-backed namespace when revalidating ns. This helper is needed in
order to minimize code repetition in the next patch for configfs.c and
existing admin-cmd.c.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This adds a new tracepoint for the target to trace async event. This is
helpful in debugging and comparing host and target side async events
especially when host is connected to different targets on different
machines and now that we rely on userspace components to generate AEN.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The nvme_put_ctrl() is implemented earlier as an inline function so
this declaration isn't required.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Currently, a namespace io_opt queue limit is set by default to the
physical sector size of the namespace and to the the write optimal
size (NOWS) when the namespace reports optimal IO sizes. This causes
problems with block limits stacking in blk_stack_limits() when a
namespace block device is combined with an HDD which generally do not
report any optimal transfer size (io_opt limit is 0). The code:
/* Optimal I/O a multiple of the physical block size? */
if (t->io_opt & (t->physical_block_size - 1)) {
t->io_opt = 0;
t->misaligned = 1;
ret = -1;
}
in blk_stack_limits() results in an error return for this function when
the combined devices have different but compatible physical sector
sizes (e.g. 512B sector SSD with 4KB sector disks).
Fix this by not setting the optimal IO size queue limit if the namespace
does not report an optimal write size value.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Disable streams again if getting the stream params fails.
Signed-off-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The nvme-fc devloss_tmo is computed as the min of either the
ctrl_loss_tmo (max_retries * reconnect_delay) or the remote port's
devloss_tmo. But what gets printed as the nvme-fc devloss_tmo in
nvme_fc_reconnect_or_delete() is always the remote port's devloss_tmo
value. So correct this by printing the min value instead.
Signed-off-by: Martin George <marting@netapp.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Check module parameter write/poll_queues before using it to catch
too large values.
Reproducer:
modprobe -r nvme
modprobe nvme write_queues=`nproc`
echo $((`nproc`+1)) > /sys/module/nvme/parameters/write_queues
echo 1 > /sys/block/nvme0n1/device/reset_controller
[ 657.069000] ------------[ cut here ]------------
[ 657.069022] WARNING: CPU: 10 PID: 1163 at kernel/irq/affinity.c:390 irq_create_affinity_masks+0x47c/0x4a0
[ 657.069056] dm_region_hash dm_log dm_mod
[ 657.069059] CPU: 10 PID: 1163 Comm: kworker/u193:9 Kdump: loaded Tainted: G W 5.6.0+ #8
[ 657.069060] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 657.069064] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 657.069066] RIP: 0010:irq_create_affinity_masks+0x47c/0x4a0
[ 657.069067] Code: fe ff ff 48 c7 c0 b0 89 14 95 48 89 46 20 e9 e9 fb ff ff 31 c0 e9 90 fc ff ff 0f 0b 48 c7 44 24 08 00 00 00 00 e9 e9 fc ff ff <0f> 0b e9 87 fe ff ff 48 8b 7c 24 28 e8 33 a0 80 00 e9 b6 fc ff ff
[ 657.069068] RSP: 0018:ffffb505ce1ffc78 EFLAGS: 00010202
[ 657.069069] RAX: 0000000000000060 RBX: ffff9b97921fe5c0 RCX: 0000000000000000
[ 657.069069] RDX: ffff9b67bad80000 RSI: 00000000ffffffa0 RDI: 0000000000000000
[ 657.069070] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9b97921fe718
[ 657.069070] R10: ffff9b97921fe710 R11: 0000000000000001 R12: 0000000000000064
[ 657.069070] R13: 0000000000000060 R14: 0000000000000000 R15: 0000000000000001
[ 657.069071] FS: 0000000000000000(0000) GS:ffff9b67c0880000(0000) knlGS:0000000000000000
[ 657.069072] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 657.069072] CR2: 0000559eac6fc238 CR3: 000000057860a002 CR4: 00000000007606e0
[ 657.069073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 657.069073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 657.069073] PKRU: 55555554
[ 657.069074] Call Trace:
[ 657.069080] __pci_enable_msix_range+0x233/0x5a0
[ 657.069085] ? kernfs_put+0xec/0x190
[ 657.069086] pci_alloc_irq_vectors_affinity+0xbb/0x130
[ 657.069089] nvme_reset_work+0x6e6/0xeab [nvme]
[ 657.069093] ? __switch_to_asm+0x34/0x70
[ 657.069094] ? __switch_to_asm+0x40/0x70
[ 657.069095] ? nvme_irq_check+0x30/0x30 [nvme]
[ 657.069098] process_one_work+0x1a7/0x370
[ 657.069101] worker_thread+0x1c9/0x380
[ 657.069102] ? max_active_store+0x80/0x80
[ 657.069103] kthread+0x112/0x130
[ 657.069104] ? __kthread_parkme+0x70/0x70
[ 657.069105] ret_from_fork+0x35/0x40
[ 657.069106] ---[ end trace f4f06b7d24513d06 ]---
[ 657.077110] nvme nvme0: 95/1/0 default/read/poll queues
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
call-sites
Have routines handle errors and just bail out of the poll loop.
This simplifies the code and will help as we may enhance the poll
loop logic and these are somewhat in the way.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
when trying to send the pdu data digest, we should set this
flag.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We can signal the stack that this is not the last page coming and the
stack can build a larger tso segment, so go ahead and use it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We can signal the stack that this is not the last page coming and the
stack can build a larger tso segment, so go ahead and use it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
|
It is more efficient to use kmemdup_nul() if the size is known exactly.
The doc in kernel:
"Note: Use kmemdup_nul() instead if the size is known exactly."
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Change 'handeled' to 'handled' in the Kconfig help for SCTP.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Chan says:
====================
bnxt_en: Bug fixes.
3 bnxt_en driver fixes, covering a bug in preserving the counters during
some resets, proper error code when flashing NVRAM fails, and an
endian bug when extracting the firmware response message length.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The explicit mask and shift is not the appropriate way to parse fields
out of a little endian struct. The length field is internally __le16
and the strategy employed only happens to work on little endian machines
because the offset used is actually incorrect (length is at offset 6).
Also remove the related and no longer used definitions from bnxt.h.
Fixes: 845adfe40c2a ("bnxt_en: Improve valid bit checking in firmware response message.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When NVRAM directory is not found, return the error code
properly as per firmware command failure instead of the hardcode
-ENOBUFS.
Fixes: 3a707bed13b7 ("bnxt_en: Return -EAGAIN if fw command returns BUSY")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have logic to maintain network counters across resets by storing
the counters in bp->net_stats_prev before reset. But not all resets
will clear the counters. Certain resets that don't need to change
the number of rings do not clear the counters. The current logic
accumulates the counters before all resets, causing big jumps in
the counters after some resets, such as ethtool -G.
Fix it by only accumulating the counters during reset if the irq_re_init
parameter is set. The parameter signifies that all rings and interrupts
will be reset and that means that the counters will also be reset.
Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Fixes: b8875ca356f1 ("bnxt_en: Save ring statistics before reset.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for Telit LE910C1-EUX composition
0x1031: tty, tty, tty, rmnet
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
amd_energy_init()
Add the missing platform_driver_unregister() before return
from amd_energy_init() in the error handling case.
Fixes: 8abee9566b7e ("hwmon: Add amd_energy driver to report energy counters")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Naveen krishna Chatradhi <nchatrad@amd.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20200527022417.105620-1-weiyongjun1@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Syzkaller again found a path to a kernel crash through bad gso input:
a packet with gso size exceeding len.
These packets are dropped in tcp_gso_segment and udp[46]_ufo_fragment.
But they may affect gso size calculations earlier in the path.
Now that we have thlen as of commit 9274124f023b ("net: stricter
validation of untrusted gso packets"), check gso_size at entry too.
Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the MPTCP receive path we must cope with TCP fallback
on blocking recvmsg(). Currently in such code path we detect
the fallback condition, but we don't fetch the struct socket
required for fallback.
The above allowed syzkaller to trigger a NULL pointer
dereference:
general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline]
RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904
Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d
RSP: 0018:ffffc90001077b98 EFLAGS: 00010202
RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa
R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc
R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000
FS: 00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891
inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:886 [inline]
sock_recvmsg net/socket.c:904 [inline]
__sys_recvfrom+0x2f3/0x470 net/socket.c:2057
__do_sys_recvfrom net/socket.c:2075 [inline]
__se_sys_recvfrom net/socket.c:2071 [inline]
__x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071
do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
Address the issue initializing the struct socket reference
before entering the fallback code.
Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com
Suggested-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes: 8ab183deb26a ("mptcp: cope with later TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Documentation says that gpll0 is parent of gpll0_out_even, somehow
driver coded that as bi_tcxo, so fix it
Fixes: 2a1d7eb854bb ("clk: qcom: gcc: Add global clock controller driver for SM8150")
Reported-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lkml.kernel.org/r/20200521052728.2141377-1-vkoul@kernel.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The driver will always fail to probe without QCOM_GDSC, so select it.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Link: https://lkml.kernel.org/r/20200523040947.31946-1-jonathan@marek.ca
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Fixes: 3e5770921a88 ("clk: qcom: gcc: Add global clock controller driver for SM8250")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
For rx filter 'HWTSTAMP_FILTER_PTP_V2_EVENT', it should be
PTP v2/802.AS1, any layer, any kind of event packet, but HW only
take timestamp snapshot for below PTP message: sync, Pdelay_req,
Pdelay_resp.
Then it causes below issue when test E2E case:
ptp4l[2479.534]: port 1: received DELAY_REQ without timestamp
ptp4l[2481.423]: port 1: received DELAY_REQ without timestamp
ptp4l[2481.758]: port 1: received DELAY_REQ without timestamp
ptp4l[2483.524]: port 1: received DELAY_REQ without timestamp
ptp4l[2484.233]: port 1: received DELAY_REQ without timestamp
ptp4l[2485.750]: port 1: received DELAY_REQ without timestamp
ptp4l[2486.888]: port 1: received DELAY_REQ without timestamp
ptp4l[2487.265]: port 1: received DELAY_REQ without timestamp
ptp4l[2487.316]: port 1: received DELAY_REQ without timestamp
Timestamp snapshot dependency on register bits in received path:
SNAPTYPSEL TSMSTRENA TSEVNTENA PTP_Messages
01 x 0 SYNC, Follow_Up, Delay_Req,
Delay_Resp, Pdelay_Req, Pdelay_Resp,
Pdelay_Resp_Follow_Up
01 0 1 SYNC, Pdelay_Req, Pdelay_Resp
For dwmac v5.10a, enabling all events by setting register
DWC_EQOS_TIME_STAMPING[SNAPTYPSEL] to 2’b01, clearing bit [TSEVNTENA]
to 0’b0, which can support all required events.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David Ahern says:
====================
nexthops: Fix 2 fundamental flaws with nexthop groups
Nik's torture tests have exposed 2 fundamental mistakes with the initial
nexthop code for groups. First, the nexthops entries and num_nh in the
nh_grp struct should not be modified once the struct is set under rcu.
Doing so has major affects on the datapath seeing valid nexthop entries.
Second, the helpers in the header file were convenient for not repeating
code, but they cause datapath walks to potentially see 2 different group
structs after an rcu replace, disrupting a walk of the path objects.
This second problem applies solely to IPv4 as I re-used too much of the
existing code in walking legs of a multipath route.
Patches 1 is refactoring change to simplify the overhead of reviewing and
understanding the change in patch 2 which fixes the update of nexthop
groups when a compnent leg is removed.
Patches 3-5 address the second problem. Patch 3 inlines the multipath
check such that the mpath lookup and subsequent calls all use the same
nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
with iterative calls to fib_info_nhc.
fib_info_num_path can be used in control plane path in a 'for loop' with
subsequent fib_info_nhc calls to get each leg since the nh_grp struct is
only changed while holding the rtnl; the combination can not be used in
the data plane with external nexthops as it involves repeated dereferences
of nh_grp struct which can change between calls.
Similarly, nexthop_is_multipath can be used for branching decisions in
the datapath since the nexthop type can not be changed (a group can not
be converted to standalone and vice versa).
Patch set developed in coordination with Nikolay Aleksandrov. He did a
lot of work creating a good reproducer, discussing options to fix it
and testing iterations.
I have adapted Nik's commands into additional tests in the nexthops
selftest script which I will send against -next.
v2
- fixed whitespace errors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to the last path, need to fix fib_info_nh_uses_dev for
external nexthops to avoid referencing multiple nh_grp structs.
Move the device check in fib_info_nh_uses_dev to a helper and
create a nexthop version that is called if the fib_info uses an
external nexthop.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FIB lookups can return an entry that references an external nexthop.
While walking the nexthop struct we do not want to make multiple calls
into the nexthop code which can result in 2 different structs getting
accessed - one returning the number of paths the rest of the loop
seeing a different nh_grp struct. If the nexthop group shrunk, the
result is an attempt to access a fib_nh_common that does not exist for
the new nh_grp struct but did for the old one.
To fix that move the device evaluation code to a helper that can be
used for inline fib_nh path as well as external nexthops.
Update the existing check for fi->nh in fib_table_lookup to call a
new helper, nexthop_get_nhc_lookup, which walks the external nexthop
with a single rcu dereference.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I got too fancy consolidating checks on multipath type. The result
is that path lookups can access 2 different nh_grp structs as exposed
by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to
avoid multiple, nh_grp dereferences and make decisions based on the
consistent struct.
Only 2 places left using nexthop_is_multipath are within IPv6, both
only check that the nexthop is a multipath for a branching decision
which are acceptable.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We must avoid modifying published nexthop groups while they might be
in use, otherwise we might see NULL ptr dereferences. In order to do
that we allocate 2 nexthoup group structures upon nexthop creation
and swap between them when we have to delete an entry. The reason is
that we can't fail nexthop group removal, so we can't handle allocation
failure thus we move the extra allocation on creation where we can
safely fail and return ENOMEM.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move nh_grp dereference and check for removing nexthop group due to
all members gone into remove_nh_grp_entry.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The io_uring interfaces have been replaced by do_statx() and are no
longer needed.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Calling statx directly both simplifies the interface and avoids potential
incompatibilities between sync and async invokations.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is a prepatory patch to allow io_uring to invoke statx directly.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Separate statx data from open in io_kiocb. No functional changes.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since the switch of floppy driver to blk-mq, the contended (fdc_busy) case
in floppy_queue_rq() is not handled correctly.
In case we reach floppy_queue_rq() with fdc_busy set (i.e. with the floppy
locked due to another request still being in-flight), we put the request
on the list of requests and return BLK_STS_OK to the block core, without
actually scheduling delayed work / doing further processing of the
request. This means that processing of this request is postponed until
another request comes and passess uncontended.
Which in some cases might actually never happen and we keep waiting
indefinitely. The simple testcase is
for i in `seq 1 2000`; do echo -en $i '\r'; blkid --info /dev/fd0 2> /dev/null; done
run in quemu. That reliably causes blkid eventually indefinitely hanging
in __floppy_read_block_0() waiting for completion, as the BIO callback
never happens, and no further IO is ever submitted on the (non-existent)
floppy device. This was observed reliably on qemu-emulated device.
Fix that by not queuing the request in the contended case, and return
BLK_STS_RESOURCE instead, so that blk core handles the request
rescheduling and let it pass properly non-contended later.
Fixes: a9f38e1dec107a ("floppy: convert to blk-mq")
Cc: stable@vger.kernel.org
Tested-by: Libor Pechacek <lpechacek@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
the origin design will use varible of "attr->states" to save node
supported states on current gpu device, but for multi gpu device, when
probe second gpu device, the driver will check attribute node states
from previous gpu device wthether to create attribute node.
it will cause other gpu device create attribute node faild.
1. add member attr_list into amdgpu_device to link supported device attribute node.
2. add new structure "struct amdgpu_device_attr_entry{}" to track device attribute state.
3. drop member "states" from amdgpu_device_attr.
v2:
1. move "attr_list" into amdgpu_pm and rename to "pm_attr_list".
2. refine create & remove device node functions parameter.
fix:
drm/amdgpu: optimize amdgpu device attribute code
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|