Age | Commit message (Collapse) | Author |
|
Make use of feature introduced with v3.2 commit 294436914454 ("[SCSI] scsi:
Added support for adapter and firmware reset"). The common code interface
was introduced for commit 95d31262b3c1 ("[SCSI] qla4xxx: Added support for
adapter and firmware reset").
$ echo adapter > /sys/class/scsi_host/host<N>/host_reset
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : scshr_y SCSI sysfs host_reset yes
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x0000000000000000 none (invalid)
D_ID : 0x00000000 none (invalid)
Adapter status : 0x4500050b
Port status : 0x00000000 none (invalid)
LUN status : 0x00000000 none (invalid)
Ready count : 0x00000001
Running count : 0x00000000
ERP want : 0x04 ZFCP_ERP_ACTION_REOPEN_ADAPTER
ERP need : 0x04 ZFCP_ERP_ACTION_REOPEN_ADAPTER
This is the common code equivalent to the zfcp-specific
&dev_attr_adapter_failed.attr in zfcp_sysfs_adapter_attrs.attrs[]:
$ echo 0 > /sys/bus/ccw/drivers/zfcp/<devbusid>/failed
The unsupported case returns EOPNOTSUPP:
$ echo firmware > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Operation not supported
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : scshr_n SCSI sysfs host_reset no
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0xffffffff none (invalid)
SCSI LUN : 0xffffffff none (invalid)
SCSI LUN high : 0xffffffff none (invalid)
SCSI result : 0xffffffa1 -EOPNOTSUPP==-95
SCSI retries : 0xff none (invalid)
SCSI allowed : 0xff none (invalid)
SCSI scribble : 0xffffffffffffffff none (invalid)
SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid)
FCP rsp inf cod: 0xff none (invalid)
FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid)
00000000 00000000
For any other invalid value, common code returns EINVAL without invoking
our callback:
$ echo foo > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Invalid argument
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
While the default did already correctly print "Initiator" let's make it
explicit and convert zfcp to the feature.
$ cat /sys/class/scsi_host/host0/supported_mode
Initiator
$ cat /sys/class/scsi_host/host0/active_mode
Initiator
The default worked, because not setting the field has it initialized to zero
== MODE_UNKNOWN. scsi_host_alloc() sets shost->active_mode = MODE_INITIATOR
in this case. The sysfs accessor function show_shost_supported_mode()
assumes MODE_INITIATOR in this case. This default behavior was introduced
with v2.6.24 commit 7a39ac3f25be ("[SCSI] make supported_mode default to
initiator."). The feature flag was introduced with v2.6.24 commit
5dc2b89e1242 ("[SCSI] add supported_mode and active_mode attributes to the
host"). So there was no release where zfcp would have shown "unknown".
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Since v2.6.27 commit 553448f6c483 ("[SCSI] zfcp: Message cleanup"), none of
the callers has been interested any more. Values were not returned
consistently in all ERP trigger functions.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Simplify its signature to return boolean and rename it to
zfcp_erp_action_is_running() to indicate its actual unmodified semantics.
It has always been used like this since v2.6.0 history commit ea127f975424
("[PATCH] s390 (7/7): zfcp host adapter.").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
All constant defines were introduced with v2.6.0 history commit ea127f975424
("[PATCH] s390 (7/7): zfcp host adapter.") and refactored into enums with
commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c").
ZFCP_STATUS_ERP_DISMISSING and ZFCP_ERP_STEP_FSF_XCONFIG were never used.
v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c")
removed the use of ZFCP_ERP_ACTION_READY on refactoring
zfcp_erp_action_exists() to now only check adapter->erp_running_head but no
longer adapter->erp_ready_head. The same commit could have changed the
function return type from int to "enum zfcp_erp_act_state".
ZFCP_ERP_ACTION_READY was never used outside of zfcp_erp_action_exists().
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
I've been mixing up
zfcp_task_mgmt_function() [SCSI] and
zfcp_fsf_fcp_task_mgmt() [FSF]
so often lately that I wanted to fix this.
SCSI changes complement v2.6.27 commit f76af7d7e363 ("[SCSI] zfcp: Cleanup
of code in zfcp_scsi.c").
While at it, also fixup the other inconsistencies elsewhere.
ERP changes complement v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup
code in zfcp_erp.c") which introduced status_change_set().
FC changes complement v2.6.32 commit 6f53a2d2ecae ("[SCSI] zfcp: Apply
common naming conventions to zfcp_fc"). by renaming a leftover introduced
with v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote
ports").
FSF changes fixup v2.6.32 commit a4623c467ff7 ("[SCSI] zfcp: Improve request
allocation through mempools"). which replaced zfcp_fsf_alloc_qtcb()
introduced with v2.6.27 commit c41f8cbddd4e ("[SCSI] zfcp: zfcp_fsf
cleanup.").
SCSI fc_host statistics were introduced with v2.6.16 commit f6cd94b126aa
("[SCSI] zfcp: transport class adaptations").
SCSI fc_host port_state was introduced with v2.6.27 commit 85a82392fe6f
("[SCSI] zfcp: Add port_state attribute to sysfs").
SCSI rport setter for dev_loss_tmo was introduced with v2.6.18 commit
338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target
port is unavailable").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
as context
As a prerequisite, complement commit 3d1cb2059d93 ("workqueue: include
workqueue info when printing debug dump of a worker task") to be usable with
kernel modules by exporting the symbol set_worker_desc(). Current built-in
user was introduced with commit ef3b101925f2 ("writeback: set worker desc to
identify writeback workers in task dumps").
Can help distinguishing work items which do not have adapter scope.
Description is printed out with task dump for debugging on WARN, BUG, panic,
or magic-sysrq [show-task-states(t)].
Example:
$ echo 0 >| /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/failed &
$ echo 't' >| /proc/sysrq-trigger
$ dmesg
sysrq: SysRq : Show State
task PC stack pid father
...
zfcp_q_0.0.1880 S14640 2165 2 0x02000000
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
[<00000000009df57c>] schedule+0x94/0xc0
[<0000000000168654>] rescuer_thread+0x33c/0x3a0
[<000000000016f8be>] kthread+0x166/0x178
[<00000000009e71f2>] kernel_thread_starter+0x6/0xc
[<00000000009e71ec>] kernel_thread_starter+0x0/0xc
no locks held by zfcp_q_0.0.1880/2165.
...
kworker/u512:2 D11280 2193 2 0x02000000
Workqueue: zfcp_q_0.0.1880 zfcp_scsi_rport_work [zfcp] (zrpd-50050763031bd327)
^^^^^^^^^^^^^^^^^^^^^
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
[<00000000009df57c>] schedule+0x94/0xc0
[<00000000009e50c0>] schedule_timeout+0x488/0x4d0
[<00000000001e425c>] msleep+0x5c/0x78 >>test code only<<
[<000003ff8008a21e>] zfcp_scsi_rport_work+0xbe/0x100 [zfcp]
[<0000000000167154>] process_one_work+0x3b4/0x718
[<000000000016771c>] worker_thread+0x264/0x408
[<000000000016f8be>] kthread+0x166/0x178
[<00000000009e71f2>] kernel_thread_starter+0x6/0xc
[<00000000009e71ec>] kernel_thread_starter+0x0/0xc
2 locks held by kworker/u512:2/2193:
#0: (name){++++.+}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
#1: ((&(&port->rport_work)->work)){+.+.+.}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
...
=============================================
Showing busy workqueues and worker pools:
workqueue zfcp_q_0.0.1880: flags=0x2000a
pwq 512: cpus=0-255 flags=0x4 nice=0 active=1/1
in-flight: 2193:zfcp_scsi_rport_work [zfcp]
pool 512: cpus=0-255 flags=0x4 nice=0 hung=0s workers=4 idle: 5 2354 2311
Work items with adapter scope are already identified by the workqueue name
"zfcp_q_<devbusid>" and the work item function name.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Note: zfcp_scsi_eh_host_reset_handler() will be converted in a later patch.
zfcp_scsi_eh_device_reset_handler() now only depends on scsi_device.
zfcp_scsi_eh_target_reset_handler() now only depends on scsi_target.
All derive other objects from these intended callback arguments.
zfcp_scsi_eh_target_reset_handler() is special: The FCP channel requires a
valid LUN handle so we try to find ourselves a stand-in scsi_device as
suggested by Hannes Reinecke. If it cannot find a stand-in scsi device,
trace a record like the following (formatted with zfcpdbf from s390-tools):
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : tr_nosd target reset, no SCSI device
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x00000000 SCSI ID/target denoting scope
SCSI LUN : 0xffffffff none (invalid)
SCSI LUN high : 0xffffffff none (invalid)
SCSI result : 0x00002003 field re-used for midlayer value: FAILED
SCSI retries : 0xff none (invalid)
SCSI allowed : 0xff none (invalid)
SCSI scribble : 0xffffffffffffffff none (invalid)
SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid)
FCP rsp inf cod: 0xff none (invalid)
FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid)
00000000 00000000
Actually change the signature of zfcp_task_mgmt_function() used by
zfcp_scsi_eh_device_reset_handler() & zfcp_scsi_eh_target_reset_handler().
Since it was prepared in a previous patch, we only need to delete a local
auto variable which is now the intended argument.
Suggested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Intentionally retrieve the rport by walking SCSI common code objects
rather than zfcp_sdev->port->rport.
The latter is used for pairing the calls to fc_remote_port_add() and
fc_remote_port_delete(). [see v2.6.31 commit 379d6bf6573e ("[SCSI] zfcp:
Add port only once to FC transport class")]
zfcp_scsi_rport_register() sets zfcp_port.rport to what
fc_remote_port_add() returned.
zfcp_scsi_rport_block() sets zfcp_port.rport = NULL after having called
fc_remote_port_delete().
Hence, while an rport is blocked (or in any subsequent state due to
scsi_transport_fc timeouts such as fast_io_fail_tmo or dev_loss_tmo),
zfcp_port.rport is NULL and cannot serve as argument to fc_block_rport().
During zfcp recovery, a just recovered zfcp_port can have the UNBLOCKED
status flag, but an async rport unblocking has only started via
zfcp_scsi_schedule_rport_register() in zfcp_erp_try_rport_unblock()
[see v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with
LUN recovery")] in zfcp_erp_action_cleanup(). Now zfcp_erp_wait() can
return. This would be sufficient to successfully send a TMF.
But the rport can still be blocked and zfcp_port.rport can still be NULL
until zfcp_port.rport_work was scheduled and has actually called
fc_remote_port_add() and assigned its return value to zfcp_port.rport.
We need an unblocked rport for a successful scsi_eh TUR.
Similarly, for a zfcp_port which has just lost its UNBLOCKED status flag,
the return of zfcp_erp_wait() can race with zfcp_port.rport_work queued
by zfcp_scsi_schedule_rport_block(). Therefore we cannot reliably access
zfcp_port.rport. However, we'd like to get fc_rport_block()'s opinion on
when fast_io_fail_tmo triggered. While we might use
flush_work(&port->rport_work) to sync with the work item, we can simply use
the other way to get an rport pointer.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Actually change the signature of zfcp_fsf_fcp_task_mgmt().
Since it was prepared in the previous patch, we only need to delete
a local auto variable which is now the intended argument.
Prepare zfcp_fsf_fcp_task_mgmt's caller zfcp_task_mgmt_function()
to have its function body only depend on a scsi_device and derived objects.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In zfcp_fsf_fcp_task_mgmt() resolve the still old argument scsi_cmnd into
scsi_device very early and only depend on scsi_device and derived objects in
the function body.
This prepares to later change the function signature replacing the scsi_cmnd
argument with scsi_device.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This reverts commit 2443c8b23aea ("[SCSI] zfcp: Merge FCP task management
setup with regular FCP command setup"), because this introduced a
dependency on the unsuitable SCSI command for scsi_eh / TMF.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Originally, I planned for TMF handling to have different context data in
fsf_req->data depending on the TMF scope in fcp_cmnd->fc_tm_flags:
* scsi_device if FCP_TMF_LUN_RESET,
* zfcp_port if FCP_TMF_TGT_RESET.
However, the FCP channel requires a valid LUN handle so we now use
scsi_device as context data with any TMF for the time being.
Regular SCSI I/O FCP requests continue using scsi_cmnd as req->data.
Hence, the callers of zfcp_fsf_fcp_handler_common() must resolve req->data
and pass scsi_device as common context. While at it, remove the detour
zfcp_sdev->port->adapter and use the more direct req->adapter as elsewhere
in this function already.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The SCSI command pointer passed to scsi_eh callbacks is just one arbitrary
command of potentially many that are in the eh queue to be processed. The
command is only used to indirectly pass the TMF scope in terms of SCSI
ID/target and SCSI LUN for LUN reset.
Hence, zfcp had filled in SCSI trace record fields which do not really
belong to the TMF. This was confusing.
Therefore, refactor the TMF tracing to work without SCSI command. Since the
FCP channel always requires a valid LUN handle, we use SCSI device as common
context for any TMF (even target reset). To make it even clearer, we set
all bits to 1 for the fields, which do not belong to the TMF, to indicate
that these fields are invalid.
The old zfcp_dbf_scsi() became zfcp_dbf_scsi_common() to now handle both
SCSI commands and TMFs. The old argument scsi_cmnd is now optional and can
be NULL with TMFs. The new argument scsi_device is mandatory to carry
context, as well as SCSI ID/target and SCSI LUN in case of TMFs.
New example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : [lt]r_....
Request ID : 0x<reqid> ID of FSF FCP request with TM flag
For cases without FSF request: 0x0 for none (invalid)
SCSI ID : 0x<scsi_id> SCSI ID/target denoting scope
SCSI LUN : 0x<scsi_lun> SCSI LUN denoting scope
SCSI LUN high : 0x<scsi_lun_high> SCSI LUN denoting scope
SCSI result : 0xffffffff none (invalid)
SCSI retries : 0xff none (invalid)
SCSI allowed : 0xff none (invalid)
SCSI scribble : 0xffffffffffffffff none (invalid)
SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid)
FCP rsp inf cod: 0x00 FCP_RSP info code of TMF
FCP rsp IU : 00000000 00000000 00000100 00000000 ext FCP_RSP IU
00000000 00000008 ext FCP_RSP IU
FCP rsp IU len : 32 FCP_RSP IU length
Payload time : ...
FCP rsp IU all : 00000000 00000000 00000100 00000000 full FCP_RSP IU
00000000 00000008 00000000 00000000 full FCP_RSP IU
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : .......
LUN : 0x...
WWPN : 0x...
D_ID : 0x...
Adapter status : 0x...
Port status : 0x...
LUN status : 0x...
Ready count : 0x...
Running count : 0x...
ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_...
ERP need : 0xc0 ZFCP_ERP_ACTION_NONE
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
That other commit introduced an inconsistency because it would trace on
ERP_FAILED for all callers of port forced reopen triggers (not just
terminate_rport_io), but it would not trace on ERP_FAILED for all callers of
other ERP triggers such as adapter, port regular, LUN.
Therefore, generalize that other commit. zfcp_erp_action_enqueue() already
had two early outs which re-used the one zfcp_dbf_rec_trig() call. All ERP
trigger functions finally run through zfcp_erp_action_enqueue(). So move
the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into
zfcp_erp_action_enqueue() and add another early out with new trace marker
for pseudo ERP need in this case. This removes all early returns from all
ERP trigger functions so we always end up at zfcp_dbf_rec_trig().
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : .......
LUN : 0x...
WWPN : 0x...
D_ID : 0x...
Adapter status : 0x...
Port status : 0x...
LUN status : 0x...
Ready count : 0x...
Running count : 0x...
ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_...
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For problem determination we always want to see when we were invoked on the
terminate_rport_io callback whether we perform something or not.
Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec:
loose remote port
t workqueue
[s] zfcp_q_<dev> IRQ zfcperp<dev>
=== ================== =================== ============================
0 recv RSCN
q p.test_link_work
block rport
start fast_io_fail_tmo
send ADISC ELS
4 recv ADISC fail
block zfcp_port
port forced reopen
send open port
12 recv open port fail
q p.gid_pn_work
zfcp_erp_wakeup
(zfcp_erp_wait would return)
GID_PN fail
Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail,
e.g. with the typical 5 sec setting.
port.status |= ERP_FAILED
If fast_io_fail_tmo triggers after this point, we missed a SCSI trace.
workqueue
fc_dl_<host>
==================
27 fc_timeout_fail_rport_io
fc_terminate_rport_io
zfcp_scsi_terminate_rport_io
zfcp_erp_port_forced_reopen
_zfcp_erp_port_forced_reopen
if (port.status & ERP_FAILED)
return;
Therefore, write a trace before above early return.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : sctrpi1 SCSI terminate rport I/O
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x<wwpn>
D_ID : 0x<n_port_id>
Adapter status : 0x...
Port status : 0x...
LUN status : 0x00000000 none (invalid)
Ready count : 0x...
Running count : 0x...
ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
get_device() and its internally used kobject_get() only return NULL if they
get passed NULL as argument. zfcp_get_port_by_wwpn() loops over
adapter->port_list so the iteration variable port is always non-NULL.
Struct device is embedded in struct zfcp_port so &port->dev is always
non-NULL. This is the argument to get_device(). However, if we get an
fc_rport in terminate_rport_io() for which we cannot find a match within
zfcp_get_port_by_wwpn(), the latter can return NULL. v2.6.30 commit
70932935b61e ("[SCSI] zfcp: Fix oops when port disappears") introduced an
early return without adding a trace record for this case. Even if we don't
need recovery in this case, for debugging we should still see that our
callback was invoked originally by scsi_transport_fc.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : sctrpin SCSI terminate rport I/O, no zfcp port
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x<wwpn> WWPN
D_ID : 0x<n_port_id> N_Port-ID
Adapter status : 0x...
Port status : 0xffffffff unknown (-1)
LUN status : 0x00000000 none (invalid)
Ready count : 0x...
Running count : 0x...
ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need : 0xc0 ZFCP_ERP_ACTION_NONE
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 70932935b61e ("[SCSI] zfcp: Fix oops when port disappears")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If a SCSI device is deleted during scsi_eh host reset, we cannot get a
reference to the SCSI device anymore since scsi_device_get returns !=0 by
design. Assuming the recovery of adapter and port(s) was successful,
zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for the
half-gone SCSI device. Unfortunately, it causes the following confusing
trace record which states that zfcp will do a LUN recovery as "ERP need" is
ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want".
Old example trace record formatted with zfcpdbf from s390-tools:
Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x<FCP_LUN>
WWPN : 0x<WWPN>
D_ID : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000 ZFCP_STATUS_COMMON_RUNNING
but not ZFCP_STATUS_COMMON_UNBLOCKED as it
was closed on close part of adapter reopen
ERP want : 0x01
ERP need : 0x01 misleading
However, zfcp_erp_setup_act() returns NULL as it cannot get the reference.
Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery
actually happens.
We always do want the recovery trigger trace record even if no erp_action
could be enqueued as in this case. For other cases where we did not enqueue
an erp_action, 'need' has always been zero to indicate this. In order to
indicate above goto out, introduce an eyecatcher "flag" to mark the "ERP
need" as 'not needed' but still keep the information which erp_action type,
that zfcp_erp_required_act() had decided upon, is needed. 0xc_ is chosen to
be visibly different from 0x0_ in "ERP want".
New example trace record formatted with zfcpdbf from s390-tools:
Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x<FCP_LUN>
WWPN : 0x<WWPN>
D_ID : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000
ERP want : 0x01
ERP need : 0xc1 would need LUN ERP, but no action set up
^
Before v2.6.38 commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.") we could detect this case because the
"erp_action" field in the trace was NULL. The rework removed erp_action as
argument and field from the trace.
This patch here is for tracing. A fix to allow LUN recovery in the case at
hand is a topic for a separate patch.
See also commit fdbd1c5e27da ("[SCSI] zfcp: Allow running unit/LUN shutdown
without acquiring reference") for a similar case and background info.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to
zfcp_erp_wait() and fc_block_scsi_eh() time can pass between the start of
our eh callback and an actual send/recv of an abort / TMF request. In order
to see the temporal sequence including any abort / TMF send retries, add a
trace before the above two blocking functions. This supports problem
determination with scsi_eh and parallel zfcp ERP.
No need to explicitly trace the beginning of our eh callback, since we
typically can send an abort / TMF and see its HBA response (in the worst
case, it's a pseudo response on dismiss all of adapter recovery, e.g. due to
an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send, we
now get a trace record for the first "abrt_wt" or "[lt]r_wait" which denotes
almost the beginning of the callback.
No need to explicitly trace the wakeup after the above two blocking
functions because the next retry loop causes another trace in any case and
that is sufficient.
Example trace records formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : abrt_wt abort, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x<scsi_id>
SCSI LUN : 0x<scsi_lun>
SCSI LUN high : 0x<scsi_lun_high>
SCSI result : 0x<scsi_result_of_cmd_to_be_aborted>
SCSI retries : 0x<retries_of_cmd_to_be_aborted>
SCSI allowed : 0x<allowed_retries_of_cmd_to_be_aborted>
SCSI scribble : 0x<req_id_of_cmd_to_be_aborted>
SCSI opcode : <CDB_of_cmd_to_be_aborted>
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : lr_wait LUN reset, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x<scsi_id>
SCSI LUN : 0x<scsi_lun>
SCSI LUN high : 0x<scsi_lun_high>
SCSI result : 0x... unrelated
SCSI retries : 0x.. unrelated
SCSI allowed : 0x.. unrelated
SCSI scribble : 0x... unrelated
SCSI opcode : ... unrelated
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Fixes: af4de36d911a ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For problem determination we need to see whether and why we were successful
or not. This allows deduction of scsi_eh escalation.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : schrh_r SCSI host reset handler result
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0xffffffff none (invalid)
SCSI LUN : 0xffffffff none (invalid)
SCSI LUN high : 0xffffffff none (invalid)
SCSI result : 0x00002002 field re-used for midlayer value: SUCCESS
or in other cases: 0x2009 == FAST_IO_FAIL
SCSI retries : 0xff none (invalid)
SCSI allowed : 0xff none (invalid)
SCSI scribble : 0xffffffffffffffff none (invalid)
SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid)
FCP rsp inf cod: 0xff none (invalid)
FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid)
00000000 00000000
v2.6.35 commit a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from
fc_block_scsi_eh to scsi eh") introduced the first return with something
other than the previously hardcoded single SUCCESS return path.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Variants of proc_create{,_data} that directly take a struct seq_operations
argument and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Calling qdio_release_memory() on error is just plain wrong. It frees
the main qdio_irq struct, when following code still uses it.
Also, no other error path in qdio_establish() does this. So trust
callers to clean up via qdio_free() if some step of the QDIO
initialization fails.
Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.")
Cc: <stable@vger.kernel.org> #v2.6.27+
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Ever since CQ/QAOB support was added, calling qdio_free() straight after
qdio_alloc() results in qdio_release_memory() accessing uninitialized
memory (ie. q->u.out.use_cq and q->u.out.aobs). Followed by a
kmem_cache_free() on the random AOB addresses.
For older kernels that don't have 6e30c549f6ca, the same applies if
qdio_establish() fails in the DEV_STATE_ONLINE check.
While initializing q->u.out.use_cq would be enough to fix this
particular bug, the more future-proof change is to just zero-alloc the
whole struct.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Cc: <stable@vger.kernel.org> #v3.2+
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Introduce compile time check for files which should avoid using .bss
section, because of the following reasons:
- .bss section has not been zeroed yet,
- initrd has not been moved to a safe location and could be overlapping
with .bss section.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
processed asynchronously and concurrently.
Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
zfcp_dbf_rec_trig(). zfcp_dbf_rec_trig() acquires the DBF REC spin lock
and then iterates with list_for_each() over the adapter's ERP ready list
without holding the ERP lock. This opens a race window in which the
current list entry can be moved to another list, causing list_for_each()
to iterate forever on the wrong list, as the erp_ready_head is never
encountered as terminal condition.
Meanwhile the ERP action can be processed in the ERP thread by
zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
lock and then calls zfcp_erp_action_to_running() to move the ERP action
from the ready to the running list. zfcp_erp_action_to_running() can
move the ERP action using list_move() just during the aforementioned
race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
held by the infinitely looping kworker, it effectively spins forever.
Example Sequence Diagram:
Process ERP Thread rport_work
------------------- ------------------- -------------------
zfcp_erp_adapter_reopen()
zfcp_erp_adapter_block()
zfcp_scsi_schedule_rports_block()
lock ERP zfcp_scsi_rport_work()
zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
list_add_tail() on ready !(rport_task==RPORT_ADD)
wake_up() ERP thread zfcp_scsi_rport_block()
zfcp_dbf_rec_trig() zfcp_erp_strategy() zfcp_dbf_rec_trig()
unlock ERP lock DBF REC
zfcp_erp_wait() lock ERP
| zfcp_erp_action_to_running()
| list_for_each() ready
| list_move() current entry
| ready to running
| zfcp_dbf_rec_run() endless loop over running
| zfcp_dbf_rec_run_lvl()
| lock DBF REC spins forever
Any adapter recovery can trigger this, such as setting the device offline
or reboot.
V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport
during rport gone") introduced additional tracing of (un)blocking of
rports. It missed that the adapter->erp_lock must be held when calling
zfcp_dbf_rec_trig().
This fix uses the approach formerly introduced by commit aa0fec62391c
("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
later removed by commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.").
Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
acquires and releases the adapter->erp_lock for read.
Reported-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone")
Cc: <stable@vger.kernel.org> # 2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Overlapping changes in selftests Makefile.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pick up urgent fixes to apply dependent cleanup patch
|
|
If READ MAC fails to fetch a valid MAC address, allow some more device
types (IQD and z/VM OSD) to fall back to a random address.
Also use eth_hw_addr_random(), for indicating to userspace that the
address type is NET_ADDR_RANDOM.
Note that while z/VM has various protection schemes to prohibit
custom addresses on its NICs, they are all optional. So we should at
least give it a try.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check if a qeth device supports IPv6 RX checksum offload, and hook it up
into the existing NETIF_F_RXCSUM support.
As NETIF_F_RXCSUM is now backed by a combination of HW Assists, we need
to be a little smarter when dealing with errors during a configuration
change:
- switching on NETIF_F_RXCSUM only makes sense if at least one HW Assist
was enabled successfully.
- for switching off NETIF_F_RXCSUM, all available HW Assists need to be
deactivated.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check if a qeth device supports IPv6 TX checksum offload, and advertise
NETIF_F_IPV6_CSUM accordingly. Add support for setting the relevant bits
in IPv6 packet descriptors.
Currently this has only limited use (ie. UDP, or Jumbo Frames). For any
TCP traffic with a standard MSS, the TCP checksum gets calculated
as part of the linear GSO segmentation.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add some wrappers to make the protocol-specific Assist code a little
more generic, and use them for sending protocol-agnostic commands in
the Checksum Offload Assist code.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For new functionality, the L2 subdriver will start using IPv6 assists.
So move the query from the L3 subdriver into the common setup path.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This matches the statistics we gather for the TX offload path.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kernel does its own validation of the IPv4 header checksum,
drivers/HW are not required to handle this.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This consolidates the checksum offload code that was duplicated
over the two qeth subdrivers.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial cleanup, in preparation for a subsequent patch.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct net_device contains a dev_port field. Store the OSA port number
in this field.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When removing a VLAN ID on a L3 device, the driver currently attempts to
walk and unregister the VLAN device's IP addresses.
This can be safely removed - before qeth_l3_vlan_rx_kill_vid() even gets
called, we receive an inet[6]addr event for each IP on the device and
qeth_l3_handle_ip_event() unregisters the address accordingly.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the vid_list is only accessed from process context, there's no need to
protect it with a spinlock.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both qeth sub drivers use the same QDIO queue handlers, there's no need
to expose them via the driver's discipline. No functional change.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the translation of a channel program fails, we may end up attempting
to clean up (free, unpin) stuff that never got translated (and allocated,
pinned) in the first place.
By adjusting the lengths of the chains accordingly (so the element that
failed, and all subsequent elements are excluded) cleanup activities
based on false assumptions can be avoided.
Let's make sure cp_free works properly after cp_prefetch returns with an
error by setting ch_len of a ccw chain to the number of the translated
CCWs on that chain.
Cc: stable@vger.kernel.org #v4.12+
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20180423110113.59385-2-bjsdjshi@linux.vnet.ibm.com>
[CH: fixed typos]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"A couple of bug fixes:
- correct some CPU-MF counter names for z13 and z14
- correct locking in the vfio-ccw fsm_io_helper function
- provide arch_uretprobe_is_alive to avoid sigsegv with uretprobes
- fix a corner case with CPU-MF sampling in regard to execve
- fix expoline code revert for loadable modules
- update chpid descriptor for resource accessibility events
- fix dasd I/O errors due to outdated device alias infomation"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: correct module section names for expoline code revert
vfio: ccw: process ssch with interrupts disabled
s390: update sampling tag after task pid change
s390/cpum_cf: rename IBM z13/z14 counter names
s390/dasd: fix IO error for newly defined devices
s390/uprobes: implement arch_uretprobe_is_alive()
s390/cio: update chpid descriptor after resource accessibility event
|
|
When we call ssch, an interrupt might already be pending once we
return from the START SUBCHANNEL instruction. Therefore we need to
make sure interrupts are disabled while holding the subchannel lock
until after we're done with our processing.
Cc: stable@vger.kernel.org #v4.12+
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
When a new CKD storage volume is defined at the storage server, Linux
may be relying on outdated information about that volume, which leads to
the following errors:
1. Command Reject Errors for minidisk on z/VM:
dasd-eckd.b3193d: 0.0.XXXX: An error occurred in the DASD device driver,
reason=09
dasd(eckd): I/O status report for device 0.0.XXXX:
dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:02 CS:00
RC:0
dasd(eckd): device 0.0.2046: Failing CCW: 00000000XXXXXXXX
dasd(eckd): Sense(hex) 0- 7: 80 00 00 00 00 00 00 00
dasd(eckd): Sense(hex) 8-15: 00 00 00 00 00 00 00 00
dasd(eckd): Sense(hex) 16-23: 00 00 00 00 e1 00 0f 00
dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 00 00 00
dasd(eckd): 24 Byte: 0 MSG 0, no MSGb to SYSOP
2. Equipment Check errors on LPAR or for dedicated devices on z/VM:
dasd(eckd): I/O status report for device 0.0.XXXX:
dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:0E CS:40
fcxs:01 schxs:00 RC:0
dasd(eckd): device 0.0.9713: Failing TCW: 00000000XXXXXXXX
dasd(eckd): Sense(hex) 0- 7: 10 00 00 00 13 58 4d 0f
dasd(eckd): Sense(hex) 8-15: 67 00 00 00 00 00 00 04
dasd(eckd): Sense(hex) 16-23: e5 18 05 33 97 01 0f 0f
dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 04 58 0d
dasd(eckd): 24 Byte: 0 MSG f, no MSGb to SYSOP
Fix this problem by using the up-to-date information provided during
online processing via the device specific SNEQ to detect the case of
outdated LCU data. If there is a difference, perform a re-read of that
data.
Cc: stable@vger.kernel.org
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Channel path descriptors have been seen as something stable (as
long as the chpid is configured). Recent tests have shown that the
descriptor can also be altered when the link state of a channel path
changes. Thus it is necessary to update the descriptor during
handling of resource accessibility events.
Cc: <stable@vger.kernel.org>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
For z/VM NICs, qeth needs to consider which of the three CCW devices in
an MPC group it uses for requesting a managed MAC address.
On the Base device, the hypervisor returns a default MAC which is
pre-assigned when creating the NIC (this MAC is also returned by the
READ MAC primitive). Querying any other device results in the allocation
of an additional MAC address.
For consistency with READ MAC and to avoid using up more addresses than
necessary, it is preferable to use the NIC's default MAC. So switch the
the diag26c over to using a NIC's Read device, which should always be
identical to the Base device.
Fixes: ec61bd2fd2a2 ("s390/qeth: use diag26c to get MAC address on L2")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Submitting a cmd IO request (usually on the WRITE device, but for IDX
also on the READ device) is currently done with ccw_device_start()
and a manual timeout in the caller.
On timeout, the caller cleans up the related resources (eg. IO buffer).
But 1) the IO might still be active and utilize those resources, and
2) when the IO completes, qeth_irq() will attempt to clean up the
same resources again.
Instead of introducing additional resource locking, switch to
ccw_device_start_timeout() to ensure IO termination after timeout, and
let the IRQ handler alone deal with cleaning up after a request.
This also removes a stray write->irq_pending reset from
clear_ipacmd_list(). The routine doesn't terminate any pending IO on
the WRITE device, so this should be handled properly via IO timeout
in the IRQ handler.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When changing the MAC address on a L2 qeth device, current code first
unregisters the old address, then registers the new one.
If HW rejects the new address (or the IO fails), the device ends up with
no operable address at all.
Re-order the code flow so that the old address only gets dropped if the
new address was registered successfully. While at it, add logic to catch
some corner-cases.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Creating the global workqueue during driver init may fail, deal with it.
Also, destroy the created workqueue on any subsequent error.
Fixes: 0f54761d167f ("qeth: Support VEPA mode")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|