summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-08-20s390/qeth: keep cmd alive after IO completionJulian Wiedmann
Current code releases the cmd struct after its initial IO has completed. Any reply processing is done independently, using a separate qeth_reply struct. In preparation for merging the cmd and reply structs together, take an additional reference on the cmd object so that it stays around all the way until qeth_send_control_data() returns. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20s390/qeth: use correct length field in SNMP cmd callbackJulian Wiedmann
qeth_snmp_command_cb() is the only cmd callback that pulls the reply's data length from a low-level transport header field. This requires additional complexity (ie. reply->offset) to make the header accessible to what is supposed to be a pure IPA cmd callback. Adapter cmds have a length field in their sub-cmd header, get the data length from there instead. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20s390/qeth: propagate length of processed cmd IO data to callbackJulian Wiedmann
When an cmd IO completes in qeth_irq(), calculate how much data was processed by the device and pass this value to the cmd's callback. This allows cmds that retrieve data from the device to check whether sufficient data was received, so we do that in qeth_read_conf_data_cb(). Suggested-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20s390/qeth: use node_descriptor structJulian Wiedmann
Rather than fumbling with hard-coded offsets, use the proper struct to access the retrieved RCD information. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20netdevsim: Fix build error without CONFIG_INETYueHaibing
If CONFIG_INET is not set, building fails: drivers/net/netdevsim/dev.o: In function `nsim_dev_trap_report_work': dev.c:(.text+0x67b): undefined reference to `ip_send_check' Use ip_fast_csum instead of ip_send_check to avoid dependencies on CONFIG_INET. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net/mlx5: Fix the order of fc_stats cleanupGavi Teitz
Previously, mlx5_cleanup_fc_stats() would cleanup the flow counter pool beofre releasing all the counters to it, which would result in flow counter bulks not getting freed. Resolve this by changing the order in which elements of fc_stats are cleaned up, so that the flow counter pool is cleaned up after all the counters are released. Also move cleanup actions for freeing the bulk query memory and destroying the idr to the end of mlx5_cleanup_fc_stats(). Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Fix deallocation of non-fully init encap entriesVlad Buslov
Recent rtnl lock dependency refactoring changed encap entry attach code to insert encap entry to hash table before it was fully initialized in order to allow concurrent tc users to wait on completion for encap entry to finish initialization. That change required all the users of encap entry to obtain reference to it first and for caller that creates encap to put reference to it on error, instead of freeing the entry memory directly. However, releasing reference to such encap entry that wasn't fully initialized causes NULL pointer dereference in mlx5e_rep_encap_entry_detach() which expects e->out_dev to be set and encap to be attached to nhe: [ 1092.454517] BUG: unable to handle page fault for address: 00000000000420e8 [ 1092.454571] #PF: supervisor read access in kernel mode [ 1092.454602] #PF: error_code(0x0000) - not-present page [ 1092.454632] PGD 800000083032c067 P4D 800000083032c067 PUD 84107d067 PMD 0 [ 1092.454673] Oops: 0000 [#1] SMP PTI [ 1092.454697] CPU: 20 PID: 22393 Comm: tc Not tainted 5.3.0-rc3+ #589 [ 1092.454733] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 1092.454806] RIP: 0010:mlx5e_rep_encap_entry_detach+0x1c/0x630 [mlx5_core] [ 1092.454845] Code: be f4 ff ff ff e9 11 ff ff ff 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 30 <48> 8b 87 28 16 04 00 48 89 f7 48 05 d0 03 00 00 48 89 45 c8 e8 cb [ 1092.454942] RSP: 0018:ffffb6f08421f5a0 EFLAGS: 00010286 [ 1092.454974] RAX: 0000000000000000 RBX: ffff8ab668644e00 RCX: ffffb6f08421f56c [ 1092.455013] RDX: ffff8ab668644e40 RSI: ffff8ab668644e00 RDI: 0000000000000ac0 [ 1092.455053] RBP: ffffb6f08421f5f8 R08: 0000000000000001 R09: 0000000000000000 [ 1092.455092] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000ac0 [ 1092.455131] R13: 00000000ffffff9b R14: ffff8ab63f200ac0 R15: ffff8ab668644e40 [ 1092.455171] FS: 00007fa195bdc480(0000) GS:ffff8ab66fa00000(0000) knlGS:0000000000000000 [ 1092.455216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1092.455249] CR2: 00000000000420e8 CR3: 0000000867522001 CR4: 00000000001606e0 [ 1092.455288] Call Trace: [ 1092.455315] ? __mutex_unlock_slowpath+0x4d/0x2a0 [ 1092.455365] mlx5e_encap_dealloc.isra.0+0x31/0x60 [mlx5_core] [ 1092.455424] mlx5e_tc_add_fdb_flow+0x596/0x750 [mlx5_core] [ 1092.455484] __mlx5e_add_fdb_flow+0x152/0x210 [mlx5_core] [ 1092.455534] mlx5e_configure_flower+0x4d5/0xe30 [mlx5_core] [ 1092.455574] tc_setup_cb_call+0x67/0xb0 [ 1092.455601] fl_hw_replace_filter+0x142/0x300 [cls_flower] [ 1092.455639] fl_change+0xd24/0x1bdb [cls_flower] [ 1092.455675] tc_new_tfilter+0x3e0/0x970 [ 1092.455709] ? tc_del_tfilter+0x720/0x720 [ 1092.455735] rtnetlink_rcv_msg+0x389/0x4b0 [ 1092.455763] ? netlink_deliver_tap+0x95/0x400 [ 1092.455791] ? rtnl_dellink+0x2d0/0x2d0 [ 1092.455817] netlink_rcv_skb+0x49/0x110 [ 1092.455844] netlink_unicast+0x171/0x200 [ 1092.455872] netlink_sendmsg+0x224/0x3f0 [ 1092.455901] sock_sendmsg+0x5e/0x60 [ 1092.455924] ___sys_sendmsg+0x2ae/0x330 [ 1092.455950] ? task_work_add+0x43/0x50 [ 1092.455976] ? fput_many+0x45/0x80 [ 1092.456004] ? __lock_acquire+0x248/0x18e0 [ 1092.456033] ? find_held_lock+0x2b/0x80 [ 1092.456058] ? task_work_run+0x7b/0xd0 [ 1092.456085] __sys_sendmsg+0x59/0xa0 [ 1092.457013] do_syscall_64+0x5c/0xb0 [ 1092.457924] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1092.458842] RIP: 0033:0x7fa195da27b8 [ 1092.459918] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 1092.462634] RSP: 002b:00007fff94409298 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1092.464011] RAX: ffffffffffffffda RBX: 000000005d515b0e RCX: 00007fa195da27b8 [ 1092.465391] RDX: 0000000000000000 RSI: 00007fff94409300 RDI: 0000000000000003 [ 1092.466761] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 1092.468121] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001 [ 1092.469456] R13: 0000000000480640 R14: 0000000000000016 R15: 0000000000000001 [ 1092.470766] Modules linked in: act_mirred act_tunnel_key cls_flower dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_core kvm_intel kvm irqbypass crct10dif_pclmul mei_me crc32_pclmul crc32 c_intel igb iTCO_wdt ghash_clmulni_intel ses mlxfw intel_cstate iTCO_vendor_support ptp intel_uncore lpc_ich pps_core mei i2c_i801 joydev intel_rapl_perf ioatdma enclosure ipmi_ssif pcspkr dca wmi ipmi_ si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas [ 1092.479618] CR2: 00000000000420e8 [ 1092.481214] ---[ end trace ce2e0f4d9a67f604 ]--- To fix the issue, set e->compl_result to positive value after encap was initialized successfully. Check e->compl_result value in mlx5e_encap_dealloc() and only detach and dealloc encap if the value is positive. Fixes: d589e785baf5 ("net/mlx5e: Allow concurrent creation of encap entries") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20Documentation: net: mlx5: Devlink health documentation updatesAya Levin
Add documentation for devlink health rx reporter supported by mlx5. Update tx reporter documentation. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Report and recover from CQE with error on RQAya Levin
Add support for report and recovery from error on completion on RQ by setting the queue back to ready state. Handle only errors with a syndrome indicating the RQ might enter error state and could be recovered. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: RX, Handle CQE with error at the earliest stageSaeed Mahameed
Just to be aligned with the MPWQE handlers, handle RX WQE with error for legacy RQs in the top RX handlers, just before calling skb_from_cqe(). CQE error handling will now be called at the same stage regardless of the RQ type or netdev mode NIC, Representor, IPoIB, etc .. This will be useful for down stream patch to improve error CQE handling. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Report and recover from rx timeoutAya Levin
Add support for report and recovery from rx timeout. On driver open we post NOP work request on the rx channels to trigger napi in order to fillup the rx rings. In case napi wasn't scheduled due to a lost interrupt, perform EQ recovery. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Report and recover from CQE error on ICOSQAya Levin
Add support for report and recovery from error on completion on ICOSQ. Deactivate RQ and flush, then deactivate ICOSQ. Set the queue back to ready state (firmware) and reset the ICOSQ and the RQ (software resources). Finally, activate the ICOSQ and the RQ. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Split open/close ICOSQ into stagesAya Levin
Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into open and activate where open handles creation and activate enables the queue. Do a symmetric thing in close flow: split into close and deactivate. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add support to rx reporter diagnoseAya Levin
Add rx reporter, which supports diagnose call-back. Diagnostics output include: information common to all RQs: RQ type, RQ size, RQ stride size, CQ size and CQ stride size. In addition advertise information per RQ and its related icosq and attached CQ. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4308 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 channel ix: 1 rqn: 4313 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 channel ix: 2 rqn: 4318 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1040 HW status: 0 channel ix: 3 rqn: 4323 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1044 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp { "Common config": { "RQ": { "type": 2, "stride size": 2048, "size": 8 }, "CQ": { "stride size": 64, "size": 1024 } }, "RQs": [ { "channel ix": 0, "rqn": 4308, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1032, "HW status": 0 } },{ "channel ix": 1, "rqn": 4313, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1036, "HW status": 0 } },{ "channel ix": 2, "rqn": 4318, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1040, "HW status": 0 } },{ "channel ix": 3, "rqn": 4323, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1044, "HW status": 0 } } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add helper functions for reporter's basicsAya Levin
Introduce helper functions for create and destroy reporters and update channels. In the following patch, rx reporter is added and it will use these helpers too. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add cq info to tx reporter diagnoseAya Levin
Add cq information to general diagnose output: CQ size and stride size. Per SQ add information about the related CQ: cqn and CQ's HW status. $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common Config: SQ: stride size: 64 size: 1024 CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4307 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1030 HW status: 0 channel ix: 1 tc: 0 txq ix: 1 sqn: 4312 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1034 HW status: 0 channel ix: 2 tc: 0 txq ix: 2 sqn: 4317 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1038 HW status: 0 channel ix: 3 tc: 0 txq ix: 3 sqn: 4322 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1042 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp { "Common Config": { "SQ": { "stride size": 64, "size": 1024 }, "CQ": { "stride size": 64, "size": 1024 } }, "SQs": [ { "channel ix": 0, "tc": 0, "txq ix": 0, "sqn": 4307, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1030, "HW status": 0 } },{ "channel ix": 1, "tc": 0, "txq ix": 1, "sqn": 4312, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1034, "HW status": 0 } },{ "channel ix": 2, "tc": 0, "txq ix": 2, "sqn": 4317, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1038, "HW status": 0 } },{ "channel ix": 3, "tc": 0, "txq ix": 3, "sqn": 4322, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1042, "HW status": 0 } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Extend tx reporter diagnostics outputAya Levin
Enhance tx reporter's diagnostics output to include: information common to all SQs: SQ size, SQ stride size. In addition add channel ix, tc, txq ix, cc and pc. $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common config: SQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4307 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 1 tc: 0 txq ix: 1 sqn: 4312 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 2 tc: 0 txq ix: 2 sqn: 4317 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 3 tc: 0 txq ix: 3 sqn: 4322 HW state: 1 stopped: false cc: 0 pc: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp { "Common config": { "SQ": { "stride size": 64, "size": 1024 } }, "SQs": [ { "channel ix": 0, "tc": 0, "txq ix": 0, "sqn": 4307, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 1, "tc": 0, "txq ix": 1, "sqn": 4312, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 2, "tc": 0, "txq ix": 2, "sqn": 4317, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 3, "tc": 0, "txq ix": 3, "sqn": 4322, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Extend tx diagnose functionAya Levin
The following patches in the set enhance the diagnostics info of tx reporter. Therefore, it is better to pass a pointer to the SQ for further data extraction. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Generalize tx reporter's functionalityAya Levin
Prepare for code sharing with rx reporter, which is added in the following patches in the set. Introduce a generic error_ctx for agnostic recovery despatch. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Change naming convention for reporter's functionsAya Levin
Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. In the following patches in the set rx reporter is added, the new naming convention is more uniformed. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Rename reporter header fileAya Levin
Rename reporter.h -> health.h so patches in the set can use it for health related functionality. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net: fix __ip_mc_inc_group usageLi RongQing
in ip_mc_inc_group, memory allocation flag, not mcast mode, is expected by __ip_mc_inc_group similar issue in __ip_mc_join_group, both mcase mode and gfp_t are needed here, so use ____ip_mc_inc_group(...) Fixes: 9fb20801dab4 ("net: Fix ip_mc_{dec,inc}_group allocation context") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net/ncsi: Ensure 32-bit boundary for data cksumTerry S. Duncan
The NCSI spec indicates that if the data does not end on a 32 bit boundary, one to three padding bytes equal to 0x00 shall be present to align the checksum field to a 32-bit boundary. Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20Merge branch 'net-dsa-enable-and-disable-all-ports'David S. Miller
Vivien Didelot says: ==================== net: dsa: enable and disable all ports The DSA stack currently calls the .port_enable and .port_disable switch callbacks for slave ports only. However, it is useful to call them for all port types. For example this allows some drivers to delay the optimization of power consumption after the switch is setup. This can also help reducing the setup code of drivers a bit. The first DSA core patches enable and disable all ports of a switch, regardless their type. The last mv88e6xxx patches remove redundant code from the driver setup and the said callbacks, now that they handle SERDES power for all ports. Changes in v2: do not guard .port_disable for broadcom switches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: mv88e6xxx: wrap SERDES IRQ in power functionVivien Didelot
Now that mv88e6xxx_serdes_power is only called after driver setup, we can wrap the SERDES IRQ code directly within it for clarity. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: mv88e6xxx: enable SERDES after setupVivien Didelot
SERDES is powered on for CPU and DSA ports and powered down for unused ports at setup time. But now that DSA calls mv88e6xxx_port_enable and mv88e6xxx_port_disable for all ports, the SERDES power can now be handled after setup inconditionally for all ports. Using the port enable and disable callbacks also have the benefit to handle the SERDES IRQ for non user ports as well. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: mv88e6xxx: do not change STP state on port disablingVivien Didelot
When disabling a port, that is not for the driver to decide what to do with the STP state. This is already handled by the DSA layer. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: enable and disable all portsVivien Didelot
Call the .port_enable and .port_disable functions for all ports, not only the user ports, so that drivers may optimize the power consumption of all ports after a successful setup. Unused ports are now disabled on setup. CPU and DSA ports are now enabled on setup and disabled on teardown. User ports were already enabled at slave creation and disabled at slave destruction. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: do not enable or disable non user portsVivien Didelot
The .port_enable and .port_disable operations are currently only called for user ports, hence assuming they have a slave device. In preparation for using these operations for other port types as well, simply guard all implementations against non user ports and return directly in such case. Note that bcm_sf2_sw_suspend() currently calls bcm_sf2_port_disable() (and thus b53_disable_port()) against the user and CPU ports, so do not guards those functions. They will be called for unused ports in the future, but that was expected by those drivers anyway. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: dsa: use a single switch statement for port setupVivien Didelot
It is currently difficult to read the different steps involved in the setup and teardown of ports in the DSA code. Keep it simple with a single switch statement for each port type: UNUSED, CPU, DSA, or USER. Also no need to call devlink_port_unregister from within dsa_port_setup as this step is inconditionally handled by dsa_port_teardown on error. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20ice: Restructure VFs initialization flowsAkeem G Abodunrin
This patch restructures how VFs are configured, and resources allocated. Instead of freeing resources that were never allocated, and resetting empty VFs that have never been created - the new flow will just allocate resources for number of requested VFs based on the availability. During VFs initialization process, global interrupt is disabled, and rearmed after getting MSIX vectors for VFs. This allows immediate mailbox communications, instead of delaying it till later and VFs. PF communications resulted to using polling instead of actual interrupt. The issue manifested when creating higher number of VFs (128 VFs) per PF. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20ice: Assume that more than one Rx queue is rare in ice_napi_pollBrett Creeley
Currently we divide budget by the number of Rx queues per Rx ring container in ice_napi_poll even if there is only 1. This is an unnecessary divide for the normal case of 1 Rx ring per Rx ring container. Fix this by using an unlikely() call in the case where we actually need to divide. Also, we will always set budget_per_ring even if there are no Rx rings in the Rx ring container so we don't need to initialize it to 0. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20ice: Use the software based tail when checking for hung Tx ringBrett Creeley
Currently in ice_get_tx_pending we try to read a Tx ring's tail. This is then compared with the software based head (next_to_clean) to determine if we have pending work. This will never work because reading of the Tx ring's tail is no longer supported. Fix this by using the software based tail (next_to_use) to determine if there is pending work. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20net/smc: make sure EPOLLOUT is raisedJason Baron
Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE being set when -EAGAIN is returned to ensure that EPOLLOUT is raised. Expand the setting of SOCK_NOSPACE to non-blocking sockets as well that can use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior that Eric Dumazet introduced for tcp sockets. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.ibm.com> Cc: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20r8152: divide the tx and rx bottom functionsHayes Wang
Move the tx bottom function from NAPI to a new tasklet. Then, for multi-cores, the bottom functions of tx and rx may be run at same time with different cores. This is used to improve performance. On x86, Tx/Rx 943/943 Mbits/sec -> 945/944. For arm platform, Tx/Rx: 917/917 Mbits/sec -> 933/933. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - a few regression fixes for wacom driver (including fix for my earlier mismerge) from Aaron Armstrong Skomra and Jason Gerecke - revert of a few Logitech device ID additions which turn out to not work perfectly with the hidpp driver at the moment; proper support is now scheduled for 5.4. Fixes from Benjamin Tissoires - scheduling-in-atomic fix for cp2112 driver, from Benjamin Tissoires - new device ID to intel-ish, from Even Xu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: wacom: correct misreported EKR ring values HID: cp2112: prevent sleeping function called from invalid context HID: intel-ish-hid: ipc: add EHL device id HID: wacom: Correct distance scale for 2nd-gen Intuos devices HID: logitech-hidpp: remove support for the G700 over USB Revert "HID: logitech-hidpp: add USB PID for a few more supported mice" HID: wacom: add back changes dropped in merge commit
2019-08-20infiniband: hfi1: fix memory leaksWenwen Wang
In fault_opcodes_write(), 'data' is allocated through kcalloc(). However, it is not deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566154486-3713-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20infiniband: hfi1: fix a memory leak bugWenwen Wang
In fault_opcodes_read(), 'data' is not deallocated if debugfs_file_get() fails, leading to a memory leak. To fix this bug, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566156571-4335-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/mlx4: Fix memory leaksWenwen Wang
In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through kcalloc(). However, it is not always deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, free 'tun_qp->tx_ring' whenever an error occurs. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Acked-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/1566159781-4642-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20RDMA/cma: fix null-ptr-deref Read in cma_cleanupzhengbin
In cma_init, if cma_configfs_init fails, need to free the previously memory and return fail, otherwise will trigger null-ptr-deref Read in cma_cleanup. cma_cleanup cma_configfs_exit configfs_unregister_subsystem Fixes: 045959db65c6 ("IB/cma: Add configfs for rdma_cm") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Link: https://lore.kernel.org/r/1566188859-103051-1-git-send-email-zhengbin13@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/mlx5: Block MR WR if UMR is not possibleMoni Shoua
Check conditions that are mandatory to post_send UMR WQEs. 1. Modifying page size. 2. Modifying remote atomic permissions if atomic access is required. If either condition is not fulfilled then fail to post_send() flow. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-9-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/mlx5: Fix MR re-registration flow to use UMR properlyMoni Shoua
The UMR WQE in the MR re-registration flow requires that modify_atomic and modify_entity_size capabilities are enabled. Therefore, check that the these capabilities are present before going to umr flow and go through slow path if not. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-8-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/mlx5: Report and handle ODP support properlyMoni Shoua
ODP depends on the several device capabilities, among them is the ability to send UMR WQEs with that modify atomic and entity size of the MR. Therefore, only if all conditions to send such a UMR WQE are met then driver can report that ODP is supported. Use this check of conditions in all places where driver needs to know about ODP support. Also, implicit ODP support depends on ability of driver to send UMR WQEs for an indirect mkey. Therefore, verify that all conditions to do so are met when reporting support. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/mlx5: Consolidate use_umr checks into single functionMoni Shoua
Introduce helper function to unify various use_umr checks. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-6-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20RDMA/restrack: Rewrite PID namespace check to be reliableLeon Romanovsky
task_active_pid_ns() is wrong API to check PID namespace because it posses some restrictions and return PID namespace where the process was allocated. It created mismatches with current namespace, which can be different. Rewrite whole rdma_is_visible_in_pid_ns() logic to provide reliable results without any relation to allocated PID namespace. Fixes: 8be565e65fa9 ("RDMA/nldev: Factor out the PID namespace check") Fixes: 6a6c306a09b5 ("RDMA/restrack: Make is_visible_in_pid_ns() as an API") Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20RDMA/counters: Properly implement PID checksLeon Romanovsky
"Auto" configuration mode is called for visible in that PID namespace and it ensures that all counters and QPs are coexist in the same namespace and belong to same PID. Fixes: 99fa331dc862 ("RDMA/counter: Add "auto" configuration mode support") Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/core: Fix NULL pointer dereference when bind QP to counterIdo Kalir
If QP is not visible to the pid, then we try to decrease its reference count and return from the function before the QP pointer is initialized. This lead to NULL pointer dereference. Fix it by pass directly the res to the rdma_restract_put as arg instead of &qp->res. This fixes below call trace: [ 5845.110329] BUG: kernel NULL pointer dereference, address: 00000000000000dc [ 5845.120482] Oops: 0002 [#1] SMP PTI [ 5845.129119] RIP: 0010:rdma_restrack_put+0x5/0x30 [ib_core] [ 5845.169450] Call Trace: [ 5845.170544] rdma_counter_get_qp+0x5c/0x70 [ib_core] [ 5845.172074] rdma_counter_bind_qpn_alloc+0x6f/0x1a0 [ib_core] [ 5845.173731] nldev_stat_set_doit+0x314/0x330 [ib_core] [ 5845.175279] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [ 5845.176772] ? __kmalloc_node_track_caller+0x20b/0x2b0 [ 5845.178321] rdma_nl_rcv+0xcb/0x120 [ib_core] [ 5845.179753] netlink_unicast+0x179/0x220 [ 5845.181066] netlink_sendmsg+0x2d8/0x3d0 [ 5845.182338] sock_sendmsg+0x30/0x40 [ 5845.183544] __sys_sendto+0xdc/0x160 [ 5845.184832] ? syscall_trace_enter+0x1f8/0x2e0 [ 5845.186209] ? __audit_syscall_exit+0x1d9/0x280 [ 5845.187584] __x64_sys_sendto+0x24/0x30 [ 5845.188867] do_syscall_64+0x48/0x120 [ 5845.190097] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1bd8e0a9d0fd1 ("RDMA/counter: Allow manual mode configuration support") Signed-off-by: Ido Kalir <idok@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Drop stale TID RDMA packets that cause TIDErrKaike Wan
In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order. A stale TID RDMA data packet could lead to TidErr if the TID entries have been released by duplicate data packets generated from retries, and subsequently erroneously force the qp into error state in the current implementation. Since the payload has already been dropped by hardware, the packet can be simply dropped and it is no longer necessary to put the qp into error state. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192058.105923.72324.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packetKaike Wan
In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA WRITE DATA packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: d72fe7d5008b ("IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192051.105923.69979.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Add additional checks when handling TID RDMA READ RESP packetKaike Wan
In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA READ RESP packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192045.105923.59813.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>