2016-05-13RDMA/iw_cxgb4: atomic find and reference for listening endpointsHariprasad S
Add get_ep_from_stid() which will atomically find and reference the endpoint struct if found. This avoids touch-after-free races between threads destroying listening endpoints and the CPL processing thread processing an incoming PASS_ACCEPT_REQ CPL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: Handle ULP accept/reject during ABORTINGHariprasad S
c4iw_reject() and c4iw_accept() need to handle the case where the endpoint has timed out and is in the middle of ABORTING the connection. Here is the flow that causes the BUG_ON() to fire on the server side: 1) offload connection setup and endpoint timer started 2) MPA_START request received from peer, CONNECT_REQUEST passed to ULP 3) endpoint timer fires, and process_timeout() aborts the connection, this moves the endpoint state to ABORTING until HW sends up the ABORT_RPL_RSS. 4) application exits closing the CONNECT_REQUEST cm_id. The IWCM calls c4iw_reject_cr() to destroy this connection request. 5) WHAMO: BUG_ON() because the state is ABORTING. The fix is to change c4iw_reject_cr() and c4iw_accept_cr() to fail the operation whenever the state is not MPA_REQ_RCVD, rather than only when it is DEAD. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
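The state check in this commit message can be modeled outside the kernel. The Python sketch below is an illustration only, not the driver code: the `EpState` enum, both function names, the `RuntimeError` standing in for BUG_ON(), and the `-ECONNRESET` return value are assumptions made for the example.

```python
import errno
from enum import Enum, auto


class EpState(Enum):
    """Subset of the endpoint states named in the commit message."""
    MPA_REQ_RCVD = auto()
    ABORTING = auto()
    DEAD = auto()


def reject_cr_old(state):
    """Old check: only DEAD fails cleanly; any other unexpected state is fatal."""
    if state is EpState.DEAD:
        return -errno.ECONNRESET  # assumed error code
    if state is not EpState.MPA_REQ_RCVD:
        # Stand-in for BUG_ON(state != MPA_REQ_RCVD)
        raise RuntimeError("BUG_ON: unexpected endpoint state")
    return 0


def reject_cr_new(state):
    """New check: fail the operation for every state except MPA_REQ_RCVD."""
    if state is not EpState.MPA_REQ_RCVD:
        return -errno.ECONNRESET  # assumed error code
    return 0
```

With the old check, an endpoint in ABORTING reaches the fatal branch; with the new check it simply gets an error return, as does DEAD.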
2016-05-13RDMA/iw_cxgb4: Release ep for FPDU_MODE and MPA_REQ_RCVD in process_timeoutHariprasad S
ARP failure may also happen when the ep is in FPDU_MODE, and these failures need to be handled by process_timeout(). process_timeout() also has to handle the MPA_REQ_RCVD case, setting abort to 1, which leads to ep resource release. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: Free skb in case of arp failure in _c4iw_free_ep()Hariprasad S
Arp failure for send_mpa_reply/reject() is handled by freeing the mpa_skb in c4iw_free_ep() before releasing ep. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: atomically lookup ep and get a referenceHariprasad S
There is a race between ULP threads calling c4iw_ep_disconnect() via c4iw_modify_rc_qp() and the ingress CPL thread where the ULP thread can free the endpoint just after the ingress CPL thread finds the ep pointer in the tid table. To avoid this, we now use the hwtid_idr table for lookups instead of the LLD tid table so we can lock around insert, remove, and lookup+get_ep to avoid the race. The CPL handlers now will either find the ep ptr and have a ref on it, or not find it and they can discard the CPL. Callers of get_ep_from_tid() will have a ref on the ep if found, and thus must deref when they are done. Negative advice in peer_abort_intr() needs to dereference the ep. Therefore peer_abort() is scheduled to dereference the ep later. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
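The lookup-plus-reference pattern described in this commit message can be sketched in userspace. The Python model below is a simplified illustration under stated assumptions: `Endpoint`, `EndpointTable`, and the `get()`/`put()` methods are hypothetical names, and a `threading.Lock` stands in for the kernel locking around the table. Only `get_ep_from_tid` is a name taken from the message.

```python
import threading


class Endpoint:
    """Toy endpoint carrying a lock-protected reference count."""

    def __init__(self, tid):
        self.tid = tid
        self.refcount = 1  # initial reference, owned by the table
        self._ref_lock = threading.Lock()

    def get(self):
        with self._ref_lock:
            self.refcount += 1

    def put(self):
        with self._ref_lock:
            self.refcount -= 1
            return self.refcount


class EndpointTable:
    """Hypothetical tid table: one lock guards insert, remove and lookup."""

    def __init__(self):
        self._lock = threading.Lock()
        self._eps = {}

    def insert(self, ep):
        with self._lock:
            self._eps[ep.tid] = ep

    def remove(self, tid):
        with self._lock:
            ep = self._eps.pop(tid, None)
        if ep is not None:
            ep.put()  # drop the table's reference
        return ep

    def get_ep_from_tid(self, tid):
        """Find the endpoint and take a reference in one critical section."""
        with self._lock:
            ep = self._eps.get(tid)
            if ep is not None:
                ep.get()
            return ep
```

Because the reference is taken while the table lock is held, a concurrent `remove()` cannot drop the last reference between the lookup and the `get()`. A caller that receives a non-None endpoint owns one reference and must call `put()` when done.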
2016-05-13RDMA/iw_cxgb4: Handle return value of c4iw_ofld_send() in abort_arp_failure()Hariprasad S
In abort_arp_failure(), the return value from c4iw_ofld_send() is ignored and thus if the CPL isn't sent, the endpoint is stuck and never gets aborted. Failure of c4iw_ofld_send() is treated as fatal error, and the ep resources are released in a safer context through process_work(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: in process_timeout() don't move ep state to ABORTINGHariprasad S
Moving the state to ABORTING causes the ep to get stuck because c4iw_ep_timeout() thinks the ABORT has already been done. So leave the state alone and let c4iw_ep_disconnect() do the right thing given the ep state. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: handle return value of c4iw_l2t_send() and send_mpa_req()Hariprasad S
-> In act_open_rpl(), the CPL_ERR_TCAM_FULL error handling branch does not handle the return value of send_fw_act_open_req(). -> In send_fw_act_open_req(), the return value of c4iw_l2t_send() is not handled, which may cause an ep leak and fail to notify upper layers of a connection establishment failure. -> send_mpa_req() should act on the return from c4iw_l2t_send() and return the error to the caller. -> If c4iw_l2t_send() fails in send_mpa_req(), it returns without starting the timer or changing the ep state; this is then handled by act_establish(). -> In act_establish(), if get_skb() in send_mpa_req() returns an error, an ep leak may result, so handle the return value of send_mpa_req(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiationHariprasad S
->Stop the ep timer after MPA negotiation so that the arp failures during send_mpa_reply/reject will be handled by process_timeout() after the ep timer expires. ->Added case MPA_REP_SENT in process_timeout(). ->For MPA reject, c4iw_ep_disconnect tries to start an already started timer, which leads to warning message "timer already started". -> In case of mpa reject stop the timer and call send_mpa_reject(). -> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer only for send_mpa_reply(), which is set in c4iw_accept_cr(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: Do not stop timer in case of incomplete messagesHariprasad S
In case of incomplete mpa messages we should not stop timer as it results in return with timeout for the next mpa message Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: parent_ep has to be dereferenced in case of passive accept failureHariprasad S
-> On the passive side of a connection, the parent_ep referenced during the connection request has to be dereferenced on passive accept failure. -> As the passive accept failure error handling logic runs in atomic context, the parent ep is dereferenced by scheduling a work request. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: set the correct FID value in DSGL commandsHariprasad S
The FID value in a ULP_MEMIO command needs to be set to an IQ ID of a queue configured for our PF. The FID/IQ id is used to index into the PCIE FID table, to find out on which function the DMA needs to be issued. Essentially, every DMA needs to have the ingress queue. The exact ingress queue doesn't matter, but it needs to be an ingress queue associated with the function you want to see the DMA on. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: Correct RFC number of MPAHariprasad S
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13RDMA/iw_cxgb4: Add few history bits for epHariprasad S
- add EP_DISC_FAIL history bit - add QP_REFED/DEREFED history bits - Add functions to ref/deref the cm_id and add history bit for the same - add CLOSE_CON_RPL history Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds
Pull cgroup fixes from Tejun Heo: "During v4.6-rc1 cgroup namespace support was merged. There is an issue where it's impossible to tell whether a given cgroup mount point is bind mounted or namespaced. Serge has been working on the issue but it took longer than expected to resolve, so the late pull request. Given that it's a completely new feature and the patches don't touch anything else, the risk seems acceptable. However, if this is too late, an alternative is plugging new cgroup ns creation for v4.6 and retrying for v4.7" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix compile warning kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces kernfs_path_from_node_locked: don't overwrite nlen
2016-05-13Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue fix from Tejun Heo: "CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding DOWN_PREPARE which can trigger a WARN_ON() in workqueue. The bug has been there for a very long time. It only triggers if CPU down fails at a specific point and I don't think it has adverse effects other than the warning messages. The fix is very low impact" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix rebind bound workers warning
2016-05-13e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctlJacob Keller
The e1000e_config_hwtstamp function was incorrectly resetting the SYSTIM registers every time the ioctl was being run. If you happened to be running ptp4l and lost the PTP connection (removing cable, or blocking the UDP traffic for example), then ptp4l will eventually perform a restart which involves re-requesting timestamp settings. In e1000e this has the unfortunate and incorrect result of resetting SYSTIM to the kernel time. Since kernel time is usually in UTC, and PTP time is in TAI, this results in the leap second being re-applied. Fix this by extracting the SYSTIM reset out into its own function, e1000e_ptp_reset, which we call during reset to restore the hardware registers. This function will (a) restart the timecounter based on the new system time, (b) restore the previous PPB setting, and (c) restore the previous hwtstamp settings. In order to perform (b), I had to modify the adjfreq ptp function pointer to store the old delta each time it is called. This also has the side effect of restoring the base timinca register correctly. The driver does not need to explicitly zero the ptp_delta variable since the entire adapter structure comes zero-initialized. Reported-by: Brian Walsh <brian@walsh.ws> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Brian Walsh <brian@walsh.ws> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13igb/igbvf: Add support for GSO partialAlexander Duyck
This patch adds support for partial GSO segmentation in the case of tunnels. Specifically with this change the driver can perform segmentation as long as the frame either has IPv6 inner headers, or we are allowed to mangle the IP IDs on the inner header. This is needed because we will not be modifying any fields from the start of the outer transport header to the start of the inner transport header as we are treating them like they are just a block of IP options. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13e1000e: mark shifted values as unsignedJacob Keller
The E1000_ICH_NVM_SIG_MASK value is shifted out to the 31st bit, which is the sign bit for signed constants. Mark these values as unsigned to prevent compiler warnings and issues on platforms which have a different sign bit implementation. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13e1000e: use BIT() macro for bit definesJacob Keller
This prevents signed bitshift issues when the shift would overwrite the signed bit, and prevents making this mistake in the future when copying and modifying code. Use GENMASK or the unsigned postfix for cases which aren't suitable for BIT() macro. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
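The sign-bit problem that BIT() and GENMASK avoid can be illustrated with a small userspace model. This Python sketch is an assumption-laden illustration: `BIT` and `GENMASK` here are plain functions that mimic the unsigned results of the kernel macros, and `as_int32`/`as_uint32` are hypothetical helpers that reinterpret a value as a 32-bit two's-complement quantity. In C, shifting a signed 1 into bit 31 is not well defined; the model shows only the outcome commonly seen on two's-complement machines.

```python
import ctypes


def BIT(n):
    """Unsigned single-bit mask, mimicking the result of the BIT() macro."""
    return 1 << n


def GENMASK(high, low):
    """Contiguous mask covering bits low..high inclusive (0 <= low <= high)."""
    return ((1 << (high - low + 1)) - 1) << low


def as_int32(value):
    """Reinterpret the low 32 bits as a signed two's-complement integer."""
    return ctypes.c_int32(value & 0xFFFFFFFF).value


def as_uint32(value):
    """Reinterpret the low 32 bits as an unsigned integer."""
    return ctypes.c_uint32(value & 0xFFFFFFFF).value
```

Viewed as a signed 32-bit value, bit 31 alone is a negative number, which is what makes a signed `1 << 31` constant hazardous in comparisons and widening conversions. Viewed as unsigned, the same bit pattern is simply 0x80000000.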
2016-05-13igbvf: use BIT() macro instead of shiftsJacob Keller
To prevent signed bitshift issues, and improve code readability, use the BIT() macro. Also make use of GENMASK or the unsigned postfix where this is more appropriate than BIT() Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13igbvf: remove unused variable and dead codeJacob Keller
The variable rdlen is set but never used, and thus setting it is dead code. Remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13igb: adjust PTP timestamps for Tx/Rx latencyNathan Sullivan
Table 7-62 on page 338 of the i210 datasheet lists TX and RX latencies for the various speeds the chip supports. To give better PTP timestamp accuracy, adjust the timestamps by the amounts Intel gives based on current link speed. Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
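A per-speed latency correction of this kind can be sketched as a table lookup. The Python sketch below is hypothetical throughout: the latency numbers are invented placeholders (the real ones come from the datasheet table cited above), the function names are made up, and the direction of each correction (add on transmit, subtract on receive) is an assumption rather than something stated in the commit message.

```python
# Placeholder latencies in nanoseconds, keyed by link speed in Mb/s.
# These numbers are invented for illustration only.
HYPOTHETICAL_LATENCY_NS = {
    10: {"tx": 9000, "rx": 20000},
    100: {"tx": 1000, "rx": 2000},
    1000: {"tx": 100, "rx": 400},
}


def adjust_tx_timestamp(hw_ns, speed):
    """Assumed direction: the frame reaches the wire after the hardware stamp."""
    return hw_ns + HYPOTHETICAL_LATENCY_NS[speed]["tx"]


def adjust_rx_timestamp(hw_ns, speed):
    """Assumed direction: the frame left the wire before the hardware stamp."""
    return hw_ns - HYPOTHETICAL_LATENCY_NS[speed]["rx"]
```

Keying the table on the current link speed means a renegotiated link picks up the matching correction without any other state.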
2016-05-13ACPI / video: mark acpi_video_get_levels() inlineArnd Bergmann
A recent patch added a stub function for acpi_video_get_levels when CONFIG_ACPI_VIDEO is disabled. However, this is marked as 'static' and causes a warning about an unused function wherever the header gets included: In file included from ../drivers/gpu/drm/radeon/radeon_acpi.c:28:0: include/acpi/video.h:74:12: error: 'acpi_video_get_levels' defined but not used [-Werror=unused-function] This makes the declaration 'static inline', which gets rid of the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 059500940def (ACPI/video: export acpi_video_get_levels) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-13e1000e: e1000e_cyclecounter_read(): do overflow check only if neededDenys Vlasenko
SYSTIMH:SYSTIML registers are incremented by 24-bit value TIMINCA[23..0]. er32(SYSTIML) reads are probably moderately expensive (they are PCI bus reads). Can we avoid one of them? Yes, we can. If the SYSTIML value we see is smaller than 0xff000000, the overflow into SYSTIMH would require at least two increments. We do two reads, er32(SYSTIML) and er32(SYSTIMH), in this order. Even if one increment happens between them, the overflow into SYSTIMH is impossible, and we can avoid doing another er32(SYSTIML) read and overflow check. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13e1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow checkDenys Vlasenko
If two consecutive reads of the counter are the same, it is also not an overflow. "systimel_1 < systimel_2" should be "systimel_1 <= systimel_2". Before the patch, we could perform an *erroneous* correction: Let's say that systimel_1 == systimel_2 == 0xffffffff. "systimel_1 < systimel_2" is false, we think it's an overflow, we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented, and use "(systimeh << 32) + systimel_2" value which is 2^32 too large. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: intel-wired-lan@lists.osuosl.org Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
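The read sequence discussed in the e1000e_cyclecounter_read() commits can be simulated with a fake register pair. This Python sketch is a model under stated assumptions, not the driver code: `FakeSystim` is a hypothetical 64-bit counter that advances by a fixed step after every register read, and `cyclecounter_read` combines the "skip the extra read below 0xff000000" idea with the corrected comparison.

```python
OVERFLOW_POSSIBLE_FROM = 0xFF000000  # threshold quoted in the commit message
MASK64 = 0xFFFFFFFFFFFFFFFF


class FakeSystim:
    """Hypothetical counter that advances by `step` after every read."""

    def __init__(self, value, step):
        self.value = value
        self.step = step
        self.reads = 0

    def _advance(self):
        self.reads += 1
        self.value = (self.value + self.step) & MASK64

    def read_systiml(self):
        low = self.value & 0xFFFFFFFF
        self._advance()
        return low

    def read_systimh(self):
        high = self.value >> 32
        self._advance()
        return high


def cyclecounter_read(hw):
    systiml = hw.read_systiml()
    systimh = hw.read_systimh()
    # Only pay for the extra read when the low word is close to wrapping.
    if systiml >= OVERFLOW_POSSIBLE_FROM:
        systiml_2 = hw.read_systiml()
        # Equal reads are not an overflow: the no-overflow condition is
        # systiml <= systiml_2, so correct only when systiml > systiml_2.
        if systiml > systiml_2:
            systimh = hw.read_systimh()
            systiml = systiml_2
    return (systimh << 32) | systiml
```

When the low word wraps between the first two reads, the function rereads the high word and pairs it with the second low-word value. When the counter is static at 0xffffffff, the two low-word reads are equal and no correction is applied, which is the case the `<=` fix addresses.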
2016-05-13e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64Denys Vlasenko
"incvalue" variable holds a result of "er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK" and used in "do_div(temp, incvalue)" as a divisor. Thus, "u64 incvalue" declaration is probably a mistake. Even though it seems to be a harmless one, let's fix it. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13Merge tag 'qcom-soc-for-4.7-2' into net-nextBjorn Andersson
This merges the Qualcomm SOC tree with the net-next, solving the merge conflict in the SMD API between the two. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-05-13igb: make igb_update_pf_vlvf staticJacob Keller
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13igb: use BIT() macro or unsigned prefixJacob Keller
For bitshifts, we should make use of the BIT macro when possible, and ensure that other bitshifts are marked as unsigned. This helps prevent signed bitshift errors, and ensures similar style. Make use of GENMASK and the unsigned postfix where BIT() isn't appropriate. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13e1000e: Cleanup consistency in ret_val variable usageBrian Walsh
Fixed the file to use a consistent ret_val for return value checking. Signed-off-by: Brian Walsh <brian@walsh.ws> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13e1000e: fix ethtool autoneg off for non-copperSteve Shih
This patch fixes the issues for disabling auto-negotiation and forcing speed and duplex settings for the non-copper media. For non-copper media, e1000_get_settings should return ETH_TP_MDI_INVALID for eth_tp_mdix_ctrl instead of ETH_TP_MDI_AUTO so subsequent e1000_set_settings call would not fail with -EOPNOTSUPP. e1000_set_spd_dplx should not automatically turn autoneg back on for forced 1000 Mbps full duplex settings for non-copper media. Cc: xe-kernel@external.cisco.com Cc: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Steve Shih <sshih@cisco.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13Merge branch 'for-4.6-fixes' into for-4.7Tejun Heo
2016-05-13MIPS: CM: Fix compilation error when !MIPS_CMTony Wu
Fix mips_cm_lock_other compilation error when MIPS_CM is not selected. This was introduced in commit 23d5de8efb9a (MIPS: CM: Introduce core-other locking functions) Signed-off-by: Tony Wu <tung7970@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/11698/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-05-13MIPS: Fix genvdso error on rebuildJames Hogan
The genvdso program modifies the debug and stripped versions of the VDSOs in place, and errors if the modification has already taken place. Unfortunately this means that a rebuild which tries to rerun genvdso to generate vdso*-image.c without also rebuilding vdso.so.dbg (for example if genvdso.c is modified) hits a build error like this: arch/mips/vdso/genvdso 'arch/mips/vdso/vdso.so.dbg' already contains a '.MIPS.abiflags' section This is fixed by reorganising the rules such that unmodified .so files have a .raw suffix, and these are copied in the same rule that runs genvdso on the copies. I.e. previously we had: cmd_vdsold: link objects -> vdso.so.dbg cmd_genvdso: strip vdso.so.dbg -> vdso.so run genvdso -> vdso-image.c and modify vdso.so.dbg and vdso.so in place Now we have: cmd_vdsold: link objects -> vdso.so.dbg.raw a new cmd_objcopy based strip rule (inspired by ARM): strip vdso.so.dbg.raw -> vdso.so.raw cmd_genvdso: copy vdso.so.dbg.raw -> vdso.so.dbg copy vdso.so.raw -> vdso.so run genvdso -> vdso-image.c and modify vdso.so.dbg and vdso.so in place Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13250/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-05-13ring-buffer: Prevent overflow of size in ring_buffer_resize()Steven Rostedt (Red Hat)
If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE then the DIV_ROUND_UP() will return zero. Here are the details: # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb tracing_entries_write() processes this and converts kb to bytes. 18014398509481980 << 10 = 18446744073709547520 and this is passed to ring_buffer_resize() as unsigned long size. size = DIV_ROUND_UP(size, BUF_PAGE_SIZE); Where DIV_ROUND_UP(a, b) is (a + b - 1)/b. BUF_PAGE_SIZE is 4080 and here 18446744073709547520 + 4080 - 1 = 18446744073709551599, where 18446744073709551599 is still smaller than 2^64: 2^64 - 18446744073709551599 = 17. But now 18446744073709551599 / 4080 = 4521260802379792 and size = size * 4080 = 18446744073709551360. This is checked to make sure it's still greater than 2 * 4080, which it is. Then we convert to the number of buffer pages needed. nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE), but this time size is 18446744073709551360 and 2^64 - (18446744073709551360 + 4080 - 1) = -3823. Thus it overflows and the resulting number is less than 4080, which makes 3823 / 4080 = 0, and nr_pages is set to this. As we already checked against the minimum that nr_pages may be, this causes the logic to fail as well, and we crash the kernel. There's no reason to have the two DIV_ROUND_UP() (that's just the result of historical code changes), clean up the code and fix this bug. Cc: stable@vger.kernel.org # 3.5+ Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
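The arithmetic in this commit message can be replayed with 64-bit wraparound made explicit. The Python sketch below is a userspace model, not the kernel code: `MASK64` emulates a 64-bit unsigned long, the two function names are hypothetical, and the single-rounding variant (including its minimum of two pages) is only an assumption about what the cleaned-up logic might look like.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned long
BUF_PAGE_SIZE = 4080    # value quoted in the commit message


def div_round_up(a, b):
    """(a + b - 1) / b with the addition wrapping at 64 bits."""
    return ((a + b - 1) & MASK64) // b


def nr_pages_double_rounding(size):
    """Round to pages, convert back to bytes, then round to pages again."""
    size = div_round_up(size, BUF_PAGE_SIZE)
    size = (size * BUF_PAGE_SIZE) & MASK64
    return div_round_up(size, BUF_PAGE_SIZE)


def nr_pages_single_rounding(size):
    """Assumed cleanup: round once and clamp to a minimum of two pages."""
    return max(div_round_up(size, BUF_PAGE_SIZE), 2)


# The value written to buffer_size_kb, converted from kb to bytes.
size_bytes = (18014398509481980 << 10) & MASK64
```

For `size_bytes`, the first rounding survives because the sum is still 17 short of 2^64, but the second one wraps to 3823 and yields zero pages. A single rounding gives 4521260802379792 pages for the same input. Sizes within BUF_PAGE_SIZE of 2^64 would still wrap in the single addition, so this sketch does not model any upper-bound check.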
2016-05-13Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull scheduler fix from Ingo Molnar: "This is a revert to fix an interactivity problem. The proper fixes for the problems that the reverted commit exposed are now in sched/core (consisting of 3 patches), but were too risky for v4.6 and will arrive in the v4.7 merge window" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "sched/fair: Fix fairness issue on migration"
2016-05-13Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull perf fixes from Ingo Molnar: "An uncharacteristically large number of bugs popped up in the last week: - various tooling fixes, two crashes and build problems - two Intel PT fixes - a KNL uncore driver fix - an Intel PMU driver fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf stat: Fallback to user only counters when perf_event_paranoid > 1 perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback() perf evsel: Improve EPERM error handling in open_strerror() tools lib traceevent: Do not reassign parg after collapse_tree() perf probe: Check if dwarf_getlocations() is available perf dwarf: Guard !x86_64 definitions under #ifdef else clause perf tools: Use readdir() instead of deprecated readdir_r() perf thread_map: Use readdir() instead of deprecated readdir_r() perf script: Use readdir() instead of deprecated readdir_r() perf tools: Use readdir() instead of deprecated readdir_r() perf/core: Disable the event on a truncated AUX record perf/x86/intel/pt: Generate PMI in the STOP region as well perf/x86: Fix undefined shift on 32-bit kernels perf/x86/msr: Fix SMI overflow perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform perf diff: Fix duplicated output column
2016-05-13i40iw: pass hw_stats by reference rather than by valueColin Ian King
Passing hw_stats by value requires a 280-byte copy; passing it by reference instead is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13i40iw: Remove unnecessary synchronize_irq() before free_irq()Lars-Peter Clausen
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13i40iw: constify i40iw_vf_cqp_ops structureJulia Lawall
The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13i40e: constify i40e_client_ops structureJulia Lawall
The i40e_client_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Do not register memory if never_register has been setBart Van Assche
This makes it easier to test the code path that does not use memory registration (srp_map_sg_dma()). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Prevent mapping failuresBart Van Assche
If both max_sectors and the queue_depth are high enough it can happen that the MR pool is depleted temporarily. This causes the SRP initiator to report mapping failures. Although the SRP initiator recovers from such mapping failures, prevent them by allocating more memory regions. Additionally, only enable memory registration if at least two pages can be registered per memory region. Reported-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Swap two code blocks in srp_add_one()Bart Van Assche
This patch does not change any functionality but makes the next patch in this series easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Enhance ib_map_mr_sg()Bart Van Assche
The SRP initiator allows max_sectors to be set to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Fix srp_create_target() error handlingBart Van Assche
Avoid that the following kernel oops occurs if memory pool allocation fails: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa048d0a0>] ib_drain_rq+0x0/0x20 [ib_core] Call Trace: [<ffffffffa04af386>] srp_create_target+0xca6/0x13a9 [ib_srp] [<ffffffff813cc863>] dev_attr_store+0x13/0x20 [<ffffffff81214b50>] sysfs_kf_write+0x40/0x50 [<ffffffff81213f1c>] kernfs_fop_write+0x13c/0x180 [<ffffffff81197683>] __vfs_write+0x23/0xf0 [<ffffffff81198744>] vfs_write+0xa4/0x1a0 [<ffffffff81199a44>] SyS_write+0x44/0xa0 [<ffffffff8159e3e9>] entry_SYSCALL_64_fastpath+0x1c/0xac Fixes: 1dc7b1f10dcb ("IB/srp: use the new CQ API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: <stable@vger.kernel.org> # v4.5+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Fix a memory descriptor leak in an error pathBart Van Assche
If an error occurs after srp_fr_pool_get() succeeded and before the descriptor is stored in srp_map_state (*state->fr.next++ = desc) then srp_unmap_data() won't free the newly allocated memory descriptor. Hence free the descriptor explicitly. Fixes: f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Sagi Grimberg <sai@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Print "ib_srp: " prefix onceBart Van Assche
pr_debug() already prints prefix PFX. Avoid that PFX is printed twice if the debug statement in srp_add_target() is enabled. Fixes: 34aa654ecb8e ("IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/isert: convert to the generic RDMA READ/WRITE APIChristoph Hellwig
Replace the homegrown RDMA READ/WRITE code in isert with the generic API, which also adds iWarp support to the I/O path as a side effect. Note that full iWarp operation will need a few additional patches from Steve. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>