2019-08-27rxrpc: Use skb_unshare() rather than skb_cow_data()David Howells
The in-place decryption routines in AF_RXRPC's rxkad security module currently call skb_cow_data() to make sure the data isn't shared and that the skb can be written over. This has a problem, however, as the softirq handler may still be holding a ref, or the Rx ring may be holding multiple refs, when skb_cow_data() is called in rxkad_verify_packet() - and so skb_shared() returns true and __pskb_pull_tail() dislikes that. If this occurs, something like the following report will be generated:
kernel BUG at net/core/skbuff.c:1463!
...
RIP: 0010:pskb_expand_head+0x253/0x2b0
...
Call Trace:
 __pskb_pull_tail+0x49/0x460
 skb_cow_data+0x6f/0x300
 rxkad_verify_packet+0x18b/0xb10 [rxrpc]
 rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
 rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
 afs_extract_data+0x51/0x2d0 [kafs]
 afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
 afs_deliver_to_call+0xac/0x430 [kafs]
 afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
 afs_make_call+0x282/0x3f0 [kafs]
 afs_fs_fetch_data+0x164/0x300 [kafs]
 afs_fetch_data+0x54/0x130 [kafs]
 afs_readpages+0x20d/0x340 [kafs]
 read_pages+0x66/0x180
 __do_page_cache_readahead+0x188/0x1a0
 ondemand_readahead+0x17d/0x2e0
 generic_file_read_iter+0x740/0xc10
 __vfs_read+0x145/0x1a0
 vfs_read+0x8c/0x140
 ksys_read+0x4a/0xb0
 do_syscall_64+0x43/0xf0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by using skb_unshare() instead in the input path for DATA packets that have a security index != 0. Non-DATA packets don't need in-place encryption and neither do unencrypted DATA packets. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Julian Wollrath <jwollrath@web.de> Signed-off-by: David Howells <dhowells@redhat.com>
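A minimal sketch of the approach described above, assuming rxrpc's private skb data layout (the sp->hdr fields) at the input-path hook; this is illustrative, not the literal upstream diff:
  /* Unshare secured DATA packets on input so that later in-place
   * decryption never operates on an skb with other refs held. */
  if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
      sp->hdr.securityIndex != 0) {
          struct sk_buff *nskb = skb_unshare(skb, GFP_ATOMIC);

          if (!nskb)
                  return;         /* allocation failed: drop the packet */
          skb = nskb;             /* skb_unshare() consumed the old ref */
  }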
2019-08-27rxrpc: Use the tx-phase skb flag to simplify tracingDavid Howells
Use the previously-added transmit-phase skbuff private flag to simplify the socket buffer tracing a bit. Which phase the skbuff comes from can now be divined from the skb rather than having to be guessed from the call state. We can also reduce the number of rxrpc_skb_trace values by eliminating the difference between Tx and Rx in the symbols. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Add a private skb flag to indicate transmission-phase skbsDavid Howells
Add a flag in the private data on an skbuff to indicate that this is a transmission-phase buffer rather than a receive-phase buffer. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Abstract out rxtx ring cleanupDavid Howells
Abstract out rxtx ring cleanup into its own function from its two callers. This makes it easier to apply the same changes to both. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Pass the input handler's data skb reference to the Rx ringDavid Howells
Pass the reference held on a DATA skb in the rxrpc input handler into the Rx ring rather than getting an additional ref for this and then dropping the original ref at the end. Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Use info in skbuff instead of reparsing a jumbo packetDavid Howells
Use the information now cached in the skbuff private data to avoid the need to reparse a jumbo packet. We can find all the subpackets by dead reckoning, so it's only necessary to note how many there are, whether the last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK flag set. This is necessary as once recvmsg() can see the packet, it can start modifying it, such as doing in-place decryption. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27rxrpc: Improve jumbo packet countingDavid Howells
Improve the information stored about jumbo packets so that we don't need to reparse them so much later. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
2019-08-27x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()Kirill A. Shutemov
Gustavo noticed that 'new' can be left uninitialized if 'bios_start' happens to be less than or equal to 'entry->addr + entry->size'. Initialize the variable at the beginning of each iteration to the current value of 'bios_start'. Fixes: 0a46fff2f910 ("x86/boot/compressed/64: Fix boot on machines with broken E820 table") Reported-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190826133326.7cxb4vbmiawffv2r@box
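A sketch of the shape of the fix, under the assumption that the loop only conditionally assigned 'new' before; the helper entry_end() is hypothetical, standing in for 'entry->addr + entry->size':
  for (i = nr_e820_entries - 1; i >= 0; i--) {
          unsigned long new = bios_start;         /* the added initialization */

          /* Before the fix, branches like this could leave 'new'
           * unassigned when the condition was false... */
          if (bios_start > entry_end(i))
                  new = entry_end(i);

          /* ...making this read undefined on that path. */
          bios_start = new;
  }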
2019-08-27drm/i915: Call dma_set_max_seg_size() in i915_driver_hw_probe()Lyude Paul
Currently, we don't call dma_set_max_seg_size() for i915 because we intentionally do not limit the segment length that the device supports. However, this results in a warning being emitted if we try to map anything larger than SZ_64K on a kernel with CONFIG_DMA_API_DEBUG_SG enabled:
[    7.751926] DMA-API: i915 0000:00:02.0: mapping sg segment longer than device claims to support [len=98304] [max=65536]
[    7.751934] WARNING: CPU: 5 PID: 474 at kernel/dma/debug.c:1220 debug_dma_map_sg+0x20f/0x340
This was originally brought up on https://bugs.freedesktop.org/show_bug.cgi?id=108517 , and the consensus there was that it wasn't really useful to set a limit (and that dma-debug isn't really all that useful for i915 in the first place). Unfortunately though, CONFIG_DMA_API_DEBUG_SG is enabled in the debug configs for various distro kernels. Since a WARN_ON() will disable automatic problem reporting (and cause any CI with said option enabled to start complaining), we really should just fix the problem. Note that as Chris Wilson and I discussed, the other solution for this would be to make DMA-API not make such assumptions when a driver hasn't explicitly set a maximum segment size. But, taking a look at the commit which originally introduced this behavior, commit 78c47830a5cb ("dma-debug: check scatterlist segments"), there is an explicit mention of this assumption and how it applies to devices with no segment size: Conversely, devices which are less limited than the rather conservative defaults, or indeed have no limitations at all (e.g. GPUs with their own internal MMU), should be encouraged to set appropriate dma_parms, as they may get more efficient DMA mapping performance out of it. So unless there are any concerns (I'm open to discussion!), let's just follow suit and call dma_set_max_seg_size() with UINT_MAX as our limit to silence any warnings. Changes since v3: * Drop patch for enabling CONFIG_DMA_API_DEBUG_SG in CI. It looks like just turning it on causes the kernel to spit out bogus WARN_ONs() during some igt tests which would otherwise require teaching igt to disable the various DMA-API debugging options causing this. This is too much work to be worth it, since DMA-API debugging is useless for us. So, we'll just settle with this single patch to squelch WARN_ONs() during driver load for users that have CONFIG_DMA_API_DEBUG_SG turned on for some reason. * Move dma_set_max_seg_size() call into i915_driver_hw_probe() - Chris Wilson Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v4.18+ Link: https://patchwork.freedesktop.org/patch/msgid/20190823205251.14298-1-lyude@redhat.com (cherry picked from commit acd674af95d3f627062007429b9c195c6b32361d) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
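A hedged sketch of the one-line change, placed in i915_driver_hw_probe() per the message; the surrounding probe code is elided and the function-name suffix marks this as illustrative:
  #include <linux/dma-mapping.h>

  static int i915_driver_hw_probe_sketch(struct device *dev)
  {
          int ret;

          /* i915 imposes no segment length limit; advertise that
           * instead of leaving dma-debug to assume a 64K default. */
          ret = dma_set_max_seg_size(dev, UINT_MAX);
          if (ret)
                  return ret;

          /* ... remainder of HW probe ... */
          return 0;
  }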
2019-08-27drm/i915/dp: Fix DSC enable code to use cpu_transcoder instead of encoder->typeManasi Navare
This patch fixes the intel_configure_pps_for_dsc_encoder() function to use cpu_transcoder instead of encoder->type to select the correct DSC registers; encoder->type was wrongly used in the original patch for one DSC register instance. Fixes: 7182414e2530 ("drm/i915/dp: Configure i915 Picture parameter Set registers during DSC enabling") Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.0+ Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190821215950.24223-1-manasi.d.navare@intel.com (cherry picked from commit d4c61c4a16decd8ace8660f22c81609a539fccba) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-08-27drm/i915: Don't deballoon unused ggtt drm_mm_node in linux guestXiong Zhang
The following call trace may appear in the Linux guest dmesg when the guest i915 driver is unloaded:
[   90.776610] [drm:vgt_deballoon_space.isra.0 [i915]] deballoon space: range [0x0 - 0x0] 0 KiB.
[   90.776621] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0
[   90.776691] IP: drm_mm_remove_node+0x4d/0x320 [drm]
[   90.776718] PGD 800000012c7d0067 P4D 800000012c7d0067 PUD 138e4c067 PMD 0
[   90.777091] task: ffff9adab60f2f00 task.stack: ffffaf39c0fe0000
[   90.777142] RIP: 0010:drm_mm_remove_node+0x4d/0x320 [drm]
[   90.777573] Call Trace:
[   90.777653]  intel_vgt_deballoon+0x4c/0x60 [i915]
[   90.777729]  i915_ggtt_cleanup_hw+0x121/0x190 [i915]
[   90.777792]  i915_driver_unload+0x145/0x180 [i915]
[   90.777856]  i915_pci_remove+0x15/0x20 [i915]
[   90.777890]  pci_device_remove+0x3b/0xc0
[   90.777916]  device_release_driver_internal+0x157/0x220
[   90.777945]  driver_detach+0x39/0x70
[   90.777967]  bus_remove_driver+0x51/0xd0
[   90.777990]  pci_unregister_driver+0x23/0x90
[   90.778019]  SyS_delete_module+0x1da/0x240
[   90.778045]  entry_SYSCALL_64_fastpath+0x24/0x87
[   90.778072] RIP: 0033:0x7f34312af067
[   90.778092] RSP: 002b:00007ffdea3da0d8 EFLAGS: 00000206
[   90.778297] RIP: drm_mm_remove_node+0x4d/0x320 [drm] RSP: ffffaf39c0fe3dc0
[   90.778344] ---[ end trace f4b1bc8305fc59dd ]---
Four drm_mm_nodes are used to reserve guest ggtt space, but some of them may be skipped and not initialised due to space constraints in intel_vgt_balloon(). If drm_mm_remove_node() is called with an uninitialized drm_mm_node, the above call trace occurs. This patch checks each drm_mm_node's validity before calling drm_mm_remove_node(). Fixes: ff8f797557c7 ("drm/i915: return the correct usable aperture size under gvt environment") Cc: stable@vger.kernel.org Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/1566279978-9659-1-git-send-email-xiong.y.zhang@intel.com (cherry picked from commit 4776f3529d6b1e47f02904ad1d264d25ea22b27b) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
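A sketch of the validity check, assuming the stock drm_mm_node_allocated() helper; the wrapper name is illustrative:
  #include <drm/drm_mm.h>

  static void vgt_deballoon_node_sketch(struct drm_mm_node *node)
  {
          /* Nodes skipped by intel_vgt_balloon() were never inserted;
           * removing them would dereference uninitialized state. */
          if (!drm_mm_node_allocated(node))
                  return;

          drm_mm_remove_node(node);
  }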
2019-08-27drm/i915: Do not create a new max_bpc prop for MST connectorsVille Syrjälä
We're not allowed to create new properties after device registration, so for MST connectors we need to either create the max_bpc property earlier, or reuse one we already have. Let's do the latter approach since the corresponding SST connector already has the prop and its min/max are correct also for the MST connector. The problem was highlighted by commit 4f5368b5541a ("drm/kms: Catch mode_object lifetime errors") which results in the following spew:
[ 1330.878941] WARNING: CPU: 2 PID: 1554 at drivers/gpu/drm/drm_mode_object.c:45 __drm_mode_object_add+0xa0/0xb0 [drm]
...
[ 1330.879008] Call Trace:
[ 1330.879023]  drm_property_create+0xba/0x180 [drm]
[ 1330.879036]  drm_property_create_range+0x15/0x30 [drm]
[ 1330.879048]  drm_connector_attach_max_bpc_property+0x62/0x80 [drm]
[ 1330.879086]  intel_dp_add_mst_connector+0x11f/0x140 [i915]
[ 1330.879094]  drm_dp_add_port.isra.20+0x20b/0x440 [drm_kms_helper]
...
Cc: stable@vger.kernel.org Cc: Lyude Paul <lyude@redhat.com> Cc: sunpeng.li@amd.com Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Sean Paul <sean@poorly.run> Fixes: 5ca0ef8a56b8 ("drm/i915: Add max_bpc property for DP MST") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190820161657.9658-1-ville.syrjala@linux.intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Lyude Paul <lyude@redhat.com> (cherry picked from commit 1b9bd09630d4db4827cc04d358a41a16a6bc2cb0) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2019-08-26ice: fix adminq calls during removeHenry Tieman
The order of operations was incorrect in ice_remove(). The code would try to use adminq operations after the adminq was disabled. This caused all adminq calls to fail and possibly timeout waiting. Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: Rework ice_ena_msix_rangeAnirudh Venkataramanan
The current implementation of ice_ena_msix_range is difficult to read and has subtle issues. This patch reworks the said function for clarity and correctness. More specifically, 1. Add more checks to bail out if 'needed' is greater than 'v_left'. 2. Simplify the fallback logic. 3. Do not set pf->num_avail_sw_msix in ice_ena_msix_range as it gets overwritten by ice_init_interrupt_scheme. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: Fix VF configuration issues due to resetAkeem G Abodunrin
This patch fixes a critical reset issue that results in a server reboot when an admin changes VF configuration on the host, for example switching a VF to trusted/non-trusted mode. In that case the PF driver sends a reset notification to the AVF driver while also continuing with its own reset flow. However, the AVF driver schedules another reset due to the notification, which results in two concurrent resets and triggers a lockup in the FW on the AQ call to delete the VSI. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: Alloc queue management bitmaps and arrays dynamicallyAnirudh Venkataramanan
The total number of queues available on the device is divided between multiple physical functions (PF) in the firmware and provided to the driver when it gets function capabilities from the firmware. Thus each PF knows how many Tx/Rx queues it has. These queues are then doled out to different VSIs (for LAN traffic, SR-IOV VF traffic, etc.) To track usage of these queues at the PF level, the driver uses two bitmaps avail_txqs and avail_rxqs. At the VSI level (i.e. struct ice_vsi instances) the driver uses two arrays txq_map and rxq_map, to track ownership of VSIs' queues in avail_txqs and avail_rxqs respectively. The aforementioned bitmaps and arrays should be allocated dynamically, because the number of queues supported by a PF is only available once function capabilities have been queried. The current static allocation consumes way more memory than required. This patch removes the DECLARE_BITMAP for avail_txqs and avail_rxqs and instead uses bitmap_zalloc to allocate the bitmaps during init. Similarly txq_map and rxq_map are now allocated in ice_vsi_alloc_arrays. As a result ICE_MAX_TXQS and ICE_MAX_RXQS defines are no longer needed. Also as txq_map and rxq_map are now allocated and freed, some code reordering was required in ice_vsi_rebuild for correct functioning. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
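A sketch of the dynamic allocation pattern described above; the capability-derived size fields (pf->max_pf_txqs/max_pf_rxqs) are assumptions, not verified field names:
  #include <linux/bitmap.h>

  static int ice_init_q_bitmaps_sketch(struct ice_pf *pf)
  {
          /* Size the bitmaps from queried capabilities, not a
           * compile-time ICE_MAX_TXQS/ICE_MAX_RXQS worst case. */
          pf->avail_txqs = bitmap_zalloc(pf->max_pf_txqs, GFP_KERNEL);
          if (!pf->avail_txqs)
                  return -ENOMEM;

          pf->avail_rxqs = bitmap_zalloc(pf->max_pf_rxqs, GFP_KERNEL);
          if (!pf->avail_rxqs) {
                  bitmap_free(pf->avail_txqs);
                  pf->avail_txqs = NULL;
                  return -ENOMEM;
          }
          return 0;
  }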
2019-08-26ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmapPaul Greenwalt
The VF driver can call VIRTCHNL_OP_[ENABLE|DISABLE]_QUEUES separately for each queue. Add support for the virtchnl_queue_select.[tx|rx]_queues bitmap, which is used to indicate which queues to enable and disable. Add tracking of VF Tx/Rx per-queue enable state to avoid enabling already-enabled queues and disabling already-disabled queues. Add a total enabled-queues count and clear ICE_VF_STATE_QS_ENA when the count is zero. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Peng Huang <peng.huang@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-27mfd: rk808: Mark pm functions __maybe_unusedArnd Bergmann
The newly added suspend/resume functions are only used if CONFIG_PM is enabled: drivers/mfd/rk808.c:752:12: error: 'rk8xx_resume' defined but not used [-Werror=unused-function] drivers/mfd/rk808.c:732:12: error: 'rk8xx_suspend' defined but not used [-Werror=unused-function] Mark them as __maybe_unused so the compiler can silently drop them when they are not needed. Fixes: 586c1b4125b3 ("mfd: rk808: Add RK817 and RK809 support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Lee Jones <lee.jones@linaro.org>
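The pattern in question, as a sketch with the handler bodies elided:
  #include <linux/pm.h>

  /* __maybe_unused lets the compiler silently discard these when
   * CONFIG_PM is disabled, instead of emitting -Wunused-function. */
  static int __maybe_unused rk8xx_suspend(struct device *dev)
  {
          /* ... save device state ... */
          return 0;
  }

  static int __maybe_unused rk8xx_resume(struct device *dev)
  {
          /* ... restore device state ... */
          return 0;
  }

  static SIMPLE_DEV_PM_OPS(rk8xx_pm_ops, rk8xx_suspend, rk8xx_resume);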
2019-08-26ice: add support for enabling/disabling single queuesMaciej Fijalkowski
Refactor the queue handling functions that iterate over queue arrays so that the logic done for a single queue is pulled out and called for each ring when traversing the ring array. This implies that when disabling Tx rings we won't fill up the q_ids, q_teids and q_handles arrays. Also drop the 'offset' parameter; the value from the VSI's txq_map is stored in ring->reg_idx, which removes the need for that parameter. Introduce ice_vsi_cfg_txq, ice_vsi_stop_tx_ring and ice_vsi_ctrl_rx_ring as the functions holding the pulled-out logic. There are several pieces of Tx queue metadata (q_id, q_handle, q_teid and others) that need to be set up during Tx queue disablement, so also add a helper structure that wraps them up and a function that fills it in. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: fix potential infinite loopColin Ian King
The loop counter of a for-loop is a u8; however, it is being compared to an int upper bound, and this can lead to an infinite loop if the upper bound is greater than 255, since the loop counter will wrap back to zero. Fix this potential issue by making the loop counter an int. Addresses-Coverity: ("Infinite loop") Fixes: c7aeb4d1b9bf ("ice: Disable VFs until reset is completed") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
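The bug class in miniature (illustrative, not the ice source):
  int bound = 300;        /* anything above 255 */
  int j;
  u8 i;

  for (i = 0; i < bound; i++)     /* never terminates: i wraps 255 -> 0 */
          ;

  for (j = 0; j < bound; j++)     /* fixed: counter type matches the bound */
          ;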
2019-08-26ice: fix ice_is_tc_enaJacob Keller
ice_is_tc_ena is used to check whether a given traffic class is enabled. Because there are only 8 traffic classes, the function took a u8 bitmap. This causes problems because the bitmap is cast to an unsigned long, triggering a static analysis warning about an out-of-bounds read. Fix this by simply updating ice_is_tc_ena to take an unsigned long. Passing a u8 to this function will implicitly convert the value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
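A sketch of the corrected helper, assuming it wraps test_bit() as the message implies:
  #include <linux/bitops.h>

  static inline bool ice_is_tc_ena_sketch(unsigned long bitmap, u8 tc)
  {
          /* test_bit() reads a whole unsigned long; taking one by value
           * guarantees that word actually exists. A u8 argument at the
           * call site is implicitly widened. */
          return test_bit(tc, &bitmap);
  }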
2019-08-26ice: add validation in OP_CONFIG_VSI_QUEUES VF messageMichal Swiatkowski
Check num_queue_pairs to avoid access to unallocated fields of vsi->tx_rings/vsi->rx_rings. Without this validation we can set vsi->alloc_txq/vsi->alloc_rxq to a value smaller than ICE_MAX_BASE_QS_PER_VF and send this command with num_queue_pairs greater than vsi->alloc_txq/vsi->alloc_rxq, which leads to access to unallocated memory. In a VF VSI, alloc_txq and alloc_rxq should be the same; take the minimum of the two as it reads better. Also add validation for the ring_len param: it should be greater than 32 and a multiple of 32. An incorrect value causes traffic on the PF to hang. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: Don't clog kernel debug log with VF MDD events errorsAkeem G Abodunrin
In case of MDD events on a VF, don't clog the kernel log with an unlimited number of VF MDD event messages such as "VF 0 has had 1018 MDD events since last boot". Limit the logged events to 30, based on observations from experiments that sent a malicious packet once and counted the events reported before the device stopped observing MDD events. Also remove the defunct macro "ICE_DFLT_NUM_MDD_EVENTS_ALLOWED", which tracked the number of MDD events allowed before disabling the interface. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: Introduce a local variable for a VSI in the rebuild pathKrzysztof Kazimierczak
When a VSI is accessed inside the ice_for_each_vsi macro in the rebuild path (ice_vsi_rebuild_all() and ice_vsi_replay_all()), it is referred to as pf->vsi[i]. Introduce local variables to improve readability. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: shorten local and add debug printsJesse Brandeburg
Add some verbose debugging for dyndbg to help us when we are having issues with link and/or PHY. While there, shorten some strings used by locals that were causing long line wrapping. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: Sanitize ice_ena_vsi and ice_dis_vsiAnirudh Venkataramanan
1. ndo_open and ndo_stop are implemented by ice_open and ice_stop respectively. When enabling/disabling VSIs, just call ice_open/ice_stop instead of ndo_open/ndo_stop. 2. Rework the logic around rtnl_lock/rtnl_unlock. 3. In ice_ena_vsi, remove an unnecessary stack variable and return 0 instead of err when __ICE_NEEDS_RESTART is not set. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: added sibling head to parse nodesVictor Raj
There was a bug in the previous code which never traversed all the children to get the first node of the requested layer. Add a sibling head pointer to point to the first node of each layer per TC. This makes traversal easier and quicker, and also removes the recursion. Signed-off-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-26ice: Fix ethtool port and PFC stats for 4x25G cardsUsha Ketineni
This patch fixes the issue where port and PFC statistics counters increment at the wrong port with 4x25G cards. Read the GLPRT port registers using the lport parameter instead of pf_id to update the statistics; otherwise the pf_ids are flipped for ports 2 and 3 when read from the HW register PF_FUNC_RID, which is expected per the hardware specification. Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-27KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handlingAlexey Kardashevskiy
H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from a guest. Although we verify correctness of TCEs before we do anything with the existing tables, there is a small window when a check in kvmppc_tce_validate might pass and right after that the guest alters the page of TCEs, causing an early exit from the handler and leaving srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap) (real mode) locked. This fixes the bug by jumping to the common exit code with an appropriate unlock. Cc: stable@vger.kernel.org # v4.11+ Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
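A sketch of the control-flow fix for the virtual-mode case; guest_altered_tces() is a hypothetical stand-in for the failing re-check, and the function name marks this as illustrative:
  long h_put_tce_indirect_sketch(struct kvm_vcpu *vcpu)
  {
          long ret = H_SUCCESS;
          int idx = srcu_read_lock(&vcpu->kvm->srcu);

          if (guest_altered_tces()) {     /* hypothetical re-check */
                  ret = H_TOO_HARD;
                  goto unlock_exit;       /* was a bare 'return ret' before */
          }

          /* ... update the TCE tables ... */

  unlock_exit:
          srcu_read_unlock(&vcpu->kvm->srcu, idx);
          return ret;
  }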
2019-08-26nfp: add AMDA0058 boards to firmware listJakub Kicinski
Add MODULE_FIRMWARE entries for AMDA0058 boards. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26r8169: improve DMA handling in rtl_rxHeiner Kallweit
Move the call to dma_sync_single_for_cpu after calling napi_alloc_skb. This avoids calling dma_sync_single_for_cpu without handing control back to the device if the memory allocation fails. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
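A sketch of the reordering (names approximate rtl_rx(), not copied from it):
  skb = napi_alloc_skb(napi, pkt_size);
  if (unlikely(!skb)) {
          dev->stats.rx_dropped++;
          goto release_descriptor;        /* buffer still owned by device */
  }

  /* Only now pull the data across to the CPU... */
  dma_sync_single_for_cpu(d, addr, pkt_size, DMA_FROM_DEVICE);
  skb_copy_to_linear_data(skb, rx_buf, pkt_size);
  /* ...and hand the buffer straight back. */
  dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);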
2019-08-26Merge branch 'cls-hw-offload-rtnl'David S. Miller
Vlad Buslov says:
====================
Refactor cls hardware offload API to support rtnl-independent drivers
Currently, all cls API hardware offload driver callbacks require the caller to hold the rtnl lock when calling them. This patch set introduces a new API that allows drivers to register callbacks that are not dependent on the rtnl lock, and allows unlocked classifiers to offload filters without obtaining the rtnl lock first, which is intended to allow offloading tc rules in parallel. Recently, the new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added. TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are already registered with this flag and only take the rtnl lock when a qdisc or classifier requires it. Classifiers can indicate that their ops callbacks don't require the caller to hold the rtnl lock by setting the TCF_PROTO_OPS_DOIT_UNLOCKED flag. An unlocked implementation of the flower classifier is now upstreamed. However, this implementation still obtains the rtnl lock before calling the hardware offloads API. Implement the following cls API changes:
- Introduce a new "unlocked_driver_cb" flag to struct flow_block_offload to allow registering and unregistering block hardware offload callbacks that do not require the caller to hold the rtnl lock. Drivers that don't require users of their tc offload callbacks to hold the rtnl lock set the flag to true on block bind/unbind. Internally, tcf_block is extended with an additional lockeddevcnt counter that is used to count the number of rtnl-requiring devices the block is bound to. When this counter is zero, tc_setup_cb_*() functions execute callbacks without obtaining the rtnl lock.
- Extend the cls API single hardware rule update function tc_setup_cb_call() with tc_setup_cb_add(), tc_setup_cb_replace(), tc_setup_cb_destroy() and tc_setup_cb_reoffload() functions. These new APIs are needed to move management of the block offload counter, the filter-in-hardware counter and flag from classifier implementations to the cls API, which is now responsible for managing them in a concurrency-safe manner. Access to cb_list from callback execution code is synchronized by obtaining the new 'cb_lock' rw_semaphore in read mode, which allows executing callbacks in parallel but excludes any modifications of data from register/unregister code. The tcf_block offloads counter type is changed to atomic integer to allow updating the counter concurrently.
- Extend classifier ops with new ops->hw_add() and ops->hw_del() callbacks which are used to notify unlocked classifiers when a filter is successfully added to or deleted from hardware without releasing cb_lock. This is necessary to update classifier state atomically with the callback list traversal and the updating of all relevant counters, and allows unlocked classifiers to synchronize with concurrent reoffload without requiring any changes to driver callback API implementations.
The new tc flow_action infrastructure is also modified to allow its user to execute without rtnl lock protection. Function tc_setup_flow_action() is modified to conditionally obtain the rtnl lock before accessing action state. Action data that is accessed by reference is either copied or reference counted to prevent concurrent action overwrite from deallocating it. A new function tc_cleanup_flow_action() is introduced to cleanup/release all such data obtained by tc_setup_flow_action(). The flower classifier (the only unlocked classifier at the moment) is modified to use the new cls hardware offloads API and no longer obtains the rtnl lock before calling it.
==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26drm/powerplay: Fix Vega20 power reading againKent Russell
For the 40.46 SMU release, they changed CurrSocketPower to AverageSocketPower, but this was changed back in 40.47, so just check if it's 40.46 and make the appropriate change. Tested with 40.45, 40.46 and 40.47 successfully. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-26drm/powerplay: Fix Vega20 Average Power value v4Kent Russell
The SMU changed reading from CurrSocketPower to AverageSocketPower, so reflect this accordingly. This fixes the issue where Average Power Consumption was being reported as 0 from SMU 40.46 onward. v2: Fixed headline prefix. v3: Add check for SMU version for proper compatibility. v4: Style fix. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-26net: sched: flower: don't take rtnl lock for cls hw offloads APIVlad Buslov
Don't manually take rtnl lock in flower classifier before calling cls hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held' parameter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: copy tunnel info when setting flow_action entry->tunnelVlad Buslov
In order to remove dependency on rtnl lock, modify tc_setup_flow_action() to copy tunnel info, instead of just saving pointer to tunnel_key action tunnel info. This is necessary to prevent concurrent action overwrite from releasing tunnel info while it is being used by rtnl-unlocked driver. Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info with all its options to dynamically allocated memory block. Modify tc_cleanup_flow_action() to free dynamically allocated tunnel info. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: take reference to action dev before calling offloadsVlad Buslov
In order to remove dependency on rtnl lock when calling hardware offload API, take reference to action mirred dev when initializing flow_action structure in tc_setup_flow_action(). Implement function tc_cleanup_flow_action(), use it to release the device after hardware offload API is done using it. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: take rtnl lock in tc_setup_flow_action()Vlad Buslov
In order to allow using new flow_action infrastructure from unlocked classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held' argument. Take rtnl lock before accessing tc_action data. This is necessary to protect from concurrent action replace. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: conditionally obtain rtnl lock in cls hw offloads APIVlad Buslov
In order to remove dependency on rtnl lock from offloads code of classifiers, take rtnl lock conditionally before executing driver callbacks. Only obtain rtnl lock if block is bound to devices that require it. Block bind/unbind code is rtnl-locked and obtains block->cb_lock while holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions to prevent deadlock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: add API for registering unlocked offload block callbacksVlad Buslov
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow registering and unregistering block hardware offload callbacks that do not require caller to hold rtnl lock. Extend tcf_block with additional lockeddevcnt counter that is incremented for each non-unlocked driver callback attached to device. This counter is necessary to conditionally obtain rtnl lock before calling hardware callbacks in following patches. Register mlx5 tc block offload callbacks as "unlocked". Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: notify classifier on successful offload add/deleteVlad Buslov
To remove the dependency on the rtnl lock, extend classifier ops with new ops->hw_add() and ops->hw_del() callbacks. Call them from the cls API while holding cb_lock every time a filter is successfully added to or deleted from hardware. Implement the new API in the flower classifier. Use it to manage the hw_filters list under cb_lock protection, instead of relying on the rtnl lock to synchronize with a concurrent fl_reoffload() call. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: refactor block offloads counter usageVlad Buslov
Without rtnl lock protection, filters can no longer safely manage the block offloads counter themselves. Refactor the cls API to protect block offloadcnt with tcf_block->cb_lock, which is already used to protect the driver callback list and the nooffloaddevcnt counter. The counter can be modified by concurrent tasks through the new functions that execute block callbacks (which is safe with the previous patch that changed its type to atomic_t); however, block bind/unbind code that checks the counter value takes cb_lock in write mode to exclude any concurrent modifications. This approach prevents race conditions between bind/unbind and callback execution code but allows concurrency on the tc rule update path. Move block offload counter, filter-in-hardware counter and filter flags management from classifiers into the cls hardware offloads API. Make the functions tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() cls API private. Implement the following new cls API to be used instead:
tc_setup_cb_add() - non-destructive filter add. If a filter that wasn't already in hardware is successfully offloaded, increment the block offloads counter and set the filter-in-hardware counter and flag. On failure, the previously offloaded filter is considered intact and the offloads counter is not decremented.
tc_setup_cb_replace() - destructive filter replace. Release the existing filter's block offload counter and reset its in-hardware counter and flag. Set the new filter's in-hardware counter and flag. On failure, the previously offloaded filter is considered destroyed and the offload counter is decremented.
tc_setup_cb_destroy() - filter destroy. Unconditionally decrement the block offloads counter.
tc_setup_cb_reoffload() - reoffload filter to a single cb. Execute cb() and call tc_cls_offload_cnt_update() if cb() didn't return an error.
Refactor all offload-capable classifiers to atomically offload filters to hardware, change the block offload counter, and set the filter-in-hardware counter and flag by means of the new cls API functions. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: change tcf block offload counter type to atomic_tVlad Buslov
As a preparation for running proto ops functions without rtnl lock, change offload counter type to atomic. This is necessary to allow updating the counter by multiple concurrent users when offloading filters to hardware from unlocked classifiers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
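The counter change in miniature (the struct name is illustrative, not the kernel's):
  #include <linux/atomic.h>

  struct tcf_block_sketch {
          atomic_t offloadcnt;    /* previously a plain counter under rtnl */
  };

  /* Safe from concurrent unlocked classifiers; no rtnl needed. */
  static void block_offload_inc(struct tcf_block_sketch *b)
  {
          atomic_inc(&b->offloadcnt);
  }

  static void block_offload_dec(struct tcf_block_sketch *b)
  {
          atomic_dec(&b->offloadcnt);
  }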
2019-08-26net: sched: protect block offload-related fields with rw_semaphoreVlad Buslov
In order to remove the dependency on the rtnl lock, extend tcf_block with a 'cb_lock' rwsem and use it to protect flow_block->cb_list and related counters from concurrent modification. The lock is taken in read mode for read-only traversal of cb_list in tc_setup_cb_call() and in write mode in all other cases. This approach ensures that:
- cb_list is not changed concurrently while filters are being offloaded on the block.
- block->nooffloaddevcnt is checked while holding the lock in read mode, but is only changed by bind/unbind code when holding the cb_lock in write mode, to prevent concurrent modification.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26bpf: handle 32-bit zext during constant blindingNaveen N. Rao
Since BPF constant blinding is performed after the verifier pass, the ALU32 instructions inserted for doubleword immediate loads don't have a corresponding zext instruction. This is causing a kernel oops on powerpc and can be reproduced by running 'test_cgroup_storage' with bpf_jit_harden=2. Fix this by emitting BPF_ZEXT during constant blinding if prog->aux->verifier_zext is set. Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-26nfp: bpf: fix latency bug when updating stack index registerJiong Wang
NFP is using Local Memory to model the stack. LM_addr could be used as the base of a 16 32-bit word region of Local Memory. Then, if the stack offset is beyond the current region, the local index needs to be updated. The update needs at least three cycles to take effect, therefore the sequence normally looks like:
  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  nop
If the local index switch happens on a narrow load, then the instruction preparing a value to zero the high 32-bit of the destination register could be counted as one cycle, and the sequence could then be something like:
  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  immed[gprB_5, 0]
However, we have a zero extension optimization whereby zeroing the high 32-bit could be eliminated, in which case the IMMED insn above won't be available and the first sequence needs to be generated. Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen") Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-26drm/amdgpu: fix dma_fence_wait without referenceChristian König
We need to grab a reference to the fence we wait for. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-26NFS: Fix writepage(s) error handling to not report errors twiceTrond Myklebust
If writepage()/writepages() saw an error, but handled it without reporting it, we should not be re-reporting that error on exit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26NFS: Fix spurious EIO read errorsTrond Myklebust
If the client attempts to read a page, but the read fails due to some spurious error (e.g. an ACCESS error or a timeout, ...) then we need to allow other processes to retry. Also try to report errors correctly when doing a synchronous readpage. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26pNFS/flexfiles: Don't time out requests on hard mountsTrond Myklebust
If the mount is hard, we should ignore the 'io_maxretrans' module parameter so that we always keep retrying. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>