summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-09-06bcache: Update continue_at() documentationDan Carpenter
continue_at() doesn't have a return statement anymore. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: silence static checker warningDan Carpenter
In olden times, closure_return() used to have a hidden return built in. We removed the hidden return but forgot to add a new return here. If "c" were NULL we would oops on the next line, but fortunately "c" is never NULL. Let's just remove the if statement. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: fix for gc and write-back raceTang Junhui
gc and write-back get raced (see the email "bcache get stucked" I sended before): gc thread write-back thread | |bch_writeback_thread() |bch_gc_thread() | | |==>read_dirty() |==>bch_btree_gc() | |==>btree_root() //get btree root | | //node write locker | |==>bch_btree_gc_root() | | |==>read_dirty_submit() | |==>write_dirty() | |==>continue_at(cl, | | write_dirty_finish, | | system_wq); | |==>write_dirty_finish()//excute | | //in system_wq | |==>bch_btree_insert() | |==>bch_btree_map_leaf_nodes() | |==>__bch_btree_map_nodes() | |==>btree_root //try to get btree | | //root node read | | //lock | |-----stuck here |==>bch_btree_set_root() |==>bch_journal_meta() |==>bch_journal() |==>journal_try_write() |==>journal_write_unlocked() //journal_full(&c->journal) | //condition satisfied |==>continue_at(cl, journal_write, system_wq); //try to excute | //journal_write in system_wq | //but work queue is excuting | //write_dirty_finish() |==>closure_sync(); //wait journal_write execute | //over and wake up gc, |-------------stuck here |==>release root node write locker This patch alloc a separate work-queue for write-back thread to avoid such race. (Commit log re-organized by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: increase the number of open bucketsTang Junhui
In currently, we only alloc 6 open buckets for each cache set, but in usually, we always attach about 10 or so backend devices for each cache set, and the each bcache device are always accessed by about 10 or so threads in top application layer. So 6 open buckets are too few, It has led to that each of the same thread write data to different buckets, which would cause low efficiency write-back, and also cause buckets inefficient, and would be Very easy to run out of. I add debug message in bch_open_buckets_alloc() to print alloc bucket info, and test with ten bcache devices with a cache set, and each bcache device is accessed by ten threads. From the debug message, we can see that, after the modification, One bucket is more likely to assign to the same thread, and the data from the same thread are more likely to write the same bucket. Usually the same thread always write/read the same backend device, so it is good for write-back and also promote the usage efficiency of buckets. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Correct return value for sysfs attach errorsTony Asleson
If you encounter any errors in bch_cached_dev_attach it will return a negative error code. The variable 'v' which stores the result is unsigned, thus user space sees a very large value returned for bytes written which can cause incorrect user space behavior. Utilize 1 signed variable to use throughout the function to preserve error return capability. Signed-off-by: Tony Asleson <tasleson@redhat.com> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: correct cache_dirty_target in __update_writeback_rate()Tang Junhui
__update_write_rate() uses a Proportion-Differentiation Controller algorithm to control writeback rate. A dirty target number is used in this PD controller to control writeback rate. A larger target number will make the writeback rate smaller, on the versus, a smaller target number will make the writeback rate larger. bcache uses the following steps to calculate the target number, 1) cache_sectors = all-buckets-of-cache-set * buckets-size 2) cache_dirty_target = cache_sectors * cached-device-writeback_percent 3) target = cache_dirty_target * (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set) The calculation at step 1) for cache_sectors is incorrect, which does not consider dirty blocks occupied by flash only volume. A flash only volume can be took as a bcache device without cached device. All data sectors allocated for it are persistent on cache device and marked dirty, they are not touched by bcache writeback and garbage collection code. So data blocks of flash only volume should be ignore when calculating cache_sectors of cache set. Current code does not subtract dirty sectors of flash only volume, which results a larger target number from the above 3 steps. And in sequence the cache device's writeback rate is smaller then a correct value, writeback speed is slower on all cached devices. This patch fixes the incorrect slower writeback rate by subtracting dirty sectors of flash only volumes in __update_writeback_rate(). (Commit log composed by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: gc does not work when triggering by manual commandTang Junhui
I try to execute the following command to trigger gc thread: [root@localhost internal]# echo 1 > trigger_gc But it does not work, I debug the code in gc_should_run(), It works only if in invalidating or sectors_to_gc < 0. So set sectors_to_gc to -1 to meet the condition when we trigger gc by manual command. (Code comments aded by Coly Li) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Don't reinvent the wheel but use existing llist APIByungchul Park
Although llist provides proper APIs, they are not used. Make them used. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: do not subtract sectors_to_gc for bypassed IOTang Junhui
Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to trigger gc thread. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: fix sequential large write IO bypassTang Junhui
Sequential write IOs were tested with bs=1M by FIO in writeback cache mode, these IOs were expected to be bypassed, but actually they did not. We debug the code, and find in check_should_bypass(): if (!congested && mode == CACHE_MODE_WRITEBACK && op_is_write(bio_op(bio)) && (bio->bi_opf & REQ_SYNC)) goto rescale that means, If in writeback mode, a write IO with REQ_SYNC flag will not be bypassed though it is a sequential large IO, It's not a correct thing to do actually, so this patch remove these codes. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Fix leak of bdev referenceJan Kara
If blkdev_get_by_path() in register_bcache() fails, we try to lookup the block device using lookup_bdev() to detect which situation we are in to properly report error. However we never drop the reference returned to us from lookup_bdev(). Fix that. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06mac80211: fix deadlock in driver-managed RX BA session startJohannes Berg
When an RX BA session is started by the driver, and it has to tell mac80211 about it, the corresponding bit in tid_rx_manage_offl gets set and the BA session work is scheduled. Upon testing this bit, it will call __ieee80211_start_rx_ba_session(), thus deadlocking as it already holds the ampdu_mlme.mtx, which that acquires again. Fix this by adding ___ieee80211_start_rx_ba_session(), a version of the function that requires the mutex already held. Cc: stable@vger.kernel.org Fixes: 699cb58c8a52 ("mac80211: manage RX BA session offload without SKB queue") Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06mac80211: Complete ampdu work schedule during session tear downIlan peer
Commit 7a7c0a6438b8 ("mac80211: fix TX aggregation start/stop callback race") added a cancellation of the ampdu work after the loop that stopped the Tx and Rx BA sessions. However, in some cases, e.g., during HW reconfig, the low level driver might call mac80211 APIs to complete the stopping of the BA sessions, which would queue the ampdu work to handle the actual completion. This work needs to be performed as otherwise mac80211 data structures would not be properly synced. Fix this by checking if BA session STOP_CB bit is set after the BA session cancellation and properly clean the session. Signed-off-by: Ilan Peer <ilan.peer@intel.com> [Johannes: the work isn't flushed because that could do other things we don't want, and the locking situation isn't clear] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06cfg80211: honor NL80211_RRF_NO_HT40{MINUS,PLUS}Emmanuel Grumbach
Honor the NL80211_RRF_NO_HT40{MINUS,PLUS} flags in reg_process_ht_flags_channel. Not doing so leads can lead to a firmware assert in iwlwifi for example. Fixes: b0d7aa59592b ("cfg80211: allow wiphy specific regdomain management") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06genirq/msi: Fix populating multiple interruptsJohn Keeping
On allocating the interrupts routed via a wire-to-MSI bridge, the allocator iterates over the MSI descriptors to build the hierarchy, but fails to use the descriptor interrupt number, and instead uses the base number, generating the wrong IRQ domain mappings. The fix is to use the MSI descriptor interrupt number when setting up the interrupt instead of the base interrupt for the allocation range. The only saving grace is that although the MSI descriptors are allocated in bulk, the wired interrupts are only allocated one by one (so desc->irq == virq) and the bug went unnoticed so far. Fixes: 2145ac9310b60 ("genirq/msi: Add msi_domain_populate_irqs") Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170906103540.373864a2.john@metanate.com
2017-09-06s390/mm: use a single lock for the fields in mm_context_tMartin Schwidefsky
The three locks 'lock', 'pgtable_lock' and 'gmap_lock' in the mm_context_t can be reduced to a single lock. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/mm: fix race on mm->context.flush_mmMartin Schwidefsky
The order in __tlb_flush_mm_lazy is to flush TLB first and then clear the mm->context.flush_mm bit. This can lead to missed flushes as the bit can be set anytime, the order needs to be the other way aronud. But this leads to a different race, __tlb_flush_mm_lazy may be called on two CPUs concurrently. If mm->context.flush_mm is cleared first then another CPU can bypass __tlb_flush_mm_lazy although the first CPU has not done the flush yet. In a virtualized environment the time until the flush is finally completed can be arbitrarily long. Add a spinlock to serialize __tlb_flush_mm_lazy and use the function in finish_arch_post_lock_switch as well. Cc: <stable@vger.kernel.org> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/mm: fix local TLB flushing vs. detach of an mm address spaceMartin Schwidefsky
The local TLB flushing code keeps an additional mask in the mm.context, the cpu_attach_mask. At the time a global flush of an address space is done the cpu_attach_mask is copied to the mm_cpumask in order to avoid future global flushes in case the mm is used by a single CPU only after the flush. Trouble is that the reset of the mm_cpumask is racy against the detach of an mm address space by switch_mm. The current order is first the global TLB flush and then the copy of the cpu_attach_mask to the mm_cpumask. The order needs to be the other way around. Cc: <stable@vger.kernel.org> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/zcrypt: externalize AP queue interrupt controlHarald Freudenberger
KVM has a need to control the interrupts on real and virtualized AP queue devices. This fix provides a new function to control the interrupt facilities of an AP queue device. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/zcrypt: externalize AP config info queryHarald Freudenberger
KVM has a need to fetch the crypto configuration information as it is returned by the PQAP(QCI) instruction. This patch introduces a new API ap_query_configuration() which provides this info in a handy way for the caller. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/zcrypt: externalize test AP queueTony Krowiak
Under certain specified conditions, the Test AP Queue (TAPQ) subfunction of the Process Adjunct Processor Queue (PQAP) instruction will be intercepted by a guest VM. The guest VM must have a means for executing the intercepted instruction. The vfio_ap driver will provide an interface to execute the PQAP(TAPQ) instruction subfunction on behalf of a guest VM. The code for executing the AP instructions currently resides in the AP bus. This patch refactors the AP bus code to externalize access to the PQAP(TAPQ) instruction subfunction to make it available to the vfio_ap driver. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/mm: use VM_BUG_ON in crst_table_[upgrade|downgrade]Martin Schwidefsky
The BUG_ON in crst_table_[upgrade|downgrade] is a debugging aid, replace it with VM_BUG_ON. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-05Merge branch 'next/dt64' into next/dtOlof Johansson
* next/dt64: (233 commits) arm64: dts: marvell: mcbin: enable more networking ports arm64: dts: marvell: add a reference to the sysctrl syscon in the ppv2 node arm64: dts: marvell: add TX interrupts for PPv2.2 arm64: dts: uniphier: add PXs3 SoC support arm64: dts: uniphier: fix size of sdctrl node arm64: dts: uniphier: add AIDET nodes arm64: dts: uniphier: add reset controller node of analog amplifier arm64: dts: marvell: add Device Tree files for Armada-8KP arm64: dts: rockchip: add Haikou baseboard with RK3399-Q7 SoM arm64: dts: rockchip: add RK3399-Q7 (Puma) SoM dt-bindings: add rk3399-q7 SoM arm64: dts: rockchip: add rk3328-rock64 board arm64: dts: rockchip: add rk3328 pdm node ARM64: dts: meson-gxl-libretech-cc: Add GPIO lines names ARM64: dts: meson-gx: Add AO CEC nodes ARM64: dts: meson-gx: update AO clkc to new bindings arm64: dts: rockchip: add more rk3399 iommu nodes arm64: dts: rockchip: add rk3368 iommu nodes arm64: dts: rockchip: add rk3328 iommu nodes arm64: zynqmp: Add generic compatible string for I2C EEPROM ...
2017-09-05Merge branch 'next/defconfig' into next/socOlof Johansson
* next/defconfig: (45 commits) ARM: multi_v7_defconfig: make eSDHC driver built-in ARM: config: aspeed: Add I2C, VUART, LPC Snoop ARM: configs: aspeed: Update Aspeed G4 with VMSPLIT_2G ARM: davinci_all_defconfig: enable tinydrm and ST7586 ARM: defconfig: tegra: Enable ChipIdea UDC driver ARM: configs: Add Tegra I2S interfaces to multi_v7_defconfig ARM: tegra: Add Tegra I2S interfaces to defconfig ARM: tegra: Update default configuration for v4.13-rc1 ARM: multi_v7_defconfig: add CONFIG_BRCMSTB_THERMAL ARM: omap2plus_defconfig: Enable LP87565 ARM: omap2plus_defconfig: enable DP83867 phy driver ARM: configs: keystone: Enable D_CAN driver ARM: shmobile: Enable BQ32000 rtc in shmobile_defconfig ARM: configs: keystone: Enable MMC and regulators ARM: multi_v7_defconfig: Enable DMA for Renesas serial ports ARM: multi_v7_defconfig: Replace DRM_RCAR_HDMI by generic bridge options ARM: multi_v7_defconfig: Replace SND_SOC_RSRC_CARD by SND_SIMPLE_SCU_CARD ARM: shmobile: defconfig: Refresh ARM: shmobile: defconfig: Enable DMA for serial ports ARM: shmobile: defconfig: Replace DRM_RCAR_HDMI by generic bridge options ...
2017-09-05Merge branch 'next/arm64' into next/socOlof Johansson
* next/arm64: arm64: defconfig: enable rockchip graphics arm64: defconfig: Enable QCOM IPQ8074 clock and pinctrl arm64: defconfig: add CONFIG_BRCMSTB_THERMAL arm64: defconfig: add recently added crypto drivers as modules arm64: defconfig: enable CONFIG_UNIPHIER_WATCHDOG arm64: defconfig: Enable CONFIG_WQ_POWER_EFFICIENT_DEFAULT arm64: defconfig: enable DMA driver for hi3660 arm64: defconfig: enable OP-TEE arm64: defconfig: enable support for serial port connected device arm64: defconfig: enable CONFIG_SYSCON_REBOOT_MODE arm64: defconfig: enable support hi6421v530 PMIC arm64: defconfig: enable Kirin PCIe arm64: defconfig: enable SCSI_HISI_SAS_PCI arm64: defconfig: Enable REGULATOR_AXP20X arm64: defconfig: Enable MFD_AXP20X_RSB arm64: select PINCTRL for ZTE platform arm64: defconfig: enable fine-grained task level IRQ time accounting arm64: defconfig: compile ak4613 and renesas sound as modules arm64: defconfig: enable nop-xceiv PHY driver
2017-09-05Merge branch 'next/cleanup' into next/socOlof Johansson
* next/cleanup: soc: versatile: remove unnecessary static in realview_soc_probe() ARM: Convert to using %pOF instead of full_name ARM: hisi: Fix typo in comment ARM: OMAP4+: PRM: fix of_irq_get() result checks ARM: OMAP3+: PRM: fix of_irq_get() result check ARM: dts: dra72-evm-revc: workaround incorrect DP83867 RX_CTRL pin strap ARM: dts: dra71-evm: workaround incorrect DP83867 RX_CTRL pin strap ARM: OMAP2+: omap_device: drop broken RPM status update from suspend_noirq bus: omap-ocp2scp: Fix error handling in omap_ocp2scp_probe
2017-09-05f2fs: use generic terms used for encrypted block managementJaegeuk Kim
This patch renames functions regarding to buffer management via META_MAPPING used for encrypted blocks especially. We can actually use them in generic way. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-05f2fs: introduce f2fs_encrypted_file for clean-upJaegeuk Kim
This patch replaces (f2fs_encrypted_inode() && S_ISREG()) with f2fs_encrypted_file(), which gives no functional change. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-05Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2017-09-05 This series contains fixes for i40e only. These two patches fix an issue where our nvmupdate tool does not work on RHEL 7.4 and newer kernels, in fact, the use of the nvmupdate tool on newer kernels can cause the cards to be non-functional unless these patches are applied. Anjali reworks the locking around accessing the NVM so that NVM acquire timeouts do not occur which was causing the failed firmware updates. Jake correctly updates the wb_desc when reading the NVM through the AdminQ. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-09-05selftests: Enhance kselftest_harness.h to print which assert failedMickaël Salaün
When a test process is not able to write to TH_LOG_STREAM, this step mechanism enable to print the assert number which triggered the failure. This can be enabled by setting _metadata->no_print to true at the beginning of the test sequence. Update the seccomp-bpf test to return 0 if a test succeeded. This feature is needed for the Landlock tests. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Link: https://lkml.kernel.org/r/CAGXu5j+D-FP8Kt9unNOqKrQJP4DYTpmgkJxWykZyrYiVPz3Y3Q@mail.gmail.com Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-05i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aqJacob Keller
When introducing the functions to read the NVM through the AdminQ, we did not correctly mark the wb_desc. Fixes: 7073f46e443e ("i40e: Add AQ commands for NVM Update for X722", 2015-06-05) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-05i40e: avoid NVM acquire deadlock during NVM updateAnjali Singhai Jain
X722 devices use the AdminQ to access the NVM, and this requires taking the AdminQ lock. Because of this, we lock the AdminQ during i40e_read_nvm(), which is also called in places where the lock is already held, such as the firmware update path which wants to lock once and then unlock when finished after performing several tasks. Although this should have only affected X722 devices, commit 96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02) added locking for all NVM reads, regardless of device family. This resulted in us accidentally causing NVM acquire timeouts on all devices, causing failed firmware updates which left the eeprom in a corrupt state. Create unsafe non-locked variants of i40e_read_nvm_word and i40e_read_nvm_buffer, __i40e_read_nvm_word and __i40e_read_nvm_buffer respectively. These variants will not take the NVM lock and are expected to only be called in places where the NVM lock is already held if needed. Since the only caller of i40e_read_nvm_buffer() was in such a path, remove it entirely in favor of the unsafe version. If necessary we can always add it back in the future. Additionally, we now need to hold the NVM lock in i40e_validate_checksum because the call to i40e_calc_nvm_checksum now assumes that the NVM lock is held. We can further move the call to read I40E_SR_SW_CHECKSUM_WORD up a bit so that we do not need to acquire the NVM lock twice. This should resolve firmware updates and also fix potential raise that could have caused the driver to report an invalid NVM checksum upon driver load. Reported-by: Stefan Assmann <sassmann@kpanic.de> Fixes: 96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02) Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-05xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handlerChuck Lever
Adopt the use of xprt_pin_rqst to eliminate contention between Call-side users of rb_lock and the use of rb_lock in rpcrdma_reply_handler. This replaces the mechanism introduced in 431af645cf66 ("xprtrdma: Fix client lock-up after application signal fires"). Use recv_lock to quickly find the completing rqst, pin it, then drop the lock. At that point invalidation and pull-up of the Reply XDR can be done. Both are often expensive operations. Finally, take recv_lock again to signal completion to the RPC layer. It also protects adjustment of "cwnd". This greatly reduces the amount of time a lock is held by the reply handler. Comparing lock_stat results shows a marked decrease in contention on rb_lock and recv_lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [trond.myklebust@primarydata.com: Remove call to rpcrdma_buffer_put() from the "out_norqst:" path in rpcrdma_reply_handler.] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-05Merge branch 'xgene-Misc-bug-fixes'David S. Miller
Iyappan Subramanian says: ==================== drivers: net: xgene: Misc bug fixes This patch set fixes bugs related to handling the case for ACPI for, reading and programming tx/rx delay values. ==================== Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05drivers: net: xgene: Remove return statement from void functionIyappan Subramanian
commit 183db4 ("drivers: net: xgene: Correct probe sequence handling") changed the return type of xgene_enet_check_phy_handle() to void. This patch, removes the return statement from the last line. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05drivers: net: xgene: Configure tx/rx delay for ACPIQuan Nguyen
This patch fixes configuring tx/rx delay values for ACPI. Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05drivers: net: xgene: Read tx/rx delay for ACPIIyappan Subramanian
This patch fixes reading tx/rx delay values for ACPI. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rocker: fix kcalloc parameter orderZahari Doychev
The function calls to kcalloc use wrong parameter order and incorrect flags values. GFP_KERNEL is used instead of flags now and the order is corrected. The change was done using the following coccinelle script: @@ expression E1,E2; type T; @@ -kcalloc(E1, E2, sizeof(T)) +kcalloc(E2, sizeof(T), GFP_KERNEL) Signed-off-by: Zahari Doychev <zahari.doychev@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rds: Fix non-atomic operation on shared flag variableHåkon Bugge
The bits in m_flags in struct rds_message are used for a plurality of reasons, and from different contexts. To avoid any missing updates to m_flags, use the atomic set_bit() instead of the non-atomic equivalent. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: sched: don't use GFP_KERNEL under spin lockJakub Kicinski
The new TC IDR code uses GFP_KERNEL under spin lock. Which leads to: [ 582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416 [ 582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc [ 582.636939] 2 locks held by tc/3379: [ 582.641049] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400 [ 582.650958] #1: (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.662217] Preemption disabled at: [ 582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G W 4.13.0-rc7-debug-00648-g43503a79b9f0 #287 [ 582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016 [ 582.691937] Call Trace: ... [ 582.742460] kmem_cache_alloc+0x286/0x540 [ 582.747055] radix_tree_node_alloc.constprop.6+0x4a/0x450 [ 582.753209] idr_get_free_cmn+0x627/0xf80 ... [ 582.815525] idr_alloc_cmn+0x1a8/0x270 ... [ 582.833804] tcf_idr_create+0x31b/0x8e0 ... Try to preallocate the memory with idr_prealloc(GFP_KERNEL) (as suggested by Eric Dumazet), and change the allocation flags under spin lock. Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05vhost_net: correctly check tx avail during rx busy pollingJason Wang
We check tx avail through vhost_enable_notify() in the past which is wrong since it only checks whether or not guest has filled more available buffer since last avail idx synchronization which was just done by vhost_vq_avail_empty() before. What we really want is checking pending buffers in the avail ring. Fix this by calling vhost_vq_avail_empty() instead. This issue could be noticed by doing netperf TCP_RR benchmark as client from guest (but not host). With this fix, TCP_RR from guest to localhost restores from 1375.91 trans per sec to 55235.28 trans per sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz). Fixes: 030881372460 ("vhost_net: basic polling support") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: mdio-mux: add mdio_mux parameter to mdio_mux_init()Corentin Labbe
mdio_mux_init() use the parameter dev for two distinct thing: 1) Have a device for all devm_ functions 2) Get device_node from it Since it is two distinct purpose, this patch add a parameter mdio_mux that is linked to task 2. This will also permit to register an of_node mdio-mux that lacks a direct owning device. For example a mdio-mux which is a subnode of a real device. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rxrpc: Make service connection lookup always check for retryDavid Howells
When an RxRPC service packet comes in, the target connection is looked up by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry check is, however, currently skipped if we got a match, but probably shouldn't be in case the connection we found gets replaced whilst we're doing a search. Make the lookup procedure always go through need_seqretry(), even if the lookup was successful. This makes sure we always pick up on a write-lock event. On the other hand, since we don't take a ref on the object, but rely on RCU to prevent its destruction after dropping the seqlock, I'm not sure this is necessary. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: stmmac: Delete dead code for MDIO registrationRomain Perier
This code is no longer used, the logging function was changed by commit fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register"). It was previously showing information about the type of the IRQ, if it's polled, ignored or a normal interrupt. As we don't want information loss, I have moved this code to phy_attached_print(). Fixes: fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register") Signed-off-by: Romain Perier <romain.perier@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05gianfar: Fix Tx flow control deactivationClaudiu Manoil
The wrong register is checked for the Tx flow control bit, it should have been maccfg1 not maccfg2. This went unnoticed for so long probably because the impact is hardly visible, not to mention the tangled code from adjust_link(). First, link flow control (i.e. handling of Rx/Tx link level pause frames) is disabled by default (needs to be enabled via 'ethtool -A'). Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few old boards), which results in Tx flow control remaining always on once activated. Fixes: 45b679c9a3ccd9e34f28e6ec677b812a860eb8eb ("gianfar: Implement PAUSE frame generation support") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6Ganesh Goudar
MPS_TX_INT_CAUSE[Bubble] is a normal condition for T6, hence ignore this interrupt for T6. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: Fix pause frame count in t4_get_port_statsGanesh Goudar
MPS_STAT_CTL[CountPauseStatTx] and MPS_STAT_CTL[CountPauseStatRx] only control whether or not Pause Frames will be counted as part of the 64-Byte Tx/Rx Frame counters. These bits do not control whether Pause Frames are counted in the Total Tx/Rx Frames/Bytes counters. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: fix memory leakGanesh Goudar
do not reuse the loop counter which is used iterate over the ports, so that sched_tbl will be freed for all the ports. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05tun: rename generic_xdp to skb_xdpJason Wang
Rename "generic_xdp" to "skb_xdp" to avoid confusing it with the generic XDP which will be done at netif_receive_skb(). Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>