summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-09-06Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Scalability improvements when allocating inodes, and some miscellaneous bug fixes and cleanups" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid Y2038 overflow in recently_deleted() ext4: fix fault handling when mounted with -o dax,ro ext4: fix quota inconsistency during orphan cleanup for read-only mounts ext4: fix incorrect quotaoff if the quota feature is enabled ext4: remove useless test and assignment in strtohash functions ext4: backward compatibility support for Lustre ea_inode implementation ext4: remove timebomb in ext4_decode_extra_time() ext4: use sizeof(*ptr) ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets ext4: reduce lock contention in __ext4_new_inode ext4: cleanup goto next group ext4: do not unnecessarily allocate buffer in recently_deleted()
2017-09-06Merge tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull XFS updates from Darrick Wong: "Here are the changes for xfs for 4.14. Most of these are cleanups and fixes for bad behavior, as we're mostly focusing on improving reliablity this cycle (read: there's potentially a lot of stuff on the horizon for 4.15 so better to spend a few weeks killing other bugs now). Summary: - Write unmount record for a ro mount to avoid unnecessary log replay - Clean up orphaned inodes when mounting fs readonly - Resubmit inode log items when buffer writeback fails to avoid umount hang - Fix log recovery corruption problems when log headers wrap around the end - Avoid infinite loop searching for free inodes when inode counters are wrong - Evict inodes involved with log redo so that we don't leak them later - Fix a potential race between reclaim and inode cluster freeing - Refactor the inode joining code w.r.t. transaction rolling & deferred ops - Fix a bug where the log doesn't properly deal with dirty buffers that are about to become ordered buffers - Fix the extent swap code to deal with making dirty buffers ordered properly - Consolidate page fault handlers - Refactor the incore extent manipulation functions to use the iext abstractions instead of directly modifying with extent data - Disable crashy chattr +/-x until we fix it - Don't allow us to set S_DAX for v2 inodes - Various cleanups - Clarify some documentation - Fix a problem where fsync and a log commit race to send the disk a flush command, resulting in a small window where power fail data loss could occur - Simplify some rmap operations in the fcollapse code - Fix some use-after-free problems in async writeback" * tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits) xfs: use kmem_free to free return value of kmem_zalloc xfs: open code end_buffer_async_write in xfs_finish_page_writeback xfs: don't set v3 xflags for v2 inodes xfs: fix compiler warnings fsmap: fix documentation of FMR_OF_LAST xfs: simplify the rmap code in xfs_bmse_merge xfs: remove unused flags arg from xfs_file_iomap_begin_delay xfs: fix incorrect log_flushed on fsync xfs: disable per-inode DAX flag xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents xfs: move some code around inside xfs_bmap_shift_extents xfs: use xfs_iext_get_extent in xfs_bmap_first_unused xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert xfs: add a xfs_iext_update_extent helper xfs: consolidate the various page fault handlers iomap: return VM_FAULT_* codes from iomap_page_mkwrite xfs: relog dirty buffers during swapext bmbt owner change ...
2017-09-06Merge tag 'gfs2-4.14.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We've got a whopping 29 GFS2 patches for this merge window, mainly because we held some back from the previous merge window until we could get them perfected and well tested. We have a couple patch sets, including my patch set for protecting glock gl_object and Andreas Gruenbacher's patch set to fix the long-standing shrink- slab hang, plus a bunch of assorted bugs and cleanups. Summary: - I fixed a bug whereby an IO error would lead to a double-brelse. - Andreas Gruenbacher made a minor cleanup to call his relatively new function, gfs2_holder_initialized, rather than doing it manually. This was just missed by a previous patch set. - Jan Kara fixed a bug whereby the SGID was being cleared when inheriting ACLs. - Andreas found a bug and fixed it in his previous patch, "Get rid of flush_delayed_work in gfs2_evict_inode". A call to flush_delayed_work was deleted from *gfs2_inode_lookup and added to gfs2_create_inode. - Wang Xibo found and fixed a list_add call in inode_go_lock that specified the parameters in the wrong order. - Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's metadata reads that were accidentally missing them. - I submitted a 4-patch set to protect the glock gl_object field. GFS2 was setting and checking gl_object with no locking mechanism, so the value was occasionally stomped on, which caused file system corruption. - I submitted a small cleanup to function gfs2_clear_rgrpd. It was needlessly adding rgrp glocks to the lru list, then pulling them back off immediately. The rgrp glocks don't use the lru list anyway, so doing so was just a waste of time. - I submitted a patch that checks the GLOF_LRU flag on a glock before trying to remove it from the lru_list. This avoids a lot of unnecessary spin_lock contention. - I submitted a patch to delete GFS2's debugfs files only after we evict all the glocks. Before this patch, GFS2 would delete the debugfs files, and if unmount hung waiting for a glock, there was no way to debug the problem. Now, if a hang occurs during umount, we can examine the debugfs files to figure out why it's hung. - Andreas Gruenbacher submitted a patch to fix some trivial typos. - Andreas also submitted a five-part patch set to fix the longstanding hang involving the slab shrinker: dlm requires memory, calls the inode shrinker, which calls gfs2's evict, which calls back into DLM before it can evict an inode. - Abhi Das submitted a patch to forcibly flush the active items list to relieve memory pressure. This fixes a long-standing bug whereby GFS2 was getting hung permanently in balance_dirty_pages. - Thomas Tai submitted a patch to fix a slab corruption problem due to a residual pointer left in the lock_dlm lockstruct. - I submitted a patch to withdraw the file system if IO errors are encountered while writing to the journals or statfs system file which were previously not being sent back up. Before, some IO errors were sometimes not be detected for several hours, and at recovery time, the journal errors made journal replay impossible. - Andreas has a patch to fix an annoying format-truncation compiler warning so GFS2 compiles cleanly. - I have a patch that fixes a handful of sparse compiler warnings. - Andreas fixed up an useless gl_object warning caused by an earlier patch. - Arvind Yadav added a patch to properly constify our rhashtable params declare. - I added a patch to fix a regression caused by the non-recursive delete and truncate patch that caused file system blocks to not be properly freed. - Ernesto A. Fernández added a patch to fix a place where GFS2 would send back the wrong return code setting extended attributes. - Ernesto also added a patch to fix a case in which GFS2 was improperly setting an inode's i_mode, potentially granting access to the wrong users" * tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits) gfs2: preserve i_mode if __gfs2_set_acl() fails gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing GFS2: Fix non-recursive truncate bug gfs2: constify rhashtable_params GFS2: Fix gl_object warnings GFS2: Fix up some sparse warnings gfs2: Silence gcc format-truncation warning GFS2: Withdraw for IO errors writing to the journal or statfs gfs2: fix slab corruption during mounting and umounting gfs file system gfs2: forcibly flush ail to relieve memory pressure gfs2: Clean up waiting on glocks gfs2: Defer deleting inodes under memory pressure gfs2: gfs2_evict_inode: Put glocks asynchronously gfs2: Get rid of gfs2_set_nlink gfs2: gfs2_glock_get: Wait on freeing glocks gfs2: Fix trivial typos GFS2: Delete debugfs files only after we evict the glocks GFS2: Don't waste time locking lru_lock for non-lru glocks GFS2: Don't bother trying to add rgrps to the lru list GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode ...
2017-09-06drm/i915: Re-enable GTT following a device resetChris Wilson
Ville Syrjälä spotted that PGETBL_CTL was losing its enable bit upon a reset. That was causing the display to show garbage on his 945gm. On my i915gm the effect was far more severe; re-enabling the display following the reset without PGETBL_CTL being enabled lead to an immediate hard hang. We do have a routine to re-enable PGETBL_CTL which is applicable to gen2-4, although on gen4 it is documented that a graphics reset doesn't alter the register (no such wording is given for gen3) and should be safe to call to punch back in the enable bit. However, that leaves the question of whether we need to completely re-initialise the register and the rest of the GSM. For g33/pnv/gen4+, where we do have a configurable page table, its contents do seem to be kept, and so we should be able to recover without having to reinitialise the GTT from scratch (as prior to g33, that register is configured by the BIOS and we leave alone except for the enable bit). This appears to have been broken by commit 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms"), which moved the intel_enable_gtt() from i915_gem_init_hw() (also used by reset) to add it earlier during hw init and resume, missing the reset path. v2: Find the culprit, rearrange ggtt_enable to be before gem_init_hw to match init/resume Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101852 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170906111405.27110-1-chris@chris-wilson.co.uk Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (cherry picked from commit 0db8c961209153498fe7e279b8f0d3deb81808f0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-06drm/i915: Annotate user relocs with __userVille Syrjälä
Add the missing __user to the urelocs cast to fix the following sparse warning: i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces) i915_gem_execbuffer.c:1541:62: expected void const [noderef] <asn:1>*from i915_gem_execbuffer.c:1541:62: got char * Cc: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901165434.24636-1-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> #irc (cherry picked from commit 908a610557f4d8b46a0f82c01e31b30f5c998580) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-06loop: set physical block size to logical block sizeOmar Sandoval
Commit 6c6b6f28b333 ("loop: set physical block size to PAGE_SIZE") caused mkfs.xfs to barf on ppc64 [1]. Always using PAGE_SIZE as the physical block size still makes the most sense semantically, but let's just lie and always set it to the same value as the logical block size (same goes for io_min). In the future we might want to at least bump up io_min to PAGE_SIZE but I'm sick of these stupid changes so let's play it safe. 1: https://marc.info/?l=linux-xfs&m=150459024723753&w=2 Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06Merge branch 'topic/dmatest' into for-linusVinod Koul
2017-09-06Merge branch 'topic/qcom' into for-linusVinod Koul
2017-09-06Merge branch 'topic/ppc4xx' into for-linusVinod Koul
2017-09-06Merge branch 'topic/of' into for-linusVinod Koul
2017-09-06Merge branch 'topic/k3dma' into for-linusVinod Koul
2017-09-06Merge branch 'topic/ioat' into for-linusVinod Koul
2017-09-06Merge branch 'topic/bcm' into for-linusVinod Koul
2017-09-06Merge branch 'topic/altera' into for-linusVinod Koul
2017-09-06libata: zpodd: make arrays cdb static, reduces object code sizeColin Ian King
Don't populate the arrays cdb on the stack, instead make them static. Makes the object code smaller by 230 bytes: Before: text data bss dec hex filename 3797 240 0 4037 fc5 drivers/ata/libata-zpodd.o After: text data bss dec hex filename 3407 400 0 3807 edf drivers/ata/libata-zpodd.o Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-09-06ahci: don't use MSI for devices with the silly Intel NVMe remapping schemeChristoph Hellwig
Intel AHCI controllers that also hide NVMe devices in their bar can't use MSI interrupts, so disable them. Reported-by: John Loy <john.robert.loy@gmail.com> Tested-by: John Loy <john.robert.loy@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: d684a90d38e2 ("ahci: per-port msix support") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Tejun Heo <tj@kernel.org>
2017-09-06bcache: fix bch_hprint crash and improve outputMichael Lyle
Most importantly, solve a crash where %llu was used to format signed numbers. This would cause a buffer overflow when reading sysfs writeback_rate_debug, as only 20 bytes were allocated for this and %llu writes 20 characters plus a null. Always use the units mechanism rather than having different output paths for simplicity. Also, correct problems with display output where 1.10 was a larger number than 1.09, by multiplying by 10 and then dividing by 1024 instead of dividing by 100. (Remainders of >= 1000 would print as .10). Minor changes: Always display the decimal point instead of trying to omit it based on number of digits shown. Decide what units to use based on 1000 as a threshold, not 1024 (in other words, always print at most 3 digits before the decimal point). Signed-off-by: Michael Lyle <mlyle@lyle.org> Reported-by: Dmitry Yu Okunev <dyokunev@ut.mephi.ru> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Update continue_at() documentationDan Carpenter
continue_at() doesn't have a return statement anymore. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: silence static checker warningDan Carpenter
In olden times, closure_return() used to have a hidden return built in. We removed the hidden return but forgot to add a new return here. If "c" were NULL we would oops on the next line, but fortunately "c" is never NULL. Let's just remove the if statement. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: fix for gc and write-back raceTang Junhui
gc and write-back get raced (see the email "bcache get stucked" I sended before): gc thread write-back thread | |bch_writeback_thread() |bch_gc_thread() | | |==>read_dirty() |==>bch_btree_gc() | |==>btree_root() //get btree root | | //node write locker | |==>bch_btree_gc_root() | | |==>read_dirty_submit() | |==>write_dirty() | |==>continue_at(cl, | | write_dirty_finish, | | system_wq); | |==>write_dirty_finish()//excute | | //in system_wq | |==>bch_btree_insert() | |==>bch_btree_map_leaf_nodes() | |==>__bch_btree_map_nodes() | |==>btree_root //try to get btree | | //root node read | | //lock | |-----stuck here |==>bch_btree_set_root() |==>bch_journal_meta() |==>bch_journal() |==>journal_try_write() |==>journal_write_unlocked() //journal_full(&c->journal) | //condition satisfied |==>continue_at(cl, journal_write, system_wq); //try to excute | //journal_write in system_wq | //but work queue is excuting | //write_dirty_finish() |==>closure_sync(); //wait journal_write execute | //over and wake up gc, |-------------stuck here |==>release root node write locker This patch alloc a separate work-queue for write-back thread to avoid such race. (Commit log re-organized by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: increase the number of open bucketsTang Junhui
In currently, we only alloc 6 open buckets for each cache set, but in usually, we always attach about 10 or so backend devices for each cache set, and the each bcache device are always accessed by about 10 or so threads in top application layer. So 6 open buckets are too few, It has led to that each of the same thread write data to different buckets, which would cause low efficiency write-back, and also cause buckets inefficient, and would be Very easy to run out of. I add debug message in bch_open_buckets_alloc() to print alloc bucket info, and test with ten bcache devices with a cache set, and each bcache device is accessed by ten threads. From the debug message, we can see that, after the modification, One bucket is more likely to assign to the same thread, and the data from the same thread are more likely to write the same bucket. Usually the same thread always write/read the same backend device, so it is good for write-back and also promote the usage efficiency of buckets. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Correct return value for sysfs attach errorsTony Asleson
If you encounter any errors in bch_cached_dev_attach it will return a negative error code. The variable 'v' which stores the result is unsigned, thus user space sees a very large value returned for bytes written which can cause incorrect user space behavior. Utilize 1 signed variable to use throughout the function to preserve error return capability. Signed-off-by: Tony Asleson <tasleson@redhat.com> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: correct cache_dirty_target in __update_writeback_rate()Tang Junhui
__update_write_rate() uses a Proportion-Differentiation Controller algorithm to control writeback rate. A dirty target number is used in this PD controller to control writeback rate. A larger target number will make the writeback rate smaller, on the versus, a smaller target number will make the writeback rate larger. bcache uses the following steps to calculate the target number, 1) cache_sectors = all-buckets-of-cache-set * buckets-size 2) cache_dirty_target = cache_sectors * cached-device-writeback_percent 3) target = cache_dirty_target * (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set) The calculation at step 1) for cache_sectors is incorrect, which does not consider dirty blocks occupied by flash only volume. A flash only volume can be took as a bcache device without cached device. All data sectors allocated for it are persistent on cache device and marked dirty, they are not touched by bcache writeback and garbage collection code. So data blocks of flash only volume should be ignore when calculating cache_sectors of cache set. Current code does not subtract dirty sectors of flash only volume, which results a larger target number from the above 3 steps. And in sequence the cache device's writeback rate is smaller then a correct value, writeback speed is slower on all cached devices. This patch fixes the incorrect slower writeback rate by subtracting dirty sectors of flash only volumes in __update_writeback_rate(). (Commit log composed by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: gc does not work when triggering by manual commandTang Junhui
I try to execute the following command to trigger gc thread: [root@localhost internal]# echo 1 > trigger_gc But it does not work, I debug the code in gc_should_run(), It works only if in invalidating or sectors_to_gc < 0. So set sectors_to_gc to -1 to meet the condition when we trigger gc by manual command. (Code comments aded by Coly Li) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Don't reinvent the wheel but use existing llist APIByungchul Park
Although llist provides proper APIs, they are not used. Make them used. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: do not subtract sectors_to_gc for bypassed IOTang Junhui
Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to trigger gc thread. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: fix sequential large write IO bypassTang Junhui
Sequential write IOs were tested with bs=1M by FIO in writeback cache mode, these IOs were expected to be bypassed, but actually they did not. We debug the code, and find in check_should_bypass(): if (!congested && mode == CACHE_MODE_WRITEBACK && op_is_write(bio_op(bio)) && (bio->bi_opf & REQ_SYNC)) goto rescale that means, If in writeback mode, a write IO with REQ_SYNC flag will not be bypassed though it is a sequential large IO, It's not a correct thing to do actually, so this patch remove these codes. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Fix leak of bdev referenceJan Kara
If blkdev_get_by_path() in register_bcache() fails, we try to lookup the block device using lookup_bdev() to detect which situation we are in to properly report error. However we never drop the reference returned to us from lookup_bdev(). Fix that. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06mac80211: fix deadlock in driver-managed RX BA session startJohannes Berg
When an RX BA session is started by the driver, and it has to tell mac80211 about it, the corresponding bit in tid_rx_manage_offl gets set and the BA session work is scheduled. Upon testing this bit, it will call __ieee80211_start_rx_ba_session(), thus deadlocking as it already holds the ampdu_mlme.mtx, which that acquires again. Fix this by adding ___ieee80211_start_rx_ba_session(), a version of the function that requires the mutex already held. Cc: stable@vger.kernel.org Fixes: 699cb58c8a52 ("mac80211: manage RX BA session offload without SKB queue") Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06mac80211: Complete ampdu work schedule during session tear downIlan peer
Commit 7a7c0a6438b8 ("mac80211: fix TX aggregation start/stop callback race") added a cancellation of the ampdu work after the loop that stopped the Tx and Rx BA sessions. However, in some cases, e.g., during HW reconfig, the low level driver might call mac80211 APIs to complete the stopping of the BA sessions, which would queue the ampdu work to handle the actual completion. This work needs to be performed as otherwise mac80211 data structures would not be properly synced. Fix this by checking if BA session STOP_CB bit is set after the BA session cancellation and properly clean the session. Signed-off-by: Ilan Peer <ilan.peer@intel.com> [Johannes: the work isn't flushed because that could do other things we don't want, and the locking situation isn't clear] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06cfg80211: honor NL80211_RRF_NO_HT40{MINUS,PLUS}Emmanuel Grumbach
Honor the NL80211_RRF_NO_HT40{MINUS,PLUS} flags in reg_process_ht_flags_channel. Not doing so leads can lead to a firmware assert in iwlwifi for example. Fixes: b0d7aa59592b ("cfg80211: allow wiphy specific regdomain management") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-05Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2017-09-05 This series contains fixes for i40e only. These two patches fix an issue where our nvmupdate tool does not work on RHEL 7.4 and newer kernels, in fact, the use of the nvmupdate tool on newer kernels can cause the cards to be non-functional unless these patches are applied. Anjali reworks the locking around accessing the NVM so that NVM acquire timeouts do not occur which was causing the failed firmware updates. Jake correctly updates the wb_desc when reading the NVM through the AdminQ. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-09-05selftests: Enhance kselftest_harness.h to print which assert failedMickaël Salaün
When a test process is not able to write to TH_LOG_STREAM, this step mechanism enable to print the assert number which triggered the failure. This can be enabled by setting _metadata->no_print to true at the beginning of the test sequence. Update the seccomp-bpf test to return 0 if a test succeeded. This feature is needed for the Landlock tests. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Link: https://lkml.kernel.org/r/CAGXu5j+D-FP8Kt9unNOqKrQJP4DYTpmgkJxWykZyrYiVPz3Y3Q@mail.gmail.com Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-05i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aqJacob Keller
When introducing the functions to read the NVM through the AdminQ, we did not correctly mark the wb_desc. Fixes: 7073f46e443e ("i40e: Add AQ commands for NVM Update for X722", 2015-06-05) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-05i40e: avoid NVM acquire deadlock during NVM updateAnjali Singhai Jain
X722 devices use the AdminQ to access the NVM, and this requires taking the AdminQ lock. Because of this, we lock the AdminQ during i40e_read_nvm(), which is also called in places where the lock is already held, such as the firmware update path which wants to lock once and then unlock when finished after performing several tasks. Although this should have only affected X722 devices, commit 96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02) added locking for all NVM reads, regardless of device family. This resulted in us accidentally causing NVM acquire timeouts on all devices, causing failed firmware updates which left the eeprom in a corrupt state. Create unsafe non-locked variants of i40e_read_nvm_word and i40e_read_nvm_buffer, __i40e_read_nvm_word and __i40e_read_nvm_buffer respectively. These variants will not take the NVM lock and are expected to only be called in places where the NVM lock is already held if needed. Since the only caller of i40e_read_nvm_buffer() was in such a path, remove it entirely in favor of the unsafe version. If necessary we can always add it back in the future. Additionally, we now need to hold the NVM lock in i40e_validate_checksum because the call to i40e_calc_nvm_checksum now assumes that the NVM lock is held. We can further move the call to read I40E_SR_SW_CHECKSUM_WORD up a bit so that we do not need to acquire the NVM lock twice. This should resolve firmware updates and also fix potential raise that could have caused the driver to report an invalid NVM checksum upon driver load. Reported-by: Stefan Assmann <sassmann@kpanic.de> Fixes: 96a39aed25e6 ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02) Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-05Merge branch 'xgene-Misc-bug-fixes'David S. Miller
Iyappan Subramanian says: ==================== drivers: net: xgene: Misc bug fixes This patch set fixes bugs related to handling the case for ACPI for, reading and programming tx/rx delay values. ==================== Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05drivers: net: xgene: Remove return statement from void functionIyappan Subramanian
commit 183db4 ("drivers: net: xgene: Correct probe sequence handling") changed the return type of xgene_enet_check_phy_handle() to void. This patch, removes the return statement from the last line. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05drivers: net: xgene: Configure tx/rx delay for ACPIQuan Nguyen
This patch fixes configuring tx/rx delay values for ACPI. Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05drivers: net: xgene: Read tx/rx delay for ACPIIyappan Subramanian
This patch fixes reading tx/rx delay values for ACPI. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rocker: fix kcalloc parameter orderZahari Doychev
The function calls to kcalloc use wrong parameter order and incorrect flags values. GFP_KERNEL is used instead of flags now and the order is corrected. The change was done using the following coccinelle script: @@ expression E1,E2; type T; @@ -kcalloc(E1, E2, sizeof(T)) +kcalloc(E2, sizeof(T), GFP_KERNEL) Signed-off-by: Zahari Doychev <zahari.doychev@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rds: Fix non-atomic operation on shared flag variableHåkon Bugge
The bits in m_flags in struct rds_message are used for a plurality of reasons, and from different contexts. To avoid any missing updates to m_flags, use the atomic set_bit() instead of the non-atomic equivalent. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: sched: don't use GFP_KERNEL under spin lockJakub Kicinski
The new TC IDR code uses GFP_KERNEL under spin lock. Which leads to: [ 582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416 [ 582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc [ 582.636939] 2 locks held by tc/3379: [ 582.641049] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400 [ 582.650958] #1: (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.662217] Preemption disabled at: [ 582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G W 4.13.0-rc7-debug-00648-g43503a79b9f0 #287 [ 582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016 [ 582.691937] Call Trace: ... [ 582.742460] kmem_cache_alloc+0x286/0x540 [ 582.747055] radix_tree_node_alloc.constprop.6+0x4a/0x450 [ 582.753209] idr_get_free_cmn+0x627/0xf80 ... [ 582.815525] idr_alloc_cmn+0x1a8/0x270 ... [ 582.833804] tcf_idr_create+0x31b/0x8e0 ... Try to preallocate the memory with idr_prealloc(GFP_KERNEL) (as suggested by Eric Dumazet), and change the allocation flags under spin lock. Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05vhost_net: correctly check tx avail during rx busy pollingJason Wang
We check tx avail through vhost_enable_notify() in the past which is wrong since it only checks whether or not guest has filled more available buffer since last avail idx synchronization which was just done by vhost_vq_avail_empty() before. What we really want is checking pending buffers in the avail ring. Fix this by calling vhost_vq_avail_empty() instead. This issue could be noticed by doing netperf TCP_RR benchmark as client from guest (but not host). With this fix, TCP_RR from guest to localhost restores from 1375.91 trans per sec to 55235.28 trans per sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz). Fixes: 030881372460 ("vhost_net: basic polling support") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: mdio-mux: add mdio_mux parameter to mdio_mux_init()Corentin Labbe
mdio_mux_init() use the parameter dev for two distinct thing: 1) Have a device for all devm_ functions 2) Get device_node from it Since it is two distinct purpose, this patch add a parameter mdio_mux that is linked to task 2. This will also permit to register an of_node mdio-mux that lacks a direct owning device. For example a mdio-mux which is a subnode of a real device. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rxrpc: Make service connection lookup always check for retryDavid Howells
When an RxRPC service packet comes in, the target connection is looked up by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry check is, however, currently skipped if we got a match, but probably shouldn't be in case the connection we found gets replaced whilst we're doing a search. Make the lookup procedure always go through need_seqretry(), even if the lookup was successful. This makes sure we always pick up on a write-lock event. On the other hand, since we don't take a ref on the object, but rely on RCU to prevent its destruction after dropping the seqlock, I'm not sure this is necessary. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: stmmac: Delete dead code for MDIO registrationRomain Perier
This code is no longer used, the logging function was changed by commit fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register"). It was previously showing information about the type of the IRQ, if it's polled, ignored or a normal interrupt. As we don't want information loss, I have moved this code to phy_attached_print(). Fixes: fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register") Signed-off-by: Romain Perier <romain.perier@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05gianfar: Fix Tx flow control deactivationClaudiu Manoil
The wrong register is checked for the Tx flow control bit, it should have been maccfg1 not maccfg2. This went unnoticed for so long probably because the impact is hardly visible, not to mention the tangled code from adjust_link(). First, link flow control (i.e. handling of Rx/Tx link level pause frames) is disabled by default (needs to be enabled via 'ethtool -A'). Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few old boards), which results in Tx flow control remaining always on once activated. Fixes: 45b679c9a3ccd9e34f28e6ec677b812a860eb8eb ("gianfar: Implement PAUSE frame generation support") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6Ganesh Goudar
MPS_TX_INT_CAUSE[Bubble] is a normal condition for T6, hence ignore this interrupt for T6. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05cxgb4: Fix pause frame count in t4_get_port_statsGanesh Goudar
MPS_STAT_CTL[CountPauseStatTx] and MPS_STAT_CTL[CountPauseStatRx] only control whether or not Pause Frames will be counted as part of the 64-Byte Tx/Rx Frame counters. These bits do not control whether Pause Frames are counted in the Total Tx/Rx Frames/Bytes counters. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>