summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-09-06ceph: don't fill readdir cache for LSSNAP replyYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: cleanup ceph_readdir_prepopulate()Yan, Zheng
In LSSNAP case, req->r_dentry is already set to snapdir dentry. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: use errseq_t for writeback error reportingJeff Layton
Ensure that when writeback errors are marked that we report those to all file descriptions that were open at the time of the error. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: new cap message flags indicate if there is pending capsnapYan, Zheng
These flags tell mds if there is pending capsnap explicitly. Without this explicit notification, mds can only conclude if client has pending capsnap. The method mds use is inefficient and error-prone. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: nuke startsync opYanhu Cao
startsync is a no-op, has been for years. Remove it. Link: http://tracker.ceph.com/issues/20604 Signed-off-by: Yanhu Cao <gmayyyha@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06rbd: silence bogus uninitialized use warning in rbd_acquire_lock()Kefeng Wang
drivers/block/rbd.c: In function 'rbd_acquire_lock': drivers/block/rbd.c:3602:44: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized] Silence the warning, found it when built old kernel(3.10) with OBS(opensuse build service). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: validate correctness of some mount optionsYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: limit osd write sizeYan, Zheng
OSD has a configurable limitation of max write size. OSD return error if write request size is larger than the limitation. For now, set max write size to CEPH_MSG_MAX_DATA_LEN. It should be small enough. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: limit osd read size to CEPH_MSG_MAX_DATA_LENYan, Zheng
libceph returns -EIO when read size > CEPH_MSG_MAX_DATA_LEN. Link: http://tracker.ceph.com/issues/20528 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: remove unused cap_release_safety mount optionYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06drm/i915: Re-enable GTT following a device resetChris Wilson
Ville Syrjälä spotted that PGETBL_CTL was losing its enable bit upon a reset. That was causing the display to show garbage on his 945gm. On my i915gm the effect was far more severe; re-enabling the display following the reset without PGETBL_CTL being enabled lead to an immediate hard hang. We do have a routine to re-enable PGETBL_CTL which is applicable to gen2-4, although on gen4 it is documented that a graphics reset doesn't alter the register (no such wording is given for gen3) and should be safe to call to punch back in the enable bit. However, that leaves the question of whether we need to completely re-initialise the register and the rest of the GSM. For g33/pnv/gen4+, where we do have a configurable page table, its contents do seem to be kept, and so we should be able to recover without having to reinitialise the GTT from scratch (as prior to g33, that register is configured by the BIOS and we leave alone except for the enable bit). This appears to have been broken by commit 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms"), which moved the intel_enable_gtt() from i915_gem_init_hw() (also used by reset) to add it earlier during hw init and resume, missing the reset path. v2: Find the culprit, rearrange ggtt_enable to be before gem_init_hw to match init/resume Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101852 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170906111405.27110-1-chris@chris-wilson.co.uk Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (cherry picked from commit 0db8c961209153498fe7e279b8f0d3deb81808f0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-06drm/i915: Annotate user relocs with __userVille Syrjälä
Add the missing __user to the urelocs cast to fix the following sparse warning: i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces) i915_gem_execbuffer.c:1541:62: expected void const [noderef] <asn:1>*from i915_gem_execbuffer.c:1541:62: got char * Cc: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901165434.24636-1-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> #irc (cherry picked from commit 908a610557f4d8b46a0f82c01e31b30f5c998580) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-06loop: set physical block size to logical block sizeOmar Sandoval
Commit 6c6b6f28b333 ("loop: set physical block size to PAGE_SIZE") caused mkfs.xfs to barf on ppc64 [1]. Always using PAGE_SIZE as the physical block size still makes the most sense semantically, but let's just lie and always set it to the same value as the logical block size (same goes for io_min). In the future we might want to at least bump up io_min to PAGE_SIZE but I'm sick of these stupid changes so let's play it safe. 1: https://marc.info/?l=linux-xfs&m=150459024723753&w=2 Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06lockd: Delete an error message for a failed memory allocation in reclaimer()Markus Elfring
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06NFS: remove jiffies field from access cacheNeilBrown
This field hasn't been used since commit 57b691819ee2 ("NFS: Cache access checks more aggressively"). Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06NFS: flush data when locking a file to ensure cache coherence for mmap.NeilBrown
When a byte range lock (or flock) is taken out on an NFS file, the validity of the cached data is checked and the inode is marked NFS_INODE_INVALID_DATA. However the cached data isn't flushed from the page cache. This is sufficient for future read() requests or mmap() requests as they call nfs_revalidate_mapping() which performs the flush if necessary. However an existing mapping is not affected. Accessing data through that mapping will continue to return old data even though the inode is marked NFS_INODE_INVALID_DATA. This can easily be confirmed using the 'nfs' tool in git://github.com/okirch/twopence-nfs.git and running nfs coherence FILENAME on one client, and nfs coherence -r FILENAME on another client. It appears that prior to Linux 2.6.0 this worked correctly. However commit: http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0 removed the call to inode_invalidate_pages() from nfs_zap_caches(). I haven't tested this code, but inspection suggests that prior to this commit, file locking would invalidate all inode pages. This patch adds a call to nfs_revalidate_mapping() after a successful SETLK so that invalid data is flushed. With this patch the above test passes. To minimize impact (and possibly avoid a GETATTR call) this only happens if the mapping might be mapped into userspace. Cc: Olaf Kirch <okir@suse.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06SUNRPC: remove some dead code.NeilBrown
RPC_TASK_NO_RETRANS_TIMEOUT is set when cl_noretranstimeo is set, which happens when RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT is set, which happens when NFS_CS_NO_RETRANS_TIMEOUT is set. This flag means "don't resend on a timeout, only resend if the connection gets broken for some reason". cl_discrtry is set when RPC_CLNT_CREATE_DISCRTRY is set, which happens when NFS_CS_DISCRTRY is set. This flag means "always disconnect before resending". NFS_CS_NO_RETRANS_TIMEOUT and NFS_CS_DISCRTRY are both only set in nfs4_init_client(), and it always sets both. So we will never have a situation where only one of the flags is set. So this code, which tests if timeout retransmits are allowed, and disconnection is required, will never run. So it makes sense to remove this code as it cannot be tested and could confuse people reading the code (like me). (alternately we could leave it there with a comment saying it is never actually used). Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06NFS: don't expect errors from mempool_alloc().NeilBrown
Commit fbe77c30e9ab ("NFS: move rw_mode to nfs_pageio_header") reintroduced some pointless code that commit 518662e0fcb9 ("NFS: fix usage of mempools.") had recently removed. Remove it again. Cc: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06Merge branch 'topic/dmatest' into for-linusVinod Koul
2017-09-06Merge branch 'topic/qcom' into for-linusVinod Koul
2017-09-06Merge branch 'topic/ppc4xx' into for-linusVinod Koul
2017-09-06Merge branch 'topic/of' into for-linusVinod Koul
2017-09-06Merge branch 'topic/k3dma' into for-linusVinod Koul
2017-09-06Merge branch 'topic/ioat' into for-linusVinod Koul
2017-09-06Merge branch 'topic/bcm' into for-linusVinod Koul
2017-09-06Merge branch 'topic/altera' into for-linusVinod Koul
2017-09-06libata: zpodd: make arrays cdb static, reduces object code sizeColin Ian King
Don't populate the arrays cdb on the stack, instead make them static. Makes the object code smaller by 230 bytes: Before: text data bss dec hex filename 3797 240 0 4037 fc5 drivers/ata/libata-zpodd.o After: text data bss dec hex filename 3407 400 0 3807 edf drivers/ata/libata-zpodd.o Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-09-06ahci: don't use MSI for devices with the silly Intel NVMe remapping schemeChristoph Hellwig
Intel AHCI controllers that also hide NVMe devices in their bar can't use MSI interrupts, so disable them. Reported-by: John Loy <john.robert.loy@gmail.com> Tested-by: John Loy <john.robert.loy@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: d684a90d38e2 ("ahci: per-port msix support") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Tejun Heo <tj@kernel.org>
2017-09-06bcache: fix bch_hprint crash and improve outputMichael Lyle
Most importantly, solve a crash where %llu was used to format signed numbers. This would cause a buffer overflow when reading sysfs writeback_rate_debug, as only 20 bytes were allocated for this and %llu writes 20 characters plus a null. Always use the units mechanism rather than having different output paths for simplicity. Also, correct problems with display output where 1.10 was a larger number than 1.09, by multiplying by 10 and then dividing by 1024 instead of dividing by 100. (Remainders of >= 1000 would print as .10). Minor changes: Always display the decimal point instead of trying to omit it based on number of digits shown. Decide what units to use based on 1000 as a threshold, not 1024 (in other words, always print at most 3 digits before the decimal point). Signed-off-by: Michael Lyle <mlyle@lyle.org> Reported-by: Dmitry Yu Okunev <dyokunev@ut.mephi.ru> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Update continue_at() documentationDan Carpenter
continue_at() doesn't have a return statement anymore. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: silence static checker warningDan Carpenter
In olden times, closure_return() used to have a hidden return built in. We removed the hidden return but forgot to add a new return here. If "c" were NULL we would oops on the next line, but fortunately "c" is never NULL. Let's just remove the if statement. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: fix for gc and write-back raceTang Junhui
gc and write-back get raced (see the email "bcache get stucked" I sended before): gc thread write-back thread | |bch_writeback_thread() |bch_gc_thread() | | |==>read_dirty() |==>bch_btree_gc() | |==>btree_root() //get btree root | | //node write locker | |==>bch_btree_gc_root() | | |==>read_dirty_submit() | |==>write_dirty() | |==>continue_at(cl, | | write_dirty_finish, | | system_wq); | |==>write_dirty_finish()//excute | | //in system_wq | |==>bch_btree_insert() | |==>bch_btree_map_leaf_nodes() | |==>__bch_btree_map_nodes() | |==>btree_root //try to get btree | | //root node read | | //lock | |-----stuck here |==>bch_btree_set_root() |==>bch_journal_meta() |==>bch_journal() |==>journal_try_write() |==>journal_write_unlocked() //journal_full(&c->journal) | //condition satisfied |==>continue_at(cl, journal_write, system_wq); //try to excute | //journal_write in system_wq | //but work queue is excuting | //write_dirty_finish() |==>closure_sync(); //wait journal_write execute | //over and wake up gc, |-------------stuck here |==>release root node write locker This patch alloc a separate work-queue for write-back thread to avoid such race. (Commit log re-organized by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: increase the number of open bucketsTang Junhui
In currently, we only alloc 6 open buckets for each cache set, but in usually, we always attach about 10 or so backend devices for each cache set, and the each bcache device are always accessed by about 10 or so threads in top application layer. So 6 open buckets are too few, It has led to that each of the same thread write data to different buckets, which would cause low efficiency write-back, and also cause buckets inefficient, and would be Very easy to run out of. I add debug message in bch_open_buckets_alloc() to print alloc bucket info, and test with ten bcache devices with a cache set, and each bcache device is accessed by ten threads. From the debug message, we can see that, after the modification, One bucket is more likely to assign to the same thread, and the data from the same thread are more likely to write the same bucket. Usually the same thread always write/read the same backend device, so it is good for write-back and also promote the usage efficiency of buckets. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Correct return value for sysfs attach errorsTony Asleson
If you encounter any errors in bch_cached_dev_attach it will return a negative error code. The variable 'v' which stores the result is unsigned, thus user space sees a very large value returned for bytes written which can cause incorrect user space behavior. Utilize 1 signed variable to use throughout the function to preserve error return capability. Signed-off-by: Tony Asleson <tasleson@redhat.com> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: correct cache_dirty_target in __update_writeback_rate()Tang Junhui
__update_write_rate() uses a Proportion-Differentiation Controller algorithm to control writeback rate. A dirty target number is used in this PD controller to control writeback rate. A larger target number will make the writeback rate smaller, on the versus, a smaller target number will make the writeback rate larger. bcache uses the following steps to calculate the target number, 1) cache_sectors = all-buckets-of-cache-set * buckets-size 2) cache_dirty_target = cache_sectors * cached-device-writeback_percent 3) target = cache_dirty_target * (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set) The calculation at step 1) for cache_sectors is incorrect, which does not consider dirty blocks occupied by flash only volume. A flash only volume can be took as a bcache device without cached device. All data sectors allocated for it are persistent on cache device and marked dirty, they are not touched by bcache writeback and garbage collection code. So data blocks of flash only volume should be ignore when calculating cache_sectors of cache set. Current code does not subtract dirty sectors of flash only volume, which results a larger target number from the above 3 steps. And in sequence the cache device's writeback rate is smaller then a correct value, writeback speed is slower on all cached devices. This patch fixes the incorrect slower writeback rate by subtracting dirty sectors of flash only volumes in __update_writeback_rate(). (Commit log composed by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: gc does not work when triggering by manual commandTang Junhui
I try to execute the following command to trigger gc thread: [root@localhost internal]# echo 1 > trigger_gc But it does not work, I debug the code in gc_should_run(), It works only if in invalidating or sectors_to_gc < 0. So set sectors_to_gc to -1 to meet the condition when we trigger gc by manual command. (Code comments aded by Coly Li) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Don't reinvent the wheel but use existing llist APIByungchul Park
Although llist provides proper APIs, they are not used. Make them used. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: do not subtract sectors_to_gc for bypassed IOTang Junhui
Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to trigger gc thread. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: fix sequential large write IO bypassTang Junhui
Sequential write IOs were tested with bs=1M by FIO in writeback cache mode, these IOs were expected to be bypassed, but actually they did not. We debug the code, and find in check_should_bypass(): if (!congested && mode == CACHE_MODE_WRITEBACK && op_is_write(bio_op(bio)) && (bio->bi_opf & REQ_SYNC)) goto rescale that means, If in writeback mode, a write IO with REQ_SYNC flag will not be bypassed though it is a sequential large IO, It's not a correct thing to do actually, so this patch remove these codes. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06bcache: Fix leak of bdev referenceJan Kara
If blkdev_get_by_path() in register_bcache() fails, we try to lookup the block device using lookup_bdev() to detect which situation we are in to properly report error. However we never drop the reference returned to us from lookup_bdev(). Fix that. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06mac80211: fix deadlock in driver-managed RX BA session startJohannes Berg
When an RX BA session is started by the driver, and it has to tell mac80211 about it, the corresponding bit in tid_rx_manage_offl gets set and the BA session work is scheduled. Upon testing this bit, it will call __ieee80211_start_rx_ba_session(), thus deadlocking as it already holds the ampdu_mlme.mtx, which that acquires again. Fix this by adding ___ieee80211_start_rx_ba_session(), a version of the function that requires the mutex already held. Cc: stable@vger.kernel.org Fixes: 699cb58c8a52 ("mac80211: manage RX BA session offload without SKB queue") Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06mac80211: Complete ampdu work schedule during session tear downIlan peer
Commit 7a7c0a6438b8 ("mac80211: fix TX aggregation start/stop callback race") added a cancellation of the ampdu work after the loop that stopped the Tx and Rx BA sessions. However, in some cases, e.g., during HW reconfig, the low level driver might call mac80211 APIs to complete the stopping of the BA sessions, which would queue the ampdu work to handle the actual completion. This work needs to be performed as otherwise mac80211 data structures would not be properly synced. Fix this by checking if BA session STOP_CB bit is set after the BA session cancellation and properly clean the session. Signed-off-by: Ilan Peer <ilan.peer@intel.com> [Johannes: the work isn't flushed because that could do other things we don't want, and the locking situation isn't clear] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06cfg80211: honor NL80211_RRF_NO_HT40{MINUS,PLUS}Emmanuel Grumbach
Honor the NL80211_RRF_NO_HT40{MINUS,PLUS} flags in reg_process_ht_flags_channel. Not doing so leads can lead to a firmware assert in iwlwifi for example. Fixes: b0d7aa59592b ("cfg80211: allow wiphy specific regdomain management") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-09-06genirq/msi: Fix populating multiple interruptsJohn Keeping
On allocating the interrupts routed via a wire-to-MSI bridge, the allocator iterates over the MSI descriptors to build the hierarchy, but fails to use the descriptor interrupt number, and instead uses the base number, generating the wrong IRQ domain mappings. The fix is to use the MSI descriptor interrupt number when setting up the interrupt instead of the base interrupt for the allocation range. The only saving grace is that although the MSI descriptors are allocated in bulk, the wired interrupts are only allocated one by one (so desc->irq == virq) and the bug went unnoticed so far. Fixes: 2145ac9310b60 ("genirq/msi: Add msi_domain_populate_irqs") Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170906103540.373864a2.john@metanate.com
2017-09-06s390/mm: use a single lock for the fields in mm_context_tMartin Schwidefsky
The three locks 'lock', 'pgtable_lock' and 'gmap_lock' in the mm_context_t can be reduced to a single lock. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/mm: fix race on mm->context.flush_mmMartin Schwidefsky
The order in __tlb_flush_mm_lazy is to flush TLB first and then clear the mm->context.flush_mm bit. This can lead to missed flushes as the bit can be set anytime, the order needs to be the other way aronud. But this leads to a different race, __tlb_flush_mm_lazy may be called on two CPUs concurrently. If mm->context.flush_mm is cleared first then another CPU can bypass __tlb_flush_mm_lazy although the first CPU has not done the flush yet. In a virtualized environment the time until the flush is finally completed can be arbitrarily long. Add a spinlock to serialize __tlb_flush_mm_lazy and use the function in finish_arch_post_lock_switch as well. Cc: <stable@vger.kernel.org> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/mm: fix local TLB flushing vs. detach of an mm address spaceMartin Schwidefsky
The local TLB flushing code keeps an additional mask in the mm.context, the cpu_attach_mask. At the time a global flush of an address space is done the cpu_attach_mask is copied to the mm_cpumask in order to avoid future global flushes in case the mm is used by a single CPU only after the flush. Trouble is that the reset of the mm_cpumask is racy against the detach of an mm address space by switch_mm. The current order is first the global TLB flush and then the copy of the cpu_attach_mask to the mm_cpumask. The order needs to be the other way around. Cc: <stable@vger.kernel.org> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/zcrypt: externalize AP queue interrupt controlHarald Freudenberger
KVM has a need to control the interrupts on real and virtualized AP queue devices. This fix provides a new function to control the interrupt facilities of an AP queue device. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/zcrypt: externalize AP config info queryHarald Freudenberger
KVM has a need to fetch the crypto configuration information as it is returned by the PQAP(QCI) instruction. This patch introduces a new API ap_query_configuration() which provides this info in a handy way for the caller. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-06s390/zcrypt: externalize test AP queueTony Krowiak
Under certain specified conditions, the Test AP Queue (TAPQ) subfunction of the Process Adjunct Processor Queue (PQAP) instruction will be intercepted by a guest VM. The guest VM must have a means for executing the intercepted instruction. The vfio_ap driver will provide an interface to execute the PQAP(TAPQ) instruction subfunction on behalf of a guest VM. The code for executing the AP instructions currently resides in the AP bus. This patch refactors the AP bus code to externalize access to the PQAP(TAPQ) instruction subfunction to make it available to the vfio_ap driver. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>