summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-03-29lightnvm: pblk: delete writer kick timer before stopping threadHans Holmberg
Unless we delete the timer that wakes up the write thread before we stop the thread we risk re-starting the thread, so delete the timer first. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: pblk: add padding distribution sysfs attributeHans Holmberg
When pblk receives a sync, all data up to that point in the write buffer must be comitted to persistent storage, and as flash memory comes with a minimal write size there is a significant cost involved both in terms of time for completing the sync and in terms of write amplification padded sectors for filling up to the minimal write size. In order to get a better understanding of the costs involved for syncs, Add a sysfs attribute to pblk: padded_dist, showing a normalized distribution of sectors padded. In order to facilitate measurements of specific workloads during the lifetime of the pblk instance, the distribution can be reset by writing 0 to the attribute. Do this by introducing counters for each possible padding: {0..(minimal write size - 1)} and calculate the normalized distribution when showing the attribute. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Rearranged total_buckets statement in pblk_sysfs_get_padding_dist Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: remove multiple groups in 1.2 data structureMatias Bjørling
Only one id group from the 1.2 specification is supported. Make sure that only the first group is accessible. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: remove mlc pairs structureMatias Bjørling
The known implementations of the 1.2 specification, and upcoming 2.0 implementation all expose a sequential list of pages to write. Remove the data structure, as it is no longer needed. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: pblk: export write amplification counters to sysfsHans Holmberg
In a SSD, write amplification, WA, is defined as the average number of page writes per user page write. Write amplification negatively affects write performance and decreases the lifetime of the disk, so it's a useful metric to add to sysfs. In plkb's case, the number of writes per user sector is the sum of: (1) number of user writes (2) number of sectors written by the garbage collector (3) number of sectors padded (i.e. due to syncs) This patch adds persistent counters for 1-3 and two sysfs attributes to export these along with WA calculated with five decimals: write_amp_mileage: the accumulated write amplification stats for the lifetime of the pblk instance write_amp_trip: resetable stats to facilitate delta measurements, values reset at creation and if 0 is written to the attribute. 64-bit counters are used as a 32 bit counter would wrap around already after about 17 TB worth of user data. It will take a long long time before the 64 bit sector counters wrap around. The counters are stored after the bad block bitmap in the first emeta sector of each written line. There is plenty of space in the first emeta sector, so we don't need to bump the major version of the line data format. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: pblk: check data lines version on recoveryHans Holmberg
As a preparation for future bumps of data line persistent storage versions, we need to start checking the emeta line version during recovery. Also slit up the current emeta/smeta version into two bytes (major,minor). Recovering lines with the same major number as the current pblk data line version must succeed. This means that any changes in the persistent format must be: (1) Backward compatible: if we switch back to and older kernel, recovery of lines stored with major == current_major and minor > current_minor must succeed. (2) Forward compatible: switching to a newer kernel, recovery of lines stored with major=current_major and minor < minor must handle the data format differences gracefully(i.e. initialize new data structures to default values). If we detect lines that have a different major number than the current we must abort recovery. The user must manually migrate the data in this case. Previously the version stored in the emeta header was copied from smeta, which has version 1, so we need to set the minor version to 1. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: pblk: handle bad sectors in the emeta area correctlyHans Holmberg
Unless we check if there are bad sectors in the entire emeta-area we risk ending up with valid bitmap / available sector count inconsistency. This results in lines with a bad chunk at the last LUN marked as bad, so go through the whole emeta area and mark up the invalid sectors. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm: remove chnl_offset in nvme_nvm_identityMatias Bjørling
The identity structure is initialized to zero in the beginning of the nvme_nvm_identity function. The chnl_offset is separately set to zero. Since both the variable and assignment is never changed, remove them. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-29lightnvm/pblk-gc: Delete an error message for a failed memory allocation in ↵Markus Elfring
pblk_gc_line_prepare_ws() Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-30Merge branch 'bpf-sockmap-ingress'Daniel Borkmann
John Fastabend says: ==================== This series adds the BPF_F_INGRESS flag support to the redirect APIs. Bringing the sockmap API in-line with the cls_bpf redirect APIs. We add it to both variants of sockmap programs, the first patch adds support for tx ulp hooks and the third patch adds support for the recv skb hooks. Patches two and four add tests for the corresponding ingress redirect hooks. Follow on patches can address busy polling support, but next series from me will move the sockmap sample program into selftests. v2: added static to function definition caught by kbuild bot v3: fixed an error branch with missing mem_uncharge in recvmsg op moved receive_queue check outside of RCU region ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30bpf: sockmap, more BPF_SK_SKB_STREAM_VERDICT testsJohn Fastabend
Add BPF_SK_SKB_STREAM_VERDICT tests for ingress hook. While we do this also bring stream tests in-line with MSG based testing. A map for skb options is added for userland to push options at BPF programs. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT:John Fastabend
Add support for the BPF_F_INGRESS flag in skb redirect helper. To do this convert skb into a scatterlist and push into ingress queue. This is the same logic that is used in the sk_msg redirect helper so it should feel familiar. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30bpf: sockmap, add BPF_F_INGRESS testsJohn Fastabend
Add a set of tests to verify ingress flag in redirect helpers works correctly with various msg sizes. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30bpf: sockmap redirect ingress supportJohn Fastabend
Add support for the BPF_F_INGRESS flag in sk_msg redirect helper. To do this add a scatterlist ring for receiving socks to check before calling into regular recvmsg call path. Additionally, because the poll wakeup logic only checked the skb recv queue we need to add a hook in TCP stack (similar to write side) so that we have a way to wake up polling socks when a scatterlist is redirected to that sock. After this all that is needed is for the redirect helper to push the scatterlist into the psock receive queue. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-29iw_cxgb4: print mapped ports correctlyBharat Potnuri
c4iw_ep_common structure holds the mapped addresses, so while printing them, use appropriate pointers. Fixes: bab572f1d ("iw_cxgb4: Guard against null cm_id in dump_ep/qp") Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29Documentation/process: update FUSE project websiteMartin Kepplinger
According to the old project site, https://sourceforge.net/projects/fuse/ the project has moved to https://github.com/libfuse/ so we update the link to point to the latest libfuse release. Signed-off-by: Martin Kepplinger <martink@posteo.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-03-29docs: kernel-doc: fix parsing of arraysMauro Carvalho Chehab
The logic with parses array has a bug that prevents it to parse arrays like: struct { ... struct { u64 msdu[IEEE80211_NUM_TIDS + 1]; ... ... Fix the parser to accept it. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-03-29dm mpath: fix support for loading scsi_dh modules during table loadMike Snitzer
The ability to have multipath dynamically attach a scsi_dh, that the user specified in the multipath table, was broken by commit e8f74a0f00 ("dm mpath: eliminate need to use scsi_device_from_queue"). Restore the ability to load, and attach, a particular scsi_dh module if one is specified (as noticed by checking m->hw_handler_name). Fixes: e8f74a0f00 ("dm mpath: eliminate need to use scsi_device_from_queue") Reported-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-03-29Merge tag 'wireless-drivers-next-for-davem-2018-03-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.17 Smaller new features to various drivers but nothing really out of ordinary. Major changes: ath10k * enable chip temperature measurement for QCA6174/QCA9377 * add firmware memory dump for QCA9984 * enable buffer STA on TDLS link for QCA6174 * support different beacon internals in multiple interface scenario for QCA988X/QCA99X0/QCA9984/QCA4019 iwlwifi * support for new PCI IDs for the 9000 family * support for a new firmware API version * support for advanced dwell and Optimized Connectivity Experience (OCE) in scanning btrsi * fix kconfig dependencies wil6210 * support multiple virtual interfaces ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29Merge tag 'mac80211-next-for-davem-2018-03-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a fair number of patches, but many of them are from the first bullet here: * EAPoL-over-nl80211 from Denis - this will let us fix some long-standing issues with bridging, races with encryption and more * DFS offload support from the qtnfmac folks * regulatory database changes for the new ETSI adaptivity requirements * various other fixes and small enhancements ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29vhost: validate log when IOTLB is enabledJason Wang
Vq log_base is the userspace address of bitmap which has nothing to do with IOTLB. So it needs to be validated unconditionally otherwise we may try use 0 as log_base which may lead to pin pages that will lead unexpected result (e.g trigger BUG_ON() in set_bit_to_user()). Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29Fix vector raw inintialization logicAnton Ivanov
Vector RAW in UML needs to BPF filter its own MAC only if QDISC_BYPASS has failed. If QDISC_BYPASS is successful, the frames originated locally are not visible to readers on the raw socket. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-03-29Migrate vector timers to new timer APIAnton Ivanov
The patches for the UML vector drivers were in-flight when the timer changes happened and were not covered by them. This change migrates vector_kern.c to use the new timer API. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-03-29um: Compile with modern headersJason A. Donenfeld
Recent libcs have gotten a bit more strict, so we actually need to include the right headers and use the right types. This enables UML to compile again. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
2018-03-29RDMA/cma: Move rdma_cm_state to cma_priv.hParav Pandit
rdma_cm_state enum is internal to rdma_cm kernel module. It is not required to expose state enums to ULP modules. So lets keep its scope limited to rdma_cm module in cma_priv.h file. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29IB/addr: Constify dst_entry pointerParav Pandit
Make dst_entry pointer as const struct dst_entry* to improve code readablity to make sure that dst structure fields are not modified by various functions which are using it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29ALSA: hda - Silence PM ops build warningLukas Wunner
The system sleep PM ops azx_suspend() and azx_resume() were previously called by vga_switcheroo, but commit 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") removed their invocation. Unfortunately the commit neglected to update the #ifdef surrounding the two functions, so if CONFIG_PM_SLEEP is *not* enabled but all three of CONFIG_PM, CONFIG_VGA_SWITCHEROO and CONFIG_SND_HDA_CODEC_HDMI *are* enabled, the compiler now emits the following warning: sound/pci/hda/hda_intel.c:1024:12: warning: 'azx_resume' defined but not used [-Wunused-function] static int azx_resume(struct device *dev) ^~~~~~~~~~ sound/pci/hda/hda_intel.c:989:12: warning: 'azx_suspend' defined but not used [-Wunused-function] static int azx_suspend(struct device *dev) ^~~~~~~~~~~ Silence by updating the #ifdef. Because the #ifdef block now uses the same condition as the one immediately succeeding it, the two blocks can be collapsed together, shaving off another two lines. Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") Reviewed-by: Takashi Iwai <tiwai@suse.de> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.kernel.org/patch/10313441/ Link: https://patchwork.freedesktop.org/patch/msgid/b8e70e34a9acbd4f0a1a6c7673cea96888ae9503.1522323444.git.lukas@wunner.de
2018-03-29RDMA: Use u64_to_user_ptr everywhereJason Gunthorpe
This is already used in many places, get the rest of them too, only to make the code a bit clearer & simpler. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29RDMA/nldev: Provide netdevice name and indexLeon Romanovsky
Export the net device name and index to easily find connection between IB devices and relevant net devices. We also updated the comment regarding the devices without FW. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29IB/rxe: optimize mcast recv processZhu Yanjun
In mcast recv process, the function skb_clone is used. In fact, the refcount can be increased to replace cloning a new skb since the original skb will not be modified before it is freed. This can make the performance better and save the memory. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29qedr: Fix spelling mistake: "hanlde" -> "handle"Colin Ian King
Trivial fix to spelling mistake in DP_ERR message text Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29d_genocide: move export to definitionAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29fold dentry_lock_for_move() into its sole caller and clean it upAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29make non-exchanging __d_move() copy ->d_parent rather than swap themAl Viro
Currently d_move(from, to) does the following: * name/parent of from <- old name/parent of to, from hashed there * to is unhashed * name of to is preserved * if from used to be detached, to gets detached * if from used to be attached, parent of to <- old parent of from. That's both user-visibly bogus and complicates reasoning a lot. Much saner semantics would be * name/parent of from <- name/parent of to, from hashed there. * to is unhashed * name/parent of to is unchanged. The price, of course, is that old parent of from might lose a reference. However, * all potentially cross-directory callers of d_move() have both parents pinned directly; typically, dentries themselves are grabbed only after we have grabbed and locked both parents. IOW, the decrement of old parent's refcount in case of d_move() won't reach zero. * __d_move() from d_splice_alias() is done to detached alias. No refcount decrements in that case * __d_move() from __d_unalias() *can* get the refcount to zero. So let's grab a reference to alias' old parent before calling __d_unalias() and dput() it after we'd dropped rename_lock. That does make d_splice_alias() potentially blocking. However, it has no callers in non-sleepable contexts (and the case where we'd grown that dget/dput pair is _very_ rare, so performance is not an issue). Another thing that needs adjustment is unlocking in the end of __d_move(); folded it in. And cleaned the remnants of bogus ordering from the "lock them in the beginning" counterpart - it's never been right and now (well, for 7 years now) we have that thing always serialized on rename_lock anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29oprofilefs: don't oops on allocation failureAl Viro
... just short-circuit the creation of potential children Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29lustre: get rid of pointless casts to struct dentry *Al Viro
... when feeding const struct dentry * to primitives taking exactly that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29debugfs_lookup(): switch to lookup_one_len_unlocked()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29fold lookup_real() into __lookup_hash()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29take out orphan externs (empty_string/slash_string)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29split d_path() and friends into a separate fileAl Viro
Those parts of fs/dcache.c are pretty much self-contained. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29dcache.c: trim includesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29fs/dcache: Avoid a try_lock loop in shrink_dentry_list()John Ogness
shrink_dentry_list() holds dentry->d_lock and needs to acquire dentry->d_inode->i_lock. This cannot be done with a spin_lock() operation because it's the reverse of the regular lock order. To avoid ABBA deadlocks it is done with a trylock loop. Trylock loops are problematic in two scenarios: 1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are preemptible. As a consequence the i_lock holder can be preempted by a higher priority task. If that task executes the trylock loop it will do so forever and live lock. 2) In virtual machines trylock loops are problematic as well. The VCPU on which the i_lock holder runs can be scheduled out and a task on a different VCPU can loop for a whole time slice. In the worst case this can lead to starvation. Commits 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b45f9 ("shrink_dentry_list(): take parent's d_lock earlier") are addressing exactly those symptoms. Avoid the trylock loop by using dentry_kill(). When pruning ancestors, the same code applies that is used to kill a dentry in dput(). This also has the benefit that the locking order is now the same. First the inode is locked, then the parent. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29get rid of trylock loop around dentry_kill()Al Viro
In case when trylock in there fails, deal with it directly in dentry_kill(). Note that in cases when we drop and retake ->d_lock, we need to recheck whether to retain the dentry. Another thing is that dropping/retaking ->d_lock might have ended up with negative dentry turning into positive; that, of course, can happen only once... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29handle move to LRU in retain_dentry()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29dput(): consolidate the "do we need to retain it?" into an inlined helperAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29split the slow part of lock_parent() offAl Viro
Turn the "trylock failed" part into uninlined __lock_parent(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29now lock_parent() can't run into killed dentryAl Viro
all remaining callers hold either a reference or ->i_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29get rid of trylock loop in locking dentries on shrink listAl Viro
In case of trylock failure don't re-add to the list - drop the locks and carefully get them in the right order. For shrink_dentry_list(), somebody having grabbed a reference to dentry means that we can kick it off-list, so if we find dentry being modified under us we don't need to play silly buggers with retries anyway - off the list it is. The locking logics taken out into a helper of its own; lock_parent() is no longer used for dentries that can be killed under us. [fix from Eric Biggers folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29Merge branch 'dsa-Add-ATU-VTU-statistics'David S. Miller
Andrew Lunn says: ==================== Add ATU/VTU statistics Previous patches have added basic support for Address Translation Unit and VLAN translation Unit violation interrupts. Add statistics counters for when these occur, which can be accessed using ethtool. Downgrade one of the particularly spammy warnings from VTU violations to debug only, now that we have a counter for it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29net: dsa: mv88e6xxx: Make VTU miss violations less spammyAndrew Lunn
VTU miss violations can happen under normal conditions. Don't spam the kernel log, downgrade the output to debug level only. The statistics counter will indicate it is happening, if anybody not debugging is interested. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>