summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-04-02rbd: switch to common striping frameworkIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: create+truncate for whole-object layered discardsIlya Dryomov
A whole-object layered discard is implemented as a truncate rather than a delete: a dummy object is needed to prevent the CoW machinery from kicking in. However, a truncate on a non-existent object is a no-op. If the object doesn't exist in HEAD, a discard request is effectively ignored, which violates our "discard zeroes data" promise and breaks REQ_OP_WRITE_ZEROES implementation. A non-exclusive create on an existing object is also a no-op, so the fix is to do a compound create+truncate instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: move to obj_req->img_extentsIlya Dryomov
In preparation for rbd "fancy" striping, replace obj_req->img_offset with obj_req->img_extents. A single starting offset isn't sufficient because we want only one OSD request per object and will merge adjacent object extents in ceph_file_to_extents(). The final object extent may map into multiple different byte ranges in the image. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: incorporate ceph_object_extentIlya Dryomov
obj_req->object_no -> obj_req->ex.oe_objno obj_req->offset -> obj_req->ex.oe_off obj_req->length -> obj_req->ex.oe_len ... and use ex for linking object requests to image requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: move ceph_calc_file_object_mapping() to striper.cIlya Dryomov
ceph_calc_file_object_mapping() has nothing to do with osdmaps. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph: striping framework implementationIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: store data_type in img_req instead of obj_reqIlya Dryomov
All object requests are associated with an image request now -- avoid duplicating the same info in each object request. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove obj_req->flags fieldIlya Dryomov
There are no standalone (!IMG_DATA) object requests anymore. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove old request completion codeIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: new request completion codeIlya Dryomov
Do away with partial request completions and all the associated complexity. Individual object requests no longer need to be completed in order -- when the last one becomes ready, we complete the entire higher level request all at once. This also wraps up the conversion to a state machine model and eliminates the recursion described in commit 6d69bb536bac ("rbd: prevent kernel stack blow up on rbd map"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: update rbd_img_request_submit() signatureIlya Dryomov
It should be void now. Also, object requests are unlinked only in image request destructor, which can't run before rbd_img_request_put(), so no need for _safe. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: add img_req->op_type fieldIlya Dryomov
Store op_type in its own field instead of packing it into flags. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: simplify rbd_osd_req_create()Ilya Dryomov
No need to pass rbd_dev and op_type to rbd_osd_req_create(): there are no standalone (!IMG_DATA) object requests anymore and osd_req->r_flags can be set in rbd_osd_req_format_{read,write}(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove old request handling codeIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: new request handling codeIlya Dryomov
The notable changes are: - instead of explicitly stat'ing the object to see if it exists before issuing the write, send the write optimistically along with the stat in a single OSD request - zero copyup optimization - all object requests are associated with an image request and have a valid ->img_request pointer; there are no standalone (!IMG_DATA) object requests anymore - code is structured as a state machine (vs a bunch of callbacks with implicit state) Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph: handle zero-length data itemsIlya Dryomov
rbd needs this for null copyups -- if copyup data is all zeroes, we want to save some I/O and network bandwidth. See rbd_obj_issue_copyup() in the next commit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: move from raw pages to bvec data descriptorsIlya Dryomov
In preparation for rbd "fancy" striping which requires bio_vec arrays, wire up BVECS data type and kill off PAGES data type. There is nothing wrong with using page vectors for copyup requests -- it's just less iterator boilerplate code to write for the new striping framework. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02libceph: introduce BVECS data typeIlya Dryomov
In preparation for rbd "fancy" striping, introduce ceph_bvec_iter for working with bio_vec array data buffers. The wrappers are trivial, but make it look similar to ceph_bio_iter. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: get rid of img_req->copyup_pagesIlya Dryomov
The initiating object request is the proper owner -- save a bit of space. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: don't (ab)use obj_req->pages for stat requestsIlya Dryomov
obj_req->pages is for provided data buffers. stat requests are internal and should be NODATA. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: remove bio cloning helpersIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02libceph, rbd: new bio handling code (aka don't clone bios)Ilya Dryomov
The reason we clone bios is to be able to give each object request (and consequently each ceph_osd_data/ceph_msg_data item) its own pointer to a (list of) bio(s). The messenger then initializes its cursor with cloned bio's ->bi_iter, so it knows where to start reading from/writing to. That's all the cloned bios are used for: to determine each object request's starting position in the provided data buffer. Introduce ceph_bio_iter to do exactly that -- store position within bio list (i.e. pointer to bio) + position within that bio (i.e. bvec_iter). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: start enums at 1 instead of 0Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: change ceph_calc_file_object_mapping() signatureIlya Dryomov
- make it void - xlen (object extent length) out parameter should be u32 because only a single stripe unit is mapped at a time Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02libceph: eliminate overflows in ceph_calc_file_object_mapping()Ilya Dryomov
bl, stripeno and objsetno should be u64 -- otherwise large enough files get corrupted. How large depends on file layout: - 4M-objects layout (default): any file over 16P - 64K-objects layout (smallest possible object size): any file over 512T Only CephFS is affected, rbd doesn't use ceph_calc_file_object_mapping() yet. Fortunately, CephFS has a max_file_size configurable, the default for which is way below both of the above numbers. Reimplement the logic from scratch with no layout validation -- it's done on the MDS side. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: set max_segment_size to UINT_MAXIlya Dryomov
Commit 21acdf45f495 ("rbd: set max_segments to USHRT_MAX") removed the limit on max_segments. Remove the limit on max_segment_size as well. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-01ext4: force revalidation of directory pointer after seekdir(2)Theodore Ts'o
A malicious user could force the directory pointer to be in an invalid spot by using seekdir(2). Use the mechanism we already have to notice if the directory has changed since the last time we called ext4_readdir() to force a revalidation of the pointer. Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-04-01Merge branch 'net-bgmac-Couple-of-sparse-warnings'David S. Miller
Florian Fainelli says: ==================== net: bgmac: Couple of sparse warnings This patch series fixes a couple of warnings reported by sparse, should not cause any functional problems since bgmac is typically used on LE platforms anyway. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()Florian Fainelli
bgmac_dma_tx_ring_free() assigns the ctl1 word which is a litle endian 32-bit word without using proper accessors, fix this, and because a length cannot be negative, use unsigned int while at it. Fixes: 9cde94506eac ("bgmac: implement scatter/gather support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01net: bgmac: Correctly annotate register spaceFlorian Fainelli
All the members: base, idm_base and nicpm_base should be annotated with __iomem since they are pointers to register space. This fixes a bunch of sparse reported warnings. Fixes: f6a95a24957a ("net: ethernet: bgmac: Add platform device support") Fixes: dd5c5d037f5e ("net: ethernet: bgmac: add NS2 support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01cifs: smbd: disconnect transport on RDMA errorsLong Li
On RDMA errors, transport should disconnect the RDMA CM connection. This will notify the upper layer, and it will attempt transport reconnect. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2018-04-01cifs: smbd: avoid reconnect lockupLong Li
During transport reconnect, other processes may have registered memory and blocked on transport. This creates a deadlock situation because the transport resources can't be freed, and reconnect is blocked. Fix this by returning to upper layer on timeout. Before returning, transport status is set to reconnecting so other processes will release memory registration resources. Upper layer will retry the reconnect. This is not in fast I/O path so setting the timeout to 5 seconds. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2018-04-01Don't log confusing message on reconnect by defaultSteve French
Change the following message (which can occur on reconnect) from a warning to an FYI message. It is confusing to users. [58360.523634] CIFS VFS: Free previous auth_key.response = 00000000a91cdc84 By default this message won't show up on reconnect unless the user bumps up the log level to include FYI messages. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-01Don't log expected error on DFS referral requestSteve French
STATUS_FS_DRIVER_REQUIRED is expected when DFS is not turned on on the server. Do not log it on DFS referral response. It clutters the dmesg log unnecessarily at mount time. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
2018-04-01fs: cifs: Replace _free_xid call in cifs_root_iget functionPhillip Potter
Modify end of cifs_root_iget function in fs/cifs/inode.c to call free_xid(xid) instead of _free_xid(xid), thereby allowing debug notification of this action when enabled. Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01SMB3.1.1 dialect is no longer experimentalSteve French
SMB3.1.1 is a very important dialect, with much improved security. We can remove the ExPERIMENTAL comments about it. It is widely supported by servers. Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2018-04-01Tree connect for SMB3.1.1 must be signed for non-encrypted sharesSteve French
SMB3.1.1 tree connect was only being signed when signing was mandatory but needs to always be signed (for non-guest users). See MS-SMB2 section 3.2.4.1.1 Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org>
2018-04-01fix smb3-encryption breakage when CONFIG_DEBUG_SG=yRonnie Sahlberg
We can not use the standard sg_set_buf() fucntion since when CONFIG_DEBUG_SG=y this adds a check that will BUG_ON for cifs.ko when we pass it an object from the stack. Create a new wrapper smb2_sg_set_buf() which avoids doing that particular check and use it for smb3 encryption instead. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2018-04-01CIFS: fix sha512 check in cifs_crypto_secmech_releaseGustavo A. R. Silva
It seems this is a copy-paste error and that the proper variable to use in this particular case is _sha512_ instead of _md5_. Addresses-Coverity-ID: 1465358 ("Copy-paste error") Fixes: 1c6614d229e7 ("CIFS: add sha512 secmech") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01CIFS: implement v3.11 preauth integrityAurelien Aptel
SMB3.11 clients must implement pre-authentification integrity. * new mechanism to certify requests/responses happening before Tree Connect. * supersedes VALIDATE_NEGOTIATE * fixes signing for SMB3.11 Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01CIFS: add sha512 secmechAurelien Aptel
* prepare for SMB3.11 pre-auth integrity * enable sha512 when SMB311 is enabled in Kconfig * add sha512 as a soft dependency Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01CIFS: refactor crypto shash/sdesc allocation&freeAurelien Aptel
shash and sdesc and always allocated and freed together. * abstract this in new functions cifs_alloc_hash() and cifs_free_hash(). * make smb2/3 crypto allocation independent from each other. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2018-04-01Update README file for cifs.koSteve French
Remove references to two obsolete /proc/fs/cifs parameters and update for a few minor SMB3 features. Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01Update TODO list for cifs.koSteve French
Update list of items still TODO in cifs.ko Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01cifs: fix memory leak in SMB2_open()Ronnie Sahlberg
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org>
2018-04-01CIFS: SMBD: fix spelling mistake: "faield" and "legnth"Colin Ian King
Trivial fix to spelling mistake in log_rdma_send and log_rdma_mr message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-04-01route: check sysctl_fib_multipath_use_neigh earlier than hashXin Long
Prior to this patch, when one packet is hashed into path [1] (hash <= nh_upper_bound) and it's neigh is dead, it will try path [2]. However, if path [2]'s neigh is alive but it's hash > nh_upper_bound, it will not return this alive path. This packet will never be sent even if path [2] is alive. 3.3.3.1/24: nexthop via 1.1.1.254 dev eth1 weight 1 <--[1] (dead neigh) nexthop via 2.2.2.254 dev eth2 weight 1 <--[2] With sysctl_fib_multipath_use_neigh set is supposed to find an available path respecting to the l3/l4 hash. But if there is no available route with this hash, it should at least return an alive route even with other hash. This patch is to fix it by processing fib_multipath_use_neigh earlier than the hash check, so that it will at least return an alive route if there is when fib_multipath_use_neigh is enabled. It's also compatible with before when there are alive routes with the l3/l4 hash. Fixes: a6db4494d218 ("net: ipv4: Consider failed nexthops in multipath routes") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01fix typo in command value in drivers/net/phy/mdio-bitbang.Frans Meulenbroeks
mdio-bitbang mentioned 10 for both read and write. However mdio read opcode is 10 and write opcode is 01 Fixed comment. Signed-off-by: Frans Meulenbroeks <fransmeulenbroeks@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01sky2: Increase D3 delay to sky2 stops working after suspendKai-Heng Feng
The sky2 ethernet stops working after system resume from suspend: [ 582.852065] sky2 0000:04:00.0: Refused to change power state, currently in D3 The current 150ms delay is not enough, change it to 200ms can solve the issue. BugLink: https://bugs.launchpad.net/bugs/1758507 Cc: Stable <stable@vger.kernel.org> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01net/mlx5e: Set EQE based as default TX interrupt moderation modeTal Gilboa
The default TX moderation mode was mistakenly set to CQE based. The intention was to add a control ability in order to improve some specific use-cases. In general, we prefer to use EQE based moderation as it gives much better numbers for the common cases. CQE based causes a degradation in the common case since it resets the moderation timer on CQE generation. This causes an issue when TSO is well utilized (large TSO sessions). The timer is set to 16us so traffic of ~64KB TSO sessions per second would mean timer reset (CQE per TSO session -> long time between CQEs). In this case we quickly reach the tcp_limit_output_bytes (256KB by default) and cause a halt in TX traffic. By setting EQE based moderation we make sure timer would expire after 16us regardless of the packet rate. This fixes an up to 40% packet rate and up to 23% bandwidth degradtions. Fixes: 0088cbbc4b66 ("net/mlx5e: Enable CQE based moderation on TX CQ") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>