summaryrefslogtreecommitdiff
path: root/include/linux/ceph/ceph_fs.h
AgeCommit message (Collapse)Author
2024-11-19ceph: correct ceph_mds_cap_peer field namePatrick Donnelly
The peer seq is used as the issue_seq. Use that name for consistency. See also ceph.git commit 1da6ef237fc7 ("include/ceph_fs: correct ceph_mds_cap_peer field name"). Link: https://tracker.ceph.com/issues/66704 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-19ceph: correct ceph_mds_cap_item field namePatrick Donnelly
The issue_seq is sent with bulk cap releases, not the current sequence number. See also ceph.git commit 655cddb7c9f3 ("include/ceph_fs: correct ceph_mds_cap_item field name"). Link: https://tracker.ceph.com/issues/66704 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-11-03ceph: handle idmapped mounts in create_request_message()Christian Brauner
Inode operations that create a new filesystem object such as ->mknod, ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. Instead the caller's fs{g,u}id is used for the {g,u}id of the new filesystem object. In order to ensure that the correct {g,u}id is used map the caller's fs{g,u}id for creation requests. This doesn't require complex changes. It suffices to pass in the relevant idmapping recorded in the request message. If this request message was triggered from an inode operation that creates filesystem objects it will have passed down the relevant idmaping. If this is a request message that was triggered from an inode operation that doens't need to take idmappings into account the initial idmapping is passed down which is an identity mapping. This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID which adds two new fields (owner_{u,g}id) to the request head structure. So, we need to ensure that MDS supports it otherwise we need to fail any IO that comes through an idmapped mount because we can't process it in a proper way. MDS server without such an extension will use caller_{u,g}id fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id values are unmapped. At the same time we can't map these fields with an idmapping as it can break UID/GID-based permission checks logic on the MDS side. This problem was described with a lot of details at [1], [2]. [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ Link: https://github.com/ceph/ceph/pull/52575 Link: https://tracker.ceph.com/issues/62217 Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-09-18Revert "ceph: make members in struct ceph_mds_request_args_ext a union"Ilya Dryomov
This reverts commit 3af5ae22030cb59fab4fba35f5a2b62f47e14df9. ceph_mds_request_args_ext was already (and remains to be) a union. An additional anonymous union inside is bogus: union ceph_mds_request_args_ext { union { union ceph_mds_request_args old; struct { ... } __attribute__ ((packed)) setattr_ext; }; } Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>
2023-08-31ceph: make num_fwd and num_retry to __u32Xiubo Li
The num_fwd in MClientRequestForward is int32_t, while the num_fwd in ceph_mds_request_head is __u8. This is buggy when the num_fwd is larger than 256 it will always be truncate to 0 again. But the client couldn't recoginize this. This will make them to __u32 instead. Because the old cephs will directly copy the raw memories when decoding the reqeust's head, so we need to make sure this kclient will be compatible with old cephs. For newer cephs they will decode the requests depending the version, which will be much simpler and easier to extend new members. Link: https://tracker.ceph.com/issues/62145 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-31ceph: make members in struct ceph_mds_request_args_ext a unionXiubo Li
In ceph mainline it will allow to set the btime in the setattr request and just add a 'btime' member in the union 'ceph_mds_request_args' and then bump up the header version to 4. That means the total size of union 'ceph_mds_request_args' will increase sizeof(struct ceph_timespec) bytes, but in kclient it will increase the sizeof(setattr_ext) bytes for each request. Since the MDS will always depend on the header's vesion and front_len members to decode the 'ceph_mds_request_head' struct, at the same time kclient hasn't supported the 'btime' feature yet in setattr request, so it's safe to do this change here. This will save 48 bytes memories for each request. Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-22ceph: fscrypt_auth handling for cephJeff Layton
Most fscrypt-enabled filesystems store the crypto context in an xattr, but that's problematic for ceph as xatts are governed by the XATTR cap, but we really want the crypto context as part of the AUTH cap. Because of this, the MDS has added two new inode metadata fields: fscrypt_auth and fscrypt_file. The former is used to hold the crypto context, and the latter is used to track the real file size. Parse new fscrypt_auth and fscrypt_file fields in inode traces. For now, we don't use fscrypt_file, but fscrypt_auth is used to hold the fscrypt context. Allow the client to use a setattr request for setting the fscrypt_auth field. Since this is not a standard setattr request from the VFS, we add a new field to __ceph_setattr that carries ceph-specific inode attrs. Have the set_context op do a setattr that sets the fscrypt_auth value, and get_context just return the contents of that field (since it should always be available). Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03ceph: fix incorrect old_size length in ceph_mds_request_argsXiubo Li
The 'old_size' is a __le64 type since birth, not sure why the kclient incorrectly switched it to __le32. This change is okay won't break anything because union will always allocate more memory than the 'open' member needed. Rename 'file_replication' to 'pool' as ceph did. Though this 'open' struct may never be used in kclient in future, it's confusing when going through the ceph code. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03ceph: fix the incorrect comment for the ceph_mds_caps structXiubo Li
The incorrect comment is misleading. Acutally the last members in ceph_mds_caps strcut is a union for none export and export bodies. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: do not release the global snaprealm until unmountingXiubo Li
The global snaprealm would be created and then destroyed immediately every time when updating it. URL: https://tracker.ceph.com/issues/54362 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: remove incorrect and unused CEPH_INO_DOTDOT macroXiubo Li
Ceph have removed this macro and the 0x3 will be use for global dummy snaprealm. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: add getvxattr opMilind Changire
Problem: Some directory vxattrs (e.g. ceph.dir.pin.random) are governed by information that isn't necessarily shared with the client. Add support for the new GETVXATTR operation, which allows the client to query the MDS directly for vxattrs. When the client is queried for a vxattr that doesn't have a special handler, have it issue a GETVXATTR to the MDS directly. Solution: Adds new getvxattr op to fetch ceph.dir.pin*, ceph.dir.layout* and ceph.file.layout* vxattrs. If the entire layout for a dir or a file is being set, then it is expected that the layout be set in standard JSON format. Individual field value retrieval is not wrapped in JSON. The JSON format also applies while setting the vxattr if the entire layout is being set in one go. As a temporary measure, setting a vxattr can also be done in the old format. The old format will be deprecated in the future. URL: https://tracker.ceph.com/issues/51062 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-11-08ceph: don't rely on error_string to validate blocklisted session.Kotresh HR
The "error_string" in the metadata of MClientSession is being parsed by kclient to validate whether the session is blocklisted. The "error_string" is for humans and shouldn't be relied on it. Hence added the flag to MClientsession to indicate the session is blocklisted. [ jlayton: minor formatting cleanup ] URL: https://tracker.ceph.com/issues/47450 Signed-off-by: Kotresh HR <khiremat@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-02ceph: flush mdlog before umountingXiubo Li
Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph, ceph: implement msgr2.1 protocol (crc and secure modes)Ilya Dryomov
Implement msgr2.1 wire protocol, available since nautilus 14.2.11 and octopus 15.2.5. msgr2.0 wire protocol is not implemented -- it has several security, integrity and robustness issues and therefore considered deprecated. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: introduce connection modes and ms_mode optionIlya Dryomov
msgr2 supports two connection modes: crc (plain) and secure (on-wire encryption). Connection mode is picked by server based on input from client. Introduce ms_mode option: ms_mode=legacy - msgr1 (default) ms_mode=crc - crc mode, if denied fail ms_mode=secure - secure mode, if denied fail ms_mode=prefer-crc - crc mode, if denied agree to secure mode ms_mode=prefer-secure - secure mode, if denied agree to crc mode ms_mode affects all connections, we don't separate connections to mons like it's done in userspace with ms_client_mode vs ms_mon_client_mode. For now the default is legacy, to be flipped to prefer-crc after some time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: amend cephx init_protocol() and build_request()Ilya Dryomov
In msgr2, initial authentication happens with an exchange of msgr2 control frames -- MAuth message and struct ceph_mon_request_header aren't used. Make that optional. Stop reporting cephx protocol as "x". Use "cephx" instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: implement updated ceph_mds_request_head structureJeff Layton
When we added the btime feature in mainline ceph, we had to extend struct ceph_mds_request_args so that it could be set. Implement the same in the kernel client. Rename ceph_mds_request_head with a _old extension, and a union ceph_mds_request_args_ext to allow for the extended size of the new header format. Add the appropriate code to handle both formats in struct create_request_message and key the behavior on whether the peer supports CEPH_FEATURE_FS_BTIME. The gid_list field in the payload is now populated from the saved credential. For now, we don't add any support for setting the btime via setattr, but this does enable us to add that in the future. [ idryomov: break unnecessarily long lines ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: set osdmap epoch for setxattrXiubo Li
When setting the file/dir layout, it may need data pool info. So in mds server, it needs to check the osdmap. At present, if mds doesn't find the data pool specified, it will try to get the latest osdmap. Now if pass the osd epoch for setxattr, the mds server can only check this epoch of osdmap. URL: https://tracker.ceph.com/issues/48504 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03ceph: periodically send perf metrics to MDSesXiubo Li
This will send the caps/read/write/metadata metrics to any available MDS once per second, which will be the same as the userland client. It will skip the MDS sessions which don't support the metric collection, as the MDSs will close socket connections when they get an unknown type message. We can disable the metric sending via the disable_send_metrics module parameter. [ jlayton: fix up endianness bug in ceph_mdsc_send_metrics() ] URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: consider inode's last read/write when calculating wanted capsYan, Zheng
Add i_last_rd and i_last_wr to ceph_inode_info. These fields are used to track the last time the client acquired read/write caps for the inode. If there is no read/write on an inode for 'caps_wanted_delay_max' seconds, __ceph_caps_file_wanted() does not request caps for read/write even there are open files. Call __ceph_touch_fmode() for dir operations. __ceph_caps_file_wanted() calculates dir's wanted caps according to last dir read/modification. If there is recent dir read, dir inode wants CEPH_CAP_ANY_SHARED caps. If there is recent dir modification, also wants CEPH_CAP_FILE_EXCL. Readdir is a special case. Dir inode wants CEPH_CAP_FILE_EXCL after readdir, as with that, modifications do not need to release CEPH_CAP_FILE_SHARED or invalidate all dentry leases issued by readdir. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: attempt to do async create when possibleJeff Layton
With the Octopus release, the MDS will hand out directory create caps. If we have Fxc caps on the directory, and complete directory information or a known negative dentry, then we can return without waiting on the reply, allowing the open() call to return very quickly to userland. We use the normal ceph_fill_inode() routine to fill in the inode, so we have to gin up some reply inode information with what we'd expect the newly-created inode to have. The client assumes that it has a full set of caps on the new inode, and that the MDS will revoke them when there is conflicting access. This functionality is gated on the wsync/nowsync mount options. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: cap tracking for async directory operationsJeff Layton
Track and correctly handle directory caps for asynchronous operations. Add aliases for Frc caps that we now designate at Dcu caps (when dealing with directories). Unlike file caps, we don't reclaim these when the session goes away, and instead preemptively release them. In-flight async dirops are instead handled during reconnect phase. The client needs to re-do a synchronous operation in order to re-get directory caps. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: track primary dentry linkJeff Layton
Newer versions of the MDS will flag a dentry as "primary". In later patches, we'll need to consult this info, so track it in di->flags. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: add flag to designate that a request is asynchronousJeff Layton
...and ensure that such requests are never queued. The MDS has need to know that a request is asynchronous so add flags and proper infrastructure for that. Also, delegated inode numbers and directory caps are associated with the session, so ensure that async requests are always transmitted on the first attempt and are never queued to wait for session reestablishment. If it does end up looking like we'll need to queue the request, then have it return -EJUKEBOX so the caller can reattempt with a synchronous request. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-07-08ceph: more precise CEPH_CLIENT_CAPS_PENDING_CAPSNAPYan, Zheng
Client uses this flag to tell mds if there is more cap snap need to flush. It's mainly for the case that client needs to re-send cap/snap flushes after mds failover, but CEPH_CAP_ANY_FILE_WR on corresponding inodes are all released before mds failover. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07ceph: snapshot nfs re-exportYan, Zheng
To support snapshot nfs re-export, we need a way to lookup snapped inode by file handle. For directory inode, snapped metadata are always stored together with head inode. Client just need to pass vinodeno_t to MDS. For non-directory inode, there can be multiple version of snapped inodes and they can be stored in different dirfrags. Besides vinodeno_t, client also need to tell mds from which dirfrag it got the snapped inode. Another problem of supporting snapshot nfs re-export is that there can be multiple paths to access a snapped inode. For example: mkdir -p d1/d2/d3 mkdir d1/.snap/s1 Paths 'd1/.snap/s1/d2/d3', 'd1/d2/.snap/_s1_<inode number of d1>/d3' and 'd1/d2/d3/.snap/_s1_<inode number of d1>' are all reference to the same snapped inode. For a given snapped inode, There is no convenient way to get the first form and the second form paths. For simplicity, ceph_get_parent() return snapdir for snapped directory inode. Furthermore, client may access snapshot of deleted directory. For example: mkdir -p d1/d2 mkdir d1/.snap/s1 open d1/.snap/s1/d2 rm -rf d1/d2 <nfs server restart> The path constucted by ceph_get_parent() and ceph_get_name() is '<inode of d2>/.snap/_s1_<inode number of d1>'. Futher lookup parent of <inode of d2> will fail. To workaround this case, this patch uses d_obtain_root() to get dentry for snapdir of deleted directory. snapdir dentry has no DCACHE_DISCONNECTED flag set, reconnect_path() stops when it reaches snapdir dentry. Link: http://tracker.ceph.com/issues/22105 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: always get rstat from auth mdsYan, Zheng
rstat is not tracked by capability. client can't know if rstat from non-auth mds is uptodate or not. Link: http://tracker.ceph.com/issues/23538 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: quota: add initial infrastructure to support cephfs quotasLuis Henriques
This patch adds the infrastructure required to support cephfs quotas as it is currently implemented in the ceph fuse client. Cephfs quotas can be set on any directory, and can restrict the number of bytes or the number of files stored beneath that point in the directory hierarchy. Quotas are set using the extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', and can be removed by setting these attributes to '0'. Link: http://tracker.ceph.com/issues/22372 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-06ceph: more accurate statfsDouglas Fuller
Improve accuracy of statfs reporting for Ceph filesystems comprising exactly one data pool. In this case, the Ceph monitor can now report the space usage for the single data pool instead of the global data for the entire Ceph cluster. Include support for this message in mon_client and leverage it in ceph/super. Signed-off-by: Douglas Fuller <dfuller@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: new cap message flags indicate if there is pending capsnapYan, Zheng
These flags tell mds if there is pending capsnap explicitly. Without this explicit notification, mds can only conclude if client has pending capsnap. The method mds use is inefficient and error-prone. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: respect RADOS_BACKOFF backoffsIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-05-04ceph: fix file open flags on ppc64Alexander Graf
The file open flags (O_foo) are platform specific and should never go out to an interface that is not local to the system. Unfortunately these flags have leaked out onto the wire in the cephfs implementation. That lead to bogus flags getting transmitted on ppc64. This patch converts the kernel view of flags to the ceph view of file open flags. Fixes: 124e68e74 ("ceph: file operations") Signed-off-by: Alexander Graf <agraf@suse.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-05-04ceph: make seeky readdir more efficientYan, Zheng
Current cephfs client uses string to indicate start position of readdir. The string is last entry of previous readdir reply. This approach does not work for seeky readdir because we can not easily convert the new postion to a string. For seeky readdir, mds needs to return dentries from the beginning. Client keeps retrying if the reply does not contain the dentry it wants. In current version of ceph, mds sorts CDentry in its cache in hash order. Client also uses dentry hash to compose dir postion. For seeky readdir, if client passes the hash part of dir postion to mds. mds can avoid replying useless dentries. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-12ceph: add flags parameter to send_cap_msgJeff Layton
Add a flags parameter to send_cap_msg, so we can request expedited service from the MDS when we know we'll be waiting on the result. Set that flag in the case of try_flush_caps. The callers of that function generally wait synchronously on the result, so it's beneficial to ask the server to expedite it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03ceph: handle CEPH_SESSION_REJECT messageYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-08-24libceph: support for blacklisting clientsDouglas Fuller
Reuse ceph_mon_generic_request infrastructure for sending monitor commands. In particular, add support for 'blacklist add' to prevent other, non-responsive clients from making further updates. Signed-off-by: Douglas Fuller <dfuller@redhat.com> [idryomov@gmail.com: refactor, misc fixes throughout] Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-07-28libceph: fsmap.user subscription supportYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28ceph: reduce i_nr_by_mode array sizeYan, Zheng
Track usage count for individual fmode bit. This can reduce the array size by half. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28libceph: rados pool namespace supportYan, Zheng
Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: Yan, Zheng <zyan@redhat.com> [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28libceph: define new ceph_file_layout structureYan, Zheng
Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: using hash value to compose dentry offsetYan, Zheng
If MDS sorts dentries in dirfrag in hash order, we use hash value to compose dentry offset. dentry offset is: (0xff << 52) | ((24 bits hash) << 28) | (the nth entry hash hash collision) This offset is stable across directory fragmentation. This alos means there is no need to reset readdir offset if directory get fragmented in the middle of readdir. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: define 'end/complete' in readdir reply as bit flagsYan, Zheng
Set a flag in readdir request, which indicates that client interprets 'end/complete' as bit flags. So that mds can reply additional flags in readdir reply. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26libceph: support for subscribing to "mdsmap.<id>" mapsIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26libceph, rbd: ceph_osd_linger_request, watch/notify v2Ilya Dryomov
This adds support and switches rbd to a new, more reliable version of watch/notify protocol. As with the OSD client update, this is mostly about getting the right structures linked into the right places so that reconnects are properly sent when needed. watch/notify v2 also requires sending regular pings to the OSDs - send_linger_ping(). A major change from the old watch/notify implementation is the introduction of ceph_osd_linger_request - linger requests no longer piggy back on ceph_osd_request. ceph_osd_event has been merged into ceph_osd_linger_request. All the details are now hidden within libceph, the interface consists of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack(). ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep the lifetime management simple. ceph_osdc_notify_ack() accepts an optional data payload, which is relayed back to the notifier. Portions of this patch are loosely based on work by Douglas Fuller <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25ceph: fix security xattr deadlockYan, Zheng
When security is enabled, security module can call filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by MDS' dispatch thread, while handling MDS reply. If the MDS reply does not include xattrs and corresponding caps, getxattr/setxattr need to send a new request to MDS and waits for the reply. This makes MDS' dispatch sleep, nobody handles later MDS replies. The fix is make sure lookup/atomic_open reply include xattrs and corresponding caps. So getxattr can be handled by cached xattrs. This requires some modification to both MDS and request message. (Client tells MDS what caps it wants; MDS encodes proper caps in the reply) Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL to us. So just make setxattr return error when called by MDS' dispatch thread. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25libceph: revamp subs code, switch to SUBSCRIBE2 protocolIlya Dryomov
It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not. Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010). Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling in before we validate the epoch and successfully install the new map can mess up mon_client sub state. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-22ceph: rename snapshot supportYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: handle SESSION_FORCE_RO messageYan, Zheng
mark session as readonly and wake up all cap waiters. Signed-off-by: Yan, Zheng <zyan@redhat.com>