summaryrefslogtreecommitdiff
path: root/security
AgeCommit message (Collapse)Author
2021-04-21KEYS: trusted: Fix TPM reservation for seal/unsealJames Bottomley
The original patch 8c657a0590de ("KEYS: trusted: Reserve TPM for seal and unseal operations") was correct on the mailing list: https://lore.kernel.org/linux-integrity/20210128235621.127925-4-jarkko@kernel.org/ But somehow got rebased so that the tpm_try_get_ops() in tpm2_seal_trusted() got lost. This causes an imbalanced put of the TPM ops and causes oopses on TIS based hardware. This fix puts back the lost tpm_try_get_ops() Fixes: 8c657a0590de ("KEYS: trusted: Reserve TPM for seal and unseal operations") Reported-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2021-04-16kasan: remove redundant config optionWalter Wu
CONFIG_KASAN_STACK and CONFIG_KASAN_STACK_ENABLE both enable KASAN stack instrumentation, but we should only need one config, so that we remove CONFIG_KASAN_STACK_ENABLE and make CONFIG_KASAN_STACK workable. see [1]. When enable KASAN stack instrumentation, then for gcc we could do no prompt and default value y, and for clang prompt and default value n. This patch fixes the following compilation warning: include/linux/kasan.h:333:30: warning: 'CONFIG_KASAN_STACK' is not defined, evaluates to 0 [-Wundef] [akpm@linux-foundation.org: fix merge snafu] Link: https://bugzilla.kernel.org/show_bug.cgi?id=210221 [1] Link: https://lkml.kernel.org/r/20210226012531.29231-1-walter-zh.wu@mediatek.com Fixes: d9b571c885a8 ("kasan: fix KASAN_STACK dependency for HW_TAGS") Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-09Merge tag 'selinux-pr-20210409' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fixes from Paul Moore: "Three SELinux fixes. These fix known problems relating to (re)loading SELinux policy or changing the policy booleans, and pass our test suite without problem" * tag 'selinux-pr-20210409' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix race between old and new sidtab selinux: fix cond_list corruption when changing booleans selinux: make nslot handling in avtab more robust
2021-04-07selinux: fix race between old and new sidtabOndrej Mosnacek
Since commit 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU"), there is a small window during policy load where the new policy pointer has already been installed, but some threads may still be holding the old policy pointer in their read-side RCU critical sections. This means that there may be conflicting attempts to add a new SID entry to both tables via sidtab_context_to_sid(). See also (and the rest of the thread): https://lore.kernel.org/selinux/CAFqZXNvfux46_f8gnvVvRYMKoes24nwm2n3sPbMjrB8vKTW00g@mail.gmail.com/ Fix this by installing the new policy pointer under the old sidtab's spinlock along with marking the old sidtab as "frozen". Then, if an attempt to add new entry to a "frozen" sidtab is detected, make sidtab_context_to_sid() return -ESTALE to indicate that a new policy has been installed and that the caller will have to abort the policy transaction and try again after re-taking the policy pointer (which is guaranteed to be a newer policy). This requires adding a retry-on-ESTALE logic to all callers of sidtab_context_to_sid(), but fortunately these are easy to determine and aren't that many. This seems to be the simplest solution for this problem, even if it looks somewhat ugly. Note that other places in the kernel (e.g. do_mknodat() in fs/namei.c) use similar stale-retry patterns, so I think it's reasonable. Cc: stable@vger.kernel.org Fixes: 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-04-02selinux: fix cond_list corruption when changing booleansOndrej Mosnacek
Currently, duplicate_policydb_cond_list() first copies the whole conditional avtab and then tries to link to the correct entries in cond_dup_av_list() using avtab_search(). However, since the conditional avtab may contain multiple entries with the same key, this approach often fails to find the right entry, potentially leading to wrong rules being activated/deactivated when booleans are changed. To fix this, instead start with an empty conditional avtab and add the individual entries one-by-one while building the new av_lists. This approach leads to the correct result, since each entry is present in the av_lists exactly once. The issue can be reproduced with Fedora policy as follows: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True # setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off ftpd_connect_db=off ftpd_full_access=off On fixed kernels, the sesearch output is the same after the setsebool command: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True While on the broken kernels, it will be different: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True While there, also simplify the computation of nslots. This changes the nslots values for nrules 2 or 3 to just two slots instead of 4, which makes the sequence more consistent. Cc: stable@vger.kernel.org Fixes: c7c556f1e81b ("selinux: refactor changing booleans") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-04-02selinux: make nslot handling in avtab more robustOndrej Mosnacek
1. Make sure all fileds are initialized in avtab_init(). 2. Slightly refactor avtab_alloc() to use the above fact. 3. Use h->nslot == 0 as a sentinel in the access functions to prevent dereferencing h->htable when it's not allocated. Cc: stable@vger.kernel.org Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-28tomoyo: don't special case PF_IO_WORKER for PF_KTHREADJens Axboe
Since commit 3bfe6106693b6b4b ("io-wq: fork worker threads from original task") stopped using PF_KTHREAD flag for the io_uring PF_IO_WORKER threads, tomoyo_kernel_service() no longer needs to check PF_IO_WORKER flag. (This is a 5.12+ patch. Please don't send to stable kernels.) Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2021-03-25Merge tag 'integrity-v5.12-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity fix from Mimi Zohar: "Just one patch to address a NULL ptr dereferencing when there is a mismatch between the user enabled LSMs and IMA/EVM" * tag 'integrity-v5.12-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: double check iint_cache was initialized
2021-03-22integrity: double check iint_cache was initializedMimi Zohar
The kernel may be built with multiple LSMs, but only a subset may be enabled on the boot command line by specifying "lsm=". Not including "integrity" on the ordered LSM list may result in a NULL deref. As reported by Dmitry Vyukov: in qemu: qemu-system-x86_64 -enable-kvm -machine q35,nvdimm -cpu max,migratable=off -smp 4 -m 4G,slots=4,maxmem=16G -hda wheezy.img -kernel arch/x86/boot/bzImage -nographic -vga std -soundhw all -usb -usbdevice tablet -bt hci -bt device:keyboard -net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic,model=virtio-net-pci -object memory-backend-file,id=pmem1,share=off,mem-path=/dev/zero,size=64M -device nvdimm,id=nvdimm1,memdev=pmem1 -append "console=ttyS0 root=/dev/sda earlyprintk=serial rodata=n oops=panic panic_on_warn=1 panic=86400 lsm=smack numa=fake=2 nopcid dummy_hcd.num=8" -pidfile vm_pid -m 2G -cpu host But it crashes on NULL deref in integrity_inode_get during boot: Run /sbin/init as init process BUG: kernel NULL pointer dereference, address: 000000000000001c PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc2+ #97 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-44-g88ab0c15525c-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc+0x2b/0x370 mm/slub.c:2920 Code: 57 41 56 41 55 41 54 41 89 f4 55 48 89 fd 53 48 83 ec 10 44 8b 3d d9 1f 90 0b 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 <8b> 5f 1c 4cf RSP: 0000:ffffc9000032f9d8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888017fc4f00 RCX: 0000000000000000 RDX: ffff888040220000 RSI: 0000000000000c40 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888019263627 R10: ffffffff83937cd1 R11: 0000000000000000 R12: 0000000000000c40 R13: ffff888019263538 R14: 0000000000000000 R15: 0000000000ffffff FS: 0000000000000000(0000) GS:ffff88802d180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001c CR3: 000000000b48e000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: integrity_inode_get+0x47/0x260 security/integrity/iint.c:105 process_measurement+0x33d/0x17e0 security/integrity/ima/ima_main.c:237 ima_bprm_check+0xde/0x210 security/integrity/ima/ima_main.c:474 security_bprm_check+0x7d/0xa0 security/security.c:845 search_binary_handler fs/exec.c:1708 [inline] exec_binprm fs/exec.c:1761 [inline] bprm_execve fs/exec.c:1830 [inline] bprm_execve+0x764/0x19a0 fs/exec.c:1792 kernel_execve+0x370/0x460 fs/exec.c:1973 try_to_run_init_process+0x14/0x4e init/main.c:1366 kernel_init+0x11d/0x1b8 init/main.c:1477 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Modules linked in: CR2: 000000000000001c ---[ end trace 22d601a500de7d79 ]--- Since LSMs and IMA may be configured at build time, but not enabled at run time, panic the system if "integrity" was not initialized before use. Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: 79f7865d844c ("LSM: Introduce "lsm=" for boottime LSM selection") Cc: stable@vger.kernel.org Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-03-22Merge tag 'selinux-pr-20210322' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fixes from Paul Moore: "Three SELinux patches: - Fix a problem where a local variable is used outside its associated function. Thankfully this can only be triggered by reloading the SELinux policy, which is a restricted operation for other obvious reasons. - Fix some incorrect, and inconsistent, audit and printk messages when loading the SELinux policy. All three patches are relatively minor and have been through our testing with no failures" * tag 'selinux-pr-20210322' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinuxfs: unify policy load error reporting selinux: fix variable scope issue in live sidtab conversion selinux: don't log MAC_POLICY_LOAD record on failed policy load
2021-03-18selinuxfs: unify policy load error reportingOndrej Mosnacek
Let's drop the pr_err()s from sel_make_policy_nodes() and just add one pr_warn_ratelimited() call to the sel_make_policy_nodes() error path in sel_write_load(). Changing from error to warning makes sense, since after 02a52c5c8c3b ("selinux: move policy commit after updating selinuxfs"), this error path no longer leads to a broken selinuxfs tree (it's just kept in the original state and policy load is aborted). I also added _ratelimited to be consistent with the other prtin in the same function (it's probably not necessary, but can't really hurt... there are likely more important error messages to be printed when filesystem entry creation starts erroring out). Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-18selinux: fix variable scope issue in live sidtab conversionOndrej Mosnacek
Commit 02a52c5c8c3b ("selinux: move policy commit after updating selinuxfs") moved the selinux_policy_commit() call out of security_load_policy() into sel_write_load(), which caused a subtle yet rather serious bug. The problem is that security_load_policy() passes a reference to the convert_params local variable to sidtab_convert(), which stores it in the sidtab, where it may be accessed until the policy is swapped over and RCU synchronized. Before 02a52c5c8c3b, selinux_policy_commit() was called directly from security_load_policy(), so the convert_params pointer remained valid all the way until the old sidtab was destroyed, but now that's no longer the case and calls to sidtab_context_to_sid() on the old sidtab after security_load_policy() returns may cause invalid memory accesses. This can be easily triggered using the stress test from commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance"): ``` function rand_cat() { echo $(( $RANDOM % 1024 )) } function do_work() { while true; do echo -n "system_u:system_r:kernel_t:s0:c$(rand_cat),c$(rand_cat)" \ >/sys/fs/selinux/context 2>/dev/null || true done } do_work >/dev/null & do_work >/dev/null & do_work >/dev/null & while load_policy; do echo -n .; sleep 0.1; done kill %1 kill %2 kill %3 ``` Fix this by allocating the temporary sidtab convert structures dynamically and passing them among the selinux_policy_{load,cancel,commit} functions. Fixes: 02a52c5c8c3b ("selinux: move policy commit after updating selinuxfs") Cc: stable@vger.kernel.org Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> [PM: merge fuzz in security.h and services.c] Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-18selinux: don't log MAC_POLICY_LOAD record on failed policy loadOndrej Mosnacek
If sel_make_policy_nodes() fails, we should jump to 'out', not 'out1', as the latter would incorrectly log an MAC_POLICY_LOAD audit record, even though the policy hasn't actually been reloaded. The 'out1' jump label now becomes unused and can be removed. Fixes: 02a52c5c8c3b ("selinux: move policy commit after updating selinuxfs") Cc: stable@vger.kernel.org Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-12Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file ↵Eric W. Biederman
capabilities") It turns out that there are in fact userspace implementations that care and this recent change caused a regression. https://github.com/containers/buildah/issues/3071 As the motivation for the original change was future development, and the impact is existing real world code just revert this change and allow the ambiguity in v3 file caps. Cc: stable@vger.kernel.org Fixes: 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities") Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-02-23Merge tag 'keys-misc-20210126' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring updates from David Howells: "Here's a set of minor keyrings fixes/cleanups that I've collected from various people for the upcoming merge window. A couple of them might, in theory, be visible to userspace: - Make blacklist_vet_description() reject uppercase letters as they don't match the all-lowercase hex string generated for a blacklist search. This may want reconsideration in the future, but, currently, you can't add to the blacklist keyring from userspace and the only source of blacklist keys generates lowercase descriptions. - Fix blacklist_init() to use a new KEY_ALLOC_* flag to indicate that it wants KEY_FLAG_KEEP to be set rather than passing KEY_FLAG_KEEP into keyring_alloc() as KEY_FLAG_KEEP isn't a valid alloc flag. This isn't currently a problem as the blacklist keyring isn't currently writable by userspace. The rest of the patches are cleanups and I don't think they should have any visible effect" * tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: watch_queue: rectify kernel-doc for init_watch() certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID certs: Fix blacklist flag type confusion PKCS#7: Fix missing include certs: Fix blacklisted hexadecimal hash string check certs/blacklist: fix kernel doc interface issue crypto: public_key: Remove redundant header file from public_key.h keys: remove trailing semicolon in macro definition crypto: pkcs7: Use match_string() helper to simplify the code PKCS#7: drop function from kernel-doc pkcs7_validate_trust_one encrypted-keys: Replace HTTP links with HTTPS ones crypto: asymmetric_keys: fix some comments in pkcs7_parser.h KEYS: remove redundant memset security: keys: delete repeated words in comments KEYS: asymmetric: Fix kerneldoc security/keys: use kvfree_sensitive() watch_queue: Drop references to /dev/watch_queue keys: Remove outdated __user annotations security: keys: Fix fall-through warnings for Clang
2021-02-23Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-22Merge branch 'userns-for-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace update from Eric Biederman: "There are several pieces of active development, but only a single change made it through the gauntlet to be ready for v5.12. That change is tightening up the semantics of the v3 capabilities xattr. It is just short of being a bug-fix/security issue as no user space is known to even generate the problem case" * 'userns-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: capabilities: Don't allow writing ambiguous v3 file capabilities
2021-02-22Merge branch 'work.audit' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU-safe common_lsm_audit() from Al Viro: "Make common_lsm_audit() non-blocking and usable from RCU pathwalk context. We don't really need to grab/drop dentry in there - rcu_read_lock() is enough. There's a couple of followups using that to simplify the logics in selinux, but those hadn't soaked in -next yet, so they'll have to go in next window" * 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make dump_common_audit_data() safe to be called from RCU pathwalk new helper: d_find_alias_rcu()
2021-02-21Merge tag 'tpmdd-next-v5.12-rc1-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "New features: - Cr50 I2C TPM driver - sysfs exports of PCR registers in TPM 2.0 chips Bug fixes: - bug fixes for tpm_tis driver, which had a racy wait for hardware state change to be ready to send a command to the TPM chip. The bug has existed already since 2006, but has only made itself known in recent past. This is the same as the "last time" :-) - Otherwise there's bunch of fixes for not as alarming regressions. I think the list is about the same as last time, except I added fixes for some disjoint bugs in trusted keys that I found some time ago" * tag 'tpmdd-next-v5.12-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: KEYS: trusted: Reserve TPM for seal and unseal operations KEYS: trusted: Fix migratable=1 failing KEYS: trusted: Fix incorrect handling of tpm_get_random() tpm/ppi: Constify static struct attribute_group ABI: add sysfs description for tpm exports of PCR registers tpm: add sysfs exports for all banks of PCR registers keys: Update comment for restrict_link_by_key_or_keyring_chain tpm: Remove tpm_dev_wq_lock char: tpm: add i2c driver for cr50 tpm: Fix fall-through warnings for Clang tpm_tis: Clean up locality release tpm_tis: Fix check_locality for correct locality acquisition
2021-02-21Merge tag 'Smack-for-v5.12' of git://github.com/cschaufler/smack-nextLinus Torvalds
Pull smack updates from Casey Schaufler: "Bounds checking for writes to smackfs interfaces" * tag 'Smack-for-v5.12' of git://github.com/cschaufler/smack-next: smackfs: restrict bytes count in smackfs write functions
2021-02-21Merge tag 'integrity-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA updates from Mimi Zohar: "New is IMA support for measuring kernel critical data, as per usual based on policy. The first example measures the in memory SELinux policy. The second example measures the kernel version. In addition are four bug fixes to address memory leaks and a missing 'static' function declaration" * tag 'integrity-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: Make function integrity_add_key() static ima: Free IMA measurement buffer after kexec syscall ima: Free IMA measurement buffer on error IMA: Measure kernel version in early boot selinux: include a consumer of the new IMA critical data hook IMA: define a builtin critical data measurement policy IMA: extend critical data hook to limit the measurement based on a label IMA: limit critical data measurement based on a label IMA: add policy rule to measure critical data IMA: define a hook to measure kernel integrity critical data IMA: add support to measure buffer data hash IMA: generalize keyring specific measurement constructs evm: Fix memleak in init_desc
2021-02-21Merge tag 'selinux-pr-20210215' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "We've got a good handful of patches for SELinux this time around; with everything passing the selinux-testsuite and applying cleanly to your tree as of a few minutes ago. The highlights are: - Add support for labeling anonymous inodes, and extend this new support to userfaultfd. - Fallback to SELinux genfs file labeling if the filesystem does not have xattr support. This is useful for virtiofs which can vary in its xattr support depending on the backing filesystem. - Classify and handle MPTCP the same as TCP in SELinux. - Ensure consistent behavior between inode_getxattr and inode_listsecurity when the SELinux policy is not loaded. This fixes a known problem with overlayfs. - A couple of patches to prune some unused variables from the SELinux code, mark private variables as static, and mark other variables as __ro_after_init or __read_mostly" * tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: fs: anon_inodes: rephrase to appropriate kernel-doc userfaultfd: use secure anon inodes for userfaultfd selinux: teach SELinux about anonymous inodes fs: add LSM-supporting anon-inode interface security: add inode_init_security_anon() LSM hook selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support selinux: mark selinux_xfrm_refcount as __read_mostly selinux: mark some global variables __ro_after_init selinux: make selinuxfs_mount static selinux: drop the unnecessary aurule_callback variable selinux: remove unused global variables selinux: fix inconsistency between inode_getxattr and inode_listsecurity selinux: handle MPTCP consistently with TCP
2021-02-21Merge tag 'tomoyo-pr-20210215' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1Linus Torvalds
Pull tomoyo updates from Tetsuo Handa: "Detect kernel thread correctly, and ignore harmless data race" * tag 'tomoyo-pr-20210215' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1: tomoyo: recognize kernel threads correctly tomoyo: ignore data race while checking quota
2021-02-16KEYS: trusted: Reserve TPM for seal and unseal operationsJarkko Sakkinen
When TPM 2.0 trusted keys code was moved to the trusted keys subsystem, the operations were unwrapped from tpm_try_get_ops() and tpm_put_ops(), which are used to take temporarily the ownership of the TPM chip. The ownership is only taken inside tpm_send(), but this is not sufficient, as in the key load TPM2_CC_LOAD, TPM2_CC_UNSEAL and TPM2_FLUSH_CONTEXT need to be done as a one single atom. Take the TPM chip ownership before sending anything with tpm_try_get_ops() and tpm_put_ops(), and use tpm_transmit_cmd() to send TPM commands instead of tpm_send(), reverting back to the old behaviour. Fixes: 2e19e10131a0 ("KEYS: trusted: Move TPM2 trusted keys code") Reported-by: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: stable@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Sumit Garg <sumit.garg@linaro.org> Acked-by Sumit Garg <sumit.garg@linaro.org> Tested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-02-16KEYS: trusted: Fix migratable=1 failingJarkko Sakkinen
Consider the following transcript: $ keyctl add trusted kmk "new 32 blobauth=helloworld keyhandle=80000000 migratable=1" @u add_key: Invalid argument The documentation has the following description: migratable= 0|1 indicating permission to reseal to new PCR values, default 1 (resealing allowed) The consequence is that "migratable=1" should succeed. Fix this by allowing this condition to pass instead of return -EINVAL. [*] Documentation/security/keys/trusted-encrypted.rst Cc: stable@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: David Howells <dhowells@redhat.com> Fixes: d00a1c72f7f4 ("keys: add new trusted key-type") Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-02-16KEYS: trusted: Fix incorrect handling of tpm_get_random()Jarkko Sakkinen
When tpm_get_random() was introduced, it defined the following API for the return value: 1. A positive value tells how many bytes of random data was generated. 2. A negative value on error. However, in the call sites the API was used incorrectly, i.e. as it would only return negative values and otherwise zero. Returning he positive read counts to the user space does not make any possible sense. Fix this by returning -EIO when tpm_get_random() returns a positive value. Fixes: 41ab999c80f1 ("tpm: Move tpm_get_random api into the TPM device driver") Cc: stable@vger.kernel.org Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: David Howells <dhowells@redhat.com> Cc: Kent Yoder <key@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
2021-02-12integrity: Make function integrity_add_key() staticWei Yongjun
The sparse tool complains as follows: security/integrity/digsig.c:146:12: warning: symbol 'integrity_add_key' was not declared. Should it be static? This function is not used outside of digsig.c, so this commit marks it static. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 60740accf784 ("integrity: Load certs to the platform keyring") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-02-10Merge branch 'ima-kexec-fixes' into next-integrityMimi Zohar
2021-02-10ima: Free IMA measurement buffer after kexec syscallLakshmi Ramasubramanian
IMA allocates kernel virtual memory to carry forward the measurement list, from the current kernel to the next kernel on kexec system call, in ima_add_kexec_buffer() function. This buffer is not freed before completing the kexec system call resulting in memory leak. Add ima_buffer field in "struct kimage" to store the virtual address of the buffer allocated for the IMA measurement list. Free the memory allocated for the IMA measurement list in kimage_file_post_load_cleanup() function. Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list") Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-02-10ima: Free IMA measurement buffer on errorLakshmi Ramasubramanian
IMA allocates kernel virtual memory to carry forward the measurement list, from the current kernel to the next kernel on kexec system call, in ima_add_kexec_buffer() function. In error code paths this memory is not freed resulting in memory leak. Free the memory allocated for the IMA measurement list in the error code paths in ima_add_kexec_buffer() function. Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list") Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-02-02smackfs: restrict bytes count in smackfs write functionsSabyrzhan Tasbolatov
syzbot found WARNINGs in several smackfs write operations where bytes count is passed to memdup_user_nul which exceeds GFP MAX_ORDER. Check count size if bigger than PAGE_SIZE. Per smackfs doc, smk_write_net4addr accepts any label or -CIPSO, smk_write_net6addr accepts any label or -DELETE. I couldn't find any general rule for other label lengths except SMK_LABELLEN, SMK_LONGLABEL, SMK_CIPSOMAX which are documented. Let's constrain, in general, smackfs label lengths for PAGE_SIZE. Although fuzzer crashes write to smackfs/netlabel on 0x400000 length. Here is a quick way to reproduce the WARNING: python -c "print('A' * 0x400000)" > /sys/fs/smackfs/netlabel Reported-by: syzbot+a71a442385a0b2815497@syzkaller.appspotmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2021-02-01tomoyo: recognize kernel threads correctlyTetsuo Handa
Commit db68ce10c4f0a27c ("new helper: uaccess_kernel()") replaced segment_eq(get_fs(), KERNEL_DS) with uaccess_kernel(). But the correct method for tomoyo to check whether current is a kernel thread in order to assume that kernel threads are privileged for socket operations was (current->flags & PF_KTHREAD). Now that uaccess_kernel() became 0 on x86, tomoyo has to fix this problem. Do like commit 942cb357ae7d9249 ("Smack: Handle io_uring kernel thread privileges") does. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2021-02-01tomoyo: ignore data race while checking quotaTetsuo Handa
syzbot is reporting that tomoyo's quota check is racy [1]. But this check is tolerant of some degree of inaccuracy. Thus, teach KCSAN to ignore this data race. [1] https://syzkaller.appspot.com/bug?id=999533deec7ba6337f8aa25d8bd1a4d5f7e50476 Reported-by: syzbot <syzbot+0789a72b46fd91431bd8@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2021-01-28cap: fix conversions on getxattrMiklos Szeredi
If a capability is stored on disk in v2 format cap_inode_getsecurity() will currently return in v2 format unconditionally. This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid, and so the same conversions performed on it. If the rootid cannot be mapped, v3 is returned unconverted. Fix this so that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs user namespace in case of v2) cannot be mapped into the current user namespace. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-01-26IMA: Measure kernel version in early bootRaphael Gianotti
The integrity of a kernel can be verified by the boot loader on cold boot, and during kexec, by the current running kernel, before it is loaded. However, it is still possible that the new kernel being loaded is older than the current kernel, and/or has known vulnerabilities. Therefore, it is imperative that an attestation service be able to verify the version of the kernel being loaded on the client, from cold boot and subsequent kexec system calls, ensuring that only kernels with versions known to be good are loaded. Measure the kernel version using ima_measure_critical_data() early on in the boot sequence, reducing the chances of known kernel vulnerabilities being exploited. With IMA being part of the kernel, this overall approach makes the measurement itself more trustworthy. To enable measuring the kernel version "ima_policy=critical_data" needs to be added to the kernel command line arguments. For example, BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc3+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset ima_policy=critical_data If runtime measurement of the kernel version is ever needed, the following should be added to /etc/ima/ima-policy: measure func=CRITICAL_DATA label=kernel_info To extract the measured data after boot, the following command can be used: grep -m 1 "kernel_version" \ /sys/kernel/security/integrity/ima/ascii_runtime_measurements Sample output from the command above: 10 a8297d408e9d5155728b619761d0dd4cedf5ef5f ima-buf sha256:5660e19945be0119bc19cbbf8d9c33a09935ab5d30dad48aa11f879c67d70988 kernel_version 352e31312e302d7263332d31363138372d676564623634666537383234342d6469727479 The above hex-ascii string corresponds to the kernel version (e.g. xxd -r -p): 5.11.0-rc3-16187-gedb64fe78244-dirty Signed-off-by: Raphael Gianotti <raphgi@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-01-24ima: handle idmapped mountsChristian Brauner
IMA does sometimes access the inode's i_uid and compares it against the rules' fowner. Enable IMA to handle idmapped mounts by passing down the mount's user namespace. We simply make use of the helpers we introduced before. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-27-christian.brauner@ubuntu.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24apparmor: handle idmapped mountsChristian Brauner
The i_uid and i_gid are mostly used when logging for AppArmor. This is broken in a bunch of places where the global root id is reported instead of the i_uid or i_gid of the file. Nonetheless, be kind and log the mapped inode if we're coming from an idmapped mount. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-26-christian.brauner@ubuntu.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24fs: make helpers idmap mount awareChristian Brauner
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24commoncap: handle idmapped mountsChristian Brauner
When interacting with user namespace and non-user namespace aware filesystem capabilities the vfs will perform various security checks to determine whether or not the filesystem capabilities can be used by the caller, whether they need to be removed and so on. The main infrastructure for this resides in the capability codepaths but they are called through the LSM security infrastructure even though they are not technically an LSM or optional. This extends the existing security hooks security_inode_removexattr(), security_inode_killpriv(), security_inode_getsecurity() to pass down the mount's user namespace and makes them aware of idmapped mounts. In order to actually get filesystem capabilities from disk the capability infrastructure exposes the get_vfs_caps_from_disk() helper. For user namespace aware filesystem capabilities a root uid is stored alongside the capabilities. In order to determine whether the caller can make use of the filesystem capability or whether it needs to be ignored it is translated according to the superblock's user namespace. If it can be translated to uid 0 according to that id mapping the caller can use the filesystem capabilities stored on disk. If we are accessing the inode that holds the filesystem capabilities through an idmapped mount we map the root uid according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts: reading filesystem caps from disk enforces that the root uid associated with the filesystem capability must have a mapping in the superblock's user namespace and that the caller is either in the same user namespace or is a descendant of the superblock's user namespace. For filesystems that are mountable inside user namespace the caller can just mount the filesystem and won't usually need to idmap it. If they do want to idmap it they can create an idmapped mount and mark it with a user namespace they created and which is thus a descendant of s_user_ns. For filesystems that are not mountable inside user namespaces the descendant rule is trivially true because the s_user_ns will be the initial user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24xattr: handle idmapped mountsTycho Andersen
When interacting with extended attributes the vfs verifies that the caller is privileged over the inode with which the extended attribute is associated. For posix access and posix default extended attributes a uid or gid can be stored on-disk. Let the functions handle posix extended attributes on idmapped mounts. If the inode is accessed through an idmapped mount we need to map it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. This has no effect for e.g. security xattrs since they don't store uids or gids and don't perform permission checks on them like posix acls do. Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24acl: handle idmapped mountsChristian Brauner
The posix acl permission checking helpers determine whether a caller is privileged over an inode according to the acls associated with the inode. Add helpers that make it possible to handle acls on idmapped mounts. The vfs and the filesystems targeted by this first iteration make use of posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to translate basic posix access and default permissions such as the ACL_USER and ACL_GROUP type according to the initial user namespace (or the superblock's user namespace) to and from the caller's current user namespace. Adapt these two helpers to handle idmapped mounts whereby we either map from or into the mount's user namespace depending on in which direction we're translating. Similarly, cap_convert_nscap() is used by the vfs to translate user namespace and non-user namespace aware filesystem capabilities from the superblock's user namespace to the caller's user namespace. Enable it to handle idmapped mounts by accounting for the mount's user namespace. In addition the fileystems targeted in the first iteration of this patch series make use of the posix_acl_chmod() and, posix_acl_update_mode() helpers. Both helpers perform permission checks on the target inode. Let them handle idmapped mounts. These two helpers are called when posix acls are set by the respective filesystems to handle this case we extend the ->set() method to take an additional user namespace argument to pass the mount's user namespace down. Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24inode: make init and permission helpers idmapped mount awareChristian Brauner
The inode_owner_or_capable() helper determines whether the caller is the owner of the inode or is capable with respect to that inode. Allow it to handle idmapped mounts. If the inode is accessed through an idmapped mount it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Similarly, allow the inode_init_owner() helper to handle idmapped mounts. It initializes a new inode on idmapped mounts by mapping the fsuid and fsgid of the caller from the mount's user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24capability: handle idmapped mountsChristian Brauner
In order to determine whether a caller holds privilege over a given inode the capability framework exposes the two helpers privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former verifies that the inode has a mapping in the caller's user namespace and the latter additionally verifies that the caller has the requested capability in their current user namespace. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped inodes. If the initial user namespace is passed all operations are a nop so non-idmapped mounts will not see a change in behavior. Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-21certs: Fix blacklist flag type confusionDavid Howells
KEY_FLAG_KEEP is not meant to be passed to keyring_alloc() or key_alloc(), as these only take KEY_ALLOC_* flags. KEY_FLAG_KEEP has the same value as KEY_ALLOC_BYPASS_RESTRICTION, but fortunately only key_create_or_update() uses it. LSMs using the key_alloc hook don't check that flag. KEY_FLAG_KEEP is then ignored but fortunately (again) the root user cannot write to the blacklist keyring, so it is not possible to remove a key/hash from it. Fix this by adding a KEY_ALLOC_SET_KEEP flag that tells key_alloc() to set KEY_FLAG_KEEP on the new key. blacklist_init() can then, correctly, pass this to keyring_alloc(). We can also use this in ima_mok_init() rather than setting the flag manually. Note that this doesn't fix an observable bug with the current implementation but it is required to allow addition of new hashes to the blacklist in the future without making it possible for them to be removed. Fixes: 734114f8782f ("KEYS: Add a system blacklist keyring") Reported-by: Mickaël Salaün <mic@linux.microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Mickaël Salaün <mic@linux.microsoft.com> cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: David Woodhouse <dwmw2@infradead.org>
2021-01-21KEYS: remove redundant memsetTom Rix
Reviewing use of memset in keyctl_pkey.c keyctl_pkey_params_get prologue code to set params up memset(params, 0, sizeof(*params)); params->encoding = "raw"; keyctl_pkey_query has the same prologue and calls keyctl_pkey_params_get. So remove the prologue. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
2021-01-21security: keys: delete repeated words in commentsRandy Dunlap
Drop repeated words in comments. {to, will, the} Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Ben Boeckel <mathstuf@gmail.com> Cc: keyrings@vger.kernel.org Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org
2021-01-21security/keys: use kvfree_sensitive()Denis Efremov
Use kvfree_sensitive() instead of open-coding it. Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
2021-01-21watch_queue: Drop references to /dev/watch_queueGabriel Krisman Bertazi
The merged API doesn't use a watch_queue device, but instead relies on pipes, so let the documentation reflect that. Fixes: f7e47677e39a ("watch_queue: Add a key/keyring notification facility") Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
2021-01-21keys: Remove outdated __user annotationsJann Horn
When the semantics of the ->read() handlers were changed such that "buffer" is a kernel pointer, some __user annotations survived. Since they're wrong now, get rid of them. Fixes: d3ec10aa9581 ("KEYS: Don't write out to userspace while holding key semaphore") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
2021-01-21security: keys: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Ben Boeckel <mathstuf@gmail.com>