summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-03fib: add missing attribute validation for tun_idJakub Kicinski
Add missing netlink policy entry for FRA_TUN_ID. Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03devlink: validate length of region addr/lenJakub Kicinski
DEVLINK_ATTR_REGION_CHUNK_ADDR and DEVLINK_ATTR_REGION_CHUNK_LEN lack entries in the netlink policy. Corresponding nla_get_u64()s may read beyond the end of the message. Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03devlink: validate length of param valuesJakub Kicinski
DEVLINK_ATTR_PARAM_VALUE_DATA may have different types so it's not checked by the normal netlink policy. Make sure the attribute length is what we expect. Fixes: e3b7ca18ad7b ("devlink: Add param set command") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03ARM: dts: Fix dm814x Ethernet by changing to use rgmii-id modeTony Lindgren
Commit cd28d1d6e52e ("net: phy: at803x: Disable phy delay for RGMII mode") caused a regression for dm814x boards where NFSroot would no longer work. Let's fix the issue by configuring "rgmii-id" mode as internal delays are needed that is no longer the case with "rgmii" mode. Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-03-03hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()Dan Carpenter
This is only called from adt7462_update_device(). The caller expects it to return zero on error. I fixed a similar issue earlier in commit a4bf06d58f21 ("hwmon: (adt7462) ADT7462_REG_VOLT_MAX() should return 0") but I missed this one. Fixes: c0b4e3ab0c76 ("adt7462: new hwmon driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200303101608.kqjwfcazu2ylhi2a@kili.mountain Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-03-03perf symbols: Don't try to find a vmlinux file when looking for kernel modulesArnaldo Carvalho de Melo
The dso->kernel value is now set to everything that is in machine->kmaps, but that was being used to decide if vmlinux lookup is needed, which ended up making that lookup be made for kernel modules, that now have dso->kernel set, leading to these kinds of warnings when running on a machine with compressed kernel modules, like fedora:31: [root@five ~]# perf record -F 10000 -a sleep 2 [ perf record: Woken up 1 times to write data ] lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' lzma: fopen failed on vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux: 'No such file or directory' lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory' lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory' [ perf record: Captured and wrote 1.024 MB perf.data (1366 samples) ] [root@five ~]# This happens when collecting the buildid, when we find samples for kernel modules, fix it by checking if the looked up DSO is a kernel module by other means. Fixes: 02213cec64bb ("perf maps: Mark module DSOs with kernel type") Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200302191007.GD10335@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-03perf bench: Share some global variables to fix build with gcc 10Arnaldo Carvalho de Melo
Noticed with gcc 10 (fedora rawhide) that those variables were not being declared as static, so end up with: ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1 Prefix those with bench__ and add them to bench/bench.h, so that we can share those on the tools needing to access those variables from signal handlers. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-03binder: prevent UAF for binderfs devices IIChristian Brauner
This is a necessary follow up to the first fix I proposed and we merged in 2669b8b0c79 ("binder: prevent UAF for binderfs devices"). I have been overly optimistic that the simple fix I proposed would work. But alas, ihold() + iput() won't work since the inodes won't survive the destruction of the superblock. So all we get with my prior fix is a different race with a tinier race-window but it doesn't solve the issue. Fwiw, the problem lies with generic_shutdown_super(). It even has this cozy Al-style comment: if (!list_empty(&sb->s_inodes)) { printk("VFS: Busy inodes after unmount of %s. " "Self-destruct in 5 seconds. Have a nice day...\n", sb->s_id); } On binder_release(), binder_defer_work(proc, BINDER_DEFERRED_RELEASE) is called which punts the actual cleanup operation to a workqueue. At some point, binder_deferred_func() will be called which will end up calling binder_deferred_release() which will retrieve and cleanup the binder_context attach to this struct binder_proc. If we trace back where this binder_context is attached to binder_proc we see that it is set in binder_open() and is taken from the struct binder_device it is associated with. This obviously assumes that the struct binder_device that context is attached to is _never_ freed. While that might be true for devtmpfs binder devices it is most certainly wrong for binderfs binder devices. So, assume binder_open() is called on a binderfs binder devices. We now stash away the struct binder_context associated with that struct binder_devices: proc->context = &binder_dev->context; /* binderfs stashes devices in i_private */ if (is_binderfs_device(nodp)) { binder_dev = nodp->i_private; info = nodp->i_sb->s_fs_info; binder_binderfs_dir_entry_proc = info->proc_log_dir; } else { . . . proc->context = &binder_dev->context; Now let's assume that the binderfs instance for that binder devices is shutdown via umount() and/or the mount namespace associated with it goes away. As long as there is still an fd open for that binderfs binder device things are fine. But let's assume we now close the last fd for that binderfs binder device. Now binder_release() is called and punts to the workqueue. Assume that the workqueue has quite a bit of stuff to do and doesn't get to cleaning up the struct binder_proc and the associated struct binder_context with it for that binderfs binder device right away. In the meantime, the VFS is killing the super block and is ultimately calling sb->evict_inode() which means it will call binderfs_evict_inode() which does: static void binderfs_evict_inode(struct inode *inode) { struct binder_device *device = inode->i_private; struct binderfs_info *info = BINDERFS_I(inode); clear_inode(inode); if (!S_ISCHR(inode->i_mode) || !device) return; mutex_lock(&binderfs_minors_mutex); --info->device_count; ida_free(&binderfs_minors, device->miscdev.minor); mutex_unlock(&binderfs_minors_mutex); kfree(device->context.name); kfree(device); } thereby freeing the struct binder_device including struct binder_context. Now the workqueue finally has time to get around to cleaning up struct binder_proc and is now trying to access the associate struct binder_context. Since it's already freed it will OOPs. Fix this by introducing a refounct on binder devices. This is an alternative fix to 51d8a7eca677 ("binder: prevent UAF read in print_binder_transaction_log_entry()"). Fixes: 3ad20fe393b3 ("binder: implement binderfs") Fixes: 2669b8b0c798 ("binder: prevent UAF for binderfs devices") Fixes: 03e2e07e3814 ("binder: Make transaction_log available in binderfs") Related : 51d8a7eca677 ("binder: prevent UAF read in print_binder_transaction_log_entry()") Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20200303164340.670054-1-christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-03sys/sysinfo: Respect boottime inside time namespaceCyril Hrubis
The sysinfo() syscall includes uptime in seconds but has no correction for time namespaces which makes it inconsistent with the /proc/uptime inside of a time namespace. Add the missing time namespace adjustment call. Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dmitry Safonov <dima@arista.com> Link: https://lkml.kernel.org/r/20200303150638.7329-1-chrubis@suse.cz
2020-03-03riscv: Change code model of module to medany to improve data accessingVincent Chen
All the loaded module locates in the region [&_end-2G,VMALLOC_END] at runtime, so the distance from the module start to the end of the kernel image does not exceed 2GB. Hence, the code model of the kernel module can be changed to medany to improve the performance data access. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03riscv: avoid the PIC offset of static percpu data in module beyond 2G limitsVincent Chen
The compiler uses the PIC-relative method to access static variables instead of GOT when the code model is PIC. Therefore, the limitation of the access range from the instruction to the symbol address is +-2GB. Under this circumstance, the kernel cannot load a kernel module if this module has static per-CPU symbols declared by DEFINE_PER_CPU(). The reason is that kernel relocates the .data..percpu section of the kernel module to the end of kernel's .data..percpu. Hence, the distance between the per-CPU symbols and the instruction will exceed the 2GB limits. To solve this problem, the kernel should place the loaded module in the memory area [&_end-2G, VMALLOC_END]. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Suggested-by: Alexandre Ghiti <alex@ghiti.fr> Suggested-by: Anup Patel <anup@brainfault.org> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Tested-by: Carlos de Paula <me@carlosedp.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03KVM: x86: remove stale comment from struct x86_emulate_ctxtVitaly Kuznetsov
Commit c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") did some field shuffling and instead of [opcode_len, _regs) started clearing [has_seg_override, modrm). The comment about clearing fields altogether is not true anymore. Fixes: c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03KVM: x86: clear stale x86_emulate_ctxt->intercept valueVitaly Kuznetsov
After commit 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Hyper-V guests on KVM stopped booting with: kvm_nested_vmexit: rip fffff802987d6169 reason EPT_VIOLATION info1 181 info2 0 int_info 0 int_info_err 0 kvm_page_fault: address febd0000 error_code 181 kvm_emulate_insn: 0:fffff802987d6169: f3 a5 kvm_emulate_insn: 0:fffff802987d6169: f3 a5 FAIL kvm_inj_exception: #UD (0x0) "f3 a5" is a "rep movsw" instruction, which should not be intercepted at all. Commit c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") reduced the number of fields cleared by init_decode_cache() claiming that they are being cleared elsewhere, 'intercept', however, is left uncleared if the instruction does not have any of the "slow path" flags (NotImpl, Stack, Op3264, Sse, Mmx, CheckPerm, NearBranch, No16 and of course Intercept itself). Fixes: c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Cc: stable@vger.kernel.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03dm: bump version of core and various targetsMike Snitzer
Changes made during the 5.6 cycle warrant bumping the version number for DM core and the targets modified by this commit. It should be noted that dm-thin, dm-crypt and dm-raid already had their target version bumped during the 5.6 merge window. Signed-off-by; Mike Snitzer <snitzer@redhat.com>
2020-03-03dm: fix congested_fn for request-based deviceHou Tao
We neither assign congested_fn for requested-based blk-mq device nor implement it correctly. So fix both. Also, remove incorrect comment from dm_init_normal_md_queue and rename it to dm_init_congested_fn. Fixes: 4aa9c692e052 ("bdi: separate out congested state into a separate struct") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-03fcntl: Distribute switch variables for initializationKees Cook
Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. fs/fcntl.c: In function ‘send_sigio_to_task’: fs/fcntl.c:738:20: warning: statement will never be executed [-Wswitch-unreachable] 738 | kernel_siginfo_t si; | ^~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2020-03-03erofs: handle corrupted images whose decompressed size less than it'd beGao Xiang
As Lasse pointed out, "Looking at fs/erofs/decompress.c, the return value from LZ4_decompress_safe_partial is only checked for negative value to catch errors. ... So if I understood it correctly, if there is bad data whose uncompressed size is much less than it should be, it can leave part of the output buffer untouched and expose the previous data as the file content. " Let's fix it now. Cc: Lasse Collin <lasse.collin@tukaani.org> Fixes: 7fc45dbc938a ("staging: erofs: introduce generic decompression backend") [ Gao Xiang: v5.3+, I will manually backport this to stable later. ] Link: https://lore.kernel.org/r/20200226081008.86348-3-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2020-03-03erofs: use LZ4_decompress_safe() for full decodingGao Xiang
As Lasse pointed out, "EROFS uses LZ4_decompress_safe_partial for both partial and full blocks. Thus when it is decoding a full block, it doesn't know if the LZ4 decoder actually decoded all the input. The real uncompressed size could be bigger than the value stored in the file system metadata. Using LZ4_decompress_safe instead of _safe_partial when decompressing a full block would help to detect errors." So it's reasonable to use _safe in case of potential corrupted images and it might have some speed gain as well although I didn't observe much difference. Note that legacy compressor (< 5.3, no LZ4_0PADDING) could encode extra data in a pcluster, which is excluded as well. Cc: Lasse Collin <lasse.collin@tukaani.org> Fixes: 0ffd71bcc3a0 ("staging: erofs: introduce LZ4 decompression inplace") [ Gao Xiang: v5.3+, I will manually backport this to stable later. ] Link: https://lore.kernel.org/r/20200226081008.86348-2-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2020-03-03erofs: correct the remaining shrink objectsGao Xiang
The remaining count should not include successful shrink attempts. Fixes: e7e9a307be9d ("staging: erofs: introduce workstation for decompression") Cc: <stable@vger.kernel.org> # 4.19+ Link: https://lore.kernel.org/r/20200226081008.86348-1-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2020-03-03mt76: fix array overflow on receiving too many fragments for a packetFelix Fietkau
If the hardware receives an oversized packet with too many rx fragments, skb_shinfo(skb)->frags can overflow and corrupt memory of adjacent pages. This becomes especially visible if it corrupts the freelist pointer of a slab page. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-03-03erofs: convert workstn to XArrayGao Xiang
XArray has friendly APIs and it will replace the old radix tree in the near future. This convert makes use of __xa_cmpxchg when inserting on a just inserted item by other thread. In detail, instead of totally looking up again as what we did for the old radix tree, it will try to legitimize the current in-tree item in the XArray therefore more effective. In addition, naming is rather a challenge for non-English speaker like me. The basic idea of workstn is to provide a runtime sparse array with items arranged in the physical block number order. Such items (was called workgroup) can be used to record compress clusters or for later new features. However, both workgroup and workstn seem not good names from whatever point of view, so I'd like to rename them as pslot and managed_pslots to stand for physical slots. This patch handles the second as a part of the radix tree convert. Cc: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/r/20200220024642.91529-1-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2020-03-03arm64: dts: socfpga: agilex: Fix gmac compatibleLey Foon Tan
Fix gmac compatible string to "altr,socfpga-stmmac-a10-s10". Gmac for Agilex should use same compatible as Stratix 10. Fixes: 4b36daf9ada3 ("arm64: dts: agilex: Add initial support for Intel's Agilex SoCFPGA") Cc: stable@vger.kernel.org Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2020-03-03dm integrity: use dm_bio_record and dm_bio_restoreMike Snitzer
In cases where dec_in_flight() has to requeue the integrity_bio_wait work to transfer the rest of the data, the bio's __bi_remaining might already have been decremented to 0, e.g.: if bio passed to underlying data device was split via blk_queue_split(). Use dm_bio_{record,restore} rather than effectively open-coding them in dm-integrity -- these methods now manage __bi_remaining too. Depends-on: f7f0b057a9c1 ("dm bio record: save/restore bi_end_io and bi_integrity") Reported-by: Daniel Glöckner <dg@emlix.com> Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-03dm bio record: save/restore bi_end_io and bi_integrityMike Snitzer
Also, save/restore __bi_remaining in case the bio was used in a BIO_CHAIN (e.g. due to blk_queue_split). Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-03btrfs: fix RAID direct I/O reads with alternate csumsOmar Sandoval
btrfs_lookup_and_bind_dio_csum() does pointer arithmetic which assumes 32-bit checksums. If using a larger checksum, this leads to spurious failures when a direct I/O read crosses a stripe. This is easy to reproduce: # mkfs.btrfs -f --checksum blake2 -d raid0 /dev/vdc /dev/vdd ... # mount /dev/vdc /mnt # cd /mnt # dd if=/dev/urandom of=foo bs=1M count=1 status=none # dd if=foo of=/dev/null bs=1M iflag=direct status=none dd: error reading 'foo': Input/output error # dmesg | tail -1 [ 135.821568] BTRFS warning (device vdc): csum failed root 5 ino 257 off 421888 ... Fix it by using the actual checksum size. Fixes: 1e25a2e3ca0d ("btrfs: don't assume ordered sums to be 4 bytes") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-03ASoC: SOF: Fix snd_sof_ipc_stream_posn()Dan Carpenter
We're passing "&posn" instead of "posn" so it ends up corrupting memory instead of doing something useful. Fixes: 53e0c72d98ba ("ASoC: SOF: Add support for IPC IO between DSP and Host") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20200303101858.ytehbrivocyp3cnf@kili.mountain Signed-off-by: Mark Brown <broonie@kernel.org>
2020-03-03ASoC: rt1015: modify pre-divider for sysclkJack Yu
Modify pre-divider for system clock. Signed-off-by: Jack Yu <jack.yu@realtek.com> Link: https://lore.kernel.org/r/20200303025913.24499-1-jack.yu@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-03-03interconnect: Handle memory allocation errorsGeorgi Djakov
When we allocate memory, kasprintf() can fail and we must check its return value. Fixes: 05309830e1f8 ("interconnect: Add a name to struct icc_path") Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Link: https://lore.kernel.org/r/20200226110420.5357-2-georgi.djakov@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-03altera-stapl: altera_get_note: prevent write beyond end of 'key'Daniel Axtens
altera_get_note is called from altera_init, where key is kzalloc(33). When the allocation functions are annotated to allow the compiler to see the sizes of objects, and with FORTIFY_SOURCE, we see: In file included from drivers/misc/altera-stapl/altera.c:14:0: In function ‘strlcpy’, inlined from ‘altera_init’ at drivers/misc/altera-stapl/altera.c:2189:5: include/linux/string.h:378:4: error: call to ‘__write_overflow’ declared with attribute error: detected write beyond size of object passed as 1st parameter __write_overflow(); ^~~~~~~~~~~~~~~~~~ That refers to this code in altera_get_note: if (key != NULL) strlcpy(key, &p[note_strings + get_unaligned_be32( &p[note_table + (8 * i)])], length); The error triggers because the length of 'key' is 33, but the copy uses length supplied as the 'length' parameter, which is always 256. Split the size parameter into key_len and val_len, and use the appropriate length depending on what is being copied. Detected by compiler error, only compile-tested. Cc: "Igor M. Liplianin" <liplianin@netup.ru> Signed-off-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20200120074344.504-2-dja@axtens.net Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/202002251042.D898E67AC@keescook Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-03binder: prevent UAF for binderfs devicesChristian Brauner
On binder_release(), binder_defer_work(proc, BINDER_DEFERRED_RELEASE) is called which punts the actual cleanup operation to a workqueue. At some point, binder_deferred_func() will be called which will end up calling binder_deferred_release() which will retrieve and cleanup the binder_context attach to this struct binder_proc. If we trace back where this binder_context is attached to binder_proc we see that it is set in binder_open() and is taken from the struct binder_device it is associated with. This obviously assumes that the struct binder_device that context is attached to is _never_ freed. While that might be true for devtmpfs binder devices it is most certainly wrong for binderfs binder devices. So, assume binder_open() is called on a binderfs binder devices. We now stash away the struct binder_context associated with that struct binder_devices: proc->context = &binder_dev->context; /* binderfs stashes devices in i_private */ if (is_binderfs_device(nodp)) { binder_dev = nodp->i_private; info = nodp->i_sb->s_fs_info; binder_binderfs_dir_entry_proc = info->proc_log_dir; } else { . . . proc->context = &binder_dev->context; Now let's assume that the binderfs instance for that binder devices is shutdown via umount() and/or the mount namespace associated with it goes away. As long as there is still an fd open for that binderfs binder device things are fine. But let's assume we now close the last fd for that binderfs binder device. Now binder_release() is called and punts to the workqueue. Assume that the workqueue has quite a bit of stuff to do and doesn't get to cleaning up the struct binder_proc and the associated struct binder_context with it for that binderfs binder device right away. In the meantime, the VFS is killing the super block and is ultimately calling sb->evict_inode() which means it will call binderfs_evict_inode() which does: static void binderfs_evict_inode(struct inode *inode) { struct binder_device *device = inode->i_private; struct binderfs_info *info = BINDERFS_I(inode); clear_inode(inode); if (!S_ISCHR(inode->i_mode) || !device) return; mutex_lock(&binderfs_minors_mutex); --info->device_count; ida_free(&binderfs_minors, device->miscdev.minor); mutex_unlock(&binderfs_minors_mutex); kfree(device->context.name); kfree(device); } thereby freeing the struct binder_device including struct binder_context. Now the workqueue finally has time to get around to cleaning up struct binder_proc and is now trying to access the associate struct binder_context. Since it's already freed it will OOPs. Fix this by holding an additional reference to the inode that is only released once the workqueue is done cleaning up struct binder_proc. This is an easy alternative to introducing separate refcounting on struct binder_device which we can always do later if it becomes necessary. This is an alternative fix to 51d8a7eca677 ("binder: prevent UAF read in print_binder_transaction_log_entry()"). Fixes: 3ad20fe393b3 ("binder: implement binderfs") Fixes: 03e2e07e3814 ("binder: Make transaction_log available in binderfs") Related : 51d8a7eca677 ("binder: prevent UAF read in print_binder_transaction_log_entry()") Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-03drm/i915/gvt: Fix unnecessary schedule timer when no vGPU exitsZhenyu Wang
From commit f25a49ab8ab9 ("drm/i915/gvt: Use vgpu_lock to protect per vgpu access") the vgpu idr destroy is moved later than vgpu resource destroy, then it would fail to stop timer for schedule policy clean which to check vgpu idr for any left vGPU. So this trys to destroy vgpu idr earlier. Cc: Colin Xu <colin.xu@intel.com> Fixes: f25a49ab8ab9 ("drm/i915/gvt: Use vgpu_lock to protect per vgpu access") Acked-by: Colin Xu <colin.xu@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20200229055445.31481-1-zhenyuw@linux.intel.com
2020-03-02io_uring: clean up io_closePavel Begunkov
Don't abuse labels for plain and straightworward code. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02Revert "bcache: ignore pending signals when creating gc and allocator thread"Jens Axboe
This reverts commit 0b96da639a4874311e9b5156405f69ef9fc3bef8. We can't just go flushing random signals, under the assumption that the OOM killer will just do something else. It's not safe from the OOM perspective, and it could also cause other signals to get randomly lost. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io_uring: Ensure mask is initialized in io_arm_poll_handlerNathan Chancellor
Clang warns: fs/io_uring.c:4178:6: warning: variable 'mask' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (def->pollin) ^~~~~~~~~~~ fs/io_uring.c:4182:2: note: uninitialized use occurs here mask |= POLLERR | POLLPRI; ^~~~ fs/io_uring.c:4178:2: note: remove the 'if' if its condition is always true if (def->pollin) ^~~~~~~~~~~~~~~~ fs/io_uring.c:4154:15: note: initialize the variable 'mask' to silence this warning __poll_t mask, ret; ^ = 0 1 warning generated. io_op_defs has many definitions where pollin is not set so mask indeed might be uninitialized. Initialize it to zero and change the next assignment to |=, in case further masks are added in the future to avoid missing changing the assignment then. Fixes: d7718a9d25a6 ("io_uring: use poll driven retry for files that support it") Link: https://github.com/ClangBuiltLinux/linux/issues/916 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02dt-bindings: arm: fsl: fix APF6Dev compatibleSébastien Szymanski
APF6 Dev compatible is armadeus,imx6dl-apf6dev and not armadeus,imx6dl-apf6dldev. Fixes: 3d735471d066 ("dt-bindings: arm: Document Armadeus SoM and Dev boards devicetree binding") Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com> Signed-off-by: Rob Herring <robh@kernel.org>
2020-03-02Merge branch 'mauro' into docs-nextJonathan Corbet
Mauro sez: There are lots of plain text documents under Documentation/filesystems. Manually convert several of those to ReST and add them to the index file. Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-03-02io_uring: remove io_prep_next_work()Pavel Begunkov
io-wq cares about IO_WQ_WORK_UNBOUND flag only while enqueueing, so it's useless setting it for a next req of a link. Thus, removed it from io_prep_linked_timeout(), and inline the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io_uring: remove extra nxt check after puntPavel Begunkov
After __io_queue_sqe() ended up in io_queue_async_work(), it's already known that there is no @nxt req, so skip the check and return from the function. Also, @nxt initialisation now can be done just before io_put_req_find_next(), as there is no jumping until it's checked. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io_uring: use poll driven retry for files that support itJens Axboe
Currently io_uring tries any request in a non-blocking manner, if it can, and then retries from a worker thread if we get -EAGAIN. Now that we have a new and fancy poll based retry backend, use that to retry requests if the file supports it. This means that, for example, an IORING_OP_RECVMSG on a socket no longer requires an async thread to complete the IO. If we get -EAGAIN reading from the socket in a non-blocking manner, we arm a poll handler for notification on when the socket becomes readable. When it does, the pending read is executed directly by the task again, through the io_uring task work handlers. Not only is this faster and more efficient, it also means we're not generating potentially tons of async threads that just sit and block, waiting for the IO to complete. The feature is marked with IORING_FEAT_FAST_POLL, meaning that async pollable IO is fast, and that poll<link>other_op is fast as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io_uring: mark requests that we can do poll async in io_op_defsJens Axboe
Add a pollin/pollout field to the request table, and have commands that we can safely poll for properly marked. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io_uring: add per-task callback handlerJens Axboe
For poll requests, it's not uncommon to link a read (or write) after the poll to execute immediately after the file is marked as ready. Since the poll completion is called inside the waitqueue wake up handler, we have to punt that linked request to async context. This slows down the processing, and actually means it's faster to not use a link for this use case. We also run into problems if the completion_lock is contended, as we're doing a different lock ordering than the issue side is. Hence we have to do trylock for completion, and if that fails, go async. Poll removal needs to go async as well, for the same reason. eventfd notification needs special case as well, to avoid stack blowing recursion or deadlocks. These are all deficiencies that were inherited from the aio poll implementation, but I think we can do better. When a poll completes, simply queue it up in the task poll list. When the task completes the list, we can run dependent links inline as well. This means we never have to go async, and we can remove a bunch of code associated with that, and optimizations to try and make that run faster. The diffstat speaks for itself. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io_uring: store io_kiocb in wait->privateJens Axboe
Store the io_kiocb in the private field instead of the poll entry, this is in preparation for allowing multiple waitqueues. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02task_work_run: don't take ->pi_lock unconditionallyOleg Nesterov
As Peter pointed out, task_work() can avoid ->pi_lock and cmpxchg() if task->task_works == NULL && !PF_EXITING. And in fact the only reason why task_work_run() needs ->pi_lock is the possible race with task_work_cancel(), we can optimize this code and make the locking more clear. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io-wq: use BIT for ulong hashPavel Begunkov
@hash_map is unsigned long, but BIT_ULL() is used for manipulations. BIT() is a better match as it returns exactly unsigned long value. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io_uring: remove IO_WQ_WORK_CBPavel Begunkov
IO_WQ_WORK_CB is used only for linked timeouts, which will be armed before the work setup (i.e. mm, override creds, etc). The setup shouldn't take long, so it's ok to arm it a bit later and get rid of IO_WQ_WORK_CB. Make io-wq call work->func() only once, callbacks will handle the rest. i.e. the linked timeout handler will do the actual issue. And as a bonus, it removes an extra indirect call. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02io-wq: remove unused IO_WQ_WORK_HAS_MMPavel Begunkov
IO_WQ_WORK_HAS_MM is set but never used, remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02docs: filesystems: convert zonefs.txt to ReSTMauro Carvalho Chehab
- Add a SPDX header; - Add a document title; - Some whitespace fixes and new line breaks; - Mark literal blocks as such; - Add it to filesystems/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/42a7cfcd19f6b904a9a3188fd4af71bed5050052.1581955849.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-03-02docs: filesystems: convert udf.txt to ReSTMauro Carvalho Chehab
- Add a SPDX header; - Add a document title; - Add table markups; - Add lists markups; - Add it to filesystems/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/2887f8a3a813a31170389eab687e9f199327dc7d.1581955849.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-03-02io_uring: extract kmsg copy helperPavel Begunkov
io_recvmsg() and io_sendmsg() duplicate nonblock -EAGAIN finilising part, so add helper for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02docs: filesystems: convert ubifs.txt to ReSTMauro Carvalho Chehab
- Add a SPDX header; - Add a document title; - Adjust section titles; - Some whitespace fixes and new line breaks; - Mark literal blocks as such; - Add table markups; - Add lists markups; - Add it to filesystems/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/9043dc2965cafc64e6a521e2317c00ecc8303bf6.1581955849.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>