summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-09-03ext4: fix kernel oops caused by spurious casefold flagTheodore Ts'o
If an directory has the a casefold flag set without the casefold feature set, s_encoding will not be initialized, and this will cause the kernel to dereference a NULL pointer. In addition to adding checks to avoid these kernel oops, attempts to load inodes with the casefold flag when the casefold feature is not enable will cause the file system to be declared corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-28ext4: fix integer overflow when calculating commit intervalzhangyi (F)
If user specify a large enough value of "commit=" option, it may trigger signed integer overflow which may lead to sbi->s_commit_interval becomes a large or small value, zero in particular. UBSAN: Undefined behaviour in ../fs/ext4/super.c:1592:31 signed integer overflow: 536870912 * 1000 cannot be represented in type 'int' [...] Call trace: [...] [<ffffff9008a2d120>] ubsan_epilogue+0x34/0x9c lib/ubsan.c:166 [<ffffff9008a2d8b8>] handle_overflow+0x228/0x280 lib/ubsan.c:197 [<ffffff9008a2d95c>] __ubsan_handle_mul_overflow+0x4c/0x68 lib/ubsan.c:218 [<ffffff90086d070c>] handle_mount_opt fs/ext4/super.c:1592 [inline] [<ffffff90086d070c>] parse_options+0x1724/0x1a40 fs/ext4/super.c:1773 [<ffffff90086d51c4>] ext4_remount+0x2ec/0x14a0 fs/ext4/super.c:4834 [...] Although it is not a big deal, still silence the UBSAN by limit the input value. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2019-08-28ext4: use percpu_counters for extent_status cache hits/missesYang Guo
@es_stats_cache_hits and @es_stats_cache_misses are accessed frequently in ext4_es_lookup_extent function, it would influence the ext4 read/write performance in NUMA system. Let's optimize it using percpu_counter, it is profitable for the performance. The test command is as below: fio -name=randwrite -numjobs=8 -filename=/mnt/test1 -rw=randwrite -ioengine=libaio -direct=1 -iodepth=64 -sync=0 -norandommap -group_reporting -runtime=120 -time_based -bs=4k -size=5G And the result is better 10% than the initial implement: without the patch,IOPS=197k, BW=770MiB/s (808MB/s)(90.3GiB/120002msec) with the patch, IOPS=218k, BW=852MiB/s (894MB/s)(99.9GiB/120002msec) Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Yang Guo <guoyang2@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
2019-08-28ext4: fix potential use after free after remounting with noblock_validityzhangyi (F)
Remount process will release system zone which was allocated before if "noblock_validity" is specified. If we mount an ext4 file system to two mountpoints with default mount options, and then remount one of them with "noblock_validity", it may trigger a use after free problem when someone accessing the other one. # mount /dev/sda foo # mount /dev/sda bar User access mountpoint "foo" | Remount mountpoint "bar" | ext4_map_blocks() | ext4_remount() check_block_validity() | ext4_setup_system_zone() ext4_data_block_valid() | ext4_release_system_zone() | free system_blks rb nodes access system_blks rb nodes | trigger use after free | This problem can also be reproduced by one mountpint, At the same time, add_system_zone() can get called during remount as well so there can be racing ext4_data_block_valid() reading the rbtree at the same time. This patch add RCU to protect system zone from releasing or building when doing a remount which inverse current "noblock_validity" mount option. It assign the rbtree after the whole tree was complete and do actual freeing after rcu grace period, avoid any intermediate state. Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2019-08-24jbd2: add missing tracepoint for reserved handleXiaoguang Wang
This issue was found when I use ebpf to trace every jbd2 handle's running info in dioread_nolock case. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-23ext4: fix punch hole for inline_data file systemsTheodore Ts'o
If a program attempts to punch a hole on an inline data file, we need to convert it to a normal file first. This was detected using ext4/032 using the adv configuration. Simple reproducer: mke2fs -Fq -t ext4 -O inline_data /dev/vdc mount /vdc echo "" > /vdc/testfile xfs_io -c 'truncate 33554432' /vdc/testfile xfs_io -c 'fpunch 0 1048576' /vdc/testfile umount /vdc e2fsck -fy /dev/vdc Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-22ext4: rework reserved cluster accounting when invalidating pagesEric Whitney
The goal of this patch is to remove two references to the buffer delay bit in ext4_da_page_release_reservation() as part of a larger effort to remove all such references from ext4. These two references are principally used to reduce the reserved block/cluster count when pages are invalidated as a result of truncating, punching holes, or collapsing a block range in a file. The entire function is removed and replaced with code in ext4_es_remove_extent() that reduces the reserved count as a side effect of removing a block range from delayed and not unwritten extents in the extent status tree as is done when truncating, punching holes, or collapsing ranges. The code is written to minimize the number of searches descending from rb tree roots for scalability. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-22ext4: treat buffers with write errors as containing valid dataZhangXiaoxu
I got some errors when I repair an ext4 volume which stacked by an iscsi target: Entry 'test60' in / (2) has deleted/unused inode 73750. Clear? It can be reproduced when the network not good enough. When I debug this I found ext4 will read entry buffer from disk and the buffer is marked with write_io_error. If the buffer is marked with write_io_error, it means it already wroten to journal, and not checked out to disk. IOW, the journal is newer than the data in disk. If this journal record 'delete test60', it means the 'test60' still on the disk metadata. In this case, if we read the buffer from disk successfully and create file continue, the new journal record will overwrite the journal which record 'delete test60', then the entry corruptioned. So, use the buffer rather than read from disk if the buffer is marked with write_io_error. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-22ext4: fix warning inside ext4_convert_unwritten_extents_endioRakesh Pandit
Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing first argument. This was introduced in commit ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio") and splitting extents inside endio would trigger it. Fixes: ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2019-08-12ext4: set error return correctly when ext4_htree_store_dirent failsColin Ian King
Currently when the call to ext4_htree_store_dirent fails the error return variable 'ret' is is not being set to the error code and variable count is instead, hence the error code is not being returned. Fix this by assigning ret to the error return code. Addresses-Coverity: ("Unused value") Fixes: 8af0f0822797 ("ext4: fix readdir error in the case of inline_data+dir_index") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-12ext4: drop legacy pre-1970 encoding workaroundTheodore Ts'o
Originally, support for expanded timestamps had a bug in that pre-1970 times were erroneously encoded as being in the the 24th century. This was fixed in commit a4dad1ae24f8 ("ext4: Fix handling of extended tv_sec") which landed in 4.4. Starting with 4.4, pre-1970 timestamps were correctly encoded, but for backwards compatibility those incorrectly encoded timestamps were mapped back to the pre-1970 dates. Given that backwards compatibility workaround has been around for 4 years, and given that running e2fsck from e2fsprogs 1.43.2 and later will offer to fix these timestamps (which has been released for 3 years), it's past time to drop the legacy workaround from the kernel. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-11ext4: add new ioctl EXT4_IOC_GET_ES_CACHETheodore Ts'o
For debugging reasons, it's useful to know the contents of the extent cache. Since the extent cache contains much of what is in the fiemap ioctl, use an fiemap-style interface to return this information. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-11ext4: add a new ioctl EXT4_IOC_GETSTATETheodore Ts'o
The new ioctl EXT4_IOC_GETSTATE returns some of the dynamic state of an ext4 inode for debugging purposes. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-11ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHETheodore Ts'o
The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent status cache to be cleared out. This is intended for use for debugging. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-11jbd2: flush_descriptor(): Do not decrease buffer head's ref countChandan Rajendra
When executing generic/388 on a ppc64le machine, we notice the following call trace, VFS: brelse: Trying to free free buffer WARNING: CPU: 0 PID: 6637 at /root/repos/linux/fs/buffer.c:1195 __brelse+0x84/0xc0 Call Trace: __brelse+0x80/0xc0 (unreliable) invalidate_bh_lru+0x78/0xc0 on_each_cpu_mask+0xa8/0x130 on_each_cpu_cond_mask+0x130/0x170 invalidate_bh_lrus+0x44/0x60 invalidate_bdev+0x38/0x70 ext4_put_super+0x294/0x560 generic_shutdown_super+0xb0/0x170 kill_block_super+0x38/0xb0 deactivate_locked_super+0xa4/0xf0 cleanup_mnt+0x164/0x1d0 task_work_run+0x110/0x160 do_notify_resume+0x414/0x460 ret_from_except_lite+0x70/0x74 The warning happens because flush_descriptor() drops bh reference it does not own. The bh reference acquired by jbd2_journal_get_descriptor_buffer() is owned by the log_bufs list and gets released when this list is processed. The reference for doing IO is only acquired in write_dirty_buffer() later in flush_descriptor(). Reported-by: Harish Sriram <harish@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-11ext4: remove unnecessary error checkShi Siyuan
Remove unnecessary error check in ext4_file_write_iter(), because this check will be done in upcoming later function -- ext4_write_checks() -> generic_write_checks() Change-Id: I7b0ab27f693a50765c15b5eaa3f4e7c38f42e01e Signed-off-by: shisiyuan <shisiyuan@xiaomi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-11ext4: fix warning when turn on dioread_nolock and inline_datayangerkun
mkfs.ext4 -O inline_data /dev/vdb mount -o dioread_nolock /dev/vdb /mnt echo "some inline data..." >> /mnt/test-file echo "some inline data..." >> /mnt/test-file sync The above script will trigger "WARN_ON(!io_end->handle && sbi->s_journal)" because ext4_should_dioread_nolock() returns false for a file with inline data. Move the check to a place after we have already removed the inline data and prepared inode to write normal pages. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-08-11Merge tag 'dax-fixes-5.3-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "A filesystem-dax and device-dax fix for v5.3. The filesystem-dax fix is tagged for stable as the implementation has been mistakenly throwing away all cow pages on any truncate or hole punch operation as part of the solution to coordinate device-dma vs truncate to dax pages. The device-dax change fixes up a regression this cycle from the introduction of a common 'internal per-cpu-ref' implementation. Summary: - Fix dax_layout_busy_page() to not discard private cow pages of fs/dax private mappings. - Update the memremap_pages core to properly cleanup on behalf of internal reference-count users like device-dax" * tag 'dax-fixes-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: mm/memremap: Fix reuse of pgmap instances with internal references dax: dax_layout_busy_page() should not unmap cow pages
2019-08-10Merge tag 'gfs2-v5.3-rc3.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix incorrect lseek / fiemap results" * tag 'gfs2-v5.3-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: gfs2_walk_metadata fix
2019-08-09Merge tag 'for-linus-20190809' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Revert of a bcache patch that caused an oops for some (Coly) - ata rb532 unused warning fix (Gustavo) - AoE kernel crash fix (He) - Error handling fixup for blkdev_get() (Jan) - libata read/write translation and SFF PIO fix (me) - Use after free and error handling fix for O_DIRECT fragments. There's still a nowait + sync oddity in there, we'll nail that start next week. If all else fails, I'll queue a revert of the NOWAIT change. (me) - Loop GFP_KERNEL -> GFP_NOIO deadlock fix (Mikulas) - Two BFQ regression fixes that caused crashes (Paolo) * tag 'for-linus-20190809' of git://git.kernel.dk/linux-block: bcache: Revert "bcache: use sysfs_match_string() instead of __sysfs_match_string()" loop: set PF_MEMALLOC_NOIO for the worker thread bdev: Fixup error handling in blkdev_get() block, bfq: handle NULL return value by bfq_init_rq() block, bfq: move update of waker and woken list to queue freeing block, bfq: reset last_completed_rq_bfqq if the pointed queue is freed block: aoe: Fix kernel crash due to atomic sleep when exiting libata: add SG safety checks in SFF pio transfers libata: have ata_scsi_rw_xlat() fail invalid passthrough requests block: fix O_DIRECT error handling for bio fragments ata: rb532_cf: Fix unused variable warning in rb532_pata_driver_probe
2019-08-09gfs2: gfs2_walk_metadata fixAndreas Gruenbacher
It turns out that the current version of gfs2_metadata_walker suffers from multiple problems that can cause gfs2_hole_size to report an incorrect size. This will confuse fiemap as well as lseek with the SEEK_DATA flag. Fix that by changing gfs2_hole_walker to compute the metapath to the first data block after the hole (if any), and compute the hole size based on that. Fixes xfstest generic/490. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Cc: stable@vger.kernel.org # v4.18+
2019-08-08Merge tag 'nfs-for-5.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: "Highlights include: Stable fixes: - NFSv4: Ensure we check the return value of update_open_stateid() so we correctly track active open state. - NFSv4: Fix for delegation state recovery to ensure we recover all open modes that are active. - NFSv4: Fix an Oops in nfs4_do_setattr Fixes: - NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts - NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim() - NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid - pNFS: Report errors from the call to nfs4_select_rw_stateid() - NFSv4: Various other delegation and open stateid recovery fixes - NFSv4: Fix state recovery behaviour when server connection times out" * tag 'nfs-for-5.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Ensure state recovery handles ETIMEDOUT correctly NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts NFSv4: Fix an Oops in nfs4_do_setattr NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim() NFSv4: Check the return value of update_open_stateid() NFSv4.1: Only reap expired delegations NFSv4.1: Fix open stateid recovery NFSv4: Report the error from nfs4_select_rw_stateid() NFSv4: When recovering state fails with EAGAIN, retry the same recovery NFSv4: Print an error in the syslog when state is marked as irrecoverable NFSv4: Fix delegation state recovery NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
2019-08-08Merge tag '5.3-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Six small SMB3 fixes, two for stable" * tag '5.3-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL smb3: update TODO list of missing features smb3: send CAP_DFS capability during session setup SMB3: Fix potential memory leak when processing compound chain SMB3: Fix deadlock in validate negotiate hits reconnect cifs: fix rmmod regression in cifs.ko caused by force_sig changes
2019-08-08bdev: Fixup error handling in blkdev_get()Jan Kara
Commit 89e524c04fa9 ("loop: Fix mount(2) failure due to race with LOOP_SET_FD") converted blkdev_get() to use the new helpers for finishing claiming of a block device. However the conversion botched the error handling in blkdev_get() and thus the bdev has been marked as held even in case __blkdev_get() returned error. This led to occasional warnings with block/001 test from blktests like: kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0 Correct the error handling. CC: stable@vger.kernel.org Fixes: 89e524c04fa9 ("loop: Fix mount(2) failure due to race with LOOP_SET_FD") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-07block: fix O_DIRECT error handling for bio fragmentsJens Axboe
0eb6ddfb865c tried to fix this up, but introduced a use-after-free of dio. Additionally, we still had an issue with error handling, as reported by Darrick: "I noticed a regression in xfs/747 (an unreleased xfstest for the xfs_scrub media scanning feature) on 5.3-rc3. I'll condense that down to a simpler reproducer: error-test: 0 209 linear 8:48 0 error-test: 209 1 error error-test: 210 6446894 linear 8:48 210 Basically we have a ~3G /dev/sdd and we set up device mapper to fail IO for sector 209 and to pass the io to the scsi device everywhere else. On 5.3-rc3, performing a directio pread of this range with a < 1M buffer (in other words, a request for fewer than MAX_BIO_PAGES bytes) yields EIO like you'd expect: pread64(3, 0x7f880e1c7000, 1048576, 0) = -1 EIO (Input/output error) pread: Input/output error +++ exited with 0 +++ But doing it with a larger buffer succeeds(!): pread64(3, "XFSB\0\0\20\0\0\0\0\0\0\fL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1146880, 0) = 1146880 read 1146880/1146880 bytes at offset 0 1 MiB, 1 ops; 0.0009 sec (1.124 GiB/sec and 1052.6316 ops/sec) +++ exited with 0 +++ (Note that the part of the buffer corresponding to the dm-error area is uninitialized) On 5.3-rc2, both commands would fail with EIO like you'd expect. The only change between rc2 and rc3 is commit 0eb6ddfb865c ("block: Fix __blkdev_direct_IO() for bio fragments"). AFAICT we end up in __blkdev_direct_IO with a 1120K buffer, which gets split into two bios: one for the first BIO_MAX_PAGES worth of data (1MB) and a second one for the 96k after that." Fix this by noting that it's always safe to dereference dio if we get BLK_QC_T_EAGAIN returned, as end_io hasn't been run for that case. So we can safely increment the dio size before calling submit_bio(), and then decrement it on failure (not that it really matters, as the bio and dio are going away). For error handling, return to the original method of just using 'ret' for tracking the error, and the size tracking in dio->size. Fixes: 0eb6ddfb865c ("block: Fix __blkdev_direct_IO() for bio fragments") Fixes: 6a43074e2f46 ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO") Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-07NFSv4: Ensure state recovery handles ETIMEDOUT correctlyTrond Myklebust
Ensure that the state recovery code handles ETIMEDOUT correctly, and also that we set RPC_TASK_TIMEOUT when recovering open state. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "Yeah I should have sent a pull request last week, so there is a lot more here than usual: 1) Fix memory leak in ebtables compat code, from Wenwen Wang. 2) Several kTLS bug fixes from Jakub Kicinski (circular close on disconnect etc.) 3) Force slave speed check on link state recovery in bonding 802.3ad mode, from Thomas Falcon. 4) Clear RX descriptor bits before assigning buffers to them in stmmac, from Jose Abreu. 5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF loops, from Nishka Dasgupta. 6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean. 7) Need to hold sock across skb->destructor invocation, from Cong Wang. 8) IP header length needs to be validated in ipip tunnel xmit, from Haishuang Yan. 9) Use after free in ip6 tunnel driver, also from Haishuang Yan. 10) Do not use MSI interrupts on r8169 chips before RTL8168d, from Heiner Kallweit. 11) Upon bridge device init failure, we need to delete the local fdb. From Nikolay Aleksandrov. 12) Handle erros from of_get_mac_address() properly in stmmac, from Martin Blumenstingl. 13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef Kadlecsik. 14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with some devices, so revert. From Johannes Berg. 15) Fix deadlock in rxrpc, from David Howells. 16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue Haibing. 17) Fix mvpp2 crash on module removal, from Matteo Croce. 18) Fix race in genphy_update_link, from Heiner Kallweit. 19) bpf_xdp_adjust_head() stopped working with generic XDP when we fixes generic XDP to support stacked devices properly, fix from Jesper Dangaard Brouer. 20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from David Ahern. 21) Several memory leaks in new sja1105 driver, from Vladimir Oltean" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits) net: dsa: sja1105: Fix memory leak on meta state machine error path net: dsa: sja1105: Fix memory leak on meta state machine normal path net: dsa: sja1105: Really fix panic on unregistering PTP clock net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well net: dsa: sja1105: Fix broken learning with vlan_filtering disabled net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus() net: sched: sample: allow accessing psample_group with rtnl net: sched: police: allow accessing police->params with rtnl net: hisilicon: Fix dma_map_single failed on arm64 net: hisilicon: fix hip04-xmit never return TX_BUSY net: hisilicon: make hip04_tx_reclaim non-reentrant tc-testing: updated vlan action tests with batch create/delete net sched: update vlan action for batched events operations net: stmmac: tc: Do not return a fragment entry net: stmmac: Fix issues when number of Queues >= 4 net: stmmac: xgmac: Fix XGMAC selftests be2net: disable bh with spin_lock in be_process_mcc net: cxgb3_main: Fix a resource leak in a error path in 'init_one()' net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER ...
2019-08-05SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUALSebastien Tisserant
Fix kernel oops when mounting a encryptData CIFS share with CONFIG_DEBUG_VIRTUAL Signed-off-by: Sebastien Tisserant <stisserant@wallix.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-05smb3: send CAP_DFS capability during session setupSteve French
We had a report of a server which did not do a DFS referral because the session setup Capabilities field was set to 0 (unlike negotiate protocol where we set CAP_DFS). Better to send it session setup in the capabilities as well (this also more closely matches Windows client behavior). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org>
2019-08-05SMB3: Fix potential memory leak when processing compound chainPavel Shilovsky
When a reconnect happens in the middle of processing a compound chain the code leaks a buffer from the memory pool. Fix this by properly checking for a return code and freeing buffers in case of error. Also maintain a buf variable to be equal to either smallbuf or bigbuf depending on a response buffer size while parsing a chain and when returning to the caller. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-05SMB3: Fix deadlock in validate negotiate hits reconnectPavel Shilovsky
Currently we skip SMB2_TREE_CONNECT command when checking during reconnect because Tree Connect happens when establishing an SMB session. For SMB 3.0 protocol version the code also calls validate negotiate which results in SMB2_IOCL command being sent over the wire. This may deadlock on trying to acquire a mutex when checking for reconnect. Fix this by skipping SMB2_IOCL command when doing the reconnect check. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2019-08-05dax: dax_layout_busy_page() should not unmap cow pagesVivek Goyal
Vivek: "As of now dax_layout_busy_page() calls unmap_mapping_range() with last argument as 1, which says even unmap cow pages. I am wondering who needs to get rid of cow pages as well. I noticed one interesting side affect of this. I mount xfs with -o dax and mmaped a file with MAP_PRIVATE and wrote some data to a page which created cow page. Then I called fallocate() on that file to zero a page of file. fallocate() called dax_layout_busy_page() which unmapped cow pages as well and then I tried to read back the data I wrote and what I get is old data from persistent memory. I lost the data I had written. This read basically resulted in new fault and read back the data from persistent memory. This sounds wrong. Are there any users which need to unmap cow pages as well? If not, I am proposing changing it to not unmap cow pages. I noticed this while while writing virtio_fs code where when I tried to reclaim a memory range and that corrupted the executable and I was running from virtio-fs and program got segment violation." Dan: "In fact the unmap_mapping_range() in this path is only to synchronize against get_user_pages_fast() and force it to call back into the filesystem to re-establish the mapping. COW pages should be left untouched by dax_layout_busy_page()." Cc: <stable@vger.kernel.org> Fixes: 5fac7408d828 ("mm, fs, dax: handle layout changes to pinned dax mappings") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20190802192956.GA3032@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-08-04cifs: fix rmmod regression in cifs.ko caused by force_sig changesSteve French
Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") The global change from force_sig caused module unloading of cifs.ko to fail (since the cifsd process could not be killed, "rmmod cifs" now would always fail) Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com>
2019-08-04NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mountsTrond Myklebust
People are reporing seeing fscache errors being reported concerning duplicate cookies even in cases where they are not setting up fscache at all. The rule needs to be that if fscache is not enabled, then it should have no side effects at all. To ensure this is the case, we disable fscache completely on all superblocks for which the 'fsc' mount option was not set. In order to avoid issues with '-oremount', we also disable the ability to turn fscache on via remount. Fixes: f1fe29b4a02d ("NFS: Use i_writecount to control whether...") Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145 Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Steve Dickson <steved@redhat.com> Cc: David Howells <dhowells@redhat.com>
2019-08-04NFSv4: Fix an Oops in nfs4_do_setattrTrond Myklebust
If the user specifies an open mode of 3, then we don't have a NFSv4 state attached to the context, and so we Oops when we try to dereference it. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 29b59f9416937 ("NFSv4: change nfs4_do_setattr to take...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.10: 991eedb1371dc: NFSv4: Only pass the... Cc: stable@vger.kernel.org # v4.10+
2019-08-04NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()Trond Myklebust
John Hubbard reports seeing the following stack trace: nfs4_do_reclaim rcu_read_lock /* we are now in_atomic() and must not sleep */ nfs4_purge_state_owners nfs4_free_state_owner nfs4_destroy_seqid_counter rpc_destroy_wait_queue cancel_delayed_work_sync __cancel_work_timer __flush_work start_flush_work might_sleep: (kernel/workqueue.c:2975: BUG) The solution is to separate out the freeing of the state owners from nfs4_purge_state_owners(), and perform that outside the atomic context. Reported-by: John Hubbard <jhubbard@nvidia.com> Fixes: 0aaaf5c424c7f ("NFS: Cache state owners after files are closed") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Check the return value of update_open_stateid()Trond Myklebust
Ensure that we always check the return value of update_open_stateid() so that we can retry if the update of local state failed. This fixes infinite looping on state recovery. Fixes: e23008ec81ef3 ("NFSv4 reduce attribute requests for open reclaim") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v3.7+
2019-08-04NFSv4.1: Only reap expired delegationsTrond Myklebust
Fix nfs_reap_expired_delegations() to ensure that we only reap delegations that are actually expired, rather than triggering on random errors. Fixes: 45870d6909d5a ("NFSv4.1: Test delegation stateids when server...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4.1: Fix open stateid recoveryTrond Myklebust
The logic for checking in nfs41_check_open_stateid() whether the state is supported by a delegation is inverted. In addition, it makes more sense to perform that check before we check for expired locks. Fixes: 8a64c4ef106d1 ("NFSv4.1: Even if the stateid is OK,...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Report the error from nfs4_select_rw_stateid()Trond Myklebust
In pnfs_update_layout() ensure that we do report any fatal errors from nfs4_select_rw_stateid(). Fixes: d9aba2b40de6 ("NFSv4: Don't use the zero stateid with layoutget") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: When recovering state fails with EAGAIN, retry the same recoveryTrond Myklebust
If the server returns with EAGAIN when we're trying to recover from a server reboot, we currently delay for 1 second, but then mark the stateid as needing recovery after the grace period has expired. Instead, we should just retry the same recovery process immediately after the 1 second delay. Break out of the loop after 10 retries. Fixes: 35a61606a612 ("NFS: Reduce indentation of the switch statement...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Print an error in the syslog when state is marked as irrecoverableTrond Myklebust
When error recovery fails due to a fatal error on the server, ensure we log it in the syslog. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Fix delegation state recoveryTrond Myklebust
Once we clear the NFS_DELEGATED_STATE flag, we're telling nfs_delegation_claim_opens() that we're done recovering all open state for that stateid, so we really need to ensure that we test for all open modes that are currently cached and recover them before exiting nfs4_open_delegation_recall(). Fixes: 24311f884189d ("NFSv4: Recovery of recalled read delegations...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.3+
2019-08-04NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateidTrond Myklebust
It is unsafe to dereference delegation outside the rcu lock, and in any case, the refcount is guaranteed held if cred is non-zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-03Merge tag 'xfs-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: - Avoid leaking kernel stack contents to userspace - Fix a potential null pointer dereference in the dabtree scrub code * tag 'xfs-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Fix possible null-pointer dereferences in xchk_da_btree_block_check_sibling() xfs: fix stack contents leakage in the v1 inumber ioctls
2019-08-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/acpi/scan.c: document why we don't need the device_hotplug_lock memremap: move from kernel/ to mm/ lib/test_meminit.c: use GFP_ATOMIC in RCU critical section asm-generic: fix -Wtype-limits compiler warnings cgroup: kselftest: relax fs_spec checks mm/memory_hotplug.c: remove unneeded return for void function mm/migrate.c: initialize pud_entry in migrate_vma() coredump: split pipe command whitespace before expanding template page flags: prioritize kasan bits over last-cpuid ubsan: build ubsan.c more conservatively kasan: remove clang version check for KASAN_STACK mm: compaction: avoid 100% CPU usage during compaction when a task is killed mm: migrate: fix reference check race between __find_get_block() and migration mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker ocfs2: remove set but not used variable 'last_hash' Revert "kmemleak: allow to coexist with fault injection" kernel/signal.c: fix a kernel-doc markup
2019-08-03coredump: split pipe command whitespace before expanding templatePaul Wise
Save the offsets of the start of each argument to avoid having to update pointers to each argument after every corename krealloc and to avoid having to duplicate the memory for the dump command. Executable names containing spaces were previously being expanded from %e or %E and then split in the middle of the filename. This is incorrect behaviour since an argument list can represent arguments with spaces. The splitting could lead to extra arguments being passed to the core dump handler that it might have interpreted as options or ignored completely. Core dump handlers that are not aware of this Linux kernel issue will be using %e or %E without considering that it may be split and so they will be vulnerable to processes with spaces in their names breaking their argument list. If their internals are otherwise well written, such as if they are written in shell but quote arguments, they will work better after this change than before. If they are not well written, then there is a slight chance of breakage depending on the details of the code but they will already be fairly broken by the split filenames. Core dump handlers that are aware of this Linux kernel issue will be placing %e or %E as the last item in their core_pattern and then aggregating all of the remaining arguments into one, separated by spaces. Alternatively they will be obtaining the filename via other methods. Both of these will be compatible with the new arrangement. A side effect from this change is that unknown template types (for example %z) result in an empty argument to the dump handler instead of the argument being dropped. This is a desired change as: It is easier for dump handlers to process empty arguments than dropped ones, especially if they are written in shell or don't pass each template item with a preceding command-line option in order to differentiate between individual template types. Most core_patterns in the wild do not use options so they can confuse different template types (especially numeric ones) if an earlier one gets dropped in old kernels. If the kernel introduces a new template type and a core_pattern uses it, the core dump handler might not expect that the argument can be dropped in old kernels. For example, this can result in security issues when %d is dropped in old kernels. This happened with the corekeeper package in Debian and resulted in the interface between corekeeper and Linux having to be rewritten to use command-line options to differentiate between template types. The core_pattern for most core dump handlers is written by the handler author who would generally not insert unknown template types so this change should be compatible with all the core dump handlers that exist. Link: http://lkml.kernel.org/r/20190528051142.24939-1-pabs3@bonedaddy.net Fixes: 74aadce98605 ("core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe") Signed-off-by: Paul Wise <pabs3@bonedaddy.net> Reported-by: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398] Reported-by: Paul Wise <pabs3@bonedaddy.net> [https://lore.kernel.org/linux-fsdevel/c8b7ecb8508895bf4adb62a748e2ea2c71854597.camel@bonedaddy.net/] Suggested-by: Jakub Wilk <jwilk@jwilk.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-03ocfs2: remove set but not used variable 'last_hash'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: fs/ocfs2/xattr.c: In function ocfs2_xattr_bucket_find: fs/ocfs2/xattr.c:3828:6: warning: variable last_hash set but not used [-Wunused-but-set-variable] It's never used and can be removed. Link: http://lkml.kernel.org/r/20190716132110.34836-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-02Merge tag 'for-linus-20190802' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Here's a small collection of fixes that should go into this series. This contains: - io_uring potential use-after-free fix (Jackie) - loop regression fix (Jan) - O_DIRECT fragmented bio regression fix (Damien) - Mark Denis as the new floppy maintainer (Denis) - ataflop switch fall-through annotation (Gustavo) - libata zpodd overflow fix (Kees) - libata ahci deferred probe fix (Miquel) - nbd invalidation BUG_ON() fix (Munehisa) - dasd endless loop fix (Stefan)" * tag 'for-linus-20190802' of git://git.kernel.dk/linux-block: s390/dasd: fix endless loop after read unit address configuration block: Fix __blkdev_direct_IO() for bio fragments MAINTAINERS: floppy: take over maintainership nbd: replace kill_bdev() with __invalidate_device() again ata: libahci: do not complain in case of deferred probe io_uring: fix KASAN use after free in io_sq_wq_submit_work loop: Fix mount(2) failure due to race with LOOP_SET_FD libata: zpodd: Fix small read overflow in zpodd_get_mech_type() ataflop: Mark expected switch fall-through
2019-08-02Merge tag 'for-5.3-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - tiny race window during 2 transactions aborting at the same time can accidentally lead to a commit - regression fix, possible deadlock during fiemap - fix for an old bug when incremental send can fail on a file that has been deduplicated in a special way * tag 'for-5.3-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix deadlock between fiemap and transaction commits Btrfs: fix race leading to fs corruption after transaction abort Btrfs: fix incremental send failure after deduplication