summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-30tipc: fix unitilized skb list crashJon Maloy
Our test suite somtimes provokes the following crash: Description of problem: [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8 [ 1092.605072] PGD 0 P4D 0 [ 1092.607620] Oops: 0000 [#1] SMP PTI [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1 [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018 [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc] [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7 [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282 [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045 [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000 [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000 [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a [ 1092.692614] FS: 0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000 [ 1092.700702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0 [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1092.727843] PKRU: 55555554 [ 1092.730556] Call Trace: [ 1092.733010] <IRQ> [ 1092.735034] tipc_sk_filter_rcv+0x7ca/0xb80 [tipc] [ 1092.739828] ? __kmalloc_node_track_caller+0x1cb/0x290 [ 1092.744974] ? dev_hard_start_xmit+0xa5/0x210 [ 1092.749332] tipc_sk_rcv+0x389/0x640 [tipc] [ 1092.753519] tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc] [ 1092.758224] tipc_rcv+0x57a/0xf20 [tipc] [ 1092.762154] ? ktime_get_real_ts64+0x40/0xe0 [ 1092.766432] ? tpacket_rcv+0x50/0x9f0 [ 1092.770098] tipc_l2_rcv_msg+0x4a/0x70 [tipc] [ 1092.774452] __netif_receive_skb_core+0xb62/0xbd0 [ 1092.779164] ? enqueue_entity+0xf6/0x630 [ 1092.783084] ? kmem_cache_alloc+0x158/0x1c0 [ 1092.787272] ? __build_skb+0x25/0xd0 [ 1092.790849] netif_receive_skb_internal+0x42/0xf0 [ 1092.795557] napi_gro_receive+0xba/0xe0 [ 1092.799417] mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core] [ 1092.804564] mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core] [ 1092.809536] mlx5e_napi_poll+0xb2/0xce0 [mlx5_core] [ 1092.814415] ? __wake_up_common_lock+0x89/0xc0 [ 1092.818861] net_rx_action+0x149/0x3b0 [ 1092.822616] __do_softirq+0xe3/0x30a [ 1092.826193] irq_exit+0x100/0x110 [ 1092.829512] do_IRQ+0x85/0xd0 [ 1092.832483] common_interrupt+0xf/0xf [ 1092.836147] </IRQ> [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0 [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000 [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940 [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78 [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000 [ 1092.905196] ? cpuidle_enter_state+0x92/0x2a0 [ 1092.909555] do_idle+0x236/0x280 [ 1092.912785] cpu_startup_entry+0x6f/0x80 [ 1092.916715] start_secondary+0x1a7/0x200 [ 1092.920642] secondary_startup_64+0xb7/0xc0 [...] The reason is that the skb list tipc_socket::mc_method.deferredq only is initialized for connectionless sockets, while nothing stops arriving multicast messages from being filtered by connection oriented sockets, with subsequent access to the said list. We fix this by initializing the list unconditionally at socket creation. This eliminates the crash, while the message still is dropped further down in tipc_sk_filter_rcv() as it should be. Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30Merge tag 'for-linus-20190730' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd fixes from Christian Brauner: "This makes setting the exit_state in exit_notify() consistent after fixing the pidfd polling race pre-rc1. Related to the race fix, this adds a WARN_ON() to do_notify_pidfd() to catch any future exit_state races. Last, this removes an obsolete comment from the pidfd tests" * tag 'for-linus-20190730' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: exit: make setting exit_state consistent pidfd: Add warning if exit_state is 0 during notification pidfd: remove obsolete comments from test
2019-07-30Merge tag 'f2fs-for-5.4-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "This set of patches adjust to follow recent setflags changes and fix two regressions" * tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: use EINVAL for superblock with invalid magic f2fs: fix to read source block before invalidating it f2fs: remove redundant check from f2fs_setflags_common() f2fs: use generic checking function for FS_IOC_FSSETXATTR f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
2019-07-30Merge tag 'linux-kselftest-5.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Minor fixes to tests and one major fix to livepatch test to add skip handling to avoid false fail reports when livepatch is disabled" * tag 'linux-kselftest-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/livepatch: add test skip handling selftests: mlxsw: Fix typo in qos_mc_aware.sh selftests/x86: fix spelling mistake "FAILT" -> "FAIL" selftests: kmod: Fix typo in kmod.sh
2019-07-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A few regression and bug fixes for the patches merged in the last cycle: - hns fixes a subtle crash from the ib core SGL rework - hfi1 fixes various error handling, oops and protocol errors - bnxt_re fixes a regression where nvmeof doesn't work on some configurations - mlx5 fixes a serious 'use after free' bug in how MR caching is handled - some edge case crashers in the new statistic core code - more siw static checker fixups" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification IB/counters: Always initialize the port counter object IB/core: Fix querying total rdma stats IB/mlx5: Prevent concurrent MR updates during invalidation IB/mlx5: Fix clean_mr() to work in the expected order IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache IB/mlx5: Use direct mkey destroy command upon UMR unreg failure IB/mlx5: Fix unreg_umr to ignore the mkey state RDMA/siw: Remove set but not used variables 'rv' IB/mlx5: Replace kfree with kvfree RDMA/bnxt_re: Honor vlan_id in GID entry comparison IB/hfi1: Drop all TID RDMA READ RESP packets after r_next_psn IB/hfi1: Field not zero-ed when allocating TID flow memory IB/hfi1: Unreserve a flushed OPFN request IB/hfi1: Check for error on call to alloc_rsm_map_table RDMA/hns: Fix sg offset non-zero issue RDMA/siw: Fix error return code in siw_init_module()
2019-07-30Merge tag 'for-linus-hmm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull HMM fixes from Jason Gunthorpe: "Fix the locking around nouveau's use of the hmm_range_* APIs. It works correctly in the success case, but many of the the edge cases have missing unlocks or double unlocks. The diffstat is a bit big as Christoph did a comprehensive job to move the obsolete API from the core header and into the driver before fixing its flow, but the risk of regression from this code motion is low" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: nouveau: unlock mmap_sem on all errors from nouveau_range_fault nouveau: remove the block parameter to nouveau_range_fault mm/hmm: move hmm_vma_range_done and hmm_vma_fault to nouveau mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
2019-07-30loop: Fix mount(2) failure due to race with LOOP_SET_FDJan Kara
Commit 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") made LOOP_SET_FD ioctl acquire exclusive block device reference while it updates loop device binding. However this can make perfectly valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding temporarily the exclusive bdev reference in cases like this: for i in {a..z}{a..z}; do dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024 mkfs.ext2 $i.image mkdir mnt$i done echo "Run" for i in {a..z}{a..z}; do mount -o loop -t ext2 $i.image mnt$i & done Fix the problem by not getting full exclusive bdev reference in LOOP_SET_FD but instead just mark the bdev as being claimed while we update the binding information. This just blocks new exclusive openers instead of failing them with EBUSY thus fixing the problem. Fixes: 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") Cc: stable@vger.kernel.org Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-30dt-bindings: Fix generated example files getting added to schemasRob Herring
Commit 837158b847a4 ("dt-bindings: Check the examples against the schemas") started generating YAML encoded DT files to validate the examples against the schema. When running 'make dt_binding_check' in tree after the 1st time, the generated example .dt.yaml files are mistakenly added to the list of schema files. Exclude *.example.dt.yaml files from the search for schema files. Fixes: 837158b847a4 ("dt-bindings: Check the examples against the schemas") Reported-by: Guido Günther <agx@sigxcpu.org> Tested-by: Guido Günther <agx@sigxcpu.org> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-30xfs: Fix possible null-pointer dereferences in ↵Jia-Ju Bai
xchk_da_btree_block_check_sibling() In xchk_da_btree_block_check_sibling(), there is an if statement on line 274 to check whether ds->state->altpath.blk[level].bp is NULL: if (ds->state->altpath.blk[level].bp) When ds->state->altpath.blk[level].bp is NULL, it is used on line 281: xfs_trans_brelse(..., ds->state->altpath.blk[level].bp); struct xfs_buf_log_item *bip = bp->b_log_item; ASSERT(bp->b_transp == tp); Thus, possible null-pointer dereferences may occur. To fix these bugs, ds->state->altpath.blk[level].bp is checked before being used. These bugs are found by a static analysis tool STCheck written by us. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-30exit: make setting exit_state consistentChristian Brauner
Since commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state") we unconditionally set exit_state to EXIT_ZOMBIE before calling into do_notify_parent(). This was done to eliminate a race when querying exit_state in do_notify_pidfd(). Back then we decided to do the absolute minimal thing to fix this and not touch the rest of the exit_notify() function where exit_state is set. Since this fix has not caused any issues change the setting of exit_state to EXIT_DEAD in the autoreap case to account for the fact hat exit_state is set to EXIT_ZOMBIE unconditionally. This fix was planned but also explicitly requested in [1] and makes the whole code more consistent. /* References */ [1]: https://lore.kernel.org/lkml/CAHk-=wigcxGFR2szue4wavJtH5cYTTeNES=toUBVGsmX0rzX+g@mail.gmail.com Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-30intel_th: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in header files related to Drivers for Intel(R) Trace Hub controller. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46 Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-30Merge tag 'rxrpc-fixes-20190730' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== Here are a couple of fixes for rxrpc: (1) Fix a potential deadlock in the peer keepalive dispatcher. (2) Fix a missing notification when a UDP sendmsg error occurs in rxrpc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30enetc: Fix build error without PHYLIBYueHaibing
If PHYLIB is not set, build enetc will fails: drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_open': enetc.c: undefined reference to `phy_disconnect' enetc.c: undefined reference to `phy_start' drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_close': enetc.c: undefined reference to `phy_stop' enetc.c: undefined reference to `phy_disconnect' drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to `phy_ethtool_get_link_ksettings' drivers/net/ethernet/freescale/enetc/enetc_ethtool.o: undefined reference to `phy_ethtool_set_link_ksettings' drivers/net/ethernet/freescale/enetc/enetc_mdio.o: In function `enetc_mdio_probe': enetc_mdio.c: undefined reference to `mdiobus_alloc_size' enetc_mdio.c: undefined reference to `mdiobus_free' Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30net: stmmac: Sync RX Buffer upon allocationJose Abreu
With recent changes that introduced support for Page Pool in stmmac, Jon reported that NFS boot was no longer working on an ARM64 based platform that had the IP behind an IOMMU. As Page Pool API does not guarantee DMA syncing because of the use of DMA_ATTR_SKIP_CPU_SYNC flag, we have to explicit sync the whole buffer upon re-allocation because we are always re-using same pages. In fact, ARM64 code invalidates the DMA area upon two situations [1]: - sync_single_for_cpu(): Invalidates if direction != DMA_TO_DEVICE - sync_single_for_device(): Invalidates if direction == DMA_FROM_DEVICE So, as we must invalidate both the current RX buffer and the newly allocated buffer we propose this fix. [1] arch/arm64/mm/cache.S Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Fixes: 2af6106ae949 ("net: stmmac: Introducing support for Page Pool") Signed-off-by: Jose Abreu <joabreu@synopsys.com> Tested-by: Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30mlxsw: spectrum_ptp: fix duplicated check on orig_egr_typesColin Ian King
Currently are duplicated checks on orig_egr_types which are redundant, I believe this is a typo and should actually be orig_ing_types || orig_egr_types instead of the expression orig_egr_types || orig_egr_types. Fix these. Addresses-Coverity: ("Same on both sides") Fixes: c6b36bdd04b5 ("mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30net: dsa: mv88e6xxx: use link-down-define instead of plain valueHubert Feurstein
Using the define here makes the code more expressive. Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30net: phy: fixed_phy: print gpio error only if gpio node is presentHubert Feurstein
It is perfectly ok to not have an gpio attached to the fixed-link node. So the driver should not throw an error message when the gpio is missing. Fixes: 5468e82f7034 ("net: phy: fixed-phy: Drop GPIO from fixed_phy_add()") Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30scsi: qla2xxx: Fix possible fcport null-pointer dereferencesJia-Ju Bai
In qla2x00_alloc_fcport(), fcport is assigned to NULL in the error handling code on line 4880: fcport = NULL; Then fcport is used on lines 4883-4886: INIT_WORK(&fcport->del_work, qla24xx_delete_sess_fn); INIT_WORK(&fcport->reg_work, qla_register_fcport_fn); INIT_LIST_HEAD(&fcport->gnl_entry); INIT_LIST_HEAD(&fcport->list); Thus, possible null-pointer dereferences may occur. To fix these bugs, qla2x00_alloc_fcport() directly returns NULL in the error handling code. These bugs are found by a static analysis tool STCheck written by us. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-30scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBASuganath Prabu
Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as per hardware design, if DMA-able range contains all 64-bits set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault. E.g. SGE's start address is 0xFFFFFFFF-FFFF000 and data length is 0x1000 bytes. when HBA tries to DMA the data at 0xFFFFFFFF-FFFFFFFF location then HBA will fault the firmware. Driver will set 63-bit DMA mask to ensure the above address will not be used. Cc: <stable@vger.kernel.org> # 5.1.20+ Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-30driver core: Fix use-after-free and double free on glue directoryMuchun Song
There is a race condition between removing glue directory and adding a new device under the glue dir. It can be reproduced in following test: CPU1: CPU2: device_add() get_device_parent() class_dir_create_and_add() kobject_add_internal() create_dir() // create glue_dir device_add() get_device_parent() kobject_get() // get glue_dir device_del() cleanup_glue_dir() kobject_del(glue_dir) kobject_add() kobject_add_internal() create_dir() // in glue_dir sysfs_create_dir_ns() kernfs_create_dir_ns(sd) sysfs_remove_dir() // glue_dir->sd=NULL sysfs_put() // free glue_dir->sd // sd is freed kernfs_new_node(sd) kernfs_get(glue_dir) kernfs_add_one() kernfs_put() Before CPU1 remove last child device under glue dir, if CPU2 add a new device under glue dir, the glue_dir kobject reference count will be increase to 2 via kobject_get() in get_device_parent(). And CPU2 has been called kernfs_create_dir_ns(), but not call kernfs_new_node(). Meanwhile, CPU1 call sysfs_remove_dir() and sysfs_put(). This result in glue_dir->sd is freed and it's reference count will be 0. Then CPU2 call kernfs_get(glue_dir) will trigger a warning in kernfs_get() and increase it's reference count to 1. Because glue_dir->sd is freed by CPU1, the next call kernfs_add_one() by CPU2 will fail(This is also use-after-free) and call kernfs_put() to decrease reference count. Because the reference count is decremented to 0, it will also call kmem_cache_free() to free the glue_dir->sd again. This will result in double free. In order to avoid this happening, we also should make sure that kernfs_node for glue_dir is released in CPU1 only when refcount for glue_dir kobj is 1 to fix this race. The following calltrace is captured in kernel 4.14 with the following patch applied: commit 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier") -------------------------------------------------------------------------- [ 3.633703] WARNING: CPU: 4 PID: 513 at .../fs/kernfs/dir.c:494 Here is WARN_ON(!atomic_read(&kn->count) in kernfs_get(). .... [ 3.633986] Call trace: [ 3.633991] kernfs_create_dir_ns+0xa8/0xb0 [ 3.633994] sysfs_create_dir_ns+0x54/0xe8 [ 3.634001] kobject_add_internal+0x22c/0x3f0 [ 3.634005] kobject_add+0xe4/0x118 [ 3.634011] device_add+0x200/0x870 [ 3.634017] _request_firmware+0x958/0xc38 [ 3.634020] request_firmware_into_buf+0x4c/0x70 .... [ 3.634064] kernel BUG at .../mm/slub.c:294! Here is BUG_ON(object == fp) in set_freepointer(). .... [ 3.634346] Call trace: [ 3.634351] kmem_cache_free+0x504/0x6b8 [ 3.634355] kernfs_put+0x14c/0x1d8 [ 3.634359] kernfs_create_dir_ns+0x88/0xb0 [ 3.634362] sysfs_create_dir_ns+0x54/0xe8 [ 3.634366] kobject_add_internal+0x22c/0x3f0 [ 3.634370] kobject_add+0xe4/0x118 [ 3.634374] device_add+0x200/0x870 [ 3.634378] _request_firmware+0x958/0xc38 [ 3.634381] request_firmware_into_buf+0x4c/0x70 -------------------------------------------------------------------------- Fixes: 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier") Signed-off-by: Muchun Song <smuchun@gmail.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Prateek Sood <prsood@codeaurora.org> Link: https://lore.kernel.org/r/20190727032122.24639-1-smuchun@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-30scsi: hpsa: remove printing internal cdb on tag collisionDon Brace
Remove racy printing of internal commands. Completion thread can be cleaning up the command in parallel. Reviewed-by: Bader Ali - Saleh <bader.alisaleh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-30MIPS: OProfile: Mark expected switch fall-throughsGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warning (Building: mips): arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_cpu_stop’: arch/mips/oprofile/op_model_mipsxx.c:217:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfctrl3(0); ^~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:218:2: note: here case 3: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:219:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfctrl2(0); ^~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:220:2: note: here case 2: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:221:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfctrl1(0); ^~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:222:2: note: here case 1: ^~~~ arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_cpu_start’: arch/mips/oprofile/op_model_mipsxx.c:197:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfctrl3(WHAT | reg.control[3]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:198:2: note: here case 3: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:199:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfctrl2(WHAT | reg.control[2]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:200:2: note: here case 2: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:201:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfctrl1(WHAT | reg.control[1]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:202:2: note: here case 1: ^~~~ arch/mips/oprofile/op_model_mipsxx.c: In function ‘reset_counters’: arch/mips/oprofile/op_model_mipsxx.c:299:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfcntr3(0); ^~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:300:2: note: here case 3: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:302:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfcntr2(0); ^~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:303:2: note: here case 2: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:305:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfcntr1(0); ^~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:306:2: note: here case 1: ^~~~ arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_perfcount_handler’: arch/mips/oprofile/op_model_mipsxx.c:242:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if ((control & MIPS_PERFCTRL_IE) && \ ^ arch/mips/oprofile/op_model_mipsxx.c:248:2: note: in expansion of macro ‘HANDLE_COUNTER’ HANDLE_COUNTER(3) ^~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:239:2: note: here case n + 1: \ ^ arch/mips/oprofile/op_model_mipsxx.c:249:2: note: in expansion of macro ‘HANDLE_COUNTER’ HANDLE_COUNTER(2) ^~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:242:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if ((control & MIPS_PERFCTRL_IE) && \ ^ arch/mips/oprofile/op_model_mipsxx.c:249:2: note: in expansion of macro ‘HANDLE_COUNTER’ HANDLE_COUNTER(2) ^~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:239:2: note: here case n + 1: \ ^ arch/mips/oprofile/op_model_mipsxx.c:250:2: note: in expansion of macro ‘HANDLE_COUNTER’ HANDLE_COUNTER(1) ^~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:242:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if ((control & MIPS_PERFCTRL_IE) && \ ^ arch/mips/oprofile/op_model_mipsxx.c:250:2: note: in expansion of macro ‘HANDLE_COUNTER’ HANDLE_COUNTER(1) ^~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:239:2: note: here case n + 1: \ ^ arch/mips/oprofile/op_model_mipsxx.c:251:2: note: in expansion of macro ‘HANDLE_COUNTER’ HANDLE_COUNTER(0) ^~~~~~~~~~~~~~ CC usr/include/linux/pmu.h.s arch/mips/oprofile/op_model_mipsxx.c: In function ‘mipsxx_cpu_setup’: arch/mips/oprofile/op_model_mipsxx.c:174:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfcntr3(reg.counter[3]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:175:2: note: here case 3: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:177:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfcntr2(reg.counter[2]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:178:2: note: here case 2: ^~~~ arch/mips/oprofile/op_model_mipsxx.c:180:3: warning: this statement may fall through [-Wimplicit-fallthrough=] w_c0_perfcntr1(reg.counter[1]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/mips/oprofile/op_model_mipsxx.c:181:2: note: here case 1: ^~~~ Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Robert Richter <rric@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: oprofile-list@lists.sf.net Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Kees Cook <keescook@chromium.org>
2019-07-30scsi: hpsa: correct scsi command status issue after resetDon Brace
Reviewed-by: Bader Ali - Saleh <bader.alisaleh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-30iwlwifi: mvm: fix a use-after-free bug in iwl_mvm_tx_tso_segmentEmmanuel Grumbach
Accessing the hdr of an skb that was consumed already isn't a good idea. First ask if the skb is a QoS packet, then keep that data on stack, and then consume the skb. This was spotted by KASAN. Cc: stable@vger.kernel.org Fixes: 08f7d8b69aaf ("iwlwifi: mvm: bring back mvm GSO code") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: fix an out-of-bound accessEmmanuel Grumbach
The index for the elements of the ACPI object we dereference was static. This means that if we called the function twice we wouldn't start from 3 again, but rather from the latest index we reached in the previous call. This was dutifully reported by KASAN. Fix this. Cc: stable@vger.kernel.org Fixes: 6996490501ed ("iwlwifi: mvm: add support for EWRD (Dynamic SAR) ACPI table") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: don't unmap as page memory that was mapped as singleEmmanuel Grumbach
In order to remember how to unmap a memory (as single or as page), we maintain a bit per Transmit Buffer (TBs) in the meta data (structure iwl_cmd_meta). We maintain a bitmap: 1 bit per TB. If the TB is set, we will free the memory as a page. This bitmap was never cleared. Fix this. Cc: stable@vger.kernel.org Fixes: 3cd1980b0cdf ("iwlwifi: pcie: introduce new tfd and tb formats") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT supportLuca Coelho
We erroneously added a check for FW API version 41 before sending GEO_TX_POWER_LIMIT, but this was already implemented in version 38. Additionally, it was cherry-picked to older versions, namely 17, 26 and 29, so check for those as well. Cc: stable@vger.kernel.org Fixes: eca1e56ceedd ("iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT to old firmwares") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: add 3 new IDs for the 9000 series (iwl9260_2ac_160_cfg)Ihab Zhaika
Add a few PCI ID'S for 9000 series. Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: fix possible out-of-bounds read when accessing lq_infoGregory Greenman
lq_info is an arary of size 2, active_tbl index is u8. When accessing lq_info[1 - active_tbl], theoretically it's possible that the access will be made to a negative index value. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: fix frame drop from the reordering bufferEmmanuel Grumbach
An earlier patch made sure that the queues are not lagging too far behind. This means that iwl_mvm_release_frames should not be called with a head_sn too far behind NSSN. Don't take the risk to change completely the entry condition to iwl_mvm_release_frames, but don't update the head_sn is the NSSN is more than 2048 packets ahead of us. Since this just cannot be right. This means that the scenario described here happened. We are queue 0. Q:0 Q:1 head_sn: 0 -> 2047 head_sn: 2048 Lots of packets arrive: head_sn: 2047 -> 2150 send NSSN_SYNC notification Handle notification from the firmware and do NOT move the head_sn back to 2048 Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: replace RS mutex with a spin_lockGregory Greenman
The solution with the worker still had a bug, as in order to get sta, rcu_read_lock should be used and thus no mutex can be used inside iwl_mvm_rs_rate_init. Also, spin_lock is a simpler solution, no need to spawn a dedicated worker. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: send LQ command always ASYNCGregory Greenman
The only place where the command was sent as SYNC is during init and this is not really critical. This change is required for replacing RS mutex with a spinlock (in the subsequent patch), since SYNC comamnd requres sleeping and thus the flow cannot be done when holding a spinlock. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: fix comparison of u32 variable with less than zeroColin Ian King
The comparison of the u32 variable wgds_tbl_idx with less than zero is always going to be false because it is unsigned. Fix this by making wgds_tbl_idx a plain signed int. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 4fd445a2c855 ("iwlwifi: mvm: Add log information about SAR status") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: fix locking in delayed GTK settingJohannes Berg
This code clearly never could have worked, since it locks while already locked. Add an unlocked __iwl_mvm_mac_set_key() variant that doesn't do locking to fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdefShahar S Matityahu
The driver should call iwl_dbg_tlv_free even if debugfs is not defined since ini mode does not depend on debugfs ifdef. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Fixes: 68f6f492c4fa ("iwlwifi: trans: support loading ini TLVs from external file") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: dbg_ini: move iwl_dbg_tlv_load_bin out of debug override ifdefShahar S Matityahu
ini debug mode should work even if debug override is not defined. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Fixes: 68f6f492c4fa ("iwlwifi: trans: support loading ini TLVs from external file") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30coccinelle: api/atomic_as_refcounter: add SPDX License IdentifierMatthias Maennich
Add the missing GPLv2 SPDX license identifier. It appears this single file was missing from 7f904d7e1f3e ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505"), which addressed all other files in scripts/coccinelle. Hence I added GPL-2.0-only consitently with the mentioned patch. Cc: linux-spdx@vger.kernel.org Cc: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Matthias Maennich <maennich@google.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-30iwlwifi: mvm: add a wrapper around rs_tx_status to handle locksGregory Greenman
iwl_mvm_rs_tx_status can be called from two places in the code, but the mutex is taken only on one of the calls. Split it into a wrapper taking locks and an internal __iwl_mvm_rs_tx_status function. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30kernel/configs: Replace GPL boilerplate code with SPDX identifierThomas Huth
The FSF does not reside in "675 Mass Ave, Cambridge" anymore... let's replace the old GPL boilerplate code with a proper SPDX identifier instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-30iwlwifi: mvm: add a loose synchronization of the NSSN across Rx queuesEmmanuel Grumbach
In order to support MSI-X efficiently, we want to avoid communication across Rx queues. Each Rx queue should have all the data it needs to process a packet. The reordering buffer is a challenge in the MSI-X world since we can have a single BA session whose packets are directed to different queues. This is why each queue has its own reordering buffer. The hardware is able to hint the driver whether we have a hole or not, which allows the driver to know whether it can release a packet or not. This indication is called NSSN. Roughly, if the packet's SN is lower than the NSSN, we can release the packet to the stack. The NSSN is the SN of the newest packet received without any holes + 1. This is working as long as we don't have packets that we release because of a timeout. When that happens, we could have taken the decision to release a packet after we have been waiting for its predecessor for too long. If this predecessor comes later, we have to drop it because we can't release packets out of order. In that case, the hardware will give us an indication that we can we release the packet (SN < NSSN), but the packet still needs to be dropped. This is why we sometimes need to ignore the NSSN and we track the head_sn in software. Here is a specific example of this: 1) Rx queue 1 got packets: 480, 482, 483 2) We release 480 to to the stack and wait for 481 3) NSSN is now 481 4) The timeout expires 5) We release 482 and 483, NSSN is still 480 6) 481 arrives its NSSN is 484. We need to drop 481 even if 481 < 484. This is why we'll update the head_sn to 484 at step 2. The flow now is: 1) Rx queue 1 got packets: 480, 482, 483 2) We release 480 to to the stack and wait for 481 3) NSSN is now 481 / head_sn is 481 4) The timeout expires 5) We release 482 and 483, NSSN is still 480 but head_sn is 484. 6) 481 arrives its NSSN is 484, but head_sn is 484 and we drop it. This code introduces another problem in case all the traffic goes well (no hole, no timeout): Rx queue 1: 0 -> 483 (head_sn = 484) Rx queue 2: 501 -> 4095 (head_sn = 0) Rx queue 2: 0 -> 480 (head_sn = 481) Rx queue 1: 481 but head_sn = 484 and we drop it. At this point, the SN of queue 1 is far behind: more than 4040 packets behind. Queue 1 will consider 481 "old" because 481 is in [501-64:501] whereas it is a very new packet. In order to fix that, send an Rx notification from time to time (twice across the full set of 4096 packets) to make sure no Rx queue is lagging too far behind. What will happen then is: Rx queue 1: 0 -> 483 (head_sn = 484) Rx queue 2: 501 -> 2047 (head_sn = 2048) Rx queue 1: Sync nofication (head_sn = 2048) Rx queue 2: 2048 -> 4095 (head_sn = 0) Rx queue 1: Sync notification (head_sn = 0) Rx queue 2: 1 -> 481 (head_sn = 482) Rx queue 1: 481 and head_sn = 0. In queue 1's data, head_sn is now 0, the packet coming in is 481, it'll understand that the new packet is new and it won't be dropped. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwiif: mvm: refactor iwl_mvm_notify_rx_queueEmmanuel Grumbach
Instead of allocating memory for which we have an upper limit, use a small buffer on stack. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: add a new RSS sync notification for NSSN syncEmmanuel Grumbach
We will soon be using a new notification that will be initiated by the driver, sent to the firmware and sent back to all the RSS queues by the firmware. This new notification will be useful to synchronize the NSSN across all the queues. For now, don't send the notification, just add the code to handle it. Later patch will add the code to actually send it. While at it, validate the baid coming from the firmware to avoid accessing an array with a bad index in the driver. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: prepare the ground for more RSS notificationsEmmanuel Grumbach
We will need a new type of synchronization message going through all the RSS queues. Prepare the ground for this. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT on version < 41Luca Coelho
Firmware versions before 41 don't support the GEO_TX_POWER_LIMIT command, and sending it to the firmware will cause a firmware crash. We allow this via debugfs, so we need to return an error value in case it's not supported. This had already been fixed during init, when we send the command if the ACPI WGDS table is present. Fix it also for the other, userspace-triggered case. Cc: stable@vger.kernel.org Fixes: 7fe90e0e3d60 ("iwlwifi: mvm: refactor geo init") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: avoid races in rate init and rate performMordechay Goodstein
Rate perform uses the lq_sta table to calculate the next rate to scale while rate init resets the same table, Rate perform is done in soft irq context in parallel to rate init that can be called in case we are doing changes like AP changes BW or moving state for auth to assoc. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30iwlwifi: mvm: disable TX-AMSDU on older NICsJohannes Berg
On older NICs, we occasionally see issues with A-MSDU support, where the commands in the FIFO get confused and then we see an assert EDC because the next command in the FIFO isn't TX. We've tried to isolate this issue and understand where it comes from, but haven't found any errors in building the A-MSDU in software. At least for now, disable A-MSDU support on older hardware so that users can use it again without fearing the assert. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=203315. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-30MAINTAINERS: Move linux-fpga tree to new locationMoritz Fischer
Move the linux-fpga tree to new location at: git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga.git Signed-off-by: Moritz Fischer <mdf@kernel.org> Link: https://lore.kernel.org/r/20190725174517.10516-1-mdf@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-30Btrfs: fix deadlock between fiemap and transaction commitsFilipe Manana
The fiemap handler locks a file range that can have unflushed delalloc, and after locking the range, it tries to attach to a running transaction. If the running transaction started its commit, that is, it is in state TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the flushoncommit option or the transaction is creating a snapshot for the subvolume that contains the file that fiemap is operating on, we end up deadlocking. This happens because fiemap is blocked on the transaction, waiting for it to complete, and the transaction is waiting for the flushed dealloc to complete, which requires locking the file range that the fiemap task already locked. The following stack traces serve as an example of when this deadlock happens: (...) [404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [404571.515956] Call Trace: [404571.516360] ? __schedule+0x3ae/0x7b0 [404571.516730] schedule+0x3a/0xb0 [404571.517104] lock_extent_bits+0x1ec/0x2a0 [btrfs] [404571.517465] ? remove_wait_queue+0x60/0x60 [404571.517832] btrfs_finish_ordered_io+0x292/0x800 [btrfs] [404571.518202] normal_work_helper+0xea/0x530 [btrfs] [404571.518566] process_one_work+0x21e/0x5c0 [404571.518990] worker_thread+0x4f/0x3b0 [404571.519413] ? process_one_work+0x5c0/0x5c0 [404571.519829] kthread+0x103/0x140 [404571.520191] ? kthread_create_worker_on_cpu+0x70/0x70 [404571.520565] ret_from_fork+0x3a/0x50 [404571.520915] kworker/u8:6 D 0 31651 2 0x80004000 [404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs] (...) [404571.537000] fsstress D 0 13117 13115 0x00004000 [404571.537263] Call Trace: [404571.537524] ? __schedule+0x3ae/0x7b0 [404571.537788] schedule+0x3a/0xb0 [404571.538066] wait_current_trans+0xc8/0x100 [btrfs] [404571.538349] ? remove_wait_queue+0x60/0x60 [404571.538680] start_transaction+0x33c/0x500 [btrfs] [404571.539076] btrfs_check_shared+0xa3/0x1f0 [btrfs] [404571.539513] ? extent_fiemap+0x2ce/0x650 [btrfs] [404571.539866] extent_fiemap+0x2ce/0x650 [btrfs] [404571.540170] do_vfs_ioctl+0x526/0x6f0 [404571.540436] ksys_ioctl+0x70/0x80 [404571.540734] __x64_sys_ioctl+0x16/0x20 [404571.540997] do_syscall_64+0x60/0x1d0 [404571.541279] entry_SYSCALL_64_after_hwframe+0x49/0xbe (...) [404571.543729] btrfs D 0 14210 14208 0x00004000 [404571.544023] Call Trace: [404571.544275] ? __schedule+0x3ae/0x7b0 [404571.544526] ? wait_for_completion+0x112/0x1a0 [404571.544795] schedule+0x3a/0xb0 [404571.545064] schedule_timeout+0x1ff/0x390 [404571.545351] ? lock_acquire+0xa6/0x190 [404571.545638] ? wait_for_completion+0x49/0x1a0 [404571.545890] ? wait_for_completion+0x112/0x1a0 [404571.546228] wait_for_completion+0x131/0x1a0 [404571.546503] ? wake_up_q+0x70/0x70 [404571.546775] btrfs_wait_ordered_extents+0x27c/0x400 [btrfs] [404571.547159] btrfs_commit_transaction+0x3b0/0xae0 [btrfs] [404571.547449] ? btrfs_mksubvol+0x4a4/0x640 [btrfs] [404571.547703] ? remove_wait_queue+0x60/0x60 [404571.547969] btrfs_mksubvol+0x605/0x640 [btrfs] [404571.548226] ? __sb_start_write+0xd4/0x1c0 [404571.548512] ? mnt_want_write_file+0x24/0x50 [404571.548789] btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs] [404571.549048] btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs] [404571.549307] btrfs_ioctl+0x133f/0x3150 [btrfs] [404571.549549] ? mem_cgroup_charge_statistics+0x4c/0xd0 [404571.549792] ? mem_cgroup_commit_charge+0x84/0x4b0 [404571.550064] ? __handle_mm_fault+0xe3e/0x11f0 [404571.550306] ? do_raw_spin_unlock+0x49/0xc0 [404571.550608] ? _raw_spin_unlock+0x24/0x30 [404571.550976] ? __handle_mm_fault+0xedf/0x11f0 [404571.551319] ? do_vfs_ioctl+0xa2/0x6f0 [404571.551659] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [404571.552087] do_vfs_ioctl+0xa2/0x6f0 [404571.552355] ksys_ioctl+0x70/0x80 [404571.552621] __x64_sys_ioctl+0x16/0x20 [404571.552864] do_syscall_64+0x60/0x1d0 [404571.553104] entry_SYSCALL_64_after_hwframe+0x49/0xbe (...) If we were joining the transaction instead of attaching to it, we would not risk a deadlock because a join only blocks if the transaction is in a state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc flush performed by a transaction is done before it reaches that state, when it is in the state TRANS_STATE_COMMIT_START. However a transaction join is intended for use cases where we do modify the filesystem, and fiemap only needs to peek at delayed references from the current transaction in order to determine if extents are shared, and, besides that, when there is no current transaction or when it blocks to wait for a current committing transaction to complete, it creates a new transaction without reserving any space. Such unnecessary transactions, besides doing unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary rotation of the precious backup roots. So fix this by adding a new transaction join variant, named join_nostart, which behaves like the regular join, but it does not create a transaction when none currently exists or after waiting for a committing transaction to complete. Fixes: 03628cdbc64db6 ("Btrfs: do not start a transaction during fiemap") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30Btrfs: fix race leading to fs corruption after transaction abortFilipe Manana
When one transaction is finishing its commit, it is possible for another transaction to start and enter its initial commit phase as well. If the first ends up getting aborted, we have a small time window where the second transaction commit does not notice that the previous transaction aborted and ends up committing, writing a superblock that points to btrees that reference extent buffers (nodes and leafs) that were not persisted to disk. The consequence is that after mounting the filesystem again, we will be unable to load some btree nodes/leafs, either because the content on disk is either garbage (or just zeroes) or corresponds to the old content of a previouly COWed or deleted node/leaf, resulting in the well known error messages "parent transid verify failed on ...". The following sequence diagram illustrates how this can happen. CPU 1 CPU 2 <at transaction N> btrfs_commit_transaction() (...) --> sets transaction state to TRANS_STATE_UNBLOCKED --> sets fs_info->running_transaction to NULL (...) btrfs_start_transaction() start_transaction() wait_current_trans() --> returns immediately because fs_info->running_transaction is NULL join_transaction() --> creates transaction N + 1 --> sets fs_info->running_transaction to transaction N + 1 --> adds transaction N + 1 to the fs_info->trans_list list --> returns transaction handle pointing to the new transaction N + 1 (...) btrfs_sync_file() btrfs_start_transaction() --> returns handle to transaction N + 1 (...) btrfs_write_and_wait_transaction() --> writeback of some extent buffer fails, returns an error btrfs_handle_fs_error() --> sets BTRFS_FS_STATE_ERROR in fs_info->fs_state --> jumps to label "scrub_continue" cleanup_transaction() btrfs_abort_transaction(N) --> sets BTRFS_FS_STATE_TRANS_ABORTED flag in fs_info->fs_state --> sets aborted field in the transaction and transaction handle structures, for transaction N only --> removes transaction from the list fs_info->trans_list btrfs_commit_transaction(N + 1) --> transaction N + 1 was not aborted, so it proceeds (...) --> sets the transaction's state to TRANS_STATE_COMMIT_START --> does not find the previous transaction (N) in the fs_info->trans_list, so it doesn't know that transaction was aborted, and the commit of transaction N + 1 proceeds (...) --> sets transaction N + 1 state to TRANS_STATE_UNBLOCKED btrfs_write_and_wait_transaction() --> succeeds writing all extent buffers created in the transaction N + 1 write_all_supers() --> succeeds --> we now have a superblock on disk that points to trees that refer to at least one extent buffer that was never persisted So fix this by updating the transaction commit path to check if the flag BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting the transaction to the TRANS_STATE_COMMIT_START we do not find any previous transaction in the fs_info->trans_list. If the flag is set, just fail the transaction commit with -EROFS, as we do in other places. The exact error code for the previous transaction abort was already logged and reported. Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30Btrfs: fix incremental send failure after deduplicationFilipe Manana
When doing an incremental send operation we can fail if we previously did deduplication operations against a file that exists in both snapshots. In that case we will fail the send operation with -EIO and print a message to dmesg/syslog like the following: BTRFS error (device sdc): Send: inconsistent snapshot, found updated \ extent for inode 257 without updated inode item, send root is 258, \ parent root is 257 This requires that we deduplicate to the same file in both snapshots for the same amount of times on each snapshot. The issue happens because a deduplication only updates the iversion of an inode and does not update any other field of the inode, therefore if we deduplicate the file on each snapshot for the same amount of time, the inode will have the same iversion value (stored as the "sequence" field on the inode item) on both snapshots, therefore it will be seen as unchanged between in the send snapshot while there are new/updated/deleted extent items when comparing to the parent snapshot. This makes the send operation return -EIO and print an error message. Example reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt # Create our first file. The first half of the file has several 64Kb # extents while the second half as a single 512Kb extent. $ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo $ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo # Create the base snapshot and the parent send stream from it. $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1 $ btrfs send -f /tmp/1.snap /mnt/mysnap1 # Create our second file, that has exactly the same data as the first # file. $ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar # Create the second snapshot, used for the incremental send, before # doing the file deduplication. $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2 # Now before creating the incremental send stream: # # 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This # will drop several extent items and add a new one, also updating # the inode's iversion (sequence field in inode item) by 1, but not # any other field of the inode; # # 2) Deduplicate into a different subrange of file foo in snapshot # mysnap2. This will replace an extent item with a new one, also # updating the inode's iversion by 1 but not any other field of the # inode. # # After these two deduplication operations, the inode items, for file # foo, are identical in both snapshots, but we have different extent # items for this inode in both snapshots. We want to check this doesn't # cause send to fail with an error or produce an incorrect stream. $ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo $ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo # Create the incremental send stream. $ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2 ERROR: send ioctl failed with -5: Input/output error This issue started happening back in 2015 when deduplication was updated to not update the inode's ctime and mtime and update only the iversion. Back then we would hit a BUG_ON() in send, but later in 2016 send was updated to return -EIO and print the error message instead of doing the BUG_ON(). A test case for fstests follows soon. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933 Fixes: 1c919a5e13702c ("btrfs: don't update mtime/ctime on deduped inodes") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>