linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-05-14	block: Export bio check/set pages_dirty	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add warning for bi_next not NULL in bio_endio()	Kent Overstreet
	Recently found a bug where a driver left bi_next not NULL and then called bio_endio(), and then the submitter of the bio used bio_copy_data() which was treating src and dst as lists of bios. Fixed that bug by splitting out bio_list_copy_data(), but in case other things are depending on bi_next in weird ways, add a warning to help avoid more bugs like that in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add missing flush_dcache_page() call	Kent Overstreet
	Since a bio can point to userspace pages (e.g. direct IO), this is generally necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Split out bio_list_copy_data()	Kent Overstreet
	Found a bug (with ASAN) where we were passing a bio to bio_copy_data() with bi_next not NULL, when it should have been - a driver had left bi_next set to something after calling bio_endio(). Since the normal case is only copying single bios, split out bio_list_copy_data() to avoid more bugs like this in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add bio_copy_data_iter(), zero_fill_bio_iter()	Kent Overstreet
	Add versions that take bvec_iter args instead of using bio->bi_iter - to be used by bcachefs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Use bioset_init() for fs_bio_set	Kent Overstreet
	Minor optimization - remove a pointer indirection when using fs_bio_set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Add bioset_init()/bioset_exit()	Kent Overstreet
	Similarly to mempool_init()/mempool_exit(), take a pointer indirection out of allocation/freeing by allowing biosets to be embedded in other structs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: Convert bio_set to mempool_init()	Kent Overstreet
	Minor performance improvement by getting rid of pointer indirections from allocation/freeing fastpaths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	mempool: Add mempool_init()/mempool_exit()	Kent Overstreet
	Allows mempools to be embedded in other structs, getting rid of a pointer indirection from allocation fastpaths. mempool_exit() is safe to call on an uninitialized but zeroed mempool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	tun: fix use after free for ptr_ring	Jason Wang
	We used to initialize ptr_ring during TUNSETIFF, this is because its size depends on the tx_queue_len of netdevice. And we try to clean it up when socket were detached from netdevice. A race were spotted when trying to do uninit during a read which will lead a use after free for pointer ring. Solving this by always initialize a zero size ptr_ring in open() and do resizing during TUNSETIFF, and then we can safely do cleanup during close(). With this, there's no need for the workaround that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak for tfile->tx_array"). Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Fixes: 1576d9860599 ("tun: switch to use skb array for tx") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14	sbitmap: fix race in wait batch accounting	Jens Axboe
	If we have multiple callers of sbq_wake_up(), we can end up in a situation where the wait_cnt will continually go more and more negative. Consider the case where our wake batch is 1, hence wait_cnt will start out as 1. wait_cnt == 1 CPU0 CPU1 atomic_dec_return(), cnt == 0 atomic_dec_return(), cnt == -1 cmpxchg(-1, 0) (succeeds) [wait_cnt now 0] cmpxchg(0, 1) (fails) This ends up with wait_cnt being 0, we'll wakeup immediately next time. Going through the same loop as above again, and we'll have wait_cnt -1. For the case where we have a larger wake batch, the only difference is that the starting point will be higher. We'll still end up with continually smaller batch wakeups, which defeats the purpose of the rolling wakeups. Always reset the wait_cnt to the batch value. Then it doesn't matter who wins the race. But ensure that whomever does win the race is the one that increments the ws index and wakes up our batch count, loser gets to call __sbq_wake_up() again to account his wakeups towards the next active wait state index. Fixes: 6c0ca7ae292a ("sbitmap: fix wakeup hang after sbq resize") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	Merge tag 'reset-fixes-for-4.17' of git://git.pengutronix.de/pza/linux into ↵	Olof Johansson
	fixes Reset controller fixes for v4.17 Fix the USB3 reset (offset 0x200c, bit 5) on Uniphier LD20. It was incorrectly labeled as GIO reset. This reset line is not yet used in uniphier-ld20.dtsi. * tag 'reset-fixes-for-4.17' of git://git.pengutronix.de/pza/linux: reset: uniphier: fix USB clock line for LD20 Signed-off-by: Olof Johansson <olof@lixom.net>
2018-05-14	Merge tag 'mvebu-fixes-4.17-1' of git://git.infradead.org/linux-mvebu into fixes	Olof Johansson
	mvebu fixes for 4.17 (part 1) Declare missing clocks needed for network on Armada 8040 base boards (such as the McBin) * tag 'mvebu-fixes-4.17-1' of git://git.infradead.org/linux-mvebu: ARM64: dts: marvell: armada-cp110: Add mg_core_clk for ethernet node ARM64: dts: marvell: armada-cp110: Add clocks for the xmdio node Signed-off-by: Olof Johansson <olof@lixom.net>
2018-05-14	ARM: keystone: fix platform_domain_notifier array overrun	Russell King
	platform_domain_notifier contains a variable sized array, which the pm_clk_notify() notifier treats as a NULL terminated array: for (con_id = clknb->con_ids; con_id; con_id++) pm_clk_add(dev, con_id); Omitting the initialiser for con_ids means that the array is zero sized, and there is no NULL terminator. This leads to pm_clk_notify() overrunning into what ever structure follows, which may not be NULL. This leads to an oops: Unable to handle kernel NULL pointer dereference at virtual address 0000008c pgd = c0003000 [0000008c] pgd=80000800004003c, pmd=00000000c Internal error: Oops: 206 [#1] PREEMPT SMP ARM Modules linked in:c CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #9 Hardware name: Keystone PC is at strlen+0x0/0x34 LR is at kstrdup+0x18/0x54 pc : [<c0623340>] lr : [<c0111d6c>] psr: 20000013 sp : eec73dc0 ip : eed780c0 fp : 00000001 r10: 00000000 r9 : 00000000 r8 : eed71e10 r7 : 0000008c r6 : 0000008c r5 : 014000c0 r4 : c03a6ff4 r3 : c09445d0 r2 : 00000000 r1 : 014000c0 r0 : 0000008c Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5387d Table: 00003000 DAC: fffffffd Process swapper/0 (pid: 1, stack limit = 0xeec72210) Stack: (0xeec73dc0 to 0xeec74000) ... [<c0623340>] (strlen) from [<c0111d6c>] (kstrdup+0x18/0x54) [<c0111d6c>] (kstrdup) from [<c03a6ff4>] (__pm_clk_add+0x58/0x120) [<c03a6ff4>] (__pm_clk_add) from [<c03a731c>] (pm_clk_notify+0x64/0xa8) [<c03a731c>] (pm_clk_notify) from [<c004614c>] (notifier_call_chain+0x44/0x84) [<c004614c>] (notifier_call_chain) from [<c0046320>] (__blocking_notifier_call_chain+0x48/0x60) [<c0046320>] (__blocking_notifier_call_chain) from [<c0046350>] (blocking_notifier_call_chain+0x18/0x20) [<c0046350>] (blocking_notifier_call_chain) from [<c0390234>] (device_add+0x36c/0x534) [<c0390234>] (device_add) from [<c047fc00>] (of_platform_device_create_pdata+0x70/0xa4) [<c047fc00>] (of_platform_device_create_pdata) from [<c047fea0>] (of_platform_bus_create+0xf0/0x1ec) [<c047fea0>] (of_platform_bus_create) from [<c047fff8>] (of_platform_populate+0x5c/0xac) [<c047fff8>] (of_platform_populate) from [<c08b1f04>] (of_platform_default_populate_init+0x8c/0xa8) [<c08b1f04>] (of_platform_default_populate_init) from [<c000a78c>] (do_one_initcall+0x3c/0x164) [<c000a78c>] (do_one_initcall) from [<c087bd9c>] (kernel_init_freeable+0x10c/0x1d0) [<c087bd9c>] (kernel_init_freeable) from [<c0628db0>] (kernel_init+0x8/0xf0) [<c0628db0>] (kernel_init) from [<c00090d8>] (ret_from_fork+0x14/0x3c) Exception stack(0xeec73fb0 to 0xeec73ff8) 3fa0: 00000000 00000000 00000000 00000000 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: e3520000 1afffff7 e12fff1e c0801730 (e5d02000) ---[ end trace cafa8f148e262e80 ]--- Fix this by adding the necessary initialiser. Fixes: fc20ffe1213b ("ARM: keystone: add PM domain support for clock management") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2018-05-14	libata: Apply NOLPM quirk for SAMSUNG PM830 CXM13D1Q.	François Cami
	Without this patch the drive errors out regularly: [ 1.090154] ata1.00: ATA-8: SAMSUNG SSD PM830 mSATA 256GB, CXM13D1Q, max UDMA/133 (...) [ 345.154996] ata1.00: exception Emask 0x40 SAct 0x0 SErr 0xc0800 action 0x6 [ 345.155006] ata1.00: irq_stat 0x40000001 [ 345.155013] ata1: SError: { HostInt CommWake 10B8B } [ 345.155018] ata1.00: failed command: SET FEATURES [ 345.155032] ata1.00: cmd ef/05:e1:00:00:00/00:00:00:00:00/40 tag 7 res 51/04:e1:00:00:00/00:00:00:00:00/40 Emask 0x41 (internal error) [ 345.155038] ata1.00: status: { DRDY ERR } [ 345.155042] ata1.00: error: { ABRT } [ 345.155051] ata1: hard resetting link [ 345.465661] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 345.466955] ata1.00: configured for UDMA/133 [ 345.467085] ata1: EH complete Signed-off-by: François Cami <fcami@fedoraproject.org> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-14	usb: musb: fix remote wakeup racing with suspend	Daniel Glöckner
	It has been observed that writing 0xF2 to the power register while it reads as 0xF4 results in the register having the value 0xF0, i.e. clearing RESUME and setting SUSPENDM in one go does not work. It might also violate the USB spec to transition directly from resume to suspend, especially when not taking T_DRSMDN into account. But this is what happens when a remote wakeup occurs between SetPortFeature USB_PORT_FEAT_SUSPEND on the root hub and musb_bus_suspend being called. This commit returns -EBUSY when musb_bus_suspend is called while remote wakeup is signalled and thus avoids to reset the RESUME bit. Ignoring this error when musb_port_suspend is called from musb_hub_control is ok. Signed-off-by: Daniel Glöckner <dg@emlix.com> Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-14	block: consistently use GFP_NOIO instead of __GFP_NORECLAIM	Christoph Hellwig
	Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: use GFP_NOIO instead of __GFP_DIRECT_RECLAIM	Christoph Hellwig
	We just can't do I/O when doing block layer requests allocations, so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: pass an explicit gfp_t to get_request	Christoph Hellwig
	blk_old_get_request already has it at hand, and in blk_queue_bio, which is the fast path, it is constant. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: sanitize blk_get_request calling conventions	Christoph Hellwig
	Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	block: fix __get_request documentation	Christoph Hellwig
	Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	scsi/osd: remove the gfp argument to osd_start_request	Christoph Hellwig
	Always GFP_KERNEL, and keeping it would cause serious complications for the next change. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	Btrfs: fix xattr loss after power failure	Filipe Manana
	If a file has xattrs, we fsync it, to ensure we clear the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its inode, the current transaction commits and then we fsync it (without either of those bits being set in its inode), we end up not logging all its xattrs. This results in deleting all xattrs when replying the log after a power failure. Trivial reproducer $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/foobar $ setfattr -n user.xa -v qwerty /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar $ sync $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar <power failure> $ mount /dev/sdb /mnt $ getfattr --absolute-names --dump /mnt/foobar <empty output> $ So fix this by making sure all xattrs are logged if we log a file's inode item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor BTRFS_INODE_COPY_EVERYTHING were set in the inode. Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path") Cc: <stable@vger.kernel.org> # 4.2+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-14	Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting	Robbie Ko
	[BUG] btrfs incremental send BUG happens when creating a snapshot of snapshot that is being used by send. [REASON] The problem can happen if while we are doing a send one of the snapshots used (parent or send) is snapshotted, because snapshoting implies COWing the root of the source subvolume/snapshot. 1. When doing an incremental send, the send process will get the commit roots from the parent and send snapshots, and add references to them through extent_buffer_get(). 2. When a snapshot/subvolume is snapshotted, its root node is COWed (transaction.c:create_pending_snapshot()). 3. COWing releases the space used by the node immediately, through: __btrfs_cow_block() --btrfs_free_tree_block() ----btrfs_add_free_space(bytenr of node) 4. Because send doesn't hold a transaction open, it's possible that the transaction used to create the snapshot commits, switches the commit root and the old space used by the previous root node gets assigned to some other node allocation. Allocation of a new node will use the existing extent buffer found in memory, which we previously got a reference through extent_buffer_get(), and allow the extent buffer's content (pages) to be modified: btrfs_alloc_tree_block --btrfs_reserve_extent ----find_free_extent (get bytenr of old node) --btrfs_init_new_buffer (use bytenr of old node) ----btrfs_find_create_tree_block ------alloc_extent_buffer --------find_extent_buffer (get old node) 5. So send can access invalid memory content and have unpredictable behaviour. [FIX] So we fix the problem by copying the commit roots of the send and parent snapshots and use those copies. CallTrace looks like this: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1861! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 24235 Comm: btrfs Tainted: P O 3.10.105 #23721 ffff88046652d680 ti: ffff88041b720000 task.ti: ffff88041b720000 RIP: 0010:[<ffffffffa08dd0e8>] read_node_slot+0x108/0x110 [btrfs] RSP: 0018:ffff88041b723b68 EFLAGS: 00010246 RAX: ffff88043ca6b000 RBX: ffff88041b723c50 RCX: ffff880000000000 RDX: 000000000000004c RSI: ffff880314b133f8 RDI: ffff880458b24000 RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88041b723c66 R10: 0000000000000001 R11: 0000000000001000 R12: ffff8803f3e48890 R13: ffff8803f3e48880 R14: ffff880466351800 R15: 0000000000000001 FS: 00007f8c321dc8c0(0000) GS:ffff88047fcc0000(0000) CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 R2: 00007efd1006d000 CR3: 0000000213a24000 CR4: 00000000003407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880 ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50 ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400 Call Trace: [<ffffffffa08dd9d7>] ? tree_move_down.isra.33+0x27/0x50 [btrfs] [<ffffffffa08dde85>] ? tree_advance+0xb5/0xc0 [btrfs] [<ffffffffa08e83d4>] ? btrfs_compare_trees+0x2d4/0x760 [btrfs] [<ffffffffa0982050>] ? finish_inode_if_needed+0x870/0x870 [btrfs] [<ffffffffa09841ea>] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs] [<ffffffffa094bd3d>] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs] [<ffffffff81111133>] ? handle_pte_fault+0x373/0x990 [<ffffffff8153a096>] ? atomic_notifier_call_chain+0x16/0x20 [<ffffffff81063256>] ? set_task_cpu+0xb6/0x1d0 [<ffffffff811122c3>] ? handle_mm_fault+0x143/0x2a0 [<ffffffff81539cc0>] ? __do_page_fault+0x1d0/0x500 [<ffffffff81062f07>] ? check_preempt_curr+0x57/0x90 [<ffffffff8115075a>] ? do_vfs_ioctl+0x4aa/0x990 [<ffffffff81034f83>] ? do_fork+0x113/0x3b0 [<ffffffff812dd7d7>] ? trace_hardirqs_off_thunk+0x3a/0x6c [<ffffffff81150cc8>] ? SyS_ioctl+0x88/0xa0 [<ffffffff8153e422>] ? system_call_fastpath+0x16/0x1b ---[ end trace 29576629ee80b2e1 ]--- Fixes: 7069830a9e38 ("Btrfs: add btrfs_compare_trees function") CC: stable@vger.kernel.org # 3.6+ Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-14	memstick: remove unused variables	Christoph Hellwig
	Fixes: 7c2d748e8476 ("memstick: don't call blk_queue_bounce_limit") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14	afs: Fix the non-encryption of calls	David Howells
	Some AFS servers refuse to accept unencrypted traffic, so can't be accessed with kAFS. Set the AF_RXRPC security level to encrypt client calls to deal with this. Note that incoming service calls are set by the remote client and so aren't affected by this. This requires an AF_RXRPC patch to pass the value set by setsockopt to calls begun by the kernel. Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix CB.CallBack handling	David Howells
	The handling of CB.CallBack messages sent by the fileserver to the client is broken in that they are currently being processed after the reply has been transmitted. This is not what the fileserver expects, however. It holds up change visibility until the reply comes so as to maintain cache coherency, and so expects the client to have to refetch the state on the affected files. Fix CB.CallBack handling to perform the callback break before sending the reply. The fileserver is free to hold up status fetches issued by other threads on the same client that occur in reponse to the callback until any pending changes have been committed. Fixes: d001648ec7cf ("rxrpc: Don't expose skbs to in-kernel users [ver #2]") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix whole-volume callback handling	David Howells
	It's possible for an AFS file server to issue a whole-volume notification that callbacks on all the vnodes in the file have been broken. This is done for R/O and backup volumes (which don't have per-file callbacks) and for things like a volume being taken offline. Fix callback handling to detect whole-volume notifications, to track it across operations and to check it during inode validation. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix afs_find_server search loop	Marc Dionne
	The code that looks up servers by addresses makes the assumption that the list of addresses for a server is sorted. It exits the loop if it finds that the target address is larger than the current candidate. As the list is not currently sorted, this can lead to a failure to find a matching server, which can cause callbacks from that server to be ignored. Remove the early exit case so that the complete list is searched. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix the handling of an unfound server in CM operations	David Howells
	If the client cache manager operations that need the server record (CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find the server record, they abort the call from the file server with RX_CALL_DEAD when they should return okay. Fixes: c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Add a tracepoint to record callbacks from unlisted servers	David Howells
	Add a tracepoint to record callbacks from servers for which we don't have a record. Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID	David Howells
	Fix the handling of the CB.InitCallBackState3 service call to find the record of a server that we're using by looking it up by the UUID passed as the parameter rather than by its address (of which it might have many, and which may change). Fixes: c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix VNOVOL handling in address rotation	David Howells
	If a volume location record lists multiple file servers for a volume, then it's possible that due to a misconfiguration or a changing configuration that one of the file servers doesn't know about it yet and will abort VNOVOL. Currently, the rotation algorithm will stop with EREMOTEIO. Fix this by moving on to try the next server if VNOVOL is returned. Once all the servers have been tried and the record rechecked, the algorithm will stop with EREMOTEIO or ENOMEDIUM. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility	David Howells
	The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug whereby if an error occurs on one of the vnodes being queried, then the errorCode field is set correctly in the corresponding status, but the interfaceVersion field is left unset. Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against the following cases when called from FS.InlineBulkStatus delivery: (1) If InterfaceVersion == 0 then: (a) If errorCode != 0 then it indicates the abort code for the corresponding vnode. (b) If errorCode == 0 then the status record is invalid. (2) If InterfaceVersion == 1 then: (a) If errorCode != 0 then it indicates the abort code for the corresponding vnode. (b) If errorCode == 0 then the status record is valid and can be parsed. (3) If InterfaceVersion is anything else then the status record is invalid. Fixes: dd9fbcb8e103 ("afs: Rearrange status mapping") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	net/can: single_open_net needs to be paired with single_release_net	Christoph Hellwig
	Otherwise we will leak a reference to the network namespace. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-14	__inode_security_revalidate() never gets NULL opt_dentry	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-14	make xattr_getsecurity() static	Al Viro
	many years overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-14	mtd: rawnand: marvell: Fix read logic for layouts with ->nchunks > 2	Boris Brezillon
	The code is doing monolithic reads for all chunks except the last one which is wrong since a monolithic read will issue the READ0+ADDRS+READ_START sequence. It not only takes longer because it forces the NAND chip to reload the page content into its internal cache, but by doing that we also reset the column pointer to 0, which means we'll always read the first chunk instead of moving to the next one. Rework the code to do a monolithic read only for the first chunk, then switch to naked reads for all intermediate chunks and finally issue a last naked read for the last chunk. Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver Cc: stable@vger.kernel.org Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-05-14	mtd: Fix comparison in map_word_andequal()	Ben Hutchings
	Commit 9e343e87d2c4 ("mtd: cfi: convert inline functions to macros") changed map_word_andequal() into a macro, but also changed the right hand side of the comparison from val3 to val2. Change it back to use val3 on the right hand side. Thankfully this did not cause a regression because all callers currently pass the same argument for val2 and val3. Fixes: 9e343e87d2c4 ("mtd: cfi: convert inline functions to macros") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-05-14	afs: Fix server rotation's handling of fileserver probe failure	David Howells
	The server rotation algorithm just gives up if it fails to probe a fileserver. Fix this by rotating to the next fileserver instead. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix refcounting in callback registration	David Howells
	The refcounting on afs_cb_interest struct objects in afs_register_server_cb_interest() is wrong as it uses the server list entry's call back interest pointer without regard for the fact that it might be replaced at any time and the object thrown away. Fix this by: (1) Put a lock on the afs_server_list struct that can be used to mediate access to the callback interest pointers in the servers array. (2) Keep a ref on the callback interest that we get from the entry. (3) Dropping the old reference held by vnode->cb_interest if we replace the pointer. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix giving up callbacks on server destruction	David Howells
	When a server record is destroyed, we want to send a message to the server telling it that we're giving up all the callbacks it has promised us. Apply two fixes to this: (1) Only send the FS.GiveUpAllCallBacks message if we actually got a callback from that server. We assume this to be the case if we performed at least one successful FS operation on that server. (2) Send it to the address last used for that server rather than always picking the first address in the list (which might be unreachable). Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix address list parsing	David Howells
	The parsing of port specifiers in the address list obtained from the DNS resolution upcall doesn't work as in4_pton() and in6_pton() will fail on encountering an unexpected delimiter (in this case, the '+' marking the port number). However, in_pton() can't be given multiple specifiers. Fix this by finding the delimiter in advance and not relying on in_pton() to find the end of the address for us. Fixes: 8b2a464ced77 ("afs: Add an address list concept") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	afs: Fix directory page locking	David Howells
	The afs directory loading code (primarily afs_read_dir()) locks all the pages that hold a directory's content blob to defend against getdents/getdents races and getdents/lookup races where the competitors issue conflicting reads on the same data. As the reads will complete consecutively, they may retrieve different versions of the data and one may overwrite the data that the other is busy parsing. Fix this by not locking the pages at all, but rather by turning the validation lock into an rwsem and getting an exclusive lock on it whilst reading the data or validating the attributes and a shared lock whilst parsing the data. Sharing the attribute validation lock should be fine as the data fetch will retrieve the attributes also. The individual page locks aren't needed at all as the only place they're being used is to serialise data loading. Without this patch, the: if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) { ... } part of afs_read_dir() may be skipped, leaving the pages unlocked when we hit the success: clause - in which case we try to unlock the not-locked pages, leading to the following oops: page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0 flags: 0xfffe000001004(referenced\|private) raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000 page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) page->mem_cgroup:ffff98156b27c000 ------------[ cut here ]------------ kernel BUG at mm/filemap.c:1205! ... RIP: 0010:unlock_page+0x43/0x50 ... Call Trace: afs_dir_iterate+0x789/0x8f0 [kafs] ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x166/0x1d0 ? afs_do_lookup+0x69/0x490 [kafs] ? afs_do_lookup+0x101/0x490 [kafs] ? key_default_cmp+0x20/0x20 ? request_key+0x3c/0x80 ? afs_lookup+0xf1/0x340 [kafs] ? __lookup_slow+0x97/0x150 ? lookup_slow+0x35/0x50 ? walk_component+0x1bf/0x490 ? path_lookupat.isra.52+0x75/0x200 ? filename_lookup.part.66+0xa0/0x170 ? afs_end_vnode_operation+0x41/0x60 [kafs] ? __check_object_size+0x9c/0x171 ? strncpy_from_user+0x4a/0x170 ? vfs_statx+0x73/0xe0 ? __do_sys_newlstat+0x39/0x70 ? __x64_sys_getdents+0xc9/0x140 ? __x64_sys_getdents+0x140/0x140 ? do_syscall_64+0x5b/0x160 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: f3ddee8dc4e2 ("afs: Fix directory handling") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14	drm/i915/execlists: Use rmb() to order CSB reads	Chris Wilson
	We assume that the CSB is written using the normal ringbuffer coherency protocols, as outlined in kernel/events/ring_buffer.c: * (HW) (DRIVER) * * if (LOAD ->data_tail) { LOAD ->data_head * (A) smp_rmb() (C) * STORE $data LOAD $data * smp_wmb() (B) smp_mb() (D) * STORE ->data_head STORE ->data_tail * } So we assume that the HW fulfils its ordering requirements (B), and so we should use a complimentary rmb (C) to ensure that our read of its WRITE pointer is completed before we start accessing the data. The final mb (D) is implied by the uncached mmio we perform to inform the HW of our READ pointer. References: https://bugs.freedesktop.org/show_bug.cgi?id=105064 References: https://bugs.freedesktop.org/show_bug.cgi?id=105888 References: https://bugs.freedesktop.org/show_bug.cgi?id=106185 Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP") References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer") Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Timo Aaltonen <tjaalton@ubuntu.com> Tested-by: Timo Aaltonen <tjaalton@ubuntu.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-1-chris@chris-wilson.co.uk (cherry picked from commit 77dfedb5be03779f9a5d83e323a1b36e32090105) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-05-14	drm/i915/userptr: reject zero user_size	Matthew Auld
	Operating on a zero sized GEM userptr object will lead to explosions. Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl") Testcase: igt/gem_userptr_blits/input-checking Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180502195021.30900-1-matthew.auld@intel.com (cherry picked from commit c11c7bfd213495784b22ef82a69b6489f8d0092f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-05-14	x86/pkeys: Do not special case protection key 0	Dave Hansen
	mm_pkey_is_allocated() treats pkey 0 as unallocated. That is inconsistent with the manpages, and also inconsistent with mm->context.pkey_allocation_map. Stop special casing it and only disallow values that are actually bad (< 0). The end-user visible effect of this is that you can now use mprotect_pkey() to set pkey=0. This is a bit nicer than what Ram proposed[1] because it is simpler and removes special-casing for pkey 0. On the other hand, it does allow applications to pkey_free() pkey-0, but that's just a silly thing to do, so we are not going to protect against it. The scenario that could happen is similar to what happens if you free any other pkey that is in use: it might get reallocated later and used to protect some other data. The most likely scenario is that pkey-0 comes back from pkey_alloc(), an access-disable or write-disable bit is set in PKRU for it, and the next stack access will SIGSEGV. It's not horribly different from if you mprotect()'d your stack or heap to be unreadable or unwritable, which is generally very foolish, but also not explicitly prevented by the kernel. 1. http://lkml.kernel.org/r/1522112702-27853-1-git-send-email-linuxram@us.ibm.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org>p Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellermen <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Fixes: 58ab9a088dda ("x86/pkeys: Check against max pkey to avoid overflows") Link: http://lkml.kernel.org/r/20180509171358.47FD785E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14	x86/pkeys/selftests: Add a test for pkey 0	Dave Hansen
	Protection key 0 is the default key for all memory and will not normally come back from pkey_alloc(). But, you might still want pass it to mprotect_pkey(). This check ensures that you can use pkey 0. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellermen <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180509171356.9E40B254@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14	x86/pkeys/selftests: Save off 'prot' for allocations	Dave Hansen
	This makes it possible to to tell what 'prot' a given allocation is supposed to have. That way, if we want to change just the pkey, we know what 'prot' to pass to mprotect_pkey(). Also, keep a record of the most recent allocation so the tests can easily find it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellermen <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180509171354.AA23E228@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14	x86/pkeys/selftests: Fix pointer math	Dave Hansen
	We dump out the entire area of the siginfo where the si_pkey_ptr is supposed to be. But, we do some math on the poitner, which is a u32. We intended to do byte math, not u32 math on the pointer. Cast it over to a u8* so it works. Also, move this block of code to below th si_code check. It doesn't hurt anything, but the si_pkey field is gibberish for other signal types. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellermen <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180509171352.9BE09819@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>