linux.git - Linus' kernel tree

Age	Commit message	Author
2015-03-02	KVM: MIPS: Fix trace event to save PC directly	James Hogan
	Currently the guest exit trace event saves the VCPU pointer to the structure, and the guest PC is retrieved by dereferencing it when the event is printed rather than directly from the trace record. This isn't safe as the printing may occur long afterwards, after the PC has changed and potentially after the VCPU has been freed. Usually this results in the same (wrong) PC being printed for multiple trace events. It also isn't portable as userland has no way to access the VCPU data structure when interpreting the trace record itself. Let's save the actual PC in the structure so that the correct value is accessible later. Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # v3.10+ Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
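A minimal userspace sketch of the hazard described in this commit message, assuming a simplified record layout. The `demo_*` and `record_*` names are hypothetical and are not the kernel's trace event definitions; the point is only that a record holding a pointer reports whatever the PC is at print time, while a record holding a copy reports the PC at event time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VCPU state; not the kernel's structure. */
struct demo_vcpu {
	unsigned long pc;
};

/* Unsafe layout: keeps a pointer and reads the PC only when printed. */
struct record_by_pointer {
	const struct demo_vcpu *vcpu;
};

/* Safe layout: copies the PC at the moment the event fires. */
struct record_by_value {
	unsigned long pc;
};

static struct record_by_pointer capture_pointer(const struct demo_vcpu *vcpu)
{
	assert(vcpu != NULL);
	return (struct record_by_pointer){ .vcpu = vcpu };
}

static struct record_by_value capture_value(const struct demo_vcpu *vcpu)
{
	assert(vcpu != NULL);
	return (struct record_by_value){ .pc = vcpu->pc };
}

/* "Printing" happens later, possibly after the guest PC has moved on. */
static unsigned long print_pointer_record(const struct record_by_pointer *r)
{
	return r->vcpu->pc;
}

static unsigned long print_value_record(const struct record_by_value *r)
{
	return r->pc;
}
```

If the PC changes between capture and print, `print_pointer_record()` returns the new value and `print_value_record()` returns the value seen when the event fired. The sketch does not model the freed-VCPU case, which would be a use-after-free.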
2015-03-02	Merge tag 'gpio-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio	Linus Torvalds
	Pull GPIO fixes from Linus Walleij:
	"Two GPIO fixes:
	 - Fix a translation problem in of_get_named_gpiod_flags()
	 - Fix a long standing container_of() mistake in the TPS65912 driver"

	* tag 'gpio-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
	  gpio: tps65912: fix wrong container_of arguments
	  gpiolib: of: allow of_gpiochip_find_and_xlate to find more than one chip per node
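For readers unfamiliar with container_of(), here is a small userspace sketch of how the macro recovers an enclosing structure from a pointer to one of its members. The pull message does not say which arguments the TPS65912 driver got wrong, so the structures and names below (`demo_driver`, `demo_gpio_chip`, `demo_container_of`) are purely illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical embedded object, loosely modelled on a GPIO chip. */
struct demo_gpio_chip {
	int base;
};

/* Hypothetical driver state that embeds the chip as a member. */
struct demo_driver {
	int id;
	struct demo_gpio_chip chip;
};

/*
 * The type and member arguments must describe where the pointer really
 * lives. Naming the wrong type or member yields a bogus address.
 */
static struct demo_driver *to_demo_driver(struct demo_gpio_chip *gc)
{
	assert(gc != NULL);
	return demo_container_of(gc, struct demo_driver, chip);
}
```

Given a `struct demo_driver d`, `to_demo_driver(&d.chip)` returns `&d`, because the macro subtracts the offset of `chip` within `struct demo_driver` from the member's address.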
2015-03-02	Merge branch 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal	Linus Torvalds
	Pull thermal management fixes from Eduardo Valentin:
	"Specifics:
	 - Several fixes in tmon tool.
	 - Fixes in intel int340x for _ART and _TRT tables.
	 - Add id for Avoton SoC into powerclamp driver.
	 - Fixes in RCAR thermal driver to remove race conditions and fix fail path
	 - Fixes in TI thermal driver: removal of unnecessary code and build fix if !CONFIG_PM_SLEEP
	 - Cleanups in exynos thermal driver
	 - Add stubs for include/linux/thermal.h. Now drivers using thermal calls but that also work without CONFIG_THERMAL will be able to compile for systems that don't care about thermal.

	 Note: I am sending this pull on Rui's behalf while he fixes issues in his Linux box"

	* 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
	  thermal: int340x_thermal: Ignore missing _ART, _TRT tables
	  thermal/intel_powerclamp: add id for Avoton SoC
	  tools/thermal: tmon: silence 'set but not used' warnings
	  tools/thermal: tmon: use pkg-config to determine library dependencies
	  tools/thermal: tmon: support cross-compiling
	  tools/thermal: tmon: add .gitignore
	  tools/thermal: tmon: fixup tui windowing calculations
	  tools/thermal: tmon: tui: don't hard-code dialog window size assumptions
	  tools/thermal: tmon: add min/max macros
	  tools/thermal: tmon: add --target-temp parameter
	  thermal: exynos: Clean-up code to use oneline entry for exynos compatible table
	  thermal: rcar: Make error and remove paths symmetrical with init
	  thermal: rcar: Fix race condition between init and interrupt
	  thermal: Introduce dummy functions when thermal is not defined
	  ti-soc-thermal: Delete an unnecessary check before the function call "cpufreq_cooling_unregister"
	  thermal: ti-soc-thermal: bandgap: Fix build warning if !CONFIG_PM_SLEEP
2015-03-02	Btrfs: incremental send, don't rename a directory too soon	Filipe Manana
	There's one more case where we can't issue a rename operation for a directory as soon as we process it. We used to delay directory renames only if they have some ancestor directory with a higher inode number that got renamed too, but there's another case where we need to delay the rename too - when a directory A is renamed to the old name of a directory B but that directory B has its rename delayed because it has now (in the send root) an ancestor with a higher inode number that was renamed. If we don't delay the directory rename in this case, the receiving end of the send stream will attempt to rename A to the old name of B before B got renamed to its new name, which results in a "directory not empty" error. So fix this by delaying directory renames for this case too.

	Steps to reproduce:

	  $ mkfs.btrfs -f /dev/sdb
	  $ mount /dev/sdb /mnt
	  $ mkdir /mnt/a
	  $ mkdir /mnt/b
	  $ mkdir /mnt/c
	  $ touch /mnt/a/file
	  $ btrfs subvolume snapshot -r /mnt /mnt/snap1
	  $ mv /mnt/c /mnt/x
	  $ mv /mnt/a /mnt/x/y
	  $ mv /mnt/b /mnt/a
	  $ btrfs subvolume snapshot -r /mnt /mnt/snap2
	  $ btrfs send /mnt/snap1 -f /tmp/1.send
	  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send
	  $ mkfs.btrfs -f /dev/sdc
	  $ mount /dev/sdc /mnt2
	  $ btrfs receive /mnt2 -f /tmp/1.send
	  $ btrfs receive /mnt2 -f /tmp/2.send
	  ERROR: rename b -> a failed. Directory not empty

	A test case for xfstests follows soon.

	Reported-by: Ames Cornish <ames@cornishes.net>
	Signed-off-by: Filipe Manana <fdmanana@suse.com>
	Signed-off-by: Chris Mason <clm@fb.com>
2015-03-02	btrfs: fix lost return value due to variable shadowing	David Sterba
	A block-local variable stores error code but btrfs_get_blocks_direct may not return it in the end as there's a ret defined in the function scope. CC: <stable@vger.kernel.org> # 3.6+ Fixes: d187663ef24c ("Btrfs: lock extents as we map them in DIO") Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
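The class of bug named in this commit can be shown with a short userspace sketch. The functions below are hypothetical stand-ins, not btrfs_get_blocks_direct(); they only illustrate how a block-local `ret` that shadows the function-scope `ret` causes an error code to be dropped.

```c
#include <assert.h>

/* Hypothetical helper: fails with -5 (standing in for -EIO) on request. */
static int demo_step(int fail)
{
	assert(fail == 0 || fail == 1);
	return fail ? -5 : 0;
}

/* Buggy: the inner 'ret' shadows the outer one, so the error is lost. */
static int demo_buggy(int fail)
{
	int ret = 0;

	if (fail) {
		int ret = demo_step(fail);	/* new variable, block scope only */

		if (ret)
			goto out;
	}
out:
	return ret;	/* always the outer 'ret', which is still 0 */
}

/* Fixed: assign to the function-scope variable instead of redeclaring it. */
static int demo_fixed(int fail)
{
	int ret = 0;

	if (fail) {
		ret = demo_step(fail);
		if (ret)
			goto out;
	}
out:
	return ret;
}
```

Here `demo_buggy(1)` returns 0 even though the step failed, while `demo_fixed(1)` returns -5. Building with a shadowing warning enabled (for example GCC's `-Wshadow`) flags the buggy variant.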
2015-03-02	Btrfs: do not ignore errors from btrfs_lookup_xattr in do_setxattr	Filipe Manana
	The return value from btrfs_lookup_xattr() can be a pointer encoding an error, therefore deal with it. This fixes commit 5f5bc6b1e2d5 ("Btrfs: make xattr replace operations atomic"). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
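A minimal userspace sketch of the "pointer encoding an error" convention this commit refers to. The helpers below mimic the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), but they are re-implemented here under hypothetical `demo_*` names, and `demo_lookup()` is an invented stand-in rather than btrfs_lookup_xattr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static void *demo_err_ptr(long err)
{
	assert(err < 0 && err >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_item {
	int value;
};

static struct demo_item demo_found = { .value = 1 };

/* Hypothetical lookup: mode 0 = found, 1 = not found, 2 = lookup failed. */
static struct demo_item *demo_lookup(int mode)
{
	if (mode == 2)
		return demo_err_ptr(-12);	/* -12 stands in for -ENOMEM */
	return mode == 0 ? &demo_found : NULL;
}

/* The caller has three outcomes to handle, not two. */
static long demo_set(int mode)
{
	struct demo_item *it = demo_lookup(mode);

	if (demo_is_err(it))
		return demo_ptr_err(it);	/* propagate the encoded error */
	if (!it)
		return -61;			/* stands in for -ENODATA */
	it->value++;
	return 0;
}
```

A caller that only tests for NULL would treat the error pointer as a valid object and dereference it, which is why the error check has to come first.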
2015-03-02	Btrfs: fix off-by-one logic error in btrfs_realloc_node	Filipe Manana
	The end_slot variable actually matches the number of pointers in the node and not the last slot (which is 'nritems - 1'). Therefore in order to check that the current slot in the for loop doesn't match the last one, the correct logic is to check if 'i' is less than 'end_slot - 1' and not 'end_slot - 2'. Fix this and set end_slot to be 'nritems - 1', as it's less confusing since the variable name implies it's inclusive rather than exclusive. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
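The bound arithmetic above can be checked with a small userspace sketch. It is not the btrfs_realloc_node() code: the two hypothetical functions simply count how many slots pass the "not the last slot" test under the old and the new convention for `end_slot`.

```c
#include <assert.h>

/* Old convention: end_slot is a count (exclusive), but the test uses '- 2'. */
static int demo_neighbors_buggy(int nritems)
{
	int end_slot = nritems;		/* number of pointers, not an index */
	int checks = 0;

	assert(nritems >= 0);
	for (int i = 0; i < end_slot; i++)
		if (i < end_slot - 2)	/* wrongly skips the second-to-last slot */
			checks++;
	return checks;
}

/* New convention: end_slot is the index of the last slot (inclusive). */
static int demo_neighbors_fixed(int nritems)
{
	int end_slot = nritems - 1;
	int checks = 0;

	assert(nritems >= 0);
	for (int i = 0; i <= end_slot; i++)
		if (i < end_slot)	/* true for every slot except the last */
			checks++;
	return checks;
}
```

With five items, the fixed version accepts slots 0 through 3, while the buggy version accepts only slots 0 through 2.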
2015-03-02	Btrfs: add missing inode update when punching hole	Filipe Manana
	When punching a file hole, if we end up only zeroing parts of a page, because the start offset isn't a multiple of the sector size or the start offset and length fall within the same page, we were not updating the inode item. This prevented an fsync from doing anything, if no other file changes happened in the current transaction, because the fields in btrfs_inode used to check if the inode needs to be fsync'ed weren't updated.

	This issue is easy to reproduce and the following excerpt from the xfstest case I made shows how to trigger it:

	  _scratch_mkfs >> $seqres.full 2>&1
	  _init_flakey
	  _mount_flakey

	  # Create our test file.
	  $XFS_IO_PROG -f -c "pwrite -S 0x22 -b 16K 0 16K" \
	      $SCRATCH_MNT/foo | _filter_xfs_io

	  # Fsync the file, this makes btrfs update some btrfs inode specific fields
	  # that are used to track if the inode needs to be written/updated to the fsync
	  # log or not. After this fsync, the new values for those fields indicate that
	  # a subsequent fsync does not need to touch the fsync log.
	  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

	  # Force a commit of the current transaction. After this point, any operation
	  # that modifies the data or metadata of our file, should update those fields in
	  # the btrfs inode with values that make the next fsync operation write to the
	  # fsync log.
	  sync

	  # Punch a hole in our file. This small range affects only 1 page.
	  # This made the btrfs hole punching implementation write only some zeroes in
	  # one page, but it did not update the btrfs inode fields used to determine if
	  # the next fsync needs to write to the fsync log.
	  $XFS_IO_PROG -c "fpunch 8000 4K" $SCRATCH_MNT/foo

	  # Another variation of the previously mentioned case.
	  $XFS_IO_PROG -c "fpunch 15000 100" $SCRATCH_MNT/foo

	  # Now fsync the file. This was a no-operation because the previous hole punch
	  # operation didn't update the inode's fields mentioned before, so they remained
	  # with the values they had after the first fsync - that is, they indicate that
	  # it is not needed to write to fsync log.
	  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

	  echo "File content before:"
	  od -t x1 $SCRATCH_MNT/foo

	  # Simulate a crash/power loss.
	  _load_flakey_table $FLAKEY_DROP_WRITES
	  _unmount_flakey

	  # Enable writes and mount the fs. This makes the fsync log replay code run.
	  _load_flakey_table $FLAKEY_ALLOW_WRITES
	  _mount_flakey

	  # Because the last fsync didn't do anything, here the file content matched what
	  # it was after the first fsync, before the holes were punched, and not what it
	  # was after the holes were punched.
	  echo "File content after:"
	  od -t x1 $SCRATCH_MNT/foo

	This issue has been around since 2012, when the punch hole implementation was added, commit 2aaa66558172 ("Btrfs: add hole punching").

	A test case for xfstests follows soon.

	Signed-off-by: Filipe Manana <fdmanana@suse.com>
	Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
	Signed-off-by: Chris Mason <clm@fb.com>
2015-03-02	Btrfs: abort the transaction if we fail to update the free space cache inode	Josef Bacik
	Our gluster boxes were hitting a problem where they'd run out of space when updating the block group cache and therefore wouldn't be able to update the free space inode. This is a problem because this is how we invalidate the cache and protect ourselves from errors further down the stack, so if this fails we have to abort the transaction so we make sure we don't end up with stale free space cache. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-03-02	Btrfs: fix fsync race leading to ordered extent memory leaks	Filipe Manana
	We can have multiple fsync operations against the same file during the same transaction and they can collect the same ordered extents while they don't complete (still accessible from the inode's ordered tree). If this happens, those ordered extents will never get their reference counts decremented to 0, leading to memory leaks and inode leaks (an iput for an ordered extent's inode is scheduled only when the ordered extent's refcount drops to 0). The following sequence diagram explains this race:

	        CPU 1                                       CPU 2

	btrfs_sync_file()
	                                            btrfs_sync_file()

	  mutex_lock(inode->i_mutex)
	  btrfs_log_inode()
	    btrfs_get_logged_extents()
	      --> collects ordered extent X
	      --> increments ordered
	          extent X's refcount
	    btrfs_submit_logged_extents()
	  mutex_unlock(inode->i_mutex)

	                                              mutex_lock(inode->i_mutex)
	  btrfs_sync_log()
	    btrfs_wait_logged_extents()
	      --> list_del_init(&ordered->log_list)
	                                              btrfs_log_inode()
	                                                btrfs_get_logged_extents()
	                                                  --> Adds ordered extent X
	                                                      to logged_list because
	                                                      at this point:
	                                                      list_empty(&ordered->log_list)
	                                                      && test_bit(BTRFS_ORDERED_LOGGED,
	                                                                  &ordered->flags) == 0
	                                                  --> Increments ordered extent
	                                                      X's refcount
	      --> check if ordered extent's io is
	          finished or not, start it if
	          necessary and wait for it to finish
	      --> sets bit BTRFS_ORDERED_LOGGED
	          on ordered extent X's flags
	          and adds it to trans->ordered
	  btrfs_sync_log() finishes

	                                                btrfs_submit_logged_extents()
	                                              btrfs_log_inode() finishes
	                                              mutex_unlock(inode->i_mutex)

	btrfs_sync_file() finishes

	                                              btrfs_sync_log()
	                                                btrfs_wait_logged_extents()
	                                                  --> Sees ordered extent X has the
	                                                      bit BTRFS_ORDERED_LOGGED set in
	                                                      its flags
	                                                  --> X's refcount is untouched
	                                              btrfs_sync_log() finishes

	                                            btrfs_sync_file() finishes

	                                            btrfs_commit_transaction()
	                                              --> called by transaction kthread for e.g.
	                                              btrfs_wait_pending_ordered()
	                                                --> waits for ordered extent X to
	                                                    complete
	                                                --> decrements ordered extent X's
	                                                    refcount by 1 only, corresponding
	                                                    to the increment done by the fsync
	                                                    task ran by CPU 1

	In the scenario of the above diagram, after the transaction commit, the ordered extent will remain with a refcount of 1 forever, leaking the ordered extent structure and preventing the i_count of its inode from ever decreasing to 0, since the delayed iput is scheduled only when the ordered extent's refcount drops to 0, preventing the inode from ever being evicted by the VFS.

	Fix this by using the flag BTRFS_ORDERED_LOGGED differently. Use it to mean that an ordered extent is already being processed by an fsync call, which will attach it to the current transaction, preventing it from being collected by subsequent fsync operations against the same inode.

	This race was introduced with the following change (added in 3.19 and backported to stable 3.18 and 3.17):

	  Btrfs: make sure logged extents complete in the current transaction V3
	  commit 50d9aa99bd35c77200e0e3dd7a72274f8304701f

	I ran into this issue while running xfstests/generic/113 in a loop, which failed about 1 out of 10 runs with the following warning in dmesg:

	[ 2612.440038] WARNING: CPU: 4 PID: 22057 at fs/btrfs/disk-io.c:3558 free_fs_root+0x36/0x133 [btrfs]()
	[ 2612.442810] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop processor parport_pc parport psmouse thermal_sys i2c_piix4 serio_raw pcspkr evdev microcode button i2c_core ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom virtio_scsi ata_generic virtio_pci ata_piix virtio_ring libata virtio floppy e1000 scsi_mod [last unloaded: btrfs]
	[ 2612.452711] CPU: 4 PID: 22057 Comm: umount Tainted: G W 3.19.0-rc5-btrfs-next-4+ #1
	[ 2612.454921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
	[ 2612.457709] 0000000000000009 ffff8801342c3c78 ffffffff8142425e ffff88023ec8f2d8
	[ 2612.459829] 0000000000000000 ffff8801342c3cb8 ffffffff81045308 ffff880046460000
	[ 2612.461564] ffffffffa036da56 ffff88003d07b000 ffff880046460000 ffff880046460068
	[ 2612.463163] Call Trace:
	[ 2612.463719] [<ffffffff8142425e>] dump_stack+0x4c/0x65
	[ 2612.464789] [<ffffffff81045308>] warn_slowpath_common+0xa1/0xbb
	[ 2612.466026] [<ffffffffa036da56>] ? free_fs_root+0x36/0x133 [btrfs]
	[ 2612.467247] [<ffffffff810453c5>] warn_slowpath_null+0x1a/0x1c
	[ 2612.468416] [<ffffffffa036da56>] free_fs_root+0x36/0x133 [btrfs]
	[ 2612.469625] [<ffffffffa036f2a7>] btrfs_drop_and_free_fs_root+0x93/0x9b [btrfs]
	[ 2612.471251] [<ffffffffa036f353>] btrfs_free_fs_roots+0xa4/0xd6 [btrfs]
	[ 2612.472536] [<ffffffff8142612e>] ? wait_for_completion+0x24/0x26
	[ 2612.473742] [<ffffffffa0370bbc>] close_ctree+0x1f3/0x33c [btrfs]
	[ 2612.475477] [<ffffffff81059d1d>] ? destroy_workqueue+0x148/0x1ba
	[ 2612.476695] [<ffffffffa034e3da>] btrfs_put_super+0x19/0x1b [btrfs]
	[ 2612.477911] [<ffffffff81153e53>] generic_shutdown_super+0x73/0xef
	[ 2612.479106] [<ffffffff811540e2>] kill_anon_super+0x13/0x1e
	[ 2612.480226] [<ffffffffa034e1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
	[ 2612.481471] [<ffffffff81154307>] deactivate_locked_super+0x3b/0x50
	[ 2612.482686] [<ffffffff811547a7>] deactivate_super+0x3f/0x43
	[ 2612.483791] [<ffffffff8116b3ed>] cleanup_mnt+0x59/0x78
	[ 2612.484842] [<ffffffff8116b44c>] __cleanup_mnt+0x12/0x14
	[ 2612.485900] [<ffffffff8105d019>] task_work_run+0x8f/0xbc
	[ 2612.486960] [<ffffffff810028d8>] do_notify_resume+0x5a/0x6b
	[ 2612.488083] [<ffffffff81236e5b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
	[ 2612.489333] [<ffffffff8142a17f>] int_signal+0x12/0x17
	[ 2612.490353] ---[ end trace 54a960a6bdcb8d93 ]---
	[ 2612.557253] VFS: Busy inodes after unmount of sdb. Self-destruct in 5 seconds. Have a nice day...

	Kmemleak confirmed the ordered extent leak (and btrfs inode specific structures such as delayed nodes):

	$ cat /sys/kernel/debug/kmemleak
	unreferenced object 0xffff880154290db0 (size 576):
	  comm "btrfsck", pid 21980, jiffies 4295542503 (age 1273.412s)
	  hex dump (first 32 bytes):
	    01 40 00 00 01 00 00 00 b0 1d f1 4e 01 88 ff ff .@.........N....
	    00 00 00 00 00 00 00 00 c8 0d 29 54 01 88 ff ff ..........)T....
	  backtrace:
	    [<ffffffff8141d74d>] kmemleak_update_trace+0x4c/0x6a
	    [<ffffffff8122f2c0>] radix_tree_node_alloc+0x6d/0x83
	    [<ffffffff8122fb26>] __radix_tree_create+0x109/0x190
	    [<ffffffff8122fbdd>] radix_tree_insert+0x30/0xac
	    [<ffffffffa03b9bde>] btrfs_get_or_create_delayed_node+0x130/0x187 [btrfs]
	    [<ffffffffa03bb82d>] btrfs_delayed_delete_inode_ref+0x32/0xac [btrfs]
	    [<ffffffffa0379dae>] __btrfs_unlink_inode+0xee/0x288 [btrfs]
	    [<ffffffffa037c715>] btrfs_unlink_inode+0x1e/0x40 [btrfs]
	    [<ffffffffa037c797>] btrfs_unlink+0x60/0x9b [btrfs]
	    [<ffffffff8115d7f0>] vfs_unlink+0x9c/0xed
	    [<ffffffff8115f5de>] do_unlinkat+0x12c/0x1fa
	    [<ffffffff811601a7>] SyS_unlinkat+0x29/0x2b
	    [<ffffffff81429e92>] system_call_fastpath+0x12/0x17
	    [<ffffffffffffffff>] 0xffffffffffffffff
	unreferenced object 0xffff88014ef11db0 (size 576):
	  comm "rm", pid 22009, jiffies 4295542593 (age 1273.052s)
	  hex dump (first 32 bytes):
	    02 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
	    00 00 00 00 00 00 00 00 c8 1d f1 4e 01 88 ff ff ...........N....
	  backtrace:
	    [<ffffffff8141d74d>] kmemleak_update_trace+0x4c/0x6a
	    [<ffffffff8122f2c0>] radix_tree_node_alloc+0x6d/0x83
	    [<ffffffff8122fb26>] __radix_tree_create+0x109/0x190
	    [<ffffffff8122fbdd>] radix_tree_insert+0x30/0xac
	    [<ffffffffa03b9bde>] btrfs_get_or_create_delayed_node+0x130/0x187 [btrfs]
	    [<ffffffffa03bb82d>] btrfs_delayed_delete_inode_ref+0x32/0xac [btrfs]
	    [<ffffffffa0379dae>] __btrfs_unlink_inode+0xee/0x288 [btrfs]
	    [<ffffffffa037c715>] btrfs_unlink_inode+0x1e/0x40 [btrfs]
	    [<ffffffffa037c797>] btrfs_unlink+0x60/0x9b [btrfs]
	    [<ffffffff8115d7f0>] vfs_unlink+0x9c/0xed
	    [<ffffffff8115f5de>] do_unlinkat+0x12c/0x1fa
	    [<ffffffff811601a7>] SyS_unlinkat+0x29/0x2b
	    [<ffffffff81429e92>] system_call_fastpath+0x12/0x17
	    [<ffffffffffffffff>] 0xffffffffffffffff
	unreferenced object 0xffff8800336feda8 (size 584):
	  comm "aio-stress", pid 22031, jiffies 4295543006 (age 1271.400s)
	  hex dump (first 32 bytes):
	    00 40 3e 00 00 00 00 00 00 00 8f 42 00 00 00 00 .@>........B....
	    00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 ................
	  backtrace:
	    [<ffffffff8114eb34>] create_object+0x172/0x29a
	    [<ffffffff8141d790>] kmemleak_alloc+0x25/0x41
	    [<ffffffff81141ae6>] kmemleak_alloc_recursive.constprop.52+0x16/0x18
	    [<ffffffff81145288>] kmem_cache_alloc+0xf7/0x198
	    [<ffffffffa0389243>] __btrfs_add_ordered_extent+0x43/0x309 [btrfs]
	    [<ffffffffa038968b>] btrfs_add_ordered_extent_dio+0x12/0x14 [btrfs]
	    [<ffffffffa03810e2>] btrfs_get_blocks_direct+0x3ef/0x571 [btrfs]
	    [<ffffffff81181349>] do_blockdev_direct_IO+0x62a/0xb47
	    [<ffffffff8118189a>] __blockdev_direct_IO+0x34/0x36
	    [<ffffffffa03776e5>] btrfs_direct_IO+0x16a/0x1e8 [btrfs]
	    [<ffffffff81100373>] generic_file_direct_write+0xb8/0x12d
	    [<ffffffffa038615c>] btrfs_file_write_iter+0x24b/0x42f [btrfs]
	    [<ffffffff8118bb0d>] aio_run_iocb+0x2b7/0x32e
	    [<ffffffff8118c99a>] do_io_submit+0x26e/0x2ff
	    [<ffffffff8118ca3b>] SyS_io_submit+0x10/0x12
	    [<ffffffff81429e92>] system_call_fastpath+0x12/0x17

	CC: <stable@vger.kernel.org> # 3.19, 3.18 and 3.17
	Signed-off-by: Filipe Manana <fdmanana@suse.com>
	Signed-off-by: Chris Mason <clm@fb.com>
2015-03-02	KVM: SVM: fix interrupt injection (apic->isr_count always 0)	Radim Krčmář
	In commit b4eef9b36db4, we started to use hwapic_isr_update() != NULL instead of kvm_apic_vid_enabled(vcpu->kvm). This didn't work because SVM had it defined and "apicv" path in apic_{set,clear}_isr() does not change apic->isr_count, because it should always be 1. The initial value of apic->isr_count was based on kvm_apic_vid_enabled(vcpu->kvm), which is always 0 for SVM, so KVM could have injected interrupts when it shouldn't. Fix it by implicitly setting SVM's hwapic_isr_update to NULL and make the initial isr_count depend on hwapic_isr_update() for good measure. Fixes: b4eef9b36db4 ("kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv") Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-02	Merge tag 'md/4.0-fixes' of git://neil.brown.name/md	Linus Torvalds
	Pull md fixes from Neil Brown:
	"Three md fixes:
	 - fix a read-balance problem that was reported 2 years ago, but that I never noticed the report :-(
	 - fix for rare RAID6 problem causing incorrect bitmap updates when two devices fail.
	 - add __ATTR_PREALLOC annotation now that it is possible"

	* tag 'md/4.0-fixes' of git://neil.brown.name/md:
	  md: mark some attributes as pre-alloc
	  raid5: check faulty flag for array status during recovery.
	  md/raid1: fix read balance when a drive is write-mostly.
2015-03-02	Merge tag 'metag-fixes-v4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag	Linus Torvalds
	Pull arch/metag fix from James Hogan:
	"This is just a single patch to fix the KSTK_EIP() and KSTK_ESP() macros for metag which have always been erroneously returning the PC and stack pointer of the task's kernel context rather than from its user context saved at entry from userland into the kernel, which affects the contents of /proc/<pid>/maps and /proc/<pid>/stat"

	* tag 'metag-fixes-v4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
	  metag: Fix KSTK_EIP() and KSTK_ESP() macros
2015-03-02	cpuidle: Clean up fallback handling in cpuidle_idle_call()	Rafael J. Wysocki
	Move the fallback code path in cpuidle_idle_call() to the end of the function to avoid jumping to a label in an if () branch. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-02	Merge branch 'mlx4'	David S. Miller
	Or Gerlitz says: ==================== Mellanox driver fixes Two small fixes, please apply to net. Both patches should go to 3.19.y too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02	net/mlx4_en: Disbale GRO for incoming loopback/selftest packets	Ido Shamay
	Packets which are sent from the selftest (ethtool) flow, should not be passed to GRO stack but rather dropped by the driver after validation. To achieve that, we disable GRO for the duration of the selftest. Fixes: dd65beac48a5 ("net/mlx4_en: Extend usage of napi_gro_frags") Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02	net/mlx4_core: Fix wrong mask and error flow for the update-qp command	Or Gerlitz
	The bit mask for currently supported driver features (MLX4_UPDATE_QP_SUPPORTED_ATTRS) of the update-qp command was defined twice (using enum value and pre-processor define directive) and wrong. The return value of the call to mlx4_update_qp() from within the SRIOV resource-tracker was wrongly voided down. Fix both issues. issue: none Fixes: 09e05c3f78e9 ('net/mlx4: Set vlan stripping policy by the right command') Fixes: ce8d9e0d6746 ('net/mlx4_core: Add UPDATE_QP SRIOV wrapper support') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02	dmaengine: shdma: Move DMA stop to (runtime) suspend callbacks	Geert Uytterhoeven
	During system reboot, the sh-dma-engine device may be runtime-suspended, causing a crash:

	    Unhandled fault: imprecise external abort (0x1406) at 0x0002c02c
	    Internal error: : 1406 [#1] SMP ARM
	    ...
	    PC is at sh_dmae_ctl_stop+0x28/0x64
	    LR is at sh_dmae_ctl_stop+0x24/0x64

	If the sh-dma-engine is runtime-suspended, its module clock is turned off, and its registers cannot be accessed.

	To fix this, move the call to sh_dmae_ctl_stop(), which touches the DMAOR register, to the sh_dmae_suspend() and sh_dmae_runtime_suspend() callbacks. This makes PM operations more symmetric, as both sh_dmae_resume() and sh_dmae_runtime_resume() already call sh_dmae_rst() to re-initialize the DMAOR register.

	Remove sh_dmae_shutdown(), as it became empty.

	Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
	Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
	Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-03-02	ASoC: sam9g20_wm8731: drop machine_is_xxx	Alexandre Belloni
	Atmel based boards can now only be used with device tree. Drop non DT initialization. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-03-02	Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent	Ingo Molnar
	Pull EFI fixes from Matt Fleming:
	" - Fix regression in DMI sysfs code for handling "End of Table" entry and a type bug that could lead to integer overflow. (Ivan Khoronzhuk)
	  - Fix boundary checking in efi_high_alloc() which can lead to memory corruption in the EFI boot stubs. (Yinghai Lu)"

	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-02	MAINTAINERS: Add entry for SAMSUNG THERMAL DRIVER	Lukasz Majewski
	This patch adds an entry for SAMSUNG THERMAL DRIVER in the MAINTAINERS file. It has been agreed that pull requests are going to be sent to Eduardo Valentin. Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Acked-by: Eduardo Valentin <edubezval@gmail.com>
2015-03-02	cpufreq: exynos: Use simple approach to asses if cpu cooling can be used	Lukasz Majewski
	Commit e725d26c4857e5e41975b5e74e64ce6ab09a7121 made it possible to use the device tree to assess whether a cpu can be used as a cooling device. Since the code was somewhat awkward, a simpler approach has been proposed. Test HW: Exynos 4412 - Odroid U3. Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2015-03-02	thermal: exynos: Fix wrong control of power down detection mode for Exynos7	Chanwoo Choi
	This patch fixes the wrong control of PD_DET_EN (power down detection mode) for Exynos7, because exynos7_tmu_control() always enables the power down detection mode regardless of the 'on' parameter. Cc: Zhang Rui <rui.zhang@intel.com> Cc: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Abhilash Kesavan <a.kesavan@samsung.com> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
2015-03-02	USB: ch341: set tty baud speed according to tty struct	Nicolas PLANEL
	The ch341_set_baudrate() function initializes the device baud speed according to the value in priv->baud_rate. By default, ch341_open() sets it to a hardcoded value (DEFAULT_BAUD_RATE 9600). Unfortunately, the tty_struct is not initialized with the same default value (usually 56700). This means that the tty_struct and the device baud rate generator are not synchronized after opening the port. The fix is to call ch341_set_termios() if the tty exists. Remove the unnecessary priv->baud_rate setup, as it's already done by ch341_port_probe(). Remove the unnecessary calls to ch341_set_{handshake,baudrate}() in ch341_open(), as they are already called in ch341_configure() and ch341_set_termios(). Signed-off-by: Nicolas PLANEL <nicolas.planel@enovance.com> Signed-off-by: Johan Hovold <johan@kernel.org>
2015-03-01	NFSv4: Don't call put_rpccred() under the rcu_read_lock()	Trond Myklebust
	put_rpccred() can sleep. Fixes: 8f649c3762547 ("NFSv4: Fix the locking in nfs_inode_reclaim_delegation()") Cc: stable@vger.kernel.org # 2.6.35+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01	NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache()	Trond Myklebust
	If the server does not return a valid set of attributes that we can use to either create a file or refresh the inode, then there is no value in calling nfs_prime_dcache(). However if we're just refreshing the inode using the attributes that the server returned, then it shouldn't matter whether or not we have a filehandle, as long as we check the fsid+fileid combination. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01	NFSv3: Use the readdir fileid as the mounted-on-fileid	Trond Myklebust
	When we call readdirplus, set the fileid normally returned by readdir as the mounted-on-fileid, since that is commonly the case if there is a mountpoint. To ensure that we get it right, we only set the flag if the readdir fileid differs from the one returned in the readdirplus attributes. This again means that we can avoid the issues described in commit 2ef47eb1aee17 ("NFS: Fix use of nfs_attr_use_mounted_on_fileid()"), which only fixed NFSv4. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01	NFS: Don't invalidate a submounted dentry in nfs_prime_dcache()	Trond Myklebust
	If we're traversing a directory which contains a submounted filesystem, or one that has a referral, the NFS server that is processing the READDIR request will often return information for the underlying (mounted-on) directory. It may, or may not, also return filehandle information. If this happens, and the lookup in nfs_prime_dcache() returns the dentry for the submounted directory, the filehandle comparison will fail, and we call d_invalidate(). Post-commit 8ed936b5671bf ("vfs: Lazily remove mounts on unlinked files and directories."), this means the entire subtree is unmounted. The following minimal patch addresses this problem by punting on the invalidation if there is a submount. Kudos to Neil Brown <neilb@suse.de> for having tracked down this issue (see link). Reported-by: Nix <nix@esperi.org.uk> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-01	NFSv4: Set a barrier in the update_changeattr() helper	Trond Myklebust
	Ensure that we don't regress the changes that were made to the directory. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	NFS: Fix nfs_post_op_update_inode() to set an attribute barrier	Trond Myklebust
	nfs_post_op_update_inode() is called after a self-induced attribute update. Ensure that it also sets the barrier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	NFS: Remove size hack in nfs_inode_attrs_need_update()	Trond Myklebust
	Prior to this patch, we used to always OK attribute updates that extended the file size on the assumption that we might be performing writeback. Now that we have attribute barriers to protect the writeback related updates, we should remove this hack, as it can cause truncate() operations to apparently be reverted if/when a readahead or getattr RPC call races with our on-the-wire SETATTR. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommit	Trond Myklebust
	Ensure that other operations that race with delegreturn and layoutcommit cannot revert the attribute updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	NFS: Add attribute update barriers to NFS writebacks	Trond Myklebust
	Ensure that other operations that race with our write RPC calls cannot revert the file size updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	NFS: Set an attribute barrier on all updates	Trond Myklebust
	Ensure that we update the attribute barrier even if there were no invalidations, provided that this value is newer than the old one. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	NFS: Add attribute update barriers to nfs_setattr_update_inode()	Trond Myklebust
	Ensure that other operations which raced with our setattr RPC call cannot revert the file attribute changes that were made on the server. To do so, we artificially bump the attribute generation counter on the inode so that all calls to nfs_fattr_init() that precede ours will be dropped. The motivation for the patch came from Chuck Lever's reports of readaheads racing with truncate operations and causing the file size to be reverted. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
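The barrier idea described above can be sketched in user-space C. This is a minimal illustration, not the NFS client code: the `sketch_*` names, the two structs and the single global counter are all assumptions made for the example. Each reply buffer is stamped when its RPC is set up, and bumping the counter at setattr time makes every earlier stamp stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the inode and fattr state. */
struct sketch_inode {
	uint64_t size;
	uint64_t attr_barrier;	/* updates stamped at or before this are stale */
};

struct sketch_fattr {
	uint64_t size;
	uint64_t gencount;	/* stamp taken when the RPC was set up */
};

static uint64_t sketch_generation;

/* Plays the role of nfs_fattr_init(): stamp the buffer before the RPC. */
static void sketch_fattr_init(struct sketch_fattr *f)
{
	f->size = 0;
	f->gencount = ++sketch_generation;
}

/* Bump the counter so that every previously stamped fattr becomes stale. */
static void sketch_set_barrier(struct sketch_inode *inode)
{
	inode->attr_barrier = ++sketch_generation;
}

/* Apply the attributes only if they were stamped after the barrier. */
static bool sketch_update_inode(struct sketch_inode *inode,
				const struct sketch_fattr *f)
{
	if (f->gencount <= inode->attr_barrier)
		return false;
	inode->size = f->size;
	return true;
}
```

In this sketch, a readahead reply that was stamped before a truncate set the barrier is dropped, while a reply stamped afterwards is applied.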
2015-03-01	NFS: Add a helper to set attribute barriers	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	NFS: Ensure that buffered writes wait for O_DIRECT writes to complete	Trond Myklebust
	The O_DIRECT code will grab the inode->i_mutex and flush out buffered writes, before scheduling a read or a write. However there is no equivalent in the buffered write code to wait for O_DIRECT to complete. Fixes a reported issue in xfstests generic/133, when first performing an O_DIRECT write followed by a buffered write. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2015-03-01	mei: make device disabled on stop unconditionally	Alexander Usyskin
	Set the internal device state to disabled after hardware reset in the stop flow. This covers cases where the driver was not brought to the disabled state because of an error and, in the stop flow, we do not wish to retry the reset. Cc: <stable@vger.kernel.org> #3.10+ Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01	staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel	Ian Abbott
	Reading of analog input channels by the `INSN_READ` comedi instruction is broken for all except channel 0. `pci171x_ai_insn_read()` calls `pci171x_ai_read_sample()` with the wrong value for the third parameter. It is supposed to be the current index in a channel list (which is always of length 1 in this case, so the index should be 0), but instead it is passing the actual channel number. `pci171x_ai_read_sample()` checks the channel number encoded in the raw sample value read from the hardware matches the channel number stored in the specified index of the previously set up channel list and returns `-ENODATA` if it doesn't match. Since the index should always be 0 in this case, the match will fail unless the channel number is also 0. Fix it by passing 0 as the channel index. Note that when the bug first appeared, it was `pci171x_ai_dropout()` that was called with the wrong parameter value. `pci171x_ai_dropout()` got replaced with `pci171x_ai_read_sample()` in commit 7fd2dae2500d ("staging: comedi: adv_pci1710: introduce pci171x_ai_read_sample()"). Fixes: 16c7eb6047bb ("staging: comedi: adv_pci1710: always enable PCI171x_PARANOIDCHECK code") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Cc: stable <stable@vger.kernel.org> # 3.16+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
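The index-versus-channel mix-up can be reproduced with a small stand-alone sketch. The struct, the `sketch_*` names and the raw sample layout (channel in bits 15..12, data in bits 11..0) are assumptions for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_CHANLIST_MAX 16

/* Hypothetical driver state: the channel list set up before the read. */
struct sketch_dev {
	unsigned int act_chanlist[SKETCH_CHANLIST_MAX];
};

/*
 * Check that the channel encoded in the raw sample matches the entry at
 * position cur_chan of the channel list, then extract the data bits.
 * cur_chan is an index into the list, not a channel number.
 */
static int sketch_read_sample(const struct sketch_dev *dev, unsigned int raw,
			      unsigned int cur_chan, unsigned int *val)
{
	unsigned int chan = (raw >> 12) & 0xf;

	if (chan != dev->act_chanlist[cur_chan])
		return -ENODATA;
	*val = raw & 0x0fff;
	return 0;
}
```

With a one-entry channel list holding channel 3, passing index 0 succeeds, whereas passing the channel number 3 as the index compares against an unrelated list slot and fails with `-ENODATA`.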
2015-03-01	android: binder: fix binder mmap failures	Andrey Ryabinin
	binder_update_page_range() initializes only addr and size fields in 'struct vm_struct tmp_area;' and passes it to map_vm_area(). Before 71394fe50146 ("mm: vmalloc: add flag preventing guard hole allocation") this worked because map_vm_area() didn't use any other fields in vm_struct except addr and size. Now get_vm_area_size() (used in map_vm_area()) reads vm_struct's flags to determine whether the vm area has a guard hole or not. binder_update_page_range() doesn't initialize the flags field, so this causes the following binder mmap failures: -----------[ cut here ]------------ WARNING: CPU: 0 PID: 1971 at mm/vmalloc.c:130 vmap_page_range_noflush+0x119/0x144() CPU: 0 PID: 1971 Comm: healthd Not tainted 4.0.0-rc1-00399-g7da3fdc-dirty #157 Hardware name: ARM-Versatile Express [<c001246d>] (unwind_backtrace) from [<c000f7f9>] (show_stack+0x11/0x14) [<c000f7f9>] (show_stack) from [<c049a221>] (dump_stack+0x59/0x7c) [<c049a221>] (dump_stack) from [<c001cf21>] (warn_slowpath_common+0x55/0x84) [<c001cf21>] (warn_slowpath_common) from [<c001cfe3>] (warn_slowpath_null+0x17/0x1c) [<c001cfe3>] (warn_slowpath_null) from [<c00c66c5>] (vmap_page_range_noflush+0x119/0x144) [<c00c66c5>] (vmap_page_range_noflush) from [<c00c716b>] (map_vm_area+0x27/0x48) [<c00c716b>] (map_vm_area) from [<c038ddaf>] (binder_update_page_range+0x12f/0x27c) [<c038ddaf>] (binder_update_page_range) from [<c038e857>] (binder_mmap+0xbf/0x1ac) [<c038e857>] (binder_mmap) from [<c00c2dc7>] (mmap_region+0x2eb/0x4d4) [<c00c2dc7>] (mmap_region) from [<c00c3197>] (do_mmap_pgoff+0x1e7/0x250) [<c00c3197>] (do_mmap_pgoff) from [<c00b35b5>] (vm_mmap_pgoff+0x45/0x60) [<c00b35b5>] (vm_mmap_pgoff) from [<c00c1f39>] (SyS_mmap_pgoff+0x5d/0x80) [<c00c1f39>] (SyS_mmap_pgoff) from [<c000ce81>] (ret_fast_syscall+0x1/0x5c) ---[ end trace 48c2c4b9a1349e54 ]--- binder: 1982: binder_alloc_buf failed to map page at f0e00000 in kernel binder: binder_mmap: 1982 b6bde000-b6cdc000 alloc small buf failed -12 Use map_kernel_range_noflush() instead of map_vm_area() as this is a better API for binder's purposes and it allows us to get rid of 'vm_struct tmp_area' altogether. Fixes: 71394fe50146 ("mm: vmalloc: add flag preventing guard hole allocation") Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reported-by: Amit Pundir <amit.pundir@linaro.org> Tested-by: Amit Pundir <amit.pundir@linaro.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A CR4-shadow 32-bit init fix, plus two typo fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup() x86/cpu/intel: Fix trivial typo in intel_tlb_table[]
2015-03-01	Merge branch 'timers-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Three clockevents/clocksource driver fixes" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: pxa: Fix section mismatch clocksource: mtk: Fix race conditions in probe code clockevents: asm9260: Fix compilation error with sparc/sparc64 allyesconfig
2015-03-01	Merge branch 'perf-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two kprobes fixes and a handful of tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Make sparc64 arch point to sparc perf symbols: Define EM_AARCH64 for older OSes perf top: Fix SIGBUS on sparc64 perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag perf tools: Fix pthread_attr_setaffinity_np build error perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check perf bench: Fix order of arguments to memcpy_alloc_mem kprobes/x86: Check for invalid ftrace location in __recover_probed_insn() kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace
2015-03-01	Merge branch 'locking-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "An rtmutex deadlock path fixlet" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Set state back to running on error
2015-03-01	Merge branch 'bcmgenet_systemport_stats'	David S. Miller
	Florian Fainelli says: ==================== net: bcmgenet and systemport statistics fixes These two patches fix a similar problem in the GENET and SYSTEMPORT drivers for software maintained statistics used to track DMA mapping and SKB re-allocation failures. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01	net: systemport: fix software maintained statistics	Florian Fainelli
	Commit 60b4ea1781fd ("net: systemport: log RX buffer allocation and RX/TX DMA failures") added a few software maintained statistics using BCM_SYSPORT_STAT_MIB_RX and BCM_SYSPORT_STAT_MIB_TX. These statistics are read from the hardware MIB counters, such that bcm_sysport_update_mib_counters() was trying to read from a non-existing MIB offset for these counters. Fix this by introducing a special type: BCM_SYSPORT_STAT_SOFT, similar to BCM_SYSPORT_STAT_NETDEV, such that bcm_sysport_get_ethtool_stats will read from the software mib. Fixes: 60b4ea1781fd ("net: systemport: log RX buffer allocation and RX/TX DMA failures") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01	net: bcmgenet: fix software maintained statistics	Florian Fainelli
	Commit 44c8bc3ce39f ("net: bcmgenet: log RX buffer allocation and RX/TX dma failures") added a few software maintained statistics using BCMGENET_STAT_MIB_RX and BCMGENET_STAT_MIB_TX. These statistics are read from the hardware MIB counters, such that bcmgenet_update_mib_counters() was trying to read from a non-existing MIB offset for these counters. Fix this by introducing a special type: BCMGENET_STAT_SOFT, similar to BCMGENET_STAT_NETDEV, such that bcmgenet_get_ethtool_stats will read from the software mib. Fixes: 44c8bc3ce39f ("net: bcmgenet: log RX buffer allocation and RX/TX dma failures") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
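The fix shared by the systemport and bcmgenet patches, a separate stat type for counters that exist only in driver memory, might look roughly like the following. The enum values, struct layout and `sketch_*` names are hypothetical; only the idea of skipping software counters during the hardware MIB refresh comes from the commit text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_stat_type {
	SKETCH_STAT_NETDEV,
	SKETCH_STAT_MIB_RX,	/* backed by a hardware MIB register */
	SKETCH_STAT_SOFT,	/* maintained by the driver in memory only */
};

struct sketch_stat {
	enum sketch_stat_type type;
	size_t reg_offset;	/* only meaningful for MIB types */
	uint32_t *value;
};

/* Refresh hardware-backed counters and leave all other types untouched. */
static void sketch_update_mib(const struct sketch_stat *stats, size_t n,
			      const uint32_t *hw_regs)
{
	for (size_t i = 0; i < n; i++) {
		if (stats[i].type != SKETCH_STAT_MIB_RX)
			continue;
		*stats[i].value = hw_regs[stats[i].reg_offset];
	}
}
```

A counter tagged `SKETCH_STAT_SOFT` keeps whatever value the driver accumulated, instead of being overwritten from a register offset that does not exist.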
2015-03-01	rxrpc: don't multiply with HZ twice	Florian Westphal
	rxrpc_resend_timeout has an initial value of 4 * HZ; use it as-is. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
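A small arithmetic sketch shows why the double multiplication matters. `SKETCH_HZ` and its value of 250 are assumptions, as are the function names; the only fact taken from the commit is that the timeout already holds `4 * HZ`.

```c
#include <assert.h>

#define SKETCH_HZ 250UL	/* assumed tick rate, in jiffies per second */

/* Already expressed in jiffies: four seconds' worth of ticks. */
static unsigned long sketch_resend_timeout = 4 * SKETCH_HZ;

/* Buggy form: scales a value that is already in jiffies by HZ again. */
static unsigned long sketch_expiry_buggy(unsigned long now)
{
	return now + sketch_resend_timeout * SKETCH_HZ;
}

/* Fixed form: uses the timeout as-is. */
static unsigned long sketch_expiry_fixed(unsigned long now)
{
	return now + sketch_resend_timeout;
}
```

At 250 ticks per second the fixed form expires 1000 jiffies (4 seconds) from now, while the buggy form expires 250000 jiffies (1000 seconds) from now.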
2015-03-01	rxrpc: terminate retrans loop when sending of skb fails	Florian Westphal
	Typo, 'stop' is never set to true. Seems intent is to not attempt to retransmit more packets after sendmsg returns an error. This change is based on code inspection only. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
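The intended loop behaviour, as the commit message reads it, can be sketched as follows. The callback type, the packet array and the `sketch_*` names are hypothetical and do not mirror the rxrpc data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send callback: returns 0 on success, negative on error. */
typedef int (*sketch_send_fn)(int pkt);

/* Example callback that fails for packet number 2 only. */
static int sketch_send_fail_on_2(int pkt)
{
	return pkt == 2 ? -1 : 0;
}

/* Retransmit packets in order; stop after the first send error. */
static size_t sketch_resend(const int *pkts, size_t n, sketch_send_fn send)
{
	bool stop = false;
	size_t attempts = 0;

	for (size_t i = 0; i < n && !stop; i++) {
		attempts++;
		if (send(pkts[i]) < 0)
			stop = true;	/* the assignment the typo left out */
	}
	return attempts;
}
```

If `stop` were never set, the loop would go on to attempt every remaining packet after the failure.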
2015-03-01	net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR ↵	Arvid Brodin
	interface. To repeat: $ sudo ip link del hsr0 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff8187f495>] hsr_del_port+0x15/0xa0 etc... Bug description: As part of the hsr master device destruction, hsr_del_port() is called for each of the hsr ports. At each such call, the master device is updated regarding features and mtu. When the master device is freed before the slave interfaces, master will be NULL in hsr_del_port(), which led to a NULL pointer dereference. Additionally, dev_put() was called on the master device itself in hsr_del_port(), causing a refcnt error. A third bug in the same code path was that the rtnl lock was not taken before hsr_del_port() was called as part of hsr_dev_destroy(). The reporter (Nicolas Dichtel) also said: "hsr_netdev_notify() supposes that the port will always be available when the notification is for an hsr interface. It's wrong. For example, netdev_wait_allrefs() may resend NETDEV_UNREGISTER.". As a precaution against this, a check for port == NULL was added in hsr_dev_notify(). Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Fixes: 51f3c605318b056a ("net/hsr: Move slave init to hsr_slave.c.") Signed-off-by: Arvid Brodin <arvid.brodin@alten.se> Signed-off-by: David S. Miller <davem@davemloft.net>
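Two of the fixes described above, tolerating a missing master and not dropping a reference on the master device itself, are sketched below in simplified form. The structs, the `is_master` flag and the function signature are assumptions for illustration and do not match the hsr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_netdev {
	int refcnt;
	unsigned int mtu;
};

struct sketch_port {
	struct sketch_netdev *dev;
	bool is_master;
};

/*
 * Remove one port. master may be NULL when the master port has already
 * been torn down, so it is only updated when it is still present.
 */
static void sketch_del_port(struct sketch_port *port,
			    struct sketch_port *master,
			    unsigned int remaining_mtu)
{
	if (master != NULL)
		master->dev->mtu = remaining_mtu;

	/* Only slave devices hold a reference that is dropped here. */
	if (!port->is_master)
		port->dev->refcnt--;
}
```

The third fix in the commit, taking the rtnl lock around the teardown, is not modelled in this sketch.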