git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2015-03-24	iommu/vt-d: Remove unused variable	Alex Williamson
	Unused after commit 71684406905f ("iommu/vt-d: Detach domain only from attached iommus"). Reported by 0-day builder. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-24	iwlwifi: Fix memory leak in iwl_req_fw_callback()	Larry Finger
	In this routine, kzalloc allocates a memory block. This allocation is freed in the error paths, but not in the normal exit, thus the allocation is leaked. The kmemleak facility was used to find the leak. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Cc: Intel Linux Wireless <ilw@linux.intel.com>
2015-03-24	x86/asm/entry/64: Fold syscall32_cpu_init() into its sole user	Denys Vlasenko
	Having syscall32/sysenter32 initialization in a separate tiny function, called from within a function that is already syscall init specific, serves no real purpose. Its existense also caused an unintended effect of having wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy function returning -ENOSYS, and immediately after (if CONFIG_IA32_EMULATION), we set it to point to the proper syscall32 entry point, ia32_cstar_target. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	tcp: prevent fetching dst twice in early demux code	Michal Kubeček
	On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux() struct dst_entry *dst = sk->sk_rx_dst; if (dst) dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie); to code reading sk->sk_rx_dst twice, once for the test and once for the argument of ip6_dst_check() (dst_check() is inline). This allows ip6_dst_check() to be called with null first argument, causing a crash. Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6 TCP early demux code. Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Fixes: c7109986db3c ("ipv6: Early TCP socket demux") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23	bnx2x: Fix statistics locking scheme	Yuval Mintz
	Statistics' state-machine in bnx2x driver must be synced with various driver flows, but its current locking scheme manages to be wasteful [using 2 locks + additional local variable] and prone to race-conditions at the same time, as the state-machine and 'action' are being accessed under different locks. In addition, current 'safe exec' isn't in fact safe, since the only guarantee it gives is that DMA transactions are over, but ramrods might still be running. This patch cleans up said logic, leaving us with a single lock for the entire flow and removing the possible races. Changes from v2: - Switched into mutex locking from semaphore locking. - Release locks on error flows. Changes from v1: Failure to acquire lock fails flow instead of printing a warning and allowing access to the critical section. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23	Merge tag 'linux-can-fixes-for-4.0-20150322' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2015-03-22 this is a pull-request of 7 patches for net/master. Ahmed S. Darwish fixes another two problems in the kvaser_usb driver. A patch by Colin Ian King for the gs_usb driver adds a missing check for kzalloc allocation failures. Two patches by Stephane Grosjean for the peak_usb driver add missing support for ISO / non-ISO mode switching. Andri Yngvason contributes a patch to fix the state handling in the flexcan driver. The last patch by Andreas Werner for the flexcan driver add missing EPROBE_DEFER handling for the transceiver regulator. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24	drm: Fixup racy refcounting in plane_force_disable	Daniel Vetter
	Originally it was impossible to be dropping the last refcount in this function since there was always one around still from the idr. But in commit 83f45fc360c8e16a330474860ebda872d1384c8c Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Aug 6 09:10:18 2014 +0200 drm: Don't grab an fb reference for the idr we've switched to weak references, broke that assumption but forgot to fix it up. Since we still force-disable planes it's only possible to hit this when racing multiple rmfb with fbdev restoring or similar evil things. As long as userspace is nice it's impossible to hit the BUG_ON. But the BUG_ON would most likely be hit from fbdev code, which usually invovles the console_lock besides all modeset locks. So very likely we'd never get the bug reports if this was hit in the wild, hence better be safe than sorry and backport. Spotted by Matt Roper while reviewing other patches. [airlied: pull this back into 4.0 - the oops happens there] Cc: stable@vger.kernel.org Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-03-23	kvm: avoid page allocation failure in kvm_set_memory_region()	Igor Mammedov
	KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23	KVM: x86: inline kvm_ioapic_handles_vector()	Radim Krčmář
	An overhead from function call is not appropriate for its size and frequency of execution. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23	Merge tag 'kvm-s390-next-20150318' of ↵	Marcelo Tosatti
	git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue KVM: s390: Features and fixes for 4.1 (kvm/next) 1. Fixes 2. Implement access register mode in KVM 3. Provide a userspace post handler for the STSI instruction 4. Provide an interface for compliant memory accesses 5. Provide an interface for getting/setting the guest storage key 6. Fixup for the vector facility patches: do not announce the vector facility in the guest for old QEMUs. 1-5 were initially shown as RFC in http://www.spinics.net/lists/kvm/msg114720.html some small review changes - added some ACKs - have the AR mode patches first - get rid of unnecessary AR_INVAL define - typos and language 6. two new patches The two new patches fixup the vector support patches that were introduced in the last pull request for QEMU versions that dont know about vector support and guests that do. (We announce the facility bit, but dont enable the facility so vector aware guests will crash on vector instructions).
2015-03-23	KVM: x86: call irq notifiers with directed EOI	Radim Krčmář
	kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled. We need to do that for irq notifiers. (Like with edge interrupts.) Fix it by skipping EOI broadcast only. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211 Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23	x86: kvm: Revert "remove sched notifier for cross-cpu migrations"	Marcelo Tosatti
	The following point: 2. per-CPU pvclock time info is updated if the underlying CPU changes. Is not true anymore since "KVM: x86: update pvclock area conditionally, on cpu migration". Add task migration notification back. Problem noticed by Andy Lutomirski. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> CC: stable@kernel.org # 3.11+
2015-03-23	dm: fix add_disk() NULL pointer due to race with free_dev()	Mike Snitzer
	Commit c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info") exposed DM to a latent race in free_dev() vs add_disk() in relation to management of the device's minor number. Fix this by refactoring free_dev() to match cleanup order of the alloc_dev() error path. Move cleanup of the gendisk, queue, and bdev to _before_ the cleanup of the idr managed minor number. Also, purely due to cleanup that fell out during the free_dev() audit: - adjust dm_blk_close() to access the gendisk's private_data under the _minor_lock spinlock. - move __dm_destroy()'s dm_get_live_table() call out from under the _minor_lock spinlock. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202449 Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-03-23	usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers	Lu Baolu
	When a device with an isochronous endpoint is plugged into the Intel xHCI host controller, and the driver submits multiple frames per URB, the xHCI driver will set the Block Event Interrupt (BEI) flag on all but the last TD for the URB. This causes the host controller to place an event on the event ring, but not send an interrupt. When the last TD for the URB completes, BEI is cleared, and we get an interrupt for the whole URB. However, under Intel xHCI host controllers, if the event ring is full of events from transfers with BEI set, an "Event Ring is Full" event will be posted to the last entry of the event ring, but no interrupt is generated. Host will cease all transfer and command executions and wait until software completes handling the pending events in the event ring. That means xHC stops, but event of "event ring is full" is not notified. As the result, the xHC looks like dead to user. This patch is to apply XHCI_AVOID_BEI quirk to Intel xHC devices. And it should be backported to kernels as old as 3.0, that contains the commit 69e848c2090a ("Intel xhci: Support EHCI/xHCI port switching."). Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alistair Grant <akgrant0710@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23	usb: xhci: handle Config Error Change (CEC) in xhci driver	Lu Baolu
	Linux xHCI driver doesn't report and handle port cofig error change. If Port Configure Error for root hub port occurs, CEC bit in PORTSC would be set by xHC and remains 1. This happends when the root port fails to configure its link partner, e.g. the port fails to exchange port capabilities information using Port Capability LMPs. Then the Port Status Change Events will be blocked until all status change bits(CEC is one of the change bits) are cleared('0') (refer to xHCI spec 4.19.2). Otherwise, the port status change event for this root port will not be generated anymore, then root port would look like dead for user and can't be recovered until a Host Controller Reset(HCRST). This patch is to check CEC bit in PORTSC in xhci_get_port_status() and set a Config Error in the return status if CEC is set. This will cause a ClearPortFeature request, where CEC bit is cleared in xhci_clear_port_change_bit(). [The commit log is based on initial Marvell patch posted at http://marc.info/?l=linux-kernel&m=142323612321434&w=2] Reported-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: stable <stable@vger.kernel.org> # v3.2+ Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23	Merge remote-tracking branches 'regulator/fix/doc' and ↵	Mark Brown
	'regulator/fix/palmas' into regulator-linus
2015-03-23	arm64: Use the reserved TTBR0 if context switching to the init_mm	Catalin Marinas
	The idle_task_exit() function may call switch_mm() with next == &init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so this patch simply sets the reserved TTBR0. Cc: <stable@vger.kernel.org> Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org> Tested-by: Jon Medhurst (Tixy) <tixy@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-23	Input: synaptics - add quirk for Thinkpad E440	Ramiro Morales
	Its ClickPad shares PNP ID "LEN2006" with the one in model E540 which is already handled by the driver (both are Haswell iterations of the Edge line, launched in 2014) but the dimensions it reports are different: $ sudo ./touchpad-edge-detector /dev/input/event3 Touchpad SynPS/2 Synaptics TouchPad on /dev/input/event3 Move one finger around the touchpad to detect the actual edges Kernel says: x [1472..5044], y [1408..3398] Touchpad sends: x [1024..5045], y [2457..4832] /^C Fortunately we can use the board ID, which is also different, to distinguish among them. $ dmesg \| grep -i synaptics psmouse serio1: synaptics: Touchpad model: 1, fw: 8.1, id: 0x1e2b1, caps: 0xd001a3/0x940300/0x127c00, board id: 2691, fw id: 1494646 psmouse serio1: synaptics: serio: Synaptics pass-through port at isa0060/serio1/input0 input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input4 Board ID in E540 is 2722: psmouse serio1: synaptics: Touchpad model: 1, fw: 8.1, id: 0x1e2b1, caps: 0xd001a3/0x940300/0x127c00, board id: 2722, fw id: 1484859 (from https://launchpadlibrarian.net/179702965/BootDmesg.txt) Signed-off-by: Ramiro Morales <cramm0@gmail.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-03-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Validate iov ranges before feeding them into iov_iter_init(), from Al Viro. 2) We changed copy_from_msghdr_from_user() to zero out the msg_namelen is a NULL pointer is given for the msg_name. Do the same in the compat code too. From Catalin Marinas. 3) Fix partially initialized tuples in netfilter conntrack helper, from Ian Wilson. 4) Missing continue; statement in nft_hash walker can lead to crashes, from Herbert Xu. 5) tproxy_tg6_check looks for IP6T_INV_PROTO in ->flags instead of ->invflags, fix from Pablo Neira Ayuso. 6) Incorrect memory account of TCP FINs can result in negative socket memory accounting values. Fix from Josh Hunt. 7) Don't allow virtual functions to enable VLAN promiscuous mode in be2net driver, from Vasundhara Volam. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is set cx82310_eth: wait for firmware to become ready net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour be2net: use PCI MMIO read instead of config read for errors be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs be2net: Prevent VFs from enabling VLAN promiscuous mode tcp: fix tcp fin memory accounting ipv6: fix backtracking for throw routes net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5} ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment netfilter: xt_TPROXY: fix invflags check in tproxy_tg6_check() netfilter: restore rule tracing via nfnetlink_log netfilter: nf_tables: allow to change chain policy without hook if it exists netfilter: Fix potential crash in nft_hash walker netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()
2015-03-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc	Linus Torvalds
	Pull sparc fixes from David Miller: "Some perf bug fixes from David Ahern, and the fix for that nasty memmove() bug" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix several bugs in memmove(). sparc: Touch NMI watchdog when walking cpus and calling printk sparc: perf: Add support M7 processor sparc: perf: Make counting mode actually work sparc: perf: Remove redundant perf_pmu_{en\|dis}able calls
2015-03-23	ALSA: hda - Add dock support for Thinkpad T450s (17aa:5036)	Sebastian Wicki
	This model uses the same dock port as the previous generation. Signed-off-by: Sebastian Wicki <gandro@gmx.net> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-03-23	sparc64: Fix several bugs in memmove().	David S. Miller
	Firstly, handle zero length calls properly. Believe it or not there are a few of these happening during early boot. Next, we can't just drop to a memcpy() call in the forward copy case where dst <= src. The reason is that the cache initializing stores used in the Niagara memcpy() implementations can end up clearing out cache lines before we've sourced their original contents completely. For example, considering NG4memcpy, the main unrolled loop begins like this: load src + 0x00 load src + 0x08 load src + 0x10 load src + 0x18 load src + 0x20 store dst + 0x00 Assume dst is 64 byte aligned and let's say that dst is src - 8 for this memcpy() call. That store at the end there is the one to the first line in the cache line, thus clearing the whole line, which thus clobbers "src + 0x28" before it even gets loaded. To avoid this, just fall through to a simple copy only mildly optimized for the case where src and dst are 8 byte aligned and the length is a multiple of 8 as well. We could get fancy and call GENmemcpy() but this is good enough for how this thing is actually used. Reported-by: David Ahern <david.ahern@oracle.com> Reported-by: Bob Picco <bpicco@meloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23	Input: ALPS - fix max coordinates for v5 and v7 protocols	Dmitry Torokhov
	Commit 3296f71cd2fde7a2ad52e66a27eae419f6328066 ("Input: ALPS - consolidate setting protocol parameters") inadvertently moved call to alps_dolphin_get_device_area() from v5 to v7 protocol, causing both protocols report incorrect maximum values for X and Y axes which resulted in crash in Synaptics X driver. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94801 Reported-by: Santiago Gala <sgala@apache.org> Reported-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-03-23	writeback: fix possible underflow in write bandwidth calculation	Tejun Heo
	From 1ebf33901ecc75d9496862dceb1ef0377980587c Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Mon, 23 Mar 2015 00:08:19 -0400 2f800fbd777b ("writeback: fix dirtied pages accounting on redirty") introduced account_page_redirty() which reverts stat updates for a redirtied page, making BDI_DIRTIED no longer monotonically increasing. bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the basis for bandwidth calculation. While unlikely, since the above patch, the newer value may be lower than the recorded past value and underflow the bandwidth calculation leading to a wild result. Fix it by subtracing min of the old and new values when calculating delta. AFAIK, there hasn't been any report of it happening but the resulting erratic behavior would be non-critical and temporary, so it's possible that the issue is happening without being reported. The risk of the fix is very low, so tagged for -stable. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Greg Thelen <gthelen@google.com> Fixes: 2f800fbd777b ("writeback: fix dirtied pages accounting on redirty") Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2015-03-23	NVMe: Initialize device list head before starting	Keith Busch
	Driver recovery requires the device's list node to have been initialized. Fixes: https://lkml.org/lkml/2015/3/22/262 Reported-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-03-23	xen/balloon: before adding hotplugged memory, set frames to invalid	Juergen Gross
	Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set regions above the end of RAM as 1:1") introduced a regression. To be able to add memory pages which were added via memory hotplug to a pv domain, the pages must be "invalid" instead of "identity" in the p2m list before they can be added. Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> # 3.16+ Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-03-23	x86/xen: prepare p2m list for memory hotplug	Juergen Gross
	Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear virtual mapped sparse p2m list") introduced a regression regarding to memory hotplug for a pv-domain: as the virtual space for the p2m list is allocated for the to be expected memory size of the domain only, hotplugged memory above that size will not be usable by the domain. Correct this by using a configurable size for the p2m list in case of memory hotplug enabled (default supported memory size is 512 GB for 64 bit domains and 4 GB for 32 bit domains). Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> # 3.19+ Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-03-23	iommu: ipmmu-vmsa: Add terminating entry for ipmmu_of_ids	Axel Lin
	The of_device_id table is supposed to be zero-terminated. Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-23	iommu/vt-d: Detach domain only from attached iommus	Alex Williamson
	Device domains never span IOMMU hardware units, which allows the domain ID space for each IOMMU to be an independent address space. Therefore we can have multiple, independent domains, each with the same domain->id, but attached to different hardware units. This is also why we need to do a heavy-weight search for VM domains since they can span multiple IOMMUs hardware units and we don't require a single global ID to use for all hardware units. Therefore, if we call iommu_detach_domain() across all active IOMMU hardware units for a non-VM domain, the result is that we clear domain IDs that are not associated with our domain, allowing them to be re-allocated and causing apparent coherency issues when the device cannot access IOVAs for the intended domain. This bug was introduced in commit fb170fb4c548 ("iommu/vt-d: Introduce helper functions to make code symmetric for readability"), but is significantly exacerbated by the more recent commit 62c22167dd70 ("iommu/vt-d: Fix dmar_domain leak in iommu_attach_device") which calls domain_exit() more frequently to resolve a domain leak. Fixes: fb170fb4c548 ("iommu/vt-d: Introduce helper functions to make code symmetric for readability") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: stable@vger.kernel.org # v3.17+ Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-23	iommu/arm-smmu: fix ARM_SMMU_FEAT_TRANS_OPS condition	Baptiste Reynal
	This patch is a fix to "iommu/arm-smmu: add support for iova_to_phys through ATS1PR". According to ARM documentation, translation registers are optional even in SMMUv1, so ID0_S1TS needs to be checked to verify their presence. Also, we check that the domain is a stage-1 domain. Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-23	x86/asm/entry/64: Fix incorrect comment	Denys Vlasenko
	The recent old_rsp -> rsp_scratch rename also changed this comment, but in this case "old_rsp" was not referring to PER_CPU(old_rsp). Fix this. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427115839-6397-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	metag: Fix ioremap_wc/ioremap_cached build errors	James Hogan
	When ioremap_wc() or ioremap_cached() are used without first including asm/pgtable.h, the _PAGE_CACHEABLE or _PAGE_WR_COMBINE definitions aren't found, resulting in build errors like the following (in next-20150323 due to "lib: devres: add a helper function for ioremap_wc"): lib/devres.c: In function ‘devm_ioremap_wc’: lib/devres.c:91: error: ‘_PAGE_WR_COMBINE’ undeclared We can't easily include asm/pgtable.h in asm/io.h due to dependency problems, so split out the _PAGE_* definitions from asm/pgtable.h into a separate asm/pgtable-bits.h header (as a couple of other architectures already do), and include that in io.h instead. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Cc: Abhilash Kesavan <a.kesavan@samsung.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23	parisc: Fix pmd code to depend on PT_NLEVELS value, not on CONFIG_64BIT	Helge Deller
	Make the code which sets up the pmd depend on PT_NLEVELS == 3, not on CONFIG_64BIT. The reason is, that a 64bit kernel with a page size greater than 4k doesn't need the pmd and thus has PT_NLEVELS = 2. Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23	parisc: mm: don't count preallocated pmds	Mikulas Patocka
	The patch dc6c9a35b66b520cf67e05d8ca60ebecad3b0479 that counts pmds allocated for a process introduced a bug on 64-bit PA-RISC kernels. The PA-RISC architecture preallocates one pmd with each pgd. This preallocated pmd can never be freed - pmd_free does nothing when it is called with this pmd. When the kernel attempts to free this preallocated pmd, it decreases the count of allocated pmds. The result is that the counter underflows and this error is reported. This patch fixes the bug by artifically incrementing the counter in pmd_free when the kernel tries to free the preallocated pmd. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23	x86/asm/entry: Replace some open-coded VM86 checks with v8086_mode() checks	Andy Lutomirski
	This allows us to remove some unnecessary ifdefs. There should be no change to the generated code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f7e00f0d668e253abf0bd8bf36491ac47bd761ff.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Remove user_mode_vm()	Andy Lutomirski
	It has no callers anymore. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a594afd6a0bddb1311bd7c92a15201c87fbb8681.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()'	Andy Lutomirski
	user_mode_vm() and user_mode() are now the same. Change all callers of user_mode_vm() to user_mode(). The next patch will remove the definition of user_mode_vm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org [ Merged to a more recent kernel. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Make user_mode() work correctly if regs came from VM86 mode	Andy Lutomirski
	user_mode() is now identical to user_mode_vm(). Subsequent patches will change all callers of user_mode_vm() to user_mode() and then delete user_mode_vm(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0dd03eacb5f0a2b5ba0240de25347a31b493c289.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Use user_mode_ignore_vm86() where appropriate	Andy Lutomirski
	A few of the user_mode() checks in traps.c are immediately after explicit checks for vm86 mode. Change them to user_mode_ignore_vm86(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0b324d5b75c3402be07f8d3c6245ed7f4995029e.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry, perf: Explicitly optimize vm86 handling in code_segment_base()	Andy Lutomirski
	There's no point in checking the VM bit on 64-bit, and, since we're explicitly checking it, we can use user_mode_ignore_vm86() after the check. While we're at it, rearrange the #ifdef slightly to make the code flow a bit clearer. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/dc1457a734feccd03a19bb3538a7648582f57cdd.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	x86/asm/entry: Add user_mode_ignore_vm86()	Andy Lutomirski
	user_mode() is dangerous and user_mode_vm() has a confusing name. Add user_mode_ignore_vm86() (equivalent to current user_mode()). We'll change the small number of legitimate users of user_mode() to user_mode_ignore_vm86(). Inspired by grsec, although this works rather differently. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/202c56ca63823c338af8e2e54948dbe222da6343.1426728647.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	Merge tag 'v4.0-rc5' into x86/asm, to resolve conflicts	Ingo Molnar
	Conflicts: arch/x86/kernel/entry_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	parisc: Add compile-time check when adding new syscalls	Helge Deller
	Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23	sched/rt: Use IPI to trigger RT task push migration instead of pulling	Steven Rostedt
	When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks. The test that was run was the following: cyclictest --numa -p95 -m -d0 -i100 This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies. cyclictest is maintained at: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git What happened was another RT task would be scheduled on one of the CPUs that was running our test, when the other CPU tests went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way. To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq of the overloaded CPU. As the sched domains on this box was rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple affect with the rq locks especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balanceing on these CPUs would also block on these locks, and the wait time escalated. I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task wont work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, to only find out that the CPU we picked isn't in the task's affinity. Instead of doing the PULL, I now have the CPUs that want the pull to send over an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. No more need to grab the rq lock, and the push/pull algorithm still works fine. With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger in seconds. I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is implemented. When RT_PUSH_IPI is enabled, the IPI is sent to the overloaded CPU to do a push. To enabled or disable this at run time: # mount -t debugfs nodev /sys/kernel/debug # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features or # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features Update: This original patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue. That is, there could be lots of overloaded RT queues and one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues and they could then all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem. The latest design sends out only one IPI to the first overloaded CPU. It tries to push any tasks that it can, and then looks for the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks that have priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU. Parts-suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Joern Engel <joern@purestorage.com> Cc: Clark Williams <williams@redhat.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	irq_work: Fix build failure when CONFIG_IRQ_WORK is not defined	Steven Rostedt
	When CONFIG_IRQ_WORK is not defined (difficult to do, as it also requires CONFIG_PRINTK not to be defined), we get a build failure: kernel/built-in.o: In function `flush_smp_call_function_queue': kernel/smp.c:263: undefined reference to `irq_work_run' kernel/smp.c:263: undefined reference to `irq_work_run' Makefile:933: recipe for target 'vmlinux' failed Simplest thing to do is to make irq_work_run() a nop when not set. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20150319101851.4d224d9b@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵	Ingo Molnar
	applying new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	timers/tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop	Preeti U Murthy
	The hrtimer mode of broadcast queues hrtimers in the idle entry path so as to wakeup cpus in deep idle states. The associated call graph is : cpuidle_idle_call() \|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....)) \|_____tick_broadcast_set_event() \|____clockevents_program_event() \|____bc_set_next() The hrtimer_{start/cancel} functions call into tracing which uses RCU. But it is not legal to call into RCU in cpuidle because it is one of the quiescent states. Hence protect this region with RCU_NONIDLE which informs RCU that the cpu is momentarily non-idle. As an aside it is helpful to point out that the clock event device that is programmed here is not a per-cpu clock device; it is a pseudo clock device, used by the broadcast framework alone. The per-cpu clock device programming never goes through bc_set_next(). Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: linuxppc-dev@ozlabs.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	lockdep: Fix the module unload key range freeing logic	Peter Zijlstra
	Module unload calls lockdep_free_key_range(), which removes entries from the data structures. Most of the lockdep code OTOH assumes the data structures are append only; in specific see the comments in add_lock_to_list() and look_up_lock_class(). Clearly this has only worked by accident; make it work proper. The actual scenario to make it go boom would involve the memory freed by the module unlock being re-allocated and re-used for a lock inside of a rcu-sched grace period. This is a very unlikely scenario, still better plug the hole. Use RCU list iteration in all places and ammend the comments. Change lockdep_free_key_range() to issue a sync_sched() between removal from the lists and returning -- which results in the memory being freed. Further ensure the callers are placed correctly and comment the requirements. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Tsyvarev <tsyvarev@ispras.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	sched: Fix RLIMIT_RTTIME when PI-boosting to RT	Brian Silverman
	When non-realtime tasks get priority-inheritance boosted to a realtime scheduling class, RLIMIT_RTTIME starts to apply to them. However, the counter used for checking this (the same one used for SCHED_RR timeslices) was not getting reset. This meant that tasks running with a non-realtime scheduling class which are repeatedly boosted to a realtime one, but never block while they are running realtime, eventually hit the timeout without ever running for a time over the limit. This patch resets the realtime timeslice counter when un-PI-boosting from an RT to a non-RT scheduling class. I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT mutex which induces priority boosting and spins while boosted that gets killed by a SIGXCPU on non-fixed kernels but doesn't with this patch applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and does happen eventually with PREEMPT_VOLUNTARY kernels. Signed-off-by: Brian Silverman <brian@peloton-tech.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: austin@peloton-tech.com Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23	perf: Fix irq_work 'tail' recursion	Peter Zijlstra
	Vince reported a watchdog lockup like: [<ffffffff8115e114>] perf_tp_event+0xc4/0x210 [<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160 [<ffffffff810b7f10>] lock_release+0x130/0x260 [<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40 [<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80 [<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0 [<ffffffff811f71ce>] send_sigio+0xae/0x100 [<ffffffff811f72b7>] kill_fasync+0x97/0xf0 [<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0 [<ffffffff8115d103>] perf_pending_event+0x33/0x60 [<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80 [<ffffffff8114e448>] irq_work_run+0x18/0x40 [<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0 [<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80 Which is caused by an irq_work generating new irq_work and therefore not allowing forward progress. This happens because processing the perf irq_work triggers another perf event (tracepoint stuff) which in turn generates an irq_work ad infinitum. Avoid this by raising the recursion counter in the irq_work -- which effectively disables all software events (including tracepoints) from actually triggering again. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>