summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-01-19platform:x86: Add Intel Telemetry Debugfs interfacesSouvik Kumar Chakravarty
This implements debugfs interfaces for reading the telemetry samples from SSRAM and configuring firmware trace verbosity. Interface created under /sys/kernel/debug/telemetry soc_states: SoC Device and Low Power States pss_info: Info from the Primary SubSystem ioss_info: Info from IO SubSusytem pss_trace_verbosity: Read/Modify PSS F/W trace verbosity ioss_trace_verbosity: Read/Modify IOSS F/W trace verbosity. Signed-off-by: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19platform:x86: Add Intel telemetry platform deviceSouvik Kumar Chakravarty
Telemetry Device is created by the pmc_ipc driver. Resources are populated according SSRAM region as indicated by the BIOS tables. Signed-off-by: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19platform:x86: Add Intel telemetry platform driverSouvik Kumar Chakravarty
Telemetry platform driver implements the telemetry interfaces. Currently it supports ApolloLake. It uses the PUNIT and PMC IPC interfaces to configure the telemetry samples to read. The samples are read from a Secure SRAM region. Signed-off-by: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19platform/x86: Add Intel Telemetry Core DriverSouvik Kumar Chakravarty
Intel PM Telemetry is a software mechanism via which various SoC PM and performance related parameters like PM counters, firmware trace verbosity, the status of different devices inside the SoC, etc. can be monitored and analyzed. The different samples that may be monitored can be configured at runtime via exported APIs. This patch adds the telemetry core driver that implements basic exported APIs. Signed-off-by: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19intel_punit_ipc: add NULL check for input parametersQipeng Zha
intel_punit_ipc_command() maybe called when in or out data pointers are NULL. Signed-off-by: Qipeng Zha <qipeng.zha@intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19thinkpad_acpi: Add support for keyboard backlightPali Rohár
This patch adds support for controlling keyboard backlight via standard linux led class interface (::kbd_backlight). It uses ACPI HKEY device with MLCG and MLCS methods. Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Tested-by: Fabio D'Urso <fabiodurso@hotmail.it> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19dell-wmi: Process only one event on devices with interface version 0Pali Rohár
BIOS/ACPI on devices with WMI interface version 0 does not clear buffer before filling it. So next time when BIOS/ACPI send WMI event which is smaller as previous then it contains garbage in buffer from previous event. BIOS/ACPI on devices with WMI interface version 1 clears buffer and sometimes send more events in buffer at one call. Since commit 83fc44c32ad8 ("dell-wmi: Update code for processing WMI events") dell-wmi process all events in buffer (and not just first). To prevent reading garbage from the buffer we process only the first event on devices with WMI interface version 0. Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19dell-wmi: Check if Dell WMI descriptor structure is validPali Rohár
After examining existing DSDT ACPI tables of more laptops and looking into Dell WMI document mentioned in ML dicussion archived at http://www.spinics.net/lists/platform-driver-x86/msg07220.html we will parse and check WMI descriptor if contains expected data. It is because WMI descriptor contains interface version number and it is needed to know in next commit. Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19tc1100-wmi: fix build warning when CONFIG_PM not enabledColin Ian King
Conditionally declare suspend_data on CONFIG_PM to avoid the following warning when CONFIG_OM is not enabled: drivers/platform/x86/tc1100-wmi.c:55:27: warning: 'suspend_data' defined but not used [-Wunused-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19asus-wireless: Add ACPI HID ATK4001João Paulo Rechi Vita
As reported in https://bugzilla.kernel.org/show_bug.cgi?id=98931#c22 in the Asus UX31A the Asus Wireless Radio Control device (ASHS) uses the HID "ATK4001". Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19platform/x86: Add Asus Wireless Radio Control driverJoão Paulo Rechi Vita
Some Asus notebooks like the Asus E202SA and the Asus X555UB have a separate ACPI device for notifications from the airplane mode hotkey. This device is called "Wireless Radio Control" in Asus websites and ASHS in the DSDT, and its ACPI _HID is ATK4002 in the two models mentioned above. For these models, when the airplane mode hotkey (Fn+F2) is pressed, a query 0x0B is started in the Embedded Controller, and all this query does is a notify ASHS with the value 0x88 (for acpi_osi >= "Windows 2012"): Scope (_SB.PCI0.SBRG.EC0) { (...) Method (_Q0B, 0, NotSerialized) // _Qxx: EC Query { If ((MSOS () >= OSW8)) { Notify (ASHS, 0x88) // Device-Specific } Else { (...) } } } Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19asus-wmi: drop to_platform_driver macroGeliang Tang
to_platform_driver has been defined in platform_device.h, so drop this repetitive macro in asus-wmi.c. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19intel-hid: new hid event driver for hotkeysAlex Hung
This driver supports various HID events including hotkeys. Dell XPS 13 9350 requires it for the wireless hotkey. Signed-off-by: Alex Hung <alex.hung@canonical.com> Reviewed-and-tested-by: Andy Lutomirski <luto@kernel.org> [dvhart: Kconfig help typo fix and INPUT_SPARSEKMAP fix from Sedat Dilek] Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19pipe: limit the per-user amount of pages allocated in pipesWilly Tarreau
On no-so-small systems, it is possible for a single process to cause an OOM condition by filling large pipes with data that are never read. A typical process filling 4000 pipes with 1 MB of data will use 4 GB of memory. On small systems it may be tricky to set the pipe max size to prevent this from happening. This patch makes it possible to enforce a per-user soft limit above which new pipes will be limited to a single page, effectively limiting them to 4 kB each, as well as a hard limit above which no new pipes may be created for this user. This has the effect of protecting the system against memory abuse without hurting other users, and still allowing pipes to work correctly though with less data at once. The limit are controlled by two new sysctls : pipe-user-pages-soft, and pipe-user-pages-hard. Both may be disabled by setting them to zero. The default soft limit allows the default number of FDs per process (1024) to create pipes of the default size (64kB), thus reaching a limit of 64MB before starting to create only smaller pipes. With 256 processes limited to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB = 1084 MB of memory allocated for a user. The hard limit is disabled by default to avoid breaking existing applications that make intensive use of pipes (eg: for splicing). Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem update from James Morris: "A CVE fix and a maintainers file update" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: KEYS: Fix keyring ref leak in join_session_keyring() Fix the MAINTAINERS record for the certs/ directory
2016-01-20KEYS: Fix keyring ref leak in join_session_keyring()Yevgeny Pats
This fixes CVE-2016-0728. If a thread is asked to join as a session keyring the keyring that's already set as its session, we leak a keyring reference. This can be tested with the following program: #include <stddef.h> #include <stdio.h> #include <sys/types.h> #include <keyutils.h> int main(int argc, const char *argv[]) { int i = 0; key_serial_t serial; serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING, "leaked-keyring"); if (serial < 0) { perror("keyctl"); return -1; } if (keyctl(KEYCTL_SETPERM, serial, KEY_POS_ALL | KEY_USR_ALL) < 0) { perror("keyctl"); return -1; } for (i = 0; i < 100; i++) { serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING, "leaked-keyring"); if (serial < 0) { perror("keyctl"); return -1; } } return 0; } If, after the program has run, there something like the following line in /proc/keys: 3f3d898f I--Q--- 100 perm 3f3f0000 0 0 keyring leaked-keyring: empty with a usage count of 100 * the number of times the program has been run, then the kernel is malfunctioning. If leaked-keyring has zero usages or has been garbage collected, then the problem is fixed. Reported-by: Yevgeny Pats <yevgeny@perception-point.io> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-01-19Keyboard backlight control for some Vaio Fit modelsMattia Dongili
SVF1521P6EW, SVF1521DCXW, SVF13N1L2ES and likely most SVF*. do not expose separate timeout controls in auto mode. Signed-off-by: Dominik Matta <dominik@matta.sk> Signed-off-by: Mattia Dongili <malattia@linux.it> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19platform/x86: Add rfkill dependency to ACPI_TOSHIBA entryAzael Avalos
Commit 2fdde83443aa ("toshiba_acpi: Add WWAN RFKill support") added WWAN rfkill support to the driver, but the KConfig entry was not updated to add the RFKill dependency, causing a broken build if RFKill is not selected. This patch adds the RFKILL dependency to the KConfig entry, fixing the build issue. Signed-off-by: Azael Avalos <coproscefalo@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19platform:x86: add Intel P-Unit mailbox IPC driverQipeng Zha
This driver provides support for P-Unit mailbox IPC on Intel platforms. The heart of the P-Unit is the Foxton microcontroller and its firmware, which provide mailbox interface for power management usage. Signed-off-by: Qipeng Zha <qipeng.zha@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19intel_pmc_ipc: update acpi resource structure for PunitQipeng Zha
BIOS restructure exported memory resources for Punit in acpi table, So update resources for Punit. Signed-off-by: Qipeng Zha <qipeng.zha@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19ideapad-laptop: Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi listJosh Boyer
One of the newest ideapad models also lacks a physical hw rfkill switch, and trying to read the hw rfkill switch through the ideapad module causes it to always reported blocking breaking wifi. Fix it by adding this model to the DMI list. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1286293 Cc: stable@vger.kernel.org Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19dell-wmi: Improve unknown hotkey handlingAndy Lutomirski
If DMI lists a hotkey that we don't recognize, log and ignore it instead of trying to map it to keycode 0. I haven't seen this happen, but it will help maintain the key map in the future and it will help avoid sending bogus events. This also improves the message that we log when we get an unknown key event. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> [dvhart: remove BUILD_BUG_ON per mutual agreement on list] Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19apple-gmux: Assign apple_gmux_data before registeringMatthew Garrett
Registering the handler after both GPUs will trigger a DDC switch for connector reprobing. This will oops if apple_gmux_data hasn't already been assigned. Reorder the code to do that. [Lukas: More generally, this commit fixes a race condition that is triggered by invoking a handler callback between the call to vga_switcheroo_register_handler() and the assignment of apple_gmux_data.] Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MBP 5,3 2009 nvidia MCP79 + G96 pre-retina 15"] Tested-by: Paul Hordiienko <pvt.gord@gmail.com> [MBP 6,2 2010 intel ILK + nvidia GT216 pre-retina 15"] Tested-by: Lukas Wunner <lukas@wunner.de> [MBP 9,1 2012 intel IVB + nvidia GK107 pre-retina 15"] Tested-by: William Brown <william@blackhats.net.au> [MBP 8,2 2011 intel SNB + amd turks pre-retina 15"] Tested-by: Bruno Bierbaumer <bruno@bierbaumer.net> [MBP 11,3 2013 intel HSW + nvidia GK107 retina 15"] Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19toshiba_acpi: Fix keyboard backlight sysfs entries not being updatedAzael Avalos
Certain Toshiba models with the second generation keyboard backlight (type 2) do not generate the keyboard backlight changed event (0x92), and thus, the sysfs entries are never being updated. This patch adds a workquee and a global boolean variable to address the issue. For those models that do generate the event, the sysfs entries are being updated via the *notify function and the boolean is set to true to avoid a second call to update the entries. For those models that do not generate the event, the workquee is used to update the sysfs entries and also to emulate the event via netlink, to make userspace aware of such change. Signed-off-by: Azael Avalos <coproscefalo@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "We don't have a lot of core changes this time around, it's mostly in drivers, which will come in a subsequent pull. The cores changes include: - blk-mq - Prep patch from Christoph, changing blk_mq_alloc_request() to take flags instead of just using gfp_t for sleep/nosleep. - Doc patch from me, clarifying the difference between legacy and blk-mq for timer usage. - Fixes from Raghavendra for memory-less numa nodes, and a reuse of CPU masks. - Cleanup from Geliang Tang, using offset_in_page() instead of open coding it. - From Ilya, rename request_queue slab to it reflects what it holds, and a fix for proper use of bdgrab/put. - A real fix for the split across stripe boundaries from Keith. We yanked a broken version of this from 4.4-rc final, this one works. - From Mike Krinkin, emit a trace message when we split. - From Wei Tang, two small cleanups, not explicitly clearing memory that is already cleared" * 'for-4.5/core' of git://git.kernel.dk/linux-block: block: use bd{grab,put}() instead of open-coding block: split bios to max possible length block: add call to split trace point blk-mq: Avoid memoryless numa node encoded in hctx numa_node blk-mq: Reuse hardware context cpumask for tags blk-mq: add a flags parameter to blk_mq_alloc_request Revert "blk-flush: Queue through IO scheduler when flush not required" block: clarify blk_add_timer() use case for blk-mq bio: use offset_in_page macro block: do not initialise statics to 0 or NULL block: do not initialise globals to 0 or NULL block: rename request_queue slab cache
2016-01-19Merge tag 'iommu-updates-v4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "The updates include: - Small code cleanups in the AMD IOMMUv2 driver - Scalability improvements for the DMA-API implementation of the AMD IOMMU driver. This is just a starting point, but already showed some good improvements in my tests. - Removal of the unused Renesas IPMMU/IPMMUI driver - Updates for ARM-SMMU include: * Some fixes to get the driver working nicely on Broadcom hardware * A change to the io-pgtable API to indicate the unit in which to flush (all callers converted, with Ack from Laurent) * Use of devm_* for allocating/freeing the SMMUv3 buffers - Some other small fixes and improvements for other drivers" * tag 'iommu-updates-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (46 commits) iommu/vt-d: Fix up error handling in alloc_iommu iommu/vt-d: Check the return value of iommu_device_create() iommu/amd: Remove an unneeded condition iommu/amd: Preallocate dma_ops apertures based on dma_mask iommu/amd: Use trylock to aquire bitmap_lock iommu/amd: Make dma_ops_domain->next_index percpu iommu/amd: Relax locking in dma_ops path iommu/amd: Initialize new aperture range before making it visible iommu/amd: Build io page-tables with cmpxchg64 iommu/amd: Allocate new aperture ranges in dma_ops_alloc_addresses iommu/amd: Optimize dma_ops_free_addresses iommu/amd: Remove need_flush from struct dma_ops_domain iommu/amd: Iterate over all aperture ranges in dma_ops_area_alloc iommu/amd: Flush iommu tlb in dma_ops_free_addresses iommu/amd: Rename dma_ops_domain->next_address to next_index iommu/amd: Remove 'start' parameter from dma_ops_area_alloc iommu/amd: Flush iommu tlb in dma_ops_aperture_alloc() iommu/amd: Retry address allocation within one aperture iommu/amd: Move aperture_range.offset to another cache-line iommu/amd: Add dma_ops_aperture_alloc() function ...
2016-01-19find_filesystem(): simplify comparisonAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-19Merge branches 's390', 'arm/renesas', 'arm/msm', 'arm/shmobile', 'arm/smmu', ↵Joerg Roedel
'x86/amd' and 'x86/vt-d' into next
2016-01-19cpuidle: menu: Avoid pointless checks in menu_select()Rafael J. Wysocki
If menu_select() cannot find a suitable state to return, it will return the state index stored in data->last_state_idx. This means that it is pointless to look at the states whose indices are less than or equal to data->last_state_idx in the main loop, so don't do that. Given that those checks are done on every idle state selection, this change can save quite a bit of completely unnecessary overhead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sudeep Holla <sudeep.holla@arm.com>
2016-01-19sched / idle: Drop default_idle_call() fallback from call_cpuidle()Rafael J. Wysocki
After commit 9c4b2867ed7c (cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0) it is clear that menu_select() cannot return negative values. Moreover, ladder_select_state() will never return a negative value too, so make find_deepest_state() return non-negative values too and drop the default_idle_call() fallback from call_cpuidle(). This eliminates one branch from the idle loop and makes the governors and find_deepest_state() handle the case when all states have been disabled from sysfs consistently. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sudeep Holla <sudeep.holla@arm.com>
2016-01-19btrfs: remove duplicate const specifierColin Ian King
duplicate const is redundant so remove it Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-19Fix the MAINTAINERS record for the certs/ directoryDavid Howells
Fix the MAINTAINERS record for the certs/ directory to have the new keyrings mailing list and also to be authoritative for the sign-file tool Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-01-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio barrier rework+fixes from Michael Tsirkin: "This adds a new kind of barrier, and reworks virtio and xen to use it. Plus some fixes here and there" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits) checkpatch: add virt barriers checkpatch: check for __smp outside barrier.h checkpatch.pl: add missing memory barriers virtio: make find_vqs() checkpatch.pl-friendly virtio_balloon: fix race between migration and ballooning virtio_balloon: fix race by fill and leak s390: more efficient smp barriers s390: use generic memory barriers xen/events: use virt_xxx barriers xen/io: use virt_xxx barriers xenbus: use virt_xxx barriers virtio_ring: use virt_store_mb sh: move xchg_cmpxchg to a header by itself sh: support 1 and 2 byte xchg virtio_ring: update weak barriers to use virt_xxx Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" asm-generic: implement virt_xxx memory barriers x86: define __smp_xxx xtensa: define __smp_xxx tile: define __smp_xxx ...
2016-01-19cpupower: Fix build error in cpufreq-infoShreyas B. Prabhu
Fix the following build error by including limits.h - utils/cpufreq-info.c: In function ‘get_latency’: utils/cpufreq-info.c:437:29: error: ‘UINT_MAX’ undeclared (first use in this function) if (!latency || latency == UINT_MAX) { ^ Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Fixes: e98f033f94f3 (cpupower: fix how "cpupower frequency-info" interprets latency) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-19Merge branch 'xfs-misc-fixes-for-4.5-3' into for-nextDave Chinner
2016-01-19xfs: log mount failures don't wait for buffers to be releasedDave Chinner
Recently I've been seeing xfs/051 fail on 1k block size filesystems. Trying to trace the events during the test lead to the problem going away, indicating that it was a race condition that lead to this ASSERT failure: XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156 ..... [<ffffffff814e1257>] xfs_free_perag+0x87/0xb0 [<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900 [<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0 [<ffffffff811d8800>] mount_bdev+0x180/0x1b0 [<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20 [<ffffffff811d90a8>] mount_fs+0x38/0x170 [<ffffffff811f4347>] vfs_kern_mount+0x67/0x120 [<ffffffff811f7018>] do_mount+0x218/0xd60 [<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0 When I finally caught it with tracing enabled, I saw that AG 2 had an elevated reference count and a buffer was responsible for it. I tracked down the specific buffer, and found that it was missing the final reference count release that would put it back on the LRU and hence be found by xfs_wait_buftarg() calls in the log mount failure handling. The last four traces for the buffer before the assert were (trimmed for relevance) kworker/0:1-5259 xfs_buf_iodone: hold 2 lock 0 flags ASYNC kworker/0:1-5259 xfs_buf_ioerror: hold 2 lock 0 error -5 mount-7163 xfs_buf_lock_done: hold 2 lock 0 flags ASYNC mount-7163 xfs_buf_unlock: hold 2 lock 1 flags ASYNC This is an async write that is completing, so there's nobody waiting for it directly. Hence we call xfs_buf_relse() once all the processing is complete. That does: static inline void xfs_buf_relse(xfs_buf_t *bp) { xfs_buf_unlock(bp); xfs_buf_rele(bp); } Now, it's clear that mount is waiting on the buffer lock, and that it has been released by xfs_buf_relse() and gained by mount. This is expected, because at this point the mount process is in xfs_buf_delwri_submit() waiting for all the IO it submitted to complete. The mount process, however, is waiting on the lock for the buffer because it is in xfs_buf_delwri_submit(). This waits for IO completion, but it doesn't wait for the buffer reference owned by the IO to go away. The mount process collects all the completions, fails the log recovery, and the higher level code then calls xfs_wait_buftarg() to free all the remaining buffers in the filesystem. The issue is that on unlocking the buffer, the scheduler has decided that the mount process has higher priority than the the kworker thread that is running the IO completion, and so immediately switched contexts to the mount process from the semaphore unlock code, hence preventing the kworker thread from finishing the IO completion and releasing the IO reference to the buffer. Hence by the time that xfs_wait_buftarg() is run, the buffer still has an active reference and so isn't on the LRU list that the function walks to free the remaining buffers. Hence we miss that buffer and continue onwards to tear down the mount structures, at which time we get find a stray reference count on the perag structure. On a non-debug kernel, this will be ignored and the structure torn down and freed. Hence when the kworker thread is then rescheduled and the buffer released and freed, it will access a freed perag structure. The problem here is that when the log mount fails, we still need to quiesce the log to ensure that the IO workqueues have returned to idle before we run xfs_wait_buftarg(). By synchronising the workqueues, we ensure that all IO completions are fully processed, not just to the point where buffers have been unlocked. This ensures we don't end up in the situation above. cc: <stable@vger.kernel.org> # 3.18 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-19Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"Dave Chinner
This reverts commit 24ba16bb3d499c49974669cd8429c3e4138ab102 as it prevents machines from suspending. This regression occurs when the xfsaild is idle on entry to suspend, and so there s no activity to wake it from it's idle sleep and hence see that it is supposed to freeze. Hence the freezer times out waiting for it and suspend is cancelled. There is no obvious fix for this short of freezing the filesystem properly, so revert this change for now. cc: <stable@vger.kernel.org> # 4.4 Signed-off-by: Dave Chinner <david@fromorbit.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-19Merge branch 'xfs-setxattr-promotion' into for-nextDave Chinner
2016-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds
Pull arch/tile updates from Chris Metcalf: "This is a grab bag of changes that includes some NOHZ and context-tracking related changes, some debugging improvements, JUMP_LABEL support, and some fixes for tilepro allmodconfig support. We also remove the now-unused node_has_online_mem() definitions both for tile's asm/topology.h as well as in linux/topology.h itself" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: numa: remove stale node_has_online_mem() define arch/tile: move user_exit() to early kernel entry sequence tile: fix bug in setting PT_FLAGS_DISABLE_IRQ on kernel entry tile: fix tilepro casts for readl, writel, etc tile: fix a -Wframe-larger-than warning tile: include the syscall number in the backtrace MAINTAINERS: add git URL for tile arch/tile: adopt prepare_exit_to_usermode() model from x86 tile/jump_label: add jump label support for TILE-Gx tile: define a macro ktext_writable_addr to get writable kernel text address
2016-01-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 Pull AVR32 updates from Hans-Christian Noren Egtvedt. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32: mmc: atmel: get rid of struct mci_dma_data mmc: atmel-mci: restore dma on AVR32 avr32: wire up missing syscalls avr32: wire up accept4 syscall
2016-01-18Merge branch 'for-linus-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This has our usual assortment of fixes and cleanups, but the biggest change included is Omar Sandoval's free space tree. It's not the default yet, mounting -o space_cache=v2 enables it and sets a readonly compat bit. The tree can actually be deleted and regenerated if there are any problems, but it has held up really well in testing so far. For very large filesystems (30T+) our existing free space caching code can end up taking a huge amount of time during commits. The new tree based code is faster and less work overall to update as the commit progresses. Omar worked on this during the summer and we'll hammer on it in production here at FB over the next few months" * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (73 commits) Btrfs: fix fitrim discarding device area reserved for boot loader's use Btrfs: Check metadata redundancy on balance btrfs: statfs: report zero available if metadata are exhausted btrfs: preallocate path for snapshot creation at ioctl time btrfs: allocate root item at snapshot ioctl time btrfs: do an allocation earlier during snapshot creation btrfs: use smaller type for btrfs_path locks btrfs: use smaller type for btrfs_path lowest_level btrfs: use smaller type for btrfs_path reada btrfs: cleanup, use enum values for btrfs_path reada btrfs: constify static arrays btrfs: constify remaining structs with function pointers btrfs tests: replace whole ops structure for free space tests btrfs: use list_for_each_entry* in backref.c btrfs: use list_for_each_entry_safe in free-space-cache.c btrfs: use list_for_each_entry* in check-integrity.c Btrfs: use linux/sizes.h to represent constants btrfs: cleanup, remove stray return statements btrfs: zero out delayed node upon allocation btrfs: pass proper enum type to start_transaction() ...
2016-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull more networking fixes from David Miller: 1) Fix brcmfmac build with older gcc, from Arend van Spriel. 2) IRQ values unintentionally truncated to u8 in mlx5 driver, from Doron Tsur. 3) Fix build warnings wrt tcp cgroup changes, from Geert Uytterhoeven. 4) Limit deep recursion in ovs stack, from Hannes Frederic Sowa. 5) at803x phy driver bug fixes from, Martin Blumenstingl. 6) Fix TSO handling in hns driver, from Daode Huang * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits) ovs: limit ovs recursions in ovs_execute_actions to not corrupt stack team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid net: hns: bug fix about hisilicon TSO BD mode brcmfmac: fix BRCMF_FW_NVRAM_DEF macro for older gcc compilers net: phy: at803x: Add the interrupt register bit definitions net: phy: at803x: Clean up duplicate register definitions net: phy: at803x: Allow specifying the RGMII RX clock delay via phy mode net: phy: at803x: Don't set gbit features for the AR8030 phy arm64: bpf: add extra pass to handle faulty codegen arm64: insn: remove BUG_ON from codegen sctp: the temp asoc's transports should not be hashed/unhashed net/mlx5_core: Fix trimming down IRQ number tcp_memcontrol: Forward declare cgroup_subsys and mem_cgroup stucts batman-adv: Drop immediate orig_node free function batman-adv: Drop immediate batadv_hard_iface free function batman-adv: Drop immediate neigh_ifinfo free function batman-adv: Drop immediate batadv_hardif_neigh_node free function batman-adv: Drop immediate batadv_neigh_node free function batman-adv: Drop immediate batadv_orig_ifinfo free function batman-adv: Avoid recursive call_rcu for batadv_nc_node ...
2016-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ideLinus Torvalds
Pull IDE updates from David Miller: "Just a few small changes this merge window, marking ops const, printf string type fixes, etc" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide: drivers/ide: make ide-scan-pci.c driver explicitly non-modular ide: constify ide_dma_ops structures ide: silence some underflow warnings
2016-01-18Merge tag 'rtc-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "Core: - fix module reference count in rtc-proc - Replace simple_strtoul by kstrtoul New driver: - Epson RX8010SJ Subsystem wide cleanups: - use %ph for short hex dumps - constify *_chip_ops structures Drivers: - abx80x: Microcrystal rv1805 support, alarm support - cmos: prevent kernel warning on IRQ flags mismatch - s5m: various cleanups - rv8803: rx8900 compatibility, small error path fix - sunxi: various cleanups - lpc32xx: remove irq > NR_IRQS check from probe() - imxdi: fix spelling mistake in warning message - ds1685: don't try to micromanage sysfs output size - da9063: avoid writing undefined data to rtc - gemini: Remove unnecessary platform_set_drvdata() - efi: add efi_procfs in efi_rtc_ops - pcf8523: refuse to write dates later than 2099" * tag 'rtc-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (24 commits) rtc: cmos: prevent kernel warning on IRQ flags mismatch rtc: rtc-ds2404: constify ds2404_chip_ops structures rtc: s5m: Make register configuration per S2MPS device to remove exceptions rtc: s5m: Add separate field for storing auto-cleared mask in register config rtc: s5m: Cleanup by removing useless 'rtc' prefix from fields rtc: Replace simple_strtoul by kstrtoul rtc: abx80x: add alarm support rtc: abx80x: Add Microcrystal rv1805 support rtc: v3020: constify v3020_chip_ops structures rtc: rv8803: Extend compatibility with the rx8900 rtc: rv8803: fix handling return value of i2c_smbus_read_byte_data rtc: Add Epson RX8010SJ RTC driver rtc: lpc32xx: remove irq > NR_IRQS check from probe() rtc: imxdi: fix spelling mistake in warning message rtc: ds1685: don't try to micromanage sysfs output size rtc: use %ph for short hex dumps rtc: da9063: avoid writing undefined data to rtc rtc: sunxi: use of_device_get_match_data rtc: sunxi: constify the data_year_param structure rtc: sunxi: fix signedness issues ...
2016-01-18Merge tag 'fbdev-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev updates from Tomi Valkeinen: "Summary: - pxafb: device-tree support - An unsafe kernel parameter 'lockless_register_fb' for debugging problems happening while inside the console lock - Small miscellaneous fixes & cleanups - omapdss: add writeback support functions - Separation of omapfb and omapdrm (see below) About the separation of omapfb and omapdrm, see http://permalink.gmane.org/gmane.comp.video.dri.devel/143151 for longer story. The short version: omapfb and omapdrm have shared low level drivers (omapdss and panel drivers), making further development of omapdrm difficult. After these patches omapfb and omapdrm have their own versions of the drivers, which are more or less direct copies for now but will diverge soon. This also means that omapfb (everything under drivers/video/fbdev/omap2/) is now in maintenance mode, and all new development will be done for omapdrm (drivers/gpu/drm/omapdrm/)" * tag 'fbdev-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (49 commits) video: fbdev: pxafb: fix out of memory error path drm/omap: make omapdrm select OMAP2_DSS drm/omap: move omapdss & displays under omapdrm omapfb: move vrfb into omapfb omapfb: take omapfb's private omapdss into use omapfb/displays: change CONFIG_DISPLAY_* to CONFIG_FB_OMAP2_* omapfb/dss: change CONFIG_OMAP* to CONFIG_FB_OMAP* omapdss: remove CONFIG_OMAP2_DSS_VENC from omapdss.h omapfb: copy omapdss & displays for omapfb omapfb: allow compilation only if DRM_OMAP is disabled fbdev: omap2: panel-dpi: simplify gpio setting fbdev: omap2: panel-dpi: in .disable first disable backlight then display OMAPDSS: DSS: fix a warning message video: omapdss: delete unneeded of_node_put OMAPDSS: DISPC: Remove boolean comparisons OMAPDSS: DSI: cleanup DSI_IRQ_ERROR_MASK define OMAPDSS: remove extra out == NULL checks OMAPDSS: change internal dispc functions to static OMAPDSS: make a two dss feat funcs internal to omapdss OMAPDSS: remove extra EXPORT_SYMBOLs ...
2016-01-18numa: remove stale node_has_online_mem() defineChris Metcalf
This isn't used anywhere, so delete it. Looks like the last usage (in x86-specific code) was removed by Tejun in 2011 in commit bd6709a91a59 ("x86, NUMA: Make 32bit use common NUMA init path"). Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2016-01-18arch/tile: move user_exit() to early kernel entry sequenceChris Metcalf
This ensures that we always notify context tracking that we have exited from user space no matter how we enter the kernel. It is similar to how arm64 handles context tracking, for example. This allows the removal of all the exception_enter() calls that were added in commit 49e4e15619cd ("tile: support CONTEXT_TRACKING and thus NOHZ_FULL"). Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2016-01-18tile: fix bug in setting PT_FLAGS_DISABLE_IRQ on kernel entryChris Metcalf
This flag value is saved in ptregs and used to decide whether to disable irqs when returning from the kernel. Commit 1168df528fe4 ("tile: don't assume user privilege is zero") performed a bad merge from some KVM-enabled code that had not yet been upstreamed. The only issue with the old code is that we will read the interrupt mask in more conditions than we need to (e.g., coming from user space when user space has the Interrupt Critical Section bit set, or coming from a guest kernel), which is a slow multi-cycle operation. This change saves those few cycles in the common case. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2016-01-18tile: fix tilepro casts for readl, writel, etcChris Metcalf
Missing parentheses could cause an argument of the form "integer + pointer" to get cast to "(long)integer + pointer" and remain a pointer type, causing compiler warnings. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2016-01-18tile: fix a -Wframe-larger-than warningChris Metcalf
The warning occurs in setup.c, where it is known that it can't be a problem, but it's still a good idea to silence the warning. The onstack array is converted from an s32 to a u8, which still is plenty of range for the values being managed there. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>