summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-07-04Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "It's been a busy development cycle for target-core in a number of different areas. The fabric API usage for se_node_acl allocation is now within target-core code, dropping the external API callers for all fabric drivers tree-wide. There is a new conversion to RCU hlists for se_node_acl and se_portal_group LUN mappings, that turns fast-past LUN lookup into a completely lockless code-path. It also removes the original hard-coded limitation of 256 LUNs per fabric endpoint. The configfs attributes for backends can now be shared between core and driver code, allowing existing drivers to use common code while still allowing flexibility for new backend provided attributes. The highlights include: - Merge sbc_verify_dif_* into common code (sagi) - Remove iscsi-target support for obsolete IFMarker/OFMarker (Christophe Vu-Brugier) - Add bidi support in target/user backend (ilias + vangelis + agover) - Move se_node_acl allocation into target-core code (hch) - Add crc_t10dif_update common helper (akinobu + mkp) - Handle target-core odd SGL mapping for data transfer memory (akinobu) - Move transport ID handling into target-core (hch) - Move task tag into struct se_cmd + support 64-bit tags (bart) - Convert se_node_acl->device_list[] to RCU hlist (nab + hch + paulmck) - Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch + paulmck) - Simplify target backend driver registration (hch) - Consolidate + simplify target backend attribute implementations (hch + nab) - Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch) - Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab) - Drop unnecessary core_tpg_register TFO parameter (nab) - Use 64-bit LUNs tree-wide (hannes) - Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits) target: Bump core version to v5.0 target: remove target_core_configfs.h target: remove unused TARGET_CORE_CONFIG_ROOT define target: consolidate version defines target: implement WRITE_SAME with UNMAP bit using ->execute_unmap target: simplify UNMAP handling target: replace se_cmd->execute_rw with a protocol_data field target/user: Fix inconsistent kmap_atomic/kunmap_atomic target: Send UA when changing LUN inventory target: Send UA upon LUN RESET tmr completion target: Send UA on ALUA target port group change target: Convert se_lun->lun_deve_lock to normal spinlock target: use 'se_dev_entry' when allocating UAs target: Remove 'ua_nacl' pointer from se_ua structure target_core_alua: Correct UA handling when switching states xen-scsiback: Fix compile warning for 64-bit LUN target: Remove TARGET_MAX_LUNS_PER_TRANSPORT target: use 64-bit LUNs target: Drop duplicate + unused se_dev_check_wce target: Drop unnecessary core_tpg_register TFO parameter ...
2015-07-04Merge tag 'ntb-4.2' of git://github.com/jonmason/ntbLinus Torvalds
Pull NTB updates from Jon Mason: "This includes a pretty significant reworking of the NTB core code, but has already produced some significant performance improvements. An abstraction layer was added to allow the hardware and clients to be easily added. This required rewriting the NTB transport layer for this abstraction layer. This modification will allow future "high performance" NTB clients. In addition to this change, a number of performance modifications were added. These changes include NUMA enablement, using CPU memcpy instead of asyncdma, and modification of NTB layer MTU size" * tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits) NTB: Add split BAR output for debugfs stats NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe NTB: Print driver name and version in module init NTB: Increase transport MTU to 64k from 16k NTB: Rename Intel code names to platform names NTB: Default to CPU memcpy for performance NTB: Improve performance with write combining NTB: Use NUMA memory in Intel driver NTB: Use NUMA memory and DMA chan in transport NTB: Rate limit ntb_qp_link_work NTB: Add tool test client NTB: Add ping pong test client NTB: Add parameters for Intel SNB B2B addresses NTB: Reset transport QP link stats on down NTB: Do not advance transport RX on link down NTB: Differentiate transport link down messages NTB: Check the device ID to set errata flags NTB: Enable link for Intel root port mode in probe NTB: Read peer info from local SPAD in transport NTB: Split ntb_hw_intel and ntb_transport drivers ...
2015-07-049p: cope with bogus responses from server in p9_client_{read,write}Al Viro
if server claims to have written/read more than we'd told it to, warn and cap the claimed byte count to avoid advancing more than we are ready to.
2015-07-04p9_client_write(): avoid double p9_free_req()Al Viro
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *"; if response is impossible to parse and we discard the request, get the out of the loop right there. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-049p: forgetting to cancel request on interrupted zero-copy RPCAl Viro
If we'd already sent a request and decide to abort it, we *must* issue TFLUSH properly and not just blindly reuse the tag, or we'll get seriously screwed when response eventually arrives and we confuse it for response to later request that had reused the same tag. Cc: stable@vger.kernel.org # v3.2 and later Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: bdev_direct_access() may sleepMatthew Wilcox
The brd driver is the only in-tree driver that may sleep currently. After some discussion on linux-fsdevel, we decided that any driver may choose to sleep in its ->direct_access method. To ensure that all callers of bdev_direct_access() are prepared for this, add a call to might_sleep(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04block: Add support for DAX reads/writes to block devicesMatthew Wilcox
If a block device supports the ->direct_access methods, bypass the normal DIO path and use DAX to go straight to memcpy() instead of allocating a DIO and a BIO. Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in do_blockdev_direct_IO(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Use copy_from_iter_nocacheMatthew Wilcox
When userspace does a write, there's no need for the written data to pollute the CPU cache. This matches the original XIP code. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Add block size note to documentationMatthew Wilcox
For block devices which are small enough, mkfs will default to creating a filesystem with block sizes smaller than page size. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Except for the preempt notifiers fix, these are all small bugfixes that could have been waited for -rc2. Sending them now since I was taking care of Peter's patch anyway" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: add hyper-v crash msrs values KVM: x86: remove data variable from kvm_get_msr_common KVM: s390: virtio-ccw: don't overwrite config space values KVM: x86: keep track of LVT0 changes under APICv KVM: x86: properly restore LVT0 KVM: x86: make vapics_in_nmi_mode atomic sched, preempt_notifier: separate notifier registration from static_key inc/dec
2015-07-04NTB: Add split BAR output for debugfs statsDave Jiang
When split BAR is enabled, the driver needs to dump out the split BAR registers rather than the original 64bit BAR registers. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Change WARN_ON_ONCE to pr_warn_once on unsafeDave Jiang
The unsafe doorbell and scratchpad access should display reason when WARN is called. Otherwise we get a stack dump without any explanation. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Print driver name and version in module initDave Jiang
Printouts driver name and version to indicate what is being loaded. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Increase transport MTU to 64k from 16kDave Jiang
Benchmarking showed a significant performance increase with the MTU size to 64k instead of 16k. Change the driver default to 64k. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rename Intel code names to platform namesDave Jiang
Instead of using the platform code names, use the correct platform names to identify the respective Intel NTB hardware. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Default to CPU memcpy for performanceDave Jiang
Disable DMA usage by default, since the CPU provides much better performance with write combining. Provide a module parameter to enable DMA usage when offloading the memcpy is preferred. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Improve performance with write combiningDave Jiang
Changing the memory window BAR mappings to write combining significantly boosts the performance. We will also use memcpy that uses non-temporal store, which showed performance improvement when doing non-cached memcpys. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory in Intel driverAllen Hubbe
Allocate memory for the NUMA node of the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory and DMA chan in transportAllen Hubbe
Allocate memory and request the DMA channel for the same NUMA node as the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rate limit ntb_qp_link_workAllen Hubbe
When the ntb transport is connecting and waiting for the peer, the debug console receives lots of debug level messages about the remote qp link status being down. Rate limit those messages. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add tool test clientAllen Hubbe
This is a simple debugging driver that enables the doorbell and scratch pad registers to be read and written from the debugfs. This tool enables more complicated debugging to be scripted from user space. This driver may be used to test that your ntb hardware and drivers are functioning at a basic level. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add ping pong test clientAllen Hubbe
This is a simple ping pong driver that exercises the scratch pads and doorbells of the ntb hardware. This driver may be used to test that your ntb hardware and drivers are functioning at a basic level. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add parameters for Intel SNB B2B addressesAllen Hubbe
Add module parameters for the addresses to be used in B2B topology. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Reset transport QP link stats on downAllen Hubbe
Reset the link stats when the link goes down. In particular, the TX and RX index and count must be reset, or else the TX side will be sending packets to the RX side where the RX side is not expecting them. Reset all the stats, to be consistent. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Do not advance transport RX on link downAllen Hubbe
On link down, don't advance RX index to the next entry. The next entry should never be valid after receiving the link down flag. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Differentiate transport link down messagesAllen Hubbe
The same message "qp %d: Link Down\n" was printed at two locations in ntb_transport. Change the messages so they are distinct. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Check the device ID to set errata flagsDave Jiang
Set errata flags for the specific device IDs to which they apply, instead of the whole Xeon hardware class. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Enable link for Intel root port mode in probeDave Jiang
Link training should be enabled in the driver probe for root port mode. We should not have to wait for transport to be loaded for this to happen. Otherwise the ntb device will not show up on the transparent bridge side of the link. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Read peer info from local SPAD in transportDave Jiang
The transport was writing and then reading the peer scratch pad, essentially reading what it just wrote instead of exchanging any information with the peer. The transport expects the peer values to be the same as the local values, so this issue was not obvious. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Split ntb_hw_intel and ntb_transport driversAllen Hubbe
Change ntb_hw_intel to use the new NTB hardware abstraction layer. Split ntb_transport into its own driver. Change it to use the new NTB hardware abstraction layer. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add NTB hardware abstraction layerAllen Hubbe
Abstract the NTB device behind a programming interface, so that it can support different hardware and client drivers. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq update from Thomas Gleixner: "The last update for 4.2 is just moving a macro from a local header to the global one, so it can be used in architecture code as well. Cleanup of the now empty local header is 4.3 material" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: Move IRQCHIP_DECLARE macro to include/linux/irqchip.h
2015-07-04Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two FPU rewrite related fixes. This addresses all known x86 regressions at this stage. Also some other misc fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Fix boot crash in the early FPU code x86/asm/entry/64: Update path names x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled x86/boot/setup: Clean up the e820_reserve_setup_data() code x86/kaslr: Fix typo in the KASLR_FLAG documentation
2015-07-04Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Debug info and other statistics fixes and related enhancements" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/numa: Fix numa balancing stats in /proc/pid/sched sched/numa: Show numa_group ID in /proc/sched_debug task listings sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y sched/stat: Simplify the sched_info accounting dependency
2015-07-04Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes an x86 PMU scheduling fix, but most changes are late breaking tooling fixes and updates: User visible fixes: - Create config.detected into OUTPUT directory, fixing parallel builds sharing the same source directory (Aaro Kiskinen) - Allow to specify custom linker command, fixing some MIPS64 builds. (Aaro Kiskinen) - Fix to show proper convergence stats in 'perf bench numa' (Srikar Dronamraju) User visible changes: - Validate syscall list passed via -e argument to 'perf trace'. (Arnaldo Carvalho de Melo) - Introduce 'perf stat --per-thread' (Jiri Olsa) - Check access permission for --kallsyms and --vmlinux (Li Zhang) - Move toggling event logic from 'perf top' and into hists browser, allowing freeze/unfreeze with event lists with more than one entry (Namhyung Kim) - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and showing the Aggregated stats in 'perf report -D' (Adrian Hunter) Infrastructure fixes: - Add missing break for PERF_RECORD_ITRACE_START, which caused those events samples to be parsed as well as PERF_RECORD_LOST_SAMPLES. ITRACE_START only appears when Intel PT or BTS are present, so.. (Jiri Olsa) - Call the perf_session destructor when bailing out in the inject, kmem, report, kvm and mem tools (Taeung Song) Infrastructure changes: - Move stuff out of 'perf stat' and into the lib for further use (Jiri Olsa) - Reference count the cpu_map and thread_map classes (Jiri Olsa) - Set evsel->{cpus,threads} from the evlist, if not set, allowing the generalization of some 'perf stat' functions that previously were accessing private static evlist variable (Jiri Olsa) - Delete an unnecessary check before the calling free_event_desc() (Markus Elfring) - Allow auxtrace data alignment (Adrian Hunter) - Allow events with dot (Andi Kleen) - Fix failure to 'perf probe' events on arm (He Kuang) - Add testing for Makefile.perf (Jiri Olsa) - Add test for make install with prefix (Jiri Olsa) - Fix single target build dependency check (Jiri Olsa) - Access thread_map entries via accessors, prep patch to hold more info per entry, for ongoing 'perf stat --per-thread' work (Jiri Olsa) - Use __weak definition from compiler.h (Sukadev Bhattiprolu) - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) perf tools: Allow to specify custom linker command perf tools: Create config.detected into OUTPUT directory perf mem: Fill in the missing session freeing after an error occurs perf kvm: Fill in the missing session freeing after an error occurs perf report: Fill in the missing session freeing after an error occurs perf kmem: Fill in the missing session freeing after an error occurs perf inject: Fill in the missing session freeing after an error occurs perf tools: Add missing break for PERF_RECORD_ITRACE_START perf/x86: Fix 'active_events' imbalance perf symbols: Check access permission when reading symbol files perf stat: Introduce --per-thread option perf stat: Introduce print_counters function perf stat: Using init_stats instead of memset perf stat: Rename print_interval to process_interval perf stat: Remove perf_evsel__read_cb function perf stat: Move perf_stat initialization counter process code perf stat: Move zero_per_pkg into counter process code perf stat: Separate counters reading and processing perf stat: Introduce read_counters function perf stat: Introduce perf_evsel__read function ...
2015-07-04Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull max log buf size increase from Ingo Molnar: "Ran into this limit recently, so increase it by an order of magnitude" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: printk: Increase maximum CONFIG_LOG_BUF_SHIFT from 21 to 25
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull second round of input updates from Dmitry Torokhov: "A new driver for Weida wdt87xx touch controllers, and a bunch of fixups for other drivers" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wdt87xx_i2c - add a scaling factor for TOUCH_MAJOR event Input: wdt87xx_i2c - remove stray newline in diagnostic message Input: arc_ps2 - add HAS_IOMEM dependency Input: wdt87xx_i2c - fix format warning Input: improve parsing OF parameters for touchscreens Input: edt-ft5x06 - mark as direct input device Input: use for_each_set_bit() where appropriate Input: add a driver for wdt87xx touchscreen controller Input: axp20x-pek - fix reporting button state as inverted Input: xpad - re-send LED command on present event Input: xpad - set the LEDs properly on XBox Wireless controllers Input: imx_keypad - check for clk_prepare_enable() error
2015-07-04x86/fpu: Fix boot crash in the early FPU codeIngo Molnar
Jan Kara and Thomas Gleixner reported boot crashes in the FPU code: general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff81048a6c>] [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40 2b:* 0f ae 85 00 fe ff ff fxsave -0x200(%rbp) and bisected it down to the following FPU commit: 91a8c2a5b43f ("x86/fpu: Clean up and fix MXCSR handling") The reason is that the on-stack FPU registers state variable, used by the FXSAVE instruction, did not have the required minimum alignment of 16 bytes, causing the general protection fault. This is most likely a GCC bug in older GCC versions, but the offending commit also added a bogus extra 32-byte alignment (which GCC ignored too). So fix this bug by making the variable static again, but also mark it __initdata this time, because fpu__init_system_mxcsr() is now an __init function. Reported-and-bisected-by: Jan Kara <jack@suse.cz> Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/numa: Fix numa balancing stats in /proc/pid/schedSrikar Dronamraju
Commit 44dba3d5d6a1 ("sched: Refactor task_struct to use numa_faults instead of numa_* pointers") modified the way tsk->numa_faults stats are accounted. However that commit never touched show_numa_stats() that is displayed in /proc/pid/sched and thus the numbers displayed in /proc/pid/sched don't match the actual numbers. Fix it by making sure that /proc/pid/sched reflects the task fault numbers. Also add group fault stats too. Also couple of more modifications are added here: 1. Format changes: - Previously we would list two entries per node, one for private and one for shared. Also the home node info was listed in each entry. - Now preferred node, total_faults and current node are displayed separately. - Now there is one entry per node, that lists private,shared task and group faults. 2. Unit changes: - p->numa_pages_migrated was getting reset after every read of /proc/pid/sched. It's more useful to have absolute numbers since differential migrations between two accesses can be more easily calculated. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435252903-1081-4-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/numa: Show numa_group ID in /proc/sched_debug task listingsSrikar Dronamraju
Having the numa group ID in /proc/sched_debug helps to see how the numa groups have spread across the system. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435252903-1081-3-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.hSrikar Dronamraju
Currently print_cfs_rq() is declared in include/linux/sched.h. However it's not used outside kernel/sched. Hence move the declaration to kernel/sched/sched.h Also some functions are only available for CONFIG_SCHED_DEBUG=y. Hence move the declarations to within the #ifdef. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=yNaveen N. Rao
Expand /proc/pid/schedstat output: - enable it on CONFIG_TASK_DELAY_ACCT=y && !CONFIG_SCHEDSTATS kernels. - dump all zeroes on kernels that are booted with the 'nodelayacct' option, which boot option disables delay accounting on CONFIG_TASK_DELAY_ACCT=y kernels. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: a.p.zijlstra@chello.nl Cc: ricklind@us.ibm.com Link: http://lkml.kernel.org/r/5ccbef17d4bc841084ea6e6421d4e4a23b7b806f.1435654789.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/stat: Simplify the sched_info accounting dependencyNaveen N. Rao
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task sched_info, which results in ugly #if clauses. Simplify the code by introducing a synthethic CONFIG_SCHED_INFO switch, selected by both. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: a.p.zijlstra@chello.nl Cc: ricklind@us.ibm.com Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-03Merge branch 'next' into for-linusDmitry Torokhov
Prepare second round of input updates for 4.2 merge window.
2015-07-04ext4: correctly migrate a file with a hole at the beginningEryu Guan
Currently ext4_ind_migrate() doesn't correctly handle a file which contains a hole at the beginning of the file. This caused the migration to be done incorrectly, and then if there is a subsequent following delayed allocation write to the "hole", this would reclaim the same data blocks again and results in fs corruption. # assmuing 4k block size ext4, with delalloc enabled # skip the first block and write to the second block xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile # converting to indirect-mapped file, which would move the data blocks # to the beginning of the file, but extent status cache still marks # that region as a hole chattr -e /mnt/ext4/testfile # delayed allocation writes to the "hole", reclaim the same data block # again, results in i_blocks corruption xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile umount /mnt/ext4 e2fsck -nf /dev/sda6 ... Inode 53, i_blocks is 16, should be 8. Fix? no ... Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2015-07-03ext4: be more strict when migrating to non-extent based fileEryu Guan
Currently the check in ext4_ind_migrate() is not enough before doing the real conversion: a) delayed allocated extents could bypass the check on eh->eh_entries and eh->eh_depth This can be demonstrated by this script xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile chattr -e /mnt/ext4/testfile where testfile has two extents but still be converted to non-extent based file format. b) only extent length is checked but not the offset, which would result in data lose (delalloc) or fs corruption (nodelalloc), because non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30) blocks This can be demostrated by xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile chattr -e /mnt/ext4/testfile sync If delalloc is enabled, dmesg prints EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53 EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5 EXT4-fs (dm-4): This should not happen!! Data will be lost If delalloc is disabled, e2fsck -nf shows corruption Inode 53, i_size is 5497558142976, should be 4096. Fix? no Fix the two issues by a) forcing all delayed allocation blocks to be allocated before checking eh->eh_depth and eh->eh_entries b) limiting the last logical block of the extent is within direct map Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2015-07-03ext4: fix reservation release on invalidatepage for delalloc fsLukas Czerner
On delalloc enabled file system on invalidatepage operation in ext4_da_page_release_reservation() we want to clear the delayed buffer and remove the extent covering the delayed buffer from the extent status tree. However currently there is a bug where on the systems with page size > block size we will always remove extents from the start of the page regardless where the actual delayed buffers are positioned in the page. This leads to the errors like this: EXT4-fs warning (device loop0): ext4_da_release_space:1225: ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data blocks This however can cause data loss on writeback time if the file system is in ENOSPC condition because we're releasing reservation for someones else delayed buffer. Fix this by only removing extents that corresponds to the part of the page we want to invalidate. This problem is reproducible by the following fio receipt (however I was only able to reproduce it with fio-2.1 or older. [global] bs=8k iodepth=1024 iodepth_batch=60 randrepeat=1 size=1m directory=/mnt/test numjobs=20 [job1] ioengine=sync bs=1k direct=1 rw=randread filename=file1:file2 [job2] ioengine=libaio rw=randwrite direct=1 filename=file1:file2 [job3] bs=1k ioengine=posixaio rw=randwrite direct=1 filename=file1:file2 [job5] bs=1k ioengine=sync rw=randread filename=file1:file2 [job7] ioengine=libaio rw=randwrite filename=file1:file2 [job8] ioengine=posixaio rw=randwrite filename=file1:file2 [job10] ioengine=mmap rw=randwrite bs=1k filename=file1:file2 [job11] ioengine=mmap rw=randwrite direct=1 filename=file1:file2 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
2015-07-03Merge tag 'topic/drm-fixes-2015-07-04' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm-intel Pull drm EDID fix from Daniel Vetter: "Since Dave is enjoying vacation I figured I'll send you this drm core fix directly" * tag 'topic/drm-fixes-2015-07-04' of git://anongit.freedesktop.org/drm-intel: drm/crtc: Fix edid length computation
2015-07-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost cross endian support from Michael Tsirkin: "I have just queued some more bugfix patches today but none fix regressions and none are related to these ones, so it looks like a good time for a merge for -rc1. The motivation for this is support for legacy BE guests on the new LE hosts. There are two redeeming properties that made me merge this: - It's a trivial amount of code: since we wrap host/guest accesses anyway, almost all of it is well hidden from drivers. - Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY, and when it's clear, there's zero overhead (as some point it was tested by compiling with and without the patches, got the same stripped binary). Maybe we could create a Kconfig symbol to enforce the second point: prevent people from enabling it eg on x86. I will look into this" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-pci: alloc only resources actually used. macvtap/tun: cross-endian support for little-endian hosts vhost: cross-endian support for legacy devices virtio: add explicit big-endian support to memory accessors vhost: introduce vhost_is_little_endian() helper vringh: introduce vringh_is_little_endian() helper macvtap: introduce macvtap_is_little_endian() helper tun: add tun_is_little_endian() helper virtio: introduce virtio_is_little_endian() helper
2015-07-04drm/crtc: Fix edid length computationShixin Zeng
The length of each EDID block is EDID_LENGTH, and number of blocks is (1 + edid->extensions) - we need to multiply not add them. This causes wrong EDID to be passed on, and is a regression introduced by d2ed34362a52 (drm: Introduce helper for replacing blob properties) Signed-off-by: Shixin Zeng <zeng.shixin@gmail.com> Cc: Daniel Stone <daniels@collabora.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com> [danvet: Add Cc: and fix commit summary.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>