summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-03-21blk-stat: convert to callback-based statistics reportingOmar Sandoval
Currently, statistics are gathered in ~0.13s windows, and users grab the statistics whenever they need them. This is not ideal for both in-tree users: 1. Writeback throttling wants its own dynamically sized window of statistics. Since the blk-stats statistics are reset after every window and the wbt windows don't line up with the blk-stats windows, wbt doesn't see every I/O. 2. Polling currently grabs the statistics on every I/O. Again, depending on how the window lines up, we may miss some I/Os. It's also unnecessary overhead to get the statistics on every I/O; the hybrid polling heuristic would be just as happy with the statistics from the previous full window. This reworks the blk-stats infrastructure to be callback-based: users register a callback that they want called at a given time with all of the statistics from the window during which the callback was active. Users can dynamically bucketize the statistics. wbt and polling both currently use read vs. write, but polling can be extended to further subdivide based on request size. The callbacks are kept on an RCU list, and each callback has percpu stats buffers. There will only be a few users, so the overhead on the I/O completion side is low. The stats flushing is also simplified considerably: since the timer function is responsible for clearing the statistics, we don't have to worry about stale statistics. wbt is a trivial conversion. After the conversion, the windowing problem mentioned above is fixed. For polling, we register an extra callback that caches the previous window's statistics in the struct request_queue for the hybrid polling heuristic to use. Since we no longer have a single stats buffer for the request queue, this also removes the sysfs and debugfs stats entries. To replace those, we add a debugfs entry for the poll statistics. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-21blk-stat: move BLK_RQ_STAT_BATCH definition to blk-stat.cOmar Sandoval
This is an implementation detail that no-one outside of blk-stat.c uses. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-21HID: hiddev: reallocate hiddev's minor numberJaejoong Kim
We need to store the minor number each drivers. In case of hidraw, the minor number is stored stores in struct hidraw. But hiddev's minor is located in struct hid_device. The hid-core driver announces a kernel message which driver is loaded when HID device connected, but hiddev's minor number is always zero. To proper display hiddev's minor number, we need to store the minor number asked from usb core and do some refactoring work (move from hiddev.c to hiddev.h) to access hiddev in hid-core. [jkosina@suse.cz: rebase on top of newer codebase] Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jaejoong Kim <climbbb.kim@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-21HID: sony: Report DS4 motion sensors through a separate deviceRoderick Colenbrander
The DS4 motion sensors are currently mapped by the hid-core driver to non-existing axes in between ABS_MISC and ABS_MT_SLOT, because the device already exhausted ABS_X-ABS_RZ. For a part the mapping by hid-core is accomplished by a fixup in hid-sony as the motion axes actually use vendor specific usage pages. This patch makes the DS4 use a separate input device for the motion sensors and reports acceleration data through ABS_X-ABS_Z and gyroscope data through ABS_RX-ABS_RZ. In addition it extends the event spec to allow gyroscope data through ABS_RX-ABS_RZ when INPUT_PROP_ACCELEROMETER is set. This change was suggested by Peter Hutterer during a discussion on linux-input. [jkosina@suse.cz: rebase onto slightly newer codebase] Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-21HID: remove initial reading of reports at connectBenjamin Tissoires
It looks like a bunch of devices do not like to be polled for their reports at init time. When you look into the details, it seems that for those that are requiring the quirk HID_QUIRK_NO_INIT_REPORTS, the driver fails to retrieve part of the features/inputs while others (more generic) work. IMO, it should be acceptable to remove the need for the quirk in the general case. On the small amount of cases where we actually need to read the current values, the driver in charge (hid-mt or wacom) already retrieves the features manually. There are 2 cases where we might need to retrieve the reports at init: 1. hiddev devices with specific use-space tool 2. a device that would require the driver to fetch a specific feature/input at plug For case 2, I have seen this a few time on hid-multitouch. It is solved in hid-multitouch directly by fetching the feature. I hope it won't be too common and this can be solved on a per-case basis (crossing fingers). For case 1, we moved the implementation of HID_QUIRK_NO_INIT_REPORTS in hiddev. When somebody starts calling ioctls that needs an initial update, the hiddev device will fetch the initial state of the reports to mimic the current behavior. This adds a small amount of time during the first HIDIOCGUSAGE(S), but it should be acceptable in most cases. To keep the currently known broken devices, we have to keep around HID_QUIRK_NO_INIT_REPORTS, but the scope will only be for hiddev. Note that I don't think hidraw would be affected and I checked that the FF drivers that need to interact with the report fields are all using output reports, which are not initialized by usbhid_init_reports(). NO_INIT_INPUT_REPORTS is then replaced by HID_QUIRK_NO_INIT_REPORTS: there is no point keeping it for just one device. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-21usb/early: Add driver for xhci debug capabilityLu Baolu
XHCI debug capability (DbC) is an optional but standalone functionality provided by an xHCI host controller. Software learns this capability by walking through the extended capability list of the host. XHCI specification describes DbC in section 7.6. This patch introduces the code to probe and initialize the debug capability hardware during early boot. With hardware initialized, the debug target (system on which this code is running) will present a debug device through the debug port (normally the first USB3 port). The debug device is fully compliant with the USB framework and provides the equivalent of a very high performance (USB3) full-duplex serial link between the debug host and target. The DbC functionality is independent of the xHCI host. There isn't any precondition from the xHCI host side for the DbC to work. One use for this feature is kernel debugging, for example when your machine crashes very early before the regular console code is initialized. Other uses include simpler, lockless logging instead of a full-blown printk console driver and klogd. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-usb@vger.kernel.org Link: http://lkml.kernel.org/r/1490083293-3792-3-git-send-email-baolu.lu@linux.intel.com [ Small fix to the Kconfig help text. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-21drm/edid: detect SCDC support in HF-VSDBShashank Sharma
This patch does following: - Adds a new structure (drm_hdmi_info) in drm_display_info. This structure will be used to save and indicate if sink supports advanced HDMI 2.0 features - Adds another structure drm_scdc within drm_hdmi_info, to reflect scdc support and capabilities in connected HDMI 2.0 sink. - Checks the HF-VSDB block for presence of SCDC, and marks it in scdc structure - If SCDC is present, checks if sink is capable of generating SCDC read request, and marks it in scdc structure. V2: Addressed review comments Thierry: - Fix typos in commit message and make abbreviation consistent across the commit message. - Change structure object name from hdmi_info -> hdmi - Fix typos and abbreviations in description of structure drm_hdmi_info end the description with a full stop. - Create a structure drm_scdc, and keep all information related to SCDC register set (supported, read request supported) etc in it. Ville: - Change rr -> read_request - Call drm_detect_scrambling function drm_parse_hf_vsdb so that all of HF-VSDB parsing can be kept in same function, in incremental patches. V3: Rebase. V4: Rebase. V5: Rebase. V6: Addressed review comments from Ville - Add clock rate calculations for 1/10 and 1/40 ratios - Remove leftovers from old patchset V7: Added R-B from Jose. V8: Rebase. V9: Rebase. V10: Rebase. Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1489404244-16608-5-git-send-email-shashank.sharma@intel.com
2017-03-21drm/edid: detect SCDC support in HF-VSDBShashank Sharma
This patch does following: - Adds a new structure (drm_hdmi_info) in drm_display_info. This structure will be used to save and indicate if sink supports advanced HDMI 2.0 features - Adds another structure drm_scdc within drm_hdmi_info, to reflect scdc support and capabilities in connected HDMI 2.0 sink. - Checks the HF-VSDB block for presence of SCDC, and marks it in scdc structure - If SCDC is present, checks if sink is capable of generating SCDC read request, and marks it in scdc structure. V2: Addressed review comments Thierry: - Fix typos in commit message and make abbreviation consistent across the commit message. - Change structure object name from hdmi_info -> hdmi - Fix typos and abbreviations in description of structure drm_hdmi_info end the description with a full stop. - Create a structure drm_scdc, and keep all information related to SCDC register set (supported, read request supported) etc in it. Ville: - Change rr -> read_request - Call drm_detect_scrambling function drm_parse_hf_vsdb so that all of HF-VSDB parsing can be kept in same function, in incremental patches. V3: Rebase. V4: Rebase. V5: Rebase. V6: Rebase. V7: Added R-B from Jose. V8: Rebase. V9: Rebase. V10: Rebase. Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1489404244-16608-4-git-send-email-shashank.sharma@intel.com
2017-03-21drm/edid: check for HF-VSDB blockThierry Reding
This patch implements a small function that finds if a given CEA db is hdmi-forum vendor specific data block or not. V2: Rebase. V3: Added R-B from Jose. V4: Rebase V5: Rebase V6: Rebase V7: Rebase V8: Rebase V9: Rebase V10: Rebase Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1489404244-16608-3-git-send-email-shashank.sharma@intel.com
2017-03-21drm: Add SCDC helpersThierry Reding
SCDC is a mechanism defined in the HDMI 2.0 specification that allows the source and sink devices to communicate. This commit introduces helpers to access the SCDC and provides the symbolic names for the various registers defined in the specification. V2: Rebase. V3: Added R-B from Jose. V4: Rebase V5: Addressed review comments from Ville - Handle the I2c return values in a better way (dp_dual_mode) - Make the macros for SCDC Major/Minor more readable, by adding a 'GET' in the macro names V6: Rebase V7: Rebase V8: Rebase V9: Rebase V10: Rebase Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1489404244-16608-2-git-send-email-shashank.sharma@intel.com
2017-03-21chardev: add helper function to register char devs with a struct deviceLogan Gunthorpe
Credit for this patch goes is shared with Dan Williams [1]. I've taken things one step further to make the helper function more useful and clean up calling code. There's a common pattern in the kernel whereby a struct cdev is placed in a structure along side a struct device which manages the life-cycle of both. In the naive approach, the reference counting is broken and the struct device can free everything before the chardev code is entirely released. Many developers have solved this problem by linking the internal kobjs in this fashion: cdev.kobj.parent = &parent_dev.kobj; The cdev code explicitly gets and puts a reference to it's kobj parent. So this seems like it was intended to be used this way. Dmitrty Torokhov first put this in place in 2012 with this commit: 2f0157f char_dev: pin parent kobject and the first instance of the fix was then done in the input subsystem in the following commit: 4a215aa Input: fix use-after-free introduced with dynamic minor changes Subsequently over the years, however, this issue seems to have tripped up multiple developers independently. For example, see these commits: 0d5b7da iio: Prevent race between IIO chardev opening and IIO device (by Lars-Peter Clausen in 2013) ba0ef85 tpm: Fix initialization of the cdev (by Jason Gunthorpe in 2015) 5b28dde [media] media: fix use-after-free in cdev_put() when app exits after driver unbind (by Shauh Khan in 2016) This technique is similarly done in at least 15 places within the kernel and probably should have been done so in another, at least, 5 places. The kobj line also looks very suspect in that one would not expect drivers to have to mess with kobject internals in this way. Even highly experienced kernel developers can be surprised by this code, as seen in [2]. To help alleviate this situation, and hopefully prevent future wasted effort on this problem, this patch introduces a helper function to register a char device along with its parent struct device. This creates a more regular API for tying a char device to its parent without the developer having to set members in the underlying kobject. This patch introduce cdev_device_add and cdev_device_del which replaces a common pattern including setting the kobj parent, calling cdev_add and then calling device_add. It also introduces cdev_set_parent for the few cases that set the kobject parent without using device_add. [1] https://lkml.org/lkml/2017/2/13/700 [2] https://lkml.org/lkml/2017/2/10/370 Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-21drm/exynos/decon5433: signal frame done interrupt at front porchAndrzej Hajda
DECON in case of video mode generates interrupt by default at start of vertical back porch. As this interrupt is used to generate VBLANK events more optimal point is start of vertical front porch. Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-03-21drm/exynos/decon5433: fix vblank event handlingAndrzej Hajda
Current implementation of event handling assumes that vblank interrupt is always called at the right time. It is not true, it can be delayed due to various reasons. As a result different races can happen. The patch fixes the issue by using hardware frame counter present in DECON to serialize vblank and commit completion events. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-03-20x86/arch_prctl: Add ARCH_[GET|SET]_CPUIDKyle Huey
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge. When enabled, the processor will fault on attempts to execute the CPUID instruction with CPL>0. Exposing this feature to userspace will allow a ptracer to trap and emulate the CPUID instruction. When supported, this feature is controlled by toggling bit 0 of MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of https://bugzilla.kernel.org/attachment.cgi?id=243991 Implement a new pair of arch_prctls, available on both x86-32 and x86-64. ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting is enabled (and thus the CPUID instruction is not available) or 1 if CPUID faulting is not enabled. ARCH_SET_CPUID: Set the CPUID state to the second argument. If cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will be deactivated. Returns ENODEV if CPUID faulting is not supported on this system. The state of the CPUID faulting flag is propagated across forks, but reset upon exec. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20x86/syscalls/32: Wire up arch_prctl on x86-32Kyle Huey
Hook up arch_prctl to call do_arch_prctl() on x86-32, and in 32 bit compat mode on x86-64. This allows to have arch_prctls that are not specific to 64 bits. On UML, simply stub out this syscall. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-7-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20generic syscalls: Wire up statx syscallStafford Horne
The new syscall statx is implemented as generic code, so enable it for architectures like openrisc which use the generic syscall table. Fixes: a528d35e8bfcc ("statx: Add a system call to make enhanced file info available") Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds
Pull SCSI target fixes from Nicholas Bellinger: "The bulk of the changes are in qla2xxx target driver code to address various issues found during Cavium/QLogic's internal testing (stable CC's included), along with a few other stability and smaller miscellaneous improvements. There are also a couple of different patch sets from Mike Christie, which have been a result of his work to use target-core ALUA logic together with tcm-user backend driver. Finally, a patch to address some long standing issues with pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices, which will make folks using physical (or virtual) magnetic tape happy" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (28 commits) qla2xxx: Update driver version to 9.00.00.00-k qla2xxx: Fix delayed response to command for loop mode/direct connect. qla2xxx: Change scsi host lookup method. qla2xxx: Add DebugFS node to display Port Database qla2xxx: Use IOCB interface to submit non-critical MBX. qla2xxx: Add async new target notification qla2xxx: Export DIF stats via debugfs qla2xxx: Improve T10-DIF/PI handling in driver. qla2xxx: Allow relogin to proceed if remote login did not finish qla2xxx: Fix sess_lock & hardware_lock lock order problem. qla2xxx: Fix inadequate lock protection for ABTS. qla2xxx: Fix request queue corruption. qla2xxx: Fix memory leak for abts processing qla2xxx: Allow vref count to timeout on vport delete. tcmu: Convert cmd_time_out into backend device attribute tcmu: make cmd timeout configurable tcmu: add helper to check if dev was configured target: fix race during implicit transition work flushes target: allow userspace to set state to transitioning target: fix ALUA transition timeout handling ...
2017-03-18target: fix ALUA transition timeout handlingMike Christie
The implicit transition time tells initiators the min time to wait before timing out a transition. We currently schedule the transition to occur in tg_pt_gp_implicit_trans_secs seconds so there is no room for delays. If core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata needs to write out info to a remote file, then the initiator can easily time out the operation. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-18target: allow ALUA setup for some passthrough backendsMike Christie
This patch allows passthrough backends to use the core/base LIO ALUA setup and state checks, but still handle the execution of commands. This will allow the target_core_user module to execute STPG and RTPG in userspace, and not have to duplicate the ALUA state checks, path information (needed so we can check if command is executable on specific paths) and setup (rtslib sets/updates the configfs ALUA interface like it does for iblock or file). For STPG, the target_core_user userspace daemon, tcmu-runner will still execute the STPG, and to update the core/base LIO state it will use the existing configfs interface. For RTPG, tcmu-runner will loop over configfs and/or cache the state. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-18mm/gup: Implement the dev_pagemap() logic in the generic ↵Kirill A. Shutemov
get_user_pages_fast() function This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's only implemented on x86. On non-x86, the new code will be compiled out. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Move permission checks into helpersKirill A. Shutemov
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. On x86, we would need to do additional permission checks to determine if access is allowed. Let's abstract it out into separate helpers. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Drop the arch_pte_access_permitted() MMU callbackKirill A. Shutemov
The only arch that defines it to something meaningful is x86. But x86 doesn't use the generic GUP_fast() implementation -- the only place where the callback is called. Let's drop it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-17Merge branch 'x86-acpi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 acpi fixes from Thomas Gleixner: "This update deals with the fallout of the recent work to make cpuid/node mappings persistent. It turned out that the boot time ACPI based mapping tripped over ACPI inconsistencies and caused regressions. It's partially reverted and the fragile part replaced by an implementation which makes the mapping persistent when a CPU goes online for the first time" * 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi/processor: Check for duplicate processor ids at hotplug time acpi/processor: Implement DEVICE operator for processor enumeration x86/acpi: Restore the order of CPU IDs Revert"x86/acpi: Enable MADT APIs to return disabled apicids" Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
2017-03-17dma-fence: add dma_fence_match_context helperPhilipp Zabel
Add a helper to check if all fences in a fence array are from a given context. For convenience, the function can also handle being given a non-array fence. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: http://patchwork.freedesktop.org/patch/msgid/1489768492-25190-1-git-send-email-p.zabel@pengutronix.de
2017-03-17Merge tag 'drm-fixes-for-v4.11-rc3' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Bunch of fixes across the drivers, in a St Patrick's day pull request (please turn terminal colors to green on black or black on green for full effect). On the arm side, tilcdc, omap and malidp got fixes, while amd has some powermanagement fixes, and intel has a set of fixes across the driver. Nothing seems to bad or scary at this point" * tag 'drm-fixes-for-v4.11-rc3' of git://people.freedesktop.org/~airlied/linux: (27 commits) drm/amd/amdgpu: Fix debugfs reg read/write address width drm/amdgpu/si: add dpm quirk for Oland drm/radeon/si: add dpm quirk for Oland drm: amd: remove broken include path drm/amd/powerplay: fix copy error in smu7_clockpoweragting.c drm/tilcdc: Set framebuffer DMA address to HW only if CRTC is enabled drm/tilcdc: Fix hardcoded fail-return value in tilcdc_crtc_create() drm/i915: Fix forcewake active domain tracking drm/i915: Nuke skl_update_plane debug message from the pipe update critical section drm/i915: use correct node for handling cache domain eviction uapi: fix drm/omap_drm.h userspace compilation errors drm/omap: fix dmabuf mmap for dma_alloc'ed buffers drm/amdgpu: fix parser init error path to avoid crash in parser fini drm/amd/amdgpu: Disable GFX_PG on Carrizo until compute issues solved drm: mali-dp: Fix smart layer not going to composition drm: mali-dp: Remove mclk rate management drm/i915: Drain the freed state from the tail of the next commit drm/i915: Nuke debug messages from the pipe update critical section drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl drm/i915: Store a permanent error in obj->mm.pages ...
2017-03-17hrtimer: Remove hrtimer_peek_ahead_timers() leftoversStephen Boyd
This function was removed in commit c6eb3f70d448 (hrtimer: Get rid of hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted. Delete it now. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/20170317010814.2591-1-sboyd@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-17drm/i915: Add i810/i815 pci-ids for completenessChris Wilson
To improve our historical record and to simplify userspace that wants to include i915_pciids.h as its canonical breakdown of PCI IDs and their respective generations, include the gen1 ids for i810 and i815. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170313112810.4202-1-chris@chris-wilson.co.uk Acked-by: Jani Nikula <jani.nikula@intel.com>
2017-03-17cgroup, kthread: close race window where new kthreads can be migrated to ↵Tejun Heo
non-root cgroups Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Cc: stable@vger.kernel.org # v4.3+ (we can't close the race on < v4.3) Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-17netfilter: refcounter conversionsReshetova, Elena
refcount_t type and corresponding API (see include/linux/refcount.h) should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-17Merge branch 'linus' into x86/mm, to pick up a bugfixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-17auxdisplay: charlcd: Add support for 4-bit interfacesGeert Uytterhoeven
In 4-bit mode, 8-bit commands and data are written using two raw writes to the data interface: high nibble first, low nibble last. This must be handled by the low-level driver. However, as we don't know in which mode (4-bit or 8-bit) nor 4-bit phase the LCD was left, initialization must always be handled using raw writes, and needs to configure the LCD for 8-bit mode first. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17auxdisplay: charlcd: Extract character LCD core from misc/panelGeert Uytterhoeven
Extract the character LCD core from the Parallel port LCD/Keypad Panel driver in the misc subsystem, and convert it into a subdriver in the auxdisplay subsystem. This allows the character LCD core to be used by other drivers later. Compilation is controlled by its own Kconfig symbol CHARLCD, which is to be selected by its users, but can be enabled manually for compile-testing. All functions changed their prefix from "lcd_" to "charlcd_", and gained a "struct charlcd *" parameter to operate on a specific instance. While the driver API thus is ready to support multiple instances, the current limitation of a single display (/dev/lcd has a single misc minor assigned) is retained. No functional changes intended. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17pps: fix padding issue with PPS_FETCH for ioctl_compatMatt Ranostay
Issue is that x86 32-bit aligns to 4-bytes instead of 8-bytes so this patchset works around the issue and corrects the data returned in pps_fdata_compat. Acked-by: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Matt Ranostay <matt.ranostay@konsulko.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17docs: Add kernel-doc comments to VME driver APIMartyn Welch
Add kernel-doc comments to the VME driver API and structures. This documentation will be integrated into the RST documentation in a later patch. Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17drivers/misc: Add Aspeed LPC control driverCyril Bur
In order to manage server systems, there is typically another processor known as a BMC (Baseboard Management Controller) which is responsible for powering the server and other various elements, sometimes fans, often the system flash. The Aspeed BMC family which is what is used on OpenPOWER machines and a number of x86 as well is typically connected to the host via an LPC (Low Pin Count) bus (among others). The LPC bus is an ISA bus on steroids. It's generally used by the BMC chip to provide the host with access to the system flash (via MEM/FW cycles) that contains the BIOS or other host firmware along with a number of SuperIO-style IOs (via IO space) such as UARTs, IPMI controllers. On the BMC chip side, this is all configured via a bunch of registers whose content is related to a given policy of what devices are exposed at a per system level, which is system/vendor specific, so we don't want to bolt that into the BMC kernel. This started with a need to provide something nicer than /dev/mem for user space to configure these things. One important aspect of the configuration is how the MEM/FW space is exposed to the host (ie, the x86 or POWER). Some registers in that bridge can define a window remapping all or portion of the LPC MEM/FW space to a portion of the BMC internal bus, with no specific limits imposed in HW. I think it makes sense to ensure that this window is configured by a kernel driver that can apply some serious sanity checks on what it is configured to map. In practice, user space wants to control this by flipping the mapping between essentially two types of portions of the BMC address space: - The flash space. This is a region of the BMC MMIO space that more/less directly maps the system flash (at least for reads, writes are somewhat more complicated). - One (or more) reserved area(s) of the BMC physical memory. The latter is needed for a number of things, such as avoiding letting the host manipulate the innards of the BMC flash controller via some evil backdoor, we want to do flash updates by routing the window to a portion of memory (under control of a mailbox protocol via some separate set of registers) which the host can use to write new data in bulk and then request the BMC to flash it. There are other uses, such as allowing the host to boot from an in-memory flash image rather than the one in flash (very handy for continuous integration and test, the BMC can just download new images). It is important to note that due to the way the Aspeed chip lets the kernel configure the mapping between host LPC addresses and BMC ram addresses the offset within the window must be a multiple of size. Not doing so will fragment the accessible space rather than simply moving 'zero' upwards. This is caused by the nature of HICR8 being a mask and the way host LPC addresses are translated. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17vmbus: expose debug info for driversStephen Hemminger
Allow driver to get debug information about state of the ring. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17vmbus: cleanup header file styleStephen Hemminger
Minor changes to align hyper-v vmbus include files with current linux kernel style. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17vmbus: remove useless return'sStephen Hemminger
No need for empty return at end of void function Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17fpga: Add flag to indicate bitstream needs decryptingMoritz Fischer
Add a flag that is passed to the write_init() callback, indicating that the bitstream is encrypted. The low-level driver will deal with the flag, or return an error, if encrypted bitstreams are not supported. Signed-off-by: Moritz Fischer <mdf@kernel.org> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Alan Tull <atull@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17linux/serdev.h: Replace 'ctrl->serdev' with 'serdev'Andrey Smirnov
Replace 'ctrl->serdev' with 'serdev' in serdev_controller_write_wakeup() and serdev_controller_receive_buf(). Cc: Rob Herring <robh@kernel.org> Cc: cphealy@gmail.com Cc: linux-serial@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17usb: of: add functions to bind a companion controllerYoshihiro Shimoda
EHCI controllers will have a companion controller. However, on platform bus, there was difficult to bind them in previous code. So, this patch adds helper functions to bind them using a "companion" property. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17usb: add CONFIG_USB_PCI for system have both PCI HW and non-PCI based USB HWyuan linyu
a lot of embeded system SOC (e.g. freescale T2080) have both PCI and USB modules. But USB module is controlled by registers directly, it have no relationship with PCI module. when say N here it will not build PCI related code in USB driver. Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-16bpf: add helper inlining infra and optimize map_array lookupAlexei Starovoitov
Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem() into a sequence of bpf instructions. When JIT is on the sequence of bpf instructions is the sequence of native cpu instructions with significantly faster performance than indirect call and two function's prologue/epilogue. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16tcp: remove tcp_tw_recycleSoheil Hassas Yeganeh
The tcp_tw_recycle was already broken for connections behind NAT, since the per-destination timestamp is not monotonically increasing for multiple machines behind a single destination address. After the randomization of TCP timestamp offsets in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection), the tcp_tw_recycle is broken for all types of connections for the same reason: the timestamps received from a single machine is not monotonically increasing, anymore. Remove tcp_tw_recycle, since it is not functional. Also, remove the PAWSPassive SNMP counter since it is only used for tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req since the strict argument is only set when tcp_tw_recycle is enabled. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Lutz Vieweg <lvml@5t9.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16tcp: remove per-destination timestamp cacheSoheil Hassas Yeganeh
Commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection) randomizes TCP timestamps per connection. After this commit, there is no guarantee that the timestamps received from the same destination are monotonically increasing. As a result, the per-destination timestamp cache in TCP metrics (i.e., tcpm_ts in struct tcp_metrics_block) is broken and cannot be relied upon. Remove the per-destination timestamp cache and all related code paths. Note that this cache was already broken for caching timestamps of multiple machines behind a NAT sharing the same address. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Lutz Vieweg <lvml@5t9.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16net/mlx4_core: Avoid delays during VF driver device shutdownJack Morgenstein
Some Hypervisors detach VFs from VMs by instantly causing an FLR event to be generated for a VF. In the mlx4 case, this will cause that VF's comm channel to be disabled before the VM has an opportunity to invoke the VF device's "shutdown" method. For such Hypervisors, there is a race condition between the VF's shutdown method and its internal-error detection/reset thread. The internal-error detection/reset thread (which runs every 5 seconds) also detects a disabled comm channel. If the internal-error detection/reset flow wins the race, we still get delays (while that flow tries repeatedly to detect comm-channel recovery). The cited commit fixed the command timeout problem when the internal-error detection/reset flow loses the race. This commit avoids the unneeded delays when the internal-error detection/reset flow wins. Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reported-by: Simon Xiao <sixiao@microsoft.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16drivers core: remove assert_held_device_hotplug()Heiko Carstens
The last caller of assert_held_device_hotplug() is gone, so remove it again. Link: http://lkml.kernel.org/r/20170314125226.16779-3-heiko.carstens@de.ibm.com Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-16kasan: add a prototype of task_struct to avoid warningMasami Hiramatsu
Add a prototype of task_struct to fix below warning on arm64. In file included from arch/arm64/kernel/probes/kprobes.c:19:0: include/linux/kasan.h:81:132: error: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] static inline void kasan_unpoison_task_stack(struct task_struct *task) {} As same as other types (kmem_cache, page, and vm_struct) this adds a prototype of task_struct data structure on top of kasan.h. [arnd] A related warning was fixed before, but now appears in a different line in the same file in v4.11-rc2. The patch from Masami Hiramatsu still seems appropriate, so let's take his version. Fixes: 71af2ed5eeea ("kasan, sched/headers: Remove <linux/sched.h> from <linux/kasan.h>") Link: https://patchwork.kernel.org/patch/9569839/ Link: http://lkml.kernel.org/r/20170313141517.3397802-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-16raid5-ppl: Partial Parity Log write logging implementationArtur Paszkiewicz
Implement the calculation of partial parity for a stripe and PPL write logging functionality. The description of PPL is added to the documentation. More details can be found in the comments in raid5-ppl.c. Attach a page for holding the partial parity data to stripe_head. Allocate it only if mddev has the MD_HAS_PPL flag set. Partial parity is the xor of not modified data chunks of a stripe and is calculated as follows: - reconstruct-write case: xor data from all not updated disks in a stripe - read-modify-write case: xor old data and parity from all updated disks in a stripe Implement it using the async_tx API and integrate into raid_run_ops(). It must be called when we still have access to old data, so do it when STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result is stored into sh->ppl_page. Partial parity is not meaningful for full stripe write and is not stored in the log or used for recovery, so don't attempt to calculate it when stripe has STRIPE_FULL_WRITE. Put the PPL metadata structures to md_p.h because userspace tools (mdadm) will also need to read/write PPL. Warn about using PPL with enabled disk volatile write-back cache for now. It can be removed once disk cache flushing before writing PPL is implemented. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-03-16md: superblock changes for PPLArtur Paszkiewicz
Include information about PPL location and size into mdp_superblock_1 and copy it to/from rdev. Because PPL is mutually exclusive with bitmap, put it in place of 'bitmap_offset'. Add a new flag MD_FEATURE_PPL for 'feature_map', analogically to MD_FEATURE_BITMAP_OFFSET. Add MD_HAS_PPL to mddev->flags to indicate that PPL is enabled on an array. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>