Age    Commit message    Author
2016-12-20arm64: setup: introduce kaslr_offset()Alexander Popov
Introduce kaslr_offset() similar to x86_64 to fix kcov. [ Updated by Will Deacon ] Link: http://lkml.kernel.org/r/1481417456-28826-2-git-send-email-alex.popov@linux.com Signed-off-by: Alexander Popov <alex.popov@linux.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Jon Masters <jcm@redhat.com> Cc: David Daney <david.daney@cavium.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEEDJohannes Weiner
When FADV_DONTNEED cannot drop all pages in the range, it observes that some pages might still be on per-cpu LRU caches after recent instantiation and so initiates remote calls to all CPUs to flush their local caches. However, in most cases, the fadvise happens from the same context that instantiated the pages, and any pre-LRU pages in the specified range are most likely sitting on the local CPU's LRU cache, and so in many cases this results in unnecessary remote calls, which, in a loaded system, can hold up the fadvise() call significantly. [ I didn't record it in the extreme case we observed at Facebook, unfortunately. We had a slow-to-respond system and noticed it lru_add_drain_all() leading the profile during fadvise calls. This patch came out of thinking about the code and how we commonly call FADV_DONTNEED. FWIW, I wrote a silly directory tree walker/searcher that recurses through /usr to read and FADV_DONTNEED each file it finds. On a 2 socket 40 ht machine, over 1% is spent in lru_add_drain_all(). With the patch, that cost is gone; the local drain cost shows at 0.09%. ] Try to avoid the remote call by flushing the local LRU cache before even attempting to invalidate anything. It's a cheap operation, and the local LRU cache is the most likely to hold any pre-LRU pages in the specified fadvise range. Link: http://lkml.kernel.org/r/20161214210017.GA1465@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
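The usage pattern this commit describes (the same context reads a file and then issues FADV_DONTNEED on it) can be sketched from userspace roughly as follows. This is a minimal illustration, not code from the patch: the helper name read_and_drop() is made up for the example, and it only shows the caller's side through the standard posix_fadvise() call, not the kernel-side LRU draining.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() under strict -std modes */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a file to the end, then tell the kernel its
 * cached pages are no longer needed.
 * Returns 0 on success, -1 if open() or read() fails, or the positive
 * error number reported by posix_fadvise().
 */
int read_and_drop(const char *path)
{
    char buf[4096];
    ssize_t n;
    int err;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* Reading instantiates page-cache pages from this very context. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;

    if (n < 0) {
        close(fd);
        return -1;
    }

    /* offset 0 with len 0 covers the whole file. */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return err;
}
```

Because the pages were instantiated by the calling thread, any that are still pending on a per-CPU LRU cache are most likely on the local CPU, which is the case the patch optimises for.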
2016-12-20ima: platform-independent hash valueAndreas Steffen
For remote attestation it is important for the IMA measurement values to be platform-independent. Therefore integer fields to be hashed must be converted to canonical format. Link: http://lkml.kernel.org/r/1480554346-29071-11-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Andreas Steffen <andreas.steffen@strongswan.org> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20ima: define a canonical binary_runtime_measurements list formatMimi Zohar
The IMA binary_runtime_measurements list is currently in platform native format. To allow restoring a measurement list carried across kexec with a different endianness than the targeted kernel, this patch defines little-endian as the canonical format. For big endian systems wanting to save/restore the measurement list from a system with a different endianness, a new boot command line parameter named "ima_canonical_fmt" is defined. Considerations: use of the "ima_canonical_fmt" boot command line option will break existing userspace applications on big endian systems expecting the binary_runtime_measurements list to be in platform native format. Link: http://lkml.kernel.org/r/1480554346-29071-10-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
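To illustrate what a little-endian canonical format means for an integer field, here is a small userspace sketch. It is not the IMA code: put_le32() and get_le32() are names invented for this example, and the kernel has its own byte-order helpers. The point is only that the byte sequence written out is the same on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as little-endian bytes,
 * regardless of the byte order of the machine running this code. */
void put_le32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v & 0xff);          /* least significant byte first */
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);  /* most significant byte last */
}

/* Hypothetical helper: read the value back from little-endian bytes. */
uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

A reader that always decodes with get_le32() can restore a list written on a host of either endianness, which is the property the canonical format is meant to provide.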
2016-12-20ima: support restoring multiple template formatsMimi Zohar
The configured IMA measurement list template format can be replaced at runtime on the boot command line, including a custom template format. This patch adds support for restoring a measurement list containing multiple builtin/custom template formats. Link: http://lkml.kernel.org/r/1480554346-29071-9-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20ima: store the builtin/custom template definitions in a listMimi Zohar
The builtin and single custom templates are currently stored in an array. In preparation for being able to restore a measurement list containing multiple builtin/custom templates, this patch stores the builtin and custom templates as a linked list. This will permit defining more than one custom template per boot. Link: http://lkml.kernel.org/r/1480554346-29071-8-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20ima: on soft reboot, save the measurement listMimi Zohar
The TPM PCRs are only reset on a hard reboot. In order to validate a TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list of the running kernel must be saved and restored on boot. This patch uses the kexec buffer passing mechanism to pass the serialized IMA binary_runtime_measurements to the next kernel. Link: http://lkml.kernel.org/r/1480554346-29071-7-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20powerpc: ima: send the kexec buffer to the next kernelThiago Jung Bauermann
The IMA kexec buffer allows the currently running kernel to pass the measurement list via a kexec segment to the kernel that will be kexec'd. This is the architecture-specific part of setting up the IMA kexec buffer for the next kernel. It will be used in the next patch. Link: http://lkml.kernel.org/r/1480554346-29071-6-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20ima: maintain memory size needed for serializing the measurement listMimi Zohar
In preparation for serializing the binary_runtime_measurements, this patch maintains the amount of memory required. Link: http://lkml.kernel.org/r/1480554346-29071-5-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20ima: permit duplicate measurement list entriesMimi Zohar
Measurements carried across kexec need to be added to the IMA measurement list, but should not prevent measurements of the newly booted kernel from being added to the measurement list. This patch adds support for allowing duplicate measurements. The "boot_aggregate" measurement entry is the delimiter between soft boots. Link: http://lkml.kernel.org/r/1480554346-29071-4-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20ima: on soft reboot, restore the measurement listMimi Zohar
The TPM PCRs are only reset on a hard reboot. In order to validate a TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list of the running kernel must be saved and restored on boot. This patch restores the measurement list. Link: http://lkml.kernel.org/r/1480554346-29071-3-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20powerpc: ima: get the kexec buffer passed by the previous kernelThiago Jung Bauermann
Patch series "ima: carry the measurement list across kexec", v8. The TPM PCRs are only reset on a hard reboot. In order to validate a TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list of the running kernel must be saved and then restored on the subsequent boot, possibly of a different architecture. The existing securityfs binary_runtime_measurements file conveniently provides a serialized format of the IMA measurement list. This patch set serializes the measurement list in this format and restores it. Up to now, the binary_runtime_measurements was defined as architecture native format. The assumption being that userspace could and would handle any architecture conversions. With the ability of carrying the measurement list across kexec, possibly from one architecture to a different one, the per boot architecture information is lost and with it the ability of recalculating the template digest hash. To resolve this problem, without breaking the existing ABI, this patch set introduces the boot command line option "ima_canonical_fmt", which is arbitrarily defined as little endian. The need for this boot command line option will be limited to the existing version 1 format of the binary_runtime_measurements. Subsequent formats will be defined as canonical format (eg. TPM 2.0 support for larger digests). A simplified method of Thiago Bauermann's "kexec buffer handover" patch series for carrying the IMA measurement list across kexec is included in this patch set. The simplified method requires all file measurements be taken prior to executing the kexec load, as subsequent measurements will not be carried across the kexec and restored. This patch (of 10): The IMA kexec buffer allows the currently running kernel to pass the measurement list via a kexec segment to the kernel that will be kexec'd. The second kernel can check whether the previous kernel sent the buffer and retrieve it. 
This is the architecture-specific part which enables IMA to receive the measurement list passed by the previous kernel. It will be used in the next patch. The change in machine_kexec_64.c is to factor out the logic of removing an FDT memory reservation so that it can be used by remove_ima_buffer. Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_outputzheng li
There is an inconsistent conditional judgement between the __ip_append_data and ip_finish_output functions: the variable length in __ip_append_data includes only the length of the application's payload and the UDP header, not the length of the IP header, whereas ip_finish_output uses (skb->len > ip_skb_dst_mtu(skb)) as its judgement, and skb->len includes the length of the IP header. As a result, a UDP payload whose length is between (MTU - IP Header) and MTU is fragmented by ip_fragment even though the rst->dev supports the UFO feature. Add the length of the IP header to length in __ip_append_data to keep the conditional judgement for IP fragmentation consistent with ip_finish_output. Signed-off-by: Zheng Li <james.z.li@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
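The mismatch described in this commit comes down to comparing two different lengths against the same MTU. The sketch below models only that arithmetic; the function names are hypothetical, the 20-byte header assumes IPv4 without options, and the real kernel conditions involve further checks (such as the UFO feature test) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20u  /* assumption: IPv4 header without options */

/* Output-time check: the packet length includes the IP header. */
bool output_would_fragment(size_t udp_len, size_t mtu)
{
    return udp_len + IPV4_HDR_LEN > mtu;
}

/* Append-time check as described before the fix: IP header not counted. */
bool append_check_old(size_t udp_len, size_t mtu)
{
    return udp_len > mtu;
}

/* Append-time check after the fix: IP header length added first. */
bool append_check_fixed(size_t udp_len, size_t mtu)
{
    return udp_len + IPV4_HDR_LEN > mtu;
}
```

With an MTU of 1500 and a UDP length of 1490 (between MTU - 20 and MTU), the old append-time check says the data fits while the output-time check says it must be fragmented; the fixed check agrees with the output path.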
2016-12-20mmc: core: Further fix thread wake-upAdrian Hunter
Commit e0097cf5f2f1 ("mmc: queue: Fix queue thread wake-up") did not go far enough. mmc_wait_for_data_req_done() still contains some problems and can be further simplified. First it should not touch context_info->is_waiting_last_req because that is a wake-up control used by the owner of the context. Secondly, it should always return when one of its wake-up conditions is met because, again, that is controlled by the owner of the context. While the current block driver does not have an issue, these problems were exposed during testing of the Software Command Queue patches. Fixes: e0097cf5f2f1 ("mmc: queue: Fix queue thread wake-up") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Harjani Ritesh <riteshh@codeaurora.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-20mmc: sdhci: Fix to handle MMC_POWER_UNDEFINEDAdrian Hunter
Since commit c2c24819b280 ("mmc: core: Don't power off the card when starting the host"), the power state can still be MMC_POWER_UNDEFINED after mmc_start_host() is called. That can trigger a warning in SDHCI during runtime resume as it tries to restore the I/O state. Handle MMC_POWER_UNDEFINED simply by not updating the I/O state in that case. Fixes: c2c24819b280 ("mmc: core: Don't power off the card when starting the host") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-20mmc: sdhci-cadence: add Socionext UniPhier specific compatible stringMasahiro Yamada
Add a Socionext SoC specific compatible (suggested by Rob Herring). No SoC specific data are associated with the compatible strings for now, but other SoC vendors may use this IP and want to differentiate IP variants in the future. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-19NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCESTrond Myklebust
If our DELEGRETURN RPC call is rejected with an EACCES error, then we should remove the GETATTR call from the compound RPC and retry. This could potentially happen when there is a conflict between an ACL denying attribute reads and our use of SP4_MACH_CRED. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCESTrond Myklebust
If our CLOSE RPC call is rejected with an EACCES error, then we should remove the GETATTR call from the compound RPC and retry. This could potentially happen when there is a conflict between an ACL denying attribute reads and our use of SP4_MACH_CRED. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
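The retry policy in this commit and the DELEGRETURN one above can be sketched as a toy model. Nothing here is NFS client code: struct fake_server, fake_send_close() and close_with_retry() are hypothetical stand-ins, and the real client restarts the RPC from its completion callback rather than calling a function twice.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a server whose ACL may deny attribute reads. */
struct fake_server {
    bool deny_getattr;  /* true: any compound containing GETATTR is refused */
    int calls;          /* number of compounds received so far */
};

/* Hypothetical transport: "sends" a CLOSE compound, with or without GETATTR. */
static int fake_send_close(struct fake_server *srv, bool with_getattr)
{
    srv->calls++;
    if (with_getattr && srv->deny_getattr)
        return -EACCES;
    return 0;
}

/* Try CLOSE with the embedded GETATTR; on EACCES, drop GETATTR and retry once. */
int close_with_retry(struct fake_server *srv)
{
    int err = fake_send_close(srv, true);

    if (err == -EACCES)
        err = fake_send_close(srv, false);
    return err;
}
```

Against a server that refuses the GETATTR, the close still succeeds at the cost of one extra round trip; against a permissive server only one compound is sent.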
2016-12-19NFSv4: Place the GETATTR operation before the CLOSETrond Myklebust
In order to benefit from the DENY share lock protection, we should put the GETATTR operation before the CLOSE. Otherwise, we might race with a Windows machine that thinks it is now safe to modify the file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFSv4: Also ask for attributes when downgrading to a READ-only stateTrond Myklebust
If we're downgrading from a READ+WRITE mode to a READ-only mode, then ask for cache consistency attributes so that we avoid the revalidation in nfs_close_context() Fixes: 3947b74d0f9d ("NFSv4: Don't request a GETATTR on open_downgrade.") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked()Trond Myklebust
The NFS_INO_REVAL_FORCED flag now really only has meaning for the case when we've just been handed a delegation for a file that was already cached, and we're unsure about that cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19pNFS: Return RW layouts on OPEN_DOWNGRADETrond Myklebust
If the client holds no more writeable open state, and does not hold a write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE. We do this only for writes, since some layout drivers may require you to also hold a read layout if you are doing a R/W workload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADETrond Myklebust
While we do not need to return the RW layout when downgrading from a read/write open state to read-only, we might want to do so in order to reduce the burden on the metadataserver so that it does not need to check for changed data when responding to GETATTR requests. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQIDNeilBrown
When an NFS4ERR_BAD_SEQID is received the open-owner is removed from the ->state_owners rbtree so that it will no longer be used. If any stateids attached to this open-owner are still in use, and if a request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad. The state is marked as needing recovery and the nfs4_state_manager() is scheduled to clean up. nfs4_state_manager() finds states to be recovered by walking the state_owners rbtree. As the open-owner is not in the rbtree, the bad state is not found so nfs4_state_manager() completes having done nothing. The request is then retried, with a predictable result (indefinite retries). If the stateid is for a delegation, this open_owner will be used to open files when the delegation is returned. For that to work, a new open-owner needs to be presented to the server. This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner in the rbtree but updates the 'create_time' so it looks like a new open-owner. With this the indefinite retries no longer happen. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFSv4: ensure __nfs4_find_lock_state returns consistent result.NeilBrown
If a file has both flock locks and OFD locks, then it is possible that two different nfs4 lock states could apply to file accesses from a single process. It is not possible to know, efficiently, which one is "correct". Presumably the state which represents a lock that covers the region undergoing IO would be the "correct" one to use, but finding that has a non-trivial cost and would provide minuscule value. Currently we just return whichever is first in the list, which could result in inconsistent behaviour if an application ever put itself in this position. As consistent behaviour is preferable (when perfectly correct behaviour is not available), change the search to return a consistent result in this circumstance. Specifically: if there is both a flock and OFD lock state, always return the flock one. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
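The "always return the flock one" rule can be sketched as a list search that no longer depends on list order. This is a simplified model, not __nfs4_find_lock_state() itself: the struct, the enum and find_lock_state() are hypothetical, and the real function matches on lock owners rather than on a kind tag.

```c
#include <stddef.h>

/* Hypothetical tag for which kind of lock a state belongs to. */
enum owner_kind { OWNER_FLOCK, OWNER_OFD };

/* Hypothetical, much reduced lock state on a singly linked list. */
struct lock_state {
    enum owner_kind kind;
    int id;
    struct lock_state *next;
};

/*
 * Return the flock state if one exists anywhere in the list; otherwise
 * fall back to the first OFD state; NULL for an empty list.
 */
const struct lock_state *find_lock_state(const struct lock_state *head)
{
    const struct lock_state *fallback = NULL;
    const struct lock_state *p;

    for (p = head; p != NULL; p = p->next) {
        if (p->kind == OWNER_FLOCK)
            return p;          /* preferred match: stop immediately */
        if (fallback == NULL)
            fallback = p;      /* remember the first non-flock candidate */
    }
    return fallback;
}
```

Returning the first entry in the list would give different answers depending on insertion order; remembering a fallback while continuing to look for the preferred kind makes the result the same for both orders.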
2016-12-19NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.NeilBrown
Various places assume that if nfs4_fl_prepare_ds() returns a non-NULL 'ds', then ds->ds_clp will also be non-NULL. This is not necessarily true in the case when the process received a fatal signal while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the devid may not recently have been marked unavailable. So add a test for ds_clp == NULL and return NULL in that case. Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race") Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Olga Kornievskaia <aglo@umich.edu> Acked-by: Adamson, Andy <William.Adamson@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19pNFS/flexfiles: delete deviceid, don't mark inactiveWeston Andros Adamson
Instead of marking a device inactive, remove it from the cache entirely. Flexfiles has a way to report errors back to the server, so we don't want to stop devices from being tried again for 120 seconds. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Clean up nfs_attribute_timeout()Trond Myklebust
It can be made static. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Remove unused function nfs_revalidate_inode_rcu()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Fix and clean up the access cache validity checkingTrond Myklebust
The access cache needs to check whether or not the mode bits, ownership, or ACL has changed or the cache has timed out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Only look at the change attribute cache state in nfs_weak_revalidate()Trond Myklebust
Just like in nfs_check_verifier(), we want to use nfs_mapping_need_revalidate_inode() to check that our knowledge of the change attribute is up to date. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Clean up cache validity checkingTrond Myklebust
Consolidate the open-coded checking of NFS_I(inode)->cache_validity into a couple of helper functions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFS: Don't revalidate the file on close if we hold a delegationTrond Myklebust
If we're holding a delegation, we can skip sending the close-to-open GETATTR until we're returning that delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURNTrond Myklebust
DELEGRETURN will always carry a reference to the inode except when the latter is being freed, so let's ensure that we always use that inode information to ensure close-to-open cache consistency, even when the DELEGRETURN call is asynchronous. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19NFSv4: Update the attribute cache info in update_changeattrTrond Myklebust
If we successfully updated the change attribute, we should timestamp the cache. While we do know that the other attributes are not completely up to date, we have the NFS_INO_INVALID_ATTR flag that lets us know that, so it is valid to say that the cache has not timed out. We can also clear NFS_INO_REVAL_PAGECACHE, since our change attribute is now known to be valid. Conversely, if the change attribute did not match, we should make sure to also revalidate the access and ACL caches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds
Pull quota, fsnotify and ext2 updates from Jan Kara: "Changes to locking of some quota operations from dedicated quota mutex to s_umount semaphore, a fsnotify fix and a simple ext2 fix" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Fix bogus warning in dquot_disable() fsnotify: Fix possible use-after-free in inode iteration on umount ext2: reject inodes with negative size quota: Remove dqonoff_mutex ocfs2: Use s_umount for quota recovery protection quota: Remove dqonoff_mutex from dquot_scan_active() ocfs2: Protect periodic quota syncing with s_umount semaphore quota: Use s_umount protection for quota operations quota: Hold s_umount in exclusive mode when enabling / disabling quotas fs: Provide function to get superblock with exclusive s_umount
2016-12-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Early fixes for x86. Instead of the (botched) revert, the lockdep/might_sleep splat has a real fix provided by Andrea" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) kvm: take srcu lock around kvm_steal_time_set_preempted() kvm: fix schedule in atomic in kvm_steal_time_set_preempted() KVM: hyperv: fix locking of struct kvm_hv fields KVM: x86: Expose Intel AVX512IFMA/AVX512VBMI/SHA features to guest. kvm: nVMX: Correct a VMX instruction error code for VMPTRLD
2016-12-19Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/stagingLinus Torvalds
Pull dmi fix from Jean Delvare. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: firmware: dmi_scan: Always show system identification string
2016-12-19Merge tag 'mfd-for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds
Pull MFD updates from Lee Jones: "New Device Support - Add support for Ricoh RC5T619 PMIC to rn5t618 - Add support for PM8821 PMIC to qcom-pm8xxx New Functionality: - Add support for GPIO to lpc_ich - Add support for GPADC to sun4i - Add ability for rk808 to shutdown Fix-ups: - Simplify/strip unnecessary code; tps65218, palmas, tps65217 - Device Tree binding updates; tps65218, altera-a10sr - Provide/export device ID info; tps65218, axp20x-i2c, hi655x-pmic, fsl-imx25-tsadc, intel_soc_pmic_bxtwc - Use MFD API instead of of_platform_populate(); tps65218 - Generalise name-space; pm8xxx - Supply/edit regmap configuration; axp20x, cs47l24-tables, axp20x - Enable compile testing; max77620, max77686, exynos-lpass, abx500-core - Coding style issues; wm8994-core, wm5102-tables - Supply endian support; syscon - Remove module support; ab3100-core, ab8500-debugfs, ab8500-gpadc, abx500-core Bug Fixes: - Fix ordering issues; wm8994 - Fix dependencies (build-time/run-time); exynos_lpass, sun4i-gpadc - Fix compiler warnings; sun4i-gpadc - Fix leaks; mfd-core - Fix page fault during module unload; tps65217" * tag 'mfd-for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (49 commits) mfd: tps65217: Support an interrupt pin as the system wakeup mfd: tps65217: Make an interrupt handler simpler mfd: tps65217: Update register interrupt mask bits instead of writing operation mfd: tps65217: Specify the IRQ name mfd: tps65217: Fix page fault on unloading modules mfd: palmas: Remove redundant check in palmas_power_off mfd: arizona: Disable IRQs during driver remove mfd: pm8xxx: add support to pm8821 mfd: intel-lpss: Try to enable Memory-Write-Invalidate mfd: rn5t618: Add Ricoh RC5T619 PMIC support mfd: axp20x: Add address extension registers for AXP806 regmap mfd: intel_soc_pmic_bxtwc: Fix a typo in MODULE_DEVICE_TABLE() mfd: core: Fix device reference leak in mfd_clone_cell mfd: bcm590xx: Simplify a test mfd: sun4i-gpadc: Select regmap-irq mfd: abx500-core: drop unused MODULE_ tags from non-modular code mfd: ab8500: make sysctrl explicitly non-modular mfd: ab8500-gpadc: Make it explicitly non-modular mfd: ab8500-debugfs: Make it explicitly non-modular mfd: ab8500-core: Make it explicitly non-modular ...
2016-12-19stmmac: fix memory barriersPavel Machek
Fix up memory barriers in stmmac driver. They are meant to protect against DMA engine, so smp_ variants are certainly wrong, and dma_ variants are preferable. Signed-off-by: Pavel Machek <pavel@denx.de> Tested-by: Niklas Cassel <niklas.cassel@axis.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
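The ordering concern behind this change can be sketched in user space. The smp_ barriers only order accesses between CPUs (and reduce to compiler barriers on non-SMP builds), so they are the wrong tool for ordering descriptor writes against a DMA engine. The sketch below is an analogy only, not the stmmac code: the descriptor layout, the OWN bit position, and the demo_ names are hypothetical, and a C11 release fence stands in where a kernel driver would use dma_wmb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct demo_desc {
    uint32_t buf_addr;      /* buffer address seen by the device */
    uint32_t len;           /* buffer length in bytes */
    _Atomic uint32_t ctrl;  /* bit 31: hypothetical OWN flag */
};

#define DEMO_OWN (UINT32_C(1) << 31)

/*
 * Fill in the descriptor, then hand it to the consumer.
 * The fence keeps the payload writes ordered before the OWN write,
 * which is the role dma_wmb() plays in a kernel driver.
 */
static void demo_publish(struct demo_desc *d, uint32_t addr, uint32_t len)
{
    d->buf_addr = addr;
    d->len = len;
    atomic_thread_fence(memory_order_release); /* stand-in for dma_wmb() */
    atomic_store_explicit(&d->ctrl, DEMO_OWN, memory_order_relaxed);
}
```

The point of the pattern is that the consumer must never observe the ownership flag before the fields it guards.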
2016-12-19net: ethernet: cavium: octeon: octeon_mgmt: Handle return NULL error from ↵Arvind Yadav
devm_ioremap
If devm_ioremap() fails, it returns NULL, and the kernel can then run into a NULL-pointer dereference. This error check avoids the NULL-pointer dereference. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
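The shape of such a check can be sketched in user space. This is not the octeon_mgmt code: demo_ioremap() is a hypothetical stand-in for devm_ioremap() that can be made to fail, and returning -ENOMEM is the conventional choice assumed here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for devm_ioremap(): returns NULL on failure. */
static void *demo_ioremap(int should_fail)
{
    static char fake_regs[16];
    return should_fail ? NULL : fake_regs;
}

/* Probe-style function: bail out before the mapping is ever dereferenced. */
static int demo_probe(int should_fail)
{
    volatile char *regs = demo_ioremap(should_fail);

    if (!regs)
        return -ENOMEM; /* mapping failed, nothing to touch */

    regs[0] = 1;        /* safe: regs is known to be non-NULL here */
    return 0;
}
```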
2016-12-19kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)Jim Mattson
When L2 exits to L0 due to "exception or NMI", software exceptions (#BP and #OF) for which L1 has requested an intercept should be handled by L1 rather than L0. Previously, only hardware exceptions were forwarded to L1. Signed-off-by: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
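The forwarding decision can be illustrated with a small predicate. In VMX, bit n of the exception bitmap requests an intercept for exception vector n; #BP is vector 3 and #OF is vector 4. The sketch below is a simplified illustration, not KVM's code: the function name is hypothetical, and it ignores special cases such as the page-fault error-code mask and match fields.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BP_VECTOR 3u /* #BP, breakpoint (INT3) */
#define DEMO_OF_VECTOR 4u /* #OF, overflow (INTO) */

/*
 * Return true when L1's exception bitmap requests an intercept for
 * the given vector, i.e. when the exit should be reflected to L1.
 */
static bool demo_l1_intercepts(uint32_t exception_bitmap, unsigned int vector)
{
    return vector < 32 && (exception_bitmap & (UINT32_C(1) << vector)) != 0;
}
```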
2016-12-19kvm: take srcu lock around kvm_steal_time_set_preempted()Andrea Arcangeli
kvm_memslots() will be called by kvm_write_guest_offset_cached() so take the srcu lock. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-19kvm: fix schedule in atomic in kvm_steal_time_set_preempted()Andrea Arcangeli
kvm_steal_time_set_preempted() does not disable page faults before calling __copy_to_user(), and the kernel debug checks notice it. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-19mailbox: mailbox-test: allow reserved areas in SRAMSudeep Holla
When CONFIG_SRAM is enabled and the SRAM region is found, the entire SRAM region resource is requested and marked as occupied by the SRAM driver, even if certain parts of the region are marked reserved. It's quite possible that a small region of the SRAM is reserved for all the mailbox communication, and hence the request for that region may fail because it is already marked busy. This patch just does an ioremap of this mailbox memory region if it finds it busy. Cc: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-12-19mailbox: mailbox-test: add support for fasync/pollSudeep Holla
Currently the read operation on the message debug file returns an error if there's no data ready to be read. It expects userspace to retry if it fails. Since the mailbox response could be asynchronous, it would be good to add support to block the read until the data is available. We can also implement the poll file operation so that userspace can wait until the device is ready for I/O. This patch implements the poll and fasync file operation callbacks for the test mailbox device. Cc: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
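From the userspace side, poll support means a reader can wait for data instead of retrying a failing read. A minimal sketch using the standard POSIX poll() call follows; a pipe stands in for the mailbox debug file, and the demo_ function names are hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable: 1 = ready, 0 = timeout, -1 = error. */
static int demo_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Exercise the helper with a pipe standing in for the device file. */
static int demo_pipe_roundtrip(void)
{
    int p[2];
    int ok;

    if (pipe(p) != 0)
        return -1;

    /* Nothing written yet: a zero-timeout poll reports "not ready". */
    ok = (demo_wait_readable(p[0], 0) == 0);

    /* After one byte is written, the read end becomes readable. */
    ok = ok && (write(p[1], "x", 1) == 1);
    ok = ok && (demo_wait_readable(p[0], 100) == 1);

    close(p[0]);
    close(p[1]);
    return ok ? 1 : 0;
}
```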
2016-12-19mailbox: bcm-pdc: Remove unnecessary void* castsRob Rice
Remove unnecessary void* casts in register writes. Fix two other minor formatting issues. Signed-off-by: Rob Rice <rob.rice@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Jon Mason <jon.mason@broadcom.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-12-19mailbox: bcm-pdc: Simplify interrupt handler logicRob Rice
Earlier versions of the PDC driver registered for both transmit and receive interrupts. The hard IRQ handler had to communicate to the soft handler which interrupt(s) had occurred. The PDC driver no longer registers for tx interrupts. So there is no reason to save the intstatus. So remove the intstatus member of the PDC state. Signed-off-by: Rob Rice <rob.rice@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-12-19mailbox: bcm-pdc: Performance improvementsRob Rice
Three changes to improve performance in the PDC driver:
- disable and reenable interrupts while the interrupt handler is running
- update rxin and txin descriptor indexes more efficiently
- group receive descriptor context into a structure and keep context in a single array rather than five to improve locality of reference
Signed-off-by: Rob Rice <rob.rice@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
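The locality change can be sketched as follows. Keeping the per-descriptor context in one array of structures means a single index touches one contiguous record instead of five separate arrays. This is a sketch under assumptions, not the driver's code: the field names, the ring size, and the demo_ helpers are hypothetical, and the power-of-two mask is just one common way to advance a ring index cheaply; the commit does not say which technique it uses.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_ENTRIES 8u /* hypothetical; must be a power of two */

/* Hypothetical per-descriptor receive context, grouped into one record. */
struct demo_rx_ctx {
    void *client_ctx;      /* opaque cookie returned with the response */
    void *dst_buf;         /* where the response payload goes */
    uint32_t dst_len;      /* capacity of dst_buf in bytes */
    uint32_t resp_hdr_len; /* expected response header length */
    uint32_t rx_count;     /* descriptors used by this request */
};

struct demo_ring {
    struct demo_rx_ctx rx_ctx[DEMO_RING_ENTRIES]; /* one array, not five */
    uint32_t rxin;                                /* next slot to fill */
};

/* Advance a ring index with a mask instead of a modulo or a branch. */
static uint32_t demo_next_idx(uint32_t idx)
{
    return (idx + 1u) & (DEMO_RING_ENTRIES - 1u);
}

/* Store one request's context in the current slot and advance rxin. */
static uint32_t demo_rx_store(struct demo_ring *r, void *cookie,
                              void *dst, uint32_t dst_len)
{
    uint32_t slot = r->rxin;
    struct demo_rx_ctx *c = &r->rx_ctx[slot];

    c->client_ctx = cookie;
    c->dst_buf = dst;
    c->dst_len = dst_len;
    c->resp_hdr_len = 0;
    c->rx_count = 1;
    r->rxin = demo_next_idx(slot);
    return slot;
}
```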
2016-12-19mailbox: bcm-pdc: Don't use iowrite32 to write DMA descriptorsRob Rice
In PDC driver, it is not necessary to use iowrite32() when writing DMA descriptors to the transmit and receive rings. The ring memory is in host memory. So convert to normal assignment statements. Signed-off-by: Rob Rice <rob.rice@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>