Age | Commit message | Author
2016-10-03 | Merge tag 'for-4.9' of github.com:linux-nand/linux | Brian Norris
" Notable core changes: - add the infrastructure to automate NAND timings configuration - provide a generic DT property to maximize ECC strength The rest is just a bunch of minor drivers and core fixes/cleanup patches. " Also not noted: some refactoring in the core bad block table handling, to help with improving some of the logic in error cases. Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2016-10-03 | Merge tag 'pm-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds
Pull power management updates from Rafael Wysocki:
"Traditionally, cpufreq is the area with the greatest number of changes, but there are fewer of them than last time. There also is some activity in the generic power domains and the devfreq frameworks, a couple of system suspend and hibernation fixes and some assorted changes in other places.
One new feature is the cpufreq change to allow the scheduler to pass hints to the governors' utilization update callbacks and some code rework based on that. Another one is the support for domain removal in the generic power domains framework. Also it is now possible to use hibernation with PAGE_POISONING_ZERO enabled and devfreq supports the RockChip DFI controller and the rk3399 DMC.
The rest of the changes is mostly fixes and cleanups in a number of places.
Specifics:
 - Add a mechanism for passing hints from the scheduler to cpufreq governors via their utilization update callbacks and use it to introduce "IOwait boosting" into the schedutil governor and intel_pstate that will make them boost performance if the enqueued task was previously waiting on I/O (Rafael Wysocki).
 - Fix a schedutil governor problem that causes it to overestimate utilization if SMT is in use (Steve Muckle).
 - Update defconfigs trying to use the schedutil governor as a module which is not possible any more (Javier Martinez Canillas).
 - Update the intel_pstate's pstate_sample tracepoint to take "IOwait boosting" into account (Srinivas Pandruvada).
 - Fix a problem in the cpufreq core causing it to mishandle the initialization of CPUs registered after the cpufreq driver (Viresh Kumar, Rafael Wysocki).
 - Make the cpufreq-dt driver support per-policy governor tunables, clean it up and update its Kconfig description (Viresh Kumar).
 - Add support for more ARM platforms to the cpufreq-dt driver (Chanwoo Choi, Dave Gerlach, Geert Uytterhoeven).
 - Make the cpufreq CPPC driver report frequencies in KHz to avoid user space compatibility issues (Al Stone, Hoan Tran).
 - Clean up a few cpufreq drivers (st, kirkwood, SCPI) a bit (Colin Ian King, Markus Elfring).
 - Constify some local structures in the intel_pstate driver (Julia Lawall).
 - Add a Documentation/cpu-freq/ entry to MAINTAINERS (Jean Delvare).
 - Add support for PM domain removal to the generic power domains (genpd) framework, add new DT helper functions to it and make it always enable debugfs support if available (Jon Hunter, Tomeu Vizoso).
 - Clean up the generic power domains (genpd) framework and make it avoid measuring power-on and power-off latencies during system-wide PM transitions (Ulf Hansson).
 - Add support for the RockChip DFI controller and the rk3399 DMC to the devfreq framework (Lin Huang, Axel Lin, Arnd Bergmann).
 - Add COMPILE_TEST to the devfreq framework (Krzysztof Kozlowski, Stephen Rothwell).
 - Fix a minor issue in the exynos-ppmu devfreq driver and fix up devfreq Kconfig indentation style (Wei Yongjun, Jisheng Zhang).
 - Fix the system suspend interface to make suspend-to-idle work if platform suspend operations have not been registered (Sudeep Holla).
 - Make it possible to use hibernation with PAGE_POISONING_ZERO enabled (Anisse Astier).
 - Increase the default timeout of the system suspend/resume watchdog and make it depend on EXPERT (Chen Yu).
 - Make the operating performance points (OPP) framework avoid using OPPs that aren't supported by the platform and fix a build warning in it (Dave Gerlach, Arnd Bergmann).
 - Fix the ARM cpuidle driver's return value (Christophe Jaillet).
 - Make the SmartReflex AVS (Adaptive Voltage Scaling) driver use more common logging style (Joe Perches)"
* tag 'pm-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
  PM / OPP: Don't support OPP if it provides supported-hw but platform does not
  cpufreq: st: add missing \n to end of dev_err message
  cpufreq: kirkwood: add missing \n to end of dev_err messages
  PM / Domains: Rename pm_genpd_sync_poweron|poweroff()
  PM / Domains: Don't measure latency of ->power_on|off() during system PM
  PM / Domains: Remove redundant system PM callbacks
  PM / Domains: Simplify detaching a device from its genpd
  PM / devfreq: rk3399_dmc: Remove explictly regulator_put call in .remove
  PM / devfreq: rockchip: add PM_DEVFREQ_EVENT dependency
  PM / OPP: avoid maybe-uninitialized warning
  PM / Domains: Allow holes in genpd_data.domains array
  cpufreq: CPPC: Avoid overflow when calculating desired_perf
  cpufreq: ti: Use generic platdev driver
  cpufreq: intel_pstate: Add io_boost trace
  partial revert of "PM / devfreq: Add COMPILE_TEST for build coverage"
  cpufreq: intel_pstate: Use IOWAIT flag in Atom algorithm
  cpufreq: schedutil: Add iowait boosting
  cpufreq / sched: SCHED_CPUFREQ_IOWAIT flag to indicate iowait condition
  PM / Domains: Add support for removing nested PM domains by provider
  PM / Domains: Add support for removing PM domains
  ...
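The "IOwait boosting" idea above can be modeled in a few lines. The sketch below is a simplified model written for illustration, not the schedutil source: the flag value, the tick-length idle threshold, and every name are hypothetical. An iowait hint arms a full boost; each later update applies the boost only when it exceeds the measured util/max ratio and then halves it; and a gap longer than one tick without a hint drops the boost entirely.

```c
#include <stdint.h>

/* Hypothetical flag standing in for the scheduler's iowait hint. */
#define SKETCH_IOWAIT_FLAG 0x1u
/* Assumed idle threshold, roughly one scheduler tick, in nanoseconds. */
#define SKETCH_TICK_NS 4000000ULL

struct boost_state {
	unsigned long boost;     /* current boost, in frequency-like units */
	unsigned long boost_max; /* value armed when an iowait hint arrives */
	uint64_t last_update_ns; /* time of the previous utilization update */
};

/* Arm a full boost on an iowait hint; drop it if the CPU appears to have idled. */
static void set_iowait_boost(struct boost_state *s, uint64_t now_ns,
			     unsigned int flags)
{
	if (flags & SKETCH_IOWAIT_FLAG)
		s->boost = s->boost_max;
	else if (s->boost && now_ns - s->last_update_ns > SKETCH_TICK_NS)
		s->boost = 0;
	s->last_update_ns = now_ns;
}

/* Use the boost if boost/boost_max exceeds util/max, then halve the boost. */
static void apply_iowait_boost(struct boost_state *s, unsigned long *util,
			       unsigned long *max)
{
	if (!s->boost)
		return;
	/* Cross-multiply to compare the two ratios without division. */
	if (*util * s->boost_max < *max * s->boost) {
		*util = s->boost;
		*max = s->boost_max;
	}
	s->boost >>= 1; /* decay so a single I/O wait does not pin the frequency */
}
```

The halving step is the design choice worth noting: a task that keeps waiting on I/O keeps re-arming the boost, while a one-off wait decays away within a few updates.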
2016-10-03 | xfs: introduce reflink utility functions | Darrick J. Wong
These functions will be used by the other reflink functions to find the maximum length of a range of shared blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: reserve AG space for the refcount btree root | Darrick J. Wong
Reduce the max AG usable space size so that we always have space for the refcount btree root. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: add refcount btree block detection to log recovery | Darrick J. Wong
Identify refcountbt blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: adjust refcount when unmapping file blocks | Darrick J. Wong
When we're unmapping blocks from a reflinked file, decrease the refcount of the affected blocks and free the extents that are no longer in use. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
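As a toy illustration of the unmap rule described above, the sketch below keeps one counter per physical block. That layout is an assumption made for brevity: the real refcount btree stores extent records, not per-block counters, and the function name is hypothetical. Unmapping a range drops each block's count by one, and blocks that reach zero are the ones whose space can be freed.

```c
#include <stddef.h>

/*
 * Toy model: refcount[i] is the number of owners of physical block i.
 * Returns how many blocks in [start, start + len) lost their last owner
 * and could therefore be freed.
 */
static size_t unmap_range(unsigned int *refcount, size_t start, size_t len)
{
	size_t freed = 0;

	for (size_t i = start; i < start + len; i++) {
		if (refcount[i] == 0)
			continue;       /* not mapped; nothing to drop */
		if (--refcount[i] == 0)
			freed++;        /* last owner gone: block can be freed */
	}
	return freed;
}
```

Blocks that still have other owners after the decrement stay allocated, which is what lets reflinked copies survive one file being unmapped.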
2016-10-03 | xfs: connect refcount adjust functions to upper layers | Darrick J. Wong
Plumb in the upper level interface to schedule and finish deferred refcount operations via the deferred ops mechanism. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: adjust refcount of an extent of blocks in refcount btree | Darrick J. Wong
Provide functions to adjust the reference counts for an extent of physical blocks stored in the refcount btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: log refcount intent items | Darrick J. Wong
Provide a mechanism for higher levels to create CUI/CUD items, submit them to the log, and a stub function to deal with recovered CUI items. These parts will be connected to the refcountbt in a later patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: create refcount update intent log items | Darrick J. Wong
Create refcount update intent/done log items to record redo information in the log. Because we need to roll transactions between updating the bmbt mapping and updating the reverse mapping, we also have to track the status of the metadata updates that will be recorded in the post-roll transactions, just in case we crash before committing the final transaction. This mechanism enables log recovery to finish what was already started. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
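The recovery rule behind intent/done pairs can be shown with a hypothetical in-memory log. Any CUI (intent) without a matching CUD (done) marks work that was started but not finished before a crash, so recovery must replay it. The item layout and function names below are invented for illustration and are not the XFS on-disk format.

```c
#include <stdbool.h>
#include <stddef.h>

enum item_type { ITEM_CUI, ITEM_CUD }; /* intent vs. done */

struct log_item {
	enum item_type type;
	unsigned int id; /* a CUD carries the id of the CUI it completes */
};

/* True if a done item for intent `id` appears anywhere in the log. */
static bool has_done(const struct log_item *items, size_t n, unsigned int id)
{
	for (size_t i = 0; i < n; i++)
		if (items[i].type == ITEM_CUD && items[i].id == id)
			return true;
	return false;
}

/*
 * Collect the ids of intents with no matching done item; these are the
 * operations recovery would have to finish. Returns the total count,
 * storing at most out_cap ids in out.
 */
static size_t pending_intents(const struct log_item *items, size_t n,
			      unsigned int *out, size_t out_cap)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (items[i].type != ITEM_CUI || has_done(items, n, items[i].id))
			continue;
		if (count < out_cap)
			out[count] = items[i].id;
		count++;
	}
	return count;
}
```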
2016-10-03 | xfs: add refcount btree operations | Darrick J. Wong
Implement the generic btree operations required to manipulate refcount btree blocks. The implementation is similar to the bmapbt, though it will only allocate and free blocks from the AG. Since the refcount root and level fields are separate from the existing roots and levels array, they need a separate logging flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: fix logging of AGF refcount btree fields] Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: account for the refcount btree in the alloc/free log reservation | Darrick J. Wong
Every time we allocate or free a data extent, we might need to split the refcount btree. Reserve some blocks in the transaction to handle this possibility. Even though the deferred refcount code can roll a transaction to avoid overloading the transaction, we can still exceed the reservation. Certain pathological workloads (1k blocks, no cowextsize hint, random directio writes) cause a perfect storm wherein a refcount adjustment of a large range of blocks causes full tree splits in two separate extents in two separate refcount tree blocks; allocating new refcount tree blocks causes rmap btree splits; and all the allocation activity causes the freespace btrees to split, blowing the reservation. (Reproduced by generic/167 over NFS atop XFS) Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick.wong@oracle.com: add commit message] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-10-03 | xfs: add refcount btree support to growfs | Darrick J. Wong
Modify the growfs code to initialize new refcount btree blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: define the on-disk refcount btree format | Darrick J. Wong
Start constructing the refcount btree implementation by establishing the on-disk format and everything needed to read, write, and manipulate the refcount btree blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: refcount btree add more reserved blocks | Darrick J. Wong
Since XFS reserves a small amount of space in each AG as the minimum free space needed for an operation, save some more space in case we touch the refcount btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: introduce refcount btree definitions | Darrick J. Wong
Add new per-AG refcount btree definitions to the per-AG structures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: define tracepoints for refcount btree activities | Darrick J. Wong
Define all the tracepoints we need to inspect the refcount btree runtime operation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | xfs: return an error when an inline directory is too small | Darrick J. Wong
If the size of an inline directory is so small that it doesn't even cover the required header size, return an error to userspace instead of ASSERTing and returning 0 like everything's ok. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks | Darrick J. Wong
Add a new fallocate mode flag that explicitly unshares blocks on filesystems that support such features. The new flag can only be used with an allocate-mode fallocate call. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
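The "allocate-mode only" restriction can be expressed as a flag check. This is a minimal sketch under assumptions: it uses FALLOC_FL_UNSHARE_RANGE, which to my knowledge is the name the flag carries in mainline `<linux/falloc.h>`, and supplies fallback values only in case the installed headers do not expose them. Treating FALLOC_FL_KEEP_SIZE as compatible with unsharing is also an assumption based on my reading of vfs_fallocate(); the helper name is hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Fallback values, used only if the headers in scope do not define them. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE 0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif
#ifndef FALLOC_FL_UNSHARE_RANGE
#define FALLOC_FL_UNSHARE_RANGE 0x40
#endif

/*
 * Unsharing only makes sense in allocate mode: apart from KEEP_SIZE, no
 * other mode bit (punch hole, collapse, zero range, ...) may accompany it.
 */
static bool unshare_mode_ok(int mode)
{
	if (!(mode & FALLOC_FL_UNSHARE_RANGE))
		return true; /* not an unshare request; nothing to check here */
	return (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)) == 0;
}
```

With a valid mode, a caller would pass it to glibc's `fallocate(fd, mode, offset, len)`. Filesystems without shared-extent support should be expected to reject the request, so callers ought to handle failure (for example `EOPNOTSUPP`) rather than assume it succeeds.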
2016-10-03 | vfs: support FS_XFLAG_COWEXTSIZE and get/set of CoW extent size hint | Darrick J. Wong
Introduce XFLAGs for the new XFS CoW extent size hint, and actually plumb the CoW extent size hint into the fsxattr structure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 | Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Linus Torvalds
Pull arm64 updates from Will Deacon:
"It's a bit all over the place this time with no "killer feature" to speak of. Support for mismatched cache line sizes should help people seeing whacky JIT failures on some SoCs, and the big.LITTLE perf updates have been a long time coming, but a lot of the changes here are cleanups.
We stray outside arch/arm64 in a few areas: the arch/arm/ arch_timer workaround is acked by Russell, the DT/OF bits are acked by Rob, the arch_timer clocksource changes acked by Marc, CPU hotplug by tglx and jump_label by Peter (all CC'd).
Summary:
 - Support for execute-only page permissions
 - Support for hibernate and DEBUG_PAGEALLOC
 - Support for heterogeneous systems with mismatched cache line sizes
 - Errata workarounds (A53 843419 update and QorIQ A-008585 timer bug)
 - arm64 PMU perf updates, including cpumasks for heterogeneous systems
 - Set UTS_MACHINE for building rpm packages
 - Yet another head.S tidy-up
 - Some cleanups and refactoring, particularly in the NUMA code
 - Lots of random, non-critical fixes across the board"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (100 commits)
  arm64: tlbflush.h: add __tlbi() macro
  arm64: Kconfig: remove SMP dependence for NUMA
  arm64: Kconfig: select OF/ACPI_NUMA under NUMA config
  arm64: fix dump_backtrace/unwind_frame with NULL tsk
  arm/arm64: arch_timer: Use archdata to indicate vdso suitability
  arm64: arch_timer: Work around QorIQ Erratum A-008585
  arm64: arch_timer: Add device tree binding for A-008585 erratum
  arm64: Correctly bounds check virt_addr_valid
  arm64: migrate exception table users off module.h and onto extable.h
  arm64: pmu: Hoist pmu platform device name
  arm64: pmu: Probe default hw/cache counters
  arm64: pmu: add fallback probe table
  MAINTAINERS: Update ARM PMU PROFILING AND DEBUGGING entry
  arm64: Improve kprobes test for atomic sequence
  arm64/kvm: use alternative auto-nop
  arm64: use alternative auto-nop
  arm64: alternative: add auto-nop infrastructure
  arm64: lse: convert lse alternatives NOP padding to use __nops
  arm64: barriers: introduce nops and __nops macros for NOP sequences
  arm64: sysreg: replace open-coded mrs_s/msr_s with {read,write}_sysreg_s
  ...
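Why mismatched cache line sizes can break JITs: code that cleans or invalidates caches by stepping through memory one line at a time must use the smallest line size of any CPU it may run on. Otherwise it will skip lines on the CPU with smaller lines. The sketch below decodes the DminLine field of CTR_EL0 (bits [19:16], log2 of the line size in 4-byte words) and takes the minimum across CPUs. The per-CPU register values are supplied by the caller rather than read from hardware, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* CTR_EL0.DminLine (bits [19:16]) is log2 of the smallest D-cache line in words. */
static unsigned int dcache_line_bytes(uint64_t ctr)
{
	return 4u << ((ctr >> 16) & 0xf);
}

/* A maintenance loop is only safe if it strides by the smallest line seen. */
static unsigned int safe_dcache_line_bytes(const uint64_t *ctr_per_cpu,
					   size_t ncpus)
{
	unsigned int min = 0;

	for (size_t i = 0; i < ncpus; i++) {
		unsigned int line = dcache_line_bytes(ctr_per_cpu[i]);
		if (min == 0 || line < min)
			min = line;
	}
	return min; /* 0 only if ncpus == 0 */
}
```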
2016-10-03 | IB/rdmavt: Trivial function comment corrected. | Parav Pandit
Corrected function name in comment from qib_ to rvt_. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-03 | Merge branch 'pci/virtualization' into next | Bjorn Helgaas
* pci/virtualization:
  PCI: xilinx: Relax device number checking to allow SR-IOV
  PCI: designware: Relax device number checking to allow SR-IOV
  PCI: altera: Relax device number checking to allow SR-IOV
  PCI: Check for pci_setup_device() failure in pci_iov_add_virtfn()
  PCI: Mark Atheros AR9580 to avoid bus reset
2016-10-03 | Merge branch 'pci/resource' into next | Bjorn Helgaas
* pci/resource:
  PCI: Ignore requested alignment for VF BARs
  PCI: Ignore requested alignment for PROBE_ONLY and fixed resources
2016-10-03 | Merge branch 'pci/pm' into next | Bjorn Helgaas
* pci/pm:
  PCI: Avoid unnecessary resume after direct-complete
  PCI: Recognize D3cold in pci_update_current_state()
  PCI: Query platform firmware for device power state
  PCI: Afford direct-complete to devices with non-standard PM
2016-10-03 | Merge branch 'pci/msi' into next | Bjorn Helgaas
* pci/msi:
  PCI/MSI: Enable PCI_MSI_IRQ_DOMAIN support for ARC
2016-10-03 | Merge branch 'pci/misc' into next | Bjorn Helgaas
* pci/misc:
  PCI: Drop CONFIG_KEXEC_CORE ifdeffery
2016-10-03 | Merge branch 'pci/hotplug' into next | Bjorn Helgaas
* pci/hotplug:
  x86/PCI: VMD: Request userspace control of PCIe hotplug indicators
  PCI: pciehp: Allow exclusive userspace control of indicators
  PCI: pciehp: Remove useless pciehp_get_latch_status() calls
  PCI: pciehp: Clean up dmesg "Slot(%s)" messages
  PCI: pciehp: Remove unnecessary guard
  PCI: pciehp: Don't re-read Slot Status when handling surprise event
  PCI: pciehp: Don't re-read Slot Status when queuing hotplug event
  PCI: pciehp: Process all hotplug events before looking for new ones
  PCI: pciehp: Return IRQ_NONE when we can't read interrupt status
  PCI: pciehp: Rename pcie_isr() locals for clarity
  PCI: pciehp: Clear attention LED on device add
2016-10-03 | Merge branch 'pci/enumeration' into next | Bjorn Helgaas
* pci/enumeration:
  PCI: tegra: Fix pci_remap_iospace() failure path
  PCI: generic: Fix pci_remap_iospace() failure path
  PCI: rcar: Fix pci_remap_iospace() failure path
  PCI: versatile: Fix pci_remap_iospace() failure path
  PCI: designware: Fix pci_remap_iospace() failure path
  PCI: aardvark: Fix pci_remap_iospace() failure path
2016-10-03 | Merge branch 'pci/aer' into next | Bjorn Helgaas
* pci/aer:
  PCI/AER: Fix aer_probe() kernel-doc comment
  PCI/AER: Cache capability position
  PCI/AER: Avoid memory allocation in interrupt handling path
  ACPI / APEI: Send correct severity to calculate AER severity
  PCI/AER: Remove duplicate AER severity translation
  PCI/AER: Remove aerdriver.forceload kernel parameter
  PCI/AER: Remove aerdriver.nosourceid kernel parameter
  x86/PCI: VMD: Add quirk for AER to ignore source ID
  PCI/AER: Add bus flag to skip source ID matching
Conflicts:
  drivers/pci/probe.c
2016-10-03 | perf tools: Add jsmn `jasmine' JSON parser | Andi Kleen
I need a JSON parser. This adds the simplest JSON parser I could find -- Serge Zaitsev's jsmn `jasmine' -- to the perf library. I merely converted it to (mostly) Linux style and added support for input that is not 0-terminated. The parser is quite straightforward and does not copy any data, just returns tokens with offsets into the input buffer. So it's relatively efficient and simple to use. The code is not fully checkpatch clean, but I didn't want to completely fork the upstream code.
Original source: http://zserge.bitbucket.org/jsmn.html
In addition I added a simple wrapper that mmaps a json file and provides some straightforward access functions. Used in follow-on patches to parse event files. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1473978296-20712-2-git-send-email-sukadev@linux.vnet.ibm.com Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [ Use fcntl.h instead of sys/fcntl.h to fix the build on Alpine Linux 3.4/musl libc, use stdbool.h to avoid clashing with 'bool' typedef there ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
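The zero-copy design described above means a token is just a pair of offsets into the caller's buffer, so comparisons can work in place. The sketch below uses a hypothetical two-field token (jsmn's own token type also carries a type and a child count) with an exclusive end offset, which is an assumption about the convention; `tok_streq()` is an invented helper, not the perf wrapper's API.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical token: [start, end) offsets into the caller's JSON buffer. */
struct tok {
	int start;
	int end;
};

/* Compare a token's text with a C string without copying it out of buf. */
static bool tok_streq(const char *buf, const struct tok *t, const char *s)
{
	size_t len = (size_t)(t->end - t->start);

	return strlen(s) == len && memcmp(buf + t->start, s, len) == 0;
}
```

For the input `{"name":"cycles"}`, the key token spans offsets 2 to 6 and the value token spans 9 to 15; both point into the original buffer, which is never modified or duplicated.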
2016-10-03 | tools build: Make fixdep a hostprog | Jiri Olsa
It is used in the build process, so stop suppressing its build in tools cross builds. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160927141846.GA6589@krava [ Use HOSTCC on the $(OUTPUT)fixdep target, it was using the x-compiler to link fixdep-in.o, that was correctly built with HOSTCC and thus failing ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-03 | tools build: Add support for host programs format | Jiri Olsa
In some cases, like for fixdep and shortly for jevents, we need to build a tool to run on the host that will be used in building a tool, such as perf, that is being cross compiled, so do like the kernel and provide HOSTCC, HOSTLD and HOSTAR to do that. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Requested-by: Andi Kleen <andi@firstfloor.org> Requested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160927141846.GA6589@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-03 | perf tools: Experiment with cppcheck | Arnaldo Carvalho de Melo
Experimenting a bit using cppcheck[1], a static checker brought to my attention by Colin, reducing the scope of some variables, reducing the number of source code lines in the process:
$ cppcheck --enable=style tools/perf/util/thread.c
Checking tools/perf/util/thread.c...
[tools/perf/util/thread.c:17]: (style) The scope of the variable 'leader' can be reduced.
[tools/perf/util/thread.c:133]: (style) The scope of the variable 'err' can be reduced.
[tools/perf/util/thread.c:273]: (style) The scope of the variable 'err' can be reduced.
Will continue later, but these are already useful, keep them.
1: https://sourceforge.net/p/cppcheck/wiki/Home/
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ixws7lbycihhpmq9cc949ti6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-03 | perf probe: Check if *ptr2 is zero and not ptr2 | Colin Ian King
Static analysis with cppcheck[1] detected an incorrect comparison:
[tools/perf/util/probe-event.c:216]: (warning) Char literal compared with pointer 'ptr2'. Did you intend to dereference it?
Dereference ptr2 for the comparison to fix this.
1: https://sourceforge.net/p/cppcheck/wiki/Home/
Signed-off-by: Colin King <colin.king@canonical.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 35726d3a4ca9 ("perf probe: Fix to cut off incompatible chars from group name") Link: http://lkml.kernel.org/r/20161003103431.18534-1-colin.king@canonical.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
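The bug class cppcheck flagged is easy to reproduce: comparing a `char *` with `'\0'` tests the pointer, not the character it points at. The function below is an illustrative reconstruction of a "cut at the first invalid character" loop with the dereference in place. It is not copied from probe-event.c, and its name is hypothetical.

```c
#include <ctype.h>

/*
 * Truncate s at the first character that is neither alphanumeric nor '_'.
 * Writing the condition as `p != '\0'` would compare the pointer with a
 * null pointer constant, which is always true for a valid pointer, so the
 * loop could then only stop through its break.
 */
static void cut_at_invalid_char(char *s)
{
	for (char *p = s; *p != '\0'; p++) { /* dereference: test the character */
		if (!isalnum((unsigned char)*p) && *p != '_') {
			*p = '\0';
			break;
		}
	}
}
```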
2016-10-03 | libceph: ceph_build_auth() doesn't need ceph_auth_build_hello() | Ilya Dryomov
A static bug finder (EBA) on Linux 4.7:
  Double lock in net/ceph/auth.c
  second lock at 108: mutex_lock(& ac->mutex); [ceph_auth_build_hello]
  after calling from 263: ret = ceph_auth_build_hello(ac, msg_buf, msg_len);
  if ! ac->protocol -> true at 262
  first lock at 261: mutex_lock(& ac->mutex); [ceph_build_auth]
ceph_auth_build_hello() is never called, because the protocol is always initialized, whether we are checking existing tickets (in delayed_work()) or getting new ones after invalidation (in invalidate_authorizer()). Reported-by: Iago Abal <iari@itu.dk> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03 | libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello() | Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03 | ceph: fix description for rsize and rasize mount options | Andreas Gerstmayr
Signed-off-by: Andreas Gerstmayr <andreas.gerstmayr@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03 | rbd: use kmalloc_array() in rbd_header_from_disk() | Markus Elfring
* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Delete the local variable "size" which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
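The overflow concern behind kmalloc_array() can be illustrated in userspace: multiplying an element count by an element size can wrap around `size_t` and silently produce a too-small allocation. The helper below is a hypothetical analogue, not the kernel function; it rejects products that would overflow before calling malloc().

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of `size` bytes, failing instead of wrapping on overflow. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL; /* n * size would not fit in size_t */
	return malloc(n * size);
}
```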
2016-10-03 | ceph: use list_move instead of list_del/list_add | Wei Yongjun
Using list_move() instead of list_del() + list_add(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
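For readers unfamiliar with the helper, list_move() unlinks an entry and re-inserts it after a new head in one step. The sketch below re-creates a minimal circular doubly linked list in the style of `<linux/list.h>` so it can run in userspace. `list_init()` and `list_del_entry()` are stand-in names for this sketch; the kernel's own equivalents are INIT_LIST_HEAD() and __list_del_entry().

```c
/* Minimal circular doubly linked list modeled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink e from whatever list it is on. */
static void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Same effect as list_del() followed by list_add(), as one call. */
static void list_move(struct list_head *e, struct list_head *head)
{
	list_del_entry(e);
	list_add(e, head);
}
```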
2016-10-03 | ceph: handle CEPH_SESSION_REJECT message | Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 | ceph: avoid accessing / when mounting a subpath | Yan, Zheng
Accessing / causes failure if the client has caps that restrict the path. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 | ceph: fix mandatory flock check | Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 | ceph: remove warning when ceph_releasepage() is called on dirty page | NeilBrown
If O_DIRECT writes are racing with buffered writes, then the call to invalidate_inode_pages2_range() can call ceph_releasepage() on dirty pages. Most filesystems hold inode_lock() across O_DIRECT writes so they do not suffer this race, but cephfs deliberately drops the lock, and opens a window for the race. This race can be triggered with the generic/036 test from the xfstests test suite. It doesn't happen every time, but it does happen often. As the possibility is expected, remove the warning, and instead include the PageDirty() status in the debug message. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 | ceph: ignore error from invalidate_inode_pages2_range() in direct write | NeilBrown
This call can fail if there are dirty pages. The preceding call to filemap_write_and_wait_range() will normally remove dirty pages, but as inode_lock() is not held over calls to ceph_direct_read_write(), it could race with non-direct writes and pages could be dirtied immediately after filemap_write_and_wait_range() returns. If there are dirty pages, they will be removed by the subsequent call to truncate_inode_pages_range(), so having them here is not a problem. If the 'ret' value is left holding an error, then in the async IO case (aio_req is not NULL) the loop that would normally call ceph_osdc_start_request() will see the error in 'ret' and abort all requests. This doesn't seem like correct behaviour. So use separate 'ret2' instead of overloading 'ret'. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03 | ceph: fix error handling of start_read() | Yan, Zheng
If start_read() fails to add a page to the page cache or fails to send the OSD request, it should call put_page() (instead of free_page()) for the relevant pages. Besides, start_read() needs to cancel the fscache readpage if it fails to send the OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reported-by: Zhi Zhang <zhang.david2011@gmail.com>
2016-10-03 | rbd: add rbd_obj_request_error() helper | Ilya Dryomov
Pull setting an error and marking a request done code into a new helper. obj_request_img_data_test() check isn't strictly needed right now, but makes it applicable to !img_data requests and a bit safer. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-10-03 | rbd: img_data requests don't own their page array | Ilya Dryomov
Move the check into rbd_obj_request_destroy() to avoid use-after-free on errors in rbd_img_request_fill(..., OBJ_REQUEST_PAGES, ...), where pages, owned by the caller, get freed in rbd_img_request_fill(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
2016-10-03 | rbd: don't call rbd_osd_req_format_read() for !img_data requests | Ilya Dryomov
Accessing obj_request->img_request union field is only valid for object requests associated with an image (i.e. if obj_request_img_data_test() returns true). rbd_osd_req_format_read() used to do more, but now it just sets osd_req->snap_id. Standalone and stat object requests always go to the HEAD revision and are fine with CEPH_NOSNAP set by libceph, so get around the invalid union field use by simply not calling rbd_osd_req_format_read() in those places. Reported-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
2016-10-03 | rbd: rework rbd_img_obj_exists_submit() error paths | Ilya Dryomov
- don't put obj_request before rbd_obj_request_get() if rbd_obj_request_create() fails
- don't leak pages if rbd_obj_request_create() fails
- don't leak stat_request if rbd_osd_req_create() fails
Reported-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
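The leak fixes listed above follow the kernel's usual goto-based unwinding pattern: each failure jumps to a label that releases only what was acquired before it, in reverse order. The sketch below is hypothetical. Generic allocations stand in for the page vector, the stat request, and the OSD request, and a failure-injection knob lets the leak-free property be checked. It does not model the reference-count ordering in the first item.

```c
#include <errno.h>
#include <stdlib.h>

static int fail_at;     /* 0 = no failure, 1..3 = make that allocation step fail */
static int live_allocs; /* outstanding allocations, used as a crude leak check */

static void *step_alloc(int step)
{
	void *p;

	if (step == fail_at)
		return NULL;
	p = malloc(16);
	if (p)
		live_allocs++;
	return p;
}

static void step_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

static int submit_exists_check(void)
{
	void *pages, *stat_req, *osd_req;

	pages = step_alloc(1);
	if (!pages)
		return -ENOMEM;

	stat_req = step_alloc(2);
	if (!stat_req)
		goto err_pages;         /* release only what exists so far */

	osd_req = step_alloc(3);
	if (!osd_req)
		goto err_stat_req;

	/* Success: a real driver would submit here; the sketch just releases. */
	step_free(osd_req);
	step_free(stat_req);
	step_free(pages);
	return 0;

err_stat_req:
	step_free(stat_req);
err_pages:
	step_free(pages);
	return -ENOMEM;
}
```

Because the labels fall through in reverse acquisition order, adding a fourth resource only needs one new label above `err_stat_req`, which keeps error paths like the ones fixed here easy to audit.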