Age Commit message Author
2014-04-04libceph: rename __decode_pool{,_names}() to decode_pool{,_names}()Ilya Dryomov
To be in line with all the other osdmap decode helpers. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: fix and clarify ceph_decode_need() sizesIlya Dryomov
Sum up sizeof(...) results instead of (incorrectly) hard-coding the number of bytes, expressed in ints and longs. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
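A minimal sketch of the idea, not the libceph code: the field layout and the helper names (need_hardcoded, need_summed, have_bytes) are hypothetical. It illustrates why summing sizeof() over the fixed-width wire types is safer than counting bytes in ints and longs, whose sizes depend on the target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire layout, for illustration only:
 *   u32 epoch, u64 flags, u32 count
 */

/* Fragile: "two ints and a long" is 12 bytes where int and long are 32-bit. */
static size_t need_hardcoded(void)
{
	return 2 * sizeof(int) + sizeof(long);
}

/* Robust: one sizeof() per encoded field, 16 bytes on any usual target. */
static size_t need_summed(void)
{
	return sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t);
}

/* Same role as a "need" check: are n bytes available in [p, end)? */
static bool have_bytes(const unsigned char *p, const unsigned char *end,
		       size_t n)
{
	return (size_t)(end - p) >= n;
}
```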
2014-04-04libceph: nuke bogus encoding version check in osdmap_apply_incremental()Ilya Dryomov
Only version 6 of osdmap encoding is supported, anything other than version 6 results in an error and halts the decoding process. Checking if version is >= 5 is therefore bogus. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: fixup error handling in osdmap_apply_incremental()Ilya Dryomov
The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Follow osdmap_decode() and fix this by adding a special e_inval label to be used by all ceph_decode_* macros. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: fix crush_decode() call site in osdmap_decode()Ilya Dryomov
The size of the memory area fed to crush_decode() should be limited not only by osdmap end, but also by the crush map length. Also, drop unnecessary dout() (dout() in crush_decode() conveys the same info) and step past crush map only if it is decoded successfully. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
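A rough sketch of the bounding rule described above, assuming a length-prefixed nested blob inside a larger buffer. nested_end() is a hypothetical helper, not a libceph function: it returns the end pointer to hand to the nested decoder, which is whichever comes first, the advertised length or the end of the enclosing buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper: limit a nested decode to min(p + len, end).
 * The comparison is done on sizes so that p + len is never computed
 * when it would run past the enclosing buffer.
 */
static const unsigned char *nested_end(const unsigned char *p,
				       const unsigned char *end, size_t len)
{
	size_t avail = (size_t)(end - p);

	return len < avail ? p + len : end;
}
```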
2014-04-04libceph: check length of osdmap osd arraysIlya Dryomov
Check length of osd_state, osd_weight and osd_addr arrays. They should all have exactly max_osd elements after the call to osdmap_set_max_osd(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: safely decode max_osd value in osdmap_decode()Ilya Dryomov
max_osd value is not covered by any ceph_decode_need(). Use a safe version of ceph_decode_* macro to decode it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: fixup error handling in osdmap_decode()Ilya Dryomov
The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Fix this by adding a special e_inval label to be used by all ceph_decode_* macros. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
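The pattern can be sketched as follows. This is an illustration under assumptions, not the osdmap code: decode_u32_safe() is a hypothetical stand-in for the ceph_decode_* macros, and byte order is ignored. The point is that every bounds failure jumps to one label that sets the error, so no caller has to remember to reset err beforehand.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a "safe" decode macro: jump to `label` if
 * fewer than 4 bytes remain, else copy out a u32 and advance p.
 * Byte order is ignored here to keep the sketch short.
 */
#define decode_u32_safe(p, end, v, label)			\
	do {							\
		if ((size_t)((end) - (p)) < sizeof(uint32_t))	\
			goto label;				\
		memcpy(&(v), (p), sizeof(uint32_t));		\
		(p) += sizeof(uint32_t);			\
	} while (0)

static int decode_pair(const unsigned char *p, const unsigned char *end,
		       uint32_t *a, uint32_t *b)
{
	decode_u32_safe(p, end, *a, e_inval);
	decode_u32_safe(p, end, *b, e_inval);
	return 0;

e_inval:
	/* Single exit for every malformed-input case. */
	return -EINVAL;
}
```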
2014-04-04libceph: split osdmap allocation and decode stepsIlya Dryomov
Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: dump osdmap and enhance output on decode errorsIlya Dryomov
Dump osdmap in hex on both full and incremental decode errors, to make it easier to match the contents with error offset. dout() map epoch and max_osd value on success. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: dump pg_temp mappings to debugfsIlya Dryomov
Dump pg_temp mappings to /sys/kernel/debug/ceph/<client>/osdmap, one 'pg_temp <pgid> [<osd>, ..., <osd>]' per line, e.g.: pg_temp 2.6 [2,3,4] Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: do not prefix osd lines with \t in debugfs outputIlya Dryomov
To save screen space in anticipation of more fields (e.g. primary affinity). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04libceph: refer to osdmap directly in osdmap_show()Ilya Dryomov
To make it more readable and save screen space. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04crush: support chooseleaf_vary_r tunable (tunables3) by defaultIlya Dryomov
Add TUNABLES3 feature (chooseleaf_vary_r tunable) to a set of features supported by default. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04crush: add SET_CHOOSELEAF_VARY_R stepIlya Dryomov
This lets you adjust the vary_r tunable on a per-rule basis. Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04crush: add chooseleaf_vary_r tunableIlya Dryomov
The current crush_choose_firstn code will re-use the same 'r' value for the recursive call. That means that if we are hitting a collision or rejection for some reason (say, an OSD that is marked out) and need to retry, we will keep making the same (bad) choice in that recursive selection. Introduce a tunable that fixes that behavior by incorporating the parent 'r' value into the recursive starting point, so that a different path will be taken in subsequent placement attempts. Note that this was done from the get-go for the new crush_choose_indep algorithm. This was exposed by a user who was seeing PGs stuck in active+remapped after reweight-by-utilization because the up set mapped to a single OSD. Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
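One way to picture the tunable is the sketch below. The message does not give the exact expression, so the shift used here is an assumption, and sub_r() is a hypothetical helper name. With the tunable off the recursive call always starts from 0 (the old behavior); with it on, the starting point follows the parent's 'r', so a retry takes a different path.

```c
/*
 * Hypothetical helper: starting 'r' for the recursive (leaf) selection.
 * vary_r == 0 keeps the legacy behavior; larger values make the
 * recursive starting point change more slowly with the parent's r.
 * The shift is an assumption made for this sketch.
 */
static int sub_r(int parent_r, int vary_r)
{
	if (vary_r)
		return parent_r >> (vary_r - 1);
	return 0;
}
```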
2014-04-04crush: allow crush rules to set (re)tries counts to 0Ilya Dryomov
These two fields are misnomers; they are *retry* counts. Reflects ceph.git commit f17caba8ae0cad7b6f8f35e53e5f73b444696835. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04crush: fix off-by-one errors in total_tries refactorIlya Dryomov
Back in 27f4d1f6bc32c2ed7b2c5080cbd58b14df622607 we refactored the CRUSH code to allow adjustment of the retry counts on a per-pool basis. That commit had an off-by-one bug: the previous "tries" counter was a *retry* count, not a *try* count, but the new code was passing in 1 meaning there should be no retries. Fix the ftotal vs tries comparison to use < instead of <= to fix the problem. Note that the original code used <= here, which means the global "choose_total_tries" tunable is actually counting retries. Compensate for that by adding 1 in crush_do_rule when we pull the tunable into the local variable. This was noticed looking at output from a user provided osdmap. Unfortunately the map doesn't illustrate the change in mapping behavior and I haven't managed to construct one yet that does. Inspection of the crush debug output now aligns with prior versions, though. Reflects ceph.git commit 795704fd615f0b008dcc81aa088a859b2d075138. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
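The off-by-one can be reproduced with a toy loop in which every placement attempt fails. This is only a sketch: attempts_made() and local_tries() are hypothetical names, and the real loop in the CRUSH code does much more. With `<=` a value of 1 still allows one retry; with `<` it means exactly one try, and a tunable that counts retries is converted by adding 1.

```c
#include <stdbool.h>

/* Number of attempts made when every attempt fails. */
static int attempts_made(int tries, bool strict)
{
	int ftotal = 0;		/* failures so far */
	int attempts = 0;
	bool retry;

	do {
		attempts++;	/* this attempt fails */
		ftotal++;
		retry = strict ? (ftotal < tries) : (ftotal <= tries);
	} while (retry);

	return attempts;
}

/* A tunable that counts retries becomes a try count by adding 1. */
static int local_tries(int choose_total_tries)
{
	return choose_total_tries + 1;
}
```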
2014-04-04ceph: don't include ceph.{file,dir}.layout vxattr in listxattr()Yan, Zheng
This avoids 'cp -a' modifying layout of new files/directories. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-04ceph: check buffer size in ceph_vxattrcb_layout()Yan, Zheng
If buffer size is zero, return the size of layout vxattr. If buffer size is not zero, check if it is large enough for layout vxattr. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
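The size convention above is the usual one for getxattr-style callbacks and can be sketched like this. vxattr_copy() is a hypothetical helper, and returning -ERANGE for a too-small buffer is an assumption based on common xattr behavior; the message itself does not name the error code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper following the size convention:
 *   size == 0        -> report how many bytes the value needs
 *   size too small   -> fail (error code assumed to be -ERANGE)
 *   otherwise        -> copy the value and return its length
 */
static long vxattr_copy(const char *val, char *buf, size_t size)
{
	size_t len = strlen(val);

	if (size == 0)
		return (long)len;
	if (size < len)
		return -ERANGE;
	memcpy(buf, val, len);
	return (long)len;
}
```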
2014-04-04ceph: fix null pointer dereference in discard_cap_releases()Yan, Zheng
send_mds_reconnect() may call discard_cap_releases() after all release messages have been dropped by cleanup_cap_releases() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04libceph: fix oops in ceph_msg_data_{pages,pagelist}_advance()Yan, Zheng
When there is no more data, ceph_msg_data_{pages,pagelist}_advance() should not move on to the next page. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-04ceph: Remove get/set acl on symlinksFabian Frederick
Remove unsupported symlink operations. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-04ceph: set mds_wanted when MDS reply changes a cap to auth capYan, Zheng
When adjusting caps client wants, MDS does not record caps that are not allowed. For non-auth MDS, it does not record WR caps. So when a MDS reply changes a non-auth cap to auth cap, client needs to set cap's mds_wanted according to the reply. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-04ceph: use fl->fl_file as owner identifier of flock and posix lockYan, Zheng
flock and posix lock should use fl->fl_file instead of process ID as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner is usually equal to fl->fl_file, but it also can be a customized value). The process ID of the lock holder is only used for F_GETLK fcntl(2). The fix is to rename the 'pid' fields of struct ceph_mds_request_args and struct ceph_filelock to 'owner', and rename the 'pid_namespace' fields to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages. We also set the most significant bit of the 'owner' field. MDS can use that bit to distinguish between old and new clients. The MDS counterpart of this patch modifies the flock code to not take the 'pid_namespace' into consideration when checking for conflicting locks. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
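The tagging scheme can be sketched as below, assuming a 64-bit owner field. The macro and function names are hypothetical and the field width is an assumption; only the idea of setting the most significant bit so the receiver can tell new clients from old ones comes from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the marker bit; 64-bit width is assumed. */
#define OWNER_NEW_CLIENT_BIT	(UINT64_C(1) << 63)

/* Client side: tag the owner identifier before sending it. */
static uint64_t make_owner(uint64_t id)
{
	return id | OWNER_NEW_CLIENT_BIT;
}

/* Receiver side: was this owner value produced by a new client? */
static bool owner_from_new_client(uint64_t owner)
{
	return (owner & OWNER_NEW_CLIENT_BIT) != 0;
}
```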
2014-04-04ceph: forbid mandatory file lockYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-04ceph: use fl->fl_type to decide flock operationYan, Zheng
VFS does not directly pass flock's operation code to filesystem's flock callback. It translates the operation code into the form used to present posix lock parameters. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
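The translation can be illustrated in user space, assuming a Linux or POSIX host that provides <sys/file.h> and <fcntl.h>. translate_flock_cmd() is a hypothetical helper that mirrors the mapping: by the time a filesystem sees the request, LOCK_SH, LOCK_EX and LOCK_UN have become F_RDLCK, F_WRLCK and F_UNLCK in fl_type, which is why the callback should look at fl->fl_type.

```c
#include <fcntl.h>
#include <sys/file.h>

/*
 * Hypothetical helper: map a flock(2) operation code to the lock type
 * used for POSIX lock parameters. Returns -1 for an unknown operation.
 */
static int translate_flock_cmd(int cmd)
{
	switch (cmd & ~LOCK_NB) {	/* the non-blocking flag is not a type */
	case LOCK_SH:
		return F_RDLCK;
	case LOCK_EX:
		return F_WRLCK;
	case LOCK_UN:
		return F_UNLCK;
	}
	return -1;
}
```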
2014-04-04ceph: update i_max_size even if inode version does not changeYan, Zheng
Handle the following sequence of events:
- client releases an inode with i_max_size > 0. The release message is queued (it is not sent to the auth MDS).
- a 'lookup' request reply from non-auth MDS returns the same inode.
- client opens the inode in write mode. The version of inode trace in 'open' request reply is equal to the cached inode's version.
- client requests new max size. The MDS ignores the request because it does not affect client's write range.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04ceph: make sure write caps are registered with auth MDSYan, Zheng
Only auth MDS can issue write caps to clients, so don't consider write caps registered with non-auth MDS as valid. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-04Hexagon: update CR year for elf.hRichard Kuo
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04Hexagon: remove SP macroRichard Kuo
The SP/r29 macro wasn't used anywhere else and was causing conflicts with another module, so just remove it. Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04Hexagon: set ELF_EXEC_PAGESIZE to PAGE_SIZERichard Kuo
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04Hexagon: set the e_flags in user regset view for core dumpsRichard Kuo
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04Hexagon: fix atomic_setRichard Kuo
Normal writes in our architecture don't invalidate lock reservations. Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04Hexagon: add screen_info for VGA_CONSOLERichard Kuo
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04hexagon: correct type on pgd copyIlia Mirkin
swapper_pg_dir is an array of pgd_t, not pgd_t*. This has no actual effect since sizeof(pgd_t) == sizeof(pgd_t*), but unconfuses tools that check types. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04smp, hexagon: kill SMP single function call interruptJiang Liu
Commit 9a46ad6d6df3b54 "smp: make smp_call_function_many() use logic similar to smp_call_function_single()" has unified the way to handle single and multiple cross-CPU function calls. Now only one interrupt is needed for architecture specific code to support generic SMP function call interfaces, so kill the redundant single function call interrupt. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohua Li <shli@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jiri Kosina <trivial@kernel.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: linux-hexagon@vger.kernel.org Signed-off-by: Jiang Liu <liuj97@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: include: asm: add generic macro 'mmiowb' in "io.h"Chen Gang
A dummy mmiowb() is needed, otherwise compilation fails. The related error with allmodconfig: CC [M] drivers/mmc/host/sdhci.o drivers/mmc/host/sdhci.c: In function 'sdhci_request': drivers/mmc/host/sdhci.c:1409:2: error: implicit declaration of function 'mmiowb' [-Werror=implicit-function-declaration] Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: kernel: hexagon_ksyms.c: export related symbols which various modules needChen Gang
Export all the functions and symbols that various modules need with allmodconfig. The related errors: MODPOST 2879 modules ERROR: "__vmyield" [sound/sound_firmware.ko] undefined! ERROR: "__phys_offset" [sound/drivers/snd-dummy.ko] undefined! ERROR: "ioremap_nocache" [drivers/char/ipmi/ipmi_si.ko] undefined! ERROR: "__iounmap" [drivers/char/ipmi/ipmi_si.ko] undefined! ... For included files, "linux/*" comes first, then "asm/*". All related included files and symbols need to be sorted in alphabetical order. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: kernel: reset.c: use function pointer instead of function for pm_power_off and export itChen Gang
'pm_power_off' is a function pointer, not a function, so its type needs to change, and it also needs to be exported, otherwise compilation fails with allmodconfig. The related error: MODPOST 2879 modules ERROR: "pm_power_off" [drivers/char/ipmi/ipmi_poweroff.ko] undefined! Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: include: asm: add "vga.h" in KbuildChen Gang
The generic "vga.h" needs to be included, otherwise compilation fails with allmodconfig. The related error: CC [M] drivers/gpu/drm/drm_irq.o In file included from include/linux/vgaarb.h:34:0, from drivers/gpu/drm/drm_irq.c:42: include/video/vga.h:22:21: fatal error: asm/vga.h: No such file or directory Also move "preempt.h" up to match sort order. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: include: asm: Kbuild: add generic "serial.h" in KbuildChen Gang
Add "serial.h" to Kbuild, otherwise compilation fails with allmodconfig. The related error: CC [M] drivers/staging/speakup/speakup_acntpc.o In file included from drivers/staging/speakup/speakup_acntpc.c:33:0: drivers/staging/speakup/serialio.h:7:24: fatal error: asm/serial.h: No such file or directory Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: include: uapi: asm: setup.h add switch macro __KERNEL__Chen Gang
Define a dummy '__init' instead of including "linux/init.h" if !__KERNEL__, otherwise the header check fails. The related error (with allmodconfig under hexagon): CHECK include/asm (34 files) usr/include/asm/setup.h:22: included file 'linux/init.h' is not exported Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
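The switch might look roughly like this. It is a sketch of the technique only, not the contents of setup.h, and setup_example() is a hypothetical function added to show the annotation compiling. Built as ordinary user-space C, __KERNEL__ is undefined, so the dummy definition is taken and no unexported kernel header is pulled in.

```c
/* Sketch of the __KERNEL__ switch; builds as plain user-space C. */
#ifdef __KERNEL__
#include <linux/init.h>
#else
#define __init	/* dummy: the annotation means nothing outside the kernel */
#endif

/* Hypothetical function, only here to show the annotation compiling. */
static int __init setup_example(void)
{
	return 0;
}
```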
2014-04-04arch: hexagon: include: asm: add prefix "hvm[ci]_" for all enum members in "hexagon_vm.h"Chen Gang
Prepend "hvmc_" or "hvmi_" to all related enum members (whose names are too common and conflict with other sub-systems). The related error with allmodconfig: CC [M] drivers/md/raid1.o drivers/md/raid1.c:1440:13: error: 'status' redeclared as different kind of symbol arch/hexagon/include/asm/hexagon_vm.h:76:2: note: previous definition of 'status' was here Also use 'affinity' instead of 'locdis' for __vmintop_affinity(). Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: Kconfig: add HAVE_DMA_ATTR in Kconfig and remove "linux/dma-mapping.h" from "asm/dma-mapping.h"Chen Gang
When HAS_DMA is set and the generic implementation is used, HAVE_DMA_ATTR must be enabled, otherwise compilation fails with allmodconfig. The related error: CC [M] drivers/ata/libata-core.o drivers/ata/libata-core.c: In function 'ata_sg_clean': drivers/ata/libata-core.c:4598:3: error: implicit declaration of function 'dma_unmap_sg' [-Werror=implicit-function-declaration] drivers/ata/libata-core.c: In function 'ata_sg_setup': drivers/ata/libata-core.c:4708:2: error: implicit declaration of function 'dma_map_sg' [-Werror=implicit-function-declaration] "linux/dma-mapping.h" already includes "asm/dma-mapping.h", so remove "linux/dma-mapping.h" from "asm/dma-mapping.h". Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04arch: hexagon: kernel: add export symbol function __delay()Chen Gang
A __delay() implementation is needed, otherwise allmodconfig fails to build in the next-20131118 tree. The related error: CC kernel/locking/spinlock_debug.o kernel/locking/spinlock_debug.c: In function '__spin_lock_debug': kernel/locking/spinlock_debug.c:114:3: error: implicit declaration of function '__delay' [-Werror=implicit-function-declaration] Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04hexagon: include: asm: kgdb: extend DBG_MAX_REG_NUM for "cs0/1"Chen Gang
The maximum register number needs to be extended for "cs0/1". The related warnings (with allmodconfig for v4): arch/hexagon/kernel/kgdb.c:79: warning: excess elements in array initializer arch/hexagon/kernel/kgdb.c:79: warning: (near initialization for 'dbg_reg_def') arch/hexagon/kernel/kgdb.c:80: warning: excess elements in array initializer arch/hexagon/kernel/kgdb.c:80: warning: (near initialization for 'dbg_reg_def') Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04hexagon: kernel: kgdb: include related header for pass compiling.Chen Gang
The related headers need to be included, otherwise compilation fails. The related errors (with allmodconfig for v4): CC arch/hexagon/kernel/kgdb.o arch/hexagon/kernel/kgdb.c:30: error: invalid use of undefined type 'struct pt_regs' arch/hexagon/kernel/kgdb.c:31: error: invalid use of undefined type 'struct pt_regs' ... arch/hexagon/kernel/kgdb.c:220: error: implicit declaration of function 'local_irq_save' arch/hexagon/kernel/kgdb.c:222: error: implicit declaration of function 'local_irq_restore' ... Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04hexagon: kernel: remove useless variables 'dn', 'r' and 'err' in time_init_deferred() in "time.c"Chen Gang
Remove them, since they are useless. The related warnings (with allmodconfig for v4): CC arch/hexagon/kernel/time.o arch/hexagon/kernel/time.c: In function 'time_init_deferred': arch/hexagon/kernel/time.c:196: warning: unused variable 'err' arch/hexagon/kernel/time.c:195: warning: unused variable 'r' arch/hexagon/kernel/time.c:194: warning: unused variable 'dn' Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
2014-04-04mm: get_user_pages(write,force) refuse to COW in shared areasHugh Dickins
get_user_pages(write=1, force=1) has always had odd behaviour on write-protected shared mappings: although it demands FMODE_WRITE-access to the underlying object (do_mmap_pgoff sets neither VM_SHARED nor VM_MAYWRITE without that), it ends up with do_wp_page substituting private anonymous Copied-On-Write pages for the shared file pages in the area.

That was long ago intentional, as a safety measure to prevent ptrace setting a breakpoint (or POKETEXT or POKEDATA) from inadvertently corrupting the underlying executable. Yet exec and dynamic loaders open the file read-only, and use MAP_PRIVATE rather than MAP_SHARED.

The traditional odd behaviour still causes surprises and bugs in mm, and is probably not what any caller wants - even the comment on the flag says "You do not want this" (although it's undoubtedly necessary for overriding userspace protections in some contexts, and good when !write).

Let's stop doing that. But it would be dangerous to remove the long-standing safety at this stage, so just make get_user_pages(write,force) fail with EFAULT when applied to a write-protected shared area. Infiniband may in future want to force write through to underlying object: we can add another FOLL_flag later to enable that if required.

Odd though the old behaviour was, there is no doubt that we may turn out to break userspace with this change, and have to revert it quickly. Issue a WARN_ON_ONCE to help debug the changed case (easily triggered by userspace, so only once to prevent spamming the logs); and delay a few associated cleanups until this change is proved.

get_user_pages callers who might see trouble from this change:
  ptrace poking, or writing to /proc/<pid>/mem
  drivers/infiniband/
  drivers/media/v4l2-core/
  drivers/gpu/drm/exynos/exynos_drm_gem.c
  drivers/staging/tidspbridge/core/tiomap3430.c
if they ever apply get_user_pages to write-protected shared mappings of an object which was opened for writing.

I went to apply the same change to mm/nommu.c, but retreated. NOMMU has no place for COW, and its VM_flags conventions are not the same: I'd be more likely to screw up NOMMU than make an improvement there.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
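The changed decision can be summarised in a small sketch. This is not the mm code: the SK_VM_* bits and check_forced_write() are hypothetical, and the real check involves more state. It only encodes the rule stated in the message, that a forced write to a write-protected area still proceeds by COW for a private mapping but now fails with EFAULT for a shared one.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's VM_* values. */
#define SK_VM_WRITE	0x1u
#define SK_VM_SHARED	0x2u

static int check_forced_write(unsigned int vm_flags, bool write, bool force)
{
	if (!write || (vm_flags & SK_VM_WRITE))
		return 0;		/* read, or the area is writable */
	if (!force)
		return -EFAULT;		/* write-protected, no override */
	if (vm_flags & SK_VM_SHARED)
		return -EFAULT;		/* changed case: refuse to COW */
	return 0;			/* private mapping: COW is fine */
}
```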