summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-01-21mm: fix mlock accoutingKirill A. Shutemov
Tetsuo Handa reported underflow of NR_MLOCK on munlock. Testcase: #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #define BASE ((void *)0x400000000000) #define SIZE (1UL << 21) int main(int argc, char *argv[]) { void *addr; system("grep Mlocked /proc/meminfo"); addr = mmap(BASE, SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_LOCKED | MAP_FIXED, -1, 0); if (addr == MAP_FAILED) printf("mmap() failed\n"), exit(1); munmap(addr, SIZE); system("grep Mlocked /proc/meminfo"); return 0; } It happens on munlock_vma_page() due to unfortunate choice of nr_pages data type: __mod_zone_page_state(zone, NR_MLOCK, -nr_pages); For unsigned int nr_pages, implicitly casted to long in __mod_zone_page_state(), it becomes something around UINT_MAX. munlock_vma_page() usually called for THP as small pages go though pagevec. Let's make nr_pages signed int. Similar fixes in 6cdb18ad98a4 ("mm/vmstat: fix overflow in mod_zone_page_state()") used `long' type, but `int' here is OK for a count of the number of sub-pages in a huge page. Fixes: ff6a6da60b89 ("mm: accelerate munlock() treatment of THP pages") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Michel Lespinasse <walken@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-21thp: change pmd_trans_huge_lock() interface to return ptlKirill A. Shutemov
After THP refcounting rework we have only two possible return values from pmd_trans_huge_lock(): success and failure. Return-by-pointer for ptl doesn't make much sense in this case. Let's convert pmd_trans_huge_lock() to return ptl on success and NULL on failure. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22MIPS: hpet: Choose a safe value for the ETIME checkHuacai Chen
This patch is borrowed from x86 hpet driver and explaind below: Due to the overly intelligent design of HPETs, we need to workaround the problem that the compare value which we write is already behind the actual counter value at the point where the value hits the real compare register. This happens for two reasons: 1) We read out the counter, add the delta and write the result to the compare register. When a NMI hits between the read out and the write then the counter can be ahead of the event already. 2) The write to the compare register is delayed by up to two HPET cycles in AMD chipsets. We can work around this by reading back the compare register to make sure that the written value has hit the hardware. But that is bad performance wise for the normal case where the event is far enough in the future. As we already know that the write can be delayed by up to two cycles we can avoid the read back of the compare register completely if we make the decision whether the delta has elapsed already or not based on the following calculation: cmp = event - actual_count; If cmp is less than 64 HPET clock cycles, then we decide that the event has happened already and return -ETIME. That covers the above #1 and #2 problems which would cause a wait for HPET wraparound (~306 seconds). Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12162/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-01-22MIPS: Loongson-3: Fix SMP_ASK_C0COUNT IPI handlerHuacai Chen
When Core-0 handle SMP_ASK_C0COUNT IPI, we should make other cores to see the result as soon as possible (especially when Store-Fill-Buffer is enabled). Otherwise, C0_Count syncronization makes no sense. BTW, array is more suitable than per-cpu variable for syncronization, and there is a corner case should be avoid: C0_Count of Core-0 can be really 0. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> Patchwork: https://patchwork.linux-mips.org/patch/12160/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-01-22MIPS: Loongson-3: Improve -march option and move it to PlatformHuacai Chen
If GCC >= 4.9 and Binutils >=2.25, we use -march=loongson3a, otherwise we use -march=mips64r2, this can slightly improve performance. Besides, arch/mips/loongson64/Platform is a better location rather than arch/ mips/Makefile. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12161/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-01-22MIPS: Cleanup the unused __arch_local_irq_restore() functionHuacai Chen
In history, __arch_local_irq_restore() is only used by SMTC. However, SMTC support has been removed since 3.16, this patch remove the unused function. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12159/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-01-21NTB: Fix macro parameter conflict with field nameAllen Hubbe
If the parameter given to the macro is replaced throughout the macro as it is evaluated. The intent is that the macro parameter should replace the only the first parameter to container_of(). However, the way the macro was written, it would also inadvertantly replace a structure field name. If a parameter of any other name is given to the macro, it will fail to compile, if the structure does not contain a field of the same name. At worst, it will compile, and hide improper access of an unintended field in the structure. Change the macro parameter name, so it does not conflict with the structure field name. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-01-21NTB: Add support for AMD PCI-Express Non-Transparent BridgeXiangliang Yu
This adds support for AMD's PCI-Express Non-Transparent Bridge (NTB) device on the Zeppelin platform. The driver connnects to the standard NTB sub-system interface, with modification to add hooks for power management in a separate patch. The AMD NTB device has 3 memory windows, 16 doorbell, 16 scratch-pad registers, and supports up to 16 PCIe lanes running a Gen3 speeds. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-01-21[regression] fix braino in fs/dlm/user.cAl Viro
it's "bugger off if we got ERR_PTR", not the other way round... Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-21Merge branch 'uaccess' (batched user access infrastructure)Linus Torvalds
Expose an interface to allow users to mark several accesses together as being user space accesses, allowing batching of the surrounding user space access markers (SMAP on x86, PAN on arm64, domain register switching on arm). This is currently only used for the user string lenth and copying functions, where the SMAP overhead on x86 drowned the actual user accesses (only noticeable on newer microarchitectures that support SMAP in the first place, of course). * user access batching branch: Use the new batched user accesses in generic user string handling Add 'unsafe' user access functions for batched accesses x86: reorganize SMAP handling in user space accesses
2016-01-21NFS: Simplify nfs_request_add_commit_list() argumentsAnna Schumaker
I noticed that all the callers of this function pass cinfo->mds->list as an argument in addition to the cinfo structure itself. Let's get rid of the extra argument, since it doesn't seem to be adding anything. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-21pNFS/flexfiles: Improve merging of errors in LAYOUTRETURNTrond Myklebust
When we hit 22 errors, we start to overflow the memory buffers allocated to the LAYOUTRETURN errors. The issue is that currently, RPC call reply ordering determines how successful we are in merging errors that refer to contiguous READ or WRITE requests. Fix is to use an insertion sort to help detect contiguity. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge third patch-bomb from Andrew Morton: "I'm pretty much done for -rc1 now: - the rest of MM, basically - lib/ updates - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit - cpu_mask simplifications - kexec, rapidio, MAINTAINERS etc, etc. - more dma-mapping cleanups/simplifications from hch" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits) MAINTAINERS: add/fix git URLs for various subsystems mm: memcontrol: add "sock" to cgroup2 memory.stat mm: memcontrol: basic memory statistics in cgroup2 memory controller mm: memcontrol: do not uncharge old page in page cache replacement Documentation: cgroup: add memory.swap.{current,max} description mm: free swap cache aggressively if memcg swap is full mm: vmscan: do not scan anon pages if memcg swap limit is hit swap.h: move memcg related stuff to the end of the file mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online mm: vmscan: pass memcg to get_scan_count() mm: memcontrol: charge swap to cgroup2 mm: memcontrol: clean up alloc, online, offline, free functions mm: memcontrol: flatten struct cg_proto mm: memcontrol: rein in the CONFIG space madness net: drop tcp_memcontrol.c mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM mm: memcontrol: allow to disable kmem accounting for cgroup2 mm: memcontrol: account "kmem" consumers in cgroup2 memory controller mm: memcontrol: move kmem accounting code to CONFIG_MEMCG mm: memcontrol: separate kmem code from legacy tcp accounting code ...
2016-01-21Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This contains several bug fixes and a new mount option 'default_permissions' that allows read-only exported NFS filesystems to be used as lower layer" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: check dentry positiveness in ovl_cleanup_whiteouts() ovl: setattr: check permissions before copy-up ovl: root: copy attr ovl: move super block magic number to magic.h ovl: use a minimal buffer in ovl_copy_xattr ovl: allow zero size xattr ovl: default permissions
2016-01-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This adds SEEK_HOLE and SEEK_DATA support in lseek" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add support for SEEK_HOLE and SEEK_DATA in lseek
2016-01-21Merge tag 'pci-v4.5-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v4.5 merge window: Enumeration: - Simplify config space size computation (Bjorn Helgaas) - Avoid iterating through ROM outside the resource window (Edward O'Callaghan) - Support PCIe devices with short cfg_size (Jason S. McMullan) - Add Netronome vendor and device IDs (Jason S. McMullan) - Limit config space size for Netronome NFP6000 family (Jason S. McMullan) - Add Netronome NFP4000 PF device ID (Simon Horman) - Limit config space size for Netronome NFP4000 (Simon Horman) - Print warnings for all invalid expansion ROM headers (Vladis Dronov) Resource management: - Fix minimum allocation address overwrite (Christoph Biedl) PCI device hotplug: - acpiphp_ibm: Fix null dereferences on null ibm_slot (Colin Ian King) - pciehp: Always protect pciehp_disable_slot() with hotplug mutex (Guenter Roeck) - shpchp: Constify hpc_ops structure (Julia Lawall) - ibmphp: Remove unneeded NULL test (Julia Lawall) Power management: - Make ASPM sysfs link_state_store() consistent with link_state_show() (Andy Lutomirski) Virtualization - Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 (Tim Sander) MSI: - Remove empty pci_msi_init_pci_dev() (Bjorn Helgaas) - Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD (Grygorii Strashko) - Initialize MSI capability for all architectures (Guilherme G. Piccoli) - Relax msi_domain_alloc() to support parentless MSI irqdomains (Liu Jiang) ARM Versatile host bridge driver: - Remove unused pci_sys_data structures (Lorenzo Pieralisi) Broadcom iProc host bridge driver: - Hide CONFIG_PCIE_IPROC (Arnd Bergmann) - Do not use 0x in front of %pap (Dmitry V. Krivenok) - Update iProc PCIe device tree binding (Ray Jui) - Add PAXC interface support (Ray Jui) - Add iProc PCIe MSI device tree binding (Ray Jui) - Add iProc PCIe MSI support (Ray Jui) Freescale i.MX6 host bridge driver: - Use gpio_set_value_cansleep() (Fabio Estevam) - Add support for active-low reset GPIO (Petr Štetiar) HiSilicon host bridge driver: - Add support for HiSilicon Hip06 PCIe host controllers (Gabriele Paoloni) Intel VMD host bridge driver: - Export irq_domain_set_info() for module use (Keith Busch) - x86/PCI: Allow DMA ops specific to a PCI domain (Keith Busch) - Use 32 bit PCI domain numbers (Keith Busch) - Add driver for Intel Volume Management Device (VMD) (Keith Busch) Qualcomm host bridge driver: - Document PCIe devicetree bindings (Stanimir Varbanov) - Add Qualcomm PCIe controller driver (Stanimir Varbanov) - dts: apq8064: add PCIe devicetree node (Stanimir Varbanov) - dts: ifc6410: enable PCIe DT node for this board (Stanimir Varbanov) Renesas R-Car host bridge driver: - Add support for R-Car H3 to pcie-rcar (Harunobu Kurokawa) - Allow DT to override default window settings (Phil Edworthy) - Convert to DT resource parsing API (Phil Edworthy) - Revert "PCI: rcar: Build pcie-rcar.c only on ARM" (Phil Edworthy) - Remove unused pci_sys_data struct from pcie-rcar (Phil Edworthy) - Add runtime PM support to pcie-rcar (Phil Edworthy) - Add Gen2 PHY setup to pcie-rcar (Phil Edworthy) - Add gen2 fallback compatibility string for pci-rcar-gen2 (Simon Horman) - Add gen2 fallback compatibility string for pcie-rcar (Simon Horman) Synopsys DesignWare host bridge driver: - Simplify control flow (Bjorn Helgaas) - Make config accessor override checking symmetric (Bjorn Helgaas) - Ensure ATU is enabled before IO/conf space accesses (Stanimir Varbanov) Miscellaneous: - Add of_pci_get_host_bridge_resources() stub (Arnd Bergmann) - Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask (Bjorn Helgaas) - Fix all whitespace issues (Bogicevic Sasa) - x86/PCI: Simplify pci_bios_{read,write} (Geliang Tang) - Use to_pci_dev() instead of open-coding it (Geliang Tang) - Use kobj_to_dev() instead of open-coding it (Geliang Tang) - Use list_for_each_entry() to simplify code (Geliang Tang) - Fix typos in <linux/msi.h> (Thomas Petazzoni) - x86/PCI: Clarify AMD Fam10h config access restrictions comment (Tomasz Nowicki)" * tag 'pci-v4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits) PCI: Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 PCI: Limit config space size for Netronome NFP4000 PCI: Add Netronome NFP4000 PF device ID x86/PCI: Add driver for Intel Volume Management Device (VMD) PCI/AER: Use 32 bit PCI domain numbers x86/PCI: Allow DMA ops specific to a PCI domain irqdomain: Export irq_domain_set_info() for module use PCI: host: Add of_pci_get_host_bridge_resources() stub genirq/MSI: Relax msi_domain_alloc() to support parentless MSI irqdomains PCI: rcar: Add Gen2 PHY setup to pcie-rcar PCI: rcar: Add runtime PM support to pcie-rcar PCI: designware: Make config accessor override checking symmetric PCI: ibmphp: Remove unneeded NULL test ARM: dts: ifc6410: enable PCIe DT node for this board ARM: dts: apq8064: add PCIe devicetree node PCI: hotplug: Use list_for_each_entry() to simplify code PCI: rcar: Remove unused pci_sys_data struct from pcie-rcar PCI: hisi: Add support for HiSilicon Hip06 PCIe host controllers PCI: Avoid iterating through memory outside the resource window PCI: acpiphp_ibm: Fix null dereferences on null ibm_slot ...
2016-01-21Merge tag 'pwm/for-4.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This set of changes contains a new driver for OMAP (using the dual-mode timers) as well as an assortment of fixes all across the board" * tag 'pwm/for-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: Mark all devices as "might sleep" pwm: omap-dmtimer: Potential NULL dereference on error pwm: add HAS_IOMEM dependency to PWM_FSL_FTM pwm: Add PWM driver for OMAP using dual-mode timers pwm: rcar: Improve accuracy of frequency division setting pwm: lpc32xx: return ERANGE, if requested period is not supported pwm: lpc32xx: fix and simplify duty cycle and period calculations pwm: lpc32xx: make device usable with common clock framework pwm: lpc32xx: correct number of PWM channels from 2 to 1 dt: lpc32xx: pwm: update documentation of LPC32xx PWM device dt: lpc32xx: pwm: correct LPC32xx PWM device node example pwm: fsl-ftm: Fix clock enable/disable when using PM pwm: lpss: Rework the sequence of programming PWM_SW_UPDATE pwm: lpss: Select core part automatically pwm: lpss: Update PWM setting for Broxton pwm: bcm2835: Fix email address specification pwm: bcm2835: Prevent division by zero pwm: bcm2835: Calculate scaler in ->config() pwm: lpss: Remove ->free() callback
2016-01-21Merge tag 'cris-for-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris Pull CRIS updates from Jesper Nilsson: "Just some fixups for section mismatches from Guenter" * tag 'cris-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris: cris: Fix section mismatches in architecture startup code cris: debugport: Fix section mismatches
2016-01-21Merge tag 'for-4.5' of git://git.osdn.jp/gitroot/uclinux-h8/linuxLinus Torvalds
Pull h8300 updates from Yoshinori Sato: - Add KGDB support - zImage fix - various cleanup * tag 'for-4.5' of git://git.osdn.jp/gitroot/uclinux-h8/linux: h8300: System call entry enable interrupt. h8300: show_stack cleanup h8300: Restraint of warning. h8300: Add KGDB support. irqchip: renesas-h8s: Replace ctrl_outw/ctrl_inw with writew/readw h8300: signal stack fix h8300: Add LZO compression h8300: zImage alignment fix clk: h8300: Remove "sh73a0-" part from compatible value h8300: zImage alignment fix
2016-01-21libceph: remove outdated commentIlya Dryomov
MClientMount{,Ack} are long gone. The receipt of bare monmap doesn't actually indicate a mount success as we are yet to authenticate at that point in time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-01-21libceph: kill off ceph_x_ticket_handler::validityIlya Dryomov
With it gone, no need to preserve ceph_timespec in process_one_ticket() either. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-01-21libceph: invalidate AUTH in addition to a service ticketIlya Dryomov
If we fault due to authentication, we invalidate the service ticket we have and request a new one - the idea being that if a service rejected our authorizer, it must have expired, despite mon_client's attempts at periodic renewal. (The other possibility is that our ticket is too new and the service hasn't gotten it yet, in which case invalidating isn't necessary but doesn't hurt.) Invalidating just the service ticket is not enough, though. If we assume a failure on mon_client's part to renew a service ticket, we have to assume the same for the AUTH ticket. If our AUTH ticket is bad, we won't get any service tickets no matter how hard we try, so invalidate AUTH ticket along with the service ticket. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-01-21libceph: fix authorizer invalidation, take 2Ilya Dryomov
Back in 2013, commit 4b8e8b5d78b8 ("libceph: fix authorizer invalidation") tried to fix authorizer invalidation issues by clearing validity field. However, nothing ever consults this field, so it doesn't force us to request any new secrets in any way and therefore we never get out of the exponential backoff mode: [ 129.973812] libceph: osd2 192.168.122.1:6810 connect authorization failure [ 130.706785] libceph: osd2 192.168.122.1:6810 connect authorization failure [ 131.710088] libceph: osd2 192.168.122.1:6810 connect authorization failure [ 133.708321] libceph: osd2 192.168.122.1:6810 connect authorization failure [ 137.706598] libceph: osd2 192.168.122.1:6810 connect authorization failure ... AFAICT this was the case at the time 4b8e8b5d78b8 was merged, too. Using timespec solely as a bool isn't nice, so introduce a new have_key flag, specifically for this purpose. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-01-21libceph: clear messenger auth_retry flag if we faultIlya Dryomov
Commit 20e55c4cc758 ("libceph: clear messenger auth_retry flag when we authenticate") got us only half way there. We clear the flag if the second attempt succeeds, but it also needs to be cleared if that attempt fails, to allow for the exponential backoff to kick in. Otherwise, if ->should_authenticate() thinks our keys are valid, we will busy loop, incrementing auth_retry to no avail: process_connect ffff880079a63830 got BADAUTHORIZER attempt 1 process_connect ffff880079a63830 got BADAUTHORIZER attempt 2 process_connect ffff880079a63830 got BADAUTHORIZER attempt 3 process_connect ffff880079a63830 got BADAUTHORIZER attempt 4 process_connect ffff880079a63830 got BADAUTHORIZER attempt 5 ... Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-01-21libceph: fix ceph_msg_revoke()Ilya Dryomov
There are a number of problems with revoking a "was sending" message: (1) We never make any attempt to revoke data - only kvecs contibute to con->out_skip. However, once the header (envelope) is written to the socket, our peer learns data_len and sets itself to expect at least data_len bytes to follow front or front+middle. If ceph_msg_revoke() is called while the messenger is sending message's data portion, anything we send after that call is counted by the OSD towards the now revoked message's data portion. The effects vary, the most common one is the eventual hang - higher layers get stuck waiting for the reply to the message that was sent out after ceph_msg_revoke() returned and treated by the OSD as a bunch of data bytes. This is what Matt ran into. (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs is wrong. If ceph_msg_revoke() is called before the tag is sent out or while the messenger is sending the header, we will get a connection reset, either due to a bad tag (0 is not a valid tag) or a bad header CRC, which kind of defeats the purpose of revoke. Currently the kernel client refuses to work with header CRCs disabled, but that will likely change in the future, making this even worse. (3) con->out_skip is not reset on connection reset, leading to one or more spurious connection resets if we happen to get a real one between con->out_skip is set in ceph_msg_revoke() and before it's cleared in write_partial_skip(). Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never zero the tag or the header, i.e. send out tag+header regardless of when ceph_msg_revoke() is called. That way the header is always correct, no unnecessary resets are induced and revoke stands ready for disabled CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new "message out temp" and copy the header into it before sending. Cc: stable@vger.kernel.org # 4.0+ Reported-by: Matt Conner <matt.conner@keepertech.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Matt Conner <matt.conner@keepertech.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-01-21libceph: use list_for_each_entry_safeGeliang Tang
Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> [idryomov@gmail.com: nuke call to list_splice_init() as well] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-01-21ceph: use i_size_{read,write} to get/set i_sizeYan, Zheng
Cap message from MDS can update i_size. In that case, we don't hold i_mutex. So it's unsafe to directly access inode->i_size while holding i_mutex. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-01-21ceph: re-send AIO write request when getting -EOLDSNAP errorYan, Zheng
When receiving -EOLDSNAP from OSD, we need to re-send corresponding write request. Due to locking issue, we can send new request inside another OSD request's complete callback. So we use worker to re-send request for AIO write. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-01-21ceph: Asynchronous IO supportYan, Zheng
The basic idea of AIO support is simple, just call kiocb::ki_complete() in OSD request's complete callback. But there are several special cases. when IO span multiple objects, we need to wait until all OSD requests are complete, then call kiocb::ki_complete(). Error handling in this case is tricky too. For simplify, AIO both span multiple objects and extends i_size are not allowed. Another special case is check EOF for reading (other client can write to the file and extend i_size concurrently). For simplify, the direct-IO/AIO code path does do the check, fallback to normal syn read instead. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-01-21ceph: Avoid to propagate the invalid page pointMinfei Huang
The variant pagep will still get the invalid page point, although ceph fails in function ceph_update_writeable_page. To fix this issue, Assigne the page to pagep until there is no failure in function ceph_update_writeable_page. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-01-21ceph: fix double page_unlock() in page_mkwrite()Yan, Zheng
ceph_update_writeable_page() unlocks the page on errors, so page_mkwrite() should not unlock the page again. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-01-21rbd: delete an unnecessary check before rbd_dev_destroy()Markus Elfring
The rbd_dev_destroy() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-01-21libceph: use list_next_entry instead of list_entry_nextGeliang Tang
list_next_entry has been defined in list.h, so I replace list_entry_next with it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-01-21ceph: ceph_frag_contains_value can be booleanYaowei Bai
This patch makes ceph_frag_contains_value return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-01-21ceph: remove unused functions in ceph_frag.hYaowei Bai
These functions were introduced in commit 3d14c5d2b ("ceph: factor out libceph from Ceph file system"). Howover, there's no user of these functions since then, so remove them for simplicity. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-01-21IB/mlx5: Unify CQ create flags checkLeon Romanovsky
The create_cq() can receive creation flags which were used differently by two commits which added create_cq extended command and cross-channel. The merged code caused to not accept any flags at all. This patch unifies the check into one function and one return error code. Fixes: 972ecb821379 ("IB/mlx5: Add create_cq extended command") Fixes: 051f263098a9 ("IB/mlx5: Add driver cross-channel support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Expose Raw Packet QP to user space consumersmajd@mellanox.com
Added Raw Packet QP modify functionality which will enable user space consumers to use it. Since Raw Packet QP is built of SQ and RQ sub-objects, therefore Raw Packet QP state changes are implemented by changing the state of the sub-objects. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21{IB, net}/mlx5: Move the modify QP operation table to mlx5_ibmajd@mellanox.com
When modifying a QP, the desired operation was determined in the mlx5_core using a transition table that takes the current state, the final state, and returns the desired operation. Since this logic will be used for Raw Packet QP, move the operation table to the mlx5_ib. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Support setting Ethernet priority for Raw Packet QPsmajd@mellanox.com
When the user changes the Address Vector(AV) in the modify QP, he provides an SL. This SL should be translated to Ethernet Priority by taking the 3 LSB bits, and modify the QP's TIS according to this Ethernet priority. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Add Raw Packet QP query functionalitymajd@mellanox.com
Since Raw Packet QP is composed of RQ and SQ, the IB QP's state is derived from the sub-objects. Therefore we need to query each one of the sub-objects, and decide on the IB QP's state. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Add create and destroy functionality for Raw Packet QPmajd@mellanox.com
This patch adds support for Raw Packet QP for the mlx5 device. Raw Packet QP, unlike other QP types, has no matching mlx5_core_qp object but rather it is built of RQ/SQ/TIR/TIS/TD mlx5_core object. Since the SQ and RQ work-queue (WQ) buffers are not contiguous like other QPs, we allocate separate buffers in the user-space and pass the address of each one of them separately to the kernel. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP typesmajd@mellanox.com
Extract specific IB QP fields to mlx5_ib_qp_trans structure. The mlx5_core QP object resides in mlx5_ib_qp_base, which all QP types inherit from. When we need to find mlx5_ib_qp using mlx5_core QP (event handling and co), we use a pointer that resides in mlx5_ib_qp_base. In addition, we delete all redundant fields that weren't used anywhere in the code: -doorbell_qpn -sq_max_wqes_per_wr -sq_spare_wqes Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Allocate a Transport Domain for each ucontextmajd@mellanox.com
Transport Domain groups several TIS and TIR object. By grouping these object, it defines wheather local loopback packets that are sent from the TIS objects in the group are received by the TIR objects in the same group. Allocate a Transport Domain(TD) for each user context to be used in the future by Raw Packet QP for Self-Loopback Control. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21net/mlx5_core: Warn on unsupported events of QP/RQ/SQmajd@mellanox.com
When an event arrives on QP/RQ/SQ, check whether it's supported, and print a warning message otherwise. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21net/mlx5_core: Add RQ and SQ event handlingmajd@mellanox.com
RQ/SQ will be used to implement IB verbs QPs, so the IB QP affiliated events are affiliated also with SQs and RQs. Since SQ, RQ and QP resource numbers do not share the same name space, a queue type field was added to the event data to specify the SW object that the event is affiliated with. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21net/mlx5_core: Export transport objectsmajd@mellanox.com
To be used by mlx5_ib in the following patches for implementing RAW PACKET QP. Add mlx5_core_ prefix to alloc and delloc transport_domain since they are exposed now. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Expose CQE version to user-spaceHaggai Abramovsky
Per user context, work with CQE version that both the user-space and the kernel support. Report this CQE version via the response of the alloc_ucontext command. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Add CQE version 1 support to user QPs and SRQsHaggai Abramovsky
Enforce working with CQE version 1 when the user supports CQE version 1 and asked to work this way. If the user still works with CQE version 0, then use the default CQE version to tell the Firmware that the user still works in the older mode. After this patch, the kernel still reports CQE version 0. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontextHaggai Abramovsky
The wrong buffer size was passed to ib_is_udata_cleared. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-21IB/sa: Fix netlink local service GFP crashKaike Wan
The rdma netlink local service registers a handler to handle RESOLVE response and another handler to handle SET_TIMEOUT request. The first thing these handlers do is to call netlink_capable() to check the access right of the received skb to make sure that the sender has root access. Under normal conditions, such responses and requests will be directly forwarded to the handlers without going through the netlink_dump pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c). However, a user application could send a RESOLVE request (not response) to the local service, which will fall into the netlink_dump pathway, where a new skb will be created without initializing the control block. This new skb will be eventually forwarded to the local service RESOLVE response handler. Unfortunately, netlink_capable() will cause general protection fault if the skb's control block is not initialized. This patch will address the problem by checking the skb first. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Doug Ledford <dledford@redhat.com>