summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-08-14gpio: Cleanup kerneldocThierry Reding
Some kerneldoc has become stale or wasn't quite correct from the outset. Fix up the most serious issues to silence warnings when building the documentation. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-08-14pinctrl: move const qualifier before structMasahiro Yamada
Update subsystem wide for consistency. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-08-13Merge tag 'tty-4.13-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are two tty serial driver fixes for 4.13-rc5. One is a revert of a -rc1 patch that turned out to not be a good idea, and the other is a fix for the pl011 serial driver. Both have been in linux-next with no reported issues" * tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "serial: Delete dead code for CIR serial ports" tty: pl011: fix initialization order of QDF2400 E44
2017-08-13Merge tag 'staging-4.13-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/iio fixes from Greg KH: "Here are some Staging and IIO driver fixes for 4.13-rc5. Nothing major, just a number of small fixes for reported issues. All of these have been in linux-next for a while now with no reported issues. Full details are in the shortlog" * tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING iio: aspeed-adc: wait for initial sequence. iio: accel: bmc150: Always restore device to normal mode after suspend-resume staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read iio: adc: axp288: Fix the GPADC pin reading often wrongly returning 0 iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits iio: accel: st_accel: add SPI-3wire support iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications" iio: adc: sun4i-gpadc-iio: fix unbalanced irq enable/disable iio: pressure: st_pressure_core: disable multiread by default for LPS22HB iio: light: tsl2563: use correct event code
2017-08-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds
Pull SCSI target fixes from Nicholas Bellinger: "The highlights include: - Fix iscsi-target payload memory leak during ISCSI_FLAG_TEXT_CONTINUE (Varun Prakash) - Fix tcm_qla2xxx incorrect use of tcm_qla2xxx_free_cmd during ABORT (Pascal de Bruijn + Himanshu Madhani + nab) - Fix iscsi-target long-standing issue with parallel delete of a single network portal across multiple target instances (Gary Guo + nab) - Fix target dynamic se_node GPF during uncached shutdown regression (Justin Maggard + nab)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix node_acl demo-mode + uncached dynamic shutdown regression iscsi-target: Fix iscsi_np reset hung task during parallel delete qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2) cxgbit: fix sg_nents calculation iscsi-target: fix invalid flags in text response iscsi-target: fix memory leak in iscsit_setup_text_cmd() cxgbit: add missing __kfree_skb() tcmu: free old string on reconfig tcmu: Fix possible to/from address overflow when doing the memcpy
2017-08-12iio: trigger: stm32-timer: add output compare triggersFabrice Gasnier
Add output compare trigger sources available on some instances. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-08-12iio: trigger: stm32-timer: add support for STM32H7Fabrice Gasnier
Add support for STM32H7 timer triggers: - Add new valids_table - Introduce compatible, with configuration data - Extend up to 15 timers, available on STM32H7 Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-08-11udp: harden copy_linear_skb()Eric Dumazet
syzkaller got crashes with CONFIG_HARDENED_USERCOPY=y configs. Issue here is that recvfrom() can be used with user buffer of Z bytes, and SO_PEEK_OFF of X bytes, from a skb with Y bytes, and following condition : Z < X < Y kernel BUG at mm/usercopy.c:72! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 2917 Comm: syzkaller842281 Not tainted 4.13.0-rc3+ #16 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d2fa40c0 task.stack: ffff8801d1fe8000 RIP: 0010:report_usercopy mm/usercopy.c:64 [inline] RIP: 0010:__check_object_size+0x3ad/0x500 mm/usercopy.c:264 RSP: 0018:ffff8801d1fef8a8 EFLAGS: 00010286 RAX: 0000000000000078 RBX: ffffffff847102c0 RCX: 0000000000000000 RDX: 0000000000000078 RSI: 1ffff1003a3fded5 RDI: ffffed003a3fdf09 RBP: ffff8801d1fef998 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d1ea480e R13: fffffffffffffffa R14: ffffffff84710280 R15: dffffc0000000000 FS: 0000000001360880(0000) GS:ffff8801dc000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000202ecfe4 CR3: 00000001d1ff8000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: check_object_size include/linux/thread_info.h:108 [inline] check_copy_size include/linux/thread_info.h:139 [inline] copy_to_iter include/linux/uio.h:105 [inline] copy_linear_skb include/net/udp.h:371 [inline] udpv6_recvmsg+0x1040/0x1af0 net/ipv6/udp.c:395 inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793 sock_recvmsg_nosec net/socket.c:792 [inline] sock_recvmsg+0xc9/0x110 net/socket.c:799 SYSC_recvfrom+0x2d6/0x570 net/socket.c:1788 SyS_recvfrom+0x40/0x50 net/socket.c:1760 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11net: fix compilation when busy poll is not enabledDaniel Borkmann
MIN_NAPI_ID is used in various places outside of CONFIG_NET_RX_BUSY_POLL wrapping, so when it's not set we run into build errors such as: net/core/dev.c: In function 'dev_get_by_napi_id': net/core/dev.c:886:16: error: ‘MIN_NAPI_ID’ undeclared (first use in this function) if (napi_id < MIN_NAPI_ID) ^~~~~~~~~~~ Thus, have MIN_NAPI_ID always defined to fix these errors. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11bonding: require speed/duplex only for 802.3ad, alb and tlbAndreas Born
The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") puts the link state to down if bond_update_speed_duplex() cannot retrieve speed and duplex settings. Assumably the patch was written with 802.3ad mode in mind which relies on link speed/duplex settings. For other modes like active-backup these settings are not required. Thus, only for these other modes, this patch reintroduces support for slaves that do not support reporting speed or duplex such as wireless devices. This fixes the regression reported in bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547). Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Andreas Born <futur.andy@googlemail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A set of fixes that should go into this series. This contains: - Fix from Bart for blk-mq requeue queue running, preventing a continued loop of run/restart. - Fix for a bio/blk-integrity issue, in two parts. One from Christoph, fixing where verification happens, and one from Milan, for a NULL profile. - NVMe pull request, most of the changes being for nvme-fc, but also a few trivial core/pci fixes" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: fix directive command numd calculation nvme: fix nvme reset command timeout handling nvme-pci: fix CMB sysfs file removal in reset path lpfc: support nvmet_fc defer_rcv callback nvmet_fc: add defer_req callback for deferment of cmd buffer return nvme: strip trailing 0-bytes in wwid_show block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time bio-integrity: only verify integrity on the lowest stacked driver bio-integrity: Fix regression if profile verify_fn is NULL
2017-08-11Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linusJens Axboe
Pull NVMe fixes from Christoph: "A few more small fixes - the fc/lpfc update is the biggest by far."
2017-08-11Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar
Conflicts: include/linux/mm_types.h mm/huge_memory.c I removed the smp_mb__before_spinlock() like the following commit does: 8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()") and fixed up the affected commits. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-11drm: Document device unplug infrastructureDaniel Vetter
While at it, also ocd and give them a consistent drm_dev_ prefix, like the other device instance functionality. Plus move the functions into the right places. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170802115604.12734-3-daniel.vetter@ffwll.ch
2017-08-11drm: Extract drm_device.hDaniel Vetter
I need this to untangle an include loop in the next patch. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170802115604.12734-2-daniel.vetter@ffwll.ch
2017-08-10Merge tag 'drm-fixes-for-v4.13-rc5' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Nothing too earth shattering here, it just seems like lots of little things all over the place. msm has probably the larger amount of changes, but they all seem fine, otherwise, some rockchip, i915, etnaviv and exynos fixes, along with one nouveau regression fix for some older GPUs" * tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux: (35 commits) drm/nouveau/disp/nv04: avoid creation of output paths drm: make DRM_STM default n drm/exynos: forbid creating framebuffers from too small GEM buffers drm/etnaviv: Fix off-by-one error in reloc checking drm/i915: fix backlight invert for non-zero minimum brightness drm/i915/shrinker: Wrap need_resched() inside preempt-disable drm/i915/perf: fix flex eu registers programming drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8 drm/i915/gvt: Initialize MMIO Block with HW state drm/rockchip: vop: report error when check resource error drm/rockchip: vop: round_up pitches to word align drm/rockchip: vop: fix NV12 video display error drm/rockchip: vop: fix iommu page fault when resume drm/i915/gvt: clean workload queue if error happened drm/i915/gvt: change resetting to resetting_eng drm/msm: gpu: don't abuse dma_alloc for non-DMA allocations drm/msm: gpu: call qcom_mdt interfaces only for ARCH_QCOM drm/msm/adreno: Prevent unclocked access when retrieving timestamps drm/msm: Remove __user from __u64 data types ...
2017-08-11PM / s2idle: Rename platform operations structureRafael J. Wysocki
Rename struct platform_freeze_ops to platform_s2idle_ops to make it clear that the callbacks in it are used during suspend-to-idle suspend/resume transitions and rename the related functions, variables and so on accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-11PM / s2idle: Rename ->enter_freeze to ->enter_s2idleRafael J. Wysocki
Rename the ->enter_freeze cpuidle driver callback to ->enter_s2idle to make it clear that it is used for entering suspend-to-idle and rename the related functions, variables and so on accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-11PM / s2idle: Rename freeze_state enum and related itemsRafael J. Wysocki
Rename the freeze_state enum representing the suspend-to-idle state machine states to s2idle_states and rename the related variables and functions accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-11PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLERafael J. Wysocki
To make it clear that the symbol in question refers to suspend-to-idle, rename it from PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage zram: rework copy of compressor name in comp_algorithm_store() rmap: do not call mmu_notifier_invalidate_page() under ptl mm: fix list corruptions on shmem shrinklist mm/balloon_compaction.c: don't zero ballooned pages MAINTAINERS: copy virtio on balloon_compaction.c mm: fix KSM data corruption mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem mm: make tlb_flush_pending global mm: refactor TLB gathering API Revert "mm: numa: defer TLB flush for THP migration as long as possible" mm: migrate: fix barriers around tlb_flush_pending mm: migrate: prevent racy access to tlb_flush_pending fault-inject: fix wrong should_fail() decision in task context test_kmod: fix small memory leak on filesystem tests test_kmod: fix the lock in register_test_dev_kmod() test_kmod: fix bug which allows negative values on two config options test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY" userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case mm: ratelimit PFNs busy info message ...
2017-08-10mm: fix MADV_[FREE|DONTNEED] TLB flush miss problemMinchan Kim
Nadav reported parallel MADV_DONTNEED on same range has a stale TLB problem and Mel fixed it[1] and found same problem on MADV_FREE[2]. Quote from Mel Gorman: "The race in question is CPU 0 running madv_free and updating some PTEs while CPU 1 is also running madv_free and looking at the same PTEs. CPU 1 may have writable TLB entries for a page but fail the pte_dirty check (because CPU 0 has updated it already) and potentially fail to flush. Hence, when madv_free on CPU 1 returns, there are still potentially writable TLB entries and the underlying PTE is still present so that a subsequent write does not necessarily propagate the dirty bit to the underlying PTE any more. Reclaim at some unknown time at the future may then see that the PTE is still clean and discard the page even though a write has happened in the meantime. I think this is possible but I could have missed some protection in madv_free that prevents it happening." This patch aims for solving both problems all at once and is ready for other problem with KSM, MADV_FREE and soft-dirty story[3]. TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can catch there are parallel threads going on. In that case, forcefully, flush TLB to prevent for user to access memory via stale TLB entry although it fail to gather page table entry. I confirmed this patch works with [4] test program Nadav gave so this patch supersedes "mm: Always flush VMA ranges affected by zap_page_range v2" in current mmotm. NOTE: This patch modifies arch-specific TLB gathering interface(x86, ia64, s390, sh, um). It seems most of architecture are straightforward but s390 need to be careful because tlb_flush_mmu works only if mm->context.flush_mm is set to non-zero which happens only a pte entry really is cleared by ptep_get_and_clear and friends. However, this problem never changes the pte entries but need to flush to prevent memory access from stale tlb. [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com [4] https://patchwork.kernel.org/patch/9861621/ [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu] Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: Nadav Amit <namit@vmware.com> Reported-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10mm: make tlb_flush_pending globalMinchan Kim
Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING| COMPACTION] but upcoming patches to solve subtle TLB flush batching problem will use it regardless of compaction/NUMA so this patch doesn't remove the dependency. [akpm@linux-foundation.org: remove more ifdefs from world's ugliest printk statement] Link: http://lkml.kernel.org/r/20170802000818.4760-6-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10mm: refactor TLB gathering APIMinchan Kim
This patch is a preparatory patch for solving race problems caused by TLB batch. For that, we will increase/decrease TLB flush pending count of mm_struct whenever tlb_[gather|finish]_mmu is called. Before making it simple, this patch separates architecture specific part and rename it to arch_tlb_[gather|finish]_mmu and generic part just calls it. It shouldn't change any behavior. Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10mm: migrate: fix barriers around tlb_flush_pendingNadav Amit
Reading tlb_flush_pending while the page-table lock is taken does not require a barrier, since the lock/unlock already acts as a barrier. Removing the barrier in mm_tlb_flush_pending() to address this issue. However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending() while the page-table lock is already released, which may present a problem on architectures with weak memory model (PPC). To deal with this case, a new parameter is added to mm_tlb_flush_pending() to indicate if it is read without the page-table lock taken, and calling smp_mb__after_unlock_lock() in this case. Link: http://lkml.kernel.org/r/20170802000818.4760-3-namit@vmware.com Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10mm: migrate: prevent racy access to tlb_flush_pendingNadav Amit
Patch series "fixes of TLB batching races", v6. It turns out that Linux TLB batching mechanism suffers from various races. Races that are caused due to batching during reclamation were recently handled by Mel and this patch-set deals with others. The more fundamental issue is that concurrent updates of the page-tables allow for TLB flushes to be batched on one core, while another core changes the page-tables. This other core may assume a PTE change does not require a flush based on the updated PTE value, while it is unaware that TLB flushes are still pending. This behavior affects KSM (which may result in memory corruption) and MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior). A proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED. Memory corruption in KSM is harder to produce in practice, but was observed by hacking the kernel and adding a delay before flushing and replacing the KSM page. Finally, there is also one memory barrier missing, which may affect architectures with weak memory model. This patch (of 7): Setting and clearing mm->tlb_flush_pending can be performed by multiple threads, since mmap_sem may only be acquired for read in task_numa_work(). If this happens, tlb_flush_pending might be cleared while one of the threads still changes PTEs and batches TLB flushes. This can lead to the same race between migration and change_protection_range() that led to the introduction of tlb_flush_pending. The result of this race was data corruption, which means that this patch also addresses a theoretically possible data corruption. An actual data corruption was not observed, yet the race was was confirmed by adding assertion to check tlb_flush_pending is not set by two threads, adding artificial latency in change_protection_range() and using sysctl to reduce kernel.numa_balancing_scan_delay_ms. Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range") Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10Merge tag 'pci-v4.13-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Work around Renesas uPD72020x 32-bit DMA issue" * tag 'pci-v4.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue PCI: Add pci_reset_function_locked()
2017-08-10Merge branch 'rdma-netlink' into k.o/merge-testDoug Ledford
Conflicts: include/rdma/ib_verbs.h - Modified a function signature adjacent to a newly added function signature from a previous merge Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-testDoug Ledford
Conflicts: drivers/infiniband/hw/mlx5/main.c - Both add new code include/rdma/ib_verbs.h - Both add new code Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10drm/i915: Implement .get_format_info() hook for CCSVille Syrjälä
SKL+ display engine can scan out certain kinds of compressed surfaces produced by the render engine. This involved telling the display engine the location of the color control surfae (CCS) which describes which parts of the main surface are compressed and which are not. The location of CCS is provided by userspace as just another plane with its own offset. By providing our own format information for the CCS formats, we should be able to make framebuffer_check() do the right thing for the CCS surface as well. Note that we'll return the same format info for both Y and Yf tiled format as that's what happens with the non-CCS Y vs. Yf as well. If desired, we could potentially return a unique pointer for each pixel_format+tiling+ccs combination, in which case we immediately be able to tell if any of that stuff changed by just comparing the pointers. But that does sound a bit wasteful space wise. v2: Drop the 'dev' argument from the hook v3: Include the description of the CCS surface layout v4: Pretend CCS tiles are regular 128 byte wide Y tiles (Jason) v5: Re-drop 'dev', fix commit message, add missing drm_fourcc.h description of CCS layout. (daniels) Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3) Reviewed-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Ville Syrjä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Stone <daniels@collabora.com>
2017-08-10Merge airlied/drm-next into drm-intel-next-queuedDaniel Vetter
Ben Widawsky/Daniel Stone need the extended modifier support from drm-misc to be able to merge CCS support for i915.ko Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-08-10irq: Make the irqentry text section unconditionalMasami Hiramatsu
Generate irqentry and softirqentry text sections without any Kconfig dependencies. This will add extra sections, but there should be no performace impact. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Chris Zankel <chris@zankel.net> Cc: David S . Miller <davem@davemloft.net> Cc: Francis Deslauriers <francis.deslauriers@efficios.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: linux-arch@vger.kernel.org Cc: linux-cris-kernel@axis.com Cc: mathieu.desnoyers@efficios.com Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10sched/fair: Fix wake_affine() for !NUMA_BALANCINGPeter Zijlstra
In commit: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()") Rik changed wake_affine to consider NUMA information when balancing between LLC domains. There are a number of problems here which this patch tries to address: - LLC < NODE; in this case we'd use the wrong information to balance - !NUMA_BALANCING: in this case, the new code doesn't do any balancing at all - re-computes the NUMA data for every wakeup, this can mean iterating up to 64 CPUs for every wakeup. - default affine wakeups inside a cache We address these by saving the load/capacity values for each sched_domain during regular load-balance and using these values in wake_affine_llc(). The obvious down-side to using cached values is that they can be too old and poorly reflect reality. But this way we can use LLC wide information and thus not rely on assuming LLC matches NODE. We also don't rely on NUMA_BALANCING nor do we have to aggegate two nodes (or even cache domains) worth of CPUs for each wakeup. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()") [ Minor readability improvements. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10Merge branch 'x86/urgent' into x86/asm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Apply crossrelease to completionsByungchul Park
Although wait_for_completion() and its family can cause deadlock, the lock correctness validator could not be applied to them until now, because things like complete() are usually called in a different context from the waiting context, which violates lockdep's assumption. Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep detector to those completion operations. Applied it. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-10-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Handle non(or multi)-acquisition of a crosslockByungchul Park
No acquisition might be in progress on commit of a crosslock. Completion operations enabling crossrelease are the case like: CONTEXT X CONTEXT Y --------- --------- trigger completion context complete AX commit AX wait_for_complete AX acquire AX wait where AX is a crosslock. When no acquisition is in progress, we should not perform commit because the lock does not exist, which might cause incorrect memory access. So we have to track the number of acquisitions of a crosslock to handle it. Moreover, in case that more than one acquisition of a crosslock are overlapped like: CONTEXT W CONTEXT X CONTEXT Y CONTEXT Z --------- --------- --------- --------- acquire AX (gen_id: 1) acquire A acquire AX (gen_id: 10) acquire B commit AX acquire C commit AX where A, B and C are typical locks and AX is a crosslock. Current crossrelease code performs commits in Y and Z with gen_id = 10. However, we can use gen_id = 1 to do it, since not only 'acquire AX in X' but 'acquire AX in W' also depends on each acquisition in Y and Z until their commits. So make it use gen_id = 1 instead of 10 on their commits, which adds an additional dependency 'AX -> A' in the example above. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-8-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Detect and handle hist_lock ring buffer overwriteByungchul Park
The ring buffer can be overwritten by hardirq/softirq/work contexts. That cases must be considered on rollback or commit. For example, |<------ hist_lock ring buffer size ----->| ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii wrapped > iiiiiiiiiiiiiiiiiiiiiii.................... where 'p' represents an acquisition in process context, 'i' represents an acquisition in irq context. On irq exit, crossrelease tries to rollback idx to original position, but it should not because the entry already has been invalid by overwriting 'i'. Avoid rollback or commit for entries overwritten. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-7-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Implement the 'crossrelease' featureByungchul Park
Lockdep is a runtime locking correctness validator that detects and reports a deadlock or its possibility by checking dependencies between locks. It's useful since it does not report just an actual deadlock but also the possibility of a deadlock that has not actually happened yet. That enables problems to be fixed before they affect real systems. However, this facility is only applicable to typical locks, such as spinlocks and mutexes, which are normally released within the context in which they were acquired. However, synchronization primitives like page locks or completions, which are allowed to be released in any context, also create dependencies and can cause a deadlock. So lockdep should track these locks to do a better job. The 'crossrelease' implementation makes these primitives also be tracked. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Rework FS_RECLAIM annotationPeter Zijlstra
A while ago someone, and I cannot find the email just now, asked if we could not implement the RECLAIM_FS inversion stuff with a 'fake' lock like we use for other things like workqueues etc. I think this should be possible which allows reducing the 'irq' states and will reduce the amount of __bfs() lookups we do. Removing the 1 IRQ state results in 4 less __bfs() walks per dependency, improving lockdep performance. And by moving this annotation out of the lockdep code it becomes easier for the mm people to extend. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: iamjoonsoo.kim@lge.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking: Remove smp_mb__before_spinlock()Peter Zijlstra
Now that there are no users of smp_mb__before_spinlock() left, remove it entirely. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking: Introduce smp_mb__after_spinlock()Peter Zijlstra
Since its inception, our understanding of ACQUIRE, esp. as applied to spinlocks, has changed somewhat. Also, I wonder if, with a simple change, we cannot make it provide more. The problem with the comment is that the STORE done by spin_lock isn't itself ordered by the ACQUIRE, and therefore a later LOAD can pass over it and cross with any prior STORE, rendering the default WMB insufficient (pointed out by Alan). Now, this is only really a problem on PowerPC and ARM64, both of which already defined smp_mb__before_spinlock() as a smp_mb(). At the same time, we can get a much stronger construct if we place that same barrier _inside_ the spin_lock(). In that case we upgrade the RCpc spinlock to an RCsc. That would make all schedule() calls fully transitive against one another. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10mm, locking: Rework {set,clear,mm}_tlb_flush_pending()Peter Zijlstra
Commit: af2c1401e6f9 ("mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates") added smp_mb__before_spinlock() to set_tlb_flush_pending(). I think we can solve the same problem without this barrier. If instead we mandate that mm_tlb_flush_pending() is used while holding the PTL we're guaranteed to observe prior set_tlb_flush_pending() instances. For this to work we need to rework migrate_misplaced_transhuge_page() a little and move the test up into do_huge_pmd_numa_page(). NOTE: this relies on flush_tlb_range() to guarantee: (1) it ensures that prior page table updates are visible to the page table walker and (2) it ensures that subsequent memory accesses are only made visible after the invalidation has completed This is required for architectures that implement TRANSPARENT_HUGEPAGE (arc, arm, arm64, mips, powerpc, s390, sparc, x86) or otherwise use mm_tlb_flush_pending() in their page-table operations (arm, arm64, x86). This appears true for: - arm (DSB ISB before and after), - arm64 (DSB ISHST before, and DSB ISH after), - powerpc (PTESYNC before and after), - s390 and x86 TLB invalidate are serializing instructions But I failed to understand the situation for: - arc, mips, sparc Now SPARC64 is a wee bit special in that flush_tlb_range() is a no-op and it flushes the TLBs using arch_{enter,leave}_lazy_mmu_mode() inside the PTL. It still needs to guarantee the PTL unlock happens _after_ the invalidate completes. Vineet, Ralf and Dave could you guys please have a look? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10jump_label: Provide hotplug context variantsMarc Zyngier
As using the normal static key API under the hotplug lock is pretty much impossible, let's provide a variant of some of them that require the hotplug lock to have already been taken. These function are only meant to be used in CPU hotplug callbacks. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20170801080257.5056-4-marc.zyngier@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10cpuset: Make nr_cpusets privatePaolo Bonzini
Any use of key->enabled (that is static_key_enabled and static_key_count) outside jump_label_lock should handle its own serialization. In the case of cpusets_enabled_key, the key is always incremented/decremented under cpuset_mutex, and hence the same rule applies to nr_cpusets. The rule *is* respected currently, but the mutex is static so nr_cpusets should be static too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Zefan Li <lizefan@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1501601046-35683-4-git-send-email-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10jump_label: Fix concurrent static_key_enable/disable()Paolo Bonzini
static_key_enable/disable are trying to cap the static key count to 0/1. However, their use of key->enabled is outside jump_label_lock so they do not really ensure that. Rewrite them to do a quick check for an already enabled (respectively, already disabled), and then recheck under the jump label lock. Unlike static_key_slow_inc/dec, a failed check under the jump label lock does not modify key->enabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1501601046-35683-2-git-send-email-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/rwsem-xadd: Add killable versions of rwsem_down_read_failed()Kirill Tkhai
Rename rwsem_down_read_failed() in __rwsem_down_read_failed_common() and teach it to abort waiting in case of pending signals and killable state argument passed. Note, that we shouldn't wake anybody up in EINTR path, as: We check for (waiter.task) under spinlock before we go to out_nolock path. Current task wasn't able to be woken up, so there are a writer, owning the sem, or a writer, which is the first waiter. In the both cases we shouldn't wake anybody. If there is a writer, owning the sem, and we were the only waiter, remove RWSEM_WAITING_BIAS, as there are no waiters anymore. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: avagin@virtuozzo.com Cc: davem@davemloft.net Cc: fenghua.yu@intel.com Cc: gorcunov@virtuozzo.com Cc: heiko.carstens@de.ibm.com Cc: hpa@zytor.com Cc: ink@jurassic.park.msu.ru Cc: mattst88@gmail.com Cc: rth@twiddle.net Cc: schwidefsky@de.ibm.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/149789534632.9059.2901382369609922565.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/rwsem-spinlock: Add killable versions of __down_read()Kirill Tkhai
Rename __down_read() in __down_read_common() and teach it to abort waiting in case of pending signals and killable state argument passed. Note, that we shouldn't wake anybody up in EINTR path, as: We check for signal_pending_state() after (!waiter.task) test and under spinlock. So, current task wasn't able to be woken up. It may be in two cases: a writer is owner of the sem, or a writer is a first waiter of the sem. If a writer is owner of the sem, no one else may work with it in parallel. It will wake somebody, when it call up_write() or downgrade_write(). If a writer is the first waiter, it will be woken up, when the last active reader releases the sem, and sem->count became 0. Also note, that set_current_state() may be moved down to schedule() (after !waiter.task check), as all assignments in this type of semaphore (including wake_up), occur under spinlock, so we can't miss anything. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: avagin@virtuozzo.com Cc: davem@davemloft.net Cc: fenghua.yu@intel.com Cc: gorcunov@virtuozzo.com Cc: heiko.carstens@de.ibm.com Cc: hpa@zytor.com Cc: ink@jurassic.park.msu.ru Cc: mattst88@gmail.com Cc: rth@twiddle.net Cc: schwidefsky@de.ibm.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/149789533283.9059.9829416940494747182.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/atomic: Fix atomic_set_release() for 'funny' architecturesPeter Zijlstra
Those architectures that have a special atomic_set implementation also need a special atomic_set_release(), because for the very same reason WRITE_ONCE() is broken for them, smp_store_release() is too. The vast majority is architectures that have spinlock hash based atomic implementation except hexagon which seems to have a hardware 'feature'. The spinlock based atomics should be SC, that is, none of them appear to place extra barriers in atomic_cmpxchg() or any of the other SC atomic primitives and therefore seem to rely on their spinlock implementation being SC (I did not fully validate all that). Therefore, the normal atomic_set() is SC and can be used at atomic_set_release(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: davem@davemloft.net Cc: james.hogan@imgtec.com Cc: jejb@parisc-linux.org Cc: rkuo@codeaurora.org Cc: vgupta@synopsys.com Link: http://lkml.kernel.org/r/20170609110506.yod47flaav3wgoj5@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10RDMA/netlink: Export node_typeLeon Romanovsky
Add ability to get node_type for RDAM netlink users. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10RDMA/netlink: Provide port state and physical link stateLeon Romanovsky
Add port state and physical link state to the users of RDMA netlink. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>