summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-17s390/cio: Generalize the TIC handlerEric Farman
Refactor ccwchain_handle_tic() into a routine that handles a channel program address (which itself is a CCW pointer), rather than a CCW pointer that is only a TIC CCW. This will make it easier to reuse this code for other CCW commands. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20190606202831.44135-4-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-06-17s390/cio: Refactor the routine that handles TIC CCWsEric Farman
Extract the "does the target of this TIC already exist?" check from ccwchain_handle_tic(), so that it's easier to refactor that function into one that cp_init() is able to use. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20190606202831.44135-3-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-06-17s390/cio: Squash cp_free() and cp_unpin_free()Eric Farman
The routine cp_free() does nothing but call cp_unpin_free(), and while most places call cp_free() there is one caller of cp_unpin_free() used when the cp is guaranteed to have not been marked initialized. This seems like a dubious way to make a distinction, so let's combine these routines and make cp_free() do all the work. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20190606202831.44135-2-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-06-17mmc: mediatek: fix SDIO IRQ detection issuejjian zhou
If cmd19 timeout or response crcerr occurs during execute_tuning(), it need invoke msdc_reset_hw(). Otherwise SDIO IRQ can't be detected. Signed-off-by: jjian zhou <jjian.zhou@mediatek.com> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> Signed-off-by: Yong Mao <yong.mao@mediatek.com> Fixes: 5215b2e952f3 ("mmc: mediatek: Add MMC_CAP_SDIO_IRQ support") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-06-17mmc: mediatek: fix SDIO IRQ interrupt handle flowjjian zhou
SDIO IRQ is triggered by low level. It need disable SDIO IRQ detected function. Otherwise the interrupt register can't be cleared. It will process the interrupt more. Signed-off-by: Jjian Zhou <jjian.zhou@mediatek.com> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> Signed-off-by: Yong Mao <yong.mao@mediatek.com> Fixes: 5215b2e952f3 ("mmc: mediatek: Add MMC_CAP_SDIO_IRQ support") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-06-17mmc: core: complete HS400 before checking statusWolfram Sang
We don't have a reproducible error case, yet our BSP team suggested that the mmc_switch_status() command in mmc_select_hs400() should come after the callback into the driver completing HS400 setup. It makes sense to me because we want the status of a fully setup HS400, so it will increase the reliability of the mmc_switch_status() command. Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Fixes: ba6c7ac3a2f4 ("mmc: core: more fine-grained hooks for HS400 tuning") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-06-17arm64/mm: Correct the cache line size warning with non coherent deviceMasayoshi Mizuma
If the cache line size is greater than ARCH_DMA_MINALIGN (128), the warning shows and it's tainted as TAINT_CPU_OUT_OF_SPEC. However, it's not good because as discussed in the thread [1], the cpu cache line size will be problem only on non-coherent devices. Since the coherent flag is already introduced to struct device, show the warning only if the device is non-coherent device and ARCH_DMA_MINALIGN is smaller than the cpu cache size. [1] https://lore.kernel.org/linux-arm-kernel/20180514145703.celnlobzn3uh5tc2@localhost/ Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Tested-by: Zhang Lei <zhang.lei@jp.fujitsu.com> [catalin.marinas@arm.com: removed 'if' block for WARN_TAINT] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-17riscv: mm: synchronize MMU after pte changeShihPo Hung
Because RISC-V compliant implementations can cache invalid entries in TLB, an SFENCE.VMA is necessary after changes to the page table. This patch adds an SFENCE.vma for the vmalloc_fault path. Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com> [paul.walmsley@sifive.com: reversed tab->whitespace conversion, wrapped comment lines] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org
2019-06-17x86/percpu: Optimize raw_cpu_xchg()Peter Zijlstra
Since raw_cpu_xchg() doesn't need to be IRQ-safe, like this_cpu_xchg(), we can use a simple load-store instead of the cmpxchg loop. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/percpu, sched/fair: Avoid local_clock()Peter Zijlstra
Nadav reported that code-gen changed because of the this_cpu_*() constraints, avoid this for select_idle_cpu() because that runs with preemption (and IRQs) disabled anyway. Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/percpu, x86/irq: Relax {set,get}_irq_regs()Peter Zijlstra
Nadav reported that since the this_cpu_*() ops got asm-volatile constraints on, code generation suffered for do_IRQ(), but since this is all with IRQs disabled we can use __this_cpu_*(). smp_x86_platform_ipi 234 222 -12,+0 smp_kvm_posted_intr_ipi 74 66 -8,+0 smp_kvm_posted_intr_wakeup_ipi 86 78 -8,+0 smp_apic_timer_interrupt 292 284 -8,+0 smp_kvm_posted_intr_nested_ipi 74 66 -8,+0 do_IRQ 195 187 -8,+0 Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/percpu: Relax smp_processor_id()Peter Zijlstra
Nadav reported that since this_cpu_read() became asm-volatile, many smp_processor_id() users generated worse code due to the extra constraints. However since smp_processor_id() is reading a stable value, we can use __this_cpu_read(). While this does reduce text size somewhat, this mostly results in code movement to .text.unlikely as a result of more/larger .cold. subfunctions. Less text on the hotpath is good for I$. $ ./compare.sh defconfig-build1 defconfig-build2 vmlinux.o setup_APIC_ibs 90 98 -12,+20 force_ibs_eilvt_setup 400 413 -57,+70 pci_serr_error 109 104 -54,+49 pci_serr_error 109 104 -54,+49 unknown_nmi_error 125 120 -76,+71 unknown_nmi_error 125 120 -76,+71 io_check_error 125 132 -97,+104 intel_thermal_interrupt 730 822 +92,+0 intel_init_thermal 951 945 -6,+0 generic_get_mtrr 301 294 -7,+0 generic_get_mtrr 301 294 -7,+0 generic_set_all 749 754 -44,+49 get_fixed_ranges 352 360 -41,+49 x86_acpi_suspend_lowlevel 369 363 -6,+0 check_tsc_sync_source 412 412 -71,+71 irq_migrate_all_off_this_cpu 662 674 -14,+26 clocksource_watchdog 748 748 -113,+113 __perf_event_account_interrupt 204 197 -7,+0 attempt_merge 1748 1741 -7,+0 intel_guc_send_ct 1424 1409 -15,+0 __fini_doorbell 235 231 -4,+0 bdw_set_cdclk 928 923 -5,+0 gen11_dsi_disable 1571 1556 -15,+0 gmbus_wait 493 488 -5,+0 md_make_request 376 369 -7,+0 __split_and_process_bio 543 536 -7,+0 delay_tsc 96 89 -7,+0 hsw_disable_pc8 696 691 -5,+0 tsc_verify_tsc_adjust 215 228 -22,+35 cpuidle_driver_unref 56 49 -7,+0 blk_account_io_completion 159 148 -11,+0 mtrr_wrmsr 95 99 -29,+33 __intel_wait_for_register_fw 401 419 +18,+0 cpuidle_driver_ref 43 36 -7,+0 cpuidle_get_driver 15 8 -7,+0 blk_account_io_done 535 528 -7,+0 irq_migrate_all_off_this_cpu 662 674 -14,+26 check_tsc_sync_source 412 412 -71,+71 irq_wait_for_poll 170 163 -7,+0 generic_end_io_acct 329 322 -7,+0 x86_acpi_suspend_lowlevel 369 363 -6,+0 nohz_balance_enter_idle 198 191 -7,+0 generic_start_io_acct 254 247 -7,+0 blk_account_io_start 341 334 -7,+0 perf_event_task_tick 682 675 -7,+0 intel_init_thermal 951 945 -6,+0 amd_e400_c1e_apic_setup 47 51 -28,+32 setup_APIC_eilvt 350 328 -22,+0 hsw_enable_pc8 1611 1605 -6,+0 total 12985947 12985892 -994,+939 Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()Peter Zijlstra
Nadav Amit reported that commit: b59167ac7baf ("x86/percpu: Fix this_cpu_read()") added a bunch of constraints to all sorts of code; and while some of that was correct and desired, some of that seems superfluous. The thing is, the this_cpu_*() operations are defined IRQ-safe, this means the values are subject to change from IRQs, and thus must be reloaded. Also, the generic form: local_irq_save() __this_cpu_read() local_irq_restore() would not allow the re-use of previous values; if by nothing else, then the barrier()s implied by local_irq_*(). Which raises the point that percpu_from_op() and the others also need that volatile. OTOH __this_cpu_*() operations are not IRQ-safe and assume external preempt/IRQ disabling and could thus be allowed more room for optimization. This makes the this_cpu_*() vs __this_cpu_*() behaviour more consistent with other architectures. $ ./compare.sh defconfig-build defconfig-build1 vmlinux.o x86_pmu_cancel_txn 80 71 -9,+0 __text_poke 919 964 +45,+0 do_user_addr_fault 1082 1058 -24,+0 __do_page_fault 1194 1178 -16,+0 do_exit 2995 3027 -43,+75 process_one_work 1008 989 -67,+48 finish_task_switch 524 505 -19,+0 __schedule_bug 103 98 -59,+54 __schedule_bug 103 98 -59,+54 __sched_setscheduler 2015 2030 +15,+0 freeze_processes 203 230 +31,-4 rcu_gp_kthread_wake 106 99 -7,+0 rcu_core 1841 1834 -7,+0 call_timer_fn 298 286 -12,+0 can_stop_idle_tick 146 139 -31,+24 perf_pending_event 253 239 -14,+0 shmem_alloc_page 209 213 +4,+0 __alloc_pages_slowpath 3284 3269 -15,+0 umount_tree 671 694 +23,+0 advance_transaction 803 798 -5,+0 con_put_char 71 51 -20,+0 xhci_urb_enqueue 1302 1295 -7,+0 xhci_urb_enqueue 1302 1295 -7,+0 tcp_sacktag_write_queue 2130 2075 -55,+0 tcp_try_undo_loss 229 208 -21,+0 tcp_v4_inbound_md5_hash 438 411 -31,+4 tcp_v4_inbound_md5_hash 438 411 -31,+4 tcp_v6_inbound_md5_hash 469 411 -33,-25 tcp_v6_inbound_md5_hash 469 411 -33,-25 restricted_pointer 434 420 -14,+0 irq_exit 162 154 -8,+0 get_perf_callchain 638 624 -14,+0 rt_mutex_trylock 169 156 -13,+0 avc_has_extended_perms 1092 1089 -3,+0 avc_has_perm_noaudit 309 306 -3,+0 __perf_sw_event 138 122 -16,+0 perf_swevent_get_recursion_context 116 102 -14,+0 __local_bh_enable_ip 93 72 -21,+0 xfrm_input 4175 4161 -14,+0 avc_has_perm 446 443 -3,+0 vm_events_fold_cpu 57 56 -1,+0 vfree 68 61 -7,+0 freeze_processes 203 230 +31,-4 _local_bh_enable 44 30 -14,+0 ip_do_fragment 1982 1944 -38,+0 do_exit 2995 3027 -43,+75 __do_softirq 742 724 -18,+0 cpu_init 1510 1489 -21,+0 account_system_time 80 79 -1,+0 total 12985281 12984819 -742,+280 Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20181206112433.GB13675@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17MAINTAINERS: Update my email address to use @kernel.orgWill Deacon
My @arm.com address will stop working at the end of August, so update to my @kernel.org address where you'll still be able to reach me. When I say "stop working" I really mean "will go to my line manager", so send patches there at your peril because they may reply with roadmaps and spreadsheets. You have been warned. Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: arm-soc <arm@kernel.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-17locking/rwsem: Guard against making count negativeWaiman Long
The upper bits of the count field is used as reader count. When sufficient number of active readers are present, the most significant bit will be set and the count becomes negative. If the number of active readers keep on piling up, we may eventually overflow the reader counts. This is not likely to happen unless the number of bits reserved for reader count is reduced because those bits are need for other purpose. To prevent this count overflow from happening, the most significant bit is now treated as a guard bit (RWSEM_FLAG_READFAIL). Read-lock attempts will now fail for both the fast and slow paths whenever this bit is set. So all those extra readers will be put to sleep in the wait list. Wakeup will not happen until the reader count reaches 0. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-17-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Adaptive disabling of reader optimistic spinningWaiman Long
Reader optimistic spinning is helpful when the reader critical section is short and there aren't that many readers around. It makes readers relatively more preferred than writers. When a writer times out spinning on a reader-owned lock and set the nospinnable bits, there are two main reasons for that. 1) The reader critical section is long, perhaps the task sleeps after acquiring the read lock. 2) There are just too many readers contending the lock causing it to take a while to service all of them. In the former case, long reader critical section will impede the progress of writers which is usually more important for system performance. In the later case, reader optimistic spinning tends to make the reader groups that contain readers that acquire the lock together smaller leading to more of them. That may hurt performance in some cases. In other words, the setting of nonspinnable bits indicates that reader optimistic spinning may not be helpful for those workloads that cause it. Therefore, any writers that have observed the setting of the writer nonspinnable bit for a given rwsem after they fail to acquire the lock via optimistic spinning will set the reader nonspinnable bit once they acquire the write lock. Similarly, readers that observe the setting of reader nonspinnable bit at slowpath entry will also set the reader nonspinnable bit when they acquire the read lock via the wakeup path. Once the reader nonspinnable bit is on, it will only be reset when a writer is able to acquire the rwsem in the fast path or somehow a reader or writer in the slowpath doesn't observe the nonspinable bit. This is to discourage reader optmistic spinning on that particular rwsem and make writers more preferred. This adaptive disabling of reader optimistic spinning will alleviate some of the negative side effect of this feature. In addition, this patch tries to make readers in the spinning queue follow the phase-fair principle after quitting optimistic spinning by checking if another reader has somehow acquired a read lock after this reader enters the optimistic spinning queue. If so and the rwsem is still reader-owned, this reader is in the right read-phase and can attempt to acquire the lock. On a 2-socket 40-core 80-thread Skylake system, the page_fault1 test of the will-it-scale benchmark was run with various number of threads. The number of operations done before reader optimistic spinning patches, this patch and after this patch were: Threads Before rspin Before patch After patch %change ------- ------------ ------------ ----------- ------- 20 5541068 5345484 5455667 -3.5%/ +2.1% 40 10185150 7292313 9219276 -28.5%/+26.4% 60 8196733 6460517 7181209 -21.2%/+11.2% 80 9508864 6739559 8107025 -29.1%/+20.3% This patch doesn't recover all the lost performance, but it is more than half. Given the fact that reader optimistic spinning does benefit some workloads, this is a good compromise. Using the rwsem locking microbenchmark with very short critical section, this patch doesn't have too much impact on locking performance as shown by the locking rates (kops/s) below with equal numbers of readers and writers before and after this patch: # of Threads Pre-patch Post-patch ------------ --------- ---------- 2 4,730 4,969 4 4,814 4,786 8 4,866 4,815 16 4,715 4,511 32 3,338 3,500 64 3,212 3,389 80 3,110 3,044 When running the locking microbenchmark with 40 dedicated reader and writer threads, however, the reader performance is curtailed to favor the writer. Before patch: 40 readers, Iterations Min/Mean/Max = 204,026/234,309/254,816 40 writers, Iterations Min/Mean/Max = 88,515/95,884/115,644 After patch: 40 readers, Iterations Min/Mean/Max = 33,813/35,260/36,791 40 writers, Iterations Min/Mean/Max = 95,368/96,565/97,798 Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-16-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Enable time-based spinning on reader-owned rwsemWaiman Long
When the rwsem is owned by reader, writers stop optimistic spinning simply because there is no easy way to figure out if all the readers are actively running or not. However, there are scenarios where the readers are unlikely to sleep and optimistic spinning can help performance. This patch provides a simple mechanism for spinning on a reader-owned rwsem by a writer. It is a time threshold based spinning where the allowable spinning time can vary from 10us to 25us depending on the condition of the rwsem. When the time threshold is exceeded, the nonspinnable bits will be set in the owner field to indicate that no more optimistic spinning will be allowed on this rwsem until it becomes writer owned again. Not even readers is allowed to acquire the reader-locked rwsem by optimistic spinning for fairness. We also want a writer to acquire the lock after the readers hold the lock for a relatively long time. In order to give preference to writers under such a circumstance, the single RWSEM_NONSPINNABLE bit is now split into two - one for reader and one for writer. When optimistic spinning is disabled, both bits will be set. When the reader count drop down to 0, the writer nonspinnable bit will be cleared to allow writers to spin on the lock, but not the readers. When a writer acquires the lock, it will write its own task structure pointer into sem->owner and clear the reader nonspinnable bit in the process. The time taken for each iteration of the reader-owned rwsem spinning loop varies. Below are sample minimum elapsed times for 16 iterations of the loop. System Time for 16 Iterations ------ ---------------------- 1-socket Skylake ~800ns 4-socket Broadwell ~300ns 2-socket ThunderX2 (arm64) ~250ns When the lock cacheline is contended, we can see up to almost 10X increase in elapsed time. So 25us will be at most 500, 1300 and 1600 iterations for each of the above systems. With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 8-socket IvyBridge-EX system with equal numbers of readers and writers before and after this patch were as follows: # of Threads Pre-patch Post-patch ------------ --------- ---------- 2 1,759 6,684 4 1,684 6,738 8 1,074 7,222 16 900 7,163 32 458 7,316 64 208 520 128 168 425 240 143 474 This patch gives a big boost in performance for mixed reader/writer workloads. With 32 locking threads, the rwsem lock event data were: rwsem_opt_fail=79850 rwsem_opt_nospin=5069 rwsem_opt_rlock=597484 rwsem_opt_wlock=957339 rwsem_sleep_reader=57782 rwsem_sleep_writer=55663 With 64 locking threads, the data looked like: rwsem_opt_fail=346723 rwsem_opt_nospin=6293 rwsem_opt_rlock=1127119 rwsem_opt_wlock=1400628 rwsem_sleep_reader=308201 rwsem_sleep_writer=72281 So a lot more threads acquired the lock in the slowpath and more threads went to sleep. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-15-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Make rwsem->owner an atomic_long_tWaiman Long
The rwsem->owner contains not just the task structure pointer, it also holds some flags for storing the current state of the rwsem. Some of the flags may have to be atomically updated. To reflect the new reality, the owner is now changed to an atomic_long_t type. New helper functions are added to properly separate out the task structure pointer and the embedded flags. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-14-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Enable readers spinning on writerWaiman Long
This patch enables readers to optimistically spin on a rwsem when it is owned by a writer instead of going to sleep directly. The rwsem_can_spin_on_owner() function is extracted out of rwsem_optimistic_spin() and is called directly by rwsem_down_read_slowpath() and rwsem_down_write_slowpath(). With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 8-socket IvyBrige-EX system with equal numbers of readers and writers before and after the patch were as follows: # of Threads Pre-patch Post-patch ------------ --------- ---------- 4 1,674 1,684 8 1,062 1,074 16 924 900 32 300 458 64 195 208 128 164 168 240 149 143 The performance change wasn't significant in this case, but this change is required by a follow-on patch. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-13-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Clarify usage of owner's nonspinaable bitWaiman Long
Bit 1 of sem->owner (RWSEM_ANONYMOUSLY_OWNED) is used to designate an anonymous owner - readers or an anonymous writer. The setting of this anonymous bit is used as an indicator that optimistic spinning cannot be done on this rwsem. With the upcoming reader optimistic spinning patches, a reader-owned rwsem can be spinned on for a limit period of time. We still need this bit to indicate a rwsem is nonspinnable, but not setting this bit loses its meaning that the owner is known. So rename the bit to RWSEM_NONSPINNABLE to clarify its meaning. This patch also fixes a DEBUG_RWSEMS_WARN_ON() bug in __up_write(). Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-12-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Wake up almost all readers in wait queueWaiman Long
When the front of the wait queue is a reader, other readers immediately following the first reader will also be woken up at the same time. However, if there is a writer in between. Those readers behind the writer will not be woken up. Because of optimistic spinning, the lock acquisition order is not FIFO anyway. The lock handoff mechanism will ensure that lock starvation will not happen. Assuming that the lock hold times of the other readers still in the queue will be about the same as the readers that are being woken up, there is really not much additional cost other than the additional latency due to the wakeup of additional tasks by the waker. Therefore all the readers up to a maximum of 256 in the queue are woken up when the first waiter is a reader to improve reader throughput. This is somewhat similar in concept to a phase-fair R/W lock. With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 8-socket IvyBridge-EX system with equal numbers of readers and writers before and after this patch were as follows: # of Threads Pre-Patch Post-patch ------------ --------- ---------- 4 1,641 1,674 8 731 1,062 16 564 924 32 78 300 64 38 195 240 50 149 There is no performance gain at low contention level. At high contention level, however, this patch gives a pretty decent performance boost. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-11-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: More optimal RT task handling of null ownerWaiman Long
An RT task can do optimistic spinning only if the lock holder is actually running. If the state of the lock holder isn't known, there is a possibility that high priority of the RT task may block forward progress of the lock holder if it happens to reside on the same CPU. This will lead to deadlock. So we have to make sure that an RT task will not spin on a reader-owned rwsem. When the owner is temporarily set to NULL, there are two cases where we may want to continue spinning: 1) The lock owner is in the process of releasing the lock, sem->owner is cleared but the lock has not been released yet. 2) The lock was free and owner cleared, but another task just comes in and acquire the lock before we try to get it. The new owner may be a spinnable writer. So an RT task is now made to retry one more time to see if it can acquire the lock or continue spinning on the new owning writer. When testing on a 8-socket IvyBridge-EX system, the one additional retry seems to improve locking performance of RT write locking threads under heavy contentions. The table below shows the locking rates (in kops/s) with various write locking threads before and after the patch. Locking threads Pre-patch Post-patch --------------- --------- ----------- 4 2,753 2,608 8 2,529 2,520 16 1,727 1,918 32 1,263 1,956 64 889 1,343 Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-10-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Always release wait_lock before waking up tasksWaiman Long
With the use of wake_q, we can do task wakeups without holding the wait_lock. There is one exception in the rwsem code, though. It is when the writer in the slowpath detects that there are waiters ahead but the rwsem is not held by a writer. This can lead to a long wait_lock hold time especially when a large number of readers are to be woken up. Remediate this situation by releasing the wait_lock before waking up tasks and re-acquiring it afterward. The rwsem_try_write_lock() function is also modified to read the rwsem count directly to avoid stale count value. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-9-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Implement lock handoff to prevent lock starvationWaiman Long
Because of writer lock stealing, it is possible that a constant stream of incoming writers will cause a waiting writer or reader to wait indefinitely leading to lock starvation. This patch implements a lock handoff mechanism to disable lock stealing and force lock handoff to the first waiter or waiters (for readers) in the queue after at least a 4ms waiting period unless it is a RT writer task which doesn't need to wait. The waiting period is used to avoid discouraging lock stealing too much to affect performance. The setting and clearing of the handoff bit is serialized by the wait_lock. So racing is not possible. A rwsem microbenchmark was run for 5 seconds on a 2-socket 40-core 80-thread Skylake system with a v5.1 based kernel and 240 write_lock threads with 5us sleep critical section. Before the patch, the min/mean/max numbers of locking operations for the locking threads were 1/7,792/173,696. After the patch, the figures became 5,842/6,542/7,458. It can be seen that the rwsem became much more fair, though there was a drop of about 16% in the mean locking operations done which was a tradeoff of having better fairness. Making the waiter set the handoff bit right after the first wakeup can impact performance especially with a mixed reader/writer workload. With the same microbenchmark with short critical section and equal number of reader and writer threads (40/40), the reader/writer locking operation counts with the current patch were: 40 readers, Iterations Min/Mean/Max = 1,793/1,794/1,796 40 writers, Iterations Min/Mean/Max = 1,793/34,956/86,081 By making waiter set handoff bit immediately after wakeup: 40 readers, Iterations Min/Mean/Max = 43/44/46 40 writers, Iterations Min/Mean/Max = 43/1,263/3,191 Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-8-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Make rwsem_spin_on_owner() return owner stateWaiman Long
This patch modifies rwsem_spin_on_owner() to return four possible values to better reflect the state of lock holder which enables us to make a better decision of what to do next. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-7-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Code cleanup after files mergingWaiman Long
After merging all the relevant rwsem code into one single file, there are a number of optimizations and cleanups that can be done: 1) Remove all the EXPORT_SYMBOL() calls for functions that are not accessed elsewhere. 2) Remove all the __visible tags as none of the functions will be called from assembly code anymore. 3) Make all the internal functions static. 4) Remove some unneeded blank lines. 5) Remove the intermediate rwsem_down_{read|write}_failed*() functions and rename __rwsem_down_{read|write}_failed_common() to rwsem_down_{read|write}_slowpath(). 6) Remove "__" prefix of __rwsem_mark_wake(). 7) Use atomic_long_try_cmpxchg_acquire() as much as possible. 8) Remove the rwsem_rtrylock and rwsem_wtrylock lock events as they are not that useful. That enables the compiler to do better optimization and reduce code size. The text+data size of rwsem.o on an x86-64 machine with gcc8 was reduced from 10237 bytes to 5030 bytes with this change. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-6-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.cWaiman Long
Now we only have one implementation of rwsem. Even though we still use xadd to handle reader locking, we use cmpxchg for writer instead. So the filename rwsem-xadd.c is not strictly correct. Also no one outside of the rwsem code need to know the internal implementation other than function prototypes for two internal functions that are called directly from percpu-rwsem.c. So the rwsem-xadd.c and rwsem.h files are now merged into rwsem.c in the following order: <upper part of rwsem.h> <rwsem-xadd.c> <lower part of rwsem.h> <rwsem.c> The rwsem.h file now contains only 2 function declarations for __up_read() and __down_read(). This is a code relocation patch with no code change at all except making __up_read() and __down_read() non-static functions so they can be used by percpu-rwsem.c. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-5-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Implement a new locking schemeWaiman Long
The current way of using various reader, writer and waiting biases in the rwsem code are confusing and hard to understand. I have to reread the rwsem count guide in the rwsem-xadd.c file from time to time to remind myself how this whole thing works. It also makes the rwsem code harder to be optimized. To make rwsem more sane, a new locking scheme similar to the one in qrwlock is now being used. The atomic long count has the following bit definitions: Bit 0 - writer locked bit Bit 1 - waiters present bit Bits 2-7 - reserved for future extension Bits 8-X - reader count (24/56 bits) The cmpxchg instruction is now used to acquire the write lock. The read lock is still acquired with xadd instruction, so there is no change here. This scheme will allow up to 16M/64P active readers which should be more than enough. We can always use some more reserved bits if necessary. With that change, we can deterministically know if a rwsem has been write-locked. Looking at the count alone, however, one cannot determine for certain if a rwsem is owned by readers or not as the readers that set the reader count bits may be in the process of backing out. So we still need the reader-owned bit in the owner field to be sure. With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) of the benchmark on a 8-socket 120-core IvyBridge-EX system before and after the patch were as follows: Before Patch After Patch # of Threads wlock rlock wlock rlock ------------ ----- ----- ----- ----- 1 30,659 31,341 31,055 31,283 2 8,909 16,457 9,884 17,659 4 9,028 15,823 8,933 20,233 8 8,410 14,212 7,230 17,140 16 8,217 25,240 7,479 24,607 The locking rates of the benchmark on a Power8 system were as follows: Before Patch After Patch # of Threads wlock rlock wlock rlock ------------ ----- ----- ----- ----- 1 12,963 13,647 13,275 13,601 2 7,570 11,569 7,902 10,829 4 5,232 5,516 5,466 5,435 8 5,233 3,386 5,467 3,168 The locking rates of the benchmark on a 2-socket ARM64 system were as follows: Before Patch After Patch # of Threads wlock rlock wlock rlock ------------ ----- ----- ----- ----- 1 21,495 21,046 21,524 21,074 2 5,293 10,502 5,333 10,504 4 5,325 11,463 5,358 11,631 8 5,391 11,712 5,470 11,680 The performance are roughly the same before and after the patch. There are run-to-run variations in performance. Runs with higher variances usually have higher throughput. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-4-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Remove rwsem_wake() wakeup optimizationWaiman Long
After the following commit: 59aabfc7e959 ("locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()") the rwsem_wake() forgoes doing a wakeup if the wait_lock cannot be directly acquired and an optimistic spinning locker is present. This can help performance by avoiding spinning on the wait_lock when it is contended. With the later commit: 133e89ef5ef3 ("locking/rwsem: Enable lockless waiter wakeup(s)") the performance advantage of the above optimization diminishes as the average wait_lock hold time become much shorter. With a later patch that supports rwsem lock handoff, we can no longer relies on the fact that the presence of an optimistic spinning locker will ensure that the lock will be acquired by a task soon and rwsem_wake() will be called later on to wake up waiters. This can lead to missed wakeup and application hang. So the original 59aabfc7e959 commit has to be reverted. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-3-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/rwsem: Make owner available even if !CONFIG_RWSEM_SPIN_ON_OWNERWaiman Long
The owner field in the rw_semaphore structure is used primarily for optimistic spinning. However, identifying the rwsem owner can also be helpful in debugging as well as tracing locking related issues when analyzing crash dump. The owner field may also store state information that can be important to the operation of the rwsem. So the owner field is now made a permanent member of the rw_semaphore structure irrespective of CONFIG_RWSEM_SPIN_ON_OWNER. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: huang ying <huang.ying.caritas@gmail.com> Link: https://lkml.kernel.org/r/20190520205918.22251-2-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/fpu: Remove the fpu__save() exportChristoph Hellwig
This function is only use by the core FPU code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190604071524.12835-4-hch@lst.de
2019-06-17x86/fpu: Simplify kernel_fpu_begin()Christoph Hellwig
Merge two helpers into the main function, remove a pointless local variable and flatten a conditional. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190604071524.12835-3-hch@lst.de
2019-06-17sched/fair: Don't push cfs_bandwith slack timers forwardbsegall@google.com
When a cfs_rq sleeps and returns its quota, we delay for 5ms before waking any throttled cfs_rqs to coalesce with other cfs_rqs going to sleep, as this has to be done outside of the rq lock we hold. The current code waits for 5ms without any sleeps, instead of waiting for 5ms from the first sleep, which can delay the unthrottle more than we want. Switch this around so that we can't push this forward forever. This requires an extra flag rather than using hrtimer_active, since we need to start a new timer if the current one is in the process of finishing. Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com> Acked-by: Phil Auld <pauld@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/xm26a7euy6iq.fsf_-_@bsegall-linux.svl.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17sched/core: Optimize try_to_wake_up() for local wakeupsPeter Zijlstra
Jens reported that significant performance can be had on some block workloads by special casing local wakeups. That is, wakeups on the current task before it schedules out. Given something like the normal wait pattern: for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (cond) break; schedule(); } __set_current_state(TASK_RUNNING); Any wakeup (on this CPU) after set_current_state() and before schedule() would benefit from this. Normal wakeups take p->pi_lock, which serializes wakeups to the same task. By eliding that we gain concurrency on: - ttwu_stat(); we already had concurrency on rq stats, this now also brings it to task stats. -ENOCARE - tracepoints; it is now possible to get multiple instances of trace_sched_waking() (and possibly trace_sched_wakeup()) for the same task. Tracers will have to learn to cope. Furthermore, p->pi_lock is used by set_special_state(), to order against TASK_RUNNING stores from other CPUs. But since this is strictly CPU local, we don't need the lock, and set_special_state()'s disabling of IRQs is sufficient. After the normal wakeup takes p->pi_lock it issues smp_mb__after_spinlock(), in order to ensure the woken task must observe prior stores before we observe the p->state. If this is CPU local, this will be satisfied with a compiler barrier, and we rely on try_to_wake_up() being a funcation call, which implies such. Since, when 'p == current', 'p->on_rq' must be true, the normal wakeup would continue into the ttwu_remote() branch, which normally is concerned with exactly this wakeup scenario, except from a remote CPU. IOW we're waking a task that is still running. In this case, we can trivially avoid taking rq->lock, all that's left from this is to set p->state. This then yields an extremely simple and fast path for 'p == current'. Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: gkohli@codeaurora.org Cc: hch@lst.de Cc: oleg@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17sched/fair: Fix "runnable_avg_yN_inv" not used warningsQian Cai
runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c but was included in several other places because they need other macros all came from kernel/sched/sched-pelt.h which was generated by Documentation/scheduler/sched-pelt. As the result, it causes compilation a lot of warnings, kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=] kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=] kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=] ... Silence it by appending the __maybe_unused attribute for it, so all generated variables and macros can still be kept in the same file. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17sched/fair: Clean up definition of NOHZ blocked load functionsValentin Schneider
cfs_rq_has_blocked() and others_have_blocked() are only used within update_blocked_averages(). The !CONFIG_FAIR_GROUP_SCHED version of the latter calls them within a #define CONFIG_NO_HZ_COMMON block, whereas the CONFIG_FAIR_GROUP_SCHED one calls them unconditionnally. As reported by Qian, the above leads to this warning in !CONFIG_NO_HZ_COMMON configs: kernel/sched/fair.c: In function 'update_blocked_averages': kernel/sched/fair.c:7750:7: warning: variable 'done' set but not used [-Wunused-but-set-variable] It wouldn't be wrong to keep cfs_rq_has_blocked() and others_have_blocked() as they are, but since their only current use is to figure out when we can stop calling update_blocked_averages() on fully decayed NOHZ idle CPUs, we can give them a new definition for !CONFIG_NO_HZ_COMMON. Change the definition of cfs_rq_has_blocked() and others_have_blocked() for !CONFIG_NO_HZ_COMMON so that the NOHZ-specific blocks of update_blocked_averages() become no-ops and the 'done' variable gets optimised out. While at it, remove the CONFIG_NO_HZ_COMMON block from the !CONFIG_FAIR_GROUP_SCHED definition of update_blocked_averages() by using the newly-introduced update_blocked_load_status() helper. No change in functionality intended. [ Additions by Peter Zijlstra. ] Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190603115424.7951-1-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17sched/core: Add __sched tag for io_schedule()Gao Xiang
Non-inline io_schedule() was introduced in: commit 10ab56434f2f ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()") Keep in line with io_schedule_timeout(), otherwise "/proc/<pid>/wchan" will report io_schedule() rather than its callers when waiting for IO. Reported-by: Jilong Kou <koujilong@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miao Xie <miaoxie@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 10ab56434f2f ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()") Link: https://lkml.kernel.org/r/20190603091338.2695-1-gaoxiang25@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17Merge tag 'v5.2-rc5' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17perf/core: Fix perf_sample_regs_user() mm checkPeter Zijlstra
perf_sample_regs_user() uses 'current->mm' to test for the presence of userspace, but this is insufficient, consider use_mm(). A better test is: '!(current->flags & PF_KTHREAD)', exec() clears PF_KTHREAD after it sets the new ->mm but before it drops to userspace for the first time. Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reported-by: Young Xiao <92siuyang@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/atomic: Fix smp_mb__{before,after}_atomic()Peter Zijlstra
Recent probing at the Linux Kernel Memory Model uncovered a 'surprise'. Strongly ordered architectures where the atomic RmW primitive implies full memory ordering and smp_mb__{before,after}_atomic() are a simple barrier() (such as x86) fail for: *x = 1; atomic_inc(u); smp_mb__after_atomic(); r0 = *y; Because, while the atomic_inc() implies memory order, it (surprisingly) does not provide a compiler barrier. This then allows the compiler to re-order like so: atomic_inc(u); *x = 1; smp_mb__after_atomic(); r0 = *y; Which the CPU is then allowed to re-order (under TSO rules) like: atomic_inc(u); r0 = *y; *x = 1; And this very much was not intended. Therefore strengthen the atomic RmW ops to include a compiler barrier. NOTE: atomic_{or,and,xor} and the bitops already had the compiler barrier. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/lockdep: Remove unnecessary DEBUG_LOCKS_WARN_ON()Kobe Wu
DEBUG_LOCKS_WARN_ON() will turn off debug_locks and makes print_unlock_imbalance_bug() return directly. Remove a redundant whitespace. Signed-off-by: Kobe Wu <kobe-cp.wu@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <linux-mediatek@lists.infradead.org> Cc: <wsd_upstream@mediatek.com> Cc: Eason Lin <eason-yh.lin@mediatek.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/1559217575-30298-1-git-send-email-kobe-cp.wu@mediatek.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17locking/lockdep: Rename lockdep_assert_held_exclusive() -> ↵Nikolay Borisov
lockdep_assert_held_write() All callers of lockdep_assert_held_exclusive() use it to verify the correct locking state of either a semaphore (ldisc_sem in tty, mmap_sem for perf events, i_rwsem of inode for dax) or rwlock by apparmor. Thus it makes sense to rename _exclusive to _write since that's the semantics callers care. Additionally there is already lockdep_assert_held_read(), which this new naming is more consistent with. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190531100651.3969-1-nborisov@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/jump_label: Batch jump label updatesDaniel Bristot de Oliveira
Currently, the jump label of a static key is transformed via the arch specific function: void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type) The new approach (batch mode) uses two arch functions, the first has the same arguments of the arch_jump_label_transform(), and is the function: bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_type type) Rather than transforming the code, it adds the jump_entry in a queue of entries to be updated. This functions returns true in the case of a successful enqueue of an entry. If it returns false, the caller must to apply the queue and then try to queue again, for instance, because the queue is full. This function expects the caller to sort the entries by the address before enqueueuing then. This is already done by the arch independent code, though. After queuing all jump_entries, the function: void arch_jump_label_transform_apply(void) Applies the changes in the queue. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/57b4caa654bad7e3b066301c9a9ae233dea065b5.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17jump_label: Batch updates if arch supports itDaniel Bristot de Oliveira
If the architecture supports the batching of jump label updates, use it! An easy way to see the benefits of this patch is switching the schedstats on and off. For instance: -------------------------- %< ---------------------------- #!/bin/sh while [ true ]; do sysctl -w kernel.sched_schedstats=1 sleep 2 sysctl -w kernel.sched_schedstats=0 sleep 2 done -------------------------- >% ---------------------------- while watching the IPI count: -------------------------- %< ---------------------------- # watch -n1 "cat /proc/interrupts | grep Function" -------------------------- >% ---------------------------- With the current mode, it is possible to see +- 168 IPIs each 2 seconds, while with this patch the number of IPIs goes to 3 each 2 seconds. Regarding the performance impact of this patch set, I made two measurements: The time to update a key (the task that is causing the change) The time to run the int3 handler (the side effect on a thread that hits the code being changed) The schedstats static key was chosen as the key to being switched on and off. The reason being is that it is used in more than 56 places, in a hot path. The change in the schedstats static key will be done with the following command: while [ true ]; do sysctl -w kernel.sched_schedstats=1 usleep 500000 sysctl -w kernel.sched_schedstats=0 usleep 500000 done In this way, they key will be updated twice per second. To force the hit of the int3 handler, the system will also run a kernel compilation with two jobs per CPU. The test machine is a two nodes/24 CPUs box with an Intel Xeon processor @2.27GHz. Regarding the update part, on average, the regular kernel takes 57 ms to update the schedstats key, while the kernel with the batch updates takes just 1.4 ms on average. Although it seems to be too good to be true, it makes sense: the schedstats key is used in 56 places, so it was expected that it would take around 56 times to update the keys with the current implementation, as the IPIs are the most expensive part of the update. Regarding the int3 handler, the non-batch handler takes 45 ns on average, while the batch version takes around 180 ns. At first glance, it seems to be a high value. But it is not, considering that it is doing 56 updates, rather than one! It is taking four times more, only. This gain is possible because the patch uses a binary search in the vector: log2(56)=5.8. So, it was expected to have an overhead within four times. (voice of tv propaganda) But, that is not all! As the int3 handler keeps on for a shorter period (because the update part is on for a shorter time), the number of hits in the int3 handler decreased by 10%. The question then is: Is it worth paying the price of "135 ns" more in the int3 handler? Considering that, in this test case, we are saving the handling of 53 IPIs, that takes more than these 135 ns, it seems to be a meager price to be paid. Moreover, the test case was forcing the hit of the int3, in practice, it does not take that often. While the IPI takes place on all CPUs, hitting the int3 handler or not! For instance, in an isolated CPU with a process running in user-space (nohz_full use-case), the chances of hitting the int3 handler is barely zero, while there is no way to avoid the IPIs. By bounding the IPIs, we are improving a lot this scenario. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/acc891dbc2dbc9fd616dd680529a2337b1d1274c.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/alternative: Batch of patch operationsDaniel Bristot de Oliveira
Currently, the patch of an address is done in three steps: -- Pseudo-code #1 - Current implementation --- 1) add an int3 trap to the address that will be patched sync cores (send IPI to all other CPUs) 2) update all but the first byte of the patched range sync cores (send IPI to all other CPUs) 3) replace the first byte (int3) by the first byte of replacing opcode sync cores (send IPI to all other CPUs) -- Pseudo-code #1 --- When a static key has more than one entry, these steps are called once for each entry. The number of IPIs then is linear with regard to the number 'n' of entries of a key: O(n*3), which is O(n). This algorithm works fine for the update of a single key. But we think it is possible to optimize the case in which a static key has more than one entry. For instance, the sched_schedstats jump label has 56 entries in my (updated) fedora kernel, resulting in 168 IPIs for each CPU in which the thread that is enabling the key is _not_ running. With this patch, rather than receiving a single patch to be processed, a vector of patches is passed, enabling the rewrite of the pseudo-code #1 in this way: -- Pseudo-code #2 - This patch --- 1) for each patch in the vector: add an int3 trap to the address that will be patched sync cores (send IPI to all other CPUs) 2) for each patch in the vector: update all but the first byte of the patched range sync cores (send IPI to all other CPUs) 3) for each patch in the vector: replace the first byte (int3) by the first byte of replacing opcode sync cores (send IPI to all other CPUs) -- Pseudo-code #2 - This patch --- Doing the update in this way, the number of IPI becomes O(3) with regard to the number of keys, which is O(1). The batch mode is done with the function text_poke_bp_batch(), that receives two arguments: a vector of "struct text_to_poke", and the number of entries in the vector. The vector must be sorted by the addr field of the text_to_poke structure, enabling the binary search of a handler in the poke_int3_handler function (a fast path). Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/ca506ed52584c80f64de23f6f55ca288e5d079de.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17jump_label: Sort entries of the same key by the codeDaniel Bristot de Oliveira
In the batching mode, all the entries of a given key are updated at once. During the update of a key, a hit in the int3 handler will check if the hitting code address belongs to one of these keys. To optimize the search of a given code in the vector of entries being updated, a binary search is used. The binary search relies on the order of the entries of a key by its code. Hence the keys need to be sorted by the code too, so sort the entries of a given key by the code. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/f57ae83e0592418ba269866bb7ade570fc8632e0.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17x86/jump_label: Add a __jump_label_set_jump_code() helperDaniel Bristot de Oliveira
Move the definition of the code to be written from __jump_label_transform() to a specialized function. No functional change. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/d2f52a0010ecd399cf9b02a65bcf5836571b9e52.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17jump_label: Add a jump_label_can_update() helperDaniel Bristot de Oliveira
Move the check if a jump_entry is valid to a function. No functional change. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris von Recklinghausen <crecklin@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/56b69bd3f8e644ed64f2dbde7c088030b8cbe76b.1560325897.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17Merge tag 'v5.2-rc5' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17Merge tag 'v5.2-rc5' into x86/asm, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>