summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2017-11-08cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freqViresh Kumar
'cached_raw_freq' is used to get the next frequency quickly but should always be in sync with sg_policy->next_freq. There is a case where it is not and in such cases it should be reset to avoid switching to incorrect frequencies. Consider this case for example: - policy->cur is 1.2 GHz (Max) - New request comes for 780 MHz and we store that in cached_raw_freq. - Based on 780 MHz, we calculate the effective frequency as 800 MHz. - We then see the CPU wasn't idle recently and choose to keep the next freq as 1.2 GHz. - Now we have cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz. - Now if the utilization doesn't change in then next request, then the next target frequency will still be 780 MHz and it will match with cached_raw_freq. But we will choose 1.2 GHz instead of 800 MHz here. Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.12+ <stable@vger.kernel.org> # 4.12+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-08PM / s2idle: Clear the events_check_enabled flagRajat Jain
Problem: This flag does not get cleared currently in the suspend or resume path in the following cases: * In case some driver's suspend routine returns an error. * Successful s2idle case * etc? Why is this a problem: What happens is that the next suspend attempt could fail even though the user did not enable the flag by writing to /sys/power/wakeup_count. This is 1 use case how the issue can be seen (but similar use case with driver suspend failure can be thought of): 1. Read /sys/power/wakeup_count 2. echo count > /sys/power/wakeup_count 3. echo freeze > /sys/power/wakeup_count 4. Let the system suspend, and wakeup the system using some wake source that calls pm_wakeup_event() e.g. power button or something. 5. Note that the combined wakeup count would be incremented due to the pm_wakeup_event() in the resume path. 6. After resuming the events_check_enabled flag is still set. At this point if the user attempts to freeze again (without writing to /sys/power/wakeup_count), the suspend would fail even though there has been no wake event since the past resume. Address that by clearing the flag just before a resume is completed, so that it is always cleared for the corner cases mentioned above. Signed-off-by: Rajat Jain <rajatja@google.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-08stop using '%pK' for /proc/kallsyms pointer valuesLinus Torvalds
Not only is it annoying to have one single flag for all pointers, as if that was a global choice and all kernel pointers are the same, but %pK can't get the 'access' vs 'open' time check right anyway. So make the /proc/kallsyms pointer value code use logic specific to that particular file. We do continue to honor kptr_restrict, but the default (which is unrestricted) is changed to instead take expected users into account, and restrict access by default. Right now the only actual expected user is kernel profiling, which has a separate sysctl flag for kernel profile access. There may be others. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-08module: export module signature enforcement statusBruno E. O. Meneguele
A static variable sig_enforce is used as status var to indicate the real value of CONFIG_MODULE_SIG_FORCE, once this one is set the var will hold true, but if the CONFIG is not set the status var will hold whatever value is present in the module.sig_enforce kernel cmdline param: true when =1 and false when =0 or not present. Considering this cmdline param take place over the CONFIG value when it's not set, other places in the kernel could misbehave since they would have only the CONFIG_MODULE_SIG_FORCE value to rely on. Exporting this status var allows the kernel to rely in the effective value of module signature enforcement, being it from CONFIG value or cmdline param. Signed-off-by: Bruno E. O. Meneguele <brdeoliv@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2017-11-08rcu: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Lockdep now has an integrated IRQs disabled/enabled sanity check. Just use it instead of the ad-hoc RCU version. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-15-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-13-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-12-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08irq_work: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-11-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08irq/timings: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-10-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08perf/core: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-9-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08smp/core: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-7-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08timers/hrtimer: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-6-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08timers/nohz: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-5-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08workqueue: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1509980490-4285-4-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08irq/softirqs: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-3-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08locking/pvqspinlock: Implement hybrid PV queued/unfair locksWaiman Long
Currently, all the lock waiters entering the slowpath will do one lock stealing attempt to acquire the lock. That helps performance, especially in VMs with over-committed vCPUs. However, the current pvqspinlocks still don't perform as good as unfair locks in many cases. On the other hands, unfair locks do have the problem of lock starvation that pvqspinlocks don't have. This patch combines the best attributes of an unfair lock and a pvqspinlock into a hybrid lock with 2 modes - queued mode & unfair mode. A lock waiter goes into the unfair mode when there are waiters in the wait queue but the pending bit isn't set. Otherwise, it will go into the queued mode waiting in the queue for its turn. On a 2-socket 36-core E5-2699 v3 system (HT off), a kernel build (make -j<n>) was done in a VM with unpinned vCPUs 3 times with the best time selected and <n> is the number of vCPUs available. The build times of the original pvqspinlock, hybrid pvqspinlock and unfair lock with various number of vCPUs are as follows: vCPUs pvqlock hybrid pvqlock unfair lock ----- ------- -------------- ----------- 30 342.1s 329.1s 329.1s 36 314.1s 305.3s 307.3s 45 345.0s 302.1s 306.6s 54 365.4s 308.6s 307.8s 72 358.9s 293.6s 303.9s 108 343.0s 285.9s 304.2s The hybrid pvqspinlock performs better or comparable to the unfair lock. By turning on QUEUED_LOCK_STAT, the table below showed the number of lock acquisitions in unfair mode and queue mode after a kernel build with various number of vCPUs. vCPUs queued mode unfair mode ----- ----------- ----------- 30 9,130,518 294,954 36 10,856,614 386,809 45 8,467,264 11,475,373 54 6,409,987 19,670,855 72 4,782,063 25,712,180 It can be seen that as the VM became more and more over-committed, the ratio of locks acquired in unfair mode increases. This is all done automatically to get the best overall performance as possible. Using a kernel locking microbenchmark with number of locking threads equals to the number of vCPUs available on the same machine, the minimum, average and maximum (min/avg/max) numbers of locking operations done per thread in a 5-second testing interval are shown below: vCPUs hybrid pvqlock unfair lock ----- -------------- ----------- 36 822,135/881,063/950,363 75,570/313,496/ 690,465 54 542,435/581,664/625,937 35,460/204,280/ 457,172 72 397,500/428,177/499,299 17,933/150,679/ 708,001 108 257,898/288,150/340,871 3,085/181,176/1,257,109 It can be seen that the hybrid pvqspinlocks are more fair and performant than the unfair locks in this test. The table below shows the kernel build times on a smaller 2-socket 16-core 32-thread E5-2620 v4 system. vCPUs pvqlock hybrid pvqlock unfair lock ----- ------- -------------- ----------- 16 436.8s 433.4s 435.6s 36 366.2s 364.8s 364.5s 48 423.6s 376.3s 370.2s 64 433.1s 376.6s 376.8s Again, the performance of the hybrid pvqspinlock was comparable to that of the unfair lock. Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Eduardo Valentin <eduval@amazon.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1510089486-3466-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pagesTom Lendacky
In order for memory pages to be properly mapped when SEV is active, it's necessary to use the PAGE_KERNEL protection attribute as the base protection. This ensures that memory mapping of, e.g. ACPI tables, receives the proper mapping attributes. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: kvm@vger.kernel.org Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171020143059.3291-11-brijesh.singh@amd.com
2017-11-07resource: Provide resource struct in resource walk callbackTom Lendacky
In preperation for a new function that will need additional resource information during the resource walk, update the resource walk callback to pass the resource structure. Since the current callback start and end arguments are pulled from the resource structure, the callback functions can obtain them from the resource structure directly. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lkml.kernel.org/r/20171020143059.3291-10-brijesh.singh@amd.com
2017-11-07resource: Consolidate resource walking codeTom Lendacky
The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range() functions each have much of the same code. Create a new function that consolidates the common code from these functions in one place to reduce the amount of duplicated code. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20171020143059.3291-9-brijesh.singh@amd.com
2017-11-07locking/rwlocks: Fix commentsCheng Jian
- fix the list of locking API headers in kernel/locking/spinlock.c - fix an #endif comment Signed-off-by: Cheng Jian <cj.chengjian@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: huawei.libin@huawei.com Cc: xiexiuqi@huawei.com Link: http://lkml.kernel.org/r/1509706788-152547-1-git-send-email-cj.chengjian@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07kprobes, x86/alternatives: Use text_mutex to protect smp_alt_modulesZhou Chengming
We use alternatives_text_reserved() to check if the address is in the fixed pieces of alternative reserved, but the problem is that we don't hold the smp_alt mutex when call this function. So the list traversal may encounter a deleted list_head if another path is doing alternatives_smp_module_del(). One solution is that we can hold smp_alt mutex before call this function, but the difficult point is that the callers of this functions, arch_prepare_kprobe() and arch_prepare_optimized_kprobe(), are called inside the text_mutex. So we must hold smp_alt mutex before we go into these arch dependent code. But we can't now, the smp_alt mutex is the arch dependent part, only x86 has it. Maybe we can export another arch dependent callback to solve this. But there is a simpler way to handle this problem. We can reuse the text_mutex to protect smp_alt_modules instead of using another mutex. And all the arch dependent checks of kprobes are inside the text_mutex, so it's safe now. Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Fixes: 2cfa197 "ftrace/alternatives: Introducing *_text_reserved functions" Link: http://lkml.kernel.org/r/1509585501-79466-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07Merge branch 'linus' into x86/apic, to resolve conflictsIngo Molnar
Conflicts: arch/x86/include/asm/x2apic.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07Merge branch 'linus' into perf/core, to fix conflictsIngo Molnar
Conflicts: tools/perf/arch/arm/annotate/instructions.c tools/perf/arch/arm64/annotate/instructions.c tools/perf/arch/powerpc/annotate/instructions.c tools/perf/arch/s390/annotate/instructions.c tools/perf/arch/x86/tests/intel-cqm.c tools/perf/ui/tui/progress.c tools/perf/util/zlib.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-06Merge branch 'for-4.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "Another fix for a really old bug. It only affects drain_workqueue() which isn't used often and even then triggers only during a pretty small race window, so it isn't too surprising that it stayed hidden for so long. The fix is straight-forward and low-risk. Kudos to Li Bin for reporting and fixing the bug" * 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix NULL pointer dereference
2017-11-06cgroup: export list of cgroups v2 features using sysfsRoman Gushchin
The active development of cgroups v2 sometimes leads to a creation of interfaces, which are not turned on by default (to provide backward compatibility). It's handy to know from userspace, which cgroup v2 features are supported without calculating it based on the kernel version. So, let's export the list of such features using /sys/kernel/cgroup/features pseudo-file. The list is hardcoded and has to be extended when new functionality is added. Each feature is printed on a new line. Example: $ cat /sys/kernel/cgroup/features nsdelegate Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: kernel-team@fb.com Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-06cgroup: export list of delegatable control files using sysfsRoman Gushchin
Delegatable cgroup v2 control files may require special handling (e.g. chowning), and the exact list of such files varies between kernel versions (and likely to be extended in the future). To guarantee correctness of this list and simplify the life of userspace (systemd, first of all), let's export the list via /sys/kernel/cgroup/delegate pseudo-file. Format is siple: each control file name is printed on a new line. Example: $ cat /sys/kernel/cgroup/delegate cgroup.procs cgroup.subtree_control Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: kernel-team@fb.com Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-06workqueue: Fix comment for unbound workqueue's attrbutesWang Long
Signed-off-by: Wang Long <wanglong19@meituan.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes: - synchronize kernel and tooling headers - cgroup support fix - two tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/headers: Synchronize kernel ABI headers perf/cgroup: Fix perf cgroup hierarchy support perf tools: Unwind properly location after REJECT perf symbols: Fix memory corruption because of zero length symbols
2017-11-05bpf, cgroup: implement eBPF-based device controller for cgroup v2Roman Gushchin
Cgroup v2 lacks the device controller, provided by cgroup v1. This patch adds a new eBPF program type, which in combination of previously added ability to attach multiple eBPF programs to a cgroup, will provide a similar functionality, but with some additional flexibility. This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: remove old offload/analyzerJakub Kicinski
Thanks to the ability to load a program for a specific device, running verifier twice is no longer needed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05cls_bpf: allow attaching programs loaded for specific deviceJakub Kicinski
If TC program is loaded with skip_sw flag, we should allow the device-specific programs to be accepted. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05xdp: allow attaching programs loaded for specific deviceJakub Kicinski
Pass the netdev pointer to bpf_prog_get_type(). This way BPF code can decide whether the device matches what the code was loaded/translated for. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: report offload info to user spaceJakub Kicinski
Extend struct bpf_prog_info to contain information about program being bound to a device. Since the netdev may get destroyed while program still exists we need a flag to indicate the program is loaded for a device, even if the device is gone. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: offload: add infrastructure for loading programs for a specific netdevJakub Kicinski
The fact that we don't know which device the program is going to be used on is quite limiting in current eBPF infrastructure. We have to reverse or limit the changes which kernel makes to the loaded bytecode if we want it to be offloaded to a networking device. We also have to invent new APIs for debugging and troubleshooting support. Make it possible to load programs for a specific netdev. This helps us to bring the debug information closer to the core eBPF infrastructure (e.g. we will be able to reuse the verifer log in device JIT). It allows device JITs to perform translation on the original bytecode. __bpf_prog_get() when called to get a reference for an attachment point will now refuse to give it if program has a device assigned. Following patches will add a version of that function which passes the expected netdev in. @type argument in __bpf_prog_get() is renamed to attach_type to make it clearer that it's only set on attachment. All calls to ndo_bpf are protected by rtnl, only verifier callbacks are not. We need a wait queue to make sure netdev doesn't get destroyed while verifier is still running and calling its driver. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04cpufreq: schedutil: Examine the correct CPU when we update utilChris Redpath
After commit 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks) we stopped to always read the utilization for the CPU we are running the governor on, and instead we read it for the CPU which we've been told has updated utilization. This is stored in sugov_cpu->cpu. The value is set in sugov_register() but we clear it in sugov_start() which leads to always looking at the utilization of CPU0 instead of the correct one. Fix this by consolidating the initialization code into sugov_start(). Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks) Signed-off-by: Chris Redpath <chris.redpath@arm.com> Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com> Reviewed-by: Brendan Jackman <brendan.jackman@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-04Merge branch 'linus' into core/urgent, to pick up dependent commitsIngo Molnar
We want to fix an objtool build warning that got introduced in the latest upstream kernel. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03arm64/sve: Add prctl controls for userspace vector length managementDave Martin
This patch adds two arm64-specific prctls, to permit userspace to control its vector length: * PR_SVE_SET_VL: set the thread's SVE vector length and vector length inheritance mode. * PR_SVE_GET_VL: get the same information. Although these prctls resemble instruction set features in the SVE architecture, they provide additional control: the vector length inheritance mode is Linux-specific and nothing to do with the architecture, and the architecture does not permit EL0 to set its own vector length directly. Both can be used in portable tools without requiring the use of SVE instructions. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> [will: Fixed up prctl constants to avoid clash with PDEATHSIG] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxHerbert Xu
Merge 4.14-rc3 in order to pick up the new timer_setup function.
2017-11-03Revert "workqueue: respect isolated cpus when queueing an unbound work"Tejun Heo
This reverts commit b5149873a0c299195b5346fe4dc2c5b04ae2f995. It conflicts with the following isolcpus change from the sched branch. edb9382175c3 ("sched/isolation: Move isolcpus= handling to the housekeeping code") Let's revert for now. Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-03Merge branch 'linus' into perf/urgent, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-03bpf: fix verifier NULL pointer dereferenceCraig Gallek
do_check() can fail early without allocating env->cur_state under memory pressure. Syzkaller found the stack below on the linux-next tree because of this. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 27062 Comm: syz-executor5 Not tainted 4.14.0-rc7+ #106 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801c2c74700 task.stack: ffff8801c3e28000 RIP: 0010:free_verifier_state kernel/bpf/verifier.c:347 [inline] RIP: 0010:bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533 RSP: 0018:ffff8801c3e2f5c8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 00000000fffffff4 RCX: 0000000000000000 RDX: 0000000000000070 RSI: ffffffff817d5aa9 RDI: 0000000000000380 RBP: ffff8801c3e2f668 R08: 0000000000000000 R09: 1ffff100387c5d9f R10: 00000000218c4e80 R11: ffffffff85b34380 R12: ffff8801c4dc6a28 R13: 0000000000000000 R14: ffff8801c4dc6a00 R15: ffff8801c4dc6a20 FS: 00007f311079b700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004d4a24 CR3: 00000001cbcd0000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: bpf_prog_load+0xcbb/0x18e0 kernel/bpf/syscall.c:1166 SYSC_bpf kernel/bpf/syscall.c:1690 [inline] SyS_bpf+0xae9/0x4620 kernel/bpf/syscall.c:1652 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x452869 RSP: 002b:00007f311079abe8 EFLAGS: 00000212 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452869 RDX: 0000000000000030 RSI: 0000000020168000 RDI: 0000000000000005 RBP: 00007f311079aa20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000212 R12: 00000000004b7550 R13: 00007f311079ab58 R14: 00000000004b7560 R15: 0000000000000000 Code: df 48 c1 ea 03 80 3c 02 00 0f 85 e6 0b 00 00 4d 8b 6e 20 48 b8 00 00 00 00 00 fc ff df 49 8d bd 80 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b6 0b 00 00 49 8b bd 80 03 00 00 e8 d6 0c 26 RIP: free_verifier_state kernel/bpf/verifier.c:347 [inline] RSP: ffff8801c3e2f5c8 RIP: bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533 RSP: ffff8801c3e2f5c8 ---[ end trace c8d37f339dc64004 ]--- Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption") Fixes: 1969db47f8d0 ("bpf: fix verifier memory leaks") Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03bpf: fix out-of-bounds access warning in bpf_checkArnd Bergmann
The bpf_verifer_ops array is generated dynamically and may be empty depending on configuration, which then causes an out of bounds access: kernel/bpf/verifier.c: In function 'bpf_check': kernel/bpf/verifier.c:4320:29: error: array subscript is above array bounds [-Werror=array-bounds] This adds a check to the start of the function as a workaround. I would assume that the function is never called in that configuration, so the warning is probably harmless. Fixes: 00176a34d9e2 ("bpf: remove the verifier ops from program structure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03bpf: fix link error without CONFIG_NETArnd Bergmann
I ran into this link error with the latest net-next plus linux-next trees when networking is disabled: kernel/bpf/verifier.o:(.rodata+0x2958): undefined reference to `tc_cls_act_analyzer_ops' kernel/bpf/verifier.o:(.rodata+0x2970): undefined reference to `xdp_analyzer_ops' It seems that the code was written to deal with varying contents of the arrray, but the actual #ifdef was missing. Both tc_cls_act_analyzer_ops and xdp_analyzer_ops are defined in the core networking code, so adding a check for CONFIG_NET seems appropriate here, and I've verified this with many randconfig builds Fixes: 4f9218aaf8a4 ("bpf: move knowledge about post-translation offsets out of verifier") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02rcu: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-02Merge tag 'irqchip-4.15-2' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull the second batch of irqchip updates for 4.15 from marc Zyngier: - A number of MIPS GIC updates and cleanups - One GICv4 update - Another firmware workaround for GICv2 - Support for Mason8 GPIOs - Tiny documentation fix
2017-11-02Merge tag 'spdx_identifiers-4.14-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull initial SPDX identifiers from Greg KH: "License cleanup: add SPDX license identifiers to some files Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: License cleanup: add SPDX license identifier to uapi header files with a license License cleanup: add SPDX license identifier to uapi header files with no license License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02Merge tag 'v4.14-rc3' into irq/irqchip-4.15Marc Zyngier
Required merge to get mainline irqchip updates. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>