summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-06-07cgroup: let a symlink too be created with a cftype fileAngelo Ruocco
This commit enables a cftype to have a symlink (of any name) that points to the file associated with the cftype. Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-06bpf: fix unconnected udp hooksDaniel Borkmann
Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently to applications as also stated in original motivation in 7828f20e3779 ("Merge branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter two hooks into Cilium to enable host based load-balancing with Kubernetes, I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes typically sets up DNS as a service and is thus subject to load-balancing. Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API is currently insufficient and thus not usable as-is for standard applications shipped with most distros. To break down the issue we ran into with a simple example: # cat /etc/resolv.conf nameserver 147.75.207.207 nameserver 147.75.207.208 For the purpose of a simple test, we set up above IPs as service IPs and transparently redirect traffic to a different DNS backend server for that node: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 The attached BPF program is basically selecting one of the backends if the service IP/port matches on the cgroup hook. DNS breaks here, because the hooks are not transparent enough to applications which have built-in msg_name address checks: # nslookup 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ;; connection timed out; no servers could be reached # dig 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; connection timed out; no servers could be reached For comparison, if none of the service IPs is used, and we tell nslookup to use 8.8.8.8 directly it works just fine, of course: # nslookup 1.1.1.1 8.8.8.8 1.1.1.1.in-addr.arpa name = one.one.one.one. In order to fix this and thus act more transparent to the application, this needs reverse translation on recvmsg() side. A minimal fix for this API is to add similar recvmsg() hooks behind the BPF cgroups static key such that the program can track state and replace the current sockaddr_in{,6} with the original service IP. From BPF side, this basically tracks the service tuple plus socket cookie in an LRU map where the reverse NAT can then be retrieved via map value as one example. Side-note: the BPF cgroups static key should be converted to a per-hook static key in future. Same example after this fix: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 Lookups work fine now: # nslookup 1.1.1.1 1.1.1.1.in-addr.arpa name = one.one.one.one. Authoritative answers can be found from: # dig 1.1.1.1 ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;1.1.1.1. IN A ;; AUTHORITY SECTION: . 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400 ;; Query time: 17 msec ;; SERVER: 147.75.207.207#53(147.75.207.207) ;; WHEN: Tue May 21 12:59:38 UTC 2019 ;; MSG SIZE rcvd: 111 And from an actual packet level it shows that we're using the back end server when talking via 147.75.207.20{7,8} front end: # tcpdump -i any udp [...] 12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) [...] In order to be flexible and to have same semantics as in sendmsg BPF programs, we only allow return codes in [1,1] range. In the sendmsg case the program is called if msg->msg_name is present which can be the case in both, connected and unconnected UDP. The former only relies on the sockaddr_in{,6} passed via connect(2) if passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar way to call into the BPF program whenever a non-NULL msg->msg_name was passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note that for TCP case, the msg->msg_name is ignored in the regular recvmsg path and therefore not relevant. For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE, the hook is not called. This is intentional as it aligns with the same semantics as in case of TCP cgroup BPF hooks right now. This might be better addressed in future through a different bpf_attach_type such that this case can be distinguished from the regular recvmsg paths, for example. Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-05cgroup: css_task_iter_skip()'d iterators must be advanced before accessedTejun Heo
b636fd38dc40 ("cgroup: Implement css_task_iter_skip()") introduced css_task_iter_skip() which is used to fix task iterations skipping dying threadgroup leaders with live threads. Skipping is implemented as a subportion of full advancing but css_task_iter_next() forgot to fully advance a skipped iterator before determining the next task to visit causing it to return invalid task pointers. Fix it by making css_task_iter_next() fully advance the iterator if it has been skipped since the previous iteration. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: syzbot Link: http://lkml.kernel.org/r/00000000000097025d058a7fd785@google.com Fixes: b636fd38dc40 ("cgroup: Implement css_task_iter_skip()")
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 315 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436Thomas Gleixner
Based on 1 normalized pattern(s): distributed under the terms of the gnu gpl version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 2 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190115.032570679@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430Thomas Gleixner
Based on 1 normalized pattern(s): distribute under gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 8 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190114.475576622@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428Thomas Gleixner
Based on 1 normalized pattern(s): this file is released under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 68 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 363Thomas Gleixner
Based on 1 normalized pattern(s): released under terms in gpl version 2 see copying extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 5 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081035.689962394@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 136 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190530000436.384967451@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 64 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282Thomas Gleixner
Based on 1 normalized pattern(s): this software is licensed under the terms of the gnu general public license version 2 as published by the free software foundation and may be copied distributed and modified under those terms this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 285 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141900.642774971@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05signal: improve commentsChristian Brauner
Improve the comments for pidfd_send_signal(). First, the comment still referred to a file descriptor for a process as a "task file descriptor" which stems from way back at the beginning of the discussion. Replace this with "pidfd" for consistency. Second, the wording for the explanation of the arguments to the syscall was a bit inconsistent, e.g. some used the past tense some used present tense. Make the wording more consistent. Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-03x86/power: Fix 'nosmt' vs hibernation triple fault during resumeJiri Kosina
As explained in 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command-line, all the HT siblings are as a result sitting in mwait or cpudile after going through the online-offline cycle at least once. This causes a serious issue though when a kernel, which saw 'nosmt' on its commandline, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on address which is no longer valid. That results in triple fault shortly after cr3 is switched, and machine reboots. Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetricaly, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait. Cc: 4.19+ <stable@vger.kernel.org> # v4.19+ Debugged-by: Thomas Gleixner <tglx@linutronix.de> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-03PM: sleep: Add kerneldoc comments to some functionsRafael J. Wysocki
Add kerneldoc comments to pm_suspend_via_firmware(), pm_resume_via_firmware() and pm_suspend_via_s2idle() to explain what they do. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's a bunch of ring-buffer ordering fixes for a reproducible bug, plus a PEBS constraints regression fix. Plus tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools headers UAPI: Sync kvm.h headers with the kernel sources perf record: Fix s390 missing module symbol and warning for non-root users perf machine: Read also the end of the kernel perf test vmlinux-kallsyms: Ignore aliases to _etext when searching on kallsyms perf session: Add missing swap ops for namespace events perf namespace: Protect reading thread's namespace tools headers UAPI: Sync drm/drm.h with the kernel tools headers UAPI: Sync drm/i915_drm.h with the kernel tools headers UAPI: Sync linux/fs.h with the kernel tools headers UAPI: Sync linux/sched.h with the kernel tools arch x86: Sync asm/cpufeatures.h with the with the kernel tools include UAPI: Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel perf data: Fix 'strncat may truncate' build failure with recent gcc perf/ring-buffer: Use regular variables for nesting perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data perf/ring_buffer: Add ordering to rb->nest increment perf/ring_buffer: Fix exposing a temporarily decreased data_head perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
2019-06-02Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull stacktrace fix from Ingo Molnar: "Fix a stack_trace_save_tsk_reliable() regression" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stacktrace: Unbreak stack_trace_save_tsk_reliable()
2019-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Various fixes and followups" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, compaction: make sure we isolate a valid PFN include/linux/generic-radix-tree.h: fix kerneldoc comment kernel/signal.c: trace_signal_deliver when signal_group_exit drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used spdxcheck.py: fix directory structures kasan: initialize tag to 0xff in __kasan_kmalloc z3fold: fix sheduling while atomic scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set mm/gup: continue VM_FAULT_RETRY processing even for pre-faults ocfs2: fix error path kobject memory leak memcg: make it work on sparse non-0-node systems mm, memcg: consider subtrees in memory.events prctl_set_mm: downgrade mmap_sem to read lock prctl_set_mm: refactor checks from validate_prctl_map kernel/fork.c: make max_threads symbol static arch/arm/boot/compressed/decompress.c: fix build error due to lz4 changes arch/parisc/configs/c8000_defconfig: remove obsoleted CONFIG_DEBUG_SLAB_LEAK mm/vmalloc.c: fix typo in comment lib/sort.c: fix kernel-doc notation warnings mm: fix Documentation/vm/hmm.rst Sphinx warnings
2019-06-01kernel/signal.c: trace_signal_deliver when signal_group_exitZhenliang Wei
In the fixes commit, removing SIGKILL from each thread signal mask and executing "goto fatal" directly will skip the call to "trace_signal_deliver". At this point, the delivery tracking of the SIGKILL signal will be inaccurate. Therefore, we need to add trace_signal_deliver before "goto fatal" after executing sigdelset. Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info. Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT") Signed-off-by: Zhenliang Wei <weizhenliang@huawei.com> Reviewed-by: Christian Brauner <christian@brauner.io> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ivan Delalande <colona@arista.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01mm, memcg: consider subtrees in memory.eventsChris Down
memory.stat and other files already consider subtrees in their output, and we should too in order to not present an inconsistent interface. The current situation is fairly confusing, because people interacting with cgroups expect hierarchical behaviour in the vein of memory.stat, cgroup.events, and other files. For example, this causes confusion when debugging reclaim events under low, as currently these always read "0" at non-leaf memcg nodes, which frequently causes people to misdiagnose breach behaviour. The same confusion applies to other counters in this file when debugging issues. Aggregation is done at write time instead of at read-time since these counters aren't hot (unlike memory.stat which is per-page, so it does it at read time), and it makes sense to bundle this with the file notifications. After this patch, events are propagated up the hierarchy: [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events low 0 high 0 max 0 oom 0 oom_kill 0 [root@ktst ~]# systemd-run -p MemoryMax=1 true Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events low 0 high 0 max 7 oom 1 oom_kill 1 As this is a change in behaviour, this can be reverted to the old behaviour by mounting with the `memory_localevents' flag set. However, we use the new behaviour by default as there's a lack of evidence that there are any current users of memory.events that would find this change undesirable. akpm: this is a behaviour change, so Cc:stable. THis is so that forthcoming distros which use cgroup v2 are more likely to pick up the revised behaviour. Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01prctl_set_mm: downgrade mmap_sem to read lockMichal Koutný
The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap semaphore taken.") added synchronization of reading argument/environment boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct") avoided the coarse use of mmap_sem in similar situations. But there still remained two places that (mis)use mmap_sem. get_cmdline should also use arg_lock instead of mmap_sem when it reads the boundaries. The second place that should use arg_lock is in prctl_set_mm. By protecting the boundaries fields with the arg_lock, we can downgrade mmap_sem to reader lock (analogous to what we already do in prctl_set_mm_map). [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20190502125203.24014-3-mkoutny@suse.com Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Co-developed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01prctl_set_mm: refactor checks from validate_prctl_mapMichal Koutný
Despite comment of validate_prctl_map claims there are no capability checks, it is not completely true since commit 4d28df6152aa ("prctl: Allow local CAP_SYS_ADMIN changing exe_file"). Extract the check out of the function and make the function perform purely arithmetic checks. This patch should not change any behavior, it is mere refactoring for following patch. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20190502125203.24014-2-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01kernel/fork.c: make max_threads symbol staticKefeng Wang
Fix build warning, kernel/fork.c:125:5: warning: symbol 'max_threads' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20190516015118.140561-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-31cgroup: Include dying leaders with live threads in PROCS iterationsTejun Heo
CSS_TASK_ITER_PROCS currently iterates live group leaders; however, this means that a process with dying leader and live threads will be skipped. IOW, cgroup.procs might be empty while cgroup.threads isn't, which is confusing to say the least. Fix it by making cset track dying tasks and include dying leaders with live threads in PROCS iteration. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com>
2019-05-31cgroup: Implement css_task_iter_skip()Tejun Heo
When a task is moved out of a cset, task iterators pointing to the task are advanced using the normal css_task_iter_advance() call. This is fine but we'll be tracking dying tasks on csets and thus moving tasks from cset->tasks to (to be added) cset->dying_tasks. When we remove a task from cset->tasks, if we advance the iterators, they may move over to the next cset before we had the chance to add the task back on the dying list, which can allow the task to escape iteration. This patch separates out skipping from advancing. Skipping only moves the affected iterators to the next pointer rather than fully advancing it and the following advancing will recognize that the cursor has already been moved forward and do the rest of advancing. This ensures that when a task moves from one list to another in its cset, as long as it moves in the right direction, it's always visible to iteration. This doesn't cause any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
2019-05-31cgroup: Call cgroup_release() before __exit_signal()Tejun Heo
cgroup_release() calls cgroup_subsys->release() which is used by the pids controller to uncharge its pid. We want to use it to manage iteration of dying tasks which requires putting it before __unhash_process(). Move cgroup_release() above __exit_signal(). While this makes it uncharge before the pid is freed, pid is RCU freed anyway and the window is very narrow. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
2019-05-31Merge tag 'pm-5.2-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix three issues in the system-wide suspend and hibernation area related to PCI device PM handling by suspend-to-idle, device wakeup optimizations and arbitrary differences between suspend and hiberantion. Specifics: - Modify the PCI bus type's PM code to avoid putting devices left by their drivers in D0 on purpose during suspend to idle into low-power states as doing that may confuse the system resume callbacks of the drivers in question (Rafael Wysocki). - Avoid checking ACPI wakeup configuration during system-wide suspend for suspended devices that do not use ACPI-based wakeup to allow them to stay in suspend more often (Rafael Wysocki). - The last phase of hibernation is analogous to system-wide suspend also because on platforms with ACPI it passes control to the platform firmware to complete the transision, so make it indicate that by calling pm_set_suspend_via_firmware() to allow the drivers that care about this to do the right thing (Rafael Wysocki)" * tag 'pm-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PCI: PM: Avoid possible suspend-to-idle issue ACPI: PM: Call pm_set_suspend_via_firmware() during hibernation ACPI/PCI: PM: Add missing wakeup.flags.valid checks
2019-05-31Merge tag 'spdx-5.2-rc3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Here is another set of reviewed patches that adds SPDX tags to different kernel files, based on a set of rules that are being used to parse the comments to try to determine that the license of the file is "GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of these matches are included here, a number of "non-obvious" variants of text have been found but those have been postponed for later review and analysis. There is also a patch in here to add the proper SPDX header to a bunch of Kbuild files that we have missed in the past due to new files being added and forgetting that Kbuild uses two different file names for Makefiles. This issue was reported by the Kbuild maintainer. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers" * tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (82 commits) treewide: Add SPDX license identifier - Kbuild treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 222 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 221 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 220 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 218 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 217 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 216 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 215 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 214 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 213 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 211 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 210 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 207 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 203 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 ...
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 107 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528171438.615055994@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 170Thomas Gleixner
Based on 1 normalized pattern(s): this file is release under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070034.216732358@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157Thomas Gleixner
Based on 3 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [graeme] [gregory] [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema] [hk] [hemahk]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1105 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1334 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFOEric W. Biederman
Recently syzbot in conjunction with KMSAN reported that ptrace_peek_siginfo can copy an uninitialized siginfo to userspace. Inspecting ptrace_peek_siginfo confirms this. The problem is that off when initialized from args.off can be initialized to a negaive value. At which point the "if (off >= 0)" test to see if off became negative fails because off started off negative. Prevent the core problem by adding a variable found that is only true if a siginfo is found and copied to a temporary in preparation for being copied to userspace. Prevent args.off from being truncated when being assigned to off by testing that off is <= the maximum possible value of off. Convert off to an unsigned long so that we should not have to truncate args.off, we have well defined overflow behavior so if we add another check we won't risk fighting undefined compiler behavior, and so that we have a type whose maximum value is easy to test for. Cc: Andrei Vagin <avagin@gmail.com> Cc: stable@vger.kernel.org Reported-by: syzbot+0d602a1b0d8c95bdf299@syzkaller.appspotmail.com Fixes: 84c751bd4aeb ("ptrace: add ability to retrieve signals without removing from a queue (v4)") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-05-29Merge tag 'trace-v5.2-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "This fixes a memory leak from the error path in the event filter logic" * tag 'trace-v5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Avoid memory leak in predicate_parse()
2019-05-28Merge tag 'perf-urgent-for-mingo-5.2-20190528' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes: BPF: Jiri Olsa: - Fixup determination of end of kernel map, to avoid having BPF programs, that are after the kernel headers and just before module texts mixed up in the kernel map. tools UAPI header copies: Arnaldo Carvalho de Melo: - Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls. - Sync cpufeatures.h, sched.h, fs.h, drm.h, i915_drm.h and kvm.h headers. Namespaces: Namhyung Kim: - Add missing byte swap ops for namespace events when processing records from perf.data files that could have been recorded in a arch with a different endianness. - Fix access to the thread namespaces list by using the namespaces_lock. perf data: Shawn Landden: - Fix 'strncat may truncate' build failure with recent gcc. s/390 Thomas Richter: - Fix s390 missing module symbol and warning for non-root users in 'perf record'. arm64: Vitaly Chikunov: - Fix mksyscalltbl when system kernel headers are ahead of the kernel. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-28tracing: Avoid memory leak in predicate_parse()Tomas Bortoli
In case of errors, predicate_parse() goes to the out_free label to free memory and to return an error code. However, predicate_parse() does not free the predicates of the temporary prog_stack array, thence leaking them. Link: http://lkml.kernel.org/r/20190528154338.29976-1-tomasbortoli@gmail.com Cc: stable@vger.kernel.org Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster") Reported-by: syzbot+6b8e0fb820e570c59e19@syzkaller.appspotmail.com Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> [ Added protection around freeing prog_stack[i].pred ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-27ACPI: PM: Call pm_set_suspend_via_firmware() during hibernationRafael J. Wysocki
On systems with ACPI platform firmware the last stage of hibernation is analogous to system suspend to S3 (suspend-to-RAM), so it should be handled analogously. In particular, pm_suspend_via_firmware() should return 'true' in that stage to let the callers of it know that control will be passed to the platform firmware going forward, so pm_set_suspend_via_firmware() needs to be called then in analogy with acpi_suspend_begin(). However, the platform hibernation ->begin() callback is invoked during the "freeze" transition (before creating a snapshot image of system memory) as well as during the "hibernate" transition which is the last stage of it and pm_set_suspend_via_firmware() should be invoked by that callback in the latter stage only. In order to implement that redefine the hibernation ->begin() callback to take a pm_message_t argument to indicate which stage of hibernation is taking place and rework acpi_hibernation_begin() and acpi_hibernation_begin_old() to take it into account as needed. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-05-26Merge tag 'trace-v5.2-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing warning fix from Steven Rostedt: "Make the GCC 9 warning for sub struct memset go away. GCC 9 now warns about calling memset() on partial structures when it goes across multiple fields. This adds a helper for the place in tracing that does this type of clearing of a structure" * tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Silence GCC 9 array bounds warning
2019-05-25tracing: Silence GCC 9 array bounds warningMiguel Ojeda
Starting with GCC 9, -Warray-bounds detects cases when memset is called starting on a member of a struct but the size to be cleared ends up writing over further members. Such a call happens in the trace code to clear, at once, all members after and including `seq` on struct trace_iterator: In function 'memset', inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3: ./include/linux/string.h:344:9: warning: '__builtin_memset' offset [8505, 8560] from the object at 'iter' is out of the bounds of referenced subobject 'seq' with type 'struct trace_seq' at offset 4368 [-Warray-bounds] 344 | return __builtin_memset(p, c, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In order to avoid GCC complaining about it, we compute the address ourselves by adding the offsetof distance instead of referring directly to the member. Since there are two places doing this clear (trace.c and trace_kdb.c), take the chance to move the workaround into a single place in the internal header. Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [ Removed unnecessary parenthesis around "iter" ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25Merge tag 'trace-v5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Tom Zanussi sent me some small fixes and cleanups to the histogram code and I forgot to incorporate them. I also added a small clean up patch that was sent to me a while ago and I just noticed it" * tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: kernel/trace/trace.h: Remove duplicate header of trace_seq.h tracing: Add a check_val() check before updating cond_snapshot() track_val tracing: Check keys for variable references in expressions too tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
2019-05-24Merge tag 'spdx-5.2-rc2-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pule more SPDX updates from Greg KH: "Here is another set of reviewed patches that adds SPDX tags to different kernel files, based on a set of rules that are being used to parse the comments to try to determine that the license of the file is "GPL-2.0-or-later". Only the "obvious" versions of these matches are included here, a number of "non-obvious" variants of text have been found but those have been postponed for later review and analysis. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers" * tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (85 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 125 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 123 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 122 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 121 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 120 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 119 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 116 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 114 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 113 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 111 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 110 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 106 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 105 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 103 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 101 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98 ...
2019-05-24locking/lock_events: Use this_cpu_add() when necessaryWaiman Long
The kernel test robot has reported that the use of __this_cpu_add() causes bug messages like: BUG: using __this_cpu_add() in preemptible [00000000] code: ... Given the imprecise nature of the count and the possibility of resetting the count and doing the measurement again, this is not really a big problem to use the unprotected __this_cpu_*() functions. To make the preemption checking code happy, the this_cpu_*() functions will be used if CONFIG_DEBUG_PREEMPT is defined. The imprecise nature of the locking counts are also documented with the suggestion that we should run the measurement a few times with the counts reset in between to get a better picture of what is going on under the hood. Fixes: a8654596f0371 ("locking/rwsem: Enable lock event counting") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-24kheaders: Do not regenerate archive if config is not changedJoel Fernandes (Google)
Linus reported an issue that doing an allmodconfig was causing the kheaders archive to be regenerated even though the config is the same. This patch fixes the issue by ignoring the config-related header files for "knowing when to regenerate based on timestamps". Instead, if the CONFIG_X_Y option really changes, then we there are the include/config/X/Y.h which will already tells us "if a config really changed". So we don't really need these files for regeneration detection anyway, and ignoring them fixes Linus's issue. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24kheaders: Move from proc to sysfsJoel Fernandes (Google)
The kheaders archive consisting of the kernel headers used for compiling bpf programs is in /proc. However there is concern that moving it here will make it permanent. Let us move it to /sys/kernel as discussed [1]. [1] https://lore.kernel.org/patchwork/patch/1067310/#1265969 Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 38Thomas Gleixner
Based on 1 normalized pattern(s): this file is released under the gplv2 and any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170857.732920462@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public licence as published by the free software foundation either version 2 of the licence or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 114 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24perf/ring-buffer: Use regular variables for nestingPeter Zijlstra
While the IRQ/NMI will nest, the nest-count will be invariant over the actual exception, since it will decrement equal to increment. This means we can -- carefully -- use a regular variable since the typical LOAD-STORE race doesn't exist (similar to preempt_count). This optimizes the ring-buffer for all LOAD-STORE architectures, since they need to use atomic ops to implement local_t. Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mark.rutland@arm.com Cc: namhyung@kernel.org Cc: yabinc@google.com Link: http://lkml.kernel.org/r/20190517115418.481392777@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-24perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page dataPeter Zijlstra
We must use {READ,WRITE}_ONCE() on rb->user_page data such that concurrent usage will see whole values. A few key sites were missing this. Suggested-by: Yabin Cui <yabinc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mark.rutland@arm.com Cc: namhyung@kernel.org Fixes: 7b732a750477 ("perf_counter: new output ABI - part 1") Link: http://lkml.kernel.org/r/20190517115418.394192145@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-24perf/ring_buffer: Add ordering to rb->nest incrementPeter Zijlstra
Similar to how decrementing rb->next too early can cause data_head to (temporarily) be observed to go backward, so too can this happen when we increment too late. This barrier() ensures the rb->head load happens after the increment, both the one in the 'goto again' path, as the one from perf_output_get_handle() -- albeit very unlikely to matter for the latter. Suggested-by: Yabin Cui <yabinc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mark.rutland@arm.com Cc: namhyung@kernel.org Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables") Link: http://lkml.kernel.org/r/20190517115418.309516009@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-24perf/ring_buffer: Fix exposing a temporarily decreased data_headYabin Cui
In perf_output_put_handle(), an IRQ/NMI can happen in below location and write records to the same ring buffer: ... local_dec_and_test(&rb->nest) ... <-- an IRQ/NMI can happen here rb->user_page->data_head = head; ... In this case, a value A is written to data_head in the IRQ, then a value B is written to data_head after the IRQ. And A > B. As a result, data_head is temporarily decreased from A to B. And a reader may see data_head < data_tail if it read the buffer frequently enough, which creates unexpected behaviors. This can be fixed by moving dec(&rb->nest) to after updating data_head, which prevents the IRQ/NMI above from updating data_head. [ Split up by peterz. ] Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: mark.rutland@arm.com Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables") Link: http://lkml.kernel.org/r/20190517115418.224478157@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>