path: root/kernel
2021-02-02tracing: Remove definition of DEBUG in trace_mmiotrace.cTom Rix
Defining DEBUG should only be done in development. So remove DEBUG. Link: https://lkml.kernel.org/r/20210115153348.131791-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Fix a kernel doc warningBean Huo
Add description for trace_array_put() parameter. kernel/trace/trace.c:464: warning: Function parameter or member 'this_tr' not described in 'trace_array_put' Link: https://lkml.kernel.org/r/20210112111202.23508-1-huobean@gmail.com Signed-off-by: Bean Huo <beanhuo@micron.com> [ Merged as one of the original fixes was already fixed by someone else ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Fix spelling of controlling in uprobesBhaskar Chowdhury
s/controling/controlling/p Link: https://lkml.kernel.org/r/20210112045008.29834-1-unixbhaskar@gmail.com Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Fix spelling mistake in Kconfig "infinit" -> "infinite"Colin Ian King
There is a spelling mistake in the Kconfig help text. Fix it. Link: https://lkml.kernel.org/r/20201216114051.12056-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Use in_serving_softirq() to deduct softirq status.Sebastian Andrzej Siewior
PREEMPT_RT does not report "serving softirq" because the tracing core looks at the preemption counter, while PREEMPT_RT does not update it while processing softirqs in order to remain preemptible. The information is stored somewhere else. The in_serving_softirq() macro and the SOFTIRQ_OFFSET define still work, but not on the preempt counter. Use the in_serving_softirq() macro, which works on PREEMPT_RT. On !PREEMPT_RT the compiler (gcc-10 / clang-11) is smart enough to optimize the in_serving_softirq() related read of the preemption counter away. The only difference I noticed when using in_serving_softirq() on !PREEMPT_RT is that gcc-10 implemented tracing_gen_ctx_flags() as reading FLAG, jmp _tracing_gen_ctx_flags(). Without in_serving_softirq() it inlined _tracing_gen_ctx_flags() into tracing_gen_ctx_flags(). Link: https://lkml.kernel.org/r/20210125194511.3924915-4-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Inline tracing_gen_ctx_flags()Sebastian Andrzej Siewior
Inline tracing_gen_ctx_flags(). This allows having a single ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT. It requires moving `trace_flag_type' so that tracing_gen_ctx_flags() can use it. Link: https://lkml.kernel.org/r/20210125194511.3924915-3-bigeasy@linutronix.de Suggested-by: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20210125140323.6b1ff20c@gandalf.local.home Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Merge irqflags + preempt counter.Sebastian Andrzej Siewior
The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: the on/off state. The complete 32 bits of the preemption counter aren't needed either; only whether the upper bits (softirq, hardirq and NMI) are set and the preemption depth are needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8 bits as the disabled-preemption depth (considering that one must be subtracted from the counter in one special case). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch(), which explicitly disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer also has the `pc' and `flags' members, which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assigning values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags(), which replaces the preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke:
- trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update()
- event_trigger_unlock_commit()
  -> ftrace_trace_stack() -> … -> tracing_generic_entry_update()
  -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update()
In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch:

      text     data     bss      dec     hex filename
  21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old
  21970293 17084168 7639260 46693721 2c87d59 vmlinux.new

text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
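For illustration, a minimal userspace sketch of folding the irq state and the preemption depth into a single `trace_ctx' integer; the flag values and helper name below are assumptions, not the kernel's actual TRACE_FLAG_* definitions:

    #include <stdio.h>

    /* Illustrative flag bits; the real TRACE_FLAG_* values live in the kernel. */
    enum {
            CTX_IRQS_OFF = 0x01,
            CTX_HARDIRQ  = 0x02,
            CTX_SOFTIRQ  = 0x04,
            CTX_NMI      = 0x08,
    };

    /* Evaluate the state once, pack the flags into the upper bits and the
     * disabled-preemption depth into the lower 8 bits, then reuse the result
     * for every trace entry of the event. */
    static unsigned int gen_trace_ctx(unsigned int flags, unsigned int preempt_depth)
    {
            return (flags << 8) | (preempt_depth & 0xff);
    }

    int main(void)
    {
            unsigned int trace_ctx = gen_trace_ctx(CTX_SOFTIRQ, 2);

            printf("flags=0x%x depth=%u\n", trace_ctx >> 8, trace_ctx & 0xff);
            return 0;
    }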
2021-02-02ring-buffer: Drop unneeded check in ring_buffer_resize()Qiujun Huang
Remove the cpumask check, as it has already been done at the beginning of the function. Also fix a typo. s/also the on the/also on the/ Link: https://lkml.kernel.org/r/20201224144634.3210-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02ring-buffer: Remove cpu_buffer argument from the rb_inc_page()Qiujun Huang
The cpu_buffer argument has not been used inside rb_inc_page() since commit 3adc54fa82a6 ("ring-buffer: make the buffer a true circular link list"). The cpu_buffer argument is also unused inside the two functions rb_is_head_page() and rb_set_list_to_head(). Link: https://lkml.kernel.org/r/20201225140356.23008-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Remove get/put_cpu() from function_trace_initQiujun Huang
Since commit b6f11df26fdc ("trace: Call tracing_reset_online_cpus before tracer->init()"), get/put_cpu() are not needed anymore. We can use raw_smp_processor_id() instead. Link: https://lkml.kernel.org/r/20201230140521.31920-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02tracing: Update trace_ignore_this_task() kernel-doc commentQiujun Huang
Update kernel-doc parameter after commit b3b1e6ededa4 ("ftrace: Create set_ftrace_notrace_pid to not trace tasks") added @filtered_no_pids. Link: https://lkml.kernel.org/r/20201231153558.4804-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02Merge tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fix from Christoph Hellwig: "Fix a kernel crash in the new dma-mapping benchmark test (Barry Song)" * tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: benchmark: fix kernel crash when dma_map_single fails
2021-02-02Merge tag 'net-5.11-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.11-rc7, including fixes from bpf and mac80211 trees. Current release - regressions: - ip_tunnel: fix mtu calculation - mlx5: fix function calculation for page trees Previous releases - regressions: - vsock: fix the race conditions in multi-transport support - neighbour: prevent a dead entry from updating gc_list - dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add Previous releases - always broken: - bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt infra when user space is trying to race against optlen, from Loris Reiff. - bpf: add missing fput() in BPF inode storage map update helper - udp: ipv4: manipulate network header of NATed UDP GRO fraglist - mac80211: fix station rate table updates on assoc - r8169: work around RTL8125 UDP HW bug - igc: report speed and duplex as unknown when device is runtime suspended - rxrpc: fix deadlock around release of dst cached on udp tunnel" * tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary net: ipa: fix two format specifier errors net: ipa: use the right accessor in ipa_endpoint_status_skip() net: ipa: be explicit about endianness net: ipa: add a missing __iomem attribute net: ipa: pass correct dma_handle to dma_free_coherent() r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS net: mvpp2: TCAM entry enable should be written after SRAM data net: lapb: Copy the skb before sending a packet net/mlx5e: Release skb in case of failure in tc update skb net/mlx5e: Update max_opened_tc also when channels are closed net/mlx5: Fix leak upon failure of rule creation net/mlx5: Fix function calculation for page trees docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc udp: ipv4: manipulate network header of NATed UDP GRO fraglist net: ip_tunnel: fix mtu calculation vsock: fix the race conditions in multi-transport support net: sched: replaced invalid qdisc tree flush helper in qdisc_replace ibmvnic: device remove has higher precedence over reset ...
2021-02-01perf/core: Add PERF_SAMPLE_WEIGHT_STRUCTKan Liang
The current PERF_SAMPLE_WEIGHT sample type is very useful to express the cost of an action represented by the sample. This allows the profiler to scale the samples to be more informative to the programmer. It could also help to locate a hotspot, e.g., when profiling by memory latencies, the expensive load appears higher up in the histograms. But the current PERF_SAMPLE_WEIGHT sample type is solely determined by one factor. This could be a problem if users want two or more factors to contribute to the weight. For example, the Golden Cove core PMU can provide both the instruction latency and the cache latency information as factors for memory profiling. For current X86 platforms, although meminfo::latency is defined as a u64, only the lower 32 bits include valid data in practice (no memory access could last longer than 4G cycles). The higher 32 bits can be used to store new factors. Add a new sample type, PERF_SAMPLE_WEIGHT_STRUCT, to indicate the new sample weight structure. It shares the same space as the PERF_SAMPLE_WEIGHT sample type. Users can apply either the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but they cannot apply both sample types simultaneously. Currently, only X86 and PowerPC use the PERF_SAMPLE_WEIGHT sample type. - For PowerPC, nothing changes for the PERF_SAMPLE_WEIGHT sample type and the new PERF_SAMPLE_WEIGHT_STRUCT sample type has no effect. PowerPC can restructure the weight field similarly later. - For X86, the same value will be dumped for the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type for now. The following patches will apply the new factors for the PERF_SAMPLE_WEIGHT_STRUCT sample type. The field in the union perf_sample_weight should be shared among different architectures. A generic name is required, but it's hard to abstract a name that applies to all architectures. For example, on X86 the fields are used to store all kinds of latency, while on PowerPC it stores MMCRA[TECX/TECM], which is not a latency. So a general name prefix 'var$NUM' is used here. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1611873611-156687-2-git-send-email-kan.liang@linux.intel.com
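A rough sketch of the shared weight encoding described above; the field widths follow the changelog's 'var$NUM' naming and the 32/16/16 split it implies, but treat the exact layout as an assumption and consult the UAPI perf_event.h for the authoritative definition:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch only: one 64-bit weight shared between the plain and the
     * structured view; the low 32 bits keep the existing latency value,
     * the upper bits carry the additional factors. */
    union sample_weight {
            uint64_t full;            /* PERF_SAMPLE_WEIGHT view */
            struct {                  /* PERF_SAMPLE_WEIGHT_STRUCT view */
                    uint32_t var1_dw;
                    uint16_t var2_w;
                    uint16_t var3_w;
            };
    };

    int main(void)
    {
            union sample_weight w = { .full = 0 };

            w.var1_dw = 120;          /* e.g. memory access latency */
            w.var2_w  = 30;           /* e.g. instruction latency */
            printf("full=%llu\n", (unsigned long long)w.full);
            return 0;
    }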
2021-01-31Merge tag 'core-urgent-2021-01-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull single stepping fix from Thomas Gleixner: "A single fix for the single step reporting regression caused by getting the condition wrong when moving SYSCALL_EMU away from TIF flags" [ There's apparently another problem too, fix pending ] * tag 'core-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Unbreak single step reporting behaviour
2021-01-30genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is setMarc Zyngier
When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), __msi_domain_alloc_irqs() performs the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org
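A sketch of what such a per-vector iterator could look like, built from the existing per-descriptor fields; the macro body is an assumption based on the changelog, not a copy of the patch:

    /* Walk every Linux IRQ number behind every MSI descriptor of a device,
     * so early activation covers all vectors of a Multi-MSI allocation
     * instead of only the first one. */
    #define for_each_msi_vector(desc, __irq, dev)                           \
            for_each_msi_entry((desc), (dev))                               \
                    if ((desc)->irq)                                        \
                            for (__irq = (desc)->irq;                       \
                                 __irq < ((desc)->irq + (desc)->nvec_used); \
                                 __irq++)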
2021-01-29kretprobe: Avoid re-registration of the same kretprobe earlierWang ShaoBo
Our system encountered a re-init error when re-registering the same kretprobe, where the kretprobe_instance in rp->free_instances is illegally accessed after re-init. An implementation to avoid re-registration was introduced for kprobes earlier, but register_kretprobe() still lacks it. We must check whether the kprobe has been re-registered before re-initializing the kretprobe, otherwise re-initialization will destroy the data structures of the already registered kretprobe, which can lead to memory leaks, system crashes and other unexpected behaviors. We use check_kprobe_rereg() to check whether the kprobe has been re-registered before running register_kretprobe()'s body, giving a warning message and terminating the registration process. Link: https://lkml.kernel.org/r/20210128124427.2031088-1-bobo.shaobowang@huawei.com Cc: stable@vger.kernel.org Fixes: 1f0ab40976460 ("kprobes: Prevent re-registration of the same kprobe") [ The above commit should have been done for kretprobes too ] Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Cheng Jian <cj.chengjian@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
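As a hedged sketch, the guard described above would sit near the top of register_kretprobe(), mirroring what register_kprobe() already does; the exact condition and return value are assumptions:

    /* Refuse to re-initialize a kretprobe whose kprobe part is already
     * registered, instead of corrupting its free_instances state. */
    if (rp->kp.addr && check_kprobe_rereg(&rp->kp))
            return -EINVAL;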
2021-01-29Merge tag 'pm-5.11-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a deadlock in the 'kexec jump' code and address a possible hibernation image creation issue. Specifics: - Fix a deadlock caused by attempting to acquire the same mutex twice in a row in the "kexec jump" code (Baoquan He) - Modify the hibernation image saving code to flush the unwritten data to the swap storage later so as to avoid failing to write the image signature which is possible in some cases (Laurent Badel)" * tag 'pm-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: hibernate: flush swap writer after marking kernel: kexec: remove the lock operation of system_transition_mutex
2021-01-29tracing/kprobe: Fix to support kretprobe events on unloaded modulesMasami Hiramatsu
Fix kprobe_on_func_entry() to return an error code instead of false so that register_kretprobe() can return an appropriate error code. append_trace_kprobe() expects the kprobe registration to return -ENOENT when the target symbol is not found, and it checks whether the target module is unloaded or not. If the target module doesn't exist, it defers probing the target symbol until the module is loaded. However, since register_kretprobe() returns -EINVAL instead of -ENOENT in that case, it always fails to put the kretprobe event on unloaded modules. e.g.

Kprobe event:
/sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
[ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.

Kretprobe event: (p -> r)
/sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 41.122514] trace_kprobe: error: Failed to register probe event
  Command: r xfs:xfs_end_io
             ^

To fix this bug, change kprobe_on_func_entry() to detect symbol lookup failure and return -ENOENT in that case. Otherwise it returns -EINVAL or 0 (succeeded, the given address is on the entry). Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly") Reported-by: Jianlin Lv <Jianlin.Lv@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
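A sketch of the intended error-code contract for kprobe_on_func_entry(); the helper calls shown are assumptions drawn from the surrounding kprobes code, not a verbatim copy of the patch:

    /* 0       - the given symbol+offset is a function entry
     * -ENOENT - the symbol could not be resolved (e.g. module not loaded yet)
     * -EINVAL - the address is valid but not a function entry */
    int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym,
                             unsigned long offset)
    {
            kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);

            if (IS_ERR(kp_addr))
                    return PTR_ERR(kp_addr);        /* address lookup failed */

            if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
                    return -ENOENT;                 /* symbol not found */

            if (!arch_kprobe_on_func_entry(offset))
                    return -EINVAL;                 /* not on the entry */

            return 0;
    }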
2021-01-29tracing: Use pause-on-trace with the latency tracersViktor Rosendahl
Earlier, tracing was disabled when reading the trace file. This behavior was changed with: commit 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file"). This doesn't seem to work with the latency tracers. The above-mentioned commit not only changed the behavior but also added an option to emulate the old behavior. The idea with this patch is to enable this pause-on-trace option when the latency tracers are used. Link: https://lkml.kernel.org/r/20210119164344.37500-2-Viktor.Rosendahl@bmw.de Cc: stable@vger.kernel.org Fixes: 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file") Signed-off-by: Viktor Rosendahl <Viktor.Rosendahl@bmw.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-29fgraph: Initialize tracing_graph_pause at task creationSteven Rostedt (VMware)
On some archs, the idle task can call into cpu_suspend(). The cpu_suspend() will disable or pause function graph tracing, as there are some paths in bringing down the CPU that can have issues with its return address being modified. The task_struct structure has a "tracing_graph_pause" atomic counter; when it is set to something other than zero, the function graph tracer will not modify the return address. The problem is that the tracing_graph_pause counter is initialized when the function graph tracer is enabled. This can corrupt the counter for the idle task if it is suspended in these architectures.

  CPU 1                                    CPU 2
  -----                                    -----
  do_idle()
    cpu_suspend()
      pause_graph_tracing()
        task_struct->tracing_graph_pause++ (0 -> 1)

                                           start_graph_tracing()
                                             for_each_online_cpu(cpu) {
                                               ftrace_graph_init_idle_task(cpu)
                                                 task_struct->tracing_graph_pause = 0 (1 -> 0)

      unpause_graph_tracing()
        task_struct->tracing_graph_pause-- (0 -> -1)

The above should have gone from 1 to zero, and enabled function graph tracing again. But instead, it is set to -1, which keeps it disabled. There's no reason that the field tracing_graph_pause on the task_struct cannot be initialized at boot up. Cc: stable@vger.kernel.org Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339 Reported-by: pierre.gondois@arm.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
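A hedged sketch of the fix's idea: initialize the pause counter together with the rest of the task's graph-tracing state at task creation, so enabling the tracer later never resets a counter that is already in use. The function and field names exist in fgraph, but the exact hunk shown here is an assumption:

    void ftrace_graph_init_task(struct task_struct *t)
    {
            /* Make sure we do not inherit the parent's ret_stack */
            t->ret_stack = NULL;
            t->curr_ret_stack = -1;
            t->curr_ret_depth = -1;
            /* Sketch: set the pause counter here, at task creation, instead
             * of from the graph-tracer start path. */
            atomic_set(&t->tracing_graph_pause, 0);
    }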
2021-01-29locking/rwsem: Remove empty rwsem.hNikolay Borisov
This is a leftover from 7f26482a872c ("locking/percpu-rwsem: Remove the embedded rwsem") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/20210126101721.976027-1-nborisov@suse.com
2021-01-28Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-01-29 1) Fix two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt infra when user space is trying to race against optlen, from Loris Reiff. 2) Fix a missing fput() in BPF inode storage map update helper, from Pan Bian. 3) Fix a build error on unresolved symbols on disabled networking / keys LSM hooks, from Mikko Ylinen. 4) Fix preload BPF prog build when the output directory from make points to a relative path, from Quentin Monnet. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, preload: Fix build when $(O) points to a relative path bpf: Drop disabled LSM hooks from the sleepable set bpf, inode_storage: Put file handler if no storage was found bpf, cgroup: Fix problematic bounds check bpf, cgroup: Fix optlen WARN_ON_ONCE toctou ==================== Link: https://lore.kernel.org/r/20210129001556.6648-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29fs: Remove dcookies supportViresh Kumar
The dcookies stuff was only used by the kernel's old oprofile code. Now that oprofile's support is removed from the kernel, there is no need for dcookies as well. Remove it. Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Robert Richter <rric@kernel.org> Acked-by: William Cohen <wcohen@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2021-01-29bpf: Simplify cases in bpf_base_func_protoTobias Klauser
!perfmon_capable() is checked before the last switch(func_id) in bpf_base_func_proto. Thus, the cases BPF_FUNC_trace_printk and BPF_FUNC_snprintf_btf can be moved to that last switch(func_id) to omit the inline !perfmon_capable() checks. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210127174615.3038-1-tklauser@distanz.ch
2021-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/can/dev.c b552766c872f ("can: dev: prevent potential information leak in can_fill_info()") 3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir") 0a042c6ec991 ("can: dev: move netlink related code into seperate file") Code move. drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 57ac4a31c483 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down") 214baf22870c ("net/mlx5e: Support HTB offload") Adjacent code changes net/switchdev/switchdev.c 20776b465c0c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP") ffb68fc58e96 ("net: switchdev: remove the transaction structure from port object notifiers") bae33f2b5afe ("net: switchdev: remove the transaction structure from port attributes") Transaction parameter gets dropped otherwise keep the fix. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28entry: Unbreak single step reporting behaviourYuxuan Shui
The move of TIF_SYSCALL_EMU to SYSCALL_WORK_SYSCALL_EMU broke single step reporting. The original code reported the single step when TIF_SINGLESTEP was set and TIF_SYSCALL_EMU was not set. The SYSCALL_WORK conversion got the logic wrong and now the reporting only happens when both bits are set. Restore the original behaviour. [ tglx: Massaged changelog and dropped the pointless double negation ] Fixes: 64eb35f701f0 ("ptrace: Migrate TIF_SYSCALL_EMU to use SYSCALL_WORK flag") Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Link: https://lore.kernel.org/r/877do3gaq9.fsf@m5Zedd9JOGzJrf0
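The restored condition, reduced to its boolean core (a self-contained illustration; the real code reads the SYSCALL_WORK/TIF bits):

    #include <stdbool.h>

    /* Report a single step when single stepping is enabled and the syscall
     * is NOT being emulated - not only when both bits are set. */
    static bool report_single_step(bool singlestep, bool syscall_emu)
    {
            return singlestep && !syscall_emu;
    }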
2021-01-28locking/rtmutex: Add missing kernel-doc markupAlex Shi
To fix the following issues: kernel/locking/rtmutex.c:1612: warning: Function parameter or member 'lock' not described in '__rt_mutex_futex_unlock' kernel/locking/rtmutex.c:1612: warning: Function parameter or member 'wake_q' not described in '__rt_mutex_futex_unlock' kernel/locking/rtmutex.c:1675: warning: Function parameter or member 'name' not described in '__rt_mutex_init' kernel/locking/rtmutex.c:1675: warning: Function parameter or member 'key' not described in '__rt_mutex_init' [ tglx: Change rt lock to rt_mutex for consistency sake ] Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/1605257895-5536-2-git-send-email-alex.shi@linux.alibaba.com
2021-01-28futex: Remove unneeded gotosJangwoong Kim
Get rid of gotos that do not contain any cleanup. These were not removed in commit 9180bd467f9a ("futex: Remove put_futex_key()"). Signed-off-by: Jangwoong Kim <6812skiii@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201230122953.10473-1-6812skiii@gmail.com
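An illustrative before/after of the pattern being removed (not the exact futex hunk): the goto jumps straight to a bare return, so it can become a direct return.

    #include <stdio.h>

    static int op_before(unsigned int bitset)
    {
            int ret = -22;          /* -EINVAL */

            if (!bitset)
                    goto out;       /* no cleanup behind this label */
            ret = 0;
    out:
            return ret;
    }

    static int op_after(unsigned int bitset)
    {
            if (!bitset)
                    return -22;     /* -EINVAL: return directly instead */
            return 0;
    }

    int main(void)
    {
            printf("%d %d\n", op_before(0), op_after(0));
            return 0;
    }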
2021-01-28futex: Change utime parameter to be 'const ... *'Alejandro Colomar
futex(2) says that 'utime' is a pointer to 'const'. The implementation doesn't use 'const'; however, it _never_ modifies the contents of utime. - futex() either uses 'utime' as a pointer to struct or as a 'u32'. - In case it's used as a 'u32', it makes a copy of it, and of course it is not dereferenced. - In case it's used as a 'struct __kernel_timespec __user *', the pointer is not dereferenced inside the futex() definition, and it is only passed to a function: get_timespec64(), which accepts a 'const struct __kernel_timespec __user *'. [ tglx: Make the same change to the compat syscall and fixup the prototypes. ] Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201128123945.4592-1-alx.manpages@gmail.com
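A sketch of the constified prototype (native variant only; parameter names follow the description, and the compat prototype changes the same way):

    SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
                    const struct __kernel_timespec __user *, utime,
                    u32 __user *, uaddr2, u32, val3)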
2021-01-28genirq: Use new tasklet API for resend_taskletEmil Renner Berthing
This converts the resend_tasklet to use the new API in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") The new API changes the argument passed to the callback function, but fortunately the argument isn't used so it is straight forward to use DECLARE_TASKLET() rather than DECLARE_TASKLET_OLD(). Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210123182456.6521-1-esmil@mailme.dk
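An illustrative old-vs-new declaration; the callback names here are made up, but the signature change is the one the tasklet API commit describes:

    /* old API: the callback receives an opaque unsigned long it never used */
    static void resend_irqs_old(unsigned long data);
    DECLARE_TASKLET_OLD(resend_tasklet_old, resend_irqs_old);

    /* new API: the callback receives the tasklet itself, so the conversion
     * is mechanical when the argument is unused */
    static void resend_irqs(struct tasklet_struct *t);
    DECLARE_TASKLET(resend_tasklet, resend_irqs);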
2021-01-27audit: Make audit_filter_syscall() return voidYang Yang
No invoker uses the return value of audit_filter_syscall(). So make it return void, and amend the comment of audit_filter_syscall(). Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> [PM: removed the changelog from the description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-01-27bpf: Allow rewriting to ports under ip_unprivileged_port_startStanislav Fomichev
At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port to the privileged ones (< ip_unprivileged_port_start), but it will be rejected later on in the __inet_bind or __inet6_bind. Let's add another return value to indicate that CAP_NET_BIND_SERVICE check should be ignored. Use the same idea as we currently use in cgroup/egress where bit #1 indicates CN. Instead, for cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should be bypassed. v5: - rename flags to be less confusing (Andrey Ignatov) - rework BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY to work on flags and accept BPF_RET_SET_CN (no behavioral changes) v4: - Add missing IPv6 support (Martin KaFai Lau) v3: - Update description (Martin KaFai Lau) - Fix capability restore in selftest (Martin KaFai Lau) v2: - Switch to explicit return code (Martin KaFai Lau) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20210127193140.3170382-1-sdf@google.com
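A sketch of a cgroup/bind4 program using the return convention described above; the port value and the exact meaning of the return bits are assumptions drawn from this changelog rather than from the selftests:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("cgroup/bind4")
    int bind_v4_prog(struct bpf_sock_addr *ctx)
    {
            /* rewrite to a port below ip_unprivileged_port_start */
            ctx->user_port = bpf_htons(111);

            /* bit 0: allow the bind, bit 1: skip the CAP_NET_BIND_SERVICE check */
            return 3;
    }

    char _license[] SEC("license") = "GPL";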
2021-01-27bpf: Change 'BPF_ADD' to 'BPF_AND' in print_bpf_insn()Menglong Dong
This 'BPF_ADD' is duplicated, and I believe it should be 'BPF_AND'. Fixes: 981f94c3e921 ("bpf: Add bitwise atomic instructions") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/bpf/20210127022507.23674-1-dong.menglong@zte.com.cn
2021-01-27PM: sleep: No need to check PF_WQ_WORKER in thaw_kernel_threads()Zqiang
Because PF_KTHREAD is set for all wq worker threads, it is not necessary to check PF_WQ_WORKER in addition to it in thaw_kernel_threads(), so stop doing that. Signed-off-by: Zqiang <qiang.zhang@windriver.com> [ rjw: Subject and changelog rewrite ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-27sched/fair: Move avg_scan_cost calculations under SIS_PROPMel Gorman
As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP check and while we are at it, exclude the cost of initialising the CPU mask from the average scan cost. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210125085909.4600-3-mgorman@techsingularity.net
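A heavily simplified sketch of the resulting shape of select_idle_cpu(); variable names follow the existing code, but the surrounding details are elided:

    cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

    if (sched_feat(SIS_PROP)) {
            /* ... compute the scan budget from avg_idle / avg_scan_cost ... */
            time = cpu_clock(this);         /* clock starts after the mask setup */
    }

    for_each_cpu_wrap(cpu, cpus, target) {
            /* ... scan for an idle CPU ... */
    }

    if (sched_feat(SIS_PROP)) {
            time = cpu_clock(this) - time;
            update_avg(&this_sd->avg_scan_cost, time);
    }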
2021-01-27sched/fair: Remove SIS_AVG_CPUMel Gorman
SIS_AVG_CPU was introduced as a means of avoiding a search when the average search cost indicated that the search would likely fail. It was a blunt instrument and was disabled by commit 4c77b18cf8b7 ("sched/fair: Make select_idle_cpu() more aggressive") and later replaced with a proportional search depth by commit 1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()"). While there are corner cases where SIS_AVG_CPU is better, it has now been disabled for almost three years. As the intent of SIS_PROP is to reduce the time complexity of select_idle_cpu(), let's drop SIS_AVG_CPU and focus on SIS_PROP as a throttling mechanism. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210125085909.4600-2-mgorman@techsingularity.net
2021-01-27sched/topology: Make sched_init_numa() use a set for the deduplicating sortValentin Schneider
The deduplicating sort in sched_init_numa() assumes that the first line in the distance table contains all unique values in the entire table. I've been trying to pin down what this exactly means for the topology, but it's not straightforward. For instance, topology.c uses this example:

  node   0   1   2   3
    0:  10  20  20  30
    1:  20  10  20  20
    2:  20  20  10  20
    3:  30  20  20  10

  0 ----- 1
  |     / |
  |    /  |
  |   /   |
  2 ----- 3

Which works out just fine. However, if we swap nodes 0 and 1:

  1 ----- 0
  |     / |
  |    /  |
  |   /   |
  2 ----- 3

we get this distance table:

  node   0   1   2   3
    0:  10  20  20  20
    1:  20  10  20  30
    2:  20  20  10  20
    3:  20  30  20  10

Which breaks the deduplicating sort (non-representative first line). In this case this would just be a renumbering exercise, but it so happens that we can have a deduplicating sort that goes through the whole table in O(n²) at the extra cost of a temporary memory allocation (i.e. any form of set). The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following this, implement the set as a 256-bit bitmap. Should this not be satisfactory (i.e. we want to support 32-bit values), then we'll have to go for some other sparse set implementation. This has the added benefit of letting us allocate just the right amount of memory for sched_domains_numa_distance[], rather than an arbitrary (nr_node_ids + 1). Note: the DT binding equivalent (distance-map) decodes distances as 32-bit values. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com
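For illustration, a self-contained userspace version of the set-based deduplication: distances fit in 8 bits, so a 256-entry membership array collects every unique value from the whole table in O(n²), independent of what the first row contains (the kernel variant uses a 256-bit bitmap rather than a bool array):

    #include <stdbool.h>
    #include <stdio.h>

    int main(void)
    {
            int dist[4][4] = {
                    { 10, 20, 20, 20 },
                    { 20, 10, 20, 30 },
                    { 20, 20, 10, 20 },
                    { 20, 30, 20, 10 },
            };
            bool seen[256] = { false };
            int i, j, d, nr_levels = 0;

            /* walk the whole table, not just the first row */
            for (i = 0; i < 4; i++)
                    for (j = 0; j < 4; j++)
                            seen[dist[i][j]] = true;

            /* walking the set yields the deduplicated, sorted distances */
            for (d = 0; d < 256; d++)
                    if (seen[d])
                            printf("distance level %d: %d\n", nr_levels++, d);

            return 0;
    }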
2021-01-27sched/eas: Don't update misfit status if the task is pinnedQais Yousef
If the task is pinned to a cpu, setting the misfit status means that we'll unnecessarily and continuously attempt to migrate the task but fail. This continuous failure will cause the balance_interval to increase to a high value, and eventually cause unnecessary significant delays in balancing the system when a real imbalance happens. Caught while testing uclamp, where the rt-app calibration loop was pinned to cpu 0, shortly after which we spawn another task with a high util_clamp value. The task was failing to migrate after over 40ms of runtime because the balance_interval had been unnecessarily expanded to a very high value by the calibration loop. Not done here, but it could be useful to extend the check for pinning to verify that the affinity of the task has a cpu that fits. We could end up in a similar situation otherwise. Fixes: 3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Quentin Perret <qperret@google.com> Acked-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210119120755.2425264-1-qais.yousef@arm.com
2021-01-27dma-mapping: benchmark: fix kernel crash when dma_map_single failsBarry Song
if dma_map_single() fails, kernel will give the below oops since task_struct has been destroyed and we are running into the memory corruption due to use-after-free in kthread_stop(): [ 48.095310] Unable to handle kernel paging request at virtual address 000000c473548040 [ 48.095736] Mem abort info: [ 48.095864] ESR = 0x96000004 [ 48.096025] EC = 0x25: DABT (current EL), IL = 32 bits [ 48.096268] SET = 0, FnV = 0 [ 48.096401] EA = 0, S1PTW = 0 [ 48.096538] Data abort info: [ 48.096659] ISV = 0, ISS = 0x00000004 [ 48.096820] CM = 0, WnR = 0 [ 48.097079] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104639000 [ 48.098099] [000000c473548040] pgd=0000000000000000, p4d=0000000000000000 [ 48.098832] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 48.099232] Modules linked in: [ 48.099387] CPU: 0 PID: 2 Comm: kthreadd Tainted: G W [ 48.099887] Hardware name: linux,dummy-virt (DT) [ 48.100078] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) [ 48.100516] pc : __kmalloc_node+0x214/0x368 [ 48.100944] lr : __kmalloc_node+0x1f4/0x368 [ 48.101458] sp : ffff800011f0bb80 [ 48.101843] x29: ffff800011f0bb80 x28: ffff0000c0098ec0 [ 48.102330] x27: 0000000000000000 x26: 00000000001d4600 [ 48.102648] x25: ffff0000c0098ec0 x24: ffff800011b6a000 [ 48.102988] x23: 00000000ffffffff x22: ffff0000c0098ec0 [ 48.103333] x21: ffff8000101d7a54 x20: 0000000000000dc0 [ 48.103657] x19: ffff0000c0001e00 x18: 0000000000000000 [ 48.104069] x17: 0000000000000000 x16: 0000000000000000 [ 48.105449] x15: 000001aa0304e7b9 x14: 00000000000003b1 [ 48.106401] x13: ffff8000122d5000 x12: ffff80001228d000 [ 48.107296] x11: ffff0000c0154340 x10: 0000000000000000 [ 48.107862] x9 : ffff80000fffffff x8 : ffff0000c473527f [ 48.108326] x7 : ffff800011e62f58 x6 : ffff0000c01c8ed8 [ 48.108778] x5 : ffff0000c0098ec0 x4 : 0000000000000000 [ 48.109223] x3 : 00000000001d4600 x2 : 0000000000000040 [ 48.109656] x1 : 0000000000000001 x0 : ff0000c473548000 [ 48.110104] Call trace: [ 48.110287] __kmalloc_node+0x214/0x368 [ 48.110493] __vmalloc_node_range+0xc4/0x298 [ 48.110805] copy_process+0x2c8/0x15c8 [ 48.111133] kernel_clone+0x5c/0x3c0 [ 48.111373] kernel_thread+0x64/0x90 [ 48.111604] kthreadd+0x158/0x368 [ 48.111810] ret_from_fork+0x10/0x30 [ 48.112336] Code: 17ffffe9 b9402a62 b94008a1 11000421 (f8626802) [ 48.112884] ---[ end trace d4890e21e75419d5 ]--- Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-27workqueue: Use %s instead of function nameStephen Zhang
It is better to replace the function name with %s, in case the function name changes. Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
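An illustrative before/after; the message text is made up, only the %s/__func__ substitution reflects the change:

    pr_warn("create_worker: allocation failed\n");    /* goes stale if the function is renamed */
    pr_warn("%s: allocation failed\n", __func__);     /* always prints the current name */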
2021-01-26bpf, preload: Fix build when $(O) points to a relative pathQuentin Monnet
Building the kernel with CONFIG_BPF_PRELOAD, and by providing a relative path for the output directory, may fail with the following error: $ make O=build bindeb-pkg ... /.../linux/tools/scripts/Makefile.include:5: *** O=build does not exist. Stop. make[7]: *** [/.../linux/kernel/bpf/preload/Makefile:9: kernel/bpf/preload/libbpf.a] Error 2 make[6]: *** [/.../linux/scripts/Makefile.build:500: kernel/bpf/preload] Error 2 make[5]: *** [/.../linux/scripts/Makefile.build:500: kernel/bpf] Error 2 make[4]: *** [/.../linux/Makefile:1799: kernel] Error 2 make[4]: *** Waiting for unfinished jobs.... In the case above, for the "bindeb-pkg" target, the error is produced by the "dummy" check in Makefile.include, called from libbpf's Makefile. This check changes directory to $(PWD) before checking for the existence of $(O). But at this step we have $(PWD) pointing to "/.../linux/build", and $(O) pointing to "build". So the Makefile.include tries in fact to assert the existence of a directory named "/.../linux/build/build", which does not exist. Note that the error does not occur for all make targets and architectures combinations. This was observed on x86 for "bindeb-pkg", or for a regular build for UML [0]. Here are some details. The root Makefile recursively calls itself once, after changing directory to $(O). The content for the variable $(PWD) is preserved across recursive calls to make, so it is unchanged at this step. For "bindeb-pkg", $(PWD) is eventually updated because the target writes a new Makefile (as debian/rules) and calls it indirectly through dpkg-buildpackage. This script does not preserve $(PWD), which is reset to the current working directory when the target in debian/rules is called. Although not investigated, it seems likely that something similar causes UML to change its value for $(PWD). Non-trivial fixes could be to remove the use of $(PWD) from the "dummy" check, or to make sure that $(PWD) and $(O) are preserved or updated to always play well and form a valid $(PWD)/$(O) path across the different targets and architectures. Instead, we take a simpler approach and just update $(O) when calling libbpf's Makefile, so it points to an absolute path which should always resolve for the "dummy" check run (through includes) by that Makefile. David Gow previously posted a slightly different version of this patch as a RFC [0], two months ago or so. [0] https://lore.kernel.org/bpf/20201119085022.3606135-1-davidgow@google.com/t/#u Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.") Reported-by: David Gow <davidgow@google.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/bpf/20210126161320.24561-1-quentin@isovalent.com
2021-01-26bpf: Drop disabled LSM hooks from the sleepable setMikko Ylinen
Some networking and keys LSM hooks are conditionally enabled and when building the new sleepable BPF LSM hooks with those LSM hooks disabled, the following build error occurs: BTFIDS vmlinux FAILED unresolved symbol bpf_lsm_socket_socketpair To fix the error, conditionally add the relevant networking/keys LSM hooks to the sleepable set. Fixes: 423f16108c9d8 ("bpf: Augment the set of sleepable LSM hooks") Signed-off-by: Mikko Ylinen <mikko.ylinen@linux.intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20210125063936.89365-1-mikko.ylinen@linux.intel.com
2021-01-26futex: Handle faults correctly for PI futexesThomas Gleixner
fixup_pi_state_owner() tries to ensure that the state of the rtmutex, pi_state and the user space value related to the PI futex are consistent before returning to user space. In case that the user space value update faults and the fault cannot be resolved by faulting the page in via fault_in_user_writeable() the function returns with -EFAULT and leaves the rtmutex and pi_state owner state inconsistent. A subsequent futex_unlock_pi() operates on the inconsistent pi_state and releases the rtmutex despite not owning it which can corrupt the RB tree of the rtmutex and cause a subsequent kernel stack use after free. It was suggested to loop forever in fixup_pi_state_owner() if the fault cannot be resolved, but that results in runaway tasks which is especially undesired when the problem happens due to a programming error and not due to malice. As the user space value cannot be fixed up, the proper solution is to make the rtmutex and the pi_state consistent so both have the same owner. This leaves the user space value out of sync. Any subsequent operation on the futex will fail because the 10th rule of PI futexes (pi_state owner and user space value are consistent) has been violated. As a consequence this removes the inept attempts of 'fixing' the situation in case that the current task owns the rtmutex when returning with an unresolvable fault by unlocking the rtmutex which left pi_state::owner and rtmutex::owner out of sync in a different and only slightly less dangerous way. Fixes: 1b7558e457ed ("futexes: fix fault handling in futex_lock_pi") Reported-by: gzobqq@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
2021-01-26futex: Simplify fixup_pi_state_owner()Thomas Gleixner
Too many gotos already and an upcoming fix would make it even more unreadable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
2021-01-26futex: Use pi_state_update_owner() in put_pi_state()Thomas Gleixner
No point in open coding it. This way it gains the extra sanity checks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
2021-01-26rtmutex: Remove unused argument from rt_mutex_proxy_unlock()Thomas Gleixner
Nothing uses the argument. Remove it as preparation to use pi_state_update_owner(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
2021-01-26futex: Provide and use pi_state_update_owner()Thomas Gleixner
Updating pi_state::owner is done at several places with the same code. Provide a function for it and use that at the obvious places. This is also a preparation for a bug fix to avoid yet another copy of the same code or alternatively introducing a completely impenetrable mess of gotos. Originally-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
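A hedged sketch of what such a consolidated helper could look like; the locking and pi_state list bookkeeping follow the existing futex code, but this is illustrative rather than the exact patch:

    static void pi_state_update_owner(struct futex_pi_state *pi_state,
                                      struct task_struct *new_owner)
    {
            struct task_struct *old_owner = pi_state->owner;

            lockdep_assert_held(&pi_state->pi_mutex.wait_lock);

            if (old_owner) {
                    raw_spin_lock(&old_owner->pi_lock);
                    WARN_ON(list_empty(&pi_state->list));
                    list_del_init(&pi_state->list);
                    raw_spin_unlock(&old_owner->pi_lock);
            }

            if (new_owner) {
                    raw_spin_lock(&new_owner->pi_lock);
                    WARN_ON(!list_empty(&pi_state->list));
                    list_add(&pi_state->list, &new_owner->pi_state_list);
                    raw_spin_unlock(&new_owner->pi_lock);
            }

            pi_state->owner = new_owner;
    }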
2021-01-26futex: Replace pointless printk in fixup_owner()Thomas Gleixner
If that unexpected case of inconsistent arguments ever happens then the futex state is left completely inconsistent and the printk is not really helpful. Replace it with a warning and make the state consistent. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
2021-01-26futex: Ensure the correct return value from futex_lock_pi()Thomas Gleixner
In case that futex_lock_pi() was aborted by a signal or a timeout and the task returned without acquiring the rtmutex, but is the designated owner of the futex due to a concurrent futex_unlock_pi() fixup_owner() is invoked to establish consistent state. In that case it invokes fixup_pi_state_owner() which in turn tries to acquire the rtmutex again. If that succeeds then it does not propagate this success to fixup_owner() and futex_lock_pi() returns -EINTR or -ETIMEOUT despite having the futex locked. Return success from fixup_pi_state_owner() in all cases where the current task owns the rtmutex and therefore the futex and propagate it correctly through fixup_owner(). Fixup the other callsite which does not expect a positive return value. Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org