summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-02-12net: hns3: split out hclge_dbg_dump_qos_buf_cfg()Jian Shen
hclge_dbg_dump_qos_buf_cfg() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclgevf_get_rss_tuple()Jian Shen
To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclgevf_get_rss_tuple(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclge_get_rss_tuple()Jian Shen
To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclge_get_rss_tuple(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclge_set_vf_vlan_common()Peng Li
To improve code readability and maintainability, separate the command handling part and the status parsing part from bloated hclge_set_vf_vlan_common(). Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: use ipv6_addr_any() helperJiaran Zhang
Use common ipv6_addr_any() to determine if an addr is ipv6 any addr. Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: clean up hns3_dbg_cmd_write()Peng Li
As more commands are added, hns3_dbg_cmd_write() is going to get more bloated, so move the part about command check into a separate function. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclgevf_cmd_convert_err_code()Peng Li
To improve code readability and maintainability, refactor hclgevf_cmd_convert_err_code() with an array of imp_errcode and common_errno mapping, instead of a bloated switch/case. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: hns3: refactor out hclge_cmd_convert_err_code()Peng Li
To improve code readability and maintainability, refactor hclge_cmd_convert_err_code() with an array of imp_errcode and common_errno mapping, instead of a bloated switch/case. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12tools/resolve_btfids: Add /libbpf to .gitignoreStanislav Fomichev
This is what I see after compiling the kernel: # bpf-next...bpf-next/master ?? tools/bpf/resolve_btfids/libbpf/ Fixes: fc6b48f692f8 ("tools/resolve_btfids: Build libbpf and libsubcmd in separate directories") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212010053.668700-1-sdf@google.com
2021-02-12io-wq: clear out worker ->fs and ->filesJens Axboe
By default, kernel threads have init_fs and init_files assigned. In the past, this has triggered security problems, as commands that don't ask for (and hence don't get assigned) fs/files from the originating task can then attempt path resolution etc with access to parts of the system they should not be able to. Rather than add checks in the fs code for misuse, just set these to NULL. If we do attempt to use them, then the resulting code will oops rather than provide access to something that it should not permit. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12Merge branch 'introduce bpf_iter for task_vma'Alexei Starovoitov
Song Liu says: ==================== This set introduces bpf_iter for task_vma, which can be used to generate information similar to /proc/pid/maps. Patch 4/4 adds an example that mimics /proc/pid/maps. Current /proc/<pid>/maps and /proc/<pid>/smaps provide information of vma's of a process. However, these information are not flexible enough to cover all use cases. For example, if a vma cover mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customize information based on the vma (and vma->vm_mm, vma->vm_file, etc.). Changes v6 => v7: 1. Let BPF iter program use bpf_d_path without specifying sleepable. (Alexei) Changes v5 => v6: 1. Add more comments for task_vma_seq_get_next() to explain the logic of find_vma() calls. (Alexei) 2. Skip vma found by find_vma() when both vm_start and vm_end matches prev_vm_[start|end]. Previous versions only compares vm_start. IOW, if vma of [4k, 8k] is replaced by [4k, 12k] after relocking mmap_lock, v5 will skip the new vma, while v6 will process it. Changes v4 => v5: 1. Fix a refcount leak on task_struct. (Yonghong) 2. Fix the selftest. (Yonghong) Changes v3 => v4: 1. Avoid skipping vma by assigning invalid prev_vm_start in task_vma_seq_stop(). (Yonghong) 2. Move "again" label in task_vma_seq_get_next() save a check. (Yonghong) Changes v2 => v3: 1. Rewrite 1/4 so that we hold mmap_lock while calling BPF program. This enables the BPF program to access the real vma with BTF. (Alexei) 2. Fix the logic when the control is returned to user space. (Yonghong) 3. Revise commit log and cover letter. (Yonghong) Changes v1 => v2: 1. Small fixes in task_iter.c and the selftests. (Yonghong) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-12selftests/bpf: Add test for bpf_iter_task_vmaSong Liu
The test dumps information similar to /proc/pid/maps. The first line of the output is compared against the /proc file to make sure they match. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-4-songliubraving@fb.com
2021-02-12bpf: Allow bpf_d_path in bpf_iter programSong Liu
task_file and task_vma iter programs have access to file->f_path. Enable bpf_d_path to print paths of these file. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-3-songliubraving@fb.com
2021-02-12bpf: Introduce task_vma bpf_iterSong Liu
Introduce task_vma bpf_iter to print memory information of a process. It can be used to print customized information similar to /proc/<pid>/maps. Current /proc/<pid>/maps and /proc/<pid>/smaps provide information of vma's of a process. However, these information are not flexible enough to cover all use cases. For example, if a vma cover mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customize information based on the vma (and vma->vm_mm, vma->vm_file, etc.). To access the vma safely in the BPF program, task_vma iterator holds target mmap_lock while calling the BPF program. If the mmap_lock is contended, task_vma unlocks mmap_lock between iterations to unblock the writer(s). This lock contention avoidance mechanism is similar to the one used in show_smaps_rollup(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com
2021-02-12jffs2: check the validity of dstlen in jffs2_zlib_compress()Yang Yang
KASAN reports a BUG when download file in jffs2 filesystem.It is because when dstlen == 1, cpage_out will write array out of bounds. Actually, data will not be compressed in jffs2_zlib_compress() if data's length less than 4. [ 393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281 [ 393.809166] Write of size 1 by task tftp/2918 [ 393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G B 4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1 [ 393.823173] Hardware name: LS1043A RDB Board (DT) [ 393.827870] Call trace: [ 393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0 [ 393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20 [ 393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0 [ 393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80 [ 393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8 [ 393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40 [ 393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58 [ 393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0 [ 393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0 [ 393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478 [ 393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450 [ 393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0 [ 393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280 [ 393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228 [ 393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288 [ 393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238 [ 393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238 [ 393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110 [ 393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4 [ 393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64 [ 393.944197] Allocated: [ 393.946552] PID = 2918 [ 393.948913] save_stack_trace_tsk+0x0/0x220 [ 393.953096] save_stack_trace+0x18/0x20 [ 393.956932] kasan_kmalloc+0xd8/0x188 [ 393.960594] __kmalloc+0x144/0x238 [ 393.963994] jffs2_selected_compress+0x48/0x2a0 [ 393.968524] jffs2_compress+0x58/0x478 [ 393.972273] jffs2_write_inode_range+0x13c/0x450 [ 393.976889] jffs2_write_end+0x2a8/0x4a0 [ 393.980810] generic_perform_write+0x1c0/0x280 [ 393.985251] __generic_file_write_iter+0x1c4/0x228 [ 393.990040] generic_file_write_iter+0x138/0x288 [ 393.994655] __vfs_write+0x1b4/0x238 [ 393.998228] vfs_write+0xd0/0x238 [ 394.001543] SyS_write+0xa0/0x110 [ 394.004856] __sys_trace_return+0x0/0x4 [ 394.008684] Freed: [ 394.010691] PID = 2918 [ 394.013051] save_stack_trace_tsk+0x0/0x220 [ 394.017233] save_stack_trace+0x18/0x20 [ 394.021069] kasan_slab_free+0x88/0x188 [ 394.024902] kfree+0x6c/0x1d8 [ 394.027868] jffs2_sum_write_sumnode+0x2c4/0x880 [ 394.032486] jffs2_do_reserve_space+0x198/0x598 [ 394.037016] jffs2_reserve_space+0x3f8/0x4d8 [ 394.041286] jffs2_write_inode_range+0xf0/0x450 [ 394.045816] jffs2_write_end+0x2a8/0x4a0 [ 394.049737] generic_perform_write+0x1c0/0x280 [ 394.054179] __generic_file_write_iter+0x1c4/0x228 [ 394.058968] generic_file_write_iter+0x138/0x288 [ 394.063583] __vfs_write+0x1b4/0x238 [ 394.067157] vfs_write+0xd0/0x238 [ 394.070470] SyS_write+0xa0/0x110 [ 394.073783] __sys_trace_return+0x0/0x4 [ 394.077612] Memory state around the buggy address: [ 394.082404] ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.089623] ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 394.104056] ^ [ 394.107283] ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.114502] ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 394.121718] ================================================================== Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: Fix off-by-one errorSascha Hauer
An inode is allowed to have ubifs_xattr_max_cnt() xattrs, so we must complain only when an inode has more xattrs, having exactly ubifs_xattr_max_cnt() xattrs is fine. With this the maximum number of xattrs can be created without hitting the "has too many xattrs" warning when removing it. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: replay: Fix high stack usage, againArnd Bergmann
An earlier commit moved out some functions to not be inlined by gcc, but after some other rework to remove one of those, clang started inlining the other one and ran into the same problem as gcc did before: fs/ubifs/replay.c:1174:5: error: stack frame size of 1152 bytes in function 'ubifs_replay_journal' [-Werror,-Wframe-larger-than=] Mark the function as noinline_for_stack to ensure it doesn't happen again. Fixes: f80df3851246 ("ubifs: use crypto_shash_tfm_digest()") Fixes: eb66eff6636d ("ubifs: replay: Fix high stack usage") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubifs: Fix memleak in ubifs_init_authenticationDinghao Liu
When crypto_shash_digestsize() fails, c->hmac_tfm has not been freed before returning, which leads to memleak. Fixes: 49525e5eecca5 ("ubifs: Add helper functions for authentication support") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12jffs2: fix use after free in jffs2_sum_write_data()Tom Rix
clang static analysis reports this problem fs/jffs2/summary.c:794:31: warning: Use of memory after it is freed c->summary->sum_list_head = temp->u.next; ^~~~~~~~~~~~ In jffs2_sum_write_data(), in a loop summary data is handles a node at a time. When it has written out the node it is removed the summary list, and the node is deleted. In the corner case when a JFFS2_FEATURE_RWCOMPAT_COPY is seen, a call is made to jffs2_sum_disable_collecting(). jffs2_sum_disable_collecting() deletes the whole list which conflicts with the loop's deleting the list by parts. To preserve the old behavior of stopping the write midway, bail out of the loop after disabling summary collection. Fixes: 6171586a7ae5 ("[JFFS2] Correct handling of JFFS2_FEATURE_RWCOMPAT_COPY nodes.") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubi: eba: Delete useless kfree codeZheng Yongjun
The parameter of kfree function is NULL, so kfree code is useless, delete it. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12ubi: remove dead code in validate_vid_hdr()Jubin Zhong
data_size is already checked against zero when vol_type matches UBI_VID_STATIC. Remove the following dead code. Signed-off-by: Jubin Zhong <zhongjubin@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: irq.h: include <asm-generic/irq.h>Johannes Berg
This will get the (no-op) definition of irq_canonicalize() which some code might want. We could define that ourselves, but it seems like we'd likely want generic extensions in the future, if any. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: io.h: include <linux/types.h>Johannes Berg
This may be needed for size_t if something doesn't get it included elsewhere before including <asm/io.h>, so add the include. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: add a pseudo RTCJohannes Berg
Add a pseudo RTC that simply is able to send an alarm signal waking up the system at a given time in the future. Since apparently timerfd_create() FDs don't support SIGIO, we use the sigio-creating helper thread, which just learned to do suspend/resume properly in the previous patch. For time-travel mode, OTOH, just add an event at the specified time in the future, and that's already sufficient to wake up the system at that point in time since suspend will just be in an "endless wait". For s2idle support also call pm_system_wakeup(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: remove process stub VMAJohannes Berg
This mostly reverts the old commit 3963333fe676 ("uml: cover stubs with a VMA") which had added a VMA to the existing PTEs. However, there's no real reason to have the PTEs in the first place and the VMA cannot be 'fixed' in place, which leads to bugs that userspace could try to unmap them and be forcefully killed, or such. Also, there's a bit of an ugly hole in userspace's address space. Simplify all this: just install the stub code/page at the top of the (inner) address space, i.e. put it just above TASK_SIZE. The pages are simply hard-coded to be mapped in the userspace process we use to implement an mm context, and they're out of reach of the inner mmap/munmap/mprotect etc. since they're above TASK_SIZE. Getting rid of the VMA also makes vma_merge() no longer hit one of the VM_WARN_ON()s there because we installed a VMA while the code assumes the stack VMA is the first one. It also removes a lockdep warning about mmap_sem usage since we no longer have uml_setup_stubs() and thus no longer need to do any manipulation that would require mmap_sem in activate_mm(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: rework userspace stubs to not hard-code stub locationJohannes Berg
The userspace stacks mostly have a stack (and in the case of the syscall stub we can just set their stack pointer) that points to the location of the stub data page already. Rework the stubs to use the stack pointer to derive the start of the data page, rather than requiring it to be hard-coded. In the clone stub, also integrate the int3 into the stack remap, since we really must not use the stack while we remap it. This prepares for putting the stub at a variable location that's not part of the normal address space of the userspace processes running inside the UML machine. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: separate child and parent errors in clone stubJohannes Berg
If the two are mixed up, then it looks as though the parent returned an error if the child failed (before) the mmap(), and then the resulting process never gets killed. Fix this by splitting the child and parent errors, reporting and using them appropriately. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: defer killing userspace on page table update failuresJohannes Berg
In some cases we can get to fix_range_common() with mmap_sem held, and in others we get there without it being held. For example, we get there with it held from sys_mprotect(), and without it held from fork_handler(). Avoid any issues in this and simply defer killing the task until it runs the next time. Do it on the mm so that another task that shares the same mm can't continue running afterwards. Cc: stable@vger.kernel.org Fixes: 468f65976a8d ("um: Fix hung task in fix_range_common()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: mm: check more comprehensively for stub changesJohannes Berg
If userspace tries to change the stub, we need to kill it, because otherwise it can escape the virtual machine. In a few cases the stub checks weren't good, e.g. if userspace just tries to mmap(0x100000 - 0x1000, 0x3000, ...) it could succeed to get a new private/anonymous mapping replacing the stubs. Fix this by checking everywhere, and checking for _overlap_, not just direct changes. Cc: stable@vger.kernel.org Fixes: 3963333fe676 ("uml: cover stubs with a VMA") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: print register names in wait_for_stubJohannes Berg
Since we're basically debugging the userspace (it runs in ptrace) it's useful to dump out the registers - but they're not readable, so if something goes wrong it's hard to say what. Print the names of registers in the register dump so it's easier to look at. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: hostfs: use a kmem cache for inodesJohannes Berg
This collects all of them together and makes it possible to e.g. exclude it from slub debugging. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12mm: Remove arch_remap() and mm-arch-hooks.hChristophe Leroy
powerpc was the last provider of arch_remap() and the last user of mm-arch-hooks.h. Since commit 526a9c4a7234 ("powerpc/vdso: Provide vdso_remap()"), arch_remap() hence mm-arch-hooks.h are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: fix spelling mistake in Kconfig "privleges" -> "privileges"Colin Ian King
There is a spelling mistake in the Kconfig help text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: virtio: allow devices to be configured for wakeupJohannes Berg
With all the IRQ machinery being in place, we can allow virtio devices to additionally be configured as wakeup sources, in which case basically any interrupt from them wakes us up. Note that this requires a call FD because the VQs are all disabled. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: time-travel: rework interrupt handling in ext modeJohannes Berg
In external time-travel mode, where time is controlled via the controller application socket, interrupt handling is a little tricky. For example on virtio, the following happens: * we receive a message (that requires an ACK) on the vhost-user socket * we add a time-travel event to handle the interrupt (this causes communication on the time socket) * we ACK the original vhost-user message * we then handle the interrupt once the event is triggered This protocol ensures that the sender of the interrupt only continues to run in the simulation when the time-travel event has been added. So far, this was only done in the virtio driver, but it was actually wrong, because only virtqueue interrupts were handled this way, and config change interrupts were handled immediately. Additionally, the messages were actually handled in the real Linux interrupt handler, but Linux interrupt handlers are part of the simulation and shouldn't run while there's no time event. To really do this properly and only handle all kinds of interrupts in the time-travel event when we are scheduled to run in the simulation, rework this to plug in to the lower interrupt layers in UML directly: Add a um_request_irq_tt() function that let's a time-travel aware driver request an interrupt with an additional timetravel_handler() that is called outside of the context of the simulation, to handle the message only. It then adds an event to the time-travel calendar if necessary, and no "real" Linux code runs outside of the time simulation. This also hooks in with suspend/resume properly now, since this new timetravel_handler() can run while Linux is suspended and interrupts are disabled, and decide to wake up (or not) the system based on the message it received. Importantly in this case, it ACKs the message before the system even resumes and interrupts are re-enabled, thus allowing the simulation to progress properly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: virtio: disable VQs during suspendJohannes Berg
If the system is suspended, the device shouldn't be able to send anything to it. Disable virtqueues in suspend to simulate this, and as we might be only using s2idle (kernel services are still on), prevent sending anything on them as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: virtio: fix handling of messages without payloadJohannes Berg
If we have a message without payload, we call full_read() with len set to 0, which causes it to return -ECONNRESET. Catch this case and explicitly return 0 for it so we can actually use the zero-size config-changed message. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12um: virtio: clean up a commentJohannes Berg
There's no 'simtime' device, because implementing that through virtio was just too much complexity. Clean up the comment that still refers to it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12bpf: selftests: Add non function pointer test to struct_opsMartin KaFai Lau
This patch adds a "void *owner" member. The existing bpf_tcp_ca test will ensure the bpf_cubic.o and bpf_dctcp.o can be loaded. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212021037.267278-1-kafai@fb.com
2021-02-12libbpf: Ignore non function pointer member in struct_opsMartin KaFai Lau
When libbpf initializes the kernel's struct_ops in "bpf_map__init_kern_struct_ops()", it enforces all pointer types must be a function pointer and rejects others. It turns out to be too strict. For example, when directly using "struct tcp_congestion_ops" from vmlinux.h, it has a "struct module *owner" member and it is set to NULL in a bpf_tcp_cc.o. Instead, it only needs to ensure the member is a function pointer if it has been set (relocated) to a bpf-prog. This patch moves the "btf_is_func_proto(kern_mtype)" check after the existing "if (!prog) { continue; }". The original debug message in "if (!prog) { continue; }" is also removed since it is no longer valid. Beside, there is a later debug message to tell which function pointer is set. The "btf_is_func_proto(mtype)" has already been guaranteed in "bpf_object__collect_st_ops_relos()" which has been run before "bpf_map__init_kern_struct_ops()". Thus, this check is removed. v2: - Remove outdated debug message (Andrii) Remove because there is a later debug message to tell which function pointer is set. - Following mtype->type is no longer needed. Remove: "skip_mods_and_typedefs(btf, mtype->type, &mtype_id)" - Do "if (!prog)" test before skip_mods_and_typedefs. Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
2021-02-12Merge tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "Revert of a patch from this release that caused a regression" * tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-block: Revert "io_uring: don't take fs for recvmsg/sendmsg"
2021-02-12libbpf: Use AF_LOCAL instead of AF_INET in xsk.cStanislav Fomichev
We have the environments where usage of AF_INET is prohibited (cgroup/sock_create returns EPERM for AF_INET). Let's use AF_LOCAL instead of AF_INET, it should perfectly work with SIOCETHTOOL. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20210209221826.922940-1-sdf@google.com
2021-02-12Merge tag 'drm-fixes-2021-02-12' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes for final, there is a ttm regression fix, dp-mst fix, one amdgpu revert, two i915 fixes, and some misc fixes for sun4i, xlnx, and vc4. All pretty quiet and don't think we have any known outstanding regressions. ttm: - page pool regression fix. dp_mst: - don't report un-attached ports as connected amdgpu: - blank screen fix i915: - ensure Type-C FIA is powered when initializing - fix overlay frontbuffer tracking sun4i: - tcon1 sync polarity fix - always set HDMI clock rate - fix H6 HDMI PHY config - fix H6 max frequency vc4: - fix buffer overflow xlnx: - fix memory leak" * tag 'drm-fixes-2021-02-12' of git://anongit.freedesktop.org/drm/drm: drm/ttm: make sure pool pages are cleared drm/sun4i: dw-hdmi: Fix max. frequency for H6 drm/sun4i: Fix H6 HDMI PHY configuration drm/sun4i: dw-hdmi: always set clock rate drm/sun4i: tcon: set sync polarity for tcon1 channel drm/i915: Fix overlay frontbuffer tracking Revert "drm/amd/display: Update NV1x SR latency values" drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it drm/dp_mst: Don't report ports connected if nothing is attached to them drm/xlnx: fix kmemleak by sending vblank_event in atomic_disable drm/vc4: hvs: Fix buffer overflow with the dlist handling
2021-02-12Merge tag 'trace-v5.11-rc7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix buffer overflow in trace event filter. It was reported that if an trace event was larger than a page and was filtered, that it caused memory corruption. The reason is that filtered events first go into a buffer to test the filter before being written into the ring buffer. Unfortunately, this write did not check the size" * tag 'trace-v5.11-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Check length before giving out the filter buffer
2021-02-12Merge tag 'for-linus-5.11-rc8-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A single fix for an issue introduced this development cycle: when running as a Xen guest on Arm systems the kernel will hang during boot" * tag 'for-linus-5.11-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: arm/xen: Don't probe xenbus as part of an early initcall
2021-02-12Merge tag 'riscv-for-linus-5.11-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Palmer Dabbelt: "A single fix this week: the removal of the GPIO reset method for the Ethernet phy on the HiFive Unleashed. This returns to relying on the bootloader's phy reset sequence, which we'll have to continue doing until we can sort out how to get the Linux phy driver to perform the special reset dance required for this phy" * tag 'riscv-for-linus-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Revert "dts: phy: add GPIO number and active state used for phy reset"
2021-02-12Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix PTRACE_PEEKMTETAGS access to an mmapped region before the first write" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page
2021-02-12io_uring: optimise io_init_req() flags settingPavel Begunkov
Invalid req->flags are tolerated by free/put well, avoid this dancing needlessly presetting it to zero, and then not even resetting but modifying it, i.e. "|=". Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: clean io_req_find_next() fast checkPavel Begunkov
Indirectly io_req_find_next() is called for every request, optimise the check by testing flags as it was long before -- __io_req_find_next() tolerates false-positives well (i.e. link==NULL), and those should be really rare. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-12io_uring: don't check PF_EXITING from syscallPavel Begunkov
io_sq_thread_acquire_mm_files() can find a PF_EXITING task only when it's called from task_work context. Don't check it in all other cases, that are when we're in io_uring_enter(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>