summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-10libbpf: Add per-program log buffer setter and getterAndrii Nakryiko
Allow to set user-provided log buffer on a per-program basis ([0]). This gives great deal of flexibility in terms of which programs are loaded with logging enabled and where corresponding logs go. Log buffer set with bpf_program__set_log_buf() overrides kernel_log_buf and kernel_log_size settings set at bpf_object open time through bpf_object_open_opts, if any. Adjust bpf_object_load_prog_instance() logic to not perform own log buf allocation and load retry if custom log buffer is provided by the user. [0] Closes: https://github.com/libbpf/libbpf/issues/418 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-8-andrii@kernel.org
2021-12-10libbpf: Preserve kernel error code and remove kprobe prog type guessingAndrii Nakryiko
Instead of rewriting error code returned by the kernel of prog load with libbpf-sepcific variants pass through the original error. There is now also no need to have a backup generic -LIBBPF_ERRNO__LOAD fallback error as bpf_prog_load() guarantees that errno will be properly set no matter what. Also drop a completely outdated and pretty useless BPF_PROG_TYPE_KPROBE guess logic. It's not necessary and neither it's helpful in modern BPF applications. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-7-andrii@kernel.org
2021-12-10libbpf: Improve logging around BPF program loadingAndrii Nakryiko
Add missing "prog '%s': " prefixes in few places and use consistently markers for beginning and end of program load logs. Here's an example of log output: libbpf: prog 'handler': BPF program load failed: Permission denied libbpf: -- BEGIN PROG LOAD LOG --- arg#0 reference type('UNKNOWN ') size cannot be determined: -22 ; out1 = in1; 0: (18) r1 = 0xffffc9000cdcc000 2: (61) r1 = *(u32 *)(r1 +0) ... 81: (63) *(u32 *)(r4 +0) = r5 R1_w=map_value(id=0,off=16,ks=4,vs=20,imm=0) R4=map_value(id=0,off=400,ks=4,vs=16,imm=0) invalid access to map value, value_size=16 off=400 size=4 R4 min value is outside of the allowed memory range processed 63 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: failed to load program 'handler' libbpf: failed to load object 'test_skeleton' The entire verifier log, including BEGIN and END markers are now always youtput during a single print callback call. This should make it much easier to post-process or parse it, if necessary. It's not an explicit API guarantee, but it can be reasonably expected to stay like that. Also __bpf_object__open is renamed to bpf_object_open() as it's always an adventure to find the exact function that implements bpf_object's open phase, so drop the double underscored and use internal libbpf naming convention. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-6-andrii@kernel.org
2021-12-10libbpf: Allow passing user log setting through bpf_object_open_optsAndrii Nakryiko
Allow users to provide their own custom log_buf, log_size, and log_level at bpf_object level through bpf_object_open_opts. This log_buf will be used during BTF loading. Subsequent patch will use same log_buf during BPF program loading, unless overriden at per-bpf_program level. When such custom log_buf is provided, libbpf won't be attempting retrying loading of BTF to try to provide its own log buffer to capture kernel's error log output. User is responsible to provide big enough buffer, otherwise they run a risk of getting -ENOSPC error from the bpf() syscall. See also comments in bpf_object_open_opts regarding log_level and log_buf interactions. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-5-andrii@kernel.org
2021-12-10libbpf: Allow passing preallocated log_buf when loading BTF into kernelAndrii Nakryiko
Add libbpf-internal btf_load_into_kernel() that allows to pass preallocated log_buf and custom log_level to be passed into kernel during BPF_BTF_LOAD call. When custom log_buf is provided, btf_load_into_kernel() won't attempt an retry with automatically allocated internal temporary buffer to capture BTF validation log. It's important to note the relation between log_buf and log_level, which slightly deviates from stricter kernel logic. From kernel's POV, if log_buf is specified, log_level has to be > 0, and vice versa. While kernel has good reasons to request such "sanity, this, in practice, is a bit unconvenient and restrictive for libbpf's high-level bpf_object APIs. So libbpf will allow to set non-NULL log_buf and log_level == 0. This is fine and means to attempt to load BTF without logging requested, but if it failes, retry the load with custom log_buf and log_level 1. Similar logic will be implemented for program loading. In practice this means that users can provide custom log buffer just in case error happens, but not really request slower verbose logging all the time. This is also consistent with libbpf behavior when custom log_buf is not set: libbpf first tries to load everything with log_level=0, and only if error happens allocates internal log buffer and retries with log_level=1. Also, while at it, make BTF validation log more obvious and follow the log pattern libbpf is using for dumping BPF verifier log during BPF_PROG_LOAD. BTF loading resulting in an error will look like this: libbpf: BTF loading error: -22 libbpf: -- BEGIN BTF LOAD LOG --- magic: 0xeb9f version: 1 flags: 0x0 hdr_len: 24 type_off: 0 type_len: 1040 str_off: 1040 str_len: 2063598257 btf_total_size: 1753 Total section length too long -- END BTF LOAD LOG -- libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring. This makes it much easier to find relevant parts in libbpf log output. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-4-andrii@kernel.org
2021-12-10libbpf: Add OPTS-based bpf_btf_load() APIAndrii Nakryiko
Similar to previous bpf_prog_load() and bpf_map_create() APIs, add bpf_btf_load() API which is taking optional OPTS struct. Schedule bpf_load_btf() for deprecation in v0.8 ([0]). This makes naming consistent with BPF_BTF_LOAD command, sets up an API for extensibility in the future, moves options parameters (log-related fields) into optional options, and also allows to pass log_level directly. It also removes log buffer auto-allocation logic from low-level API (consistent with bpf_prog_load() behavior), but preserves a special treatment of log_level == 0 with non-NULL log_buf, which matches low-level bpf_prog_load() and high-level libbpf APIs for BTF and program loading behaviors. [0] Closes: https://github.com/libbpf/libbpf/issues/419 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-3-andrii@kernel.org
2021-12-10libbpf: Fix bpf_prog_load() log_buf logic for log_level 0Andrii Nakryiko
To unify libbpf APIs behavior w.r.t. log_buf and log_level, fix bpf_prog_load() to follow the same logic as bpf_btf_load() and high-level bpf_object__load() API will follow in the subsequent patches: - if log_level is 0 and non-NULL log_buf is provided by a user, attempt load operation initially with no log_buf and log_level set; - if successful, we are done, return new FD; - on error, retry the load operation with log_level bumped to 1 and log_buf set; this way verbose logging will be requested only when we are sure that there is a failure, but will be fast in the common/expected success case. Of course, user can still specify log_level > 0 from the very beginning to force log collection. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-2-andrii@kernel.org
2021-12-10Merge tag 'acpi-5.16-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Create the output directory for the ACPI tools during build if it has not been present before and prevent the compilation from failing in that case (Chen Yu)" * tag 'acpi-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: tools: Fix compilation when output directory is not present
2021-12-10Merge tag 'pm-5.16-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a kernedoc comment that doesn't match the behavior of the function documented by it" * tag 'pm-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: runtime: Fix pm_runtime_active() kerneldoc comment
2021-12-10Merge tag 'hwmon-for-v5.16-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - In the pwm-fan driver, ensure that the internal pwm state matches the state assumed by the pwm code. - Avoid EREMOTEIO errors in sht4 driver - In the nct6775 driver, make it explicit that the register value passed to nct6775_asuswmi_read() is an 8-bit value - Avoid WARNing in dell-smm driver removal after failing to create /proc/i8k - Stop using a plain integer as NULL pointer in corsair-psu driver * tag 'hwmon-for-v5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (pwm-fan) Ensure the fan going on in .probe() hwmon: (sht4x) Fix EREMOTEIO errors hwmon: (nct6775) mask out bank number in nct6775_wmi_read_value() hwmon: (dell-smm) Fix warning on /proc/i8k creation error hwmon: (corsair-psu) fix plain integer used as NULL pointer
2021-12-10Merge tag 'trace-v5.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Tracing, ftrace and tracefs fixes: - Have tracefs honor the gid mount option - Have new files in tracefs inherit the parent ownership - Have direct_ops unregister when it has no more functions - Properly clean up the ops when unregistering multi direct ops - Add a sample module to test the multiple direct ops - Fix memory leak in error path of __create_synth_event()" * tag 'trace-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix possible memory leak in __create_synth_event() error path ftrace/samples: Add module to test multi direct modify interface ftrace: Add cleanup to unregister_ftrace_direct_multi ftrace: Use direct_ops hash in unregister_ftrace_direct tracefs: Set all files to the same group ownership as the mount option tracefs: Have new files inherit the ownership of their parent
2021-12-10Merge tag 'aio-poll-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull aio poll fixes from Eric Biggers: "Fix three bugs in aio poll, and one issue with POLLFREE more broadly: - aio poll didn't handle POLLFREE, causing a use-after-free. - aio poll could block while the file is ready. - aio poll called eventfd_signal() when it isn't allowed. - POLLFREE didn't handle multiple exclusive waiters correctly. This has been tested with the libaio test suite, as well as with test programs I wrote that reproduce the first two bugs. I am sending this pull request myself as no one seems to be maintaining this code" * tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: aio: Fix incorrect usage of eventfd_signal_allowed() aio: fix use-after-free due to missing POLLFREE handling aio: keep poll requests on waitqueue until completed signalfd: use wake_up_pollfree() binder: use wake_up_pollfree() wait: add wake_up_pollfree()
2021-12-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "More x86 fixes: - Logic bugs in CR0 writes and Hyper-V hypercalls - Don't use Enlightened MSR Bitmap for L3 - Remove user-triggerable WARN Plus a few selftest fixes and a regression test for the user-triggerable WARN" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: selftests: KVM: Add test to verify KVM doesn't explode on "bad" I/O KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode selftests: KVM: avoid failures due to reserved HyperTransport region KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall KVM: x86: selftests: svm_int_ctl_test: fix intercept calculation KVM: nVMX: Don't use Enlightened MSR Bitmap for L3
2021-12-10i2c: mpc: Use atomic read and fix break conditionChris Packham
Maxime points out that the polling code in mpc_i2c_isr should use the _atomic API because it is called in an irq context and that the behaviour of the MCF bit is that it is 1 when the byte transfer is complete. All of this means the original code was effectively a udelay(100). Fix this by using readb_poll_timeout_atomic() and removing the negation of the break condition. Fixes: 4a8ac5e45cda ("i2c: mpc: Poll for MCF") Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Tested-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-12-10io-wq: check for wq exit after adding new worker task_workJens Axboe
We check IO_WQ_BIT_EXIT before attempting to create a new worker, and wq exit cancels pending work if we have any. But it's possible to have a race between the two, where creation checks exit finding it not set, but we're in the process of exiting. The exit side will cancel pending creation task_work, but there's a gap where we add task_work after we've canceled existing creations at exit time. Fix this by checking the EXIT bit post adding the creation task_work. If it's set, run the same cancelation that exit does. Reported-and-tested-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-10io_uring: ensure task_work gets run as part of cancelationsJens Axboe
If we successfully cancel a work item but that work item needs to be processed through task_work, then we can be sleeping uninterruptibly in io_uring_cancel_generic() and never process it. Hence we don't make forward progress and we end up with an uninterruptible sleep warning. While in there, correct a comment that should be IFF, not IIF. Reported-and-tested-by: syzbot+21e6887c0be14181206d@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-10Merge tag 'pci-v5.16-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Revert emulation of Marvell Armada A3720 expansion ROM because it doesn't work as expected (Marek Behún) - Assert PERST# in Apple M1 driver to fix initialization when booting from bootloaders using PCIe, such as U-Boot (Marc Zyngier) - Describe PERST# as active low in Apple T8103 DT and update driver to match (Marc Zyngier) * tag 'pci-v5.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: apple: Fix PERST# polarity arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT PCI: apple: Follow the PCIe specifications when resetting the port Revert "PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge"
2021-12-10Merge tag 'mmc-v5.16-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - mtk-sd: Fix memory leak during tuning - renesas_sdhi: Initialize variable properly when tuning * tag 'mmc-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: mediatek: free the ext_csd when mmc_get_ext_csd success mmc: renesas_sdhi: initialize variable properly when tuning
2021-12-10Merge tag 'libata-5.16-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull libata fixes from Damien Le Moal: - Fix a sparse warning in the ahci_ceva driver (me) - Disable the ASMedia 1092 non-functional device (Hannes) * tag 'libata-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: libata: add horkage for ASMedia 1092 ata: ahci_ceva: Fix id array access in ceva_ahci_read_id()
2021-12-10Merge tag 'sound-5.16-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Another collection of small fixes. It's still not quite calm yet, but nothing looks scary. ALSA core got a few fixes for covering the issues detected by fuzzer and the 32bit compat problem of control API, while the rest are all device-specific small fixes, including the continued fixes for Tegra" * tag 'sound-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits) ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform ALSA: usb-audio: Reorder snd_djm_devices[] entries ALSA: hda/realtek: Fix quirk for TongFang PHxTxX1 ALSA: ctl: Fix copy of updated id with element read/write ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*() ALSA: pcm: oss: Limit the period size to 16MB ALSA: pcm: oss: Fix negative period/buffer sizes ASoC: codecs: wsa881x: fix return values from kcontrol put ASoC: codecs: wcd934x: return correct value from mixer put ASoC: codecs: wcd934x: handle channel mappping list correctly ASoC: qdsp6: q6routing: Fix return value from msm_routing_put_audio_mixer ASoC: SOF: Intel: Retry codec probing if it fails ASoC: amd: fix uninitialized variable in snd_acp6x_probe() ASoC: rockchip: i2s_tdm: Dup static DAI template ASoC: rt5682s: Fix crash due to out of scope stack vars ASoC: rt5682: Fix crash due to out of scope stack vars ASoC: tegra: Use normal system sleep for ADX ASoC: tegra: Use normal system sleep for AMX ASoC: tegra: Use normal system sleep for Mixer ASoC: tegra: Use normal system sleep for MVC ...
2021-12-10Merge tag 'drm-fixes-2021-12-10' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes, pretty small overall, couple of core fixes, two i915 and two amdgpu, hopefully it stays this quiet. ttm: - fix ttm_bo_swapout syncobj: - fix fence find bug with signalled fences i915: - fix error pointer deref in gem execbuffer - fix for GT init with GuC/HuC on ICL amdgpu: - DPIA fix - eDP fix" * tag 'drm-fixes-2021-12-10' of git://anongit.freedesktop.org/drm/drm: drm/i915/gen11: Moving WAs to icl_gt_workarounds_init() drm/amd/display: prevent reading unitialized links drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset drm/i915: Fix error pointer dereference in i915_gem_do_execbuffer() drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence. drm/ttm: fix ttm_bo_swapout
2021-12-10selftests: mptcp: remove duplicate include in mptcp_inq.cYe Guojin
'sys/ioctl.h' included in 'mptcp_inq.c' is duplicated. Reported-by: ZealRobot <zealci@zte.com.cn> Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20211210071424.425773-1-ye.guojin@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10Revert "mtd_blkdevs: don't scan partitions for plain mtdblock"Jens Axboe
This reverts commit 776b54e97a7d993ba23696e032426d5dea5bbe70. Looks like a last minute edit snuck into this patch, and as a result, it doesn't even compile. Revert the change for now. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-10block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2)Davidlohr Bueso
do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. Serialize ioprio_get to take the tasklist_lock in this case, just like it's set counterpart. Fixes: d69b78ba1de (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()) Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-10Merge branch 'md-fixes' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.16 Pull MD fixes from Song. * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix double free of mddev->private in autorun_array() md: fix update super 1.0 on rdev size change
2021-12-10selftests/bpf: Tests for state pruning with u32 spill/fillPaul Chaignon
This patch adds tests for the verifier's tracking for spilled, <8B registers. The first two test cases ensure the verifier doesn't incorrectly prune states in case of <8B spill/fills. The last one simply checks that a filled u64 register is marked unknown if the register spilled in the same slack slot was less than 8B. The map value access at the end of the first program is only incorrect for the path R6=32. If the precision bit for register R8 isn't backtracked through the u32 spill/fill, the R6=32 path is pruned at instruction 9 and the program is incorrectly accepted. The second program is a variation of the same with u32 spills and a u64 fill. The additional instructions to introduce the first pruning point may be a bit fragile as they depend on the heuristics for pruning points in the verifier (currently at least 8 instructions and 2 jumps). If the heuristics are changed, the pruning point may move (e.g., to the subsequent jump) or disappear, which would cause the test to always pass. Signed-off-by: Paul Chaignon <paul@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-10bpf: Fix incorrect state pruning for <8B spill/fillPaul Chaignon
Commit 354e8f1970f8 ("bpf: Support <8-byte scalar spill and refill") introduced support in the verifier to track <8B spill/fills of scalars. The backtracking logic for the precision bit was however skipping spill/fills of less than 8B. That could cause state pruning to consider two states equivalent when they shouldn't be. As an example, consider the following bytecode snippet: 0: r7 = r1 1: call bpf_get_prandom_u32 2: r6 = 2 3: if r0 == 0 goto pc+1 4: r6 = 3 ... 8: [state pruning point] ... /* u32 spill/fill */ 10: *(u32 *)(r10 - 8) = r6 11: r8 = *(u32 *)(r10 - 8) 12: r0 = 0 13: if r8 == 3 goto pc+1 14: r0 = 1 15: exit The verifier first walks the path with R6=3. Given the support for <8B spill/fills, at instruction 13, it knows the condition is true and skips instruction 14. At that point, the backtracking logic kicks in but stops at the fill instruction since it only propagates the precision bit for 8B spill/fill. When the verifier then walks the path with R6=2, it will consider it safe at instruction 8 because R6 is not marked as needing precision. Instruction 14 is thus never walked and is then incorrectly removed as 'dead code'. It's also possible to lead the verifier to accept e.g. an out-of-bound memory access instead of causing an incorrect dead code elimination. This regression was found via Cilium's bpf-next CI where it was causing a conntrack map update to be silently skipped because the code had been removed by the verifier. This commit fixes it by enabling support for <8B spill/fills in the bactracking logic. In case of a <8B spill/fill, the full 8B stack slot will be marked as needing precision. Then, in __mark_chain_precision, any tracked register spilled in a marked slot will itself be marked as needing precision, regardless of the spill size. This logic makes two assumptions: (1) only 8B-aligned spill/fill are tracked and (2) spilled registers are only tracked if the spill and fill sizes are equal. Commit ef979017b837 ("bpf: selftest: Add verifier tests for <8-byte scalar spill and refill") covers the first assumption and the next commit in this patchset covers the second. Fixes: 354e8f1970f8 ("bpf: Support <8-byte scalar spill and refill") Signed-off-by: Paul Chaignon <paul@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-10md: fix double free of mddev->private in autorun_array()zhangyue
In driver/md/md.c, if the function autorun_array() is called, the problem of double free may occur. In function autorun_array(), when the function do_md_run() returns an error, the function do_md_stop() will be called. The function do_md_run() called function md_run(), but in function md_run(), the pointer mddev->private may be freed. The function do_md_stop() called the function __md_stop(), but in function __md_stop(), the pointer mddev->private also will be freed without judging null. At this time, the pointer mddev->private will be double free, so it needs to be judged null or not. Signed-off-by: zhangyue <zhangyue1@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-12-10md: fix update super 1.0 on rdev size changeMarkus Hochholdinger
The superblock of version 1.0 doesn't get moved to the new position on a device size change. This leads to a rdev without a superblock on a known position, the raid can't be re-assembled. The line was removed by mistake and is re-added by this patch. Fixes: d9c0fa509eaf ("md: fix max sectors calculation for super 1.0") Cc: stable@vger.kernel.org Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-12-10nfsd: fix use-after-free due to delegation raceJ. Bruce Fields
A delegation break could arrive as soon as we've called vfs_setlease. A delegation break runs a callback which immediately (in nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we then exit nfs4_set_delegation without hashing the delegation, it will be freed as soon as the callback is done with it, without ever being removed from del_recall_lru. Symptoms show up later as use-after-free or list corruption warnings, usually in the laundromat thread. I suspect aba2072f4523 "nfsd: grant read delegations to clients holding writes" made this bug easier to hit, but I looked as far back as v3.0 and it looks to me it already had the same problem. So I'm not sure where the bug was introduced; it may have been there from the beginning. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-12-10nfsd: Fix nsfd startup race (again)Alexander Sverdlin
Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") has re-opened rpc_pipefs_event() race against nfsd_net_id registration (register_pernet_subsys()) which has been fixed by commit bb7ffbf29e76 ("nfsd: fix nsfd startup race triggering BUG_ON"). Restore the order of register_pernet_subsys() vs register_cld_notifier(). Add WARN_ON() to prevent a future regression. Crash info: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012 CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1 pc : rpc_pipefs_event+0x54/0x120 [nfsd] lr : rpc_pipefs_event+0x48/0x120 [nfsd] Call trace: rpc_pipefs_event+0x54/0x120 [nfsd] blocking_notifier_call_chain rpc_fill_super get_tree_keyed rpc_fs_get_tree vfs_get_tree do_mount ksys_mount __arm64_sys_mount el0_svc_handler el0_svc Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-12-10clocksource/drivers/arm_arch_timer: Force inlining of ↵Marc Zyngier
erratum_set_next_event_generic() With some specific kernel configuration and Clang, the kernel fails to like with something like: ld.lld: error: undefined symbol: __compiletime_assert_200 >>> referenced by arch_timer.h:156 (./arch/arm64/include/asm/arch_timer.h:156) >>> clocksource/arm_arch_timer.o:(erratum_set_next_event_generic) in archive drivers/built-in.a ld.lld: error: undefined symbol: __compiletime_assert_197 >>> referenced by arch_timer.h:133 (./arch/arm64/include/asm/arch_timer.h:133) >>> clocksource/arm_arch_timer.o:(erratum_set_next_event_generic) in archive drivers/built-in.a make: *** [Makefile:1161: vmlinux] Error 1 These are due to the BUILD_BUG() macros contained in the low-level accessors (arch_timer_reg_{write,read}_cp15) being emitted, as the access type wasn't known at compile time. Fix this by making erratum_set_next_event_generic() __force_inline, resulting in the 'access' parameter to be resolved at compile time, similarly to what is already done for set_next_event(). Fixes: 4775bc63f880 ("Add build-time guards for unhandled register accesses") Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20211117113532.3895208-1-maz@kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2021-12-10clocksource/drivers/dw_apb_timer_of: Fix probe failureAlexey Sheplyakov
The driver refuses to probe with -EINVAL since the commit 5d9814df0aec ("clocksource/drivers/dw_apb_timer_of: Add error handling if no clock available"). Before the driver used to probe successfully if either "clock-freq" or "clock-frequency" properties has been specified in the device tree. That commit changed if (A && B) panic("No clock nor clock-frequency property"); into if (!A && !B) return 0; That's a bug: the reverse of `A && B` is '!A || !B', not '!A && !B' Signed-off-by: Vadim V. Vlasov <vadim.vlasov@elpitech.ru> Signed-off-by: Alexey Sheplyakov <asheplyakov@basealt.ru> Fixes: 5d9814df0aec56a6 ("clocksource/drivers/dw_apb_timer_of: Add error handling if no clock available"). Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vadim V. Vlasov <vadim.vlasov@elpitech.ru> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lore.kernel.org/r/20211109153401.157491-1-asheplyakov@basealt.ru Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2021-12-10xfrm: add net device refcount tracker to struct xfrm_state_offloadEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://lore.kernel.org/r/20211209154451.4184050-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10sch_cake: do not call cake_destroy() from cake_init()Eric Dumazet
qdiscs are not supposed to call their own destroy() method from init(), because core stack already does that. syzbot was able to trigger use after free: DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock_common kernel/locking/mutex.c:586 [inline] WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740 Modules linked in: CPU: 0 PID: 21902 Comm: syz-executor189 Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:586 [inline] RIP: 0010:__mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740 Code: 08 84 d2 0f 85 19 08 00 00 8b 05 97 38 4b 04 85 c0 0f 85 27 f7 ff ff 48 c7 c6 20 00 ac 89 48 c7 c7 a0 fe ab 89 e8 bf 76 ba ff <0f> 0b e9 0d f7 ff ff 48 8b 44 24 40 48 8d b8 c8 08 00 00 48 89 f8 RSP: 0018:ffffc9000627f290 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88802315d700 RSI: ffffffff815f1db8 RDI: fffff52000c4fe44 RBP: ffff88818f28e000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ebb5e R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: ffffc9000627f458 R15: 0000000093c30000 FS: 0000555556abc400(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fda689c3303 CR3: 000000001cfbb000 CR4: 0000000000350ef0 Call Trace: <TASK> tcf_chain0_head_change_cb_del+0x2e/0x3d0 net/sched/cls_api.c:810 tcf_block_put_ext net/sched/cls_api.c:1381 [inline] tcf_block_put_ext net/sched/cls_api.c:1376 [inline] tcf_block_put+0xbc/0x130 net/sched/cls_api.c:1394 cake_destroy+0x3f/0x80 net/sched/sch_cake.c:2695 qdisc_create.constprop.0+0x9da/0x10f0 net/sched/sch_api.c:1293 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f1bb06badb9 Code: Unable to access opcode bytes at RIP 0x7f1bb06bad8f. RSP: 002b:00007fff3012a658 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1bb06badb9 RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000003 R10: 0000000000000003 R11: 0000000000000246 R12: 00007fff3012a688 R13: 00007fff3012a6a0 R14: 00007fff3012a6e0 R15: 00000000000013c2 </TASK> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20211210142046.698336-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10s390: enable switchdev support in defconfigNiklas Schnelle
The HiperSockets Converged Interface (HSCI) introduced with commit 4e20e73e631a ("s390/qeth: Switchdev event handler") requires CONFIG_SWITCHDEV=y to be usable. Similarly when using Linux controlled SR-IOV capable PF devices with the mlx5_core driver CONFIG_SWITCHDEV=y as well as CONFIG_MLX5_ESWITCH=y are necessary to actually get link on the created VFs. So let's add these to the defconfig to make both types of devices usable. Note also that these options are already enabled in most current distribution kernels. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10s390/kexec: handle R_390_PLT32DBL rela in arch_kexec_apply_relocations_add()Alexander Egorenkov
Starting with gcc 11.3, the C compiler will generate PLT-relative function calls even if they are local and do not require it. Later on during linking, the linker will replace all PLT-relative calls to local functions with PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is not being linked as a regular executable or shared library would have been, and therefore, all PLT-relative addresses remain in the generated purgatory object code unresolved. This leads to the situation where the purgatory code is being executed during kdump with all PLT-relative addresses unresolved. And this results in endless loops within the purgatory code. Furthermore, the clang C compiler has always behaved like described above and this commit should fix kdump for kernels built with the latter. Because the purgatory code is no regular executable or shared library, contains only calls to local functions and has no PLT, all R_390_PLT32DBL relocation entries can be resolved just like a R_390_PC32DBL one. * https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x1633.html#AEN1699 Relocation entries of purgatory code generated with gcc 11.3 ------------------------------------------------------------ $ readelf -r linux/arch/s390/purgatory/purgatory.o Relocation section '.rela.text' at offset 0x370 contains 5 entries: Offset Info Type Sym. Value Sym. Name + Addend 00000000005c 000c00000013 R_390_PC32DBL 0000000000000000 purgatory_sha_regions + 2 00000000007a 000d00000014 R_390_PLT32DBL 0000000000000000 sha256_update + 2 00000000008c 000e00000014 R_390_PLT32DBL 0000000000000000 sha256_final + 2 000000000092 000800000013 R_390_PC32DBL 0000000000000000 .LC0 + 2 0000000000a0 000f00000014 R_390_PLT32DBL 0000000000000000 memcmp + 2 Relocation entries of purgatory code generated with gcc 11.2 ------------------------------------------------------------ $ readelf -r linux/arch/s390/purgatory/purgatory.o Relocation section '.rela.text' at offset 0x368 contains 5 entries: Offset Info Type Sym. Value Sym. Name + Addend 00000000005c 000c00000013 R_390_PC32DBL 0000000000000000 purgatory_sha_regions + 2 00000000007a 000d00000013 R_390_PC32DBL 0000000000000000 sha256_update + 2 00000000008c 000e00000013 R_390_PC32DBL 0000000000000000 sha256_final + 2 000000000092 000800000013 R_390_PC32DBL 0000000000000000 .LC0 + 2 0000000000a0 000f00000013 R_390_PC32DBL 0000000000000000 memcmp + 2 Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reported-by: Tao Liu <ltao@redhat.com> Suggested-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211209073817.82196-1-egorenar@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10s390/ftrace: remove preempt_disable()/preempt_enable() pairJerome Marchand
It looks like commit ce5e48036c9e76a2 ("ftrace: disable preemption when recursion locked") missed a spot in kprobe_ftrace_handler() in arch/s390/kernel/ftrace.c. Remove the superfluous preempt_disable/enable_notrace() there too. Fixes: ce5e48036c9e76a2 ("ftrace: disable preemption when recursion locked") Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Link: https://lore.kernel.org/r/20211208151503.1510381-1-jmarchan@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10s390/kexec_file: fix error handling when applying relocationsPhilipp Rudo
arch_kexec_apply_relocations_add currently ignores all errors returned by arch_kexec_do_relocs. This means that every unknown relocation is silently skipped causing unpredictable behavior while the relocated code runs. Fix this by checking for errors and fail kexec_file_load if an unknown relocation type is encountered. The problem was found after gcc changed its behavior and used R_390_PLT32DBL relocations for brasl instruction and relied on ld to resolve the relocations in the final link in case direct calls are possible. As the purgatory code is only linked partially (option -r) ld didn't resolve the relocations leaving them for arch_kexec_do_relocs. But arch_kexec_do_relocs doesn't know how to handle R_390_PLT32DBL relocations so they were silently skipped. This ultimately caused an endless loop in the purgatory as the brasl instructions kept branching to itself. Fixes: 71406883fd35 ("s390/kexec_file: Add kexec_file_load system call") Reported-by: Tao Liu <ltao@redhat.com> Signed-off-by: Philipp Rudo <prudo@redhat.com> Link: https://lore.kernel.org/r/20211208130741.5821-3-prudo@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10s390/kexec_file: print some more error messagesPhilipp Rudo
Be kind and give some more information on what went wrong. Signed-off-by: Philipp Rudo <prudo@redhat.com> Link: https://lore.kernel.org/r/20211208130741.5821-2-prudo@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-10Merge branch 'net-netns-refcount-tracking-base-series'Jakub Kicinski
Eric Dumazet says: ==================== net: netns refcount tracking, base series We have 100+ syzbot reports about netns being dismantled too soon, still unresolved as of today. We think a missing get_net() or an extra put_net() is the root cause. In order to find the bug(s), and be able to spot future ones, this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers to precisely pair all put_net() with corresponding get_net(). To use these helpers, each data structure owning a refcount should also use a "netns_tracker" to pair the get() and put(). Small sections of codes where the get()/put() are in sight do not need to have a tracker, because they are short lived, but in theory it is also possible to declare an on-stack tracker. v2: Include core networking patches only. ==================== Link: https://lore.kernel.org/r/20211210074426.279563-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10ppp: add netns refcount trackerEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10l2tp: add netns refcount tracker to l2tp_dfs_seq_dataEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: sched: add netns refcount tracker to struct tcf_extsEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: add netns refcount tracker to struct seq_net_privateEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: add netns refcount tracker to struct sockEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: add networking namespace refcount trackerEric Dumazet
We have 100+ syzbot reports about netns being dismantled too soon, still unresolved as of today. We think a missing get_net() or an extra put_net() is the root cause. In order to find the bug(s), and be able to spot future ones, this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers to precisely pair all put_net() with corresponding get_net(). To use these helpers, each data structure owning a refcount should also use a "netns_tracker" to pair the get and put. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10selftests: KVM: Add test to verify KVM doesn't explode on "bad" I/OSean Christopherson
Add an x86 selftest to verify that KVM doesn't WARN or otherwise explode if userspace modifies RCX during a userspace exit to handle string I/O. This is a regression test for a user-triggerable WARN introduced by commit 3b27de271839 ("KVM: x86: split the two parts of emulator_pio_in"). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211025201311.1881846-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-10KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exitSean Christopherson
Replace a WARN with a comment to call out that userspace can modify RCX during an exit to userspace to handle string I/O. KVM doesn't actually support changing the rep count during an exit, i.e. the scenario can be ignored, but the WARN needs to go as it's trivial to trigger from userspace. Cc: stable@vger.kernel.org Fixes: 3b27de271839 ("KVM: x86: split the two parts of emulator_pio_in") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211025201311.1881846-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-10KVM: X86: Raise #GP when clearing CR0_PG in 64 bit modeLai Jiangshan
In the SDM: If the logical processor is in 64-bit mode or if CR4.PCIDE = 1, an attempt to clear CR0.PG causes a general-protection exception (#GP). Software should transition to compatibility mode and clear CR4.PCIDE before attempting to disable paging. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211207095230.53437-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>