linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-05-16	random: order timer entropy functions below interrupt functions	Jason A. Donenfeld
	There are no code changes here; this is just a reordering of functions, so that in subsequent commits, the timer entropy functions can call into the interrupt ones. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-15	irqchip/gic-v3: Fix priority mask handling	Mark Rutland
	When a kernel is built with CONFIG_ARM64_PSEUDO_NMI=y and pseudo-NMIs are enabled at runtime, GICv3's gic_handle_irq() can leave DAIF and ICC_PMR_EL1 in an unexpected state in some cases, breaking subsequent usage of local_irq_enable() and resulting in softirqs being run with IRQs erroneously masked (possibly resulting in deadlocks). This can happen when an IRQ exception is taken from a context where regular IRQs were unmasked, and either: (1) ICC_IAR1_EL1 indicates a special INTID (e.g. as a result of an IRQ being withdrawn since the IRQ exception was taken). (2) ICC_IAR1_EL1 and ICC_RPR_EL1 indicate an NMI was acknowledged. When an NMI is taken from a context where regular IRQs were masked, there is no problem. When CONFIG_ARM64_DEBUG_PRIORITY_MASKING=y, this can be detected with perf, e.g. \| # ./perf record -a -g -e cycles:k ls -alR / > /dev/null 2>&1 \| ------------[ cut here ]------------ \| WARNING: CPU: 0 PID: 14 at arch/arm64/include/asm/irqflags.h:32 arch_local_irq_enable+0x4c/0x6c \| Modules linked in: \| CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 5.18.0-rc5-00004-g876c38e3d20b #12 \| Hardware name: linux,dummy-virt (DT) \| pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) \| pc : arch_local_irq_enable+0x4c/0x6c \| lr : __do_softirq+0x110/0x5d8 \| sp : ffff8000080bbbc0 \| pmr_save: 000000f0 \| x29: ffff8000080bbbc0 x28: ffff316ac3a6ca40 x27: 0000000000000000 \| x26: 0000000000000000 x25: ffffa04611c06008 x24: ffffa04611c06008 \| x23: 0000000040400005 x22: 0000000000000200 x21: ffff8000080bbe20 \| x20: ffffa0460fe10320 x19: 0000000000000009 x18: 0000000000000000 \| x17: ffff91252dfa9000 x16: ffff800008004000 x15: 0000000000004000 \| x14: 0000000000000028 x13: ffffa0460fe17578 x12: ffffa0460fed4294 \| x11: ffffa0460fedc168 x10: ffffffffffffff80 x9 : ffffa0460fe10a70 \| x8 : ffffa0460fedc168 x7 : 000000000000b762 x6 : 00000000057c3bdf \| x5 : ffff8000080bbb18 x4 : 0000000000000000 x3 : 0000000000000001 \| x2 : ffff91252dfa9000 x1 : 0000000000000060 x0 : 00000000000000f0 \| Call trace: \| arch_local_irq_enable+0x4c/0x6c \| __irq_exit_rcu+0x180/0x1ac \| irq_exit_rcu+0x1c/0x44 \| el1_interrupt+0x4c/0xe4 \| el1h_64_irq_handler+0x18/0x24 \| el1h_64_irq+0x74/0x78 \| smpboot_thread_fn+0x68/0x2c0 \| kthread+0x124/0x130 \| ret_from_fork+0x10/0x20 \| irq event stamp: 193241 \| hardirqs last enabled at (193240): [<ffffa0460fe10a9c>] __do_softirq+0x10c/0x5d8 \| hardirqs last disabled at (193241): [<ffffa0461102ffe4>] el1_dbg+0x24/0x90 \| softirqs last enabled at (193234): [<ffffa0460fe10e00>] __do_softirq+0x470/0x5d8 \| softirqs last disabled at (193239): [<ffffa0460fea9944>] __irq_exit_rcu+0x180/0x1ac \| ---[ end trace 0000000000000000 ]--- The necessary manipulation of DAIF and ICC_PMR_EL1 depends on the interrupted context, but the structure of gic_handle_irq() makes this also depend on whether the GIC reports an IRQ, NMI, or special INTID: * When the interrupted context had regular IRQs masked (and hence the interrupt must be an NMI), the entry code performs the NMI entry/exit and gic_handle_irq() should return with DAIF and ICC_PMR_EL1 unchanged. This is handled correctly today. * When the interrupted context had regular IRQs unmasked, the entry code performs IRQ entry/exit, but expects gic_handle_irq() to always update ICC_PMR_EL1 and DAIF.IF to unmask NMIs (but not regular IRQs) prior to returning (which it must do prior to invoking any regular IRQ handler). This unbalanced calling convention is necessary because we don't know whether an NMI has been taken until acknowledged by a read from ICC_IAR1_EL1, and so we need to perform the read with NMI masked in case an NMI has been taken (and needs to be handled with NMIs masked). Unfortunately, this is not handled consistently: - When ICC_IAR1_EL1 reports a special INTID, gic_handle_irq() returns immediately without manipulating ICC_PMR_EL1 and DAIF. - When RPR_EL1 indicates an NMI, gic_handle_irq() calls gic_handle_nmi() to invoke the NMI handler, then returns without manipulating ICC_PMR_EL1 and DAIF. - For regular IRQs, gic_handle_irq() manipulates ICC_PMR_EL1 and DAIF prior to invoking the IRQ handler. There were related problems with special INTID handling in the past, where if an exception was taken from a context with regular IRQs masked and ICC_IAR_EL1 reported a special INTID, gic_handle_irq() would erroneously unmask NMIs in NMI context permitted an unexpected nested NMI. That case specifically was fixed by commit: a97709f563a078e2 ("irqchip/gic-v3: Do not enable irqs when handling spurious interrups") ... but unfortunately that commit added an inverse problem, where if an exception was taken from a context with regular IRQs unmasked and ICC_IAR_EL1 reported a special INTID, gic_handle_irq() would erroneously fail to unmask NMIs (and consequently regular IRQs could not be unmasked during softirq processing). Before and after that commit, if an NMI was taken from a context with regular IRQs unmasked gic_handle_irq() would not unmask NMIs prior to returning, leading to the same problem with softirq handling. This patch fixes this by restructuring gic_handle_irq(), splitting it into separate irqson/irqsoff helper functions which consistently perform the DAIF + ICC_PMR1_EL1 manipulation based upon the interrupted context, regardless of the event indicated by ICC_IAR1_EL1. The special INTID handling is moved into the low-level IRQ/NMI handler invocation helper functions, so that early returns don't prevent the required manipulation of DAIF + ICC_PMR_EL1. Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513133038.226182-4-mark.rutland@arm.com
2022-05-15	irqchip/gic-v3: Refactor ISB + EOIR at ack time	Mark Rutland
	There are cases where a context synchronization event is necessary between an IRQ being raised and being handled, and there are races such that we cannot rely upon the exception entry being subsequent to the interrupt being raised. To fix this, we place an ISB between a read of IAR and the subsequent invocation of an IRQ handler. When EOI mode 1 is in use, we need to EOI an interrupt prior to invoking its handler, and we have a write to EOIR for this. As this write to EOIR requires an ISB, and this is provided by the gic_write_eoir() helper, we omit the usual ISB in this case, with the logic being: \| if (static_branch_likely(&supports_deactivate_key)) \| gic_write_eoir(irqnr); \| else \| isb(); This is somewhat opaque, and it would be a little clearer if there were an unconditional ISB, with only the write to EOIR being conditional, e.g. \| if (static_branch_likely(&supports_deactivate_key)) \| write_gicreg(irqnr, ICC_EOIR1_EL1); \| \| isb(); This patch rewrites the code that way, with this logic factored into a new helper function with comments explaining what the ISB is for, as were originally laid out in commit: 39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq") Note that since then, we removed the IAR polling in commit: 342677d70ab92142 ("irqchip/gic-v3: Remove acknowledge loop") ... which removed one of the two race conditions. For consistency, other portions of the driver are made to manipulate EOIR using write_gicreg() and explcit ISBs, and the gic_write_eoir() helper function is removed. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513133038.226182-3-mark.rutland@arm.com
2022-05-15	irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling	Mark Rutland
	There are cases where a context synchronization event is necessary between an IRQ being raised and being handled, and there are races such that we cannot rely upon the exception entry being subsequent to the interrupt being raised. We identified and fixes this for regular IRQs in commit: 39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq") Unfortunately, we forgot to do the same for psuedo-NMIs when support for those was added in commit: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs") Which means that when pseudo-NMIs are used for PMU support, we'll hit the same problem. Apply the same fix as for regular IRQs. Note that when EOI mode 1 is in use, the call to gic_write_eoir() will provide an ISB. Fixes: f32c926651dcd168 ("irqchip/gic-v3: Handle pseudo-NMIs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513133038.226182-2-mark.rutland@arm.com
2022-05-15	Merge tag 'driver-core-5.18-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here is one fix, and three documentation updates for 5.18-rc7. The fix is for the firmware loader which resolves a long-reported problem where the credentials of the firmware loader could be set to a userspace process without enough permissions to actually load the firmware image. Many Android vendors have been reporting this for quite some time. The documentation updates are for the embargoed-hardware-issues.rst file to add a new entry, change an existing one, and sort the list to make changes easier in the future. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Documentation/process: Update ARM contact for embargoed hardware issues Documentation/process: Add embargoed HW contact for Ampere Computing Documentation/process: Make groups alphabetical and use tabs consistently firmware_loader: use kernel credentials when reading firmware
2022-05-15	Merge tag 'char-misc-5.18-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are two small driver fixes for 5.18-rc7 that resolve reported problems: - slimbus driver irq bugfix - interconnect sync state bugfix Both of these have been in linux-next with no reported problems" * tag 'char-misc-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: slimbus: qcom: Fix IRQ check in qcom_slim_probe interconnect: Restore sync state by ignoring ipa-virt in provider count
2022-05-15	Merge tag 'tty-5.18-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty n_gsm and serial driver fixes for 5.18-rc7 that resolve reported problems. They include: - n_gsm fixes for reported issues - 8250_mtk driver fixes for some platforms - fsl_lpuart driver fix for reported problem. - digicolor driver fix for reported problem. All have been in linux-next for a while with no reported problems" * tag 'tty-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: fsl_lpuart: Don't enable interrupts too early tty: n_gsm: fix invalid gsmtty_write_room() result tty: n_gsm: fix mux activation issues in gsm_config() tty: n_gsm: fix buffer over-read in gsm_dlci_data() serial: 8250_mtk: Fix register address for XON/XOFF character serial: 8250_mtk: Make sure to select the right FEATURE_SEL serial: 8250_mtk: Fix UART_EFR register address tty/serial: digicolor: fix possible null-ptr-deref in digicolor_uart_probe()
2022-05-15	Merge tag 'usb-5.18-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small fixes for reported issues with some USB drivers. They include: - xhci fixes for xhci-mtk platform driver - typec driver fixes for reported problems. - cdc-wdm read-stuck fix - gadget driver fix for reported race condition - new usb-serial driver ids All of these have been in linux-next with no reported problems" * tag 'usb-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: xhci-mtk: remove bandwidth budget table usb: xhci-mtk: fix fs isoc's transfer error usb: gadget: fix race when gadget driver register via ioctl usb: typec: tcpci_mt6360: Update for BMC PHY setting usb: gadget: uvc: allow for application to cleanly shutdown usb: typec: tcpci: Don't skip cleanup in .remove() on error usb: cdc-wdm: fix reading stuck on device close USB: serial: qcserial: add support for Sierra Wireless EM7590 USB: serial: option: add Fibocom MA510 modem USB: serial: option: add Fibocom L610 modem USB: serial: pl2303: add device id for HP LM930 Display
2022-05-15	Merge tag 'powerpc-5.18-5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: - Fix KVM PR on 32-bit, which was broken by some MMU code refactoring. Thanks to: Alexander Graf, and Matt Evans. * tag 'powerpc-5.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()
2022-05-15	Merge tag 'x86-urgent-2022-05-15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix for the handling of unpopulated sub-pmd spaces. The copy & pasta from the corresponding s390 code screwed up the address calculation for marking the sub-pmd ranges via memset by omitting the ALIGN_DOWN() to calculate the proper start address. It's a mystery why this code is not generic and shared because there is nothing architecture specific in there, but that's too intrusive for a backportable fix" * tag 'x86-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix marking of unused sub-pmd ranges
2022-05-15	Merge tag 'sched-urgent-2022-05-15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "The recent expansion of the sched switch tracepoint inserted a new argument in the middle of the arguments. This reordering broke BPF programs which relied on the old argument list. While tracepoints are not considered stable ABI, it's not trivial to make BPF cope with such a change, but it's being worked on. For now restore the original argument order and move the new argument to the end of the argument list" * tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/tracing: Append prev_state to tp args instead
2022-05-15	Merge tag 'irq-urgent-2022-05-15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for a recent (introduced in 5.16) regression in the core interrupt code. The consolidation of the interrupt handler invocation code added an unconditional warning when generic_handle_domain_irq() is invoked from outside hard interrupt context. That's overbroad as the requirement for invoking these handlers in hard interrupt context is only required for certain interrupt types. The subsequently called code already contains a warning which triggers conditionally for interrupt chips which indicate this requirement in their properties. Remove the overbroad one" * tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
2022-05-15	efi: Do not import certificates from UEFI Secure Boot for T2 Macs	Aditya Garg
	On Apple T2 Macs, when Linux attempts to read the db and dbx efi variables at early boot to load UEFI Secure Boot certificates, a page fault occurs in Apple firmware code and EFI runtime services are disabled with the following logs: [Firmware Bug]: Page fault caused by firmware at PA: 0xffffb1edc0068000 WARNING: CPU: 3 PID: 104 at arch/x86/platform/efi/quirks.c:735 efi_crash_gracefully_on_page_fault+0x50/0xf0 (Removed some logs from here) Call Trace: <TASK> page_fault_oops+0x4f/0x2c0 ? search_bpf_extables+0x6b/0x80 ? search_module_extables+0x50/0x80 ? search_exception_tables+0x5b/0x60 kernelmode_fixup_or_oops+0x9e/0x110 __bad_area_nosemaphore+0x155/0x190 bad_area_nosemaphore+0x16/0x20 do_kern_addr_fault+0x8c/0xa0 exc_page_fault+0xd8/0x180 asm_exc_page_fault+0x1e/0x30 (Removed some logs from here) ? __efi_call+0x28/0x30 ? switch_mm+0x20/0x30 ? efi_call_rts+0x19a/0x8e0 ? process_one_work+0x222/0x3f0 ? worker_thread+0x4a/0x3d0 ? kthread+0x17a/0x1a0 ? process_one_work+0x3f0/0x3f0 ? set_kthread_struct+0x40/0x40 ? ret_from_fork+0x22/0x30 </TASK> ---[ end trace 1f82023595a5927f ]--- efi: Froze efi_rts_wq and disabled EFI Runtime Services integrity: Couldn't get size: 0x8000000000000015 integrity: MODSIGN: Couldn't get UEFI db list efi: EFI Runtime Services are disabled! integrity: Couldn't get size: 0x8000000000000015 integrity: Couldn't get UEFI dbx list integrity: Couldn't get size: 0x8000000000000015 integrity: Couldn't get mokx list integrity: Couldn't get size: 0x80000000 So we avoid reading these UEFI variables and thus prevent the crash. Cc: stable@vger.kernel.org Signed-off-by: Aditya Garg <gargaditya08@live.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-05-15	KVM: arm64: Don't hypercall before EL2 init	Quentin Perret
	Will reported the following splat when running with Protected KVM enabled: [ 2.427181] ------------[ cut here ]------------ [ 2.427668] WARNING: CPU: 3 PID: 1 at arch/arm64/kvm/mmu.c:489 __create_hyp_private_mapping+0x118/0x1ac [ 2.428424] Modules linked in: [ 2.429040] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc2-00084-g8635adc4efc7 #1 [ 2.429589] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 2.430286] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.430734] pc : __create_hyp_private_mapping+0x118/0x1ac [ 2.431091] lr : create_hyp_exec_mappings+0x40/0x80 [ 2.431377] sp : ffff80000803baf0 [ 2.431597] x29: ffff80000803bb00 x28: 0000000000000000 x27: 0000000000000000 [ 2.432156] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 2.432561] x23: ffffcd96c343b000 x22: 0000000000000000 x21: ffff80000803bb40 [ 2.433004] x20: 0000000000000004 x19: 0000000000001800 x18: 0000000000000000 [ 2.433343] x17: 0003e68cf7efdd70 x16: 0000000000000004 x15: fffffc81f602a2c8 [ 2.434053] x14: ffffdf8380000000 x13: ffffcd9573200000 x12: ffffcd96c343b000 [ 2.434401] x11: 0000000000000004 x10: ffffcd96c1738000 x9 : 0000000000000004 [ 2.434812] x8 : ffff80000803bb40 x7 : 7f7f7f7f7f7f7f7f x6 : 544f422effff306b [ 2.435136] x5 : 000000008020001e x4 : ffff207d80a88c00 x3 : 0000000000000005 [ 2.435480] x2 : 0000000000001800 x1 : 000000014f4ab800 x0 : 000000000badca11 [ 2.436149] Call trace: [ 2.436600] __create_hyp_private_mapping+0x118/0x1ac [ 2.437576] create_hyp_exec_mappings+0x40/0x80 [ 2.438180] kvm_init_vector_slots+0x180/0x194 [ 2.458941] kvm_arch_init+0x80/0x274 [ 2.459220] kvm_init+0x48/0x354 [ 2.459416] arm_init+0x20/0x2c [ 2.459601] do_one_initcall+0xbc/0x238 [ 2.459809] do_initcall_level+0x94/0xb4 [ 2.460043] do_initcalls+0x54/0x94 [ 2.460228] do_basic_setup+0x1c/0x28 [ 2.460407] kernel_init_freeable+0x110/0x178 [ 2.460610] kernel_init+0x20/0x1a0 [ 2.460817] ret_from_fork+0x10/0x20 [ 2.461274] ---[ end trace 0000000000000000 ]--- Indeed, the Protected KVM mode promotes __create_hyp_private_mapping() to a hypercall as EL1 no longer has access to the hypervisor's stage-1 page-table. However, the call from kvm_init_vector_slots() happens after pKVM has been initialized on the primary CPU, but before it has been initialized on secondaries. As such, if the KVM initcall procedure is migrated from one CPU to another in this window, the hypercall may end up running on a CPU for which EL2 has not been initialized. Fortunately, the pKVM hypervisor doesn't rely on the host to re-map the vectors in the private range, so the hypercall in question is in fact superfluous. Skip it when pKVM is enabled. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> [maz: simplified the checks slightly] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513092607.35233-1-qperret@google.com
2022-05-15	KVM: arm64: vgic-v3: Consistently populate ID_AA64PFR0_EL1.GIC	Marc Zyngier
	When adding support for the slightly wonky Apple M1, we had to populate ID_AA64PFR0_EL1.GIC==1 to present something to the guest, as the HW itself doesn't advertise the feature. However, we gated this on the in-kernel irqchip being created. This causes some trouble for QEMU, which snapshots the state of the registers before creating a virtual GIC, and then tries to restore these registers once the GIC has been created. Obviously, between the two stages, ID_AA64PFR0_EL1.GIC has changed value, and the write fails. The fix is to actually emulate the HW, and always populate the field if the HW is capable of it. Fixes: 562e530fd770 ("KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3") Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Oliver Upton <oupton@google.com> Link: https://lore.kernel.org/r/20220503211424.3375263-1-maz@kernel.org
2022-05-15	random: do not pretend to handle premature next security model	Jason A. Donenfeld
	Per the thread linked below, "premature next" is not considered to be a realistic threat model, and leads to more serious security problems. "Premature next" is the scenario in which: - Attacker compromises the current state of a fully initialized RNG via some kind of infoleak. - New bits of entropy are added directly to the key used to generate the /dev/urandom stream, without any buffering or pooling. - Attacker then, somehow having read access to /dev/urandom, samples RNG output and brute forces the individual new bits that were added. - Result: the RNG never "recovers" from the initial compromise, a so-called violation of what academics term "post-compromise security". The usual solutions to this involve some form of delaying when entropy gets mixed into the crng. With Fortuna, this involves multiple input buckets. With what the Linux RNG was trying to do prior, this involves entropy estimation. However, by delaying when entropy gets mixed in, it also means that RNG compromises are extremely dangerous during the window of time before the RNG has gathered enough entropy, during which time nonces may become predictable (or repeated), ephemeral keys may not be secret, and so forth. Moreover, it's unclear how realistic "premature next" is from an attack perspective, if these attacks even make sense in practice. Put together -- and discussed in more detail in the thread below -- these constitute grounds for just doing away with the current code that pretends to handle premature next. I say "pretends" because it wasn't doing an especially great job at it either; should we change our mind about this direction, we would probably implement Fortuna to "fix" the "problem", in which case, removing the pretend solution still makes sense. This also reduces the crng reseed period from 5 minutes down to 1 minute. The rationale from the thread might lead us toward reducing that even further in the future (or even eliminating it), but that remains a topic of a future commit. At a high level, this patch changes semantics from: Before: Seed for the first time after 256 "bits" of estimated entropy have been accumulated since the system booted. Thereafter, reseed once every five minutes, but only if 256 new "bits" have been accumulated since the last reseeding. After: Seed for the first time after 256 "bits" of estimated entropy have been accumulated since the system booted. Thereafter, reseed once every minute. Most of this patch is renaming and removing: POOL_MIN_BITS becomes POOL_INIT_BITS, credit_entropy_bits() becomes credit_init_bits(), crng_reseed() loses its "force" parameter since it's now always true, the drain_entropy() function no longer has any use so it's removed, entropy estimation is skipped if we've already init'd, the various notifiers for "low on entropy" are now only active prior to init, and finally, some documentation comments are cleaned up here and there. Link: https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ Cc: Theodore Ts'o <tytso@mit.edu> Cc: Nadia Heninger <nadiah@cs.ucsd.edu> Cc: Tom Ristenpart <ristenpart@cornell.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-15	selftests/arm64: Use switch statements in mte_common_util.c	Mark Brown
	In the MTE tests there are several places where we use chains of if statements to open code what could be written as switch statements, move over to switch statements to make the idiom clearer. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220510164520.768783-6-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-15	selftests/arm64: Remove casts to/from void in check_tags_inclusion	Mark Brown
	Void pointers may be freely used with other pointer types in C, any casts between void * and other pointer types serve no purpose other than to mask potential warnings. Drop such casts from check_tags_inclusion to help with future review of the code. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220510164520.768783-5-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-15	selftests/arm64: Check failures to set tags in check_tags_inclusion	Mark Brown
	The MTE check_tags_inclusion test uses the mte_switch_mode() helper but ignores the return values it generates meaning we might not be testing the things we're trying to test, fail the test if it reports an error. The helper will log any errors it returns. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220510164520.768783-4-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-15	selftests/arm64: Allow zero tags in mte_switch_mode()	Mark Brown
	mte_switch_mode() currently rejects attempts to set a zero tag however there are tests such as check_tags_inclusion which attempt to cover cases with zero tags using mte_switch_mode(). Since it is not clear why we are rejecting zero tags change the test to accept them. The issue has not previously been as apparent as it should be since the return value of mte_switch_mode() was not always checked in the callers and the tests weren't otherwise failing. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220510164520.768783-3-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-15	selftests/arm64: Log errors in verify_mte_pointer_validity()	Mark Brown
	When we detect a problem in verify_mte_pointer_validity() while checking tags we don't log what the problem was which makes debugging harder. Add some diagnostics. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220510164520.768783-2-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-15	arm64/sysreg: fix odd line spacing	Mark Rutland
	Between the header and the definitions, there's no line gap, and in a couple of places a double line gap for no semantic reason, which makes the output look a little odd. Fix this so blocks are consistently separated with a single line gap: * Add a newline after the "Generated file" comment line, so this is clearly split from whatever the first definition in the file is. * At the start of a SysregFields block there's no need for a newline as we haven't output any sysreg encoding details prior to this. * At the end of a Sysreg block there's no need for a newline if we have no RES0 or RES1 fields, as there will be a line gap after the previous element (e.g. a Fields line). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220513174118.266966-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-15	arm64/sysreg: improve comment for regs without fields	Mark Rutland
	Currently for registers without fields we create a comment pointing at the common definitions, e.g. \| #define REG_TTBR0_EL1 S3_0_C2_C0_0 \| #define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0) \| #define SYS_TTBR0_EL1_Op0 3 \| #define SYS_TTBR0_EL1_Op1 0 \| #define SYS_TTBR0_EL1_CRn 2 \| #define SYS_TTBR0_EL1_CRm 0 \| #define SYS_TTBR0_EL1_Op2 0 \| \| /* See TTBRx_EL1 / It would be slightly nicer if the comment said what we should be looking for, e.g. \| #define REG_TTBR0_EL1 S3_0_C2_C0_0 \| #define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0) \| #define SYS_TTBR0_EL1_Op0 3 \| #define SYS_TTBR0_EL1_Op1 0 \| #define SYS_TTBR0_EL1_CRn 2 \| #define SYS_TTBR0_EL1_CRm 0 \| #define SYS_TTBR0_EL1_Op2 0 \| \| / For TTBR0_EL1 fields see TTBRx_EL1 */ Update the comment generation accordingly. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220513174118.266966-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-14	can: m_can: remove support for custom bit timing, take #2	Jarkko Nikula
	Now when Intel Elkhart Lake uses again common bit timing and there are no other users for custom bit timing, we can bring back the changes done by the commit 0ddd83fbebbc ("can: m_can: remove support for custom bit timing"). This effectively reverts commit ea768b2ffec6 ("Revert "can: m_can: remove support for custom bit timing"") while taking into account commit ea22ba40debe ("can: m_can: make custom bittiming fields const") and commit 7d4a101c0bd3 ("can: dev: add sanity check in can_set_static_ctrlmode()"). Link: https://lore.kernel.org/all/20220512124144.536850-2-jarkko.nikula@linux.intel.com Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-05-14	Revert "can: m_can: pci: use custom bit timings for Elkhart Lake"	Jarkko Nikula
	This reverts commit 0e8ffdf3b86dfd44b651f91b12fcae76c25c453b. Commit 0e8ffdf3b86d ("can: m_can: pci: use custom bit timings for Elkhart Lake") broke the test case using bitrate switching. \| ip link set can0 up type can bitrate 500000 dbitrate 4000000 fd on \| ip link set can1 up type can bitrate 500000 dbitrate 4000000 fd on \| candump can0 & \| cangen can1 -I 0x800 -L 64 -e -fb \ \| -D 11223344deadbeef55667788feedf00daabbccdd44332211 -n 1 -v -v Above commit does everything correctly according to the datasheet. However datasheet wasn't correct. I got confirmation from hardware engineers that the actual CAN hardware on Intel Elkhart Lake is based on M_CAN version v3.2.0. Datasheet was mirroring values from an another specification which was based on earlier M_CAN version leading to wrong bit timings. Therefore revert the commit and switch back to common bit timings. Fixes: ea4c1787685d ("can: m_can: pci: use custom bit timings for Elkhart Lake") Link: https://lore.kernel.org/all/20220512124144.536850-1-jarkko.nikula@linux.intel.com Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reported-by: Chee Hou Ong <chee.houx.ong@intel.com> Reported-by: Aman Kumar <aman.kumar@intel.com> Reported-by: Pallavi Kumari <kumari.pallavi@intel.com> Cc: <stable@vger.kernel.org> # v5.16+ Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-05-14	Merge tag 'perf-tools-fixes-for-v5.18-2022-05-14' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix two NDEBUG warnings in 'perf bench numa' - Fix ARM coresight `perf test` failure - Sync linux/kvm.h with the kernel sources - Add James and Mike as Arm64 performance events reviewers * tag 'perf-tools-fixes-for-v5.18-2022-05-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: MAINTAINERS: Add James and Mike as Arm64 performance events reviewers tools headers UAPI: Sync linux/kvm.h with the kernel sources perf tests: Fix coresight `perf test` failure. perf bench: Fix two numa NDEBUG warnings
2022-05-14	genirq/irq_sim: Make the irq_work always run in hard irq context	Sebastian Andrzej Siewior
	The IRQ simulator uses irq_work to trigger an interrupt. Without the IRQ_WORK_HARD_IRQ flag the irq_work will be performed in thread context on PREEMPT_RT. This causes locking errors later in handle_simple_irq() which expects to be invoked with disabled interrupts. Triggering individual interrupts in hardirq context should not lead to unexpected high latencies since this is also what the hardware controller does. Also it is used as a simulator so... Use IRQ_WORK_INIT_HARD() to carry out the irq_work in hardirq context on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YnuZBoEVMGwKkLm+@linutronix.de
2022-05-14	timers: Provide a better debugobjects hint for delayed works	Stephen Boyd
	With debugobjects enabled the timer hint for freeing of active timers embedded inside delayed works is always the same, i.e. the hint is delayed_work_timer_fn, even though the function the delayed work is going to run can be wildly different depending on what work was queued. Enabling workqueue debugobjects doesn't help either because the delayed work isn't considered active until it is actually queued to run on a workqueue. If the work is freed while the timer is pending the work isn't considered active so there is no information from workqueue debugobjects. Special case delayed works in the timer debugobjects hint logic so that the delayed work function is returned instead of the delayed_work_timer_fn. This will help to understand which delayed work was pending that got freed. Apply the same treatment for kthread_delayed_work because it follows the same pattern. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220511201951.42408-1-swboyd@chromium.org
2022-05-14	io_uring: implement multishot mode for accept	Hao Xu
	Refactor io_accept() to support multishot mode. theoretical analysis: 1) when connections come in fast - singleshot: add accept sqe(userspace) --> accept inline ^ \| \|-----------------\| - multishot: add accept sqe(userspace) --> accept inline ^ \| \|----\| we do accept repeatedly in place until get EAGAIN 2) when connections come in at a low pressure similar thing like 1), we reduce a lot of userspace-kernel context switch and useless vfs_poll() tests: Did some tests, which goes in this way: server client(multiple) accept connect read write write read close close Basically, raise up a number of clients(on same machine with server) to connect to the server, and then write some data to it, the server will write those data back to the client after it receives them, and then close the connection after write return. Then the client will read the data and then close the connection. Here I test 10000 clients connect one server, data size 128 bytes. And each client has a go routine for it, so they come to the server in short time. test 20 times before/after this patchset, time spent:(unit cycle, which is the return value of clock()) before: 1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075 +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984 +1934226+1914385)/20.0 = 1927633.75 after: 1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112 +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506 +1871324+1940803)/20.0 = 1894750.45 (1927633.75 - 1894750.45) / 1927633.75 = 1.65% Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-5-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-14	io_uring: let fast poll support multishot	Hao Xu
	For operations like accept, multishot is a useful feature, since we can reduce a number of accept sqe. Let's integrate it to fast poll, it may be good for other operations in the future. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-4-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-14	io_uring: add REQ_F_APOLL_MULTISHOT for requests	Hao Xu
	Add a flag to indicate multishot mode for fast poll. currently only accept use it, but there may be more operations leveraging it in the future. Also add a mask IO_APOLL_MULTI_POLLED which stands for REQ_F_APOLL_MULTI \| REQ_F_POLLED, to make the code short and cleaner. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-3-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-14	io_uring: add IORING_ACCEPT_MULTISHOT for accept	Hao Xu
	add an accept_flag IORING_ACCEPT_MULTISHOT for accept, which is to support multishot. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-2-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-13	net: macb: Increment rx bd head after allocating skb and buffer	Harini Katakam
	In gem_rx_refill rx_prepared_head is incremented at the beginning of the while loop preparing the skb and data buffers. If the skb or data buffer allocation fails, this BD will be unusable BDs until the head loops back to the same BD (and obviously buffer allocation succeeds). In the unlikely event that there's a string of allocation failures, there will be an equal number of unusable BDs and an inconsistent RX BD chain. Hence increment the head at the end of the while loop to be clean. Fixes: 4df95131ea80 ("net/macb: change RX path for GEM") Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/20220512171900.32593-1-harini.katakam@xilinx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-13	Merge branch 'mptcp-subflow-accounting-fix'	Jakub Kicinski
	Mat Martineau says: ==================== mptcp: Subflow accounting fix This series contains a bug fix affecting the in-kernel path manager (patch 1), where closing subflows would sometimes not adjust the PM's count of active subflows. Patch 2 updates the selftests to exercise the new code. ==================== Link: https://lore.kernel.org/r/20220512232642.541301-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-13	selftests: mptcp: add subflow limits test-cases	Paolo Abeni
	Add and delete a bunch of endpoints and verify the respect of configured limits. This covers the codepath introduced by the previous patch. Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-13	mptcp: fix subflow accounting on close	Paolo Abeni
	If the PM closes a fully established MPJ subflow or the subflow creation errors out in it's early stage the subflows counter is not bumped accordingly. This change adds the missing accounting, additionally taking care of updating accordingly the 'accept_subflow' flag. Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-13	Merge tag 'drm-fixes-2022-05-14' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull more drm fixes from Dave Airlie: "Turns out I was right, some fixes hadn't made it to me yet. The vmwgfx ones also popped up later, but all seem like bad enough things to fix. The dma-buf, vc4 and nouveau ones are all pretty small. The fbdev fixes are a bit more complicated: a fix to cleanup fbdev devices properly, uncovered some use-after-free bugs in existing drivers. Then the fix for those bugs wasn't correct. This reverts that fix, and puts the proper fixes in place in the drivers to avoid the use-after-frees. This has had a fair number of eyes on it at this stage, and I'm confident enough that it puts things in the right place, and is less dangerous than reverting our way out of the initial change at this stage. fbdev: - revert NULL deref fix that turned into a use-after-free - prevent use-after-free in fbdev - efifb/simplefb/vesafb: fix cleanup paths to avoid use-after-frees dma-buf: - fix panic in stats setup vc4: - fix hdmi build nouveau: - tegra iommu present fix - fix leak in backlight name vmwgfx: - Black screen due to fences using FIFO checks on SVGA3 - Random black screens on boot due to uninitialized drm_mode_fb_cmd2 - Hangs on SVGA3 due to command buffers being used with gbobjects" * tag 'drm-fixes-2022-05-14' of git://anongit.freedesktop.org/drm/drm: drm/vmwgfx: Disable command buffers on svga3 without gbobjects drm/vmwgfx: Initialize drm_mode_fb_cmd2 drm/vmwgfx: Fix fencing on SVGAv3 drm/vc4: hdmi: Fix build error for implicit function declaration dma-buf: call dma_buf_stats_setup after dmabuf is in valid list fbdev: efifb: Fix a use-after-free due early fb_info cleanup drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name() drm/nouveau/tegra: Stop using iommu_present() fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove fbdev: Prevent possible use-after-free in fb_release() Revert "fbdev: Make fb_release() return -ENODEV if fbdev was unregistered"
2022-05-14	pinctrl: sunxi: f1c100s: Fix signal name comment for PA2 SPI pin	Andre Przywara
	The manual describes function 0x6 of pin PA2 as "SPI1_CLK", so change the comment to reflect that. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20220504170736.2669595-1-andre.przywara@arm.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2022-05-14	pinctrl: sunxi: fix f1c100s uart2 function	IotaHydrae
	Change suniv f1c100s pinctrl,PD14 multiplexing function lvds1 to uart2 When the pin PD13 and PD14 is setting up to uart2 function in dts, there's an error occurred: 1c20800.pinctrl: unsupported function uart2 on pin PD14 Because 'uart2' is not any one multiplexing option of PD14, and pinctrl don't know how to configure it. So change the pin PD14 lvds1 function to uart2. Signed-off-by: IotaHydrae <writeforever@foxmail.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/tencent_70C1308DDA794C81CAEF389049055BACEC09@qq.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2022-05-13	block/mq-deadline: Set the fifo_time member also if inserting at head	Bart Van Assche
	Before commit 322cff70d46c the fifo_time member of requests on a dispatch list was not used. Commit 322cff70d46c introduces code that reads the fifo_time member of requests on dispatch lists. Hence this patch that sets the fifo_time member when adding a request to a dispatch list. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Fixes: 322cff70d46c ("block/mq-deadline: Prioritize high-priority requests") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220513171307.32564-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-14	Merge tag 'drm-misc-fixes-2022-05-13' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Multiple fixes to fbdev to address a regression at unregistration, an iommu detection improvement for nouveau, a memory leak fix for nouveau, pointer dereference fix for dma_buf_file_release(), and a build breakage fix for vc4 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20220513073044.ymayac7x7bzatrt7@houat
2022-05-14	Merge tag 'vmwgfx-drm-fixes-5.18-2022-05-13' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/zack/vmwgfx into drm-fixes vmwgfx fixes for: - Black screen due to fences using FIFO checks on SVGA3 - Random black screens on boot due to uninitialized drm_mode_fb_cmd2 - Hangs on SVGA3 due to command buffers being used with gbobjects Signed-off-by: Dave Airlie <airlied@redhat.com> From: Zack Rusin <zackr@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/a1d32799e4c74b8540216376d7576bb783ca07ba.camel@vmware.com
2022-05-13	random: use first 128 bits of input as fast init	Jason A. Donenfeld
	Before, the first 64 bytes of input, regardless of how entropic it was, would be used to mutate the crng base key directly, and none of those bytes would be credited as having entropy. Then 256 bits of credited input would be accumulated, and only then would the rng transition from the earlier "fast init" phase into being actually initialized. The thinking was that by mixing and matching fast init and real init, an attacker who compromised the fast init state, considered easy to do given how little entropy might be in those first 64 bytes, would then be able to bruteforce bits from the actual initialization. By keeping these separate, bruteforcing became impossible. However, by not crediting potentially creditable bits from those first 64 bytes of input, we delay initialization, and actually make the problem worse, because it means the user is drawing worse random numbers for a longer period of time. Instead, we can take the first 128 bits as fast init, and allow them to be credited, and then hold off on the next 128 bits until they've accumulated. This is still a wide enough margin to prevent bruteforcing the rng state, while still initializing much faster. Then, rather than trying to piecemeal inject into the base crng key at various points, instead just extract from the pool when we need it, for the crng_init==0 phase. Performance may even be better for the various inputs here, since there are likely more calls to mix_pool_bytes() then there are to get_random_bytes() during this phase of system execution. Since the preinit injection code is gone, bootloader randomness can then do something significantly more straight forward, removing the weird system_wq hack in hwgenerator randomness. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13	random: do not use batches when !crng_ready()	Jason A. Donenfeld
	It's too hard to keep the batches synchronized, and pointless anyway, since in !crng_ready(), we're updating the base_crng key really often, where batching only hurts. So instead, if the crng isn't ready, just call into get_random_bytes(). At this stage nothing is performance critical anyhow. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13	random: mix in timestamps and reseed on system restore	Jason A. Donenfeld
	Since the RNG loses freshness with system suspend/hibernation, when we resume, immediately reseed using whatever data we can, which for this particular case is the various timestamps regarding system suspend time, in addition to more generally the RDSEED/RDRAND/RDTSC values that happen whenever the crng reseeds. On systems that suspend and resume automatically all the time -- such as Android -- we skip the reseeding on suspend resumption, since that could wind up being far too busy. This is the same trade-off made in WireGuard. In addition to reseeding upon resumption always mix into the pool these various stamps on every power notification event. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13	random: vary jitter iterations based on cycle counter speed	Jason A. Donenfeld
	Currently, we do the jitter dance if two consecutive reads to the cycle counter return different values. If they do, then we consider the cycle counter to be fast enough that one trip through the scheduler will yield one "bit" of credited entropy. If those two reads return the same value, then we assume the cycle counter is too slow to show meaningful differences. This methodology is flawed for a variety of reasons, one of which Eric posted a patch to fix in [1]. The issue that patch solves is that on a system with a slow counter, you might be [un]lucky and read the counter _just_ before it changes, so that the second cycle counter you read differs from the first, even though there's usually quite a large period of time in between the two. For example: \| real time \| cycle counter \| \| --------- \| ------------- \| \| 3 \| 5 \| \| 4 \| 5 \| \| 5 \| 5 \| \| 6 \| 5 \| \| 7 \| 5 \| <--- a \| 8 \| 6 \| <--- b \| 9 \| 6 \| <--- c If we read the counter at (a) and compare it to (b), we might be fooled into thinking that it's a fast counter, when in reality it is not. The solution in [1] is to also compare counter (b) to counter (c), on the theory that if the counter is _actually_ slow, and (a)!=(b), then certainly (b)==(c). This helps solve this particular issue, in one sense, but in another sense, it mostly functions to disallow jitter entropy on these systems, rather than simply taking more samples in that case. Instead, this patch takes a different approach. Right now we assume that a difference in one set of consecutive samples means one "bit" of credited entropy per scheduler trip. We can extend this so that a difference in two sets of consecutive samples means one "bit" of credited entropy per /two/ scheduler trips, and three for three, and four for four. In other words, we can increase the amount of jitter "work" we require for each "bit", depending on how slow the cycle counter is. So this patch takes whole bunch of samples, sees how many of them are different, and divides to find the amount of work required per "bit", and also requires that at least some minimum of them are different in order to attempt any jitter entropy. Note that this approach is still far from perfect. It's not a real statistical estimate on how much these samples vary; it's not a real-time analysis of the relevant input data. That remains a project for another time. However, it makes the same (partly flawed) assumptions as the code that's there now, so it's probably not worse than the status quo, and it handles the issue Eric mentioned in [1]. But, again, it's probably a far cry from whatever a really robust version of this would be. [1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@kernel.org/ https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@kernel.org/ Cc: Eric Biggers <ebiggers@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13	random: insist on random_get_entropy() existing in order to simplify	Jason A. Donenfeld
	All platforms are now guaranteed to provide some value for random_get_entropy(). In case some bug leads to this not being so, we print a warning, because that indicates that something is really very wrong (and likely other things are impacted too). This should never be hit, but it's a good and cheap way of finding out if something ever is problematic. Since we now have viable fallback code for random_get_entropy() on all platforms, which is, in the worst case, not worse than jiffies, we can count on getting the best possible value out of it. That means there's no longer a use for using jiffies as entropy input. It also means we no longer have a reason for doing the round-robin register flow in the IRQ handler, which was always of fairly dubious value. Instead we can greatly simplify the IRQ handler inputs and also unify the construction between 64-bits and 32-bits. We now collect the cycle counter and the return address, since those are the two things that matter. Because the return address and the irq number are likely related, to the extent we mix in the irq number, we can just xor it into the top unchanging bytes of the return address, rather than the bottom changing bytes of the cycle counter as before. Then, we can do a fixed 2 rounds of SipHash/HSipHash. Finally, we use the same construction of hashing only half of the [H]SipHash state on 32-bit and 64-bit. We're not actually discarding any entropy, since that entropy is carried through until the next time. And more importantly, it lets us do the same sponge-like construction everywhere. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13	xtensa: use fallback for random_get_entropy() instead of zero	Jason A. Donenfeld
	In the event that random_get_entropy() can't access a cycle counter or similar, falling back to returning 0 is really not the best we can do. Instead, at least calling random_get_entropy_fallback() would be preferable, because that always needs to return _something_, even falling back to jiffies eventually. It's not as though random_get_entropy_fallback() is super high precision or guaranteed to be entropic, but basically anything that's not zero all the time is better than returning zero all the time. This is accomplished by just including the asm-generic code like on other architectures, which means we can get rid of the empty stub function here. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13	sparc: use fallback for random_get_entropy() instead of zero	Jason A. Donenfeld
	In the event that random_get_entropy() can't access a cycle counter or similar, falling back to returning 0 is really not the best we can do. Instead, at least calling random_get_entropy_fallback() would be preferable, because that always needs to return _something_, even falling back to jiffies eventually. It's not as though random_get_entropy_fallback() is super high precision or guaranteed to be entropic, but basically anything that's not zero all the time is better than returning zero all the time. This is accomplished by just including the asm-generic code like on other architectures, which means we can get rid of the empty stub function here. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13	um: use fallback for random_get_entropy() instead of zero	Jason A. Donenfeld
	In the event that random_get_entropy() can't access a cycle counter or similar, falling back to returning 0 is really not the best we can do. Instead, at least calling random_get_entropy_fallback() would be preferable, because that always needs to return _something_, even falling back to jiffies eventually. It's not as though random_get_entropy_fallback() is super high precision or guaranteed to be entropic, but basically anything that's not zero all the time is better than returning zero all the time. This is accomplished by just including the asm-generic code like on other architectures, which means we can get rid of the empty stub function here. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>