summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-21printk: nbcon: Add unsafe flushing on panicJohn Ogness
Add nbcon_atomic_flush_unsafe() to flush all nbcon consoles using the write_atomic() callback and allowing unsafe hostile takeovers. Call this at the end of panic() as a final attempt to flush any pending messages. Note that legacy consoles use unsafe methods for flushing from the beginning of panic (see bust_spinlocks()). Therefore, systems using both legacy and nbcon consoles may still fail to see panic messages due to unsafe legacy console usage. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-27-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Flush nbcon consoles first on panicJohn Ogness
In console_flush_on_panic(), flush the nbcon consoles before flushing legacy consoles. The legacy write() callbacks are not fully safe when oops_in_progress is set. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-26-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Flush new records on device_release()John Ogness
There may be new records that were added while a driver was holding the nbcon context for non-printing purposes. These new records must be flushed by the nbcon_device_release() context because no other context will do it. If boot consoles are registered, the legacy loop is used (either direct or per irq_work) to handle the flushing. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-25-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Add is_printk_legacy_deferred()John Ogness
If printk has been explicitly deferred or is called from NMI context, legacy console printing must be deferred to an irq_work context. Introduce a helper function is_printk_legacy_deferred() for a CPU to query if it must defer legacy console printing. In follow-up commits this helper will be needed at other call sites as well. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-24-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Use nbcon consoles in console_flush_all()John Ogness
Allow nbcon consoles to print messages in the legacy printk() caller context (printing via unlock) by integrating them into console_flush_all(). The write_atomic() callback is used for printing. Provide nbcon_legacy_emit_next_record(), which acts as the nbcon variant of console_emit_next_record(). Call this variant within console_flush_all() for nbcon consoles. Since nbcon consoles use their own @nbcon_seq variable to track the next record to print, this also must be appropriately handled in console_flush_all(). Note that the legacy printing logic uses @handover to detect handovers for printing all consoles. For nbcon consoles, handovers/takeovers occur on a per-console basis and thus do not cause the console_flush_all() loop to abort. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-23-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Track registered boot consolesJohn Ogness
Unfortunately it is not known if a boot console and a regular (legacy or nbcon) console use the same hardware. For this reason they must not be allowed to print simultaneously. For legacy consoles this is not an issue because they are already synchronized with the boot consoles using the console lock. However nbcon consoles can be triggered separately. Add a global flag @have_boot_console to identify if any boot consoles are registered. This will be used in follow-up commits to ensure that boot consoles and nbcon consoles cannot print simultaneously. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-22-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Provide function to flush using write_atomic()Thomas Gleixner
Provide nbcon_atomic_flush_pending() to perform flushing of all registered nbcon consoles using their write_atomic() callback. Unlike console_flush_all(), nbcon_atomic_flush_pending() will only flush up through the newest record at the time of the call. This prevents a CPU from printing unbounded when other CPUs are adding records. If new records are added while flushing, it is expected that the dedicated printer threads will print those records. If the printer thread is not available (which is always the case at this point in the rework), nbcon_atomic_flush_pending() _will_ flush all records in the ringbuffer. Unlike console_flush_all(), nbcon_atomic_flush_pending() will fully flush one console before flushing the next. This helps to guarantee that a block of pending records (such as a stack trace in an emergency situation) can be printed atomically at once before releasing console ownership. nbcon_atomic_flush_pending() is safe in any context because it uses write_atomic() and acquires with unsafe_takeover disabled. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-21-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Add helper to assign priority based on CPU stateJohn Ogness
Add a helper function to use the current state of the CPU to determine which priority to assign to the printing context. The EMERGENCY priority handling is added in a follow-up commit. It will use a per-CPU variable. Note: nbcon_device_try_acquire(), which is used by console drivers to acquire the nbcon console for non-printing activities, is hard-coded to always use NORMAL priority. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-20-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Add @flags argument for console_is_usable()John Ogness
The caller of console_is_usable() usually needs @console->flags for its own checks. Rather than having console_is_usable() read its own copy, make the caller pass in the @flags. This also ensures that the caller saw the same @flags value. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-19-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Let console_is_usable() handle nbconJohn Ogness
The nbcon consoles use a different printing callback. For nbcon consoles, check for the write_atomic() callback instead of write(). Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-18-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Make console_is_usable() available to nbcon.cJohn Ogness
Move console_is_usable() as-is into internal.h so that it can be used by nbcon printing functions as well. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-17-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Do not rely on proxy headersJohn Ogness
The headers kernel.h, serial_core.h, and console.h allow for the definitions of many types and functions from other headers. Rather than relying on these as proxy headers, explicitly include all headers providing needed definitions. Also sort the list alphabetically to be able to easily detect duplicates. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-16-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21serial: core: Acquire nbcon context in port->lock wrapperJohn Ogness
Currently the port->lock wrappers uart_port_lock(), uart_port_unlock() (and their variants) only lock/unlock the spin_lock. If the port is an nbcon console that has implemented the write_atomic() callback, the wrappers must also acquire/release the console context and mark the region as unsafe. This allows general port->lock synchronization to be synchronized against the nbcon write_atomic() callback. Note that __uart_port_using_nbcon() relies on the port->lock being held while a console is added and removed from the console list (i.e. all uart nbcon drivers *must* take the port->lock in their device_lock() callbacks). Signed-off-by: John Ogness <john.ogness@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-15-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21nbcon: Add API to acquire context for non-printing operationsJohn Ogness
Provide functions nbcon_device_try_acquire() and nbcon_device_release() which will try to acquire the nbcon console ownership with NBCON_PRIO_NORMAL and mark it unsafe for handover/takeover. These functions are to be used together with the device-specific locking when performing non-printing activities on the console device. They will allow synchronization against the atomic_write() callback which will be serialized, for higher priority contexts, only by acquiring the console context ownership. Pitfalls: The API requires to be called in a context with migration disabled because it uses per-CPU variables internally. The context is set unsafe for a takeover all the time. It guarantees full serialization against any atomic_write() caller except for the final flush in panic() which might try an unsafe takeover. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-14-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21console: Improve console_srcu_read_flags() commentsJohn Ogness
It was not clear when exactly console_srcu_read_flags() must be used vs. directly reading @console->flags. Refactor and clarify that console_srcu_read_flags() is only needed if the console is registered or the caller is in a context where the registration status of the console may change (due to another context). The function requires the caller holds @console_srcu, which will ensure that the caller sees an appropriate @flags value for the registered console and that exit/cleanup routines will not run if the console is in the process of unregistration. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-13-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21serial: core: Introduce wrapper to set @uart_port->consJohn Ogness
Introduce uart_port_set_cons() as a wrapper to set @cons of a uart_port. The wrapper sets @cons under the port lock in order to prevent @cons from disappearing while another context is holding the port lock. This is necessary for a follow-up commit relating to the port lock wrappers, which rely on @cons not changing between lock and unlock. Signed-off-by: John Ogness <john.ogness@linutronix.de> Tested-by: Théo Lebrun <theo.lebrun@bootlin.com> # EyeQ5, AMBA-PL011 Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240820063001.36405-12-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21serial: core: Provide low-level functions to lock portJohn Ogness
It will be necessary at times for the uart nbcon console drivers to acquire the port lock directly (without the additional nbcon functionality of the port lock wrappers). These are special cases such as the implementation of the device_lock()/device_unlock() callbacks or for internal port lock wrapper synchronization. Provide low-level variants __uart_port_lock_irqsave() and __uart_port_unlock_irqrestore() for this purpose. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240820063001.36405-11-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Use driver synchronization while (un)registeringJohn Ogness
Console drivers typically have to deal with access to the hardware via user input/output (such as an interactive login shell) and output of kernel messages via printk() calls. They use some classic driver-specific locking mechanism in most situations. But console->write_atomic() callbacks, used by nbcon consoles, are synchronized only by acquiring the console context. The synchronization via the console context ownership is possible only when the console driver is registered. It is when a particular device driver is connected with a particular console driver. The two synchronization mechanisms must be synchronized between each other. It is tricky because the console context ownership is quite special. It might be taken over by a higher priority context. Also CPU migration must be disabled. The most tricky part is to (dis)connect these two mechanisms during the console (un)registration. Use the driver-specific locking callbacks: device_lock(), device_unlock(). They allow taking the device-specific lock while the device is being (un)registered by the related console driver. For example, these callbacks lock/unlock the port lock for serial port drivers. Note that the driver-specific locking is only needed during (un)register if it is an nbcon console with the write_atomic() callback implemented. If write_atomic() is not implemented, the driver should never attempt to access the hardware without first acquiring its driver-specific lock. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-10-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Add callbacks to synchronize with driverJohn Ogness
Console drivers typically must deal with access to the hardware via user input/output (such as an interactive login shell) and output of kernel messages via printk() calls. To provide the necessary synchronization, usually some driver-specific locking mechanism is used (for example, the port spinlock for uart serial consoles). Until now, usage of this driver-specific locking has been hidden from the printk-subsystem and implemented within the various console callbacks. However, nbcon consoles would need to use it even in the generic code. Add device_lock() and device_unlock() callback which will need to get implemented by nbcon consoles. The callbacks will use whatever synchronization mechanism the driver is using for itself. The minimum requirement is to prevent CPU migration. It would allow a context friendly acquiring of nbcon console ownership in non-emergency and non-panic context. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-9-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Add detailed doc for write_atomic()John Ogness
The write_atomic() callback has special requirements and is allowed to use special helper functions. Provide detailed documentation of the callback so that a developer has a chance of implementing it correctly. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-8-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Remove return value for write_atomic()John Ogness
The return value of write_atomic() does not provide any useful information. On the contrary, it makes things more complicated for the caller to appropriately deal with the information. Change write_atomic() to not have a return value. If the message did not get printed due to loss of ownership, the caller will notice this on its own. If ownership was not lost, it will be assumed that the driver successfully printed the message and the sequence number for that console will be incremented. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-7-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Clarify rules of the owner/waiter matchingJohn Ogness
The functions nbcon_owner_matches() and nbcon_waiter_matches() use a minimal set of data to determine if a context matches. The existing kerneldoc and comments were not clear enough and caused the printk folks to re-prove that the functions are indeed reliable in all cases. Update and expand the explanations so that it is clear that the implementations are sufficient for all cases. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-6-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Check printk_deferred_enter()/_exit() usageSebastian Andrzej Siewior
Add validation that printk_deferred_enter()/_exit() are called in non-migration contexts. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-5-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Properly deal with nbcon consoles on seq initPetr Mladek
If a non-boot console is registering and boot consoles exist, the consoles are flushed before being unregistered. This allows the non-boot console to continue where the boot console left off. If for whatever reason flushing fails, the lowest seq found from any of the enabled boot consoles is used. Until now con->seq was checked. However, if it is an nbcon boot console, the function nbcon_seq_read() must be used to read seq because con->seq is not updated for nbcon consoles. Check if it is an nbcon boot console and if so call nbcon_seq_read() to read seq. Also, avoid usage of con->seq as temporary storage of the starting record. Instead, rename console_init_seq() to get_init_console_seq() and just return the value. For nbcon consoles set the sequence via nbcon_seq_force(), for legacy consoles set con->seq. The cleaned design should make sure that the value stays and is set before the console is added to the console list. It also unifies the sequence number initialization for legacy and nbcon consoles. Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20240820063001.36405-4-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Consolidate alloc() and init()John Ogness
Rather than splitting the nbcon allocation and initialization into two pieces, perform all initialization in nbcon_alloc(). Later, the initial sequence is calculated and can be explicitly set using nbcon_seq_force(). This removes the need for the strong rules of nbcon_init() that even included a BUG_ON(). Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-3-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Add notation to console_srcu lockingJohn Ogness
kernel/printk/printk.c:284:5: sparse: sparse: context imbalance in 'console_srcu_read_lock' - wrong count at exit include/linux/srcu.h:301:9: sparse: sparse: context imbalance in 'console_srcu_read_unlock' - unexpected unlock Fixes: 6c4afa79147e ("printk: Prepare for SRCU console list protection") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240820063001.36405-2-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21net: wwan: t7xx: PCIe reset rescanJinjian Song
WWAN device is programmed to boot in normal mode or fastboot mode, when triggering a device reset through ACPI call or fastboot switch command. Maintain state machine synchronization and reprobe logic after a device reset. The PCIe device reset triggered by several ways. E.g.: - fastboot: echo "fastboot_switching" > /sys/bus/pci/devices/${bdf}/t7xx_mode. - reset: echo "reset" > /sys/bus/pci/devices/${bdf}/t7xx_mode. - IRQ: PCIe device request driver to reset itself by an interrupt request. Use pci_reset_function() as a generic way to reset device, save and restore the PCIe configuration before and after reset device to ensure the reprobe process. Suggestion from Bjorn: Link: https://lore.kernel.org/all/20230127133034.GA1364550@bhelgaas/ Signed-off-by: Jinjian Song <jinjian.song@fibocom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-21rust: kernel: fix typos in code commentsMichael Vetter
Fix spelling mistakes in code comments. Signed-off-by: Michael Vetter <jubalh@iodoru.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/20240819205731.2163-1-jubalh@iodoru.org [ Reworded slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-08-21docs: rust: remove unintended blockquote in Coding GuidelinesVincent Woltmann
An unordered list in coding-guidelines.rst was indented, producing a blockquote around it and making it look more indented than expected. Remove the indentation to only output an unordered list. Reported-by: Miguel Ojeda <ojeda@kernel.org> Closes: https://github.com/Rust-for-Linux/linux/issues/1063 Fixes: d07479b211b7 ("docs: add Rust documentation") Signed-off-by: Vincent Woltmann <vincent@woltmann.art> Link: https://lore.kernel.org/r/20240816200339.2495875-1-vincent@woltmann.art [ Reworded title. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-08-21rust: block: fix wrong usage of lockdep APIAndreas Hindborg
When allocating `struct gendisk`, `GenDiskBuilder` is using a dynamic lock class key without registering the key. This is an incorrect use of the API, which causes a `WARN` trace. Fix the issue by using a static lock class key, which is more appropriate for the situation anyway. Fixes: 3253aba3408a ("rust: block: introduce `kernel::block::mq` module") Reported-by: Behme Dirk (XC-CP/ESB5) <Dirk.Behme@de.bosch.com> Closes: https://rust-for-linux.zulipchat.com/#narrow/stream/x/topic/x/near/457090036 Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20240815074519.2684107-3-nmi@metaspace.dk [ Applied `rustfmt`, reworded slightly and made Zulip link a permalink. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-08-21powercap: intel_rapl: Fix off by one in get_rpi()Dan Carpenter
The rp->priv->rpi array is either rpi_msr or rpi_tpmi which have NR_RAPL_PRIMITIVES number of elements. Thus the > needs to be >= to prevent an off by one access. Fixes: 98ff639a7289 ("powercap: intel_rapl: Support per Interface primitive information") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Link: https://patch.msgid.link/86e3a059-504d-4795-a5ea-4a653f3b41f8@stanley.mountain Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-21ata: pata_macio: Use WARN instead of BUGMichael Ellerman
The overflow/underflow conditions in pata_macio_qc_prep() should never happen. But if they do there's no need to kill the system entirely, a WARN and failing the IO request should be sufficient and might allow the system to keep running. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2024-08-21ata: pata_macio: Fix DMA table overflowMichael Ellerman
Kolbjørn and Jonáš reported that their 32-bit PowerMacs were crashing in pata-macio since commit 09fe2bfa6b83 ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K"). For example: kernel BUG at drivers/ata/pata_macio.c:544! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 DEBUG_PAGEALLOC PowerMac ... NIP pata_macio_qc_prep+0xf4/0x190 LR pata_macio_qc_prep+0xfc/0x190 Call Trace: 0xc1421660 (unreliable) ata_qc_issue+0x14c/0x2d4 __ata_scsi_queuecmd+0x200/0x53c ata_scsi_queuecmd+0x50/0xe0 scsi_queue_rq+0x788/0xb1c __blk_mq_issue_directly+0x58/0xf4 blk_mq_plug_issue_direct+0x8c/0x1b4 blk_mq_flush_plug_list.part.0+0x584/0x5e0 __blk_flush_plug+0xf8/0x194 __submit_bio+0x1b8/0x2e0 submit_bio_noacct_nocheck+0x230/0x304 btrfs_work_helper+0x200/0x338 process_one_work+0x1a8/0x338 worker_thread+0x364/0x4c0 kthread+0x100/0x104 start_kernel_thread+0x10/0x14 That commit increased max_segment_size to 64KB, with the justification that the SCSI core was already using that size when PAGE_SIZE == 64KB, and that there was existing logic to split over-sized requests. However with a sufficiently large request, the splitting logic causes each sg to be split into two commands in the DMA table, leading to overflow of the DMA table, triggering the BUG_ON(). With default settings the bug doesn't trigger, because the request size is limited by max_sectors_kb == 1280, however max_sectors_kb can be increased, and apparently some distros do that by default using udev rules. Fix the bug for 4KB kernels by reverting to the old max_segment_size. For 64KB kernels the sg_tablesize needs to be halved, to allow for the possibility that each sg will be split into two. Fixes: 09fe2bfa6b83 ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") Cc: stable@vger.kernel.org # v6.10+ Reported-by: Kolbjørn Barmen <linux-ppc@kolla.no> Closes: https://lore.kernel.org/all/62d248bb-e97a-25d2-bcf2-9160c518cae5@kolla.no/ Reported-by: Jonáš Vidra <vidra@ufal.mff.cuni.cz> Closes: https://lore.kernel.org/all/3b6441b8-06e6-45da-9e55-f92f2c86933e@ufal.mff.cuni.cz/ Tested-by: Kolbjørn Barmen <linux-ppc@kolla.no> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2024-08-20drm/amdgpu: fix eGPU hotplug regressionAlex Deucher
The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit 959056982a9b ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: 959056982a9b ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com> (cherry picked from commit c69b07f7bbc905022491c45097923d3487479529) Cc: stable@vger.kernel.org # 6.10.x
2024-08-20drm/amdgpu: Validate TA binary sizeCandice Li
Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c0a04e3570d72aaf090962156ad085e37c62e442) Cc: stable@vger.kernel.org
2024-08-20drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1Alex Deucher
The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2dc3851ef7d9c5439ea8e9623fc36878f3b40649) Cc: stable@vger.kernel.org
2024-08-20drm/amdgpu: fixing rlc firmware loading failure issueYang Wang
Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit 849e133c973c ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: 3af2c80ae2f5 ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89ec85d16eb8110d88c273d1d34f1fe5a70ba8cc)
2024-08-20Merge tag '6.11-rc4-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - important reconnect fix - fix for memcpy issues on mount - two minor cleanup patches * tag '6.11-rc4-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Replace one-element arrays with flexible-array members ksmbd: fix spelling mistakes in documentation ksmbd: fix race condition between destroy_previous_session() and smb2 operations() ksmbd: Use unsafe_memcpy() for ntlm_negotiate
2024-08-20net: dsa: b53: Use dev_err_probe()Florian Fainelli
Rather than print an error even when we get -EPROBE_DEFER, use dev_err_probe() to filter out those messages. Link: https://github.com/openwrt/openwrt/pull/11680 Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240820004436.224603-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20Merge branch 'mptcp-pm-fix-ids-not-being-reusable'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: pm: fix IDs not being reusable Here are more fixes for the MPTCP in-kernel path-manager. In this series, the fixes are around the endpoint IDs not being reusable for on-going connections when re-creating endpoints with previously used IDs. - Patch 1 fixes this case for endpoints being used to send ADD_ADDR. Patch 2 validates this fix. The issue is present since v5.10. - Patch 3 fixes this case for endpoints being used to establish new subflows. Patch 4 validates this fix. The issue is present since v5.10. - Patch 5 fixes this case when all endpoints are flushed. Patch 6 validates this fix. The issue is present since v5.13. - Patch 7 removes a helper that is confusing, and introduced in v5.10. It helps simplifying the next patches. - Patch 8 makes sure a 'subflow' counter is only decremented when removing a 'subflow' endpoint. Can be backported up to v5.13. - Patch 9 is similar, but for a 'signal' counter. Can be backported up to v5.10. - Patch 10 checks the last max accepted ADD_ADDR limit before accepting new ADD_ADDR. For v5.10 as well. - Patch 11 removes a wrong restriction for the userspace PM, added during a refactoring in v6.5. - Patch 12 makes sure the fullmesh mode sets the ID 0 when a new subflow using the source address of the initial subflow is created. Patch 13 covers this case. This issue is present since v5.15. - Patch 14 avoid possible UaF when selecting an address from the endpoints list. ==================== Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-0-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: avoid possible UaF when selecting endpMatthieu Baerts (NGI0)
select_local_address() and select_signal_address() both select an endpoint entry from the list inside an RCU protected section, but return a reference to it, to be read later on. If the entry is dereferenced after the RCU unlock, reading info could cause a Use-after-Free. A simple solution is to copy the required info while inside the RCU protected section to avoid any risk of UaF later. The address ID might need to be modified later to handle the ID0 case later, so a copy seems OK to deal with. Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/45cd30d3-7710-491c-ae4d-a1368c00beb1@redhat.com Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-14-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20selftests: mptcp: join: validate fullmesh endp on 1st sfMatthieu Baerts (NGI0)
This case was not covered, and the wrong ID was set before the previous commit. The rest is not modified, it is just that it will increase the code coverage. The right address ID can be verified by looking at the packet traces. We could automate that using Netfilter with some cBPF code for example, but that's always a bit cryptic. Packetdrill seems better fitted for that. Fixes: 4f49d63352da ("selftests: mptcp: add fullmesh testcases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-13-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: fullmesh: select the right ID laterMatthieu Baerts (NGI0)
When reacting upon the reception of an ADD_ADDR, the in-kernel PM first looks for fullmesh endpoints. If there are some, it will pick them, using their entry ID. It should set the ID 0 when using the endpoint corresponding to the initial subflow, it is a special case imposed by the MPTCP specs. Note that msk->mpc_endpoint_id might not be set when receiving the first ADD_ADDR from the server. So better to compare the addresses. Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-12-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: only in-kernel cannot have entries with ID 0Matthieu Baerts (NGI0)
The ID 0 is specific per MPTCP connections. The per netns entries cannot have this special ID 0 then. But that's different for the userspace PM where the entries are per connection, they can then use this special ID 0. Fixes: f40be0db0b76 ("mptcp: unify pm get_flags_and_ifindex_by_id") Cc: stable@vger.kernel.org Acked-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-11-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: check add_addr_accept_max before accepting new ADD_ADDRMatthieu Baerts (NGI0)
The limits might have changed in between, it is best to check them before accepting new ADD_ADDR. Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-10-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: only decrement add_addr_accepted for MPJ reqMatthieu Baerts (NGI0)
Adding the following warning ... WARN_ON_ONCE(msk->pm.add_addr_accepted == 0) ... before decrementing the add_addr_accepted counter helped to find a bug when running the "remove single subflow" subtest from the mptcp_join.sh selftest. Removing a 'subflow' endpoint will first trigger a RM_ADDR, then the subflow closure. Before this patch, and upon the reception of the RM_ADDR, the other peer will then try to decrement this add_addr_accepted. That's not correct because the attached subflows have not been created upon the reception of an ADD_ADDR. A way to solve that is to decrement the counter only if the attached subflow was an MP_JOIN to a remote id that was not 0, and initiated by the host receiving the RM_ADDR. Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-9-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: only mark 'subflow' endp as availableMatthieu Baerts (NGI0)
Adding the following warning ... WARN_ON_ONCE(msk->pm.local_addr_used == 0) ... before decrementing the local_addr_used counter helped to find a bug when running the "remove single address" subtest from the mptcp_join.sh selftests. Removing a 'signal' endpoint will trigger the removal of all subflows linked to this endpoint via mptcp_pm_nl_rm_addr_or_subflow() with rm_type == MPTCP_MIB_RMSUBFLOW. This will decrement the local_addr_used counter, which is wrong in this case because this counter is linked to 'subflow' endpoints, and here it is a 'signal' endpoint that is being removed. Now, the counter is decremented, only if the ID is being used outside of mptcp_pm_nl_rm_addr_or_subflow(), only for 'subflow' endpoints, and if the ID is not 0 -- local_addr_used is not taking into account these ones. This marking of the ID as being available, and the decrement is done no matter if a subflow using this ID is currently available, because the subflow could have been closed before. Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-8-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: remove mptcp_pm_remove_subflow()Matthieu Baerts (NGI0)
This helper is confusing. It is in pm.c, but it is specific to the in-kernel PM and it cannot be used by the userspace one. Also, it simply calls one in-kernel specific function with the PM lock, while the similar mptcp_pm_remove_addr() helper requires the PM lock. What's left is the pr_debug(), which is not that useful, because a similar one is present in the only function called by this helper: mptcp_pm_nl_rm_subflow_received() After these modifications, this helper can be marked as 'static', and the lock can be taken only once in mptcp_pm_flush_addrs_and_subflows(). Note that it is not a bug fix, but it will help backporting the following commits. Fixes: 0ee4261a3681 ("mptcp: implement mptcp_pm_remove_subflow") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-7-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20selftests: mptcp: join: test for flush/re-add endpointsMatthieu Baerts (NGI0)
After having flushed endpoints that didn't cause the creation of new subflows, it is important to check endpoints can be re-created, re-using previously used IDs. Before the previous commit, the client would not have been able to re-create the subflow that was previously rejected. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-6-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20mptcp: pm: re-using ID of unused flushed subflowsMatthieu Baerts (NGI0)
If no subflows are attached to the 'subflow' endpoints that are being flushed, the corresponding addr IDs will not be marked as available again. Mark all ID as being available when flushing all the 'subflow' endpoints, and reset local_addr_used counter to cover these cases. Note that mptcp_pm_remove_addrs_and_subflows() helper is only called for flushing operations, not to remove a specific set of addresses and subflows. Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-5-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>