Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Allow built-in drivers, not just modular drivers, to use async
initial probing (Lukas Wunner)
- Support Immediate Readiness even on devices with no PM Capability
(Sean Christopherson)
- Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the
required delay between a reset and sending config requests to a
device (Niklas Cassel)
- Add pci_is_display() to check for "Display" base class and use it
in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello)
- Allow 'isolated PCI functions' (multi-function devices without a
function 0) for LoongArch, similar to s390 and jailhouse (Huacai
Chen)
Power control:
- Add ability to enable optional slot clock for cases where the PCIe
host controller and the slot are supplied by different clocks
(Marek Vasut)
PCIe native device hotplug:
- Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by
misinterpreting a config read failure after a device has been
removed (Lukas Wunner)
- Avoid creating a useless PCIe port service device for pciehp if the
slot is handled by the ACPI hotplug driver (Lukas Wunner)
- Ignore ACPI hotplug slots when calculating depth of pciehp hotplug
ports (Lukas Wunner)
Virtualization:
- Save VF resizable BAR state and restore it after reset (Michał
Winiarski)
- Allow IOV resources (VF BARs) to be resized (Michał Winiarski)
- Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size
(Michał Winiarski)
Endpoint framework:
- Add RC-to-EP doorbell support using platform MSI controller,
including a test case (Frank Li)
- Allow BAR assignment via configfs so platforms have flexibility in
determining BAR usage (Jerome Brunet)
Native PCIe controller drivers:
- Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie,
axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to
DT schema format (Rob Herring)
- Use dev_fwnode() instead of of_fwnode_handle() to remove OF
dependency in altera (fixes an unused variable), designware-host,
mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma,
xilinx-nwl (Jiri Slaby, Arnd Bergmann)
- Convert aardvark, altera, brcmstb, designware-host, iproc,
mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx,
xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to
using msi_create_parent_irq_domain() instead; this makes the
interrupt controller per-PCI device, allows dynamic allocation of
vectors after initialization, and allows support of IMS (Nam Cao)
APM X-Gene PCIe controller driver:
- Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug
bits, use device-managed memory allocations, and clean things up
(Marc Zyngier)
- Probe xgene-msi as a standard platform driver rather than a
subsys_initcall (Marc Zyngier)
Broadcom STB PCIe controller driver:
- Add optional DT 'num-lanes' property and if present, use it to
override the Maximum Link Width advertised in Link Capabilities
(Jim Quinlan)
Cadence PCIe controller driver:
- Use PCIe Message routing types from the PCI core rather than
defining private ones (Hans Zhang)
Freescale i.MX6 PCIe controller driver:
- Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu)
- Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features
(Richard Zhu)
- Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can
trigger doorbel on Endpoint (Frank Li)
- Remove apps_reset (LTSSM_EN) from
imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug
regression on i.MX8MM (Richard Zhu)
- Delay Endpoint link start until configfs 'start' written (Richard
Zhu)
Intel VMD host bridge driver:
- Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo)
Qualcomm PCIe controller driver:
- Add DT binding and driver support for SA8255p, which supports ECAM
for Configuration Space access (Mayank Rana)
- Update DT binding and driver to describe PHYs and per-Root Port
resets in a Root Port stanza and deprecate describing them in the
host bridge; this makes it possible to support multiple Root Ports
in the future (Krishna Chaitanya Chundru)
- Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang)
- Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang)
- Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT
bindings (Konrad Dybcio)
- Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue
Zhang)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Rockchip PCIe controller driver:
- Drop unused PCIe Message routing and code definitions (Hans Zhang)
- Remove several unused header includes (Hans Zhang)
- Use standard PCIe config register definitions instead of
rockchip-specific redefinitions (Geraldo Nascimento)
- Set Target Link Speed to 5.0 GT/s before retraining so we have a
chance to train at a higher speed (Geraldo Nascimento)
Rockchip DesignWare PCIe controller driver:
- Prevent race between link training and register update via DBI by
inhibiting link training after hot reset and link down (Wilfred
Mallawa)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Sophgo PCIe controller driver:
- Add DT binding and driver for Sophgo SG2044 PCIe controller driver
in Root Complex mode (Inochi Amaoto)
Synopsys DesignWare PCIe controller driver:
- Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on
Ports that support > 5.0 GT/s. Slower Ports still rely on the
not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while
waiting for the Link (Niklas Cassel)"
* tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset
dt-bindings: PCI: Remove 83xx-512x-pci.txt
dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema
dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema
dt-bindings: PCI: Convert apm,xgene-pcie to DT schema
dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema
dt-bindings: PCI: Convert st,spear1340-pcie to DT schema
PCI: Move is_pciehp check out of pciehp_is_native()
PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge
PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge
PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
selftests: pci_endpoint: Add doorbell test case
misc: pci_endpoint_test: Add doorbell test case
PCI: endpoint: pci-epf-test: Add doorbell test support
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment
PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode
PCI: vmd: Switch to msi_create_parent_irq_domain()
PCI: vmd: Convert to lock guards
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workque
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is
used by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt chip driver updates from Thomas Gleixner:
- Add support of forced affinity setting to yet offline CPUs for the
MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
during the early bringup phase of a secondary CPU in the hotplug code
before the CPU is set online and interrupts are enabled
- Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
interrupt chip
- Make the interrupt routing to RISV-V harts specification compliant so
it supports arbitrary hart indices
- Add a command line parameter and related handling to disable the
generic RISCV IMSIC mechanism on platforms which use a trap-emulated
IMSIC. Unfortunatly this is required because there is no mechanism
available to discover this programatically.
- Enable wakeup sources on the Renesas RZV2H driver
- Convert interrupt chip drivers, which use a open coded variant of
msi_create_parent_irq_domain() to use the new functionality
- Convert interrupt chip drivers, which use the old style two level
implementation of MSI support over to the MSI parent mechanism to
prepare for removing at least one of the three PCI/MSI backend
variants.
- The usual cleanups and improvements all over the place
* tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/riscv-imsic: Add kernel parameter to disable IPIs
irqchip/gic-v3: Fix GICD_CTLR register naming
irqchip/ls-scfg-msi: Fix NULL dereference in error handling
irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()
irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Convert to __free
irqchip/alpine-msi: Convert to lock guards
irqchip/alpine-msi: Clean up whitespace style
irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain()
irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain()
irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper
irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper
irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain()
irqdomain: Add device pointer to irq_domain_info and msi_domain_info
irqchip/renesas-rzv2h: Remove unneeded includes
irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/aslint-sswi: Resolve hart index
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
- Prevent a interrupt migration related live lock in handle_edge_irq()
If the interrupt affinity is moved to a new target CPU and the
interrupt is currently handled on the previous target CPU for edge
type interrupts the handler might get stuck on the previous target
for a long time, which causes both involved CPUs to waste cycles and
eventually run into a soft-lockup situation.
Solve this by checking whether the interrupt is redirected to a new
target CPU and if the interrupt is handled on that new target CPU,
busy wait for completion instead of masking it and sending the
pending but which would cause the old CPU to re-run the handler and
in the worst case repeating this excercise for a long time.
This only works on architectures which use single CPU interrupt
targets, but that's so far the only ones where this behaviour has
been observed.
- Add a kunit test for interrupt disable depth counts
The nested interrupt disable depth has been an issue in the past
especially vs. free_irq(), interrupt shutdown and CPU hotplug and
their interactions. The test exercises the combinations of these
scenarios and checks for correctness.
* tag 'irq-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent migration live lock in handle_edge_irq()
genirq: Split up irq_pm_check_wakeup()
genirq: Move irq_wait_for_poll() to call site
genirq: Remove pointless local variable
genirq: Add kunit tests for depth counts
|
|
It appears that the defect outlined in 9c15eeb5362c4 ("genirq: Allow
fasteoi handler to resend interrupts on concurrent handling") also
affects some other less stellar MSI controllers, this time using
the handle_simple_irq() flow.
Teach this flow about irqd_needs_resend_when_in_progress(). Given
the invasive nature of this workaround, only this flow is updated.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250708173404.1278635-2-maz@kernel.org
|
|
Yicon reported and Liangyan debugged a live lock in handle_edge_irq()
related to interrupt migration.
If the interrupt affinity is moved to a new target CPU and the interrupt is
currently handled on the previous target CPU for edge type interrupts the
handler might get stuck on the previous target:
CPU 0 (previous target) CPU 1 (new target)
handle_edge_irq()
repeat:
handle_event() handle_edge_irq()
if (INPROGESS) {
set(PENDING);
mask();
return;
}
if (PENDING) {
clear(PENDING);
unmask();
goto repeat;
}
The migration in software never completes and CPU0 continues to handle the
pending events forever. This happens when the device raises interrupts with
a high rate and always before handle_event() completes and before the CPU0
handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and
over. This has been observed in virtual machines.
Prevent this by checking whether the CPU which observes the INPROGRESS flag
is the new affinity target. If that's the case, do not set the PENDING flag
and wait for the INPROGRESS flag to be cleared instead, so that the new
interrupt is handled on the new target CPU and the previous CPU is released
from the action.
This is restricted to the edge type handler and only utilized on systems,
which use single CPU targets for interrupt affinity.
Reported-by: Yicong Shen <shenyicong.1023@bytedance.com>
Reported-by: Liangyan <liangyan.peng@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com
Link: https://lore.kernel.org/all/20250718185312.076515034@linutronix.de
|
|
Let the calling code check for the IRQD_WAKEUP_ARMED flag to prepare for a
live lock mitigation in the edge type handler.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Link: https://lore.kernel.org/all/20250718185312.012392426@linutronix.de
|
|
Move it to the call site so that the waiting for the INPROGRESS flag can be
reused by an upcoming mitigation for a potential live lock in the edge type
handler.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250718185311.948555026@linutronix.de
|
|
The variable is only used at one place, which can simply take the constant
as function argument.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Link: https://lore.kernel.org/all/20250718185311.884314473@linutronix.de
|
|
Export irq_domain_free_irqs_top(), making it usable for drivers compiled as
modules.
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add device pointer to irq_domain_info and msi_domain_info, so that the device
can be specified at domain creation time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/943e52403b20cf13c320d55bd4446b4562466aab.1750860131.git.namcao@linutronix.de
|
|
group_cpu_evenly() might have allocated less groups then requested:
group_cpu_evenly()
__group_cpus_evenly()
alloc_nodes_groups()
# allocated total groups may be less than numgrps when
# active total CPU number is less then numgrps
In this case, the caller will do an out of bound access because the
caller assumes the masks returned has numgrps.
Return the number of groups created so the caller can limit the access
range accordingly.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250617-isolcpus-queue-counters-v1-1-13923686b54b@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Initialize `ops` member's pointers properly by using kzalloc() instead of
kmalloc() when allocating the simulation work context. Otherwise the
pointers contain random content leading to invalid dereferencing.
Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250612124827.63259-1-gye976@gmail.com
|
|
There have been a few bugs and/or misunderstandings about the reference
counting, and startup/shutdown behaviors in the IRQ core and related CPU
hotplug code. These 4 test cases try to capture a few interesting cases.
* irq_disable_depth_test: basic request/disable/enable sequence
* irq_free_disabled_test: request/disable/free/re-request sequence -
this catches errors on previous revisions of my work
* irq_cpuhotplug_test: exercises managed-affinity IRQ + CPU hotplug.
This captures a problematic test case which was fixed recently.
This test requires CONFIG_SMP and a hotpluggable CPU#1.
* irq_shutdown_depth_test: exercises similar behavior from
irq_cpuhotplug_test, but directly using irq_*() APIs instead of going
through CPU hotplug. This still requires CONFIG_SMP, because
managed-affinity is stubbed out (and not all APIs are even present)
without it.
Note the use of 'imply SMP': ARCH=um doesn't support SMP, and kunit is
often exercised there. Thus, 'imply' will force SMP on where possible
(such as ARCH=x86_64), but leave it off where it's not.
Behavior on various SMP and ARCH configurations:
$ tools/testing/kunit/kunit.py run 'irq_test_cases*' --arch x86_64 --qemu_args '-smp 2'
[...]
[11:12:24] Testing complete. Ran 4 tests: passed: 4
$ tools/testing/kunit/kunit.py run 'irq_test_cases*' --arch x86_64
[...]
[11:13:27] [SKIPPED] irq_cpuhotplug_test
[11:13:27] ================= [PASSED] irq_test_cases ==================
[11:13:27] ============================================================
[11:13:27] Testing complete. Ran 4 tests: passed: 3, skipped: 1
# default: ARCH=um
$ tools/testing/kunit/kunit.py run 'irq_test_cases*'
[11:14:26] [SKIPPED] irq_shutdown_depth_test
[11:14:26] [SKIPPED] irq_cpuhotplug_test
[11:14:26] ================= [PASSED] irq_test_cases ==================
[11:14:26] ============================================================
[11:14:26] Testing complete. Ran 4 tests: passed: 2, skipped: 2
Without commit 788019eb559f ("genirq: Retain disable depth for managed
interrupts across CPU hotplug"), this fails as follows:
[11:18:55] =============== irq_test_cases (4 subtests) ================
[11:18:55] [PASSED] irq_disable_depth_test
[11:18:55] [PASSED] irq_free_disabled_test
[11:18:55] # irq_shutdown_depth_test: EXPECTATION FAILED at kernel/irq/irq_test.c:147
[11:18:55] Expected desc->depth == 1, but
[11:18:55] desc->depth == 0 (0x0)
[11:18:55] ------------[ cut here ]------------
[11:18:55] Unbalanced enable for IRQ 26
[11:18:55] WARNING: CPU: 1 PID: 36 at kernel/irq/manage.c:792 __enable_irq+0x36/0x60
...
[11:18:55] [FAILED] irq_shutdown_depth_test
[11:18:55] #1
[11:18:55] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:202
[11:18:55] Expected irqd_is_activated(data) to be false, but is true
[11:18:55] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:203
[11:18:55] Expected irqd_is_started(data) to be false, but is true
[11:18:55] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:204
[11:18:55] Expected desc->depth == 1, but
[11:18:55] desc->depth == 0 (0x0)
[11:18:55] ------------[ cut here ]------------
[11:18:55] Unbalanced enable for IRQ 27
[11:18:55] WARNING: CPU: 0 PID: 38 at kernel/irq/manage.c:792 __enable_irq+0x36/0x60
...
[11:18:55] [FAILED] irq_cpuhotplug_test
[11:18:55] # module: irq_test
[11:18:55] # irq_test_cases: pass:2 fail:2 skip:0 total:4
[11:18:55] # Totals: pass:2 fail:2 skip:0 total:4
[11:18:55] ================= [FAILED] irq_test_cases ==================
[11:18:55] ============================================================
[11:18:55] Testing complete. Ran 4 tests: passed: 2, failed: 2
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250522210837.4135244-1-briannorris@chromium.org
|
|
Commit 788019eb559f ("genirq: Retain disable depth for managed interrupts
across CPU hotplug") tried to make managed shutdown/startup properly
reference counted, but it missed the fact that the unplug and hotplug code
has an intentional imbalance by skipping IRQS_SUSPENDED interrupts on
the "restore" path.
This means that if a managed-affinity interrupt was both suspended and
managed-shutdown (such as may happen during system suspend / S3), resume
skips calling irq_startup_managed(), and would again have an unbalanced
depth this time, with a positive value (i.e., remaining unexpectedly
masked).
This IRQS_SUSPENDED check was introduced in commit a60dd06af674
("genirq/cpuhotplug: Skip suspended interrupts when restoring affinity")
for essentially the same reason as commit 788019eb559f, to prevent that
irq_startup() would unconditionally re-enable an interrupt too early.
Because irq_startup_managed() now respsects the disable-depth count, the
IRQS_SUSPENDED check is not longer needed, and instead, it causes harm.
Thus, drop the IRQS_SUSPENDED check, and restore balance.
This effectively reverts commit a60dd06af674 ("genirq/cpuhotplug: Skip
suspended interrupts when restoring affinity"), because it is replaced
by commit 788019eb559f ("genirq: Retain disable depth for managed
interrupts across CPU hotplug").
Fixes: 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Reported-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Link: https://lore.kernel.org/all/20250612183303.3433234-3-briannorris@chromium.org
Closes: https://lore.kernel.org/lkml/24ec4adc-7c80-49e9-93ee-19908a97ab84@gmail.com/
|
|
Commit 788019eb559f ("genirq: Retain disable depth for managed interrupts
across CPU hotplug") intended to only decrement the disable depth once per
managed shutdown, but instead it decrements for each CPU hotplug in the
affinity mask, until its depth reaches a point where it finally gets
re-started.
For example, consider:
1. Interrupt is affine to CPU {M,N}
2. disable_irq() -> depth is 1
3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
4. CPU N goes offline -> irq_shutdown() / depth is 2
5. CPU N goes online
-> irq_restore_affinity_of_irq()
-> irqd_is_managed_and_shutdown()==true
-> irq_startup_managed() -> depth is 1
6. CPU M goes online
-> irq_restore_affinity_of_irq()
-> irqd_is_managed_and_shutdown()==true
-> irq_startup_managed() -> depth is 0
*** BUG: driver expects the interrupt is still disabled ***
-> irq_startup() -> irqd_clr_managed_shutdown()
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning
This should clear the managed-shutdown flag at step 6, so that further
hotplugs don't cause further imbalance.
Note: It might be cleaner to also remove the irqd_clr_managed_shutdown()
invocation from __irq_startup_managed(). But this is currently not possible
because of irq_update_affinity_desc() as it sets IRQD_MANAGED_SHUTDOWN and
expects irq_startup() to clear it.
Fixes: 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Reported-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Link: https://lore.kernel.org/all/20250612183303.3433234-2-briannorris@chromium.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
"Updates for the MSI subsystem (core code and PCI):
- Switch the MSI descriptor locking to lock guards
- Replace a broken and naive implementation of PCI/MSI-X control word
updates in the PCI/TPH driver with a properly serialized variant in
the PCI/MSI core code.
- Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
replacing the direct access to the MSI descriptors with the proper
API function calls. People will never understand that APIs exist
for a reason...
- Provide core infrastructre for the upcoming PCI endpoint library
extensions. Currently limited to ARM GICv3+, but in theory
extensible to other architectures.
- Provide a MSI domain::teardown() callback, which allows drivers to
undo the effects of the prepare() callback.
- Move the MSI domain::prepare() callback invocation to domain
creation time to avoid redundant (and in case of ARM/GIC-V3-ITS
confusing) invocations on every allocation.
In combination with the new teardown callback this removes some
ugly hacks in the GIC-V3-ITS driver, which pretended to work around
the short comings of the core code so far. With this update the
code is correct by design and implementation.
- Make the irqchip MSI library globally available, provide a MSI
parent domain creation helper and convert a bunch of (PCI/)MSI
drivers over to the modern MSI parent mechanism. This is the first
step to get rid of at least one incarnation of the three PCI/MSI
management schemes.
- The usual small cleanups and improvements"
* tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
PCI/MSI: Use bool for MSI enable state tracking
PCI: tegra: Convert to MSI parent infrastructure
PCI: xgene: Convert to MSI parent infrastructure
PCI: apple: Convert to MSI parent infrastructure
irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag
irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper
irqchip/gic: Convert to msi_create_parent_irq_domain() helper
genirq/msi: Add helper for creating MSI-parent irq domains
irqchip: Make irq-msi-lib.h globally available
irqchip/gic-v3-its: Use allocation size from the prepare call
genirq/msi: Engage the .msi_teardown() callback on domain removal
genirq/msi: Move prepare() call to per-device allocation
irqchip/gic-v3-its: Implement .msi_teardown() callback
genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map
irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
genirq/msi: Rename msi_[un]lock_descs()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Thomas Gleixner:
"A set of cleanups for the generic interrupt subsystem:
- Consolidate on one set of functions for the interrupt domain code
to get rid of pointlessly duplicated code with only marginal
different semantics.
- Update the documentation accordingly and consolidate the coding
style of the irqdomain header"
* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
irqdomain: Consolidate coding style
irqdomain: Fix kernel-doc and add it to Documentation
Documentation: irqdomain: Update it
Documentation: irq-domain.rst: Simple improvements
Documentation: irq/concepts: Minor improvements
Documentation: irq/concepts: Add commas and reflow
irqdomain: Improve kernel-docs of functions
irqdomain: Make struct irq_domain_info variables const
irqdomain: Use irq_domain_instantiate()'s return value as initializers
irqdomain: Drop irq_linear_revmap()
pinctrl: keembay: Switch to irq_find_mapping()
irqchip/armada-370-xp: Switch to irq_find_mapping()
gpu: ipu-v3: Switch to irq_find_mapping()
gpio: idt3243x: Switch to irq_find_mapping()
sh: Switch to irq_find_mapping()
powerpc: Switch to irq_find_mapping()
irqdomain: Drop irq_domain_add_*() functions
powerpc: Switch irq_domain_add_nomap() to use fwnode
thermal: Switch to irq_domain_create_linear()
soc: Switch to irq_domain_create_*()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq controller updates from Thomas Gleixner:
"Update for interrupt chip drivers:
- Convert the generic interrupt chip to lock guards to remove copy &
pasta boilerplate code and gotos.
- A new driver fot the interrupt controller in the EcoNet EN751221
MIPS SoC.
- Extend the SG2042-MSI driver to support the new SG2044 SoC
- Updates and cleanups for the (ancient) VT8500 driver
- Improve the scalability of the ARM GICV4.1 ITS driver by utilizing
node local copies a VM's interrupt translation table when possible.
This results in a 12% reduction of VM IPI latency in certain
workloads.
- The usual cleanups and improvements all over the place"
* tag 'irq-drivers-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/irq-pruss-intc: Simplify chained interrupt handler setup
irqchip/gic-v4.1: Use local 4_1 ITS to generate VSGI
irqchip/econet-en751221: Switch to of_fwnode_handle()
irqchip/irq-vt8500: Switch to irq_domain_create_*()
irqchip/econet-en751221: Switch to irq_domain_create_linear()
irqchip/irq-vt8500: Use fewer global variables and add error handling
irqchip/irq-vt8500: Use a dedicated chained handler function
irqchip/irq-vt8500: Don't require 8 interrupts from a chained controller
irqchip/irq-vt8500: Drop redundant copy of the device node pointer
irqchip/irq-vt8500: Split up ack/mask functions
irqchip/sg2042-msi: Fix wrong type cast in sg2044_msi_irq_ack()
irqchip/sg2042-msi: Add the Sophgo SG2044 MSI interrupt controller
irqchip/sg2042-msi: Introduce configurable chipinfo for SG2042
irqchip/sg2042-msi: Rename functions and data structures to be SG2042 agnostic
dt-bindings: interrupt-controller: Add Sophgo SG2044 MSI controller
genirq/generic-chip: Fix incorrect lock guard conversions
genirq/generic-chip: Remove unused lock wrappers
irqchip: Convert generic irqchip locking to guards
gpio: mvebu: Convert generic irqchip locking to guard()
ARM: orion/gpio:: Convert generic irqchip locking to guard()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
"Updates for the generic interrupt subsystem core code:
- Address a long standing subtle problem in the CPU hotplug code for
affinity-managed interrupts.
Affinity-managed interrupts are shut down by the core code when the
last CPU in the affinity set goes offline and started up again when
the first CPU in the affinity set becomes online again.
This unfortunately does not take into account whether an interrupt
has been disabled before the last CPU goes offline and starts up
the interrupt unconditionally when the first CPU becomes online
again.
That's obviously not what drivers expect.
Address this by preserving the disabled state for affinity-managed
interrupts accross these CPU hotplug operations. All non-managed
interrupts are not affected by this because startup/shutdown is
coupled to request/free_irq() which obviously has to reset state.
- Support three-cell scheme interrupts to allow GPIO drivers to
specify interrupts from an already existing scheme
- Switch the interrupt subsystem core to lock guards. This gets rid
of quite some copy & pasta boilerplate code all over the place.
- The usual small cleanups and improvements all over the place"
* tag 'irq-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
genirq/irqdesc: Remove double locking in hwirq_show()
genirq: Retain disable depth for managed interrupts across CPU hotplug
genirq: Bump the size of the local variable for sprintf()
genirq/manage: Use the correct lock guard in irq_set_irq_wake()
genirq: Consistently use '%u' format specifier for unsigned int variables
genirq: Ensure flags in lock guard is consistently initialized
genirq: Fix inverted condition in handle_nested_irq()
genirq/cpuhotplug: Fix up lock guards conversion brainf..t
genirq: Use scoped_guard() to shut clang up
genirq: Remove unused remove_percpu_irq()
genirq: Remove irq_[get|put]_desc*()
genirq/manage: Rework irq_set_irqchip_state()
genirq/manage: Rework irq_get_irqchip_state()
genirq/manage: Rework teardown_percpu_nmi()
genirq/manage: Rework prepare_percpu_nmi()
genirq/manage: Rework disable_percpu_irq()
genirq/manage: Rework irq_percpu_is_enabled()
genirq/manage: Rework enable_percpu_irq()
genirq/manage: Rework irq_set_parent()
genirq/manage: Rework can_request_irq()
...
|
|
&desc->lock is acquired on 2 consecutive lines in hwirq_show(). This leads
obviously to a deadlock. Drop the raw_spin_lock_irq() and keep guard().
Fixes: 5d964a9f7cd8 ("genirq/irqdesc: Switch to lock guards")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250521142541.3832130-1-claudiu.beznea.uj@bp.renesas.com
|
|
Creating an irq domain that serves as an MSI parent requires
a substantial amount of esoteric boiler-plate code, some of
which is often provided twice (such as the bus token).
To make things a bit simpler for the unsuspecting MSI tinkerer,
provide a helper that does it for them, and serves as documentation
of what needs to be provided.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-3-maz@kernel.org
|
|
irqdomain.c's kernel-doc exists, but is not plugged into Documentation/
yet.
Before plugging it in, fix it first: irq_domain_get_irq_data() and
irq_domain_set_info() were documented twice. Identically, by both
definitions for CONFIG_IRQ_DOMAIN_HIERARCHY and !CONFIG_IRQ_DOMAIN_HIERARCHY.
Therefore, switch the second kernel-doc into an ordinary comment -- change
"/**" to simple "/*". This avoids sphinx's: WARNING: Duplicate C
declaration
Next, in commit b7b377332b96 ("irqdomain: Fix the kernel-doc and plug it
into Documentation"), irqdomain.h's (header) kernel-doc was added into
core-api/genericirq.rst. But given the amount of irqdomain functions and
structures, move all these to core-api/irq/irq-domain.rst now.
Finally, add these newly fixed irqdomain.c's (source) docs there as
well.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-58-jirislaby@kernel.org
|
|
Most irq_domain_add_*() functions are unused now, so drop them. The
remaining ones are moved to the deprecated section and will be removed
during the merge window after the patches in various trees have been
merged.
Note: The Chinese docs are touched but unfinished. I cannot parse those.
[ tglx: Remove the leftover in irq-domain.rst and handle merge logistics ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-41-jirislaby@kernel.org
|
|
There is no reason to export the function as an extra symbol. It is
simple enough and is just a wrapper to already exported functions.
Therefore, switch the exported function to an inline.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-13-jirislaby@kernel.org
|
|
All uses of of_node_to_fwnode() in non-irqdomain code were changed to
"officially" defined of_fwnode_handle(). Therefore, the former can be
dropped along with the last uses in the irqdomain code.
Due to merge logistics the inline cannot be dropped immediately. Move it to
a deprecated section, which will be removed during the merge window.
[ tglx: Handle merge logistics ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-12-jirislaby@kernel.org
|
|
Affinity-managed interrupts can be shut down and restarted during CPU
hotunplug/plug. Thereby the interrupt may be left in an unexpected state.
Specifically:
1. Interrupt is affine to CPU N
2. disable_irq() -> depth is 1
3. CPU N goes offline
4. irq_shutdown() -> depth is set to 1 (again)
5. CPU N goes online
6. irq_startup() -> depth is set to 0 (BUG! driver expects that the interrupt
still disabled)
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning
This is only a problem for managed interrupts and CPU hotplug, all other
cases like request()/free()/request() truly needs to reset a possibly stale
disable depth value.
Provide a startup function, which takes the disable depth into account, and
invoked it for the managed interrupts in the CPU hotplug path.
This requires to change irq_shutdown() to do a depth increment instead of
setting it to 1, which allows to retain the disable depth, but is harmless
for the other code paths using irq_startup(), which will still reset the
disable depth unconditionally to keep the original correct behaviour.
A kunit tests will be added separately to cover some of these aspects.
[ tglx: Massaged changelog ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250514201353.3481400-2-briannorris@chromium.org
|
|
GCC is not happy about a sprintf() call on a buffer that might be too small
for the given formatting string.
kernel/irq/debugfs.c:233:26: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
Fix this by bumping the size of the local variable for sprintf().
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250515085516.2913290-1-andriy.shevchenko@linux.intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202505151057.xbyXAbEn-lkp@intel.com/
|
|
Kindly inform the MSI driver that the domain is torn down, providing the
allocation context previously populated on domain creation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-5-maz@kernel.org
|
|
The current device MSI infrastructure is subtly broken, as it will issue an
.msi_prepare() callback into the MSI controller driver every time it needs
to allocate an MSI. That's pretty wrong, as the contract (or unwarranted
assumption, depending who you ask) between the MSI controller and the core
code is that .msi_prepare() is called exactly once per device.
This leads to some subtle breakage in some MSI controller drivers, as it
gives the impression that there are multiple endpoints sharing a bus
identifier (RID in PCI parlance, DID for GICv3+). It implies that whatever
allocation the ITS driver (for example) has done on behalf of these devices
cannot be undone, as there is no way to track the shared state. This is
particularly bad for wire-MSI devices, for which .msi_prepare() is called
for each input line.
To address this issue, move the call to .msi_prepare() to take place at the
point of irq domain allocation, which is the only place that makes
sense. The msi_alloc_info_t structure is made part of the
msi_domain_template, so that its life-cycle is that of the domain as well.
Finally, the msi_info::alloc_data field is made to point at this allocation
tracking structure, ensuring that it is carried around the block.
This is all pretty straightforward, except for the non-device-MSI
leftovers, which still have to call .msi_prepare() at the old spot. One
day...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-4-maz@kernel.org
|
|
The ITS driver currently nukes the structure representing an endpoint
device translating via an ITS on freeing the last LPI allocated for it.
That's an unfortunate state of affair, as it is pretty common for a driver
to allocate a single MSI, do something clever, teardown this MSI, and
reallocate a whole bunch of them. The NVME driver does exactly that,
amongst others.
What happens in that case is that the core code is accidentaly issuing
another .msi_prepare() call, even if it shouldn't. This luckily cancels
the above behaviour and hides the problem.
In order to fix the core code, start by implementing the new
.msi_teardown() callback. Nothing calls it yet, so a side effect is that
the its_dev structure will not be freed and that the DID will stay
mapped. Not a big deal, and this will be solved in following patches.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-3-maz@kernel.org
|
|
While the MSI ops do have a .msi_prepare() callback that is responsible for
setting up the relevant (usually per-device) allocation, there is no
callback reversing this setup.
For this purpose, add .msi_teardown() callback.
In order to avoid breaking the ITS driver that suffers from related issues,
do not call the callback just yet.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-2-maz@kernel.org
|
|
Commit 8589e325ba4f ("genirq/manage: Rework irq_set_irq_wake()") updated
the irq_set_irq_wake() to use the new guards for locking the interrupt
descriptor.
However, in doing so it inadvertently changed irq_set_irq_wake() such that
the 'chip_bus_lock' is no longer acquired. This has caused system suspend
tests to fail on some Tegra platforms.
Fix this by correcting the guard used in irq_set_irq_wake() to ensure the
'chip_bus_lock' is held.
Fixes: 8589e325ba4f ("genirq/manage: Rework irq_set_irq_wake()")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250514095041.1109783-1-jonathanh@nvidia.com
|
|
There are three cases in the genirq code when the irq, as an unsigned
integer variable, is converted to text representation by sprintf().
In two cases it uses '%d' specifier which is for signed values. While
it's not a problem right now, potentially it might be in the future
in case too big (> INT_MAX) number will appear there.
Consistently use '%u' format specifier for @irq which is declared as
unsigned int in all these cases.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250509154643.1499171-1-andriy.shevchenko@linux.intel.com
|
|
After the conversion to locking guards within the interrupt core code,
several builds with clang show the "Interrupts were enabled early"
WARN() in start_kernel() on boot.
In class_irqdesc_lock_constructor(), _t.flags is initialized via
__irq_get_desc_lock() within the _t initializer list. However, the C11
standard 6.7.9.23 states that the evaluation of the initialization list
expressions are indeterminately sequenced relative to one another,
meaning _t.flags could be initialized by __irq_get_desc_lock() then be
initialized to zero due to flags being absent from the initializer list.
To ensure _t.flags is consistently initialized, move the call to
__irq_get_desc_lock() and the assignment of its result to _t.lock out of
the designated initializer.
Fixes: 0f70a49f3fa3 ("genirq: Provide conditional lock guards")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250513-irq-guards-fix-flags-init-v1-1-1dca3f5992d6@kernel.org
|
|
Marek reported that the rework of handle_nested_irq() introduced a inverted
condition, which prevents handling of interrupts. Fix it up.
Fixes: 2ef2e13094c7 ("genirq/chip: Rework handle_nested_irq()")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel/org/all/46ed4040-ca11-4157-8bd7-13c04c113734@samsung.com
|
|
The lock guard conversion converted raw_spin_lock_irq() to
scoped_guard(raw_spinlock), which is obviously bogus and makes lockdep
mightily unhappy.
Note to self: Copy and pasta without using brain is a patently bad idea.
Fixes: 88a4df117ad6 ("genirq/cpuhotplug: Convert to lock guards")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
|
|
This code pattern trips clang up:
if (fail)
goto undo;
guard(lock)(lock);
do_stuff();
return 0;
undo:
...
as it somehow extends the scope of the guard beyond the return statement.
Replace it with a scoped guard to help it to get its act together.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/oe-kbuild-all/202505071809.ajpPxfoZ-lkp@intel.com/
|
|
remove_percpu_irq() has been unused since it was added in 2011 by
commit 31d9d9b6d830 ("genirq: Add support for per-cpu dev_id interrupts")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250420164656.112641-1-linux@treblig.org
|
|
All users are converted to the guards. Remove the helpers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.729586582@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.670808288@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.612184618@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.552884529@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.494561120@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.435932527@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.376836282@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.315844964@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.258216558@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
Make the return value boolean to reflect it's meaning.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250429065422.187250840@linutronix.de
|
|
Use the new guards to get and lock the interrupt descriptor and tidy up the
code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/87ldrhq0hc.ffs@tglx
|