Age | Commit message (Collapse) | Author |
|
Pull xtensa fixes from Max Filippov:
- fix unused variable warning caused by empty flush_dcache_page()
definition
- fix stack unwinding on windowed noMMU XIP configurations
- fix Coccinelle warning 'opportunity for min()' in xtensa ISS platform
code
* tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: remove redundant flush_dcache_page and ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros
tty: xtensa/iss: Use min() to fix Coccinelle warning
xtensa: fix MAKE_PC_FROM_RA second argument
|
|
With recent sanity checks for topology information added, there are now
warnings issued for APs when running as a Xen PV guest:
[Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001
This is due to the initial APIC ID obtained via CPUID for PV guests is
always 0.
Avoid the warnings by synthesizing the CPUID data to contain the same
initial APIC ID as xen_pv_smp_config() is using for registering the
APIC IDs of all CPUs.
Fixes: 52128a7a21f7 ("86/cpu/topology: Make the APIC mismatch warnings complete")
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Contrary to i915, in xe ADL-N is kept as a different platform, not a
subplatform of ADL-P. Since the display side doesn't need to
differentiate between P and N, i.e. IS_ALDERLAKE_P_N() is never called,
just fixup the compat header to check for both P and N.
Moving ADL-N to be a subplatform would be more complex as the firmware
loading in xe only handles platforms, not subplatforms, as going forward
the direction is to check on IP version rather than
platforms/subplatforms.
Fix warning when initializing display:
xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
------------[ cut here ]------------
xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !((dev_priv)->info.platform == XE_ALDERLAKE_P))
And wrong paths being taken on the display side.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425181610.2704633-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 6a2a90cba12b42eb96c2af3426b77ceb4be31df2)
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:
"Two driver fixes:
- The firewire-ohci driver for 1394 OHCI hardware does not fill time
stamp for response packet when handling asynchronous transaction to
local destination. This brings an inconvenience that the response
packet is not equivalent between the transaction to local and
remote. It is fixed by fulfilling the time stamp with hardware
time. The fix should be applied to Linux kernel v6.5 or later as
well.
- The nosy driver for Texas Instruments TSB12LV21A (PCILynx) has
long-standing issue about the behaviour when user space application
passes less size of buffer than expected. It is fixed by returning
zero according to the convention of UNIX-like systems"
* tag 'firewire-fixes-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: fulfill timestamp for some local asynchronous transaction
firewire: nosy: ensure user_length is taken into account when fetching packet contents
|
|
The topology core expects the boot APIC to be registered from earhy APIC
detection first and then again when the firmware tables are evaluated. This
is used for detecting the real BSP CPU on a kexec kernel.
The recent conversion of XEN/PV to register fake APIC IDs failed to
register the boot CPU APIC correctly as it only registers it once. This
causes the BSP detection mechanism to trigger wrongly:
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
Additionally this results in one CPU being ignored.
Register the boot CPU APIC twice so that the XEN/PV fake enumeration
behaves like real firmware.
Reported-by: Juergen Gross <jgross@suse.com>
Fixes: e75307023466 ("x86/xen/smp_pv: Register fake APICs")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/87a5l8s2fg.ffs@tglx
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Fix a memory leak and a few locking issues (that may cause the kernel
to crash in principle if all goes wrong) in the thermal debug code
introduced during the 6.8 development cycle"
* tag 'thermal-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/debugfs: Prevent use-after-free from occurring after cdev removal
thermal/debugfs: Fix two locking issues with thermal zone debug
thermal/debugfs: Free all thermal zone debug memory on zone removal
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Relatively calm week, likely due to public holiday in most places. No
known outstanding regressions.
Current release - regressions:
- rxrpc: fix wrong alignmask in __page_frag_alloc_align()
- eth: e1000e: change usleep_range to udelay in PHY mdic access
Previous releases - regressions:
- gro: fix udp bad offset in socket lookup
- bpf: fix incorrect runtime stat for arm64
- tipc: fix UAF in error path
- netfs: fix a potential infinite loop in extract_user_to_sg()
- eth: ice: ensure the copied buf is NUL terminated
- eth: qeth: fix kernel panic after setting hsuid
Previous releases - always broken:
- bpf:
- verifier: prevent userspace memory access
- xdp: use flags field to disambiguate broadcast redirect
- bridge: fix multicast-to-unicast with fraglist GSO
- mptcp: ensure snd_nxt is properly initialized on connect
- nsh: fix outer header access in nsh_gso_segment().
- eth: bcmgenet: fix racing registers access
- eth: vxlan: fix stats counters.
Misc:
- a bunch of MAINTAINERS file updates"
* tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
MAINTAINERS: mark MYRICOM MYRI-10G as Orphan
MAINTAINERS: remove Ariel Elior
net: gro: add flush check in udp_gro_receive_segment
net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb
ipv4: Fix uninit-value access in __ip_make_skb()
s390/qeth: Fix kernel panic after setting hsuid
vxlan: Pull inner IP header in vxlan_rcv().
tipc: fix a possible memleak in tipc_buf_append
tipc: fix UAF in error path
rxrpc: Clients must accept conn from any address
net: core: reject skb_copy(_expand) for fraglist GSO skbs
net: bridge: fix multicast-to-unicast with fraglist GSO
mptcp: ensure snd_nxt is properly initialized on connect
e1000e: change usleep_range to udelay in PHY mdic access
net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341
cxgb4: Properly lock TX queue for the selftest.
rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()
vxlan: Add missing VNI filter counter update in arp_reduce().
vxlan: Fix racy device stats updates.
net: qede: use return from qede_parse_actions()
...
|
|
The kvm_pi_irte_update tracepoint is called from both SVM and VMX vendor
code, and while the "posted interrupt" naming is also adopted by SVM in
several places, VT-d specifically refers to Intel's "Virtualization
Technology for Directed I/O".
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Link: https://lore.kernel.org/r/20240418021823.1275276-3-alejandro.j.jimenez@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use the APICv enablement status to determine if APICV_INHIBIT_REASON_ABSENT
needs to be set, instead of unconditionally setting the reason during
initialization.
Specifically, in cases where AVIC is disabled via module parameter or lack
of hardware support, unconditionally setting an inhibit reason due to the
absence of an in-kernel local APIC can lead to a scenario where the reason
incorrectly remains set after a local APIC has been created by either
KVM_CREATE_IRQCHIP or the enabling of KVM_CAP_IRQCHIP_SPLIT. This is
because the helpers in charge of removing the inhibit return early if
enable_apicv is not true, and therefore the bit remains set.
This leads to confusion as to the cause why APICv is not active, since an
incorrect reason will be reported by tracepoints and/or a debugging tool
that examines the currently set inhibit reasons.
Fixes: ef8b4b720368 ("KVM: ensure APICv is considered inactive if there is no APIC")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Link: https://lore.kernel.org/r/20240418021823.1275276-2-alejandro.j.jimenez@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add full memory barriers in kvm_mmu_track_write() and account_shadowed()
to plug a (very, very theoretical) race where kvm_mmu_track_write() could
miss a 0->1 transition of indirect_shadow_pages and fail to zap relevant,
*stale* SPTEs.
Without the barriers, because modern x86 CPUs allow (per the SDM):
Reads may be reordered with older writes to different locations but not
with older writes to the same location.
it's possible that the following could happen (terms of values being
visible/resolved):
CPU0 CPU1
read memory[gfn] (=Y)
memory[gfn] Y=>X
read indirect_shadow_pages (=0)
indirect_shadow_pages 0=>1
or conversely:
CPU0 CPU1
indirect_shadow_pages 0=>1
read indirect_shadow_pages (=0)
read memory[gfn] (=Y)
memory[gfn] Y=>X
E.g. in the below scenario, CPU0 could fail to zap SPTEs, and CPU1 could
fail to retry the faulting instruction, resulting in a KVM entering the
guest with a stale SPTE (map PTE=X instead of PTE=Y).
PTE = X;
CPU0:
emulator_write_phys()
PTE = Y
kvm_page_track_write()
kvm_mmu_track_write()
// memory barrier missing here
if (indirect_shadow_pages)
zap();
CPU1:
FNAME(page_fault)
FNAME(walk_addr)
FNAME(walk_addr_generic)
gw->pte = PTE; // X
FNAME(fetch)
kvm_mmu_get_child_sp
kvm_mmu_get_shadow_page
__kvm_mmu_get_shadow_page
kvm_mmu_alloc_shadow_page
account_shadowed
indirect_shadow_pages++
// memory barrier missing here
if (FNAME(gpte_changed)) // if (PTE == X)
return RET_PF_RETRY;
In practice, this bug likely cannot be observed as both the 0=>1
transition and reordering of this scope are extremely rare occurrences.
Note, if the cost of the barrier (which is simply a locked ADD, see commit
450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")),
is problematic, KVM could avoid the barrier by bailing earlier if checking
kvm_memslots_have_rmaps() is false. But the odds of the barrier being
problematic is extremely low, *and* the odds of the extra checks being
meaningfully faster overall is also low.
Link: https://lore.kernel.org/r/20240423193114.2887673-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Remove kvm_make_all_cpus_request_except() as it effectively has no users,
and arguably should never have been added in the first place.
Commit 54163a346d4a ("KVM: Introduce kvm_make_all_cpus_request_except()")
added the "except" variation for use in SVM's AVIC update path, which used
it to skip sending a request to the current vCPU (commit 7d611233b016
("KVM: SVM: Disable AVIC before setting V_IRQ")).
But the AVIC usage of kvm_make_all_cpus_request_except() was essentially a
hack-a-fix that simply squashed the most likely scenario of a racy WARN
without addressing the underlying problem(s). Commit f1577ab21442 ("KVM:
SVM: svm_set_vintr don't warn if AVIC is active but is about to be
deactivated") eventually fixed the WARN itself, and the "except" usage was
subsequently dropped by df63202fe52b ("KVM: x86: APICv: drop immediate
APICv disablement on current vCPU").
That kvm_make_all_cpus_request_except() hasn't gained any users in the
last ~3 years isn't a coincidence. If a VM-wide broadcast *needs* to skip
the current vCPU, then odds are very good that there is underlying bug
that could be better fixed elsewhere.
Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Link: https://lore.kernel.org/r/20240404232651.1645176-1-venkateshs@chromium.org
[sean: rewrite changelog with --verbose]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When handling userspace writes to immutable feature MSRs for a vCPU that
has already run, fall through into the normal code to set the MSR instead
of immediately returning '0'. I.e. allow such writes, instead of ignoring
such writes. This fixes a bug where KVM incorrectly allows writes to the
VMX MSRs that enumerate which CR{0,4} can be set, but only if the vCPU has
already run.
The intent of returning '0' and thus ignoring the write, was to avoid any
side effects, e.g. refreshing the PMU and thus doing weird things with
perf events while the vCPU is running. That approach sounds nice in
theory, but in practice it makes it all but impossible to maintain a sane
ABI, e.g. all VMX MSRs return -EBUSY if the CPU is post-VMXON, and the VMX
MSRs for fixed-1 CR bits are never writable, etc.
As for refreshing the PMU, kvm_set_msr_common() explicitly skips the PMU
refresh if MSR_IA32_PERF_CAPABILITIES is being written with the current
value, specifically to avoid unwanted side effects. And if necessary,
adding similar logic for other MSRs is not difficult.
Fixes: 0094f62c7eaa ("KVM: x86: Disallow writes to immutable feature MSRs after KVM_RUN")
Reported-by: Jim Mattson <jmattson@google.com>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20240408231500.1388122-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
git://git.infradead.org/nvme into block-6.9
Pull NVMe fixes from Keith.
* git://git.infradead.org/nvme:
nvme-tcp: strict pdu pacing to avoid send stalls on TLS
nvmet: fix nvme status code when namespace is disabled
nvmet-tcp: fix possible memory leak when tearing down a controller
nvme: cancel pending I/O if nvme controller is in terminal state
nvmet-auth: replace pr_debug() with pr_err() to report an error.
nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
nvme: find numa distance only if controller has valid numa id
nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
|
|
Using restricted DMA pools (CONFIG_DMA_RESTRICTED_POOL=y) in conjunction
with dynamic SWIOTLB (CONFIG_SWIOTLB_DYNAMIC=y) leads to the following
crash when initialising the restricted pools at boot-time:
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
| Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
| pc : rmem_swiotlb_device_init+0xfc/0x1ec
| lr : rmem_swiotlb_device_init+0xf0/0x1ec
| Call trace:
| rmem_swiotlb_device_init+0xfc/0x1ec
| of_reserved_mem_device_init_by_idx+0x18c/0x238
| of_dma_configure_id+0x31c/0x33c
| platform_dma_configure+0x34/0x80
faddr2line reveals that the crash is in the list validation code:
include/linux/list.h:83
include/linux/rculist.h:79
include/linux/rculist.h:106
kernel/dma/swiotlb.c:306
kernel/dma/swiotlb.c:1695
because add_mem_pool() is trying to list_add_rcu() to a NULL
'mem->pools'.
Fix the crash by initialising the 'mem->pools' list_head in
rmem_swiotlb_device_init() before calling add_mem_pool().
Reported-by: Nikita Ioffe <ioffe@google.com>
Tested-by: Nikita Ioffe <ioffe@google.com>
Fixes: 1aaa736815eb ("swiotlb: allocate a new memory pool when existing pools are full")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Chris's email address bounces and lore hasn't seen an email
from anyone with his name for almost a decade.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240430233532.1356982-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
aelior@marvell.com bounces, we haven't seen Ariel on lore
since March 2022.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240430233305.1356105-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Richard Gobert says:
====================
net: gro: add flush/flush_id checks and fix wrong offset in udp
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to add network_offset and inner_network_offset to
napi_gro_cb and use these offsets for socket lookup.
In addition p->flush/flush_id should be checked in all UDP flows. The
same logic from tcp_gro_receive is applied for all flows in
udp_gro_receive_segment. This prevents packets with mismatching network
headers (flush/flush_id turned on) from merging in UDP GRO.
The original series includes a change to vxlan test which adds the local
parameter to prevent similar future bugs. I plan to submit it separately to
net-next.
This series is part of a previously submitted series to net-next:
https://lore.kernel.org/all/20240408141720.98832-1-richardbgobert@gmail.com/
v3 -> v4:
- Store network offsets, and use them only in udp_gro_complete flows
- Correct commit hash used in Fixes tag
- v3:
https://lore.kernel.org/netdev/20240424163045.123528-1-richardbgobert@gmail.com/
v2 -> v3:
- Add network_offsets and fix udp bug in a single commit to make backporting easier
- Write to inner_network_offset in {inet,ipv6}_gro_receive
- Use network_offsets union in tcp[46]_gro_complete as well
- v2:
https://lore.kernel.org/netdev/20240419153542.121087-1-richardbgobert@gmail.com/
v1 -> v2:
- Use network_offsets instead of p_poff param as suggested by Willem
- Check flush before postpull, and for all UDP GRO flows
- v1:
https://lore.kernel.org/netdev/20240412152120.115067-1-richardbgobert@gmail.com/
====================
Link: https://lore.kernel.org/r/20240430143555.126083-1-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
GRO-GSO path is supposed to be transparent and as such L3 flush checks are
relevant to all UDP flows merging in GRO. This patch uses the same logic
and code from tcp_gro_receive, terminating merge if flush is non zero.
Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
{inner_}network_offset to napi_gro_cb
Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the
complete phase of gro. The functions always return skb->network_header,
which in the case of encapsulated packets at the gro complete phase, is
always set to the innermost L3 of the packet. That means that calling
{ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_
L3/L4 may return an unexpected value.
This incorrect usage leads to a bug in GRO's UDP socket lookup.
udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These
*_hdr functions return network_header which will point to the innermost L3,
resulting in the wrong offset being used in __udp{4,6}_lib_lookup with
encapsulated packets.
This patch adds network_offset and inner_network_offset to napi_gro_cb, and
makes sure both are set correctly.
To fix the issue, network_offsets union is used inside napi_gro_cb, in
which both the outer and the inner network offsets are saved.
Reproduction example:
Endpoint configuration example (fou + local address bind)
# ip fou add port 6666 ipproto 4
# ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip
# ip link set tun1 up
# ip a add 1.1.1.2/24 dev tun1
Netperf TCP_STREAM result on net-next before patch is applied:
net-next main, GRO enabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 5.28 2.37
net-next main, GRO disabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 5.01 2745.06
patch applied, GRO enabled:
$ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 5.01 2877.38
Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb()
tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a
race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL
while __ip_make_skb() is running, the function will access icmphdr in the
skb even if it is not included. This causes the issue reported by KMSAN.
Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL
on the socket.
Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These
are union in struct flowi4 and are implicitly initialized by
flowi4_init_output(), but we should not rely on specific union layout.
Initialize these explicitly in raw_sendmsg().
[1]
BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
__ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
ip_finish_skb include/net/ip.h:243 [inline]
ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508
raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1318 [inline]
__ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128
ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365
raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
Fixes: 99e5acae193e ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In one case the -1 is returned which is quite confusing code for
the wrong device ID, in another the ret is returning instead of
plain 0 that also confusing as readed may ask the possible meaning
of positive codes, which are never the case there. Convert both
to use explicit predefined error codes to make it clear what's going
on there.
Fixes: 5a04227326b0 ("drm/panel: Add ilitek ili9341 panel driver")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Link: https://lore.kernel.org/r/20240425142706.2440113-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425142706.2440113-4-andriy.shevchenko@linux.intel.com
|
|
GPIO controller might not be available when driver is being probed.
There are plenty of reasons why, one of which is deferred probe.
Since GPIOs are optional, return any error code we got to the upper
layer, including deferred probe. With that in mind, use dev_err_probe()
in order to avoid spamming the logs.
Fixes: 5a04227326b0 ("drm/panel: Add ilitek ili9341 panel driver")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Link: https://lore.kernel.org/r/20240425142706.2440113-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425142706.2440113-3-andriy.shevchenko@linux.intel.com
|
|
It seems driver missed the point of proper use of device property APIs.
Correct this by updating headers and calls respectively.
Fixes: 5a04227326b0 ("drm/panel: Add ilitek ili9341 panel driver")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20240425142706.2440113-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425142706.2440113-2-andriy.shevchenko@linux.intel.com
|
|
Symptom:
When the hsuid attribute is set for the first time on an IQD Layer3
device while the corresponding network interface is already UP,
the kernel will try to execute a napi function pointer that is NULL.
Example:
---------------------------------------------------------------------------
[ 2057.572696] illegal operation: 0001 ilc:1 [#1] SMP
[ 2057.572702] Modules linked in: af_iucv qeth_l3 zfcp scsi_transport_fc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink ghash_s390 prng xts aes_s390 des_s390 de
s_generic sha3_512_s390 sha3_256_s390 sha512_s390 vfio_ccw vfio_mdev mdev vfio_iommu_type1 eadm_sch vfio ext4 mbcache jbd2 qeth_l2 bridge stp llc dasd_eckd_mod qeth dasd_mod
qdio ccwgroup pkey zcrypt
[ 2057.572739] CPU: 6 PID: 60182 Comm: stress_client Kdump: loaded Not tainted 4.18.0-541.el8.s390x #1
[ 2057.572742] Hardware name: IBM 3931 A01 704 (LPAR)
[ 2057.572744] Krnl PSW : 0704f00180000000 0000000000000002 (0x2)
[ 2057.572748] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[ 2057.572751] Krnl GPRS: 0000000000000004 0000000000000000 00000000a3b008d8 0000000000000000
[ 2057.572754] 00000000a3b008d8 cb923a29c779abc5 0000000000000000 00000000814cfd80
[ 2057.572756] 000000000000012c 0000000000000000 00000000a3b008d8 00000000a3b008d8
[ 2057.572758] 00000000bab6d500 00000000814cfd80 0000000091317e46 00000000814cfc68
[ 2057.572762] Krnl Code:#0000000000000000: 0000 illegal
>0000000000000002: 0000 illegal
0000000000000004: 0000 illegal
0000000000000006: 0000 illegal
0000000000000008: 0000 illegal
000000000000000a: 0000 illegal
000000000000000c: 0000 illegal
000000000000000e: 0000 illegal
[ 2057.572800] Call Trace:
[ 2057.572801] ([<00000000ec639700>] 0xec639700)
[ 2057.572803] [<00000000913183e2>] net_rx_action+0x2ba/0x398
[ 2057.572809] [<0000000091515f76>] __do_softirq+0x11e/0x3a0
[ 2057.572813] [<0000000090ce160c>] do_softirq_own_stack+0x3c/0x58
[ 2057.572817] ([<0000000090d2cbd6>] do_softirq.part.1+0x56/0x60)
[ 2057.572822] [<0000000090d2cc60>] __local_bh_enable_ip+0x80/0x98
[ 2057.572825] [<0000000091314706>] __dev_queue_xmit+0x2be/0xd70
[ 2057.572827] [<000003ff803dd6d6>] afiucv_hs_send+0x24e/0x300 [af_iucv]
[ 2057.572830] [<000003ff803dd88a>] iucv_send_ctrl+0x102/0x138 [af_iucv]
[ 2057.572833] [<000003ff803de72a>] iucv_sock_connect+0x37a/0x468 [af_iucv]
[ 2057.572835] [<00000000912e7e90>] __sys_connect+0xa0/0xd8
[ 2057.572839] [<00000000912e9580>] sys_socketcall+0x228/0x348
[ 2057.572841] [<0000000091514e1a>] system_call+0x2a6/0x2c8
[ 2057.572843] Last Breaking-Event-Address:
[ 2057.572844] [<0000000091317e44>] __napi_poll+0x4c/0x1d8
[ 2057.572846]
[ 2057.572847] Kernel panic - not syncing: Fatal exception in interrupt
-------------------------------------------------------------------------------------------
Analysis:
There is one napi structure per out_q: card->qdio.out_qs[i].napi
The napi.poll functions are set during qeth_open().
Since
commit 1cfef80d4c2b ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)")
qeth_set_offline()/qeth_set_online() no longer call dev_close()/
dev_open(). So if qeth_free_qdio_queues() cleared
card->qdio.out_qs[i].napi.poll while the network interface was UP and the
card was offline, they are not set again.
Reproduction:
chzdev -e $devno layer2=0
ip link set dev $network_interface up
echo 0 > /sys/bus/ccwgroup/devices/0.0.$devno/online
echo foo > /sys/bus/ccwgroup/devices/0.0.$devno/hsuid
echo 1 > /sys/bus/ccwgroup/devices/0.0.$devno/online
-> Crash (can be enforced e.g. by af_iucv connect(), ip link down/up, ...)
Note that a Completion Queue (CQ) is only enabled or disabled, when hsuid
is set for the first time or when it is removed.
Workarounds:
- Set hsuid before setting the device online for the first time
or
- Use chzdev -d $devno; chzdev $devno hsuid=xxx; chzdev -e $devno;
to set hsuid on an existing device. (this will remove and recreate the
network interface)
Fix:
There is no need to free the output queues when a completion queue is
added or removed.
card->qdio.state now indicates whether the inbound buffer pool and the
outbound queues are allocated.
card->qdio.c_q indicates whether a CQ is allocated.
Fixes: 1cfef80d4c2b ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240430091004.2265683-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The alc_spec.power_hook is defined only with CONFIG_PM, and the recent
fix overlooked it, resulting in a build error without CONFIG_PM.
Fix it with the simple ifdef and set __maybe_unused for the function.
We may drop the whole CONFIG_PM dependency there, but it should be
done in a separate cleanup patch later.
Fixes: 1e707769df07 ("ALSA: hda/realtek - Set GPIO3 to default at S4 state for Thinkpad with ALC1318")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405012104.Dr7h318W-lkp@intel.com/
Message-ID: <20240502062442.30545-1-tiwai@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Ensure the inner IP header is part of skb's linear data before reading
its ECN bits. Otherwise we might read garbage.
One symptom is the system erroneously logging errors like
"vxlan: non-ECT from xxx.xxx.xxx.xxx with TOS=xxxx".
Similar bugs have been fixed in geneve, ip_tunnel and ip6_tunnel (see
commit 1ca1ba465e55 ("geneve: make sure to pull inner header in
geneve_rx()") for example). So let's reuse the same code structure for
consistency. Maybe we'll can add a common helper in the future.
Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/1239c8db54efec341dd6455c77e0380f58923a3c.1714495737.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
__skb_linearize() doesn't free the skb when it fails, so move
'*buf = NULL' after __skb_linearize(), so that the skb can be
freed on the err path.
Fixes: b7df21cf1b79 ("tipc: skb_linearize the head skb when reassembling msgs")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/90710748c29a1521efac4f75ea01b3b7e61414cf.1714485818.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sam Page (sam4k) working with Trend Micro Zero Day Initiative reported
a UAF in the tipc_buf_append() error path:
BUG: KASAN: slab-use-after-free in kfree_skb_list_reason+0x47e/0x4c0
linux/net/core/skbuff.c:1183
Read of size 8 at addr ffff88804d2a7c80 by task poc/8034
CPU: 1 PID: 8034 Comm: poc Not tainted 6.8.2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.0-debian-1.16.0-5 04/01/2014
Call Trace:
<IRQ>
__dump_stack linux/lib/dump_stack.c:88
dump_stack_lvl+0xd9/0x1b0 linux/lib/dump_stack.c:106
print_address_description linux/mm/kasan/report.c:377
print_report+0xc4/0x620 linux/mm/kasan/report.c:488
kasan_report+0xda/0x110 linux/mm/kasan/report.c:601
kfree_skb_list_reason+0x47e/0x4c0 linux/net/core/skbuff.c:1183
skb_release_data+0x5af/0x880 linux/net/core/skbuff.c:1026
skb_release_all linux/net/core/skbuff.c:1094
__kfree_skb linux/net/core/skbuff.c:1108
kfree_skb_reason+0x12d/0x210 linux/net/core/skbuff.c:1144
kfree_skb linux/./include/linux/skbuff.h:1244
tipc_buf_append+0x425/0xb50 linux/net/tipc/msg.c:186
tipc_link_input+0x224/0x7c0 linux/net/tipc/link.c:1324
tipc_link_rcv+0x76e/0x2d70 linux/net/tipc/link.c:1824
tipc_rcv+0x45f/0x10f0 linux/net/tipc/node.c:2159
tipc_udp_recv+0x73b/0x8f0 linux/net/tipc/udp_media.c:390
udp_queue_rcv_one_skb+0xad2/0x1850 linux/net/ipv4/udp.c:2108
udp_queue_rcv_skb+0x131/0xb00 linux/net/ipv4/udp.c:2186
udp_unicast_rcv_skb+0x165/0x3b0 linux/net/ipv4/udp.c:2346
__udp4_lib_rcv+0x2594/0x3400 linux/net/ipv4/udp.c:2422
ip_protocol_deliver_rcu+0x30c/0x4e0 linux/net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x2e4/0x520 linux/net/ipv4/ip_input.c:233
NF_HOOK linux/./include/linux/netfilter.h:314
NF_HOOK linux/./include/linux/netfilter.h:308
ip_local_deliver+0x18e/0x1f0 linux/net/ipv4/ip_input.c:254
dst_input linux/./include/net/dst.h:461
ip_rcv_finish linux/net/ipv4/ip_input.c:449
NF_HOOK linux/./include/linux/netfilter.h:314
NF_HOOK linux/./include/linux/netfilter.h:308
ip_rcv+0x2c5/0x5d0 linux/net/ipv4/ip_input.c:569
__netif_receive_skb_one_core+0x199/0x1e0 linux/net/core/dev.c:5534
__netif_receive_skb+0x1f/0x1c0 linux/net/core/dev.c:5648
process_backlog+0x101/0x6b0 linux/net/core/dev.c:5976
__napi_poll.constprop.0+0xba/0x550 linux/net/core/dev.c:6576
napi_poll linux/net/core/dev.c:6645
net_rx_action+0x95a/0xe90 linux/net/core/dev.c:6781
__do_softirq+0x21f/0x8e7 linux/kernel/softirq.c:553
do_softirq linux/kernel/softirq.c:454
do_softirq+0xb2/0xf0 linux/kernel/softirq.c:441
</IRQ>
<TASK>
__local_bh_enable_ip+0x100/0x120 linux/kernel/softirq.c:381
local_bh_enable linux/./include/linux/bottom_half.h:33
rcu_read_unlock_bh linux/./include/linux/rcupdate.h:851
__dev_queue_xmit+0x871/0x3ee0 linux/net/core/dev.c:4378
dev_queue_xmit linux/./include/linux/netdevice.h:3169
neigh_hh_output linux/./include/net/neighbour.h:526
neigh_output linux/./include/net/neighbour.h:540
ip_finish_output2+0x169f/0x2550 linux/net/ipv4/ip_output.c:235
__ip_finish_output linux/net/ipv4/ip_output.c:313
__ip_finish_output+0x49e/0x950 linux/net/ipv4/ip_output.c:295
ip_finish_output+0x31/0x310 linux/net/ipv4/ip_output.c:323
NF_HOOK_COND linux/./include/linux/netfilter.h:303
ip_output+0x13b/0x2a0 linux/net/ipv4/ip_output.c:433
dst_output linux/./include/net/dst.h:451
ip_local_out linux/net/ipv4/ip_output.c:129
ip_send_skb+0x3e5/0x560 linux/net/ipv4/ip_output.c:1492
udp_send_skb+0x73f/0x1530 linux/net/ipv4/udp.c:963
udp_sendmsg+0x1a36/0x2b40 linux/net/ipv4/udp.c:1250
inet_sendmsg+0x105/0x140 linux/net/ipv4/af_inet.c:850
sock_sendmsg_nosec linux/net/socket.c:730
__sock_sendmsg linux/net/socket.c:745
__sys_sendto+0x42c/0x4e0 linux/net/socket.c:2191
__do_sys_sendto linux/net/socket.c:2203
__se_sys_sendto linux/net/socket.c:2199
__x64_sys_sendto+0xe0/0x1c0 linux/net/socket.c:2199
do_syscall_x64 linux/arch/x86/entry/common.c:52
do_syscall_64+0xd8/0x270 linux/arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6f/0x77 linux/arch/x86/entry/entry_64.S:120
RIP: 0033:0x7f3434974f29
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 37 8f 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fff9154f2b8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3434974f29
RDX: 00000000000032c8 RSI: 00007fff9154f300 RDI: 0000000000000003
RBP: 00007fff915532e0 R08: 00007fff91553360 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000212 R12: 000055ed86d261d0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
In the critical scenario, either the relevant skb is freed or its
ownership is transferred into a frag_lists. In both cases, the cleanup
code must not free it again: we need to clear the skb reference earlier.
Fixes: 1149557d64c9 ("tipc: eliminate unnecessary linearization of incoming buffers")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-23852
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/752f1ccf762223d109845365d07f55414058e5a3.1714484273.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The find connection logic of Transarc's Rx was modified in the mid-1990s
to support multi-homed servers which might send a response packet from
an address other than the destination address in the received packet.
The rules for accepting a packet by an Rx initiator (RX_CLIENT_CONNECTION)
were altered to permit acceptance of a packet from any address provided
that the port number was unchanged and all of the connection identifiers
matched (Epoch, CID, SecurityClass, ...).
This change applies the same rules to the Linux implementation which makes
it consistent with IBM AFS 3.6, Arla, OpenAFS and AuriStorFS.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: Jeffrey Altman <jaltman@auristor.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://lore.kernel.org/r/20240419163057.4141728-1-marc.dionne@auristor.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.9
This is much larger than is ideal, partly due to your holiday but also
due to several vendors having come in with relatively large fixes at
similar times. It's all driver specific stuff.
The meson fixes from Jerome fix some rare timing issues with blocking
operations happening in triggers, plus the continuous clock support
which fixes clocking for some platforms. The SOF series from Peter
builds to the fix to avoid spurious resets of ChainDMA which triggered
errors in cleanup paths with both PulseAudio and PipeWire, and there's
also some simple new debugfs files from Pierre which make support a lot
eaiser.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"There's a few simple driver specific fixes here, plus some core
cleanups from Matti which fix issues found with client drivers due to
the API being confusing.
The two fixes for the stubs provide more constructive behaviour with
!REGULATOR configurations, issues were noticed with some hwmon drivers
which would otherwise have needed confusing bodges in the users.
The irq_helpers fix to duplicate the provided name for the interrupt
controller was found because a driver got this wrong and it's again a
case where the core is the sensible place to put the fix"
* tag 'regulator-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: change devm_regulator_get_enable_optional() stub to return Ok
regulator: change stubbed devm_regulator_get_enable to return Ok
regulator: vqmmc-ipq4019: fix module autoloading
regulator: qcom-refgen: fix module autoloading
regulator: mt6360: De-capitalize devicetree regulator subnodes
regulator: irq_helpers: duplicate IRQ name
|
|
If a vcpu exits for a data abort with an invalid syndrome, the
expectations are that userspace has a chance to save the day if
it has requested to see such exits.
However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.
This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.
Finally, document this tweak to the API.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-31-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
For practical reasons as well as security related ones, not all
capabilities are supported for protected VMs in pKVM.
Add a function that restricts the capabilities for protected VMs.
This behaves as an allow-list to ensure that future capabilities
are checked for compatibility and security before being allowed
for protected VMs.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-30-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Initialize r = -EINVAL to get rid of the error-path
initializations in kvm_vm_ioctl_enable_cap().
No functional change intended.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-29-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM/arm64 makes use of the SMCCC "Vendor Specific Hypervisor Service
Call Range" to expose KVM-specific hypercalls to guests in a
discoverable and extensible fashion.
Document the existence of this interface and the discovery hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-28-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for describing the guest view of KVM/arm64 hypercalls in
hypercalls.rst, move the existing contents of the file concerning the
firmware pseudo-registers elsewhere.
Cc: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-27-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The PTP hypercall documentation doesn't produce the best-looking table
when formatting in HTML as all of the return value definitions end up
on the same line.
Reformat the PTP hypercall documentation to follow the formatting used
by hypercalls.rst.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-26-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Expand comment clarifying why the host value representing SVE
vector length being restored for ZCR_EL1 on guest exit isn't the
same as it was on guest entry.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-21-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In order to determine whether or not a VM or vcpu are protected,
introduce helpers to query this state. While at it, use the vcpu
helper to check vcpus protected state instead of the kvm one.
Co-authored-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-19-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a helper allowing to check when the pkvm static key is enabled to
ease the introduction of pkvm hooks in other parts of the code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-18-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore
hypercalls so that all of the EL2 GICv3 state is covered by a single pair
of hypercalls.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Move the unlock earlier in user_mem_abort() to shorten the
critical section. This also helps for future refactoring and
reuse of similar code.
This moves out marking the page as dirty outside of the critical
section. That code does not interact with the stage-2 page
tables, which the read lock in the critical section protects.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-16-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Most exit handlers return <= 0 to indicate that the host needs to
handle the exit. Make kvm_handle_mmio_return() consistent with
the exit handlers in handle_exit(). This makes the code easier to
reason about, and makes it easier to add other handlers in future
patches.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-15-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Fix the comment to clarify that __pkvm_vcpu_init_traps()
initializes traps for all VMs in protected mode, and not only
for protected VMs.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-14-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We've added a .data section for the hypervisor, which kmemleak is
eager to parse. This clearly doesn't go well, so add the section
to kmemleak's block list.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-13-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
pKVM maintains its own state at EL2 for tracking the host fpsimd
state. Therefore, no need to map and share the host's view with
it.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-12-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Rename __tlb_switch_to_{guest,host}() to
{enter,exit}_vmid_context() in VHE code to maintain symmetry
between the nVHE and VHE TLB invalidations.
No functional change intended.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-11-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.
With guest-to-host memory sharing and unsharing hypercalls
originating from the guest under pKVM, there is need to support
both guest and host VMID invalidations issued from guest context.
Replace the __tlb_switch_to_{guest,host}() functions with a more general
{enter,exit}_vmid_context() implementation which supports being invoked
from guest context and acts as a no-op if the target context matches the
running context.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Break-before-make (BBM) can be expensive, as transitioning via an
invalid mapping (i.e. the "break" step) requires the completion of TLB
invalidation and can also cause other agents to fault concurrently on
the invalid mapping.
Since BBM is not required when changing only the software bits of a PTE,
avoid the sequence in this case and just update the PTE directly.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Don't just assume that the PTE is valid when checking whether it
describes an executable or cacheable mapping.
This makes sure that we don't issue CMOs for invalid mappings.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|