Age | Commit message (Collapse) | Author |
|
* kvm-arm64/wfxt:
: .
: Add support for the WFET/WFIT instructions that provide the same
: service as WFE/WFI, only with a timeout.
: .
KVM: arm64: Expose the WFXT feature to guests
KVM: arm64: Offer early resume for non-blocking WFxT instructions
KVM: arm64: Handle blocking WFIT instruction
KVM: arm64: Introduce kvm_counter_compute_delta() helper
KVM: arm64: Simplify kvm_cpu_has_pending_timer()
arm64: Use WFxT for __delay() when possible
arm64: Add wfet()/wfit() helpers
arm64: Add HWCAP advertising FEAT_WFXT
arm64: Add RV and RN fields for ESR_ELx_WFx_ISS
arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Merge arm64's SME branch to resolve conflicts with the WFxT branch.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
This is required to make loading this as a module work.
Signed-off-by: Hector Martin <marcan@marcan.st>
Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver")
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20220502092238.30486-1-marcan@marcan.st
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Assert that the vCPU exits to userspace with KVM_SYSTEM_EVENT_SUSPEND if
the guest calls PSCI SYSTEM_SUSPEND. Additionally, guarantee that the
SMC32 and SMC64 flavors of this call are discoverable with the
PSCI_FEATURES call.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-13-oupton@google.com
|
|
Split up the current test into several helpers that will be useful to
subsequent test cases added to the PSCI test suite.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-12-oupton@google.com
|
|
Setting a vCPU's MP state to KVM_MP_STATE_STOPPED has the effect of
powering off the vCPU. Rather than using the vCPU init feature flag, use
the KVM_SET_MP_STATE ioctl to power off the target vCPU.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-11-oupton@google.com
|
|
The PSCI and PV stolen time tests both need to make SMCCC calls within
the guest. Create a helper for making SMCCC calls and rework the
existing tests to use the library function.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-10-oupton@google.com
|
|
There are other interactions with PSCI worth testing; rename the PSCI
test to make it more generic.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-9-oupton@google.com
|
|
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM.
Unfortunately, there really is no good way to implement a system-wide
PSCI call in KVM. Any precondition checks done in the kernel will need
to be repeated by userspace since there is no good way to protect a
critical section that spans an exit to userspace. SYSTEM_RESET and
SYSTEM_OFF are equally plagued by this issue, although no users have
seemingly cared for the relatively long time these calls have been
supported.
The solution is to just make the whole implementation userspace's
problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that
indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND.
Additionally, add a CAP to get buy-in from userspace for this new exit
type.
Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in.
If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide
explicit documentation of userspace's responsibilites for the exit and
point to the PSCI specification to describe the actual PSCI call.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
|
|
Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
is in a suspended state. In the suspended state the vCPU will block
until a wakeup event (pending interrupt) is recognized.
Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
userspace that KVM has recognized one such wakeup event. It is the
responsibility of userspace to then make the vCPU runnable, or leave it
suspended until the next wakeup event.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
|
|
A subsequent change to KVM will introduce a vCPU request that could
result in an exit to userspace. Change check_vcpu_requests() to return a
value and document the function. Unconditionally return 1 for now.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-6-oupton@google.com
|
|
The naming of the kvm_req_sleep function is confusing: the function
itself sleeps the vCPU, it does not request such an event. Rename the
function to make its purpose more clear.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-5-oupton@google.com
|
|
A subsequent change to KVM will add support for additional power states.
Store the MP state by value rather than keeping track of it as a
boolean.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-4-oupton@google.com
|
|
vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the
former and replace all callsites to the latter.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-3-oupton@google.com
|
|
Depending on a fallthrough to the default case for hiding SYSTEM_RESET2
requires that any new case statements clean up the failure path for this
PSCI call.
Unhitch SYSTEM_RESET2 from the default case by setting val to
PSCI_RET_NOT_SUPPORTED outside of the switch statement. Apply the
cleanup to both the PSCI_1_1_FN_SYSTEM_RESET2 and
PSCI_1_0_FN_PSCI_FEATURES handlers.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-2-oupton@google.com
|
|
The IT6505 is using functions provided by the DRM_DP_HELPER driver.
In order to avoid having the bridge enabled but the helper disabled,
let's add a select in order to be sure that the DP helper functions are
always available.
Fixes: b5c84a9edcd4 ("drm/bridge: add it6505 driver")
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220426141536.274727-1-fparent@baylibre.com
|
|
Fix the new instances of ESR being described as a u32, now that
we consistently are using a u64 for this register.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The cited commits didn't use proper matching on inner TTC
as a result distribution of encapsulated packets wasn't symmetric
between the physical ports.
Fixes: 4c71ce50d2fe ("net/mlx5: Support partial TTC rules")
Fixes: 8e25a2bc6687 ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Double clear of reset requested state can lead to NULL pointer as it
will try to delete the timer twice. This can happen for example on a
race between abort from FW and pci error or reset. Avoid such case using
test_and_clear_bit() to verify only one time reset requested state clear
flow. Similarly use test_and_set_bit() to verify only one time reset
requested state set flow.
Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The sync reset flow can lead to the following deadlock when
poll_sync_reset() is called by timer softirq and waiting on
del_timer_sync() for the same timer. Fix that by moving the part of the
flow that waits for the timer to reset_reload_work.
It fixes the following kernel Trace:
RIP: 0010:del_timer_sync+0x32/0x40
...
Call Trace:
<IRQ>
mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
poll_sync_reset.cold+0x36/0x52 [mlx5_core]
call_timer_fn+0x32/0x130
__run_timers.part.0+0x180/0x280
? tick_sched_handle+0x33/0x60
? tick_sched_timer+0x3d/0x80
? ktime_get+0x3e/0xa0
run_timer_softirq+0x2a/0x50
__do_softirq+0xe1/0x2d6
? hrtimer_interrupt+0x136/0x220
irq_exit+0xae/0xb0
smp_apic_timer_interrupt+0x7b/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
Fixes: 3c5193a87b0f ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Setting dscp2prio during the driver reload can cause dcb ieee app list to
be not empty after the reload finish and as a result to a conflict between
the priority trust state reported by the app and the state in the device
register.
Reset the dcb ieee app list on initialization in case this is
conflicting with the register status.
Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
During TC action parsing, the can_offload callback is called
before calling the action's main parsing callback.
Later on, the can_offload callback is called again before handling
the action's post_parse callback if exists.
Since the main parsing callback might have changed and set parsing
params for the rule, following can_offload checks might fail because
some parsing params were already set.
Specifically, the ct action main parsing sets the ct param in the
parsing status structure and when the second can_offload for ct action
is called, before handling the ct post parsing, it will return an error
since it checks this ct param to indicate multiple ct actions which are
not supported.
Therefore, the can_offload call is removed from the post parsing
handling to prevent such cases.
This is allowed since the first can_offload call will ensure that the
action can be offloaded and the fact the code reached the post parsing
handling already means that the action can be offloaded.
Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
__mlx5_tc_ct_entry_put() queues release of tuple related to some ct FT,
if that is the last reference to that tuple, the actual deletion of
the tuple can happen after the FT is already destroyed and freed.
Flush the used workqueue before destroying the ct FT.
Fixes: a2173131526d ("net/mlx5e: CT: manage the lifetime of the ct entry object")
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When resolving the decap route device for a tunnel decap rule,
the result may be an OVS internal port device.
Prior to adding the support for internal port offload, such case
would result in using the uplink as the default decap route device
which allowed devices that can't support internal port offload
to offload this decap rule.
This behavior got broken by adding the internal port offload which
will fail in case the device can't support internal port offload.
To restore the old behavior, use the uplink device as the decap
route as before when internal port offload is not supported.
Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
ct_clear action is translated to clearing reg_c metadata
which holds ct state and zone information using mod header
actions.
These actions are allocated during the actions parsing, as
part of the flow attributes main mod header action list.
If ct action exists in the rule, the flow's main mod header
is used only in the post action table rule, after the ct tables
which set the ct info in the reg_c as part of the ct actions.
Therefore, if the original rule has a ct_clear action followed
by a ct action, the ct action reg_c setting will be done first and
will be followed by the ct_clear resetting reg_c and overwriting
the ct info.
Fix this by moving the ct_clear mod header actions allocation from
the ct action parsing stage to the ct action post parsing stage where
it is already known if ct_clear is followed by a ct action.
In such case, we skip the mod header actions allocation for the ct
clear since the ct action will write to reg_c anyway after clearing it.
Fixes: 806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Referenced change added check to skip updating fib when new fib instance
has same or lower priority. However, new fib instance can be an update on
same dst address as existing one even though the structure is another
instance that has different address. Ignoring events on such instances
causes multipath LAG state to not be correctly updated.
Track 'dst' and 'dst_len' fields of fib event fib_entry_notifier_info
structure and don't skip events that have the same value of that fields.
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Referenced change incorrectly sets single path fib_info even when LAG is
not active. Fix it by moving call to mlx5_lag_fib_set() into conditional
that verifies LAG state.
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.
Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.
[0]:
[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138
[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>
[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3
[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected
[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The arguments of update_buffer_lossy() is in a wrong order. Fix it.
Fixes: 88b3d5c90e96 ("net/mlx5e: Fix port buffers cell size value")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, match VLAN rule also matches packets that have multiple VLAN
headers. This behavior is similar to buggy flower classifier behavior that
has recently been fixed. Fix the issue by matching on
outer_second_cvlan_tag with value 0 which will cause the HW to verify the
packet doesn't contain second vlan header.
Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Resource dump menu may span over more than a single page, support it.
Otherwise, menu read may result in a memory access violation: reading
outside of the allocated page.
Note that page format of the first menu page contains menu headers while
the proceeding menu pages contain only records.
The KASAN logs are as follows:
BUG: KASAN: slab-out-of-bounds in strcmp+0x9b/0xb0
Read of size 1 at addr ffff88812b2e1fd0 by task systemd-udevd/496
CPU: 5 PID: 496 Comm: systemd-udevd Tainted: G B 5.16.0_for_upstream_debug_2022_01_10_23_12 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0+0x1f/0x140
? strcmp+0x9b/0xb0
? strcmp+0x9b/0xb0
kasan_report.cold+0x83/0xdf
? strcmp+0x9b/0xb0
strcmp+0x9b/0xb0
mlx5_rsc_dump_init+0x4ab/0x780 [mlx5_core]
? mlx5_rsc_dump_destroy+0x80/0x80 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x286/0x400
? raw_spin_unlock_irqrestore+0x47/0x50
? aomic_notifier_chain_register+0x32/0x40
mlx5_load+0x104/0x2e0 [mlx5_core]
mlx5_init_one+0x41b/0x610 [mlx5_core]
....
The buggy address belongs to the object at ffff88812b2e0000
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 4048 bytes to the right of
4096-byte region [ffff88812b2e0000, ffff88812b2e1000)
The buggy address belongs to the page:
page:000000009d69807a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812b2e6000 pfn:0x12b2e0
head:000000009d69807a order:3 compound_mapcount:0 compound_pincount:0
flags: 0x8000000000010200(slab|head|zone=2)
raw: 8000000000010200 0000000000000000 dead000000000001 ffff888100043040
raw: ffff88812b2e6000 0000000080040000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88812b2e1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88812b2e1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88812b2e1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88812b2e2000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88812b2e2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes: 12206b17235a ("net/mlx5: Add support for resource dump")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When OVS internal port is the vtep device, the first decap
rule is matching on the internal port's vport metadata value
and then changes the metadata to be the uplink's value.
Therefore, following rules on the tunnel, in chain > 0, should
avoid matching on internal port metadata and use the uplink
vport metadata instead.
Select the uplink's metadata value for the source vport match
in case the rule is in chain greater than zero, even if the tunnel
route device is internal port.
Fixes: 166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Michael Chan says:
====================
bnxt_en: Bug fixes
This patch series includes 3 fixes:
- Fix an occasional VF open failure.
- Fix a PTP spinlock usage before initialization
- Fix unnecesary RX packet drops under high TX traffic load.
====================
Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In bnxt_poll_p5(), we first check cpr->has_more_work. If it is true,
we are in NAPI polling mode and we will call __bnxt_poll_cqs() to
continue polling. It is possible to exhanust the budget again when
__bnxt_poll_cqs() returns.
We then enter the main while loop to check for new entries in the NQ.
If we had previously exhausted the NAPI budget, we may call
__bnxt_poll_work() to process an RX entry with zero budget. This will
cause packets to be dropped unnecessarily, thinking that we are in the
netpoll path. Fix it by breaking out of the while loop if we need
to process an RX NQ entry with no budget left. We will then exit
NAPI and stay in polling mode.
Fixes: 389a877a3b20 ("bnxt_en: Process the NQ under NAPI continuous polling.")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock
spinlock. The spinlock is not initialized until later. Move the
bnxt_ptp_init_rtc() call after the spinlock is initialized.
Fixes: 24ac1ecd5240 ("bnxt_en: Add driver support to use Real Time Counter for PTP")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_open() can fail in this code path, especially on a VF when
it fails to reserve default rings:
bnxt_open()
__bnxt_open_nic()
bnxt_clear_int_mode()
bnxt_init_dflt_ring_mode()
RX rings would be set to 0 when we hit this error path.
It is possible for a subsequent bnxt_open() call to potentially succeed
with a code path like this:
bnxt_open()
bnxt_hwrm_if_change()
bnxt_fw_init_one()
bnxt_fw_init_one_p3()
bnxt_set_dflt_rfs()
bnxt_rfs_capable()
bnxt_hwrm_reserve_rings()
On older chips, RFS is capable if we can reserve the number of vnics that
is equal to RX rings + 1. But since RX rings is still set to 0 in this
code path, we may mistakenly think that RFS is supported for 0 RX rings.
Later, when the default RX rings are reserved and we try to enable
RFS, it would fail and cause bnxt_open() to fail unnecessarily.
We fix this in 2 places. bnxt_rfs_capable() will always return false if
RX rings is not yet set. bnxt_init_dflt_ring_mode() will call
bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not
supported.
Fixes: 20d7d1c5c9b1 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The AlphaProject AP-SH4A-3A/AP-SH4AD-0A SH boards use IRQ0 for their SMSC
LAN911x Ethernet chip, so the networking on them must have been broken by
commit 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure")
which filtered out 0 as well as the negative error codes -- it was kinda
correct at the time, as platform_get_irq() could return 0 on of_irq_get()
failure and on the actual 0 in an IRQ resource. This issue was fixed by
me (back in 2016!), so we should be able to fix this driver to allow IRQ0
usage again...
When merging this to the stable kernels, make sure you also merge commit
e330b9a6bb35 ("platform: don't return 0 from platform_get_irq[_byname]()
on error") -- that's my fix to platform_get_irq() for the DT platforms...
Fixes: 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/656036e4-6387-38df-b8a7-6ba683b16e63@omp.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As noted elsewhere, various GPON SFP modules exhibit non-standard
TX-fault behaviour. In the tested case, the Huawei MA5671A, when used
in combination with a Marvell mv88e6085 switch, was found to
persistently assert TX-fault, resulting in the module being disabled.
This patch adds a quirk to ignore the SFP_F_TX_FAULT state, allowing the
module to function.
Change from v1: removal of erroneous return statment (Andrew Lunn)
Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220502223315.1973376-1-mnhagan88@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp selftest fix from Kees Cook:
- Avoid using stdin for read syscall testing (Jann Horn)
* tag 'seccomp-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: Don't call read() on TTY from background pgrp
|
|
Add the psuedo-firmware registers KVM_REG_ARM_STD_BMAP,
KVM_REG_ARM_STD_HYP_BMAP, and KVM_REG_ARM_VENDOR_HYP_BMAP to
the base_regs[] list.
Also, add the COPROC support for KVM_REG_ARM_FW_FEAT_BMAP.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-10-rananta@google.com
|
|
Introduce a KVM selftest to check the hypercall interface
for arm64 platforms. The test validates the user-space'
[GET|SET]_ONE_REG interface to read/write the psuedo-firmware
registers as well as its effects on the guest upon certain
configurations.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-9-rananta@google.com
|
|
The PSCI and PV stolen time tests both need to make SMCCC calls within
the guest. Create a helper for making SMCCC calls and rework the
existing tests to use the library function.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220409184549.1681189-11-oupton@google.com
|
|
There are other interactions with PSCI worth testing; rename the PSCI
test to make it more generic.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220409184549.1681189-10-oupton@google.com
|
|
Import the standard SMCCC definitions from include/linux/arm-smccc.h.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-8-rananta@google.com
|
|
Add the documentation for the bitmap firmware registers in
hypercalls.rst and api.rst. This includes the details for
KVM_REG_ARM_STD_BMAP, KVM_REG_ARM_STD_HYP_BMAP, and
KVM_REG_ARM_VENDOR_HYP_BMAP registers.
Since the document is growing to carry other hypercall related
information, make necessary adjustments to present the document
in a generic sense, rather than being PSCI focused.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: small scale reformat, move things about, random typo fixes]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-7-rananta@google.com
|
|
Since the doc also covers general hypercalls' details,
rather than just PSCI, and the fact that the bitmap firmware
registers' details will be added to this doc, rename the file
to a more appropriate name- hypercalls.rst.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-6-rananta@google.com
|
|
Introduce the firmware register to hold the vendor specific
hypervisor service calls (owner value 6) as a bitmap. The
bitmap represents the features that'll be enabled for the
guest, as configured by the user-space. Currently, this
includes support for KVM-vendor features along with
reading the UID, represented by bit-0, and Precision Time
Protocol (PTP), represented by bit-1.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-5-rananta@google.com
|
|
Introduce the firmware register to hold the standard hypervisor
service calls (owner value 5) as a bitmap. The bitmap represents
the features that'll be enabled for the guest, as configured by
the user-space. Currently, this includes support only for
Paravirtualized time, represented by bit-0.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-4-rananta@google.com
|
|
KVM regularly introduces new hypercall services to the guests without
any consent from the userspace. This means, the guests can observe
hypercall services in and out as they migrate across various host
kernel versions. This could be a major problem if the guest
discovered a hypercall, started using it, and after getting migrated
to an older kernel realizes that it's no longer available. Depending
on how the guest handles the change, there's a potential chance that
the guest would just panic.
As a result, there's a need for the userspace to elect the services
that it wishes the guest to discover. It can elect these services
based on the kernels spread across its (migration) fleet. To remedy
this, extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
for all the hypercall services available.
These firmware registers are categorized based on the service call
owners, but unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.
During the VM initialization, the registers are set to upper-limit of
the features supported by the corresponding registers. It's expected
that the VMMs discover the features provided by each register via
GET_ONE_REG, and write back the desired values using SET_ONE_REG.
KVM allows this modification only until the VM has started.
Some of the standard features are not mapped to any bits of the
registers. But since they can recreate the original problem of
making it available without userspace's consent, they need to
be explicitly added to the case-list in
kvm_hvc_call_default_allowed(). Any function-id that's not enabled
via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
Older userspace code can simply ignore the feature and the
hypercall services will be exposed unconditionally to the guests,
thus ensuring backward compatibility.
In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for ARM True Random Number Generator (TRNG) service, with bit-0 of the
register representing mandatory features of v1.0. Other services are
momentarily added in the upcoming patches.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: reduced the scope of some helpers, tidy-up bitmap max values,
dropped error-only fast path]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
|
|
Ignore compatible strings for the IPA virt drivers that were removed in
commits 2fb251c26560 ("interconnect: qcom: sdx55: Drop IP0
interconnects") and 2f3724930eb4 ("interconnect: qcom: sc7180: Drop IP0
interconnects") so that the sync state logic can kick in again.
Otherwise all the interconnects in the system will stay pegged at max
speeds because 'providers_count' is always going to be one larger than
the number of drivers that will ever probe on sc7180 or sdx55. This
fixes suspend on sc7180 and sdx55 devices when you don't have a
devicetree patch to remove the ipa-virt compatible node.
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Alex Elder <elder@linaro.org>
Cc: Taniya Das <quic_tdas@quicinc.com>
Cc: Mike Tipton <quic_mdtipton@quicinc.com>
Fixes: 2fb251c26560 ("interconnect: qcom: sdx55: Drop IP0 interconnects")
Fixes: 2f3724930eb4 ("interconnect: qcom: sc7180: Drop IP0 interconnects")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20220427013226.341209-1-swboyd@chromium.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
|