Age | Commit message (Collapse) | Author |
|
Add a missing kernel-doc entry for the offload_flags, and
correct the name of the update_vif_offload method.
Link: https://lore.kernel.org/r/20200918132115.d46a0915ba8a.Ibba536d04a5a5fb655f8ef6e51b247457bfda4ca@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add missing kernel-doc for the S1G band capabilities in the
per band data.
Link: https://lore.kernel.org/r/20200918131921.08c893cd73a1.Id71583c37baca8a9a3329426e02b66d9ab65ac03@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This patch adds mac80211 support to configure unsolicited
broadcast probe response transmission for in-band discovery in 6GHz.
Changes include functions to store and retrieve probe response template,
and packet interval (0 - 20 TUs).
Setting interval to 0 disables the unsolicited broadcast probe response
transmission.
Signed-off-by: Aloka Dixit <alokad@codeaurora.org>
Link: https://lore.kernel.org/r/010101747a946b35-ad25858a-1f1f-48df-909e-dc7bf26d9169-000000@us-west-2.amazonses.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This patch adds new attributes to support unsolicited broadcast
probe response transmission used for in-band
discovery in 6GHz band (IEEE P802.11ax/D6.0 26.17.2.3.2, AP behavior for
fast passive scanning).
The new attribute, NL80211_ATTR_UNSOL_BCAST_PROBE_RESP, is nested which
supports following parameters:
(1) NL80211_UNSOL_BCAST_PROBE_RESP_ATTR_INT - Packet interval
(2) NL80211_UNSOL_BCAST_PROBE_RESP_ATTR_TMPL - Template data
Signed-off-by: Aloka Dixit <alokad@codeaurora.org>
Link: https://lore.kernel.org/r/010101747a946698-aac263ae-2ed3-4dab-9590-0bc7131214e1-000000@us-west-2.amazonses.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This patch adds mac80211 support to configure FILS discovery
transmission.
Changes include functions to store and retrieve FILS discovery
template, minimum and maximum packet intervals.
Signed-off-by: Aloka Dixit <alokad@codeaurora.org>
Link: https://lore.kernel.org/r/20200805011838.28166-3-alokad@codeaurora.org
[remove SUPPORTS_FILS_DISCOVERY, driver can just set wiphy info]
Link: https://lore.kernel.org/r/010101747a7b3cbb-6edaa89c-436d-4391-8765-61456d7f5f4e-000000@us-west-2.amazonses.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
FILS discovery attribute, NL80211_ATTR_FILS_DISCOVERY, is nested which
supports following parameters as given in IEEE Std 802.11ai-2016,
Annex C.3 MIB detail:
(1) NL80211_FILS_DISCOVERY_ATTR_INT_MIN - Minimum packet interval
(2) NL80211_FILS_DISCOVERY_ATTR_INT_MAX - Maximum packet interval
(3) NL80211_FILS_DISCOVERY_ATTR_TMPL - Template data
Signed-off-by: Aloka Dixit <alokad@codeaurora.org>
Link: https://lore.kernel.org/r/20200805011838.28166-2-alokad@codeaurora.org
[fix attribute and other names, use NLA_RANGE(), use policy only once]
Link: https://lore.kernel.org/r/010101747a7b38a8-306f06b2-9061-4baf-81c1-054a42a18e22-000000@us-west-2.amazonses.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When trying to associate to an AP support 180 or 80+80 MHz on 6 GHz with a
STA that only has 80 Mhz support the cf2 field inside the chandef will get
set causing the association to fail when trying to validate the chandef.
Fix this by checking the support flags prior to setting cf2.
Fixes: 57fa5e85d53ce ("mac80211: determine chandef from HE 6 GHz operation")
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20200918115304.1135693-1-john@phrozen.org
[reword commit message a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Some APs (e.g. Asus RT-AC88U) have been observed to report an HT MSDU size
limit of 3839 and a VHT limit of 7991. These APs can handle bigger frames
than 3839 bytes just fine, so we should remove the VHT limit based on the
HT capabilities. This improves tx throughput.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200916164611.8022-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
S1G channels have a single width defined per frequency, so
derive it from the channel flags with
ieee80211_s1g_channel_width().
Also support setting an S1G channel where control frequency may
differ from operating, and add some basic validation to
ensure the control channel is with the operating.
Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200908190323.15814-6-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The S1G beacon has a different header size than regular
beacons, so adjust the beacon head validator.
Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200908190323.15814-5-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
S1G channels have a minimum bandwidth of 1Mhz, and there
is a 1:1 mapping of allowed bandwidth to channel number.
Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200908190323.15814-4-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
S1G supports 5 channel widths: 1, 2, 4, 8, and 16. One
channel width is allowed per frequency in each operating
class, so it makes more sense to advertise the specific
channel width allowed.
Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200908190323.15814-3-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The S1G capability fields were defined by ORing BITS()
together, and expecting a custom macro to use the _SHIFT
definitions. Use the Linux kernel GENMASK for the
definitions now, and FIELD_{GET,PREP} to access the fields
in the future.
Take the chance to rename eg. S1G_CAPAB_B0 to the more
compact S1G_CAP0.
Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200908190323.15814-2-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Remove the newly added ieee80211_set_vif_encap_ops declaration.
No further code changes.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-15-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Store processed skbs ready to be freed in a list so the driver bulk free them
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-13-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
For encap-offloaded packets, ieee80211_free_txskb cannot be used, since it
does not have the vif pointer.
Using ieee80211_tx_status_ext for this purpose has the advantage of being able
avoid an extra station lookup for AQL
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-12-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Make ieee80211_tx_status_8023 call ieee80211_tx_status_ext, similar to
ieee80211_tx_status.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-11-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Calling mod_timer for every rx/tx packet can be quite expensive.
Instead of constantly updating the timer, we can simply let it run out
and check the timestamp of the last ACK or rx packet to re-arm it.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-9-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is needed for encapsulation offload of 4-address mode packets
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-14-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In order to unify the tx status path, the hw 802.11 encapsulation flag
needs to survive the trip to the tx status call.
Since we don't have any free bits in info->flags, we need to move one.
IEEE80211_TX_INTFL_NEED_TXPROCESSING is only used internally in mac80211,
and only before the call into the driver.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-10-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
All drivers using airtime fairness are calling ieee80211_sta_register_airtime
directly, now they must. Document this as well.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-8-nbd@nbd.name
[johannes: update the documentation to suit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Move redundant functionality from __ieee80211_tx_status into
ieee80211_tx_status_ext. Preparation for unifying with the 802.3 tx status
codepath.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-7-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The current API (which lets the driver turn on/off per vif directly) has a
number of limitations:
- it does not deal with AP_VLAN
- conditions for enabling (no tkip, no monitor) are only checked at
add_interface time
- no way to indicate 4-addr support
In order to address this, store offload flags in struct ieee80211_vif
(easy to extend for decap offload later). mac80211 initially sets the enable
flag, but gives the driver a chance to modify it before its settings are
applied. In addition to the .add_interface op, a .update_vif_offload op is
introduced, which can be used for runtime changes.
If a driver can't disable encap offload at runtime, or if it has some extra
limitations, it can simply override the flags within those ops.
Support for encap offload with 4-address mode interfaces can be enabled
by setting a flag from .add_interface or .update_vif_offload.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-6-nbd@nbd.name
[resolved conflict with commit aa2092a9bab3 ("ath11k: add raw mode and
software crypto support")]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is needed for drivers that don't do the key lookup themselves
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-5-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This simplifies the checks in the encap offload tx handler and allows using
it in cases where software crypto is used for multicast packets, e.g. when
using an AP_VLAN.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-4-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Update the last_tx timestamp to avoid tearing down the aggregation session
early. Fall back to the slow path if the session setup is still running
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Fixes AQL for encap-offloaded tx
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200908123702.88454-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The value of struct bss_parameters::ap_isolate will be -1, 0 or 1.
The value -1 means not to change. To prevent developers from thinking
ap_isolate is only 0 or 1, I add more comments on it.
Signed-off-by: Wright Feng <wright.feng@cypress.com>
Reviewed-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200908060157.98846-1-wright.feng@cypress.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There is no caller in tree any more.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20200829115506.17828-1-yuehaibing@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Limit maximum VHT MPDU size by local capability.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200917125031.45009-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
We shouldn't accept any channels bigger than 233, fix that.
Reported-by: Amar <asinghal@codeaurora.org>
Fixes: d1a1646c0de7 ("cfg80211: adapt to new channelization of the 6GHz band")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200917115222.312ba6f1d461.I3a8c8fbcc3cc019814fd9cd0aced7eb591626136@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
VHT is not supported on 2.4 GHz, but HE is; don't disable HE if HT
is missing there, do that only on 5 GHz (6 GHz is only HE).
Fixes: 57fa5e85d53ce51 ("mac80211: determine chandef from HE 6 GHz operation")
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Link: https://lore.kernel.org/r/010101747cb617f2-593c5410-1648-4a42-97a0-f3646a5a6dd1-000000@us-west-2.amazonses.com
[rewrite the commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Radiotap header field 'Channel flags' has '2 GHz spectrum' set to
'true' for 6GHz packet.
Change it to 5GHz as there isn't a separate option available for 6GHz.
Signed-off-by: Aloka Dixit <alokad@codeaurora.org>
Link: https://lore.kernel.org/r/010101747ab7b703-1d7c9851-1594-43bf-81f7-f79ce7a67cc6-000000@us-west-2.amazonses.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When LIB80211_CRYPT_CCMP is enabled and CRYPTO is disabled, it results in unmet
direct dependencies config warning. The reason is that LIB80211_CRYPT_CCMP
selects CRYPTO_AES and CRYPTO_CCM, which are subordinate to CRYPTO. This is
reproducible with CRYPTO disabled and R8188EU enabled, where R8188EU selects
LIB80211_CRYPT_CCMP but does not select or depend on CRYPTO.
Honor the kconfig menu hierarchy to remove kconfig dependency warnings.
Fixes: a11e2f85481c ("lib80211: use crypto API ccm(aes) transform for CCMP processing")
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Link: https://lore.kernel.org/r/20200909095452.3080-1-fazilyildiran@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When converting from struct ieee80211_tx_rate to ieee80211_rx_status,
there was one check missing to fill in the bandwidth for 160 MHz
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20200915085945.3782-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The unit of the return value of ieee80211_get_rate_duration is nanoseconds, not
microseconds. Adjust the duration checks to account for that.
For higher data rates, allow larger estimated aggregation sizes, and add some
values for HE as well, which can use much larger aggregates.
Since small packets with high data rates can now lead to duration values too
small for info->tx_time_est, return a minimum of 4us.
Fixes: f01cfbaf9b29 ("mac80211: improve AQL aggregation estimation for low data rates")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20200915085945.3782-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating
128-bit IRTE") removed an assumption that modify_irte_ga always set
the valid bit, which requires the callers to set the appropriate value
for the struct irte_ga.valid bit before calling the function.
Similar to the commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn
bit after programming IRTE"), which is for the function
amd_iommu_deactivate_guest_mode().
The same change is also needed for the amd_iommu_activate_guest_mode().
Otherwise, this could trigger IO_PAGE_FAULT for the VFIO based VMs with
AVIC enabled.
Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20200916111720.43913-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
After commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after
programming IRTE"), smatch warns:
drivers/iommu/amd/iommu.c:3870 amd_iommu_deactivate_guest_mode()
warn: variable dereferenced before check 'entry' (see line 3867)
Fix this by moving the @valid assignment to after @entry has been checked
for NULL.
Fixes: 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20200910171621.12879-1-joao.m.martins@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
There have been some reports of "bad bp value" warnings printed by the
frame pointer unwinder:
WARNING: kernel stack regs at 000000005bac7112 in sh:1014 has bad 'bp' value 0000000000000000
This warning happens when unwinding from an interrupt in
ret_from_fork(). If entry code gets interrupted, the state of the
frame pointer (rbp) may be undefined, which can confuse the unwinder,
resulting in warnings like the above.
There's an in_entry_code() check which normally silences such
warnings for entry code. But in this case, ret_from_fork() is getting
interrupted. It recently got moved out of .entry.text, so the
in_entry_code() check no longer works.
It could be moved back into .entry.text, but that would break the
noinstr validation because of the call to schedule_tail().
Instead, initialize each new task's RBP to point to the task's entry
regs via an encoded frame pointer. That will allow the unwinder to
reach the end of the stack gracefully.
Fixes: b9f6976bfb94 ("x86/entry/64: Move non entry code into .text section")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/f366bbf5a8d02e2318ee312f738112d0af74d16f.1600103007.git.jpoimboe@redhat.com
|
|
Add four tests to tailcalls selftest explicitly named
"tailcall_bpf2bpf_X" as their purpose is to validate that combination
of tailcalls with bpf2bpf calls are working properly.
These tests also validate LD_ABS from subprograms.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
LD_[ABS|IND] instructions may return from the function early. bpf_tail_call
pseudo instruction is either fallthrough or return. Allow them in the
subprograms only when subprograms are BTF annotated and have scalar return
types. Allow ld_abs and tail_call in the main program even if it calls into
subprograms. In the past that was not ok to do for ld_abs, since it was JITed
with special exit sequence. Since bpf_gen_ld_abs() was introduced the ld_abs
looks like normal exit insn from JIT point of view, so it's safe to allow them
in the main program.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Relax verifier's restriction that was meant to forbid tailcall usage
when subprog count was higher than 1.
Also, do not max out the stack depth of program that utilizes tailcalls.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This commit serves two things:
1) it optimizes BPF prologue/epilogue generation
2) it makes possible to have tailcalls within BPF subprogram
Both points are related to each other since without 1), 2) could not be
achieved.
In [1], Alexei says:
"The prologue will look like:
nop5
xor eax,eax // two new bytes if bpf_tail_call() is used in this
// function
push rbp
mov rbp, rsp
sub rsp, rounded_stack_depth
push rax // zero init tail_call counter
variable number of push rbx,r13,r14,r15
Then bpf_tail_call will pop variable number rbx,..
and final 'pop rax'
Then 'add rsp, size_of_current_stack_frame'
jmp to next function and skip over 'nop5; xor eax,eax; push rpb; mov
rbp, rsp'
This way new function will set its own stack size and will init tail
call
counter with whatever value the parent had.
If next function doesn't use bpf_tail_call it won't have 'xor eax,eax'.
Instead it would need to have 'nop2' in there."
Implement that suggestion.
Since the layout of stack is changed, tail call counter handling can not
rely anymore on popping it to rbx just like it have been handled for
constant prologue case and later overwrite of rbx with actual value of
rbx pushed to stack. Therefore, let's use one of the register (%rcx) that
is considered to be volatile/caller-saved and pop the value of tail call
counter in there in the epilogue.
Drop the BUILD_BUG_ON in emit_prologue and in
emit_bpf_tail_call_indirect where instruction layout is not constant
anymore.
Introduce new poke target, 'tailcall_bypass' to poke descriptor that is
dedicated for skipping the register pops and stack unwind that are
generated right before the actual jump to target program.
For case when the target program is not present, BPF program will skip
the pop instructions and nop5 dedicated for jmpq $target. An example of
such state when only R6 of callee saved registers is used by program:
ffffffffc0513aa1: e9 0e 00 00 00 jmpq 0xffffffffc0513ab4
ffffffffc0513aa6: 5b pop %rbx
ffffffffc0513aa7: 58 pop %rax
ffffffffc0513aa8: 48 81 c4 00 00 00 00 add $0x0,%rsp
ffffffffc0513aaf: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffffc0513ab4: 48 89 df mov %rbx,%rdi
When target program is inserted, the jump that was there to skip
pops/nop5 will become the nop5, so CPU will go over pops and do the
actual tailcall.
One might ask why there simply can not be pushes after the nop5?
In the following example snippet:
ffffffffc037030c: 48 89 fb mov %rdi,%rbx
(...)
ffffffffc0370332: 5b pop %rbx
ffffffffc0370333: 58 pop %rax
ffffffffc0370334: 48 81 c4 00 00 00 00 add $0x0,%rsp
ffffffffc037033b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffffc0370340: 48 81 ec 00 00 00 00 sub $0x0,%rsp
ffffffffc0370347: 50 push %rax
ffffffffc0370348: 53 push %rbx
ffffffffc0370349: 48 89 df mov %rbx,%rdi
ffffffffc037034c: e8 f7 21 00 00 callq 0xffffffffc0372548
There is the bpf2bpf call (at ffffffffc037034c) right after the tailcall
and jump target is not present. ctx is in %rbx register and BPF
subprogram that we will call into on ffffffffc037034c is relying on it,
e.g. it will pick ctx from there. Such code layout is therefore broken
as we would overwrite the content of %rbx with the value that was pushed
on the prologue. That is the reason for the 'bypass' approach.
Special care needs to be taken during the install/update/remove of
tailcall target. In case when target program is not present, the CPU
must not execute the pop instructions that precede the tailcall.
To address that, the following states can be defined:
A nop, unwind, nop
B nop, unwind, tail
C skip, unwind, nop
D skip, unwind, tail
A is forbidden (lead to incorrectness). The state transitions between
tailcall install/update/remove will work as follows:
First install tail call f: C->D->B(f)
* poke the tailcall, after that get rid of the skip
Update tail call f to f': B(f)->B(f')
* poke the tailcall (poke->tailcall_target) and do NOT touch the
poke->tailcall_bypass
Remove tail call: B(f')->C(f')
* poke->tailcall_bypass is poked back to jump, then we wait the RCU
grace period so that other programs will finish its execution and
after that we are safe to remove the poke->tailcall_target
Install new tail call (f''): C(f')->D(f'')->B(f'').
* same as first step
This way CPU can never be exposed to "unwind, tail" state.
Last but not least, when tailcalls get mixed with bpf2bpf calls, it
would be possible to encounter the endless loop due to clearing the
tailcall counter if for example we would use the tailcall3-like from BPF
selftests program that would be subprogram-based, meaning the tailcall
would be present within the BPF subprogram.
This test, broken down to particular steps, would do:
entry -> set tailcall counter to 0, bump it by 1, tailcall to func0
func0 -> call subprog_tail
(we are NOT skipping the first 11 bytes of prologue and this subprogram
has a tailcall, therefore we clear the counter...)
subprog -> do the same thing as entry
and then loop forever.
To address this, the idea is to go through the call chain of bpf2bpf progs
and look for a tailcall presence throughout whole chain. If we saw a single
tail call then each node in this call chain needs to be marked as a subprog
that can reach the tailcall. We would later feed the JIT with this info
and:
- set eax to 0 only when tailcall is reachable and this is the entry prog
- if tailcall is reachable but there's no tailcall in insns of currently
JITed prog then push rax anyway, so that it will be possible to
propagate further down the call chain
- finally if tailcall is reachable, then we need to precede the 'call'
insn with mov rax, [rbp - (stack_depth + 8)]
Tail call related cases from test_verifier kselftest are also working
fine. Sample BPF programs that utilize tail calls (sockex3, tracex5)
work properly as well.
[1]: https://lore.kernel.org/bpf/20200517043227.2gpq22ifoq37ogst@ast-mbp.dhcp.thefacebook.com/
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Protect against potential stack overflow that might happen when bpf2bpf
calls get combined with tailcalls. Limit the caller's stack depth for
such case down to 256 so that the worst case scenario would result in 8k
stack size (32 which is tailcall limit * 256 = 8k).
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
Pull percpu fix from Dennis Zhou:
"This is a fix for the first chunk size calculation where the variable
length array incorrectly used the number of longs instead of bytes of
longs.
This came in as a code fix and not a bug report, so I don't think it
was widely problematic. I believe it worked out due to it being
memblock memory and alignment requirements working in our favor"
* 'for-5.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: fix first chunk size calculation for populated bitmap
|
|
Christoph reported an infinite loop in the subflow receive path
under stress condition.
If there are multiple subflows, each of them using a large send
buffer, the delta between the sequence number used by
MPTCP-level retransmission can and the current msk->ack_seq
can be greater than MAX_INT.
In the above scenario, when calling mptcp_subflow_discard_data(),
such delta will be truncated to int, and could result in a negative
number: no bytes will be dropped, and subflow_check_data_avail()
will try again to process the same packet, looping forever.
This change addresses the issue by expanding the 'limit' size to 64
bits, so that overflows are not possible anymore.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/87
Fixes: 6719331c2f73 ("mptcp: trigger msk processing even for OoO data")
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If smc_listen_rmda_finish() returns with an error, the storage
addressed by 'buf' is freed a second time.
Consolidate freeing under a common label and jump to that label.
Fixes: 6bb14e48ee8d ("net/smc: dynamic allocation of CLC proposal buffer")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
>> ld.lld: error: undefined symbol: net_dim_get_rx_moderation
>>> referenced by ionic_lif.c:52 (drivers/net/ethernet/pensando/ionic/ionic_lif.c:52)
>>> net/ethernet/pensando/ionic/ionic_lif.o:(ionic_dim_work) in archive drivers/built-in.a
>> ld.lld: error: undefined symbol: net_dim
>>> referenced by ionic_txrx.c:456 (drivers/net/ethernet/pensando/ionic/ionic_txrx.c:456)
>>> net/ethernet/pensando/ionic/ionic_txrx.o:(ionic_dim_update) in archive drivers/built-in.a
v2: removed sketchy dashes in commit message
Fixes: 04a834592bf5 ("ionic: dynamic interrupt moderation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
struct ethtool_fecparam carries bitmasks not bit numbers.
We want to return 1 (NONE), not 0.
Fixes: 0d0870938337 ("nfp: implement ethtool FEC mode settings")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We removed the misleading comments from struct rtnl_link_stats64
when we added proper kdoc. struct rtnl_link_stats has the same
inline comments, so remove them, too.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|