git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-09-18	mac80211: fix some encapsulation offload kernel-doc	Johannes Berg
	Add a missing kernel-doc entry for the offload_flags, and correct the name of the update_vif_offload method. Link: https://lore.kernel.org/r/20200918132115.d46a0915ba8a.Ibba536d04a5a5fb655f8ef6e51b247457bfda4ca@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	cfg80211: add missing kernel-doc for S1G band capabilities	Johannes Berg
	Add missing kernel-doc for the S1G band capabilities in the per band data. Link: https://lore.kernel.org/r/20200918131921.08c893cd73a1.Id71583c37baca8a9a3329426e02b66d9ab65ac03@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: Unsolicited broadcast probe response support	Aloka Dixit
	This patch adds mac80211 support to configure unsolicited broadcast probe response transmission for in-band discovery in 6GHz. Changes include functions to store and retrieve probe response template, and packet interval (0 - 20 TUs). Setting interval to 0 disables the unsolicited broadcast probe response transmission. Signed-off-by: Aloka Dixit <alokad@codeaurora.org> Link: https://lore.kernel.org/r/010101747a946b35-ad25858a-1f1f-48df-909e-dc7bf26d9169-000000@us-west-2.amazonses.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	nl80211: Unsolicited broadcast probe response support	Aloka Dixit
	This patch adds new attributes to support unsolicited broadcast probe response transmission used for in-band discovery in 6GHz band (IEEE P802.11ax/D6.0 26.17.2.3.2, AP behavior for fast passive scanning). The new attribute, NL80211_ATTR_UNSOL_BCAST_PROBE_RESP, is nested which supports following parameters: (1) NL80211_UNSOL_BCAST_PROBE_RESP_ATTR_INT - Packet interval (2) NL80211_UNSOL_BCAST_PROBE_RESP_ATTR_TMPL - Template data Signed-off-by: Aloka Dixit <alokad@codeaurora.org> Link: https://lore.kernel.org/r/010101747a946698-aac263ae-2ed3-4dab-9590-0bc7131214e1-000000@us-west-2.amazonses.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: Add FILS discovery support	Aloka Dixit
	This patch adds mac80211 support to configure FILS discovery transmission. Changes include functions to store and retrieve FILS discovery template, minimum and maximum packet intervals. Signed-off-by: Aloka Dixit <alokad@codeaurora.org> Link: https://lore.kernel.org/r/20200805011838.28166-3-alokad@codeaurora.org [remove SUPPORTS_FILS_DISCOVERY, driver can just set wiphy info] Link: https://lore.kernel.org/r/010101747a7b3cbb-6edaa89c-436d-4391-8765-61456d7f5f4e-000000@us-west-2.amazonses.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	nl80211: Add FILS discovery support	Aloka Dixit
	FILS discovery attribute, NL80211_ATTR_FILS_DISCOVERY, is nested which supports following parameters as given in IEEE Std 802.11ai-2016, Annex C.3 MIB detail: (1) NL80211_FILS_DISCOVERY_ATTR_INT_MIN - Minimum packet interval (2) NL80211_FILS_DISCOVERY_ATTR_INT_MAX - Maximum packet interval (3) NL80211_FILS_DISCOVERY_ATTR_TMPL - Template data Signed-off-by: Aloka Dixit <alokad@codeaurora.org> Link: https://lore.kernel.org/r/20200805011838.28166-2-alokad@codeaurora.org [fix attribute and other names, use NLA_RANGE(), use policy only once] Link: https://lore.kernel.org/r/010101747a7b38a8-306f06b2-9061-4baf-81c1-054a42a18e22-000000@us-west-2.amazonses.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: fix 80 MHz association to 160/80+80 AP on 6 GHz	John Crispin
	When trying to associate to an AP support 180 or 80+80 MHz on 6 GHz with a STA that only has 80 Mhz support the cf2 field inside the chandef will get set causing the association to fail when trying to validate the chandef. Fix this by checking the support flags prior to setting cf2. Fixes: 57fa5e85d53ce ("mac80211: determine chandef from HE 6 GHz operation") Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200918115304.1135693-1-john@phrozen.org [reword commit message a bit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: allow bigger A-MSDU sizes in VHT, even if HT is limited	Felix Fietkau
	Some APs (e.g. Asus RT-AC88U) have been observed to report an HT MSDU size limit of 3839 and a VHT limit of 7991. These APs can handle bigger frames than 3839 bytes just fine, so we should remove the VHT limit based on the HT capabilities. This improves tx throughput. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200916164611.8022-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	nl80211: support setting S1G channels	Thomas Pedersen
	S1G channels have a single width defined per frequency, so derive it from the channel flags with ieee80211_s1g_channel_width(). Also support setting an S1G channel where control frequency may differ from operating, and add some basic validation to ensure the control channel is with the operating. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200908190323.15814-6-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	nl80211: correctly validate S1G beacon head	Thomas Pedersen
	The S1G beacon has a different header size than regular beacons, so adjust the beacon head validator. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200908190323.15814-5-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	cfg80211: regulatory: handle S1G channels	Thomas Pedersen
	S1G channels have a minimum bandwidth of 1Mhz, and there is a 1:1 mapping of allowed bandwidth to channel number. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200908190323.15814-4-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	nl80211: advertise supported channel width in S1G	Thomas Pedersen
	S1G supports 5 channel widths: 1, 2, 4, 8, and 16. One channel width is allowed per frequency in each operating class, so it makes more sense to advertise the specific channel width allowed. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200908190323.15814-3-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	ieee80211: redefine S1G bits with GENMASK	Thomas Pedersen
	The S1G capability fields were defined by ORing BITS() together, and expecting a custom macro to use the _SHIFT definitions. Use the Linux kernel GENMASK for the definitions now, and FIELD_{GET,PREP} to access the fields in the future. Take the chance to rename eg. S1G_CAPAB_B0 to the more compact S1G_CAP0. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200908190323.15814-2-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: reorganize code to remove a forward declaration	Felix Fietkau
	Remove the newly added ieee80211_set_vif_encap_ops declaration. No further code changes. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-15-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: extend ieee80211_tx_status_ext to support bulk free	Felix Fietkau
	Store processed skbs ready to be freed in a list so the driver bulk free them Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-13-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: support using ieee80211_tx_status_ext to free skbs without status info	Felix Fietkau
	For encap-offloaded packets, ieee80211_free_txskb cannot be used, since it does not have the vif pointer. Using ieee80211_tx_status_ext for this purpose has the advantage of being able avoid an extra station lookup for AQL Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-12-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: unify 802.3 (offload) and 802.11 tx status codepath	Felix Fietkau
	Make ieee80211_tx_status_8023 call ieee80211_tx_status_ext, similar to ieee80211_tx_status. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-11-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: optimize station connection monitor	Felix Fietkau
	Calling mod_timer for every rx/tx packet can be quite expensive. Instead of constantly updating the timer, we can simply let it run out and check the timestamp of the last ACK or rx packet to re-arm it. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-9-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: notify the driver when a sta uses 4-address mode	Felix Fietkau
	This is needed for encapsulation offload of 4-address mode packets Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-14-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: swap NEED_TXPROCESSING and HW_80211_ENCAP tx flags	Felix Fietkau
	In order to unify the tx status path, the hw 802.11 encapsulation flag needs to survive the trip to the tx status call. Since we don't have any free bits in info->flags, we need to move one. IEEE80211_TX_INTFL_NEED_TXPROCESSING is only used internally in mac80211, and only before the call into the driver. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-10-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: remove tx status call to ieee80211_sta_register_airtime	Felix Fietkau
	All drivers using airtime fairness are calling ieee80211_sta_register_airtime directly, now they must. Document this as well. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-8-nbd@nbd.name [johannes: update the documentation to suit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: reduce duplication in tx status functions	Felix Fietkau
	Move redundant functionality from __ieee80211_tx_status into ieee80211_tx_status_ext. Preparation for unifying with the 802.3 tx status codepath. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-7-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: rework tx encapsulation offload API	Felix Fietkau
	The current API (which lets the driver turn on/off per vif directly) has a number of limitations: - it does not deal with AP_VLAN - conditions for enabling (no tkip, no monitor) are only checked at add_interface time - no way to indicate 4-addr support In order to address this, store offload flags in struct ieee80211_vif (easy to extend for decap offload later). mac80211 initially sets the enable flag, but gives the driver a chance to modify it before its settings are applied. In addition to the .add_interface op, a .update_vif_offload op is introduced, which can be used for runtime changes. If a driver can't disable encap offload at runtime, or if it has some extra limitations, it can simply override the flags within those ops. Support for encap offload with 4-address mode interfaces can be enabled by setting a flag from .add_interface or .update_vif_offload. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-6-nbd@nbd.name [resolved conflict with commit aa2092a9bab3 ("ath11k: add raw mode and software crypto support")] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: set info->control.hw_key for encap offload packets	Felix Fietkau
	This is needed for drivers that don't do the key lookup themselves Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-5-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: skip encap offload for tx multicast/control packets	Felix Fietkau
	This simplifies the checks in the encap offload tx handler and allows using it in cases where software crypto is used for multicast packets, e.g. when using an AP_VLAN. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-4-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: check and refresh aggregation session in encap offload tx	Felix Fietkau
	Update the last_tx timestamp to avoid tearing down the aggregation session early. Fall back to the slow path if the session setup is still running Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: add missing queue/hash initialization to 802.3 xmit	Felix Fietkau
	Fixes AQL for encap-offloaded tx Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200908123702.88454-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	cfg80211: add more comments for ap_isolate in bss_parameters	Wright Feng
	The value of struct bss_parameters::ap_isolate will be -1, 0 or 1. The value -1 means not to change. To prevent developers from thinking ap_isolate is only 0 or 1, I add more comments on it. Signed-off-by: Wright Feng <wright.feng@cypress.com> Reviewed-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200908060157.98846-1-wright.feng@cypress.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	lib80211: Remove unused macro DRV_NAME	YueHaibing
	There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20200829115506.17828-1-yuehaibing@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: do not allow bigger VHT MPDUs than the hardware supports	Felix Fietkau
	Limit maximum VHT MPDU size by local capability. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20200917125031.45009-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	cfg80211: fix 6 GHz channel conversion	Johannes Berg
	We shouldn't accept any channels bigger than 233, fix that. Reported-by: Amar <asinghal@codeaurora.org> Fixes: d1a1646c0de7 ("cfg80211: adapt to new channelization of the 6GHz band") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200917115222.312ba6f1d461.I3a8c8fbcc3cc019814fd9cd0aced7eb591626136@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: do not disable HE if HT is missing on 2.4 GHz	Wen Gong
	VHT is not supported on 2.4 GHz, but HE is; don't disable HE if HT is missing there, do that only on 5 GHz (6 GHz is only HE). Fixes: 57fa5e85d53ce51 ("mac80211: determine chandef from HE 6 GHz operation") Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/010101747cb617f2-593c5410-1648-4a42-97a0-f3646a5a6dd1-000000@us-west-2.amazonses.com [rewrite the commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: Fix radiotap header channel flag for 6GHz band	Aloka Dixit
	Radiotap header field 'Channel flags' has '2 GHz spectrum' set to 'true' for 6GHz packet. Change it to 5GHz as there isn't a separate option available for 6GHz. Signed-off-by: Aloka Dixit <alokad@codeaurora.org> Link: https://lore.kernel.org/r/010101747ab7b703-1d7c9851-1594-43bf-81f7-f79ce7a67cc6-000000@us-west-2.amazonses.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	lib80211: fix unmet direct dependendices config warning when !CRYPTO	Necip Fazil Yildiran
	When LIB80211_CRYPT_CCMP is enabled and CRYPTO is disabled, it results in unmet direct dependencies config warning. The reason is that LIB80211_CRYPT_CCMP selects CRYPTO_AES and CRYPTO_CCM, which are subordinate to CRYPTO. This is reproducible with CRYPTO disabled and R8188EU enabled, where R8188EU selects LIB80211_CRYPT_CCMP but does not select or depend on CRYPTO. Honor the kconfig menu hierarchy to remove kconfig dependency warnings. Fixes: a11e2f85481c ("lib80211: use crypto API ccm(aes) transform for CCMP processing") Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com> Link: https://lore.kernel.org/r/20200909095452.3080-1-fazilyildiran@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: add AQL support for VHT160 tx rates	Felix Fietkau
	When converting from struct ieee80211_tx_rate to ieee80211_rx_status, there was one check missing to fill in the bandwidth for 160 MHz Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20200915085945.3782-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	mac80211: extend AQL aggregation estimation to HE and fix unit mismatch	Felix Fietkau
	The unit of the return value of ieee80211_get_rate_duration is nanoseconds, not microseconds. Adjust the duration checks to account for that. For higher data rates, allow larger estimated aggregation sizes, and add some values for HE as well, which can use much larger aggregates. Since small packets with high data rates can now lead to duration values too small for info->tx_time_est, return a minimum of 4us. Fixes: f01cfbaf9b29 ("mac80211: improve AQL aggregation estimation for low data rates") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20200915085945.3782-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18	iommu/amd: Restore IRTE.RemapEn bit for amd_iommu_activate_guest_mode	Suravee Suthikulpanit
	Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE") removed an assumption that modify_irte_ga always set the valid bit, which requires the callers to set the appropriate value for the struct irte_ga.valid bit before calling the function. Similar to the commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE"), which is for the function amd_iommu_deactivate_guest_mode(). The same change is also needed for the amd_iommu_activate_guest_mode(). Otherwise, this could trigger IO_PAGE_FAULT for the VFIO based VMs with AVIC enabled. Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20200916111720.43913-1-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-09-18	iommu/amd: Fix potential @entry null deref	Joao Martins
	After commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE"), smatch warns: drivers/iommu/amd/iommu.c:3870 amd_iommu_deactivate_guest_mode() warn: variable dereferenced before check 'entry' (see line 3867) Fix this by moving the @valid assignment to after @entry has been checked for NULL. Fixes: 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20200910171621.12879-1-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-09-18	x86/unwind/fp: Fix FP unwinding in ret_from_fork	Josh Poimboeuf
	There have been some reports of "bad bp value" warnings printed by the frame pointer unwinder: WARNING: kernel stack regs at 000000005bac7112 in sh:1014 has bad 'bp' value 0000000000000000 This warning happens when unwinding from an interrupt in ret_from_fork(). If entry code gets interrupted, the state of the frame pointer (rbp) may be undefined, which can confuse the unwinder, resulting in warnings like the above. There's an in_entry_code() check which normally silences such warnings for entry code. But in this case, ret_from_fork() is getting interrupted. It recently got moved out of .entry.text, so the in_entry_code() check no longer works. It could be moved back into .entry.text, but that would break the noinstr validation because of the call to schedule_tail(). Instead, initialize each new task's RBP to point to the task's entry regs via an encoded frame pointer. That will allow the unwinder to reach the end of the stack gracefully. Fixes: b9f6976bfb94 ("x86/entry/64: Move non entry code into .text section") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/f366bbf5a8d02e2318ee312f738112d0af74d16f.1600103007.git.jpoimboe@redhat.com
2020-09-17	selftests/bpf: Add tailcall_bpf2bpf tests	Maciej Fijalkowski
	Add four tests to tailcalls selftest explicitly named "tailcall_bpf2bpf_X" as their purpose is to validate that combination of tailcalls with bpf2bpf calls are working properly. These tests also validate LD_ABS from subprograms. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17	bpf: Add abnormal return checks.	Alexei Starovoitov
	LD_[ABS\|IND] instructions may return from the function early. bpf_tail_call pseudo instruction is either fallthrough or return. Allow them in the subprograms only when subprograms are BTF annotated and have scalar return types. Allow ld_abs and tail_call in the main program even if it calls into subprograms. In the past that was not ok to do for ld_abs, since it was JITed with special exit sequence. Since bpf_gen_ld_abs() was introduced the ld_abs looks like normal exit insn from JIT point of view, so it's safe to allow them in the main program. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17	bpf: allow for tailcalls in BPF subprograms for x64 JIT	Maciej Fijalkowski
	Relax verifier's restriction that was meant to forbid tailcall usage when subprog count was higher than 1. Also, do not max out the stack depth of program that utilizes tailcalls. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17	bpf, x64: rework pro/epilogue and tailcall handling in JIT	Maciej Fijalkowski
	This commit serves two things: 1) it optimizes BPF prologue/epilogue generation 2) it makes possible to have tailcalls within BPF subprogram Both points are related to each other since without 1), 2) could not be achieved. In [1], Alexei says: "The prologue will look like: nop5 xor eax,eax // two new bytes if bpf_tail_call() is used in this // function push rbp mov rbp, rsp sub rsp, rounded_stack_depth push rax // zero init tail_call counter variable number of push rbx,r13,r14,r15 Then bpf_tail_call will pop variable number rbx,.. and final 'pop rax' Then 'add rsp, size_of_current_stack_frame' jmp to next function and skip over 'nop5; xor eax,eax; push rpb; mov rbp, rsp' This way new function will set its own stack size and will init tail call counter with whatever value the parent had. If next function doesn't use bpf_tail_call it won't have 'xor eax,eax'. Instead it would need to have 'nop2' in there." Implement that suggestion. Since the layout of stack is changed, tail call counter handling can not rely anymore on popping it to rbx just like it have been handled for constant prologue case and later overwrite of rbx with actual value of rbx pushed to stack. Therefore, let's use one of the register (%rcx) that is considered to be volatile/caller-saved and pop the value of tail call counter in there in the epilogue. Drop the BUILD_BUG_ON in emit_prologue and in emit_bpf_tail_call_indirect where instruction layout is not constant anymore. Introduce new poke target, 'tailcall_bypass' to poke descriptor that is dedicated for skipping the register pops and stack unwind that are generated right before the actual jump to target program. For case when the target program is not present, BPF program will skip the pop instructions and nop5 dedicated for jmpq $target. An example of such state when only R6 of callee saved registers is used by program: ffffffffc0513aa1: e9 0e 00 00 00 jmpq 0xffffffffc0513ab4 ffffffffc0513aa6: 5b pop %rbx ffffffffc0513aa7: 58 pop %rax ffffffffc0513aa8: 48 81 c4 00 00 00 00 add $0x0,%rsp ffffffffc0513aaf: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) ffffffffc0513ab4: 48 89 df mov %rbx,%rdi When target program is inserted, the jump that was there to skip pops/nop5 will become the nop5, so CPU will go over pops and do the actual tailcall. One might ask why there simply can not be pushes after the nop5? In the following example snippet: ffffffffc037030c: 48 89 fb mov %rdi,%rbx (...) ffffffffc0370332: 5b pop %rbx ffffffffc0370333: 58 pop %rax ffffffffc0370334: 48 81 c4 00 00 00 00 add $0x0,%rsp ffffffffc037033b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) ffffffffc0370340: 48 81 ec 00 00 00 00 sub $0x0,%rsp ffffffffc0370347: 50 push %rax ffffffffc0370348: 53 push %rbx ffffffffc0370349: 48 89 df mov %rbx,%rdi ffffffffc037034c: e8 f7 21 00 00 callq 0xffffffffc0372548 There is the bpf2bpf call (at ffffffffc037034c) right after the tailcall and jump target is not present. ctx is in %rbx register and BPF subprogram that we will call into on ffffffffc037034c is relying on it, e.g. it will pick ctx from there. Such code layout is therefore broken as we would overwrite the content of %rbx with the value that was pushed on the prologue. That is the reason for the 'bypass' approach. Special care needs to be taken during the install/update/remove of tailcall target. In case when target program is not present, the CPU must not execute the pop instructions that precede the tailcall. To address that, the following states can be defined: A nop, unwind, nop B nop, unwind, tail C skip, unwind, nop D skip, unwind, tail A is forbidden (lead to incorrectness). The state transitions between tailcall install/update/remove will work as follows: First install tail call f: C->D->B(f) * poke the tailcall, after that get rid of the skip Update tail call f to f': B(f)->B(f') * poke the tailcall (poke->tailcall_target) and do NOT touch the poke->tailcall_bypass Remove tail call: B(f')->C(f') * poke->tailcall_bypass is poked back to jump, then we wait the RCU grace period so that other programs will finish its execution and after that we are safe to remove the poke->tailcall_target Install new tail call (f''): C(f')->D(f'')->B(f''). * same as first step This way CPU can never be exposed to "unwind, tail" state. Last but not least, when tailcalls get mixed with bpf2bpf calls, it would be possible to encounter the endless loop due to clearing the tailcall counter if for example we would use the tailcall3-like from BPF selftests program that would be subprogram-based, meaning the tailcall would be present within the BPF subprogram. This test, broken down to particular steps, would do: entry -> set tailcall counter to 0, bump it by 1, tailcall to func0 func0 -> call subprog_tail (we are NOT skipping the first 11 bytes of prologue and this subprogram has a tailcall, therefore we clear the counter...) subprog -> do the same thing as entry and then loop forever. To address this, the idea is to go through the call chain of bpf2bpf progs and look for a tailcall presence throughout whole chain. If we saw a single tail call then each node in this call chain needs to be marked as a subprog that can reach the tailcall. We would later feed the JIT with this info and: - set eax to 0 only when tailcall is reachable and this is the entry prog - if tailcall is reachable but there's no tailcall in insns of currently JITed prog then push rax anyway, so that it will be possible to propagate further down the call chain - finally if tailcall is reachable, then we need to precede the 'call' insn with mov rax, [rbp - (stack_depth + 8)] Tail call related cases from test_verifier kselftest are also working fine. Sample BPF programs that utilize tail calls (sockex3, tracex5) work properly as well. [1]: https://lore.kernel.org/bpf/20200517043227.2gpq22ifoq37ogst@ast-mbp.dhcp.thefacebook.com/ Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17	bpf: Limit caller's stack depth 256 for subprogs with tailcalls	Maciej Fijalkowski
	Protect against potential stack overflow that might happen when bpf2bpf calls get combined with tailcalls. Limit the caller's stack depth for such case down to 256 so that the worst case scenario would result in 8k stack size (32 which is tailcall limit * 256 = 8k). Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17	Merge branch 'for-5.9-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu fix from Dennis Zhou: "This is a fix for the first chunk size calculation where the variable length array incorrectly used the number of longs instead of bytes of longs. This came in as a code fix and not a bug report, so I don't think it was widely problematic. I believe it worked out due to it being memblock memory and alignment requirements working in our favor" * 'for-5.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: fix first chunk size calculation for populated bitmap
2020-09-17	mptcp: fix integer overflow in mptcp_subflow_discard_data()	Paolo Abeni
	Christoph reported an infinite loop in the subflow receive path under stress condition. If there are multiple subflows, each of them using a large send buffer, the delta between the sequence number used by MPTCP-level retransmission can and the current msk->ack_seq can be greater than MAX_INT. In the above scenario, when calling mptcp_subflow_discard_data(), such delta will be truncated to int, and could result in a negative number: no bytes will be dropped, and subflow_check_data_avail() will try again to process the same packet, looping forever. This change addresses the issue by expanding the 'limit' size to 64 bits, so that overflows are not possible anymore. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/87 Fixes: 6719331c2f73 ("mptcp: trigger msk processing even for OoO data") Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net/smc: fix double kfree in smc_listen_work()	Ursula Braun
	If smc_listen_rmda_finish() returns with an error, the storage addressed by 'buf' is freed a second time. Consolidate freeing under a common label and jump to that label. Fixes: 6bb14e48ee8d ("net/smc: dynamic allocation of CLC proposal buffer") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	ionic: add DIMLIB to Kconfig	Shannon Nelson
	>> ld.lld: error: undefined symbol: net_dim_get_rx_moderation >>> referenced by ionic_lif.c:52 (drivers/net/ethernet/pensando/ionic/ionic_lif.c:52) >>> net/ethernet/pensando/ionic/ionic_lif.o:(ionic_dim_work) in archive drivers/built-in.a >> ld.lld: error: undefined symbol: net_dim >>> referenced by ionic_txrx.c:456 (drivers/net/ethernet/pensando/ionic/ionic_txrx.c:456) >>> net/ethernet/pensando/ionic/ionic_txrx.o:(ionic_dim_update) in archive drivers/built-in.a v2: removed sketchy dashes in commit message Fixes: 04a834592bf5 ("ionic: dynamic interrupt moderation") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	nfp: use correct define to return NONE fec	Jakub Kicinski
	struct ethtool_fecparam carries bitmasks not bit numbers. We want to return 1 (NONE), not 0. Fixes: 0d0870938337 ("nfp: implement ethtool FEC mode settings") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-17	net: remove comments on struct rtnl_link_stats	Jakub Kicinski
	We removed the misleading comments from struct rtnl_link_stats64 when we added proper kdoc. struct rtnl_link_stats has the same inline comments, so remove them, too. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>