summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-06-23cfg80211: remove ieee80211_get_he_sta_cap()Johannes Berg
This function turned out to be too easy to misuse since it doesn't consider the interface type. Remove it now that we no longer use it in mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.8c9c72f914b0.I68e9c0626dc77a0f67f238a05ae16a0b77b09895@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23ieee80211: add defines for HE PHY cap byte 10Johannes Berg
One bit out of the previously completely reserved byte 10 in the PHY capabilities is used since 802.11ax D7.0, add a new define for it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.c026feb3873d.I380f52a05ddb4153bc77ff7f276a3484819f69b2@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23nl80211/cfg80211: add BSS color to NDP ranging parametersAvraham Stern
In NDP ranging, the initiator need to set the BSS color in the NDP to the BSS color of the responder. Add the BSS color as a parameter for NDP ranging. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.f097a6144b59.I27dec8b994df52e691925ea61be4dd4fa6d396c0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23mac80211: add to bss_conf if broadcast TWT is supportedShaul Triebitz
Add to struct ieee80211_bss_conf a twt_broadcast field. Set it to true if both STA and AP support broadcast TWT. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.f7c105237541.I50b302044e2b35e5ed4d3fb8bc7bd3d8bb89b1e1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23ieee80211: define timing measurement in extended capabilities IEKrishnanand Prabhu
Define the bit used for timing measurement support in extended capabilities IE, used for time synchronization. Signed-off-by: Krishnanand Prabhu <krishnanand.prabhu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.b75f40765538.I92b50e43e29272c97d17ed5f37f216f4caf0f205@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23ieee80211: add the value for Category '6' in "rtw_ieee80211_category"Christophe JAILLET
Preparation work for removing the "enum rtw_ieee80211_category" in "drivers/staging/rtl8188eu/include/ieee80211.h" and "drivers/staging/rtl8723bs/include/ieee80211.h". This enum is similar to "enum ieee80211_category" from "include/linux/ieee80211.h". However it defines the value '6' as RTW_WLAN_CATEGORY_FT. So add a corresponding value in "ieee80211_category" Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/66be0187869bd7dae1c0b0785a32db695ee9872e.1624108556.git.christophe.jaillet@wanadoo.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23mac80211: move A-MPDU session check from minstrel_ht to mac80211Felix Fietkau
This avoids calling back into tx handlers from within the rate control module. Preparation for deferring rate control until tx dequeue Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210617163113.75815-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23cfg80211: expose the rfkill device to the low level driverEmmanuel Grumbach
This will allow the low level driver to query the rfkill state. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20210616202826.9833-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23mac80211: add ieee80211_is_tx_data helper functionPhilipp Borgers
Add a helper function that checks if a frame is a data frame. Frames with hardware encapsulation enabled are data frames. Signed-off-by: Philipp Borgers <borgers@mi.fu-berlin.de> Link: https://lore.kernel.org/r/20210519122019.92359-2-borgers@mi.fu-berlin.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23cfg80211: remove CFG80211_MAX_NUM_DIFFERENT_CHANNELSJohannes Berg
We no longer need to put any limits here, hardware will and mac80211-hwsim can do whatever it likes. The reason we had this was some accounting code (still mentioned in the comment) but that code was deleted in commit c781944b71f8 ("cfg80211: Remove unused cfg80211_can_use_iftype_chan()"). Link: https://lore.kernel.org/r/20210506221159.d1d61db1d31c.Iac4da68d54b9f1fdc18a03586bbe06aeb9515425@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-22net/xfrm: Add inner_ipproto into sec_pathHuy Nguyen
The inner_ipproto saves the inner IP protocol of the plain text packet. This allows vendor's IPsec feature making offload decision at skb's features_check and configuring hardware at ndo_start_xmit. For example, ConnectX6-DX IPsec device needs the plaintext's IP protocol to support partial checksum offload on VXLAN/GENEVE packet over IPsec transport mode tunnel. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Huy Nguyen <huyn@nvidia.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22mptcp: add allow_join_id0 in mptcp_out_optionsGeliang Tang
This patch defined a new flag MPTCP_CAP_DENY_JOIN_ID0 for the third bit, labeled "C" of the MP_CAPABLE option. Add a new flag allow_join_id0 in struct mptcp_out_options. If this flag is set, send out the MP_CAPABLE option with the flag MPTCP_CAP_DENY_JOIN_ID0. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: process sctp over udp icmp err on sctp sideXin Long
Previously, sctp over udp was using udp tunnel's icmp err process, which only does sk lookup on sctp side. However for sctp's icmp error process, there are more things to do, like syncing assoc pmtu/retransmit packets for toobig type err, and starting proto_unreach_timer for unreach type err etc. Now after adding PLPMTUD, which also requires to process toobig type err on sctp side. This patch is to process icmp err on sctp side by parsing the type/code/info in .encap_err_lookup and call sctp's icmp processing functions. Note as the 'redirect' err process needs to know the outer ip(v6) header's, we have to leave it to udp(v6)_err to handle it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: do state transition when a probe succeeds on HB ACK recv pathXin Long
As described in rfc8899#section-5.2, when a probe succeeds, there might be the following state transitions: - Base -> Search, occurs when probe succeeds with BASE_PLPMTU, pl.pmtu is not changing, pl.probe_size increases by SCTP_PL_BIG_STEP, - Error -> Search, occurs when probe succeeds with BASE_PLPMTU, pl.pmtu is changed from SCTP_MIN_PLPMTU to SCTP_BASE_PLPMTU, pl.probe_size increases by SCTP_PL_BIG_STEP. - Search -> Search Complete, occurs when probe succeeds with the probe size SCTP_MAX_PLPMTU less than pl.probe_high, pl.pmtu is not changing, but update *pathmtu* with it, pl.probe_size is set back to pl.pmtu to double check it. - Search Complete -> Search, occurs when probe succeeds with the probe size equal to pl.pmtu, pl.pmtu is not changing, pl.probe_size increases by SCTP_PL_MIN_STEP. So search process can be described as: 1. When it just enters 'Search' state, *pathmtu* is not updated with pl.pmtu, and probe_size increases by a big step (SCTP_PL_BIG_STEP) each round. 2. Until pl.probe_high is set when a probe fails, and probe_size decreases back to pl.pmtu, as described in the last patch. 3. When the probe with the new size succeeds, probe_size changes to increase by a small step (SCTP_PL_MIN_STEP) due to pl.probe_high is set. 4. Until probe_size is next to pl.probe_high, the searching finishes and it goes to 'Complete' state and updates *pathmtu* with pl.pmtu, and then probe_size is set to pl.pmtu to confirm by once more probe. 5. This probe occurs after "30 * probe_inteval", a much longer time than that in Search state. Once it is done it goes to 'Search' state again with probe_size increased by SCTP_PL_MIN_STEP. As we can see above, during the searching, pl.pmtu changes while *pathmtu* doesn't. *pathmtu* is only updated when the search finishes by which it gets an optimal value for it. A big step is used at the beginning until it gets close to the optimal value, then it changes to a small step until it has this optimal value. The small step is also used in 'Complete' until it goes to 'Search' state again and the probe with 'pmtu + the small step' succeeds, which means a higher size could be used. Then probe_size changes to increase by a big step again until it gets close to the next optimal value. Note that anytime when black hole is detected, it goes directly to 'Base' state with pl.pmtu set to SCTP_BASE_PLPMTU, as described in the last patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send pathXin Long
The state transition is described in rfc8899#section-5.2, PROBE_COUNT == MAX_PROBES means the probe fails for MAX times, and the state transition includes: - Base -> Error, occurs when BASE_PLPMTU Confirmation Fails, pl.pmtu is set to SCTP_MIN_PLPMTU, probe_size is still SCTP_BASE_PLPMTU; - Search -> Base, occurs when Black Hole Detected, pl.pmtu is set to SCTP_BASE_PLPMTU, probe_size is set back to SCTP_BASE_PLPMTU; - Search Complete -> Base, occurs when Black Hole Detected pl.pmtu is set to SCTP_BASE_PLPMTU, probe_size is set back to SCTP_BASE_PLPMTU; Note a black hole is encountered when a sender is unaware that packets are not being delivered to the destination endpoint. So it includes the probe failures with equal probe_size to pl.pmtu, and definitely not include that with greater probe_size than pl.pmtu. The later one is the normal probe failure where probe_size should decrease back to pl.pmtu and pl.probe_high is set. pl.probe_high would be used on HB ACK recv path in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: do the basic send and recv for PLPMTUD probeXin Long
This patch does exactly what rfc8899#section-6.2.1.2 says: The SCTP sender needs to be able to determine the total size of a probe packet. The HEARTBEAT chunk could carry a Heartbeat Information parameter that includes, besides the information suggested in [RFC4960], the probe size to help an implementation associate a HEARTBEAT ACK with the size of probe that was sent. The sender could also use other methods, such as sending a nonce and verifying the information returned also contains the corresponding nonce. The length of the PAD chunk is computed by reducing the probing size by the size of the SCTP common header and the HEARTBEAT chunk. Note that HB ACK chunk will carry back whatever HB chunk carried, including the probe_size we put it in; We also check hbinfo->probe_size in the HB ACK against link->pl.probe_size to validate this HB ACK chunk. v1->v2: - Remove the unused 'sp' and add static for sctp_packet_bundle_pad(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: add the probe timer in transport for PLPMTUDXin Long
There are 3 timers described in rfc8899#section-5.1.1: PROBE_TIMER, PMTU_RAISE_TIMER, CONFIRMATION_TIMER This patches adds a 'probe_timer' in transport, and it works as either PROBE_TIMER or PMTU_RAISE_TIMER. At most time, it works as PROBE_TIMER and expires every a 'probe_interval' time to send the HB probe packet. When transport pl enters COMPLETE state, it works as PMTU_RAISE_TIMER and expires in 'probe_interval * 30' time to go back to SEARCH state and do searching again. SCTP HB is an acknowledged packet, CONFIRMATION_TIMER is not needed. The timer will start when transport pl enters BASE state and stop when it enters DISABLED state. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: add the constants/variables and states and some APIs for transportXin Long
These are 4 constants described in rfc8899#section-5.1.2: MAX_PROBES, MIN_PLPMTU, MAX_PLPMTU, BASE_PLPMTU; And 2 variables described in rfc8899#section-5.1.3: PROBED_SIZE, PROBE_COUNT; And 5 states described in rfc8899#section-5.2: DISABLED, BASE, SEARCH, SEARCH_COMPLETE, ERROR; And these 4 APIs are used to reset/update PLPMTUD, check if PLPMTUD is enabled, and calculate the additional headers length for a transport. Note the member 'probe_high' in transport will be set to the probe size when a probe fails with this probe size in the next patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: add SCTP_PLPMTUD_PROBE_INTERVAL sockopt for sock/asoc/transportXin Long
With this socket option, users can change probe_interval for a transport, asoc or sock after it's created. Note that if the change is for an asoc, also apply the change to each transport in this asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: add probe_interval in sysctl and sock/asoc/transportXin Long
PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'. 'n' is the interval for PLPMTUD probe timer in milliseconds, and it can't be less than 5000 if it's not 0. All asoc/transport's PLPMTUD in a new socket will be enabled by default. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22sctp: add pad chunk and its make function and event tableXin Long
This chunk is defined in rfc4820#section-3, and used to pad an SCTP packet. The receiver must discard this chunk and continue processing the rest of the chunks in the packet. Add it now, as it will be bundled with a heartbeat chunk to probe pmtu in the following patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22compiler_attributes.h: cleanups for GCC 4.9+Nick Desaulniers
Since commit 6ec4476ac825 ("Raise gcc version requirement to 4.9") we no longer support building the kernel with GCC 4.8; drop the preprocess checks for __GNUC_MINOR__ version. It's implied that if __GNUC_MAJOR__ is 4, then the only supported version of __GNUC_MINOR__ left is 9. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210621231822.2848305-3-ndesaulniers@google.com
2021-06-22compiler_attributes.h: define __no_profile, add to noinstrNick Desaulniers
noinstr implies that we would like the compiler to avoid instrumenting a function. Add support for the compiler attribute no_profile_instrument_function to compiler_attributes.h, then add __no_profile to the definition of noinstr. Link: https://lore.kernel.org/lkml/20210614162018.GD68749@worktop.programming.kicks-ass.net/ Link: https://reviews.llvm.org/D104257 Link: https://reviews.llvm.org/D104475 Link: https://reviews.llvm.org/D104658 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 Reviewed-by: Fangrui Song <maskray@google.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210621231822.2848305-2-ndesaulniers@google.com
2021-06-22ethtool: Use kernel data types for internal EEPROM structIdo Schimmel
The struct is not visible to user space and therefore should not use the user visible data types. Instead, use internal data types like other structures in the file. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22ethtool: Document correct attribute typeIdo Schimmel
'ETHTOOL_A_MODULE_EEPROM_DATA' is a binary attribute, not a nested one. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22wwan: core: add WWAN common private data for netdevSergey Ryazanov
The WWAN core not only multiplex the netdev configuration data, but process it too, and needs some space to store its private data associated with the netdev. Add a structure to keep common WWAN core data. The structure will be stored inside the netdev private data before WWAN driver private data and have a field to make it easier to access the driver data. Also add a helper function that simplifies drivers access to their data. At the moment we use the common WWAN private data to store the WWAN data link (channel) id at the time the link is created, and report it back to user using the .fill_info() RTNL callback. This should help the user to be aware which network interface is bound to which WWAN device data channel. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> CC: M Chetan Kumar <m.chetan.kumar@intel.com> CC: Intel Corporation <linuxwwan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22wwan: core: support default netdev creationSergey Ryazanov
Most, if not each WWAN device driver will create a netdev for the default data channel. Therefore, add an option for the WWAN netdev ops registration function to create a default netdev for the WWAN device. A WWAN device driver should pass a default data channel link id to the ops registering function to request the creation of a default netdev, or a special value WWAN_NO_DEFAULT_LINK to inform the WWAN core that the default netdev should not be created. For now, only wwan_hwsim utilize the default link creation option. Other drivers will be reworked next. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> CC: M Chetan Kumar <m.chetan.kumar@intel.com> CC: Intel Corporation <linuxwwan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22wwan: core: no more hold netdev ops owning moduleSergey Ryazanov
The WWAN netdev ops owner holding was used to protect from the unexpected memory disappear. This approach causes a dependency cycle (driver -> core -> driver) and effectively prevents a WWAN driver unloading. E.g. WWAN hwsim could not be unloaded until all simulated devices are removed: ~# modprobe wwan_hwsim devices=2 ~# lsmod | grep wwan wwan_hwsim 16384 2 wwan 20480 1 wwan_hwsim ~# rmmod wwan_hwsim rmmod: ERROR: Module wwan_hwsim is in use ~# echo > /sys/kernel/debug/wwan_hwsim/hwsim0/destroy ~# echo > /sys/kernel/debug/wwan_hwsim/hwsim1/destroy ~# lsmod | grep wwan wwan_hwsim 16384 0 wwan 20480 1 wwan_hwsim ~# rmmod wwan_hwsim For a real device driver this will cause an inability to unload module until a served device is physically detached. Since the last commit we are removing all child netdev(s) when a driver unregister the netdev ops. This allows us to permit the driver unloading, since any sane driver will call ops unregistering on a device deinitialization. So, remove the holding of an ops owner to make it easier to unload a driver module. The owner field has also beed removed from the ops structure as there are no more users of this field. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22net: mdiobus: Introduce fwnode_mdbiobus_register()Marcin Wojtas
This patch introduces a new helper function that wraps acpi_/of_ mdiobus_register() and allows its usage via common fwnode_ interface. Fall back to raw mdiobus_register() in case CONFIG_FWNODE_MDIO is not enabled, in order to satisfy compatibility in all future user drivers. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22clocksource: Provide kernel module to test clocksource watchdogPaul E. McKenney
When the clocksource watchdog marks a clock as unstable, this might be due to that clock being unstable or it might be due to delays that happen to occur between the reads of the two clocks. It would be good to have a way of testing the clocksource watchdog's ability to distinguish between these two causes of clock skew and instability. Therefore, provide a new clocksource-wdtest module selected by a new TEST_CLOCKSOURCE_WATCHDOG Kconfig option. This module has a single module parameter named "holdoff" that provides the number of seconds of delay before testing should start, which defaults to zero when built as a module and to 10 seconds when built directly into the kernel. Very large systems that boot slowly may need to increase the value of this module parameter. This module uses hand-crafted clocksource structures to do its testing, thus avoiding messing up timing for the rest of the kernel and for user applications. This module first verifies that the ->uncertainty_margin field of the clocksource structures are set sanely. It then tests the delay-detection capability of the clocksource watchdog, increasing the number of consecutive delays injected, first provoking console messages complaining about the delays and finally forcing a clock-skew event. Unexpected test results cause at least one WARN_ON_ONCE() console splat. If there are no splats, the test has passed. Finally, it fuzzes the value returned from a clocksource to test the clocksource watchdog's ability to detect time skew. This module checks the state of its clocksource after each test, and uses WARN_ON_ONCE() to emit a console splat if there are any failures. This should enable all types of test frameworks to detect any such failures. This facility is intended for diagnostic use only, and should be avoided on production systems. Reported-by: Chris Mason <clm@fb.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org
2021-06-22clocksource: Reduce clocksource-skew thresholdPaul E. McKenney
Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in a 500-millisecond WATCHDOG_INTERVAL. This requires that clocks be skewed by more than 12.5% in order to be marked unstable. Except that a clock that is skewed by that much is probably destroying unsuspecting software right and left. And given that there are now checks for false-positive skews due to delays between reading the two clocks, it should be possible to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks such as TSC. Therefore, add a new uncertainty_margin field to the clocksource structure that contains the maximum uncertainty in nanoseconds for the corresponding clock. This field may be initialized manually, as it is for clocksource_tsc_early and clocksource_jiffies, which is copied to refined_jiffies. If the field is not initialized manually, it will be computed at clock-registry time as the period of the clock in question based on the scale and freq parameters to __clocksource_update_freq_scale() function. If either of those two parameters are zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is used as a cowardly alternative to dividing by zero. No matter how the uncertainty_margin field is calculated, it is bounded below by twice WATCHDOG_MAX_SKEW, that is, by 100 microseconds. Note that manually initialized uncertainty_margin fields are not adjusted, but there is a WARN_ON_ONCE() that triggers if any such field is less than twice WATCHDOG_MAX_SKEW. This WARN_ON_ONCE() is intended to discourage production use of the one-nanosecond uncertainty_margin values that are used to test the clock-skew code itself. The actual clock-skew check uses the sum of the uncertainty_margin fields of the two clocksource structures being compared. Integer overflow is avoided because the largest computed value of the uncertainty_margin fields is one billion (10^9), and double that value fits into an unsigned int. However, if someone manually specifies (say) UINT_MAX, they will get what they deserve. Note that the refined_jiffies uncertainty_margin field is initialized to TICK_NSEC, which means that skew checks involving this clocksource will be sufficently forgiving. In a similar vein, the clocksource_tsc_early uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which replicates the current behavior and allows custom setting if needed in order to address the rare skews detected for this clocksource in current mainline. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-4-paulmck@kernel.org
2021-06-22clocksource: Check per-CPU clock synchronization when marked unstablePaul E. McKenney
Some sorts of per-CPU clock sources have a history of going out of synchronization with each other. However, this problem has purportedy been solved in the past ten years. Except that it is all too possible that the problem has instead simply been made less likely, which might mean that some of the occasional "Marking clocksource 'tsc' as unstable" messages might be due to desynchronization. How would anyone know? Therefore apply CPU-to-CPU synchronization checking to newly unstable clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag. Lists of desynchronized CPUs are printed, with the caveat that if it is the reporting CPU that is itself desynchronized, it will appear that all the other clocks are wrong. Just like in real life. Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-2-paulmck@kernel.org
2021-06-22futex: Provide FUTEX_LOCK_PI2 to support clock selectionThomas Gleixner
The FUTEX_LOCK_PI futex operand uses a CLOCK_REALTIME based absolute timeout since it was implemented, but it does not require that the FUTEX_CLOCK_REALTIME flag is set, because that was introduced later. In theory as none of the user space implementations can set the FUTEX_CLOCK_REALTIME flag on this operand, it would be possible to creatively abuse it and make the meaning invers, i.e. select CLOCK_REALTIME when not set and CLOCK_MONOTONIC when set. But that's a nasty hackery. Another option would be to have a new FUTEX_CLOCK_MONOTONIC flag only for FUTEX_LOCK_PI, but that's also awkward because it does not allow libraries to handle the timeout clock selection consistently. So provide a new FUTEX_LOCK_PI2 operand which implements the timeout semantics which the other operands use and leave FUTEX_LOCK_PI alone. Reported-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210422194705.440773992@linutronix.de
2021-06-22Merge branch kvm-arm64/mmu/mte into kvmarm-master/nextMarc Zyngier
KVM/arm64 support for MTE, courtesy of Steven Price. It allows the guest to use memory tagging, and offers a new userspace API to save/restore the tags. * kvm-arm64/mmu/mte: KVM: arm64: Document MTE capability and ioctl KVM: arm64: Add ioctl to fetch/store tags in a guest KVM: arm64: Expose KVM_ARM_CAP_MTE KVM: arm64: Save/restore MTE registers KVM: arm64: Introduce MTE VM feature arm64: mte: Sync tags for pages where PTE is untagged Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-22KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capabilityBharata B Rao
Now that we have H_RPT_INVALIDATE fully implemented, enable support for the same via KVM_CAP_PPC_RPT_INVALIDATE KVM capability Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-6-bharata@linux.ibm.com
2021-06-22KVM: arm64: Add ioctl to fetch/store tags in a guestSteven Price
The VMM may not wish to have it's own mapping of guest memory mapped with PROT_MTE because this causes problems if the VMM has tag checking enabled (the guest controls the tags in physical RAM and it's unlikely the tags are correct for the VMM). Instead add a new ioctl which allows the VMM to easily read/write the tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE while the VMM can still read/write the tags for the purpose of migration. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210621111716.37157-6-steven.price@arm.com
2021-06-22KVM: arm64: Introduce MTE VM featureSteven Price
Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging for a VM. This will expose the feature to the guest and automatically tag memory pages touched by the VM as PG_mte_tagged (and clear the tag storage) to ensure that the guest cannot see stale tags, and so that the tags are correctly saved/restored across swap. Actually exposing the new capability to user space happens in a later patch. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> [maz: move VM_SHARED sampling into the critical section] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210621111716.37157-3-steven.price@arm.com
2021-06-22btrfs: rip out may_commit_transactionJosef Bacik
may_commit_transaction was introduced before the ticketing infrastructure existed. There was a problem where we'd legitimately be out of space, but every reservation would trigger a transaction commit and then fail. Thus if you had 1000 things trying to make a reservation, they'd all do the flushing loop and thus commit the transaction 1000 times before they'd get their ENOSPC. This helper was introduced to short circuit this, if there wasn't space that could be reclaimed by committing the transaction then simply ENOSPC out. This made true ENOSPC tests much faster as we didn't waste a bunch of time. However many of our bugs over the years have been from cases where we didn't account for some space that would be reclaimed by committing a transaction. The delayed refs rsv space, delayed rsv, many pinned bytes miscalculations, etc. And in the meantime the original problem has been solved with ticketing. We no longer will commit the transaction 1000 times. Instead we'll get 1000 waiters, we will go through the flushing mechanisms, and if there's no progress after 2 loops we ENOSPC everybody out. The ticketing infrastructure gives us a deterministic way to see if we're making progress or not, thus we avoid a lot of extra work. So simplify this step by simply unconditionally committing the transaction. This removes what is arguably our most common source of early ENOSPC bugs and will allow us to drastically simplify many of the things we track because we simply won't need them with this stuff gone. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-22btrfs: fix typos in commentsDavid Sterba
Fix typos that have snuck in since the last round. Found by codespell. Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-22locking/lockdep: Improve noinstr vs errorsPeter Zijlstra
Better handle the failure paths. vmlinux.o: warning: objtool: debug_locks_off()+0x23: call to console_verbose() leaves .noinstr.text section vmlinux.o: warning: objtool: debug_locks_off()+0x19: call to __kasan_check_write() leaves .noinstr.text section debug_locks_off+0x19/0x40: instrument_atomic_write at include/linux/instrumented.h:86 (inlined by) __debug_locks_off at include/linux/debug_locks.h:17 (inlined by) debug_locks_off at lib/debug_locks.c:41 Fixes: 6eebad1ad303 ("lockdep: __always_inline more for noinstr") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210621120120.784404944@infradead.org
2021-06-22spi: add ancillary device supportSebastian Reichel
Introduce support for ancillary devices, similar to existing implementation for I2C. This is useful for devices having multiple chip-selects, for example some microcontrollers provide a normal SPI interface and a flashing SPI interface. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20210621175359.126729-2-sebastian.reichel@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-22lib/dump_stack: move cpu lock to printk.cJohn Ogness
dump_stack() implements its own cpu-reentrant spinning lock to best-effort serialize stack traces in the printk log. However, there are other functions (such as show_regs()) that can also benefit from this serialization. Move the cpu-reentrant spinning lock (cpu lock) into new helper functions printk_cpu_lock_irqsave()/printk_cpu_unlock_irqrestore() so that it is available for others as well. For !CONFIG_SMP the cpu lock is a NOP. Note that having multiple cpu locks in the system can easily lead to deadlock. Code needing a cpu lock should use the printk cpu lock, since the printk cpu lock could be acquired from any code and any context. Also note that it is not necessary for a cpu lock to disable interrupts. However, in upcoming work this cpu lock will be used for emergency tasks (for example, atomic consoles during kernel crashes) and any interruptions while holding the cpu lock should be avoided if possible. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [pmladek@suse.com: Backported on top of 5.13-rc1.] Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210617095051.4808-2-john.ogness@linutronix.de
2021-06-21net: handle ARPHRD_IP6GRE in dev_is_mac_header_xmit()Guillaume Nault
Similar to commit 3b707c3008ca ("net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP"), add ARPHRD_IP6GRE to dev_is_mac_header_xmit(), to make ip6gre compatible with act_mirred and __bpf_redirect(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21net: dsa: export the dsa_port_is_{user,cpu,dsa} helpersVladimir Oltean
The difference between dsa_is_user_port and dsa_port_is_user is that the former needs to look up the list of ports of the DSA switch tree in order to find the struct dsa_port, while the latter directly receives it as an argument. dsa_is_user_port is already in widespread use and has its place, so there isn't any chance of converting all callers to a single form. But being able to do: dsa_port_is_user(dp) instead of dsa_is_user_port(dp->ds, dp->index) is much more efficient too, especially when the "dp" comes from an iterator over the DSA switch tree - this reduces the complexity from quadratic to linear. Move these helpers from dsa2.c to include/net/dsa.h so that others can use them too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21net: sched: add barrier to ensure correct ordering for lockless qdiscYunsheng Lin
The spin_trylock() was assumed to contain the implicit barrier needed to ensure the correct ordering between STATE_MISSED setting/clearing and STATE_MISSED checking in commit a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc"). But it turns out that spin_trylock() only has load-acquire semantic, for strongly-ordered system(like x86), the compiler barrier implicitly contained in spin_trylock() seems enough to ensure the correct ordering. But for weakly-orderly system (like arm64), the store-release semantic is needed to ensure the correct ordering as clear_bit() and test_bit() is store operation, see queued_spin_lock(). So add the explicit barrier to ensure the correct ordering for the above case. Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21Merge series "Extend regulator notification support" from Matti Vaittinen ↵Mark Brown
<matti.vaittinen@fi.rohmeurope.com>: Extend regulator notification support This series extends the regulator notification and error flag support. Initial discussion on the topic can be found here: https://lore.kernel.org/lkml/6046836e22b8252983f08d5621c35ececb97820d.camel@fi.rohmeurope.com/ In a nutshell - the series adds: 1. WARNING level events/error flags. (Patch 3) Current regulator 'ERROR' event notifications for over/under voltage, over current and over temperature are used to indicate condition where monitored entity is so badly "off" that it actually indicates a hardware error which can not be recovered. The most typical hanling for that is believed to be a (graceful) system-shutdown. Here we add set of 'WARNING' level flags to allow sending notifications to consumers before things are 'that badly off' so that consumer drivers can implement recovery-actions. 2. Device-tree properties for specifying limit values. (Patches 1, 5) Add limits for above mentioned 'ERROR' and 'WARNING' levels (which send notifications to consumers) and also for a 'PROTECTION' level (which will be used to immediately shut-down the regulator(s) W/O informing consumer drivers. Typically implemented by hardware). Property parsing is implemented in regulator core which then calls callback operations for limit setting from the IC drivers. A warning is emitted if protection is requested by device tree but the underlying IC does not support configuring requested protection. 3. Helpers which can be registered by IC. (Patch 4) Target is to avoid implementing IRQ handling and IRQ storm protection in each IC driver. (Many of the ICs implementin these IRQs do not allow masking or acking the IRQ but keep the IRQ asserted for the whole duration of problem keeping the processor in IRQ handling loop). 4. Emergency poweroff function (refactored out of the thermal_core to kernel/reboot.c) which is called if IC fires error IRQs but IC reading fails and given retry-count is exceeded. (Patches 2, 4) Please note that the mutex in the emergency shutdown was replaced by a simple atomic in order to allow call from any context. The helper was attempted to be done so it could be used to implement roughly same logic as is used in qcom-labibb regulator. This means amongst other things a safety shut-down if IC registers are not readable. Using these shut-down retry counters are optional. The idea is that the helper could be also used by simpler ICs which do not provide status register(s) which can be used to check if error is still active. ICs which do not have such status register can simply omit the 'renable' callback (and retry-counts etc) - and helper assumes the situation is Ok and re-enables IRQ after given time period. If problem persists the handler is ran again and another notification is sent - but at least the delay allows processor to avoid IRQ loop. Patch 7 takes this notification support in use at BD9576MUF. Patch 8 is related to MFD change which is not really related to the RFC here. It was added to this series in order to avoid potential conflicts. Patch 9 adds a maintainers entry. Changelog v10-RESEND: - rebased on v5.13-rc4 Changelog v10: - rebased on v5.13-rc2 - Move rdev_*() print macros to the internal.h and use rdev_dbg() from irq_helpers.c - Export rdev_get_name() and move it from coupler.h to driver.h for others to use. (It was already in coupler.h but not exported - usage was limited and coupler.h does not sound like optimal place as rdev_name is not only used by coupled regulators) - Send all regulator notifications from irq_helpers.c at one OR'd event for the sake of simplicity. For BD9576 this does not matter as it has own IRQ for each event case. Header defining events says they may be OR'd. - Change WARN() at protection shutdown to pr_emerg as suggested by Petr. Changelog v9: - rebases on v5.13-rc1 - Update thermal documentation - Fix regulator notification event number Changelog v8: - split shutdown API adding and thermal core taking it in use to own patches. - replace the spinlock with atomic when ensuring the emergency shutdown is only called once. Changelog v7: general: - rebased on v5.12-rc7 - new patch for refactoring the hw-failure reboot logic out of thermal_core.c for others to use. notification helpers: - fix regulator error_flags query - grammar/typos - do not BUG() but attempt to shut-down the system - use BITS_PER_TYPE() Changelog v6: Add MAINTAINERS entry Changes to IRQ notifiers - move devm functions to drivers/regulator/devres.c - drop irq validity check - use devm_add_action_or_reset() - fix styling issues - fix kerneldocs Changelog v5: - Fix the badly formatted pr_emerg() call. Changelog v4: - rebased on v5.12-rc6 - dropped RFC - fix external FET DT-binding. - improve prints for cases when expecting HW failure. - styling and typos Changelog v3: Regulator core: - Fix dangling pointer access at regulator_irq_helper() stpmic1_regulator: - fix function prototype (compile error) bd9576-regulator: - Update over current limits to what was given in new data-sheet (REV00K) - Allow over-current monitoring without external FET. Set limits to values given in data-sheet (REV00K). Changelog v2: Generic: - rebase on v5.12-rc2 + BD9576 series - Split devm variant of delayed wq to own series Regulator framework: - Provide non devm variant of IRQ notification helpers - shorten dt-property names as suggested by Rob - unconditionally call map_event in IRQ handling and require it to be populated BD9576 regulators: - change the FET resistance property to micro-ohms - fix voltage computation in OC limit setting
2021-06-21skmsg: Improve udp_bpf_recvmsg() accuracyCong Wang
I tried to reuse sk_msg_wait_data() for different protocols, but it turns out it can not be simply reused. For example, UDP actually uses two queues to receive skb: udp_sk(sk)->reader_queue and sk->sk_receive_queue. So we have to check both of them to know whether we have received any packet. Also, UDP does not lock the sock during BH Rx path, it makes no sense for its ->recvmsg() to lock the sock. It is always possible for ->recvmsg() to be called before packets actually arrive in the receive queue, we just use best effort to make it accurate here. Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-2-xiyou.wangcong@gmail.com
2021-06-21btrfs: pass btrfs_inode to btrfs_writepage_endio_finish_ordered()Qu Wenruo
There is a pretty bad abuse of btrfs_writepage_endio_finish_ordered() in end_compressed_bio_write(). It passes compressed pages to btrfs_writepage_endio_finish_ordered(), which is only supposed to accept inode pages. Thankfully the important info here is the inode, so let's pass btrfs_inode directly into btrfs_writepage_endio_finish_ordered(), and make @page parameter optional. By this, end_compressed_bio_write() can happily pass page=NULL while still getting everything done properly. Also, to cooperate with such modification, replace @page parameter for trace_btrfs_writepage_end_io_hook() with btrfs_inode. Although this removes page_index info, the existing start/len should be enough for most usage. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21regulator: add property parsing and callbacks to set protection limitsMatti Vaittinen
Add DT property parsing code and setting callback for regulator over/under voltage, over-current and temperature error limits. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Link: https://lore.kernel.org/r/e7b8007ba9eae7076178bf3363fb942ccb1cc9a5.1622628334.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-21regulator: IRQ based event/error notification helpersMatti Vaittinen
Provide helper function for IC's implementing regulator notifications when an IRQ fires. The helper also works for IRQs which can not be acked. Helper can be set to disable the IRQ at handler and then re-enabling it on delayed work later. The helper also adds regulator_get_error_flags() errors in cache for the duration of IRQ disabling. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/ebdf86d8c22b924667ec2385330e30fcbfac0119.1622628334.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Mark Brown <broonie@kernel.org>