summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-06-21regulator: move rdev_print helpers to internal.hMatti Vaittinen
The rdev print helpers are a nice way to print messages related to a specific regulator device. Move them from core.c to internal.h As the rdev print helpers use rdev_get_name() export it from core.c. Also move the declaration from coupler.h to driver.h because the rdev name is not just a coupled regulator property. I guess the main audience for rdev_get_name() will be the regulator core and drivers. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Link: https://lore.kernel.org/r/dc7fd70dc31de4d0e820b7646bb78eeb04f80735.1622628333.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-21regulator: add warning flagsMatti Vaittinen
Add 'warning' level events and error flags to regulator core. Current regulator core notifications are used to inform consumers about errors where HW is misbehaving in such way it is assumed to be broken/unrecoverable. There are PMICs which are designed for system(s) that may have use for regulator indications sent before HW is damaged so that some board/consumer specific recovery-event can be performed while continuing most of the normal operations. Add new WARNING level events and notifications to be used for that purpose. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Link: https://lore.kernel.org/r/9b54aa5589ae4b5945d53d114bac3fae55fa4818.1622628333.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-21reboot: Add hardware protection power-offMatti Vaittinen
There can be few cases when we need to shut-down the system in order to protect the hardware. Currently this is done at least by the thermal core when temperature raises over certain limit. Some PMICs can also generate interrupts for example for over-current or over-voltage, voltage drops, short-circuit, ... etc. On some systems these are a sign of hardware failure and only thing to do is try to protect the rest of the hardware by shutting down the system. Add shut-down logic which can be used by all subsystems instead of implementing the shutdown in each subsystem. The logic is stolen from thermal_core with difference of using atomic_t instead of a mutex in order to allow calls directly from IRQ context and changing the WARN() to pr_emerg() as discussed here: https://lore.kernel.org/lkml/YJuPwAZroVZ%2Fw633@alley/ and here: https://lore.kernel.org/linux-iommu/20210331093104.383705-4-geert+renesas@glider.be/ Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/e83ec1ca9408f90c857ea9dcdc57b14d9037b03f.1622628333.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-21printk: Remove trailing semicolon in macrosHuilong Deng
Macros should not use a trailing semicolon. Signed-off-by: Huilong Deng <denghuilong@cdjrlc.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2021-06-21xfrm: replay: remove last replay indirectionFlorian Westphal
This replaces the overflow indirection with the new xfrm_replay_overflow helper. After this, the 'repl' pointer in xfrm_state is no longer needed and can be removed as well. xfrm_replay_overflow() is added in two incarnations, one is used when the kernel is compiled with xfrm hardware offload support enabled, the other when its disabled. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: avoid replay indirectionFlorian Westphal
Add and use xfrm_replay_check helper instead of indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: remove recheck indirectionFlorian Westphal
Adds new xfrm_replay_recheck() helper and calls it from xfrm input path instead of the indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: remove advance indirectionFlorian Westphal
Similar to other patches: add a new helper to avoid an indirection. v2: fix 'net/xfrm/xfrm_replay.c:519:13: warning: 'seq' may be used uninitialized in this function' warning. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: avoid xfrm replay notify indirectionFlorian Westphal
replay protection is implemented using a callback structure and then called via x->repl->notify(), x->repl->recheck(), and so on. all the differect functions are always built-in, so this could be direct calls instead. This first patch prepares for removal of the x->repl structure. Add an enum with the three available replay modes to the xfrm_state structure and then replace all x->repl->notify() calls by the new xfrm_replay_notify() helper. The helper checks the enum internally to adapt behaviour as needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-20cifs: decoding negTokenInit with generic ASN1 decoderHyunchul Lee
Decode negTokenInit with lib/asn1_decoder. For that, add OIDs in linux/oid_registry.h and a negTokenInit ASN1 file, "spnego_negtokeninit.asn1". And define decoder's callback functions, which are the gssapi_this_mech for checking SPENGO oid and the neg_token_init_mech_type for getting authentication mechanisms supported by a server. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflicts in net/can/isotp.c and tools/testing/selftests/net/mptcp/mptcp_connect.sh scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c to include/linux/ptp_clock_kernel.h in -next so re-apply the fix there. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18Merge tag 'net-5.13-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.13-rc7, including fixes from wireless, bpf, bluetooth, netfilter and can. Current release - regressions: - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class() to fix modifying offloaded qdiscs - lantiq: net: fix duplicated skb in rx descriptor ring - rtnetlink: fix regression in bridge VLAN configuration, empty info is not an error, bot-generated "fix" was not needed - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem creation Current release - new code bugs: - ethtool: fix NULL pointer dereference during module EEPROM dump via the new netlink API - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose queue should not be visible to the stack - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs - mlx5e: verify dev is present in get devlink port ndo, avoid a panic Previous releases - regressions: - neighbour: allow NUD_NOARP entries to be force GCed - further fixes for fallout from reorg of WiFi locking (staging: rtl8723bs, mac80211, cfg80211) - skbuff: fix incorrect msg_zerocopy copy notifications - mac80211: fix NULL ptr deref for injected rate info - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs Previous releases - always broken: - bpf: more speculative execution fixes - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local - udp: fix race between close() and udp_abort() resulting in a panic - fix out of bounds when parsing TCP options before packets are validated (in netfilter: synproxy, tc: sch_cake and mptcp) - mptcp: improve operation under memory pressure, add missing wake-ups - mptcp: fix double-lock/soft lookup in subflow_error_report() - bridge: fix races (null pointer deref and UAF) in vlan tunnel egress - ena: fix DMA mapping function issues in XDP - rds: fix memory leak in rds_recvmsg Misc: - vrf: allow larger MTUs - icmp: don't send out ICMP messages with a source address of 0.0.0.0 - cdc_ncm: switch to eth%d interface naming" * tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits) net: ethernet: fix potential use-after-free in ec_bhf_remove selftests/net: Add icmp.sh for testing ICMP dummy address responses icmp: don't send out ICMP messages with a source address of 0.0.0.0 net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY net: ll_temac: Fix TX BD buffer overwrite net: ll_temac: Add memory-barriers for TX BD access net: ll_temac: Make sure to free skb when it is completely used MAINTAINERS: add Guvenc as SMC maintainer bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path bnxt_en: Fix TQM fastpath ring backing store computation bnxt_en: Rediscover PHY capabilities after firmware reset cxgb4: fix wrong shift. mac80211: handle various extensible elements correctly mac80211: reset profile_periodicity/ema_ap cfg80211: avoid double free of PMSR request cfg80211: make certificate generation more robust mac80211: minstrel_ht: fix sample time check net: qed: Fix memcpy() overflow of qed_dcbx_params() net: cdc_eem: fix tx fixup skb leak net: hamradio: fix memory leak in mkiss_close ...
2021-06-18net: wwan: Allow WWAN drivers to provide blocking tx and poll functionStephan Gerhold
At the moment, the WWAN core provides wwan_port_txon/off() to implement blocking writes. The tx() port operation should not block, instead wwan_port_txon/off() should be called when the TX queue is full or has free space again. However, in some cases it is not straightforward to make use of that functionality. For example, the RPMSG API used by rpmsg_wwan_ctrl.c does not provide any way to be notified when the TX queue has space again. Instead, it only provides the following operations: - rpmsg_send(): blocking write (wait until there is space) - rpmsg_trysend(): non-blocking write (return error if no space) - rpmsg_poll(): set poll flags depending on TX queue state Generally that's totally sufficient for implementing a char device, but it does not fit well to the currently provided WWAN port ops. Most of the time, using the non-blocking rpmsg_trysend() in the WWAN tx() port operation works just fine. However, with high-frequent writes to the char device it is possible to trigger a situation where this causes issues. For example, consider the following (somewhat unrealistic) example: # dd if=/dev/zero bs=1000 of=/dev/wwan0qmi0 dd: error writing '/dev/wwan0qmi0': Resource temporarily unavailable 1+0 records out This fails immediately after writing the first record. It's likely only a matter of time until this triggers issues for some real application (e.g. ModemManager sending a lot of large QMI packets). The rpmsg_char device does not have this problem, because it uses rpmsg_trysend() and rpmsg_poll() to support non-blocking operations. Make it possible to use the same in the RPMSG WWAN driver by adding two new optional wwan_port_ops: - tx_blocking(): send data blocking if allowed - tx_poll(): set additional TX poll flags This integrates nicely with the RPMSG API and does not require any change in existing WWAN drivers. With these changes, the dd example above blocks instead of exiting with an error. Cc: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18rpmsg: core: Add driver_data for rpmsg_device_idStephan Gerhold
Most device_id structs provide a driver_data field that can be used by drivers to associate data more easily for a particular device ID. Add the same for the rpmsg_device_id. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18Revert "net: add pf_family_names[] for protocol family"David S. Miller
This reverts commit 1f3c98eaddec857e16a7a1c6cd83317b3dc89438. Does not build... Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18net: add pf_family_names[] for protocol familyYejune Deng
Modify the pr_info content from int to char *, this looks more readable. Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18icmp: don't send out ICMP messages with a source address of 0.0.0.0Toke Høiland-Jørgensen
When constructing ICMP response messages, the kernel will try to pick a suitable source address for the outgoing packet. However, if no IPv4 addresses are configured on the system at all, this will fail and we end up producing an ICMP message with a source address of 0.0.0.0. This can happen on a box routing IPv4 traffic via v6 nexthops, for instance. Since 0.0.0.0 is not generally routable on the internet, there's a good chance that such ICMP messages will never make it back to the sender of the original packet that the ICMP message was sent in response to. This, in turn, can create connectivity and PMTUd problems for senders. Fortunately, RFC7600 reserves a dummy address to be used as a source for ICMP messages (192.0.0.8/32), so let's teach the kernel to substitute that address as a last resort if the regular source address selection procedure fails. Below is a quick example reproducing this issue with network namespaces: ip netns add ns0 ip l add type veth peer netns ns0 ip l set dev veth0 up ip a add 10.0.0.1/24 dev veth0 ip a add fc00:dead:cafe:42::1/64 dev veth0 ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2 ip -n ns0 l set dev veth0 up ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0 ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1 ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0 ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1 tcpdump -tpni veth0 -c 2 icmp & ping -w 1 10.1.0.1 > /dev/null tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64 IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 2 packets captured 2 packets received by filter 0 packets dropped by kernel With this patch the above capture changes to: IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64 IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Juliusz Chroboczek <jch@irif.fr> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18mptcp: dump csum fields in mptcp_dump_mpextGeliang Tang
In mptcp_dump_mpext, dump the csum fields, csum and csum_reqd in struct mptcp_dump_mpext too. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18mptcp: receive checksum for MP_CAPABLE with dataGeliang Tang
This patch added a new member named csum in struct mptcp_options_received. When parsing the MP_CAPABLE with data, if the checksum is enabled, adjust the expected_opsize. If the receiving option length matches the length with the data checksum, get the checksum value and save it in mp_opt->csum. And in mptcp_incoming_options, pass it to mpext->csum. We always parse any csum/nocsum combination and delay the presence check to later code, to allow reset if missing. Additionally, in the TX path, use the newly introduce ext field to avoid MPTCP csum recomputation on TCP retransmission and unneeded csum update on when setting the data fin_flag. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18mptcp: add csum_reqd in mptcp_out_optionsGeliang Tang
This patch added a new member csum_reqd in struct mptcp_out_options and struct mptcp_subflow_request_sock. Initialized it with the helper function mptcp_is_checksum_enabled(). In mptcp_write_options, if this field is enabled, send out the MP_CAPABLE suboption with the MPTCP_CAP_CHECKSUM_REQD flag. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18mptcp: generate the data checksumGeliang Tang
This patch added a new member named csum in struct mptcp_ext, implemented a new function named mptcp_generate_data_checksum(). Generate the data checksum in mptcp_sendmsg_frag, save it in mpext->csum. Note that we must generate the csum for zero window probe, too. Do the csum update incrementally, to avoid multiple csum computation when the data is appended to existing skb. Note that in a later patch we will skip unneeded csum related operation. Changes not included here to keep the delta small. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18mptcp: add csum_enabled in mptcp_sockGeliang Tang
This patch added a new member named csum_enabled in struct mptcp_sock, used a dummy mptcp_is_checksum_enabled() helper to initialize it. Also added a new member named mptcpi_csum_enabled in struct mptcp_info to expose the csum_enabled flag. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18seg6: add support for SRv6 End.DT46 BehaviorAndrea Mayer
IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and End.DT46 Behaviors. The current SRv6 code in the Linux kernel only implements End.DT4 and End.DT6 which can be used respectively to support IPv4-in-IPv6 and IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic. The proposed End.DT46 implementation is meant to support the decapsulation of IPv4 and IPv6 traffic coming from a single SRv6 tunnel. The implementation of the SRv6 End.DT46 Behavior in the Linux kernel greatly simplifies the setup and operations of SRv6 VPNs. The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6} Behaviors implemented so far, because it makes use of a VRF device in order to force the routing lookup into the associated routing table. To make the End.DT46 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. Such constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e.: $ sysctl -wq net.vrf.strict_mode=1 Note that the same approach is used for the SRv6 End.DT4 Behavior and for the End.DT6 Behavior in VRF mode. The command used to instantiate an SRv6 End.DT46 Behavior is straightforward, i.e.: $ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100. [1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking ======================================================================= This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on the performance of SRv6 End.DT4 and End.DT6 Behaviors. In order to verify this, we tested the performance of the newly introduced SRv6 End.DT46 Behavior and compared it with the performance of SRv6 End.DT{4,6} Behaviors, considering both the patched kernel and the kernel before applying the End.DT46 patch (referred to as vanilla kernel). In details, the following decapsulation scenarios were considered: 1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel; 1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel; 2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel; 2.b) SRv6 End.DT4 Behavior on patched kernel; 3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the End.DT46 patch); 3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch). All tests were performed on a testbed deployed on the CloudLab [2] facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4 GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size ~ 100 bytes. Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps; Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps; Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps; Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps; Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps; Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps; Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b) we observe that the performance degradation incurred in using End.DT46 rather than End.DT6 and End.DT4 respectively for IPv6 and IPv4 traffic is minimal, around 0.9% and 1.5%. Such very minimal performance degradation is the price to be paid if one prefers to use a single tunnel capable of handling both types of traffic (IPv4 and IPv6). Comparing the results for End.DT4 and End.DT6 under the patched and the vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the End.DT46 patch has no impact on the performance of End.DT4 and End.DT6. [2] https://www.cloudlab.us Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18Merge tag 'pm-5.13-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Remove recently added frequency invariance support from the CPPC cpufreq driver, because it has turned out to be problematic and it cannot be fixed properly on time for 5.13 (Viresh Kumar)" * tag 'pm-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "cpufreq: CPPC: Add support for frequency invariance"
2021-06-18xsk: Fix missing validation for skb and unaligned modeMagnus Karlsson
Fix a missing validation of a Tx descriptor when executing in skb mode and the umem is in unaligned mode. A descriptor could point to a buffer straddling the end of the umem, thus effectively tricking the kernel to read outside the allowed umem region. This could lead to a kernel crash if that part of memory is not mapped. In zero-copy mode, the descriptor validation code rejects such descriptors by checking a bit in the DMA address that tells us if the next page is physically contiguous or not. For the last page in the umem, this bit is not set, therefore any descriptor pointing to a packet straddling this last page boundary will be rejected. However, the skb path does not use this bit since it copies out data and can do so to two different pages. (It also does not have the array of DMA address, so it cannot even store this bit.) The code just returned that the packet is always physically contiguous. But this is unfortunately also returned for the last page in the umem, which means that packets that cross the end of the umem are being allowed, which they should not be. Fix this by introducing a check for this in the SKB path only, not penalizing the zero-copy path. Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com
2021-06-18blk-mq: fix an IS_ERR() vs NULL bugDan Carpenter
The __blk_mq_alloc_disk() function doesn't return NULLs it returns error pointers. Fixes: b461dfc49eb6 ("blk-mq: add the blk_mq_alloc_disk APIs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/YMyjci35WBqrtqG+@mwanda Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18Merge tag 'devm-helpers-v5.14-1' into review-hansHans de Goede
Signed tag for the immutable devm-helpers branch for merging into the extcon and pdx86 trees.
2021-06-18netfilter: conntrack: pass hook state to log functionsFlorian Westphal
The packet logger backend is unable to provide the incoming (or outgoing) interface name because that information isn't available. Pass the hook state, it contains the network namespace, the protocol family, the network interfaces and other things. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-18sched: Change task_struct::statePeter Zijlstra
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
2021-06-18sched: Add get_current_state()Peter Zijlstra
Remove yet another few p->state accesses. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210611082838.347475156@infradead.org
2021-06-18sched: Introduce task_is_running()Peter Zijlstra
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: task_is_running(p). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
2021-06-18Merge branch 'sched/urgent' into sched/core, to resolve conflictsIngo Molnar
This commit in sched/urgent moved the cfs_rq_is_decayed() function: a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") and this fresh commit in sched/core modified it in the old location: 9e077b52d86a: ("sched/pelt: Check that *_avg are null when *_sum are") Merge the two variants. Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-17driver core: add a helper to setup both the of_node and fwnode of a deviceIoana Ciornei
There are many places where both the fwnode_handle and the of_node of a device need to be populated. Add a function which does both so that we have consistency. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2021-06-17 The following pull-request contains BPF updates for your *net-next* tree. We've added 50 non-merge commits during the last 25 day(s) which contain a total of 148 files changed, 4779 insertions(+), 1248 deletions(-). The main changes are: 1) BPF infrastructure to migrate TCP child sockets from a listener to another in the same reuseport group/map, from Kuniyuki Iwashima. 2) Add a provably sound, faster and more precise algorithm for tnum_mul() as noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan. 3) Streamline error reporting changes in libbpf as planned out in the 'libbpf: the road to v1.0' effort, from Andrii Nakryiko. 4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu. 5) Extends bpf_map_lookup_and_delete_elem() functionality to 4 more map types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek. 6) Support new LLVM relocations in libbpf to make them more linker friendly, also add a doc to describe the BPF backend relocations, from Yonghong Song. 7) Silence long standing KUBSAN complaints on register-based shifts in interpreter, from Daniel Borkmann and Eric Biggers. 8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when target arch cannot be determined, from Lorenz Bauer. 9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson. 10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi. 11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF programs to bpf_helpers.h header, from Florent Revest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17net: fix mistake path for netdev_features_stringsJian Shen
Th_strings arrays netdev_features_strings, tunable_strings, and phy_tunable_strings has been moved to file net/ethtool/common.c. So fixes the comment. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercallAshish Kalra
This hypercall is used by the SEV guest to notify a change in the page encryption status to the hypervisor. The hypercall should be invoked only when the encryption attribute is changed from encrypted -> decrypted and vice versa. By default all guest pages are considered encrypted. The hypercall exits to userspace to manage the guest shared regions and integrate with the userspace VMM's migration code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2Maxim Levitsky
This is a new version of KVM_GET_SREGS / KVM_SET_SREGS. It has the following changes: * Has flags for future extensions * Has vcpu's PDPTRs, allowing to save/restore them on migration. * Lacks obsolete interrupt bitmap (done now via KVM_SET_VCPU_EVENTS) New capability, KVM_CAP_SREGS2 is added to signal the userspace of this ioctl. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUIDVitaly Kuznetsov
Modeled after KVM_CAP_ENFORCE_PV_FEATURE_CPUID, the new capability allows for limiting Hyper-V features to those exposed to the guest in Hyper-V CPUIDs (0x40000003, 0x40000004, ...). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17asm-generic/hyperv: add HV_STATUS_ACCESS_DENIED definitionVitaly Kuznetsov
From TLFSv6.0b, this status means: "The caller did not possess sufficient access rights to perform the requested operation." Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17kvm: add PM-notifierSergey Senozhatsky
Add KVM PM-notifier so that architectures can have arch-specific VM suspend/resume routines. Such architectures need to select CONFIG_HAVE_KVM_PM_NOTIFIER and implement kvm_arch_pm_notifier(). Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210606021045.14159-1-senozhatsky@chromium.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17math64.h: Add mul_s64_u64_shr()Ilias Stamatis
This function is needed for KVM's nested virtualization. The nested TSC scaling implementation requires multiplying the signed TSC offset with the unsigned TSC multiplier. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-2-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: mmu: Add slots_arch_lock for memslot arch fieldsBen Gardon
Add a new lock to protect the arch-specific fields of memslots if they need to be modified in a kvm->srcu read critical section. A future commit will use this lock to lazily allocate memslot rmaps for x86. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-5-bgardon@google.com> [Add Documentation/ hunk. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17Merge tag 'fixes_for_v5.13-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and fanotify fixes from Jan Kara: "A fixup finishing disabling of quotactl_path() syscall (I've missed archs using different way to declare syscalls) and a fix of an fd leak in error handling path of fanotify" * tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: finish disable quotactl_path syscall fanotify: fix copy_event_to_user() fid error clean up
2021-06-17ACPI: scan: Make acpi_walk_dep_device_list()Rafael J. Wysocki
Because acpi_walk_dep_device_list() is only called by the code in the file in which it is defined, make it static, drop the export of it and drop its header from acpi.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2021-06-17ACPI: scan: Define acpi_bus_put_acpi_device() as static inlineRafael J. Wysocki
Since acpi_bus_put_acpi_device() is a synonym for acpi_dev_put(), define it as static inline in analogy with the latter. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2021-06-17nvme.h: add missing nvme_lba_range_type endianness annotationsWesley Sheng
Signed-off-by: Wesley Sheng <wesley.sheng@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add ZBD over ZNS backend supportChaitanya Kulkarni
NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to communicate with a non-volatile memory subsystem using zones for NVMe protocol-based controllers. NVMeOF already support the ZNS NVMe Protocol compliant devices on the target in the passthru mode. There are generic zoned block devices like Shingled Magnetic Recording (SMR) HDDs that are not based on the NVMe protocol. This patch adds ZNS backend support for non-ZNS zoned block devices as NVMeOF targets. This support includes implementing the new command set NVME_CSI_ZNS, adding different command handlers for ZNS command set such as NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone Management Send and NVMe Zone Management Receive. With the new command set identifier, we also update the target command effects logs to reflect the ZNS compliant commands. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add Command Set Identifier supportChaitanya Kulkarni
NVMe TP 4056 allows controllers to support different command sets. NVMeoF target currently only supports namespaces that contain traditional logical blocks that may be randomly read and written. In some applications there is a value in exposing namespaces that contain logical blocks that have special access rules (e.g. sequentially write required namespace such as Zoned Namespace (ZNS)). In order to support the Zoned Block Devices (ZBD) backend, controllers need to have support for ZNS Command Set Identifier (CSI). In this preparation patch, we adjust the code such that it can now support the default command set identifier. We update the namespace data structure to store the CSI value which defaults to NVME_CSI_NVM that represents traditional logical blocks namespace type. The CSI support is required to implement the ZBD backend for NVMeOF with host side NVMe ZNS interface, since ZNS commands belong to the different command set than the default one. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17block: export blk_next_bio()Chaitanya Kulkarni
The block layer provides emulation of zone management operations targeting all zones of a zoned block device only for the zone reset operation (REQ_OP_ZONE_RESET). In order to correctly implement exporting of zoned block devices with NVMeOF, emulating zone management operations targeting all zones of a device is also necessary for the open, close and finish zone operations (REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH). Instead of duplicating the code, export the existing helper from block layer so we can use a bio chaining pattern that is present in the block layer for REQ_OP_ZONE RESET all emulation in the NVMeOF zoned block device backend. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17sched/cpufreq: Consider reduced CPU capacity in energy calculationLukasz Luba
Energy Aware Scheduling (EAS) needs to predict the decisions made by SchedUtil. The map_util_freq() exists to do that. There are corner cases where the max allowed frequency might be reduced (due to thermal). SchedUtil as a CPUFreq governor, is aware of that but EAS is not. This patch aims to address it. SchedUtil stores the maximum allowed frequency in 'sugov_policy::next_freq' field. EAS has to predict that value, which is the real used frequency. That value is made after a call to cpufreq_driver_resolve_freq() which clamps to the CPUFreq policy limits. In the existing code EAS is not able to predict that real frequency. This leads to energy estimation errors. To avoid wrong energy estimation in EAS (due to frequency miss prediction) make sure that the step which calculates Performance Domain frequency, is also aware of the allowed CPU capacity. Furthermore, modify map_util_freq() to not extend the frequency value. Instead, use map_util_perf() to extend the util value in both places: SchedUtil and EAS, but for EAS clamp it to max allowed CPU capacity. In the end, we achieve the same desirable behavior for both subsystems and alignment in regards to the real CPU frequency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (For the schedutil part) Link: https://lore.kernel.org/r/20210614191238.23224-1-lukasz.luba@arm.com