path: root/include
2022-03-03  mm: remove pointless includes from <linux/hmm.h>  (Christoph Hellwig)
hmm.h pulls in the world for no good reason at all. Remove the includes and push a few of them into the users instead. Link: https://lkml.kernel.org/r/20220210072828.2930359-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03  mm: remove the __KERNEL__ guard from <linux/mm.h>  (Christoph Hellwig)
__KERNEL__ ifdefs don't make sense outside of include/uapi/. Link: https://lkml.kernel.org/r/20220210072828.2930359-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03  ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()  (Eric Dumazet)
While investigating why a synchronize_net() has been added recently in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report() might drop skbs in some cases. Discussion about removing synchronize_net() from ipv6_mc_down() will happen in a different thread. Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03  Merge branches 'v5.18/vfio/next/mlx5-migration-v10', 'v5.18/vfio/next/pm-fixes' and 'v5.18/vfio/next/uml-build-fix' into v5.18/vfio/next/next  (Alex Williamson)
2022-03-03  bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_delivery_time()  (Martin KaFai Lau)
* __sk_buff->delivery_time_type: This patch adds __sk_buff->delivery_time_type. It tells whether the delivery_time is stored in __sk_buff->tstamp or not. It will be most useful for ingress to tell if the __sk_buff->tstamp has the (rcv) timestamp or the delivery_time. If delivery_time_type is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp. Two non-zero types are defined for the delivery_time_type: BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC. UNSPEC can only happen at egress because only a mono delivery_time can be forwarded to ingress for now. The clock of an UNSPEC delivery_time can be deduced from skb->sk->sk_clockid, which is how sch_etf does it as well.

* Provide the forwarded delivery_time to tc-bpf@ingress: With the help of the new delivery_time_type, tc-bpf has a way to tell if the __sk_buff->tstamp has the (rcv) timestamp or the delivery_time. During bpf load time, the verifier will learn whether the bpf prog has accessed the new __sk_buff->delivery_time_type. If it has, it means the tc-bpf@ingress is expecting that skb->tstamp could have the delivery_time. The kernel will then read the skb->tstamp as-is during bpf insn rewrite without checking the skb->mono_delivery_time. This is done by adding a new prog->delivery_time_access bit. The same goes for writing skb->tstamp.

* bpf_skb_set_delivery_time(): The bpf_skb_set_delivery_time() helper is added to allow setting both the delivery_time and the delivery_time_type at the same time. If the tc-bpf does not need to change the delivery_time_type, it can directly write to the __sk_buff->tstamp as existing tc-bpf already does. It will be most useful at ingress to change the __sk_buff->tstamp from the (rcv) timestamp to a mono delivery_time and then bpf_redirect_*(). bpf only has a mono clock helper (bpf_ktime_get_ns), the currently known use case is the mono EDT for fq, and only a mono delivery_time can be kept during forwarding for now, so bpf_skb_set_delivery_time() only supports setting BPF_SKB_DELIVERY_TIME_MONO. It can be extended later when use cases come up and the forwarding path also supports other clock bases.

Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
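To make the intended ingress usage concrete, here is a minimal, hypothetical tc-bpf sketch built from the uapi pieces named above; the program structure and the redirect ifindex are illustrative, not taken from the patch:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("tc")
    int ingress_set_edt(struct __sk_buff *skb)
    {
            /* At ingress, delivery_time_type says what skb->tstamp holds. */
            if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_NONE) {
                    /* Replace the (rcv) timestamp (or 0) with a mono
                     * delivery_time before redirecting back to egress. */
                    __u64 now = bpf_ktime_get_ns();

                    bpf_skb_set_delivery_time(skb, now,
                                              BPF_SKB_DELIVERY_TIME_MONO);
            }
            return bpf_redirect(2 /* example ifindex */, 0);
    }

    char _license[] SEC("license") = "GPL";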
2022-03-03  bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress  (Martin KaFai Lau)
The current tc-bpf@ingress reads and writes the __sk_buff->tstamp as a (rcv) timestamp, which currently could either be 0 (not available) or ktime_get_real(). This patch stays backward compatible with the (rcv) timestamp expectation at ingress. If the skb->tstamp has the delivery_time, the bpf insn rewrite will read 0 for tc-bpf running at ingress, as it is not available. When writing at ingress, it will also clear the skb->mono_delivery_time bit.

    /* BPF_READ: a = __sk_buff->tstamp */
    if (!skb->tc_at_ingress || !skb->mono_delivery_time)
            a = skb->tstamp;
    else
            a = 0;

    /* BPF_WRITE: __sk_buff->tstamp = a */
    if (skb->tc_at_ingress)
            skb->mono_delivery_time = 0;
    skb->tstamp = a;

[ A note on BPF_CGROUP_INET_INGRESS, which can also access skb->tstamp. At that point, the skb is delivered locally and skb_clear_delivery_time() has already been done, so the skb->tstamp will only have the (rcv) timestamp. ]

If the tc-bpf@egress writes 0 to skb->tstamp, the skb->mono_delivery_time bit has to be cleared as well. It could be done together during convert_ctx_access(). However, a later patch will also expose the skb->mono_delivery_time bit as __sk_buff->delivery_time_type. Changing the delivery_time_type in the background may surprise the user, e.g. the 2nd read on __sk_buff->delivery_time_type may need a READ_ONCE() to avoid compiler optimization. Thus, anticipating the needs of that later patch, this patch checks for !skb->tstamp after running the tc-bpf and clears the skb->mono_delivery_time bit if needed. See the earlier discussion on v4 [0].

The bpf insn rewrite requires the skb's mono_delivery_time bit and tc_at_ingress bit. They are moved up in sk_buff so that the bpf rewrite can be done at a fixed offset. tc_skip_classify is moved together with tc_at_ingress. To get one bit for mono_delivery_time, csum_not_inet is moved down; this bit is currently used by sctp.

[0]: https://lore.kernel.org/bpf/20220217015043.khqwqklx45c4m4se@kafai-mbp.dhcp.thefacebook.com/

Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: ipv6: Get rcv timestamp if needed when handling hop-by-hop IOAM option  (Martin KaFai Lau)
IOAM is a hop-by-hop option with a temporary IANA allocation (49). Since it is hop-by-hop, it is processed before the input routing decision. One of the traced data fields is the (rcv) timestamp. When a locally generated skb is looping from egress to ingress over a virtual interface (e.g. veth, loopback...), skb->tstamp may have the delivery time before it is known that it will be delivered locally and received by another sk. Like the handling of network tapping (tcpdump) in the earlier patch, this patch gets the timestamp if needed without overwriting the delivery_time in the skb->tstamp. skb_tstamp_cond() is added to do the ktime_get_real() with an extra cond arg to check on top of the netstamp_needed_key static key. skb_tstamp_cond() will also be used in a later patch that needs the netstamp_needed_key check. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: ip: Handle delivery_time in ip defrag  (Martin KaFai Lau)
A later patch will postpone the delivery_time clearing until the stack knows the skb is being delivered locally. That will allow other kernel forwarding paths (e.g. ip[6]_forward) to keep the delivery_time as well. An earlier attempt was to do skb_clear_delivery_time() in ip_local_deliver() and ip6_input(). The discussion [0] requested to move it one step later, into ip_local_deliver_finish() and ip6_input_finish(), so that the delivery_time can be kept for the ip_vs forwarding path as well. To do that, this patch also needs to take care of the (rcv) timestamp use case in ip_is_fragment(). It needs to expect delivery_time in the skb->tstamp, so it needs to save the mono_delivery_time bit in inet_frag_queue such that the delivery_time (if any) can be restored in the final defragmented skb. [Note that this only happens when a locally generated skb loops from egress to ingress over a virtual interface (e.g. veth, loopback...); in that case skb->tstamp may have the delivery time before it is known that it will be delivered locally and received by another sk.] [0]: https://lore.kernel.org/netdev/ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: Set skb->mono_delivery_time and clear it after sch_handle_ingress()  (Martin KaFai Lau)
The previous patches handled the delivery_time before sch_handle_ingress(). This patch can now set the skb->mono_delivery_time to flag that skb->tstamp is used as the mono delivery_time (EDT) instead of the (rcv) timestamp, and also clear it with skb_clear_delivery_time() after sch_handle_ingress(). This will make bpf_redirect_*() keep the mono delivery_time, to be used by a qdisc (fq) of the egress-ing interface. A later patch will postpone the skb_clear_delivery_time() until the stack learns that the skb is being delivered locally, which will make other kernel forwarding paths (ip[6]_forward) able to keep the delivery_time as well. Thus, like the previous patches on using the skb->mono_delivery_time bit, calling skb_clear_delivery_time() is not limited to within CONFIG_NET_INGRESS, to avoid too much code churn in this set. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: Clear mono_delivery_time bit in __skb_tstamp_tx()  (Martin KaFai Lau)
In __skb_tstamp_tx(), it may clone the egress skb and queue the clone to the sk_error_queue. The outgoing skb may have the mono delivery_time while the (rcv) timestamp is expected for the clone, so the skb->mono_delivery_time bit needs to be cleared from the clone. This patch adds the skb->mono_delivery_time clearing to the existing __net_timestamp() and uses it in __skb_tstamp_tx(). The __net_timestamp() fast path usage in dev.c is changed to directly call ktime_get_real() since the mono_delivery_time bit is not set at that point. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: Handle delivery_time in skb->tstamp during network tapping with af_packet  (Martin KaFai Lau)
A later patch will set the skb->mono_delivery_time to flag that skb->tstamp is used as the mono delivery_time (EDT) instead of the (rcv) timestamp. skb_clear_tstamp() will then keep this delivery_time during forwarding. This patch makes network tapping (with af_packet) handle the delivery_time stored in skb->tstamp. Regardless of tapping at ingress or egress, the tapped skb is received by the af_packet socket, so it is ingress to the af_packet socket and it expects the (rcv) timestamp. When tapping at egress, dev_queue_xmit_nit() is used. It already expects that skb->tstamp may have a delivery_time, so it does skb_clone()+net_timestamp_set() to ensure the cloned skb has the (rcv) timestamp before passing it to the af_packet sk. This patch only adds clearing of the skb->mono_delivery_time bit in net_timestamp_set(). When tapping at ingress, the code currently expects skb->tstamp to be either 0 or the (rcv) timestamp. That is, the tapping-at-ingress path already expects that skb->tstamp could be 0 and will get the (rcv) timestamp by ktime_get_real() when needed. There are two cases for tapping at ingress. In one case, af_packet queues the skb to its sk_receive_queue; the skb is either not shared or a new clone is created. The newly added skb_clear_delivery_time() is called to clear the delivery_time (if any) and set the (rcv) timestamp if needed before the skb is queued to the sk_receive_queue. In the other case, the ingress skb is directly copied to the rx_ring and tpacket_get_timestamp() is used to get the (rcv) timestamp. The newly added skb_tstamp() is used in tpacket_get_timestamp() to check the skb->mono_delivery_time bit before returning skb->tstamp. As mentioned earlier, tapping@ingress already expects that the skb may not have the (rcv) timestamp (because no sk has asked for it) and handles this case by directly calling ktime_get_real(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: Add skb_clear_tstamp() to keep the mono delivery_time  (Martin KaFai Lau)
Right now, skb->tstamp is reset to 0 whenever the skb is forwarded. If skb->tstamp has the mono delivery_time, clearing it can hurt performance when the skb finally transmits out to fq@phy-dev. The earlier patch added a skb->mono_delivery_time bit to flag that skb->tstamp carries the mono delivery_time. This patch adds the skb_clear_tstamp() helper, which keeps the mono delivery_time and clears everything else. The delivery_time clearing will be postponed until the stack knows the skb will be delivered locally; that will be done in a later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
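A minimal sketch of what such a helper looks like, assuming the mono_delivery_time bit described in the next entry (illustrative; the real definition lives in <linux/skbuff.h>):

    static inline void skb_clear_tstamp(struct sk_buff *skb)
    {
            /* Keep a mono delivery_time (EDT); only a (rcv)
             * timestamp is cleared on forwarding. */
            if (skb->mono_delivery_time)
                    return;

            skb->tstamp = 0;
    }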
2022-03-03  net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp  (Martin KaFai Lau)
skb->tstamp was first used as the (rcv) timestamp. Its major usage is to report it to the user (e.g. SO_TIMESTAMP). Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP) during egress and used by the qdisc (e.g. sch_fq) to decide when the skb can be passed to the dev. Currently, there is no way to tell whether skb->tstamp has the (rcv) timestamp or the delivery_time, so it is always reset to 0 whenever forwarded between egress and ingress. While it makes sense to always clear the (rcv) timestamp in skb->tstamp to avoid confusing sch_fq, which expects the delivery_time, it is a performance issue [0] to clear the delivery_time if the skb finally egresses to a fq@phy-dev. For example, when forwarding from egress to ingress and then finally back to egress:

    tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns
                             ^                ^
                             reset            rest

This patch adds one bit, skb->mono_delivery_time, to flag that skb->tstamp is storing the mono delivery_time (EDT) instead of the (rcv) timestamp. The current use case is to keep the TCP mono delivery_time (EDT) to be used with sch_fq. A later patch will also allow tc-bpf@ingress to read and change the mono delivery_time. In the future, another bit (e.g. skb->user_delivery_time) can be added for the SCM_TXTIME case, where the clock base is tracked by sk->sk_clockid. [ This patch is a prep work. The following patches will get the other parts of the stack ready first. Then another patch after that will finally set the skb->mono_delivery_time. ] A skb_set_delivery_time() function is added. It is used by tcp_output.c and during ip[6] fragmentation to assign the delivery_time to the skb->tstamp and also set the skb->mono_delivery_time. A note on the change in ip_send_unicast_reply() in ip_output.c: it is only used by TCP to send reset/ack out of a ctl_sk. Like the new skb_set_delivery_time(), this patch sets the skb->mono_delivery_time to 0 for now as a placeholder. It will be enabled in a later patch. A similar case in tcp_ipv6 can be handled with skb_set_delivery_time() in tcp_v6_send_response(). [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
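A plausible shape for the new helper, for illustration; the "kt && mono" guard, i.e. only flagging a non-zero tstamp, is this sketch's reading of the description above:

    static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
                                             bool mono)
    {
            skb->tstamp = kt;
            /* Flag skb->tstamp as a mono delivery_time only when one is set. */
            skb->mono_delivery_time = kt && mono;
    }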
2022-03-03  net: dsa: felix: migrate host FDB and MDB entries when changing tag proto  (Vladimir Oltean)
The "ocelot" and "ocelot-8021q" tagging protocols make use of different hardware resources, and host FDB entries have different destination ports in the switch analyzer module, practically speaking. So when the user requests a tagging protocol change, the driver must migrate all host FDB and MDB entries from the NPI port (in fact CPU port module) towards the same physical port, but this time used as a regular port. It is pointless for the felix driver to keep a copy of the host addresses, when we can create and export DSA helpers for walking through the addresses that it already needs to keep on the CPU port, for refcounting purposes. felix_classify_db() is moved up to avoid a forward declaration. We pass "bool change" because dp->fdbs and dp->mdbs are uninitialized lists when felix_setup() first calls felix_set_tag_protocol(), so we need to avoid calling dsa_port_walk_fdbs() during probe time. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  serial: make uart_console_write->putchar()'s character an unsigned char  (Jiri Slaby)
Currently, uart_console_write->putchar's second parameter (the character) is of type int. That makes little sense, given uart_console_write() accepts the input string as "const char *s" and passes its content -- the characters -- to putchar(). So switch the character's type to unsigned char. We don't use char, as that is signed on some platforms. That would cause trouble for drivers which (implicitly) cast the char to u16 when writing to the device. Sign extension would happen in that case and the value written would be completely different from the provided char. DZ is an example of such a driver -- on MIPS, it uses u16 for dz_out in dz_console_putchar(). Note we do the char -> uchar conversion implicitly in uart_console_write(). Provided we do not change the size of the data type, sign extension does not happen there, so the problem does not arise. This makes the types consistent and unified with the rest of the uart layer, which uses unsigned char in most places already. One exception is xmit_buf, but that is going to be converted later. Cc: Paul Cercueil <paul@crapouillou.net> Cc: Tobias Klauser <tklauser@distanz.ch> Cc: Russell King <linux@armlinux.org.uk> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Ludovic Desroches <ludovic.desroches@microchip.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: bcm-kernel-feedback-list@broadcom.com Cc: Alexander Shiyan <shc_work@mail.ru> Cc: Baruch Siach <baruch@tkos.co.il> Cc: "Maciej W. Rozycki" <macro@orcam.me.uk> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Pengutronix Kernel Team <kernel@pengutronix.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Karol Gugala <kgugala@antmicro.com> Cc: Mateusz Holenko <mholenko@antmicro.com> Cc: Vladimir Zapolskiy <vz@mleia.com> Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Taichi Sugaya <sugaya.taichi@socionext.com> Cc: Takao Orito <orito.takao@socionext.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: "Andreas Färber" <afaerber@suse.de> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Andy Gross <agross@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Cc: Orson Zhai <orsonzhai@gmail.com> Cc: Baolin Wang <baolin.wang7@gmail.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Patrice Chotard <patrice.chotard@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Korsgaard <peter@korsgaard.com> Cc: Michal Simek <michal.simek@xilinx.com> Acked-by: Richard Genoud <richard.genoud@gmail.com> [atmel_serial] Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Paul Cercueil <paul@crapouillou.net> Acked-by: Neil Armstrong <narmstrong@baylibre.com> # meson_serial Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20220303080831.21783-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-03  vfio/pci: Expose vfio_pci_core_aer_err_detected()  (Yishai Hadas)
Expose vfio_pci_core_aer_err_detected() to be used by drivers as part of their pci_error_handlers structure. The next patch, for the mlx5 driver, will use it. Link: https://lore.kernel.org/all/20220224142024.147653-15-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03  vfio: Remove migration protocol v1 documentation  (Jason Gunthorpe)
v1 was never implemented and is replaced by v2. The old uAPI documentation is removed from the header file. The old uAPI definitions are still kept in the header file to ease transition for userspace copying these headers. They will be fully removed down the road. Link: https://lore.kernel.org/all/20220224142024.147653-12-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03  vfio: Extend the device migration protocol with RUNNING_P2P  (Jason Gunthorpe)
The RUNNING_P2P state is designed to support multiple devices in the same VM that are doing P2P transactions between themselves. When in RUNNING_P2P the device must be able to accept incoming P2P transactions but should not generate outgoing P2P transactions. As an optional extension to the mandatory states it is defined as in between STOP and RUNNING: STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP For drivers that are unable to support RUNNING_P2P the core code silently merges RUNNING_P2P and RUNNING together. Unless driver support is present, the new state cannot be used in SET_STATE. Drivers that support this will be required to implement 4 FSM arcs beyond the basic FSM. 2 of the basic FSM arcs become combination transitions. Compared to the v1 clarification, NDMA is redefined into FSM states and is described in terms of the desired P2P quiescent behavior, noting that halting all DMA is an acceptable implementation. Link: https://lore.kernel.org/all/20220224142024.147653-11-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03  vfio: Define device migration protocol v2  (Jason Gunthorpe)
Replace the existing region-based migration protocol with an ioctl-based protocol. The two protocols have the same general semantic behaviors, but the way the data is transported is changed. This is the STOP_COPY portion of the new protocol: it defines the 5 states for basic stop-and-copy migration and the protocol to move the migration data in/out of the kernel. Compared to the clarification of the v1 protocol Alex proposed: https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen this has a few deliberate functional differences:

- ERROR arcs allow the device function to remain unchanged.

- The protocol is not required to return to the original state on transition failure. Instead userspace can execute an unwind back to the original state, reset, or do something else without needing kernel support. This simplifies the kernel design and, should userspace choose a policy like always reset, avoids doing useless work in the kernel on error handling paths.

- PRE_COPY is made optional; userspace must discover it before using it. This reflects the fact that the majority of drivers we are aware of right now will not implement PRE_COPY.

- Segmentation is not part of the data stream protocol; the receiver does not have to reproduce the framing boundaries.

The hybrid FSM for the device_state is described as a Mealy machine by documenting each of the arcs the driver is required to implement. Defining the remaining set of old/new device_state transitions as 'combination transitions', which are naturally defined as taking multiple FSM arcs along the shortest path within the FSM's digraph, allows a complete matrix of transitions. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is defined to replace writing to the device_state field in the region. This allows returning a brand new FD whenever the requested transition opens a data transfer session. The VFIO core code implements the new feature and provides a helper function to the driver. Using the helper, the driver only has to implement 6 of the FSM arcs; the other combination transitions are elaborated consistently from those arcs. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to report the capability for migration and indicate which set of states and arcs are supported by the device. The FSM provides a lot of flexibility to make backwards compatible extensions, but the VFIO_DEVICE_FEATURE also allows for future breaking extensions for scenarios that cannot support even the basic STOP_COPY requirements. The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e. VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state of the VFIO device. Data transfer sessions are now carried over a file descriptor, instead of the region. The FD functions for the lifetime of the data transfer session. read() and write() transfer the data with normal Linux stream FD semantics. This design allows future expansion to support poll(), io_uring, and other performance optimizations. The complicated mmap mode for data transfer is discarded, as current qemu doesn't take meaningful advantage of it, and the new qemu implementation avoids substantially all the performance penalty of using a read() on the region.
Link: https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
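To illustrate the flow, a hedged userspace sketch of driving one arc through the feature ioctl; the structure and constant names follow the uapi described above, while the exact packing of the request is an assumption of this sketch:

    #include <linux/vfio.h>
    #include <sys/ioctl.h>
    #include <string.h>

    /* Move the device to a new migration state; returns the data-transfer
     * FD (>= 0) when the arc opens a session, or -1 on error. */
    static int mig_set_state(int device_fd, __u32 new_state)
    {
            struct {
                    struct vfio_device_feature feat;
                    struct vfio_device_feature_mig_state mig;
            } req;

            memset(&req, 0, sizeof(req));
            req.feat.argsz = sizeof(req);
            req.feat.flags = VFIO_DEVICE_FEATURE_SET |
                             VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
            req.mig.device_state = new_state; /* e.g. VFIO_DEVICE_STATE_STOP_COPY */

            if (ioctl(device_fd, VFIO_DEVICE_FEATURE, &req))
                    return -1;
            return req.mig.data_fd;
    }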
2022-03-03  vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl  (Jason Gunthorpe)
Invoke a new device op 'device_feature' to handle just the data array portion of the command. This lifts the ioctl validation to the core code and makes it simpler for either the core code, or layered drivers, to implement their own feature values. Provide vfio_check_feature() to consolidate checking the flags/etc against what the driver supports. Link: https://lore.kernel.org/all/20220224142024.147653-9-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03  net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS  (Petr Machata)
The offloaded HW stats are designed to allow per-netdevice enablement and disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS, which should be carried by the RTM_SETSTATS message, and expresses a desire to toggle L3 offload xstats on or off. As part of the above, add an exported function rtnl_offload_xstats_notify() that drivers can use when they have installed or deinstalled the counters backing the HW stats. At this point, it is possible to enable, disable and query L3 offload xstats on netdevices. (However there is no driver actually implementing these.) Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
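For orientation, a hedged sketch of the request such a toggle could translate to; the message and attribute names come from this and the next entry, but the exact payload layout is an assumption of this sketch:

    /* Enable L3 offload xstats on ifindex 4 (userspace sketch). */
    struct {
            struct nlmsghdr nlh;
            struct if_stats_msg ifsm;
            struct nlattr attr;
            __u8 enable;
    } req = {
            .nlh.nlmsg_len   = sizeof(req),
            .nlh.nlmsg_type  = RTM_SETSTATS,
            .nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
            .ifsm.ifindex    = 4,
            .attr.nla_type   = IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS,
            .attr.nla_len    = NLA_HDRLEN + sizeof(__u8),
            .enable          = 1, /* 0 would disable */
    };
    /* ...then send req over a NETLINK_ROUTE socket. */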
2022-03-03  net: rtnetlink: Add RTM_SETSTATS  (Petr Machata)
The offloaded HW stats are designed to allow per-netdevice enablement and disablement. These stats are only accessible through RTM_GETSTATS, and therefore should be toggled by an RTM_SETSTATS message. Add it, and the necessary skeleton handler. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: rtnetlink: Add UAPI for obtaining L3 offload xstats  (Petr Machata)
Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute, IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes place in a HW router. The offloaded HW stats are designed to allow per-netdevice enablement and disablement. Additionally, as a netdevice is configured, it may become or cease being suitable for binding of a HW counter. Both of these aspects need to be communicated to the userspace. To that end, add another child attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO:

    - attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO
        - attr nest IFLA_OFFLOAD_XSTATS_L3_STATS
            - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST
                - {0,1} as u8
            - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED
                - {0,1} as u8

Thus this one attribute is a nest that can be used to carry information about various types of HW statistics, and indexing is done simply by wrapping the information for a given statistics suite into the attribute that carries the suite in the RTM_GETSTATS query. At the same time, because _HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is possible through filtering to request only the metadata about individual statistics suites, without having to hit the HW to get the actual counters. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: dev: Add hardware stats support  (Petr Machata)
Offloading switch device drivers may be able to collect statistics of the traffic taking place in the HW datapath that pertains to a certain soft netdevice, such as a VLAN. Add the necessary infrastructure to allow exposing these statistics to the offloaded netdevice in question. The API was shaped by the following considerations:

- Collection of HW statistics is not free: there may be a finite number of counters, and the act of counting may have a performance impact. It is therefore necessary to allow toggling whether HW counting should be done for any particular SW netdevice.

- As the drivers are loaded and removed, a particular device may get offloaded and unoffloaded again. At the same time, the statistics values need to stay monotonic (modulo the eventual 64-bit wraparound), increasing only to reflect traffic measured in the device. To that end, the netdevice keeps around a lazily-allocated copy of struct rtnl_link_stats64. Device drivers then contribute to the values kept therein at various points. Even as the driver goes away, the struct stays around to maintain the statistics values.

- Different HW devices may be able to count different things. The motivation behind this patch in particular is exposure of HW counters on Nvidia Spectrum switches, where the only practical approach to counting traffic on offloaded soft netdevices currently is to use router interface counters and count L3 traffic. Correspondingly, that is the statistics suite added in this patch. Other devices may be able to measure different kinds of traffic, and for that reason the APIs are built to allow uniform access to different statistics suites.

- Because soft netdevices and offloading drivers are only loosely bound, a netdevice uses a notifier chain to communicate with the drivers. Several new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages to the offloading drivers.

- Devices can have various conditions for when a particular counter is available. As the device is configured and reconfigured, the device offload may become or cease being suitable for counter binding. A netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to ping offloading drivers and determine whether anyone currently implements a given statistics suite. This information can then be propagated to user space.

When the driver decides to unoffload a netdevice, it can use a newly-added function, netdev_offload_xstats_report_delta(), to record outstanding collected statistics before destroying the HW counter. This patch adds a helper, call_netdevice_notifiers_info_robust(), for dispatching a notifier with the possibility of unwind when one of the consumers bails. Given the wish to eventually get rid of the global notifier block altogether, this helper only invokes the per-netns notifier block. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  net: rtnetlink: RTM_GETSTATS: Allow filtering inside nests  (Petr Machata)
The filter_mask field of the RTM_GETSTATS header determines which top-level attributes should be included in the netlink response. This saves processing time by only including the bits that the user cares about, instead of always dumping everything. This is doubly important for HW-backed statistics that would typically require a trip to the device to fetch the stats. So far there was only one HW-backed stat suite per attribute. However, IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in the following patches. It would therefore be advantageous to be able to filter within that nest, and select just one or the other HW-backed statistics suite. Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload. The scheme is as follows:

    - RTM_GETSTATS
        - struct if_stats_msg
        - attr nest IFLA_STATS_GET_FILTERS
            - attr IFLA_STATS_LINK_OFFLOAD_XSTATS
                - u32 filter_mask

This scheme reuses the existing enumerators by nesting them in a dedicated context attribute. This is covered by policies as usual, therefore a gradual opt-in is possible. Currently only the IFLA_STATS_LINK_OFFLOAD_XSTATS nest has filtering enabled, because for the SW counters the issue does not seem to be that important. rtnl_offload_xstats_get_size() and _fill() are extended to observe the requested filters. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  page_pool: Add function to batch and return stats  (Joe Damato)
Adds a function page_pool_get_stats which can be used by drivers to obtain stats for a specified page_pool. Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
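A hedged driver-side sketch of using the new function; it assumes the stats layout introduced in the two entries below and that the function returns false when stats support is unavailable, and the function name my_pp_dump_stats is hypothetical:

    #include <net/page_pool.h>

    static void my_pp_dump_stats(struct page_pool *pool)
    {
            struct page_pool_stats stats = {};

            if (!page_pool_get_stats(pool, &stats))
                    return;

            pr_info("alloc fast: %llu, recycle cached: %llu, ring full: %llu\n",
                    stats.alloc_stats.fast, stats.recycle_stats.cached,
                    stats.recycle_stats.ring_full);
    }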
2022-03-03  page_pool: Add recycle stats  (Joe Damato)
Add per-cpu stats tracking page pool recycling events:

- cached: recycling placed page in the page pool cache
- cache_full: page pool cache was full
- ring: page placed into the ptr ring
- ring_full: page released from page pool because the ptr ring was full
- released_refcnt: page released (and not recycled) because refcnt > 1

Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03  page_pool: Add allocation stats  (Joe Damato)
Add per-pool statistics counters for the allocation path of a page pool. These stats are incremented in softirq context, so no locking or per-cpu variables are needed. This code is disabled by default and a kernel config option is provided for users who wish to enable them. The statistics added are:

- fast: successful fast path allocations
- slow: slow path order-0 allocations
- slow_high_order: slow path high order allocations
- empty: ptr ring is empty, so a slow path allocation was forced
- refill: an allocation which triggered a refill of the cache
- waive: pages obtained from the ptr ring that cannot be added to the cache due to a NUMA mismatch

Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-02  tcp: Remove the unused api  (Tao Chen)
Last tcp_write_queue_head() use was removed in commit 114f39feab36 ("tcp: restore autocorking"), so remove it. Signed-off-by: Tao Chen <chentao3@hotmail.com> Link: https://lore.kernel.org/r/SYZP282MB33317DEE1253B37C0F57231E86029@SYZP282MB3331.AUSP282.PROD.OUTLOOK.COM Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02  flow_dissector: Add support for HSR  (Kurt Kanzenbach)
Network drivers such as igb or igc call eth_get_headlen() to determine the header length for their to-be-constructed skbs in the receive path. When running HSR on top of these drivers, this triggers the BUG_ON() in skb_pull(), because the skb headlen is not sufficient for HSR to work correctly and skb_pull() notices that. For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR, which is not correct. The problem is that the flow dissection code does not take HSR into account. Therefore, add support for it. Reported-by: Anthony Harivel <anthony.harivel@linutronix.de> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03  PM: EM: add macro to set .active_power() callback conditionally  (Lukasz Luba)
The Energy Model is able to use new power values coming from DT. Add a new macro which is helpful in setting the .active_power() callback conditionally at setup time. The dual-macro implementation handles both kernel configurations: with EM built in and without it. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
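The dual-macro pattern looks roughly like the following sketch; the macro name mirrors the helper this entry describes, but the exact definition here is illustrative, guarded on CONFIG_ENERGY_MODEL:

    #ifdef CONFIG_ENERGY_MODEL
    #define EM_SET_ACTIVE_POWER_CB(em_cb, cb)   ((em_cb).active_power = cb)
    #else
    #define EM_SET_ACTIVE_POWER_CB(em_cb, cb)   do { } while (0)
    #endif

    /* A driver can then assign the callback at setup time without an
     * #ifdef of its own; with EM disabled this compiles to nothing. */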
2022-03-03OPP: Add "opp-microwatt" supporting codeLukasz Luba
Add new property to the OPP: power value. The OPP entry in the DT can have "opp-microwatt". Add the needed code to handle this new property in the existing infrastructure. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-03-02  drm/amdkfd: Add SMI add event helper  (Philip Yang)
Add a helper to remove duplicate code, unify the event message format and simplify adding new events in the following patches. Use KFD_SMI_EVENT_MSG_SIZE to define the msg size; the same size will be used in user space to allocate the msg receive buffer. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-03  firmware: xilinx: Add ZynqMP SHA API for SHA3 functionality  (Harsha)
This patch adds the zynqmp_pm_sha_hash API to the ZynqMP firmware driver to compute the SHA3 hash of given data. Signed-off-by: Harsha <harsha.harsha@xilinx.com> Signed-off-by: Kalyani Akula <kalyani.akula@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-03  crypto: crypto_xor - use helpers for unaligned accesses  (Ard Biesheuvel)
Dereferencing a misaligned pointer is undefined behavior in C, and may result in codegen on architectures such as ARM that triggers alignment traps and expensive fixups in software. Instead, use the get_unaligned()/put_unaligned() accessors, which are cheap or even completely free when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y. In the converse case, the prior alignment checks ensure that the casts are safe, and so no unaligned accessors are necessary. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
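The pattern, in a self-contained sketch; the helper name xor_words and the word-at-a-time loop are illustrative, while the real crypto_xor keeps its alignment fast path:

    #include <asm/unaligned.h>

    /* XOR len bytes of src into dst a word at a time, tolerating
     * misaligned pointers via the unaligned accessors. */
    static void xor_words(u8 *dst, const u8 *src, unsigned int len)
    {
            while (len >= sizeof(unsigned long)) {
                    unsigned long l = get_unaligned((unsigned long *)dst) ^
                                      get_unaligned((const unsigned long *)src);

                    put_unaligned(l, (unsigned long *)dst);
                    dst += sizeof(unsigned long);
                    src += sizeof(unsigned long);
                    len -= sizeof(unsigned long);
            }
            while (len--)
                    *dst++ ^= *src++;
    }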
2022-03-03  crypto: api - allow algs only in specific constructions in FIPS mode  (Nicolai Stange)
Currently we do not distinguish between algorithms that fail on the self-test vs. those which are disabled in FIPS mode (not allowed). Both are marked as having failed the self-test. Recently the need arose to allow the usage of certain algorithms only as arguments to specific template instantiations in FIPS mode. For example, standalone "dh" must be blocked, but e.g. "ffdhe2048(dh)" is allowed. Other potential use cases include "cbcmac(aes)", which must only be used with ccm(), or "ghash", which must be used only for gcm(). This patch allows this scenario by adding a new flag FIPS_INTERNAL to indicate those algorithms that are not FIPS-allowed. They can then be used as template arguments only, i.e. when looked up via crypto_grab_spawn() to be more specific. The FIPS_INTERNAL bit gets propagated upwards recursively into the surrounding template instances, until the construction eventually matches an explicit testmgr entry with ->fips_allowed being set, if any. The behaviour to skip !->fips_allowed self-test executions in FIPS mode will be retained. Note that this effectively means that FIPS_INTERNAL algorithms are handled very similarly to the INTERNAL ones in this regard. It is expected that the FIPS_INTERNAL algorithms will receive sufficient testing when the larger constructions they're a part of, if any, get exercised by testmgr. Note that as a side-effect of this patch algorithms which are not FIPS-allowed will now return ENOENT instead of ELIBBAD. Hopefully this is not an issue as some people were relying on this already. Link: https://lore.kernel.org/r/YeEVSaMEVJb3cQkq@gondor.apana.org.au Originally-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-03  crypto: dh - split out deserialization code from crypto_dh_decode_key()  (Nicolai Stange)
A subsequent commit will introduce "dh" wrapping templates of the form "ffdhe2048(dh)", "ffdhe3072(dh)" and so on in order to provide built-in support for the well-known safe-prime ffdhe group parameters specified in RFC 7919. Those templates' ->set_secret() will wrap the inner "dh" implementation's ->set_secret() and set the ->p and ->g group parameters as appropriate on the way inwards. More specifically:

- A ffdheXYZ(dh) user would call crypto_dh_encode_key() on a struct dh instance having ->p == ->g == NULL as well as ->p_size == ->g_size == 0 and pass the resulting buffer to the outer ->set_secret().

- This outer ->set_secret() would then decode the struct dh via crypto_dh_decode_key(), set ->p, ->g, ->p_size as well as ->g_size as appropriate for the group in question and encode the struct dh again before passing it further down to the inner "dh"'s ->set_secret().

The problem is that crypto_dh_decode_key() implements some basic checks which would reject parameter sets with ->p_size == 0, and thus the ffdheXYZ templates' ->set_secret() cannot use it as-is for decoding the passed buffer. As the inner "dh"'s ->set_secret() will eventually conduct said checks on the final parameter set anyway, the outer ->set_secret() really only needs the decoding functionality. Split out the pure struct dh decoding part from crypto_dh_decode_key() into the new __crypto_dh_decode_key(). __crypto_dh_decode_key() gets defined in crypto/dh_helper.c, but will have to get called from crypto/dh.c, and thus its declaration must be somehow made available to the latter. Strictly speaking, __crypto_dh_decode_key() is internal to the dh_generic module, yet it would be a bit over the top to introduce a new header like e.g. include/crypto/internal/dh.h containing just a single prototype. Add the __crypto_dh_decode_key() declaration to include/crypto/dh.h instead. Provide a proper kernel-doc annotation, even though __crypto_dh_decode_key() is purposely not on the function list specified in Documentation/crypto/api-kpp.rst. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-03  crypto: dh - constify struct dh's pointer members  (Nicolai Stange)
struct dh contains several pointer members corresponding to DH parameters: ->key, ->p and ->g. A subsequent commit will introduce "dh" wrapping templates of the form "ffdhe2048(dh)", "ffdhe3072(dh)" and so on in order to provide built-in support for the well-known safe-prime ffdhe group parameters specified in RFC 7919. These templates will need to set the group parameter related members of the (serialized) struct dh instance passed to the inner "dh" kpp_alg instance, i.e. ->p and ->g, to some constant, static storage arrays. Turn the struct dh pointer members' types into "pointer to const" in preparation for this. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-03  crypto: dh - remove struct dh's ->q member  (Nicolai Stange)
The only current user of the DH KPP algorithm, the keyctl(KEYCTL_DH_COMPUTE) syscall, doesn't set the domain parameter ->q in struct dh. Remove it and any associated (de)serialization code in crypto_dh_encode_key() and crypto_dh_decode_key(). Adjust the encoded ->secret values in testmgr's DH test vectors accordingly. Note that the dh-generic implementation would have initialized its struct dh_ctx's ->q from the decoded struct dh's ->q, if present. If this struct dh_ctx's ->q would ever have been non-NULL, it would have enabled a full key validation as specified in NIST SP800-56A in dh_is_pubkey_valid(). However, as outlined above, ->q is always NULL in practice and the full key validation code is effectively dead. A later patch will make dh_is_pubkey_valid() calculate Q from P on the fly, if possible, so don't remove struct dh_ctx's ->q now, but leave it there until that has happened. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-03  crypto: kpp - provide support for KPP spawns  (Nicolai Stange)
The upcoming support for the RFC 7919 ffdhe group parameters will be made available in the form of templates like "ffdhe2048(dh)", "ffdhe3072(dh)" and so on. Template instantiations thereof would wrap the inner "dh" kpp_alg and also provide kpp_alg services to the outside again. The primitives needed for providing kpp_alg services from template instances have been introduced with the previous patch. Continue this work now and implement everything needed for enabling template instances to make use of inner KPP algorithms like "dh". More specifically, define a struct crypto_kpp_spawn in close analogy to crypto_skcipher_spawn, crypto_shash_spawn and the like. Implement a crypto_grab_kpp() and crypto_drop_kpp() pair for binding such a spawn to some inner kpp_alg and for releasing it, respectively. Template implementations can instantiate transforms from the underlying kpp_alg by means of the new crypto_spawn_kpp(). Finally, provide the crypto_spawn_kpp_alg() helper for accessing a spawn's underlying kpp_alg during template instantiation. Annotate everything with proper kernel-doc comments, even though include/crypto/internal/kpp.h is not considered for the generated docs. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-03  crypto: kpp - provide support for KPP template instances  (Nicolai Stange)
The upcoming support for the RFC 7919 ffdhe group parameters will be made available in the form of templates like "ffdhe2048(dh)", "ffdhe3072(dh)" and so on. Template instantiations thereof would wrap the inner "dh" kpp_alg and also provide kpp_alg services to the outside again. Furthermore, it might perhaps be desirable to provide KDF templates in the future, which would similarly wrap an inner kpp_alg and present themselves to the outside as another kpp_alg, transforming the shared secret on its way out. Introduce the bits needed for supporting KPP template instances. Everything related to inner kpp_alg spawns potentially being held by such template instances will be deferred to a subsequent patch in order to facilitate review. Define struct kpp_instance in close analogy to the already existing skcipher_instance, shash_instance and the like, but wrapping a struct kpp_alg. Implement the new kpp_register_instance() template instance registration primitive. Provide some helper functions for:

- going back and forth between a generic struct crypto_instance and the new struct kpp_instance,
- obtaining the instantiating kpp_instance from a crypto_kpp transform, and
- accessing a given kpp_instance's implementation specific context data.

Annotate everything with proper kernel-doc comments, even though include/crypto/internal/kpp.h is not considered for the generated docs. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
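Pulling the two kpp entries together, a heavily hedged sketch of how a template's ->create() might combine these primitives; the signatures, the ctx layout and the error handling shown are assumptions of this sketch, not quotes from the patches:

    /* Hypothetical template ->create(): bind the inner "dh" via a spawn
     * kept in the instance context, then register the kpp instance. */
    static int my_tmpl_create(struct crypto_template *tmpl, struct rtattr **tb)
    {
            struct kpp_instance *inst;
            struct crypto_kpp_spawn *spawn;
            int err;

            inst = kzalloc(sizeof(*inst) + sizeof(*spawn), GFP_KERNEL);
            if (!inst)
                    return -ENOMEM;

            spawn = kpp_instance_ctx(inst);
            err = crypto_grab_kpp(spawn, kpp_crypto_instance(inst),
                                  "dh", 0, 0);
            if (!err)
                    err = kpp_register_instance(tmpl, inst);
            /* Error unwinding (crypto_drop_kpp(), kfree()) omitted. */
            return err;
    }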
2022-03-02  ACPI: bus: Introduce acpi_bus_for_each_dev()  (Rafael J. Wysocki)
In order to avoid exposing acpi_bus_type to modules, introduce an acpi_bus_for_each_dev() helper for iterating over all ACPI device objects and make typec_link_ports() use it instead of the raw bus_for_each_dev() along with acpi_bus_type. Having done that, drop the acpi_bus_type export. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
2022-03-02  NFS: Optimise away the previous cookie field  (Trond Myklebust)
Replace the 'previous cookie' field in struct nfs_entry with the array->last_cookie. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02  NFS: Fix up forced readdirplus  (Trond Myklebust)
Avoid clearing the entire readdir page cache if we're just doing forced readdirplus for the 'ls -l' heuristic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02  NFS: Convert readdir page cache to use a cookie based index  (Trond Myklebust)
Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway. This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain. The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously. The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02  NFS: Improve heuristic for readdirplus  (Trond Myklebust)
The heuristic for readdirplus is designed to try to detect 'ls -l' and similar patterns. It does so by looking for cache hit/miss patterns in both the attribute cache and in the dcache of the files in a given directory, and then sets a flag for the readdirplus code to interpret. The problem with this approach is that a single attribute or dcache miss can cause the NFS code to force a refresh of the attributes for the entire set of files contained in the directory. To be able to make a more nuanced decision, let's sample the number of hits and misses in the set of open directory descriptors. That allows us to set thresholds at which we start preferring READDIRPLUS over regular READDIR, or at which we start to force a re-read of the remaining readdir cache using READDIRPLUS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02  NFS: Adjust the amount of readahead performed by NFS readdir  (Trond Myklebust)
The current NFS readdir code will always try to maximise the amount of readahead it performs on the assumption that we can cache anything that isn't immediately read by the process. There are several cases where this assumption breaks down, including when the 'ls -l' heuristic kicks in to try to force use of readdirplus as a batch replacement for lookup/getattr. This patch therefore tries to tone down the amount of readahead we perform, and adjust it to try to match the amount of data being requested by user space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02  NFS: Don't re-read the entire page cache to find the next cookie  (Trond Myklebust)
If the page cache entry that was last read gets invalidated for some reason, then make sure we can re-create it on the next call to readdir. This, combined with the cache page validation, allows us to reuse the cached value of page-index on successive calls to nfs_readdir. Credit is due to Benjamin Coddington for showing that the concept works, and that it allows for improved cache sharing between processes even in the case where pages are lost due to LRU or active invalidation. Suggested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02  ASoC: soc-acpi: remove sof_fw_filename  (Pierre-Louis Bossart)
We've been using a default firmware name for each PCI/ACPI/OF platform for a while. The machine-specific sof_fw_filename is in practice not different from the default, and newer devices don't set this field, so let's remove the redundant definitions. When OEMs modify the base firmware, they can keep the same firmware name but store the file in a separate directory. Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20220301194903.60859-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-01  scsi: core: sd: Add silence_suspend flag to suppress some PM messages  (Adrian Hunter)
Kernel messages produced during runtime PM can cause a never-ending cycle because user space utilities (e.g. journald or rsyslog) write the messages back to storage, causing runtime resume, more messages, and so on. Messages that tell of things that are expected to happen are arguably unnecessary, so add a flag to suppress them. This flag is used by the UFS driver. Link: https://lore.kernel.org/r/20220228113652.970857-2-adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
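The intended usage pattern, sketched under the assumption that the flag lives on struct scsi_device and that sd's suspend path guards its prints with it; names beyond silence_suspend are illustrative:

    /* In a low-level driver (e.g. UFS) that knows suspend/resume messages
     * are routine, opt out before runtime PM starts cycling: */
    sdev->silence_suspend = 1;

    /* ...so that sd's suspend path can stay quiet: */
    if (!sdkp->device->silence_suspend)
            sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");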