summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-30drop_monitor: Prepare probe functions for devlink tracepointIdo Schimmel
Drop monitor supports two alerting modes: Summary and packet. Prepare a probe function for each, so that they could be later registered on the devlink tracepoint by calling register_trace_devlink_trap_report(), based on the configured alerting mode. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30devlink: Add a tracepoint for trap reportsIdo Schimmel
Add a tracepoint for trap reports so that drop monitor could register its probe on it. Use trace_devlink_trap_report_enabled() to avoid wasting cycles setting the trap metadata if the tracepoint is not enabled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30Merge tag 'linux-can-next-for-5.10-20200930' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2020-09-30 this is a pull request of 13 patches for net-next. The first 10 target the mcp25xxfd driver (which is renamed to mcp251xfd during this series). The first two patches are by Thomas Kopp, which adds reference to the just related errata and updates the documentation and log messages. Dan Carpenter's patch fixes a resource leak during ifdown. A patch by me adds the missing initialization of a variable. Oleksij Rempel updates the DT binding documentation as requested by Rob Herring. The next 5 patches are by Thomas Kopp and me. During review Geert Uytterhoeven suggested to use "microchip,mcp251xfd" instead of "microchip,mcp25xxfd" as the DT autodetection compatible to avoid clashes with future but incompatible devices. We decided not only to rename the compatible but the whole driver from "mcp25xxfd" to "mcp251xfd". This is done in several patches. Joakim Zhang contributes three patches for the flexcan driver. The first one adds support for the ECC feature, which is implemented on some modern IP cores, by initializing the controller's memory during ifup. The next patch adds support for the i.MX8MP (which supports ECC) and the last patch properly disables the runtime PM if device registration fails. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30Merge branch 'ionic-watchdog-training'David S. Miller
Shannon Nelson says: ==================== ionic watchdog training Our link watchdog displayed a couple of unfriendly behaviors in some recent stress testing. These patches change the startup and stop timing in order to be sure that expected structures are ready to be used by the watchdog. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30ionic: prevent early watchdog checkShannon Nelson
In one corner case scenario, the driver device lif setup can get delayed such that the ionic_watchdog_cb() timer goes off before the ionic->lif is set, thus causing a NULL pointer panic. We catch the problem by checking for a NULL lif just a little earlier in the callback. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30ionic: stop watchdog timer earlier on removeShannon Nelson
We need to be better at making sure we don't have a link check watchdog go off while we're shutting things down, so let's stop the timer as soon as we start the remove. Meanwhile, since that was the only thing in ionic_dev_teardown(), simplify and remove that function. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30Merge branch 'Fix-bugs-in-Octeontx2-netdev-driver'David S. Miller
Geetha sowjanya says: ==================== Fix bugs in Octeontx2 netdev driver In existing Octeontx2 network drivers code has issues like stale entries in broadcast replication list, missing L3TYPE for IPv6 frames, running tx queues on error and race condition in mbox reset. This patch set fixes the above issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30octeontx2-pf: Fix synchnorization issue in mboxHariprasad Kelam
Mbox implementation in octeontx2 driver has three states alloc, send and reset in mbox response. VF allocate and sends message to PF for processing, PF ACKs them back and reset the mbox memory. In some case we see synchronization issue where after msgs_acked is incremented and before mbox_reset API is called, if current execution is scheduled out and a different thread is scheduled in which checks for msgs_acked. Since the new thread sees msgs_acked == msgs_sent it will try to allocate a new message and to send a new mbox message to PF.Now if mbox_reset is scheduled in, PF will see '0' in msgs_send. This patch fixes the issue by calling mbox_reset before incrementing msgs_acked flag for last processing message and checks for valid message size. Fixes: d424b6c02 ("octeontx2-pf: Enable SRIOV and added VF mbox handling") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30octeontx2-pf: Fix the device state on errorHariprasad Kelam
Currently in otx2_open on failure of nix_lf_start transmit queues are not stopped which are already started in link_event. Since the tx queues are not stopped network stack still try's to send the packets leading to driver crash while access the device resources. Fixes: 50fe6c02e ("octeontx2-pf: Register and handle link notifications") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30octeontx2-pf: Fix TCP/UDP checksum offload for IPv6 framesGeetha sowjanya
For TCP/UDP checksum offload feature in Octeontx2 expects L3TYPE to be set irrespective of IP header checksum is being offloaded or not. Currently for IPv6 frames L3TYPE is not being set resulting in packet drop with checksum error. This patch fixes this issue. Fixes: 3ca6c4c88 ("octeontx2-pf: Add packet transmission support") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30octeontx2-af: Fix enable/disable of default NPC entriesSubbaraya Sundeep
Packet replication feature present in Octeontx2 is a hardware linked list of PF and its VF interfaces so that broadcast packets are sent to all interfaces present in the list. It is driver job to add and delete a PF/VF interface to/from the list when the interface is brought up and down. This patch fixes the npc_enadis_default_entries function to handle broadcast replication properly if packet replication feature is present. Fixes: 40df309e4166 ("octeontx2-af: Support to enable/disable default MCAM entries") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30Merge branch '100GbE' of https://github.com/anguy11/net-queueDavid S. Miller
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2020-09-30 This series contains updates to ice driver only. Jake increases the wait time for firmware response as it can take longer than the current wait time. Preserves the NVM capabilities of the device in safe mode so the device reports its NVM update capabilities properly when in this state. v2: Added cover letter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30MAINTAINERS: Add Pali Rohár as aardvark PCI maintainerPali Rohár
Link: https://lore.kernel.org/r/20200925092115.16546-1-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Petazzzoni <thomas.petazzoni@bootlin.com>
2020-09-30arm64: permit ACPI core to map kernel memory used for table overridesArd Biesheuvel
Jonathan reports that the strict policy for memory mapped by the ACPI core breaks the use case of passing ACPI table overrides via initramfs. This is due to the fact that the memory type used for loading the initramfs in memory is not recognized as a memory type that is typically used by firmware to pass firmware tables. Since the purpose of the strict policy is to ensure that no AML or other ACPI code can manipulate any memory that is used by the kernel to keep its internal state or the state of user tasks, we can relax the permission check, and allow mappings of memory that is reserved and marked as NOMAP via memblock, and therefore not covered by the linear mapping to begin with. Fixes: 1583052d111f ("arm64/acpi: disallow AML memory opregions to access kernel memory") Fixes: 325f5585ec36 ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-30Merge branch 'tcp-exponential-backoff-in-tcp_send_ack'David S. Miller
Eric Dumazet says: ==================== tcp: exponential backoff in tcp_send_ack() We had outages caused by repeated skb allocation failures in tcp_send_ack() It is time to add exponential backoff to reduce number of attempts. Before doing so, first patch removes icsk_ack.blocked to make room for a new field (icsk_ack.retry) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30tcp: add exponential backoff in __tcp_send_ack()Eric Dumazet
Whenever host is under very high memory pressure, __tcp_send_ack() skb allocation fails, and we setup a 200 ms (TCP_DELACK_MAX) timer before retrying. On hosts with high number of TCP sockets, we can spend considerable amount of cpu cycles in these attempts, add high pressure on various spinlocks in mm-layer, ultimately blocking threads attempting to free space from making any progress. This patch adds standard exponential backoff to avoid adding fuel to the fire. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30inet: remove icsk_ack.blockedEric Dumazet
TCP has been using it to work around the possibility of tcp_delack_timer() finding the socket owned by user. After commit 6f458dfb4092 ("tcp: improve latencies of timer triggered events") we added TCP_DELACK_TIMER_DEFERRED atomic bit for more immediate recovery, so we can get rid of icsk_ack.blocked This frees space that following patch will reuse. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Another batch of clk driver fixes: - Make sure DRAM and ChipID region doesn't get disabled on Exynos - Fix a SATA failure on Tegra - Fix the emac_ptp clk divider on stratix10" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED clk: tegra: Fix missing prototype for tegra210_clk_register_emc() clk: tegra: Always program PLL_E when enabled clk: tegra: Capitalization fixes clk: samsung: Keep top BPLL mux on Exynos542x enabled
2020-09-30net: macb: move pdata to private headerAlexandre Belloni
struct macb_platform_data is only used by macb_pci to register the platform device, move its definition to cadence/macb.h and remove platform_data/macb.h Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30Merge branch 'mlxsw-PFC-and-headroom-selftests'David S. Miller
Petr Machata says: ==================== mlxsw: PFC and headroom selftests Recent changes in the headroom management code made it clear that an automated way of testing this functionality is needed. This patchset brings two tests: a synthetic headroom behavior test, which verifies mechanics of headroom management. And a PFC test, which verifies whether this behavior actually translates into a working lossless configuration. Both of these tests rely on mlnx_qos[1], a tool that interfaces with Linux DCB API. The tool was originally written to work with Mellanox NICs, but does not actually rely on anything Mellanox-specific, and can be used for mlxsw as well as for any other NIC-like driver. Unlike Open LLDP it does support buffer commands and permits a fire-and-forget approach to configuration, which makes it very handy for writing of selftests. Patches #1-#3 extend the selftest devlink_lib.sh in various ways. Patch #4 then adds a helper wrapper for mlnx_qos to mlxsw's qos_lib.sh. Patch #5 adds a test for management of port headroom. Patch #6 adds a PFC test. [1] https://github.com/Mellanox/mlnx-tools/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30selftests: mlxsw: Add a PFC testPetr Machata
Add a test for PFC. Runs 10MB of traffic through a bottleneck and checks that none of it gets lost. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30selftests: mlxsw: Add headroom handling testPetr Machata
Add a test for headroom configuration. This covers projection of ETS configuration to ingress, PFC, adjustments for MTU, the qdisc / TC mode and the effect of egress SPAN session on buffer configuration. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30selftests: mlxsw: qos_lib: Add a wrapper for running mlnx_qosPetr Machata
mlnx_qos is a script for configuration of DCB. Despite the name it is not actually Mellanox-specific in any way. It is currently the only ad-hoc tool available (in contrast to a daemon that manages an interface on an ongoing basis). However, it is very verbose and parsing out error messages is not really possible. Add a wrapper that makes it easier to use the tool. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30selftests: forwarding: devlink_lib: Support port-less topologiesPetr Machata
Some selftests may not need any actual ports. Technically those are not forwarding selftests, but devlink_lib can still be handy. Fall back on NETIF_NO_CABLE in those cases. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30selftests: forwarding: devlink_lib: Add devlink_cell_size_get()Petr Machata
Add a helper that answers the cell size of the devlink device. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30selftests: forwarding: devlink_lib: Split devlink_..._set() into save & setPetr Machata
Changing pool type from static to dynamic causes reinterpretation of threshold values. They therefore need to be saved before pool type is changed, then the pool type can be changed, and then the new values need to be set up. For that reason, set cannot subsume save, because it would be saving the wrong thing, with possibly a nonsensical value, and restore would then fail to restore the nonsensical value. Thus extract a _save() from each of the relevant _set()'s. This way it is possible to save everything up front, then to tweak it, and then restore in the required order. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30can: flexcan: disable runtime PM if register flexcandev failedJoakim Zhang
Disable runtime PM if register flexcandev failed, and balance reference of usage_count. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20200929211557.14153-4-qiangqing.zhang@nxp.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: flexcan: add flexcan driver for i.MX8MPJoakim Zhang
Add flexcan driver for i.MX8MP, which supports CAN FD and ECC. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20200929211557.14153-3-qiangqing.zhang@nxp.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: flexcan: initialize all flexcan memory for ECC functionJoakim Zhang
One issue was reported at a baremetal environment, which is used for FPGA verification. "The first transfer will fail for extended ID format(for both 2.0B and FD format), following frames can be transmitted and received successfully for extended format, and standard format don't have this issue. This issue occurred randomly with high possiblity, when it occurs, the transmitter will detect a BIT1 error, the receiver a CRC error. According to the spec, a non-correctable error may cause this transfer failure." With FLEXCAN_QUIRK_DISABLE_MECR quirk, it supports correctable errors, disable non-correctable errors interrupt and freeze mode. Platform has ECC hardware support, but select this quirk, this issue may not come to light. Initialize all FlexCAN memory before accessing them, at least it can avoid non-correctable errors detected due to memory uninitialized. The internal region can't be initialized when the hardware doesn't support ECC. According to IMX8MPRM, Rev.C, 04/2020. There is a NOTE at the section 11.8.3.13 Detection and correction of memory errors: "All FlexCAN memory must be initialized before starting its operation in order to have the parity bits in memory properly updated. CTRL2[WRMFRZ] grants write access to all memory positions that require initialization, ranging from 0x080 to 0xADF and from 0xF28 to 0xFFF when the CAN FD feature is enabled. The RXMGMASK, RX14MASK, RX15MASK, and RXFGMASK registers need to be initialized as well. MCR[RFEN] must not be set during memory initialization." Memory range from 0x080 to 0xADF, there are reserved memory (unimplemented by hardware, e.g. only configure 64 MBs), these memory can be initialized or not. In this patch, initialize all flexcan memory which includes reserved memory. In this patch, create FLEXCAN_QUIRK_SUPPORT_ECC for platforms which has ECC feature. If you have a ECC platform in your hand, please select this qurik to initialize all flexcan memory firstly, then you can select FLEXCAN_QUIRK_DISABLE_MECR to only enable correctable errors. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20200929211557.14153-2-qiangqing.zhang@nxp.com [mkl: wrap long lines] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp251xfd: rename all remaining occurrence to mcp251xfdMarc Kleine-Budde
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver, which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming, but incompatible chips. In the previous patch the autodetect DT compatbile has been renamed to "microchip,mcp251xfd", this patch changes all non user facing occurrence of "mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD". [1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com Link: https://lore.kernel.org/r/20200930091424.792165-10-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp251xfd: rename all user facing strings to mcp251xfdMarc Kleine-Budde
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver, which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming, but incompatible chips. In the previous patch the autodetect DT compatbile has been renamed to "microchip,mcp251xfd", this patch changes all user facing strings from "mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD", including: - kconfig symbols - name of kernel module - DT and SPI compatible [1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com Link: https://lore.kernel.org/r/20200930091424.792165-9-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30can: mcp251xfd: rename driver files and subdir to mcp251xfdMarc Kleine-Budde
In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver, which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming, but incompatible chips. In the previous patch the autodetect DT compatbile has been renamed to "microchip,mcp251xfd", this patch changes the name of the driver subdir and the individual files accordinly. [1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com Link: https://lore.kernel.org/r/20200930091424.792165-8-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-30selftests/bpf: Test "incremental" btf_dump in C formatAndrii Nakryiko
Add test validating that btf_dump works fine with BTFs that are modified and incrementally generated. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929232843.1249318-5-andriin@fb.com
2020-09-30libbpf: Make btf_dump work with modifiable BTFAndrii Nakryiko
Ensure that btf_dump can accommodate new BTF types being appended to BTF instance after struct btf_dump was created. This came up during attemp to use btf_dump for raw type dumping in selftests, but given changes are not excessive, it's good to not have any gotchas in API usage, so I decided to support such use case in general. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com
2020-09-30Merge branch 'Various BPF helper improvements'Alexei Starovoitov
Daniel Borkmann says: ==================== This series adds two BPF helpers, that is, one for retrieving the classid of an skb and another one to redirect via the neigh subsystem, and improves also the cookie helpers by removing the atomic counter. I've also added the bpf_tail_call_static() helper to the libbpf API that we've been using in Cilium for a while now, and last but not least the series adds a few selftests. For details, please check individual patches, thanks! v3 -> v4: - Removed out_rec error path (Martin) - Integrate BPF_F_NEIGH flag into rejecting invalid flags (Martin) - I think this way it's better to avoid bit overlaps given it's right in the place that would need to be extended on new flags v2 -> v3: - Removed double skb->dev = dev assignment (David) - Added headroom check for v6 path (David) - Set set flowi4_proto for ip_route_output_flow (David) - Rebased onto latest bpf-next v1 -> v2: - Rework cookie generator to support nested contexts (Eric) - Use ip_neigh_gw6() and container_of() (David) - Rename __throw_build_bug() and improve comments (Andrii) - Use bpf_tail_call_static() also in BPF samples (Maciej) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-30bpf, selftests: Add redirect_neigh selftestDaniel Borkmann
Add a small test that exercises the new redirect_neigh() helper for the IPv4 and IPv6 case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/0fc7d9c5f9a6cc1c65b0d3be83b44b1ec9889f43.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, selftests: Use bpf_tail_call_static where appropriateDaniel Borkmann
For those locations where we use an immediate tail call map index use the newly added bpf_tail_call_static() helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/3cfb2b799a62d22c6e7ae5897c23940bdcc24cbc.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, libbpf: Add bpf_tail_call_static helper for bpf programsDaniel Borkmann
Port of tail_call_static() helper function from Cilium's BPF code base [0] to libbpf, so others can easily consume it as well. We've been using this in production code for some time now. The main idea is that we guarantee that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the JITed BPF insns with direct jumps instead of having to fall back to using expensive retpolines. By using inline asm, we guarantee that the compiler won't merge the call from different paths with potentially different content of r2/r3. We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable()) in different places as a neat trick to trigger compilation errors when compiler does not remove code at compilation time. This works for the BPF back end as it does not implement the __builtin_trap(). [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net
2020-09-30bpf: Add redirect_neigh helper as redirect drop-inDaniel Borkmann
Add a redirect_neigh() helper as redirect() drop-in replacement for the xmit side. Main idea for the helper is to be very similar in semantics to the latter just that the skb gets injected into the neighboring subsystem in order to let the stack do the work it knows best anyway to populate the L2 addresses of the packet and then hand over to dev_queue_xmit() as redirect() does. This solves two bigger items: i) skbs don't need to go up to the stack on the host facing veth ingress side for traffic egressing the container to achieve the same for populating L2 which also has the huge advantage that ii) the skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing layer on the host stack. Given that skb->sk neither gets orphaned when crossing the netns as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.") the helper can then push the skbs directly to the phys device where FQ scheduler can do its work and TCP stack gets proper backpressure given we hold on to skb->sk as long as skb is still residing in queues. With the helper used in BPF data path to then push the skb to the phys device, I observed a stable/consistent TCP_STREAM improvement on veth devices for traffic going container -> host -> host -> container from ~10Gbps to ~15Gbps for a single stream in my test environment. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, net: Rework cookie generator as per-cpu oneDaniel Borkmann
With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
2020-09-30bpf: Add classid helper only based on skb->skDaniel Borkmann
Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid from v2 hooks"), add a helper to retrieve cgroup v1 classid solely based on the skb->sk, so it can be used as key as part of BPF map lookups out of tc from host ns, in particular given the skb->sk is retained these days when crossing net ns thanks to 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb."). This is similar to bpf_skb_cgroup_id() which implements the same for v2. Kubernetes ecosystem is still operating on v1 however, hence net_cls needs to be used there until this can be dropped in with the v2 helper of bpf_skb_cgroup_id(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
2020-09-30RISC-V: Check clint_time_val before useAnup Patel
The NoMMU kernel is broken for QEMU virt machine from Linux-5.9-rc6 because clint_time_val is used even before CLINT driver is probed at following places: 1. rand_initialize() calls get_cycles() which in-turn uses clint_time_val 2. boot_init_stack_canary() calls get_cycles() which in-turn uses clint_time_val The issue#1 (above) is fixed by providing custom random_get_entropy() for RISC-V NoMMU kernel. For issue#2 (above), we remove dependency of boot_init_stack_canary() on get_cycles() and this is aligned with the boot_init_stack_canary() implementations of ARM, ARM64 and MIPS kernel. Fixes: d5be89a8d118 ("RISC-V: Resurrect the MMIO timer implementation for M-mode systems") Signed-off-by: Anup Patel <anup.patel@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-09-30btrfs: fix filesystem corruption after a device replaceFilipe Manana
We use a device's allocation state tree to track ranges in a device used for allocated chunks, and we set ranges in this tree when allocating a new chunk. However after a device replace operation, we were not setting the allocated ranges in the new device's allocation state tree, so that tree is empty after a device replace. This means that a fitrim operation after a device replace will trim the device ranges that have allocated chunks and extents, as we trim every range for which there is not a range marked in the device's allocation state tree. It is also important during chunk allocation, since the device's allocation state is used to determine if a range is already allocated when allocating a new chunk. This is trivial to reproduce and the following script triggers the bug: $ cat reproducer.sh #!/bin/bash DEV1="/dev/sdg" DEV2="/dev/sdh" DEV3="/dev/sdi" wipefs -a $DEV1 $DEV2 $DEV3 &> /dev/null # Create a raid1 test fs on 2 devices. mkfs.btrfs -f -m raid1 -d raid1 $DEV1 $DEV2 > /dev/null mount $DEV1 /mnt/btrfs xfs_io -f -c "pwrite -S 0xab 0 10M" /mnt/btrfs/foo echo "Starting to replace $DEV1 with $DEV3" btrfs replace start -B $DEV1 $DEV3 /mnt/btrfs echo echo "Running fstrim" fstrim /mnt/btrfs echo echo "Unmounting filesystem" umount /mnt/btrfs echo "Mounting filesystem in degraded mode using $DEV3 only" wipefs -a $DEV1 $DEV2 &> /dev/null mount -o degraded $DEV3 /mnt/btrfs if [ $? -ne 0 ]; then dmesg | tail echo echo "Failed to mount in degraded mode" exit 1 fi echo echo "File foo data (expected all bytes = 0xab):" od -A d -t x1 /mnt/btrfs/foo umount /mnt/btrfs When running the reproducer: $ ./replace-test.sh wrote 10485760/10485760 bytes at offset 0 10 MiB, 2560 ops; 0.0901 sec (110.877 MiB/sec and 28384.5216 ops/sec) Starting to replace /dev/sdg with /dev/sdi Running fstrim Unmounting filesystem Mounting filesystem in degraded mode using /dev/sdi only mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/sdi, missing codepage or helper program, or other error. [19581.748641] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi started [19581.803842] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi finished [19582.208293] BTRFS info (device sdi): allowing degraded mounts [19582.208298] BTRFS info (device sdi): disk space caching is enabled [19582.208301] BTRFS info (device sdi): has skinny extents [19582.212853] BTRFS warning (device sdi): devid 2 uuid 1f731f47-e1bb-4f00-bfbb-9e5a0cb4ba9f is missing [19582.213904] btree_readpage_end_io_hook: 25839 callbacks suppressed [19582.213907] BTRFS error (device sdi): bad tree block start, want 30490624 have 0 [19582.214780] BTRFS warning (device sdi): failed to read root (objectid=7): -5 [19582.231576] BTRFS error (device sdi): open_ctree failed Failed to mount in degraded mode So fix by setting all allocated ranges in the replace target device when the replace operation is finishing, when we are holding the chunk mutex and we can not race with new chunk allocations. A test case for fstests follows soon. Fixes: 1c11b63eff2a67 ("btrfs: replace pending/pinned chunks lists with io tree") CC: stable@vger.kernel.org # 5.2+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-30btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locksJosef Bacik
When closing and freeing the source device we could end up doing our final blkdev_put() on the bdev, which will grab the bd_mutex. As such we want to be holding as few locks as possible, so move this call outside of the dev_replace->lock_finishing_cancel_unmount lock. Since we're modifying the fs_devices we need to make sure we're holding the uuid_mutex here, so take that as well. There's a report from syzbot probably hitting one of the cases where the bd_mutex and device_list_mutex are taken in the wrong order, however it's not with device replace, like this patch fixes. As there's no reproducer available so far, we can't verify the fix. https://lore.kernel.org/lkml/000000000000fc04d105afcf86d7@google.com/ dashboard link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419 WARNING: possible circular locking dependency detected 5.9.0-rc5-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.0/6878 is trying to acquire lock: ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804 but task is already holding lock: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255 btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109 __btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916 find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline] find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #3 (sb_internal#2){.+.+}-{0:0}: percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x234/0x470 fs/super.c:1672 sb_start_intwrite include/linux/fs.h:1690 [inline] start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624 find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline] find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __flush_work+0x60e/0xac0 kernel/workqueue.c:3041 wb_shutdown+0x180/0x220 mm/backing-dev.c:355 bdi_unregister+0x174/0x590 mm/backing-dev.c:872 del_gendisk+0x820/0xa10 block/genhd.c:933 loop_remove drivers/block/loop.c:2192 [inline] loop_control_ioctl drivers/block/loop.c:2291 [inline] loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (loop_ctl_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 lo_open+0x19/0xd0 drivers/block/loop.c:1893 __blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507 blkdev_get fs/block_dev.c:1639 [inline] blkdev_open+0x227/0x300 fs/block_dev.c:1753 do_dentry_open+0x4b9/0x11b0 fs/open.c:817 do_open fs/namei.c:3251 [inline] path_openat+0x1b9a/0x2730 fs/namei.c:3368 do_filp_open+0x17e/0x3c0 fs/namei.c:3395 do_sys_openat2+0x16d/0x420 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_sys_open fs/open.c:1192 [inline] __se_sys_open fs/open.c:1188 [inline] __x64_sys_open+0x119/0x1c0 fs/open.c:1188 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&bdev->bd_mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_devs->device_list_mutex); lock(sb_internal#2); lock(&fs_devs->device_list_mutex); lock(&bdev->bd_mutex); *** DEADLOCK *** 3 locks held by syz-executor.0/6878: #0: ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365 #1: ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178 #2: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 stack backtrace: CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x460027 RSP: 002b:00007fff59216328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000076035 RCX: 0000000000460027 RDX: 0000000000403188 RSI: 0000000000000002 RDI: 00007fff592163d0 RBP: 0000000000000333 R08: 0000000000000000 R09: 000000000000000b R10: 0000000000000005 R11: 0000000000000246 R12: 00007fff59217460 R13: 0000000002df2a60 R14: 0000000000000000 R15: 00007fff59217460 Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add syzbot reference ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-30ARM: imx6q: Fixup RCU usage for cpuidleUlf Hansson
The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to expose us for tracepoints. For imx6q cpuidle, this leads to an RCU splat according to below. [6.870684] [<c0db7690>] (_raw_spin_lock) from [<c011f6a4>] (imx6q_enter_wait+0x18/0x9c) [6.878846] [<c011f6a4>] (imx6q_enter_wait) from [<c09abfb0>] (cpuidle_enter_state+0x168/0x5e4) To fix the problem, let's assign the corresponding idlestate->flags the CPUIDLE_FLAG_RCU_IDLE bit, which enables us to call rcu_idle_enter|exit() at the proper point. Reported-by: Dong Aisheng <aisheng.dong@nxp.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30Documentation: PM: Fix a reStructuredText syntax errorYoann Congal
Fix a reStructuredText syntax error in the cpuidle PM admin-guide documentation: the ``...'' quotation marks are parsed as partial ''...'' reStructuredText markup and break the output formatting. This change them to "...". Signed-off-by: Yoann Congal <yoann.congal@smile.fr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30cpufreq: intel_pstate: Fix missing return statementZhang Rui
Fix missing return statement when writing "off" to intel_pstate status sysfs I/F. Fixes: 55671ea3257a ("cpufreq: intel_pstate: Free memory only when turning off") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-30bpf: fix raw_tp test run in preempt kernelSong Liu
In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers: [ 35.874974] BUG: using smp_processor_id() in preemptible [00000000] code: new_name/87 [ 35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1 [ 35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 35.916941] Call Trace: [ 35.919660] dump_stack+0x77/0x9b [ 35.923273] check_preemption_disabled+0xb4/0xc0 [ 35.928376] bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.933872] ? selinux_bpf+0xd/0x70 [ 35.937532] __do_sys_bpf+0x6bb/0x21e0 [ 35.941570] ? find_held_lock+0x2d/0x90 [ 35.945687] ? vfs_write+0x150/0x220 [ 35.949586] do_syscall_64+0x2d/0x40 [ 35.953443] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by calling migrate_disable() before smp_processor_id(). Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-30ice: preserve NVM capabilities in safe modeJacob Keller
If the driver initializes in safe mode, it will call ice_set_safe_mode_caps. This results in clearing the capabilities structures, in order to set them up for operating in safe mode, ensuring many features are disabled. This has a side effect of also clearing the capability bits that relate to NVM update. The result is that the device driver will not indicate support for unified update, even if the firmware is capable. Fix this by adding the relevant capability fields to the list of values we preserve. To simplify the code, use a common_cap structure instead of a handful of local variables. To reduce some duplication of the capability name, introduce a couple of macros used to restore the capabilities values from the cached copy. Fixes: de9b277ee032 ("ice: Add support for unified NVM update flow capability") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Brijesh Behera <brijeshx.behera@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-09-30ice: increase maximum wait time for flash write commandsJacob Keller
The ice driver needs to wait for a firmware response to each command to write a block of data to the scratch area used to update the device firmware. The driver currently waits for up to 1 second for this to be returned. It turns out that firmware might take longer than 1 second to return a completion in some cases. If this happens, the flash update will fail to complete. Fix this by increasing the maximum time that the driver will wait for both writing a block of data, and for activating the new NVM bank. The timeout for an erase command is already several minutes, as the firmware had to erase the entire bank which was already expected to take a minute or more in the worst case. In the case where firmware really won't respond, we will now take longer to fail. However, this ensures that if the firmware is simply slow to respond, the flash update can still complete. This new maximum timeout should not adversely increase the update time, as the implementation for wait_event_interruptible_timeout, and should wake very soon after we get a completion event. It is better for a flash update be slow but still succeed than to fail because we gave up too quickly. Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Brijesh Behera <brijeshx.behera@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>