2021-10-25  ipv4: annotate data races around inet->min_ttl  (Eric Dumazet)
No report from KCSAN yet, but the races are worth documenting. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
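The annotation pattern is the usual one for intentionally lockless fields: the setsockopt() writer uses WRITE_ONCE() and the fast-path reader uses READ_ONCE(). A minimal sketch of the idea (not the exact hunks from the patch):

    /* setsockopt(IP_MINTTL) path: lockless store */
    WRITE_ONCE(inet->min_ttl, val);

    /* receive fast path: paired lockless load */
    if (ip_hdr(skb)->ttl < READ_ONCE(inet->min_ttl))
        goto drop;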
2021-10-25  ipv6: guard IPV6_MINHOPCOUNT with a static key  (Eric Dumazet)
RFC 5082 IPV6_MINHOPCOUNT is rarely used on hosts. Add a static key to remove useless code from the TCP fast path, along with a potential cache line miss to fetch tcp_inet6_sk(sk)->min_hopcount. Note that once the ip6_min_hopcount static key has been enabled, it stays enabled until the next boot. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
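For readers not familiar with static keys, the mechanism looks roughly like this (a sketch of the pattern, not the exact patch; the check shown is the RFC 5082 hop-limit test):

    DEFINE_STATIC_KEY_FALSE(ip6_min_hopcount);

    /* setsockopt(IPV6_MINHOPCOUNT) with a non-zero value: flip the key
     * once; it stays enabled until reboot */
    static_branch_enable(&ip6_min_hopcount);

    /* TCP fast path: the branch is patched out while the key is off */
    if (static_branch_unlikely(&ip6_min_hopcount)) {
        if (ipv6_hdr(skb)->hop_limit < READ_ONCE(np->min_hopcount))
            goto discard;
    }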
2021-10-25  ipv6: annotate data races around np->min_hopcount  (Eric Dumazet)
No report from KCSAN yet, but the races are worth documenting. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25  net: annotate accesses to sk->sk_rx_queue_mapping  (Eric Dumazet)
sk->sk_rx_queue_mapping can be modified locklessly; add a couple of READ_ONCE()/WRITE_ONCE() annotations to document this fact. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25  net: avoid dirtying sk->sk_rx_queue_mapping  (Eric Dumazet)
sk_rx_queue_mapping is located in a cache line that should be kept read-mostly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25  net: avoid dirtying sk->sk_napi_id  (Eric Dumazet)
sk_napi_id is located in a cache line that can be kept read-mostly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
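The trick behind both of these "avoid dirtying" patches is to skip the store when the value is already up to date, so the read-mostly cache line is not pulled into exclusive state on every packet; roughly:

    /* only dirty the cache line when the value actually changes */
    if (READ_ONCE(sk->sk_napi_id) != skb->napi_id)
        WRITE_ONCE(sk->sk_napi_id, skb->napi_id);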
2021-10-25  ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie  (Eric Dumazet)
Increase cache locality by moving rx_dst_cookie next to sk->sk_rx_dst. This removes one or two cache line misses in IPv6 early demux (TCP/UDP). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25  tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex  (Eric Dumazet)
Increase cache locality by moving rx_dst_ifindex next to sk->sk_rx_dst. This is part of an effort to reduce cache line misses in the TCP fast path. This removes one cache line miss in early demux. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25  ax88796c: fix fetching error stats from percpu containers  (Alexander Lobakin)
rx_dropped, tx_dropped, rx_frame_errors and rx_crc_errors are being wrongly fetched from the target container rather than from the source percpu ones. No idea whether that comes from the vendor driver or was a braino introduced during the refactoring, but fix it either way. Fixes: a97c69ba4f30e ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Łukasz Stelmach <l.stelmach@samsung.com> Link: https://lore.kernel.org/r/20211023121148.113466-1-alobakin@pm.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
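For context, aggregating per-cpu error counters normally reads each per-cpu source and sums into the target; the bug was reading from the (already aggregated) target instead. A hedged sketch of the correct direction, with illustrative struct and field names rather than the driver's actual ones:

    int cpu;

    for_each_possible_cpu(cpu) {
        /* read from the per-cpu *source*, accumulate into the target */
        const struct ax88796c_pcpu_stats *src =
            per_cpu_ptr(ax_local->stats, cpu);

        stats->rx_dropped      += src->rx_dropped;
        stats->tx_dropped      += src->tx_dropped;
        stats->rx_frame_errors += src->rx_frame_errors;
        stats->rx_crc_errors   += src->rx_crc_errors;
    }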
2021-10-25  Merge branch 'bpftool: Switch to libbpf's hashmap for referencing BPF objects'  (Andrii Nakryiko)
Quentin Monnet says: ==================== When listing BPF objects, bpftool can print a number of properties about items holding references to these objects. For example, it can show pinned paths for BPF programs, maps, and links; or programs and maps using a given BTF object; or the names and PIDs of processes referencing BPF objects. To collect this information, bpftool uses hash maps (to be clear: the data structures, inside bpftool - we are not talking of BPF maps). It uses the implementation available from the kernel, and picks it up from tools/include/linux/hashtable.h. This patchset converts bpftool's hash maps to a distinct implementation instead, the one coming with libbpf. The main motivation for this change is that it should ease the path towards a potential out-of-tree mirror for bpftool, like the one libbpf already has. Although it's not perfect to depend on libbpf's internal components, bpftool is intimately tied to the library anyway, and this looks better than depending too much on (non-UAPI) kernel headers. The first two patches contain preparatory work on the Makefile and on the initialisation of the hash maps for collecting pinned paths for objects. Then the transition is split into several steps, one for each kind of property for which the collection is backed by hash maps. v2: - Move hashmap cleanup for pinned paths for links from do_detach() to do_show(). - Handle errors on hashmap__append() (in three of the patches). - Rename bpftool_hash_fn() and bpftool_equal_fn() as hash_fn_for_key_id() and equal_fn_for_key_id(), respectively. - Add curly braces for hashmap__for_each_key_entry() { } in show_btf_plain() and show_btf_json(), where the flow was difficult to read. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-10-25  bpftool: Switch to libbpf's hashmap for PIDs/names references  (Quentin Monnet)
In order to show PIDs and names for processes holding references to BPF programs, maps, links, or BTF objects, bpftool creates hash maps to store all relevant information. This commit is part of a set that transitions from the kernel's hash map implementation to the one coming with libbpf. The motivation is to make bpftool less dependent on kernel headers, to ease the path to a potential out-of-tree mirror, like libbpf has. This is the third and final step of the transition, in which we convert the hash maps used for storing the information about the processes holding references to BPF objects (programs, maps, links, BTF), and at last we drop the inclusion of tools/include/linux/hashtable.h. Note: Checkpatch complains about the use of __weak declarations, and the missing empty lines after the bunch of empty function declarations when compiling without the BPF skeletons (none of these were introduced in this patch). We want to keep things as they are, and the reports should be safe to ignore. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-6-quentin@isovalent.com
2021-10-25  bpftool: Switch to libbpf's hashmap for programs/maps in BTF listing  (Quentin Monnet)
In order to show BPF programs and maps using BTF objects when the latter are being listed, bpftool creates hash maps to store all relevant items. This commit is part of a set that transitions from the kernel's hash map implementation to the one coming with libbpf. The motivation is to make bpftool less dependent on kernel headers, to ease the path to a potential out-of-tree mirror, like libbpf has. This commit focuses on the two hash maps used by bpftool when listing BTF objects to store references to programs and maps, and converts them to libbpf's implementation. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-5-quentin@isovalent.com
2021-10-25  bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects  (Quentin Monnet)
In order to show pinned paths for BPF programs, maps, or links when listing them with the "-f" option, bpftool creates hash maps to store all relevant paths under the bpffs. So far, it would rely on the kernel implementation (from tools/include/linux/hashtable.h). We can make bpftool rely on libbpf's implementation instead. The motivation is to make bpftool less dependent on kernel headers, to ease the path to a potential out-of-tree mirror, like libbpf has. This commit is the first step of the conversion: the hash maps for pinned paths for programs, maps, and links are converted to libbpf's hashmap.{c,h}. Other hash maps used for the PIDs of processes holding references to BPF objects are left unchanged for now. On the build side, this requires adding a dependency on a second header internal to libbpf, and making it a dependency for the bootstrap bpftool version as well. The rest of the changes are a rather straightforward conversion. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-4-quentin@isovalent.com
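For readers following the conversion, the libbpf hashmap.h API being adopted looks roughly like the fragments below. This is a hedged outline based on the 2021-era internal header; exact signatures may differ, and the callback bodies are illustrative:

    #include "hashmap.h"    /* libbpf internal header */

    /* BPF object IDs are stuffed directly into the key pointer */
    static size_t hash_fn_for_key_id(const void *key, void *ctx)
    {
        return (size_t)key;
    }

    static bool equal_fn_for_key_id(const void *k1, const void *k2, void *ctx)
    {
        return k1 == k2;
    }

    struct hashmap *map;
    int err;

    map = hashmap__new(hash_fn_for_key_id, equal_fn_for_key_id, NULL);
    /* one object ID may have several pinned paths, hence append */
    err = hashmap__append(map, (void *)(unsigned long)id, path);
    /* ... lookups and iteration elided ... */
    hashmap__free(map);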
2021-10-25  bpftool: Do not expose and init hash maps for pinned path in main.c  (Quentin Monnet)
BPF programs, maps, and links can all be listed with their pinned paths by bpftool when the "-f" option is provided. To do so, bpftool builds hash maps containing all pinned paths for each kind of object. These three hash maps are always initialised in main.c, and exposed through main.h. There appears to be no particular reason to do so: we can just as well make them static to the files that need them (prog.c, map.c, and link.c respectively), and initialise them only when we want to show objects and the "-f" switch is provided. This may prevent unnecessary memory allocations if the implementation of the hash maps were to change in the future. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-3-quentin@isovalent.com
2021-10-25  bpftool: Remove Makefile dep. on $(LIBBPF) for $(LIBBPF_INTERNAL_HDRS)  (Quentin Monnet)
The dependency is only useful to make sure that the $(LIBBPF_HDRS_DIR) directory is created before we try to install the required libbpf internal header locally. Let's create this directory properly instead. This is in preparation for making $(LIBBPF_INTERNAL_HDRS) a dependency to the bootstrap bpftool version, in which case we want no dependency on $(LIBBPF). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-2-quentin@isovalent.com
2021-10-25  cxgb3: Remove seeprom_write and use VPD API  (Heiner Kallweit)
Using the VPD API allows us to simplify the code and completely get rid of t3_seeprom_write(). Link: https://lore.kernel.org/r/a0291004-dda3-ea08-4d6c-a2f8826c8527@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25  cxgb3: Use VPD API in t3_seeprom_wp()  (Heiner Kallweit)
Use the standard VPD API to replace t3_seeprom_write(); this prepares for removing that function. Chelsio T3 maps the EEPROM write-protect flag to an arbitrary place in VPD address space, therefore we have to use pci_write_vpd_any(). Link: https://lore.kernel.org/r/f768fdbe-3a16-d539-57d2-c7c908294336@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25  cxgb3: Remove t3_seeprom_read and use VPD API  (Heiner Kallweit)
Using the VPD API allows us to simplify the code and completely get rid of t3_seeprom_read(). Note that we don't have to use pci_read_vpd_any() here because a VPD quirk sets dev->vpd.len to the full EEPROM size. Tested with a T320 card. Link: https://lore.kernel.org/r/68ef15bb-b6bf-40ad-160c-aaa72c4a70f8@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org>
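The shape of the replacement, for reference, is a direct call into the PCI core's VPD accessor instead of the driver's own EEPROM register handshake; a rough sketch (error handling trimmed, surrounding cxgb3 context is illustrative):

    u32 data;
    ssize_t ret;

    /* read 4 bytes of VPD at byte offset 'addr'; the PCI core
     * serializes access and drives the address/data registers */
    ret = pci_read_vpd(adapter->pdev, addr, sizeof(data), &data);
    if (ret != sizeof(data))
        return ret < 0 ? ret : -EIO;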
2021-10-25  PCI/VPD: Use pci_read_vpd_any() in pci_vpd_size()  (Heiner Kallweit)
Use the new function pci_read_vpd_any() to simplify the code. [bhelgaas: squash in fix for stack overflow reported & tested by Qian [1] and Kunihiko [2]: [1] https://lore.kernel.org/netdev/e89087c5-c495-c5ca-feb1-54cf3a8775c5@quicinc.com/ [2] https://lore.kernel.org/r/2f7e3770-ab47-42b5-719c-f7c661c07d28@socionext.com Link: https://lore.kernel.org/r/6211be8a-5d10-8f3a-6d33-af695dc35caf@gmail.com Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Qian Cai <quic_qiancai@quicinc.com> Reported-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Tested-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> ] Link: https://lore.kernel.org/r/049fa71c-c7af-9c69-51c0-05c1bc2bf660@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26  pinctrl: intel: Kconfig: Add configuration menu to Intel pin control  (Cai Huoqing)
Adding a configuration menu to hold many Intel pin control drivers helps to make the display more concise. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-26  pinctrl: tegra: Use correct offset for pin group  (Prathamesh Shete)
The functions tegra_pinctrl_gpio_request_enable() and tegra_pinctrl_gpio_disable_free() use the pin offset instead of the group offset, causing the driver to use the wrong offset when enabling the GPIO. Add a helper function, tegra_pinctrl_get_group(), to parse the pin group and determine the correct offset. Signed-off-by: Kartik K <kkartik@nvidia.com> Signed-off-by: Prathamesh Shete <pshete@nvidia.com> Link: https://lore.kernel.org/r/20211025110959.27751-1-pshete@nvidia.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-25  nvdimm/pmem: stop using q_usage_count as external pgmap refcount  (Christoph Hellwig)
Originally all DAX access went through block_device operations and thus needed a queue reference. But since commit cccbce671582 ("filesystem-dax: convert to dax_direct_access()") all this happens at the DAX device level, which uses its own refcounting. The external refcount thus wasn't needed, but has otherwise been harmless for a long time. But now that "block: drain file system I/O on del_gendisk" waits for q_usage_count to reach 0 in del_gendisk, this whole scheme can't work anymore (and pmem is the only driver abusing q_usage_count like that). So switch to the internal reference and remove the unbalanced blk_freeze_queue_start that is taken care of by del_gendisk. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019073641.2323410-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-26  m68knommu: Remove MCPU32 config symbol  (Geert Uytterhoeven)
As of commit a3595962d82495f5 ("m68knommu: remove obsolete 68360 support"), nothing selects MCPU32 anymore. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-10-26  m68k: set a default value for MEMORY_RESERVE  (Randy Dunlap)
'make randconfig' can produce a .config file with "CONFIG_MEMORY_RESERVE=" (no value) since it has no default. When a subsequent 'make all' is done, kconfig restarts the config and prompts for a value for MEMORY_RESERVE. This breaks scripting/automation where there is no interactive user input. Add a default value for MEMORY_RESERVE. (Any integer value will work here for kconfig.) Fixes a kconfig warning: .config:214:warning: symbol value '' invalid for MEMORY_RESERVE * Restart config... Memory reservation (MiB) (MEMORY_RESERVE) [] (NEW) Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") # from beginning of git history Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: linux-m68k@lists.linux-m68k.org Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-10-25  fortify: strlen: Avoid shadowing previous locals  (Qian Cai)
The __compiletime_strlen() macro expansion will shadow p_size and p_len local variables. No callers currently use any of the shadowed names for their "p" variable, so there are no code generation problems. Add "__" prefixes to the variable definitions in __compiletime_strlen() to avoid new W=2 warnings: ./include/linux/fortify-string.h: In function 'strnlen': ./include/linux/fortify-string.h:17:9: warning: declaration of 'p_size' shadows a previous local [-Wshadow] 17 | size_t p_size = __builtin_object_size(p, 1); \ | ^~~~~~ ./include/linux/fortify-string.h:77:17: note: in expansion of macro '__compiletime_strlen' 77 | size_t p_len = __compiletime_strlen(p); | ^~~~~~~~~~~~~~~~~~~~ ./include/linux/fortify-string.h:76:9: note: shadowed declaration is here 76 | size_t p_size = __builtin_object_size(p, 1); | ^~~~~~ Signed-off-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20211025210528.261643-1-quic_qiancai@quicinc.com
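Schematically, the fix just renames the statement-expression locals so they can never collide with a caller's p_size/p_len (this is an outline of the naming change, not the full fortify-string.h macro body):

    #define __compiletime_strlen(p)                             \
    ({                                                          \
            size_t __p_size = __builtin_object_size(p, 1);      \
            size_t __ret = (size_t)-1;                          \
                                                                \
            /* ... unchanged strlen logic using __p_size ... */ \
            __ret;                                              \
    })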
2021-10-25  Merge branch 'Parallelize verif_scale selftests'  (Alexei Starovoitov)
Andrii Nakryiko says: ==================== Reduce amount of waiting time when running test_progs in parallel mode (-j) by splitting bpf_verif_scale selftests into multiple tests. Previously it was structured as a test with multiple subtests, but subtests are not easily parallelizable with test_progs' infra. Also in practice each scale subtest is really an independent test with nothing shared across all subtests. This patch set changes how test_progs test discovery works. Now it is possible to define multiple tests within a single source code file. One of the patches also marks tc_redirect selftests as serial, because it's extremely harmful to the test system when run in parallel mode. ==================== Acked-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-10-25  selftests/bpf: Split out bpf_verif_scale selftests into multiple tests  (Andrii Nakryiko)
Instead of using subtests in the bpf_verif_scale selftest, turn each scale sub-test into its own test. Each subtest is completely independent and just reuses a bit of common test running logic, so the conversion is trivial. For convenience, keep all of the BPF verifier scale tests in one file. This conversion shaves off a significant amount of time when running test_progs in parallel mode. E.g., just running scale tests (-t verif_scale): BEFORE ====== Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED real 0m22.894s user 0m0.012s sys 0m22.797s AFTER ===== Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED real 0m12.044s user 0m0.024s sys 0m27.869s Ten seconds saved right there. test_progs -j is not yet ready to be turned on by default, unfortunately, and some tests fail almost every time, but this is a good improvement nevertheless. Ignoring a few failures, here is sequential vs parallel run times when running all tests now: SEQUENTIAL ========== Summary: 206/953 PASSED, 4 SKIPPED, 0 FAILED real 1m5.625s user 0m4.211s sys 0m31.650s PARALLEL ======== Summary: 204/952 PASSED, 4 SKIPPED, 2 FAILED real 0m35.550s user 0m4.998s sys 0m39.890s Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-5-andrii@kernel.org
2021-10-25  selftests/bpf: Mark tc_redirect selftest as serial  (Andrii Nakryiko)
It seems to cause a lot of harm to kprobe/tracepoint selftests. Yucong mentioned before that it does manipulate sysfs, which might be the reason. So let's mark it as serial, though ideally it would be less intrusive on the system under test. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-4-andrii@kernel.org
2021-10-25  selftests/bpf: Support multiple tests per file  (Andrii Nakryiko)
Revamp how test discovery works for test_progs and allow multiple test entries per file. Any global void function with no arguments and serial_test_ or test_ prefix is considered a test. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-3-andrii@kernel.org
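Concretely, after this change a single file under prog_tests/ can define several tests just by following the naming convention described above; for example (test names are illustrative):

    /* prog_tests/foo.c: each function below is discovered as its own test */

    void test_foo_basic(void)
    {
        /* ordinary test: may run in parallel under test_progs -j */
    }

    void serial_test_foo_sysctl(void)
    {
        /* 'serial_test_' prefix: never run concurrently with other tests */
    }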
2021-10-25  selftests/bpf: Normalize selftest entry points  (Andrii Nakryiko)
Ensure that all test entry points are global void functions with no input arguments. Mark a few subtest entry points as static. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-2-andrii@kernel.org
2021-10-25  signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.  (Eric W. Biederman)
Update save_v86_state to always complete all of its work, except possibly some of the copies to userspace, even if save_v86_state takes a fault. This ensures that the kernel is always in a sane state, even if userspace has done something silly. When save_v86_state takes a fault, update it to force userspace to take a SIGSEGV and terminate the userspace application. As Andy pointed out in review of the first version of this change, there are races between sigaction and the application terminating. Now that the code has been modified to always perform all of save_v86_state's work (except possibly copying to userspace), those races do not matter from a kernel perspective. Forcing the userspace application to terminate (by resetting its handler to SIG_DFL) is there to keep everything as close to the current behavior as possible while removing the unique (and difficult to maintain) use of do_exit. If this new SIGSEGV happens during handle_signal, SIGSEGV will be delivered to userspace the next time around the exit_to_user_mode_loop. All of the callers of handle_vm86_trap and handle_vm86_fault run the exit_to_user_mode_loop before they return to userspace, so any signal sent to the current task during their execution will be delivered before that task exits to usermode. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: H Peter Anvin <hpa@zytor.com> v1: https://lkml.kernel.org/r/20211020174406.17889-10-ebiederm@xmission.com Link: https://lkml.kernel.org/r/877de1xcr6.fsf_-_@disp2133 Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25  signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON  (Eric W. Biederman)
The function save_v86_state is only called when userspace was operating in vm86 mode before entering the kernel. Not having vm86 state in the task_struct should never happen. So transform the hand-rolled BUG_ON into an actual BUG_ON to make it clear what is happening. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: H Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/20211020174406.17889-9-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
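The resulting shape is the classic one-liner; a sketch of the pattern (close to, but not necessarily identical to, the final hunk):

    struct vm86 *vm86 = current->thread.vm86;

    /* save_v86_state() is only reachable from vm86 mode, so missing
     * vm86 state is a kernel bug, not a recoverable user error */
    BUG_ON(!vm86);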
2021-10-25  signal/sparc: In setup_tsb_params convert open coded BUG into BUG  (Eric W. Biederman)
The function setup_tsb_params has exactly one caller, tsb_grow. The function tsb_grow passes in a tsb_bytes value that is between 8192 and 1048576 inclusive, and is guaranteed to be a power of 2. The function setup_tsb_params verifies this property with a switch statement and then prints an error and causes the task to exit if this is not true. In practice that print statement can never be reached because tsb_grow never passes in a bad tsb_size. So if tsb_size ever gets a bad value, that is a kernel bug. So replace the do_exit, which is effectively an open-coded version of BUG(), with an actual call to BUG(), making it clearer that this is a case that can never, and should never, happen. Cc: David Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Link: https://lkml.kernel.org/r/20211020174406.17889-8-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25  signal/powerpc: On swapcontext failure force SIGSEGV  (Eric W. Biederman)
If the register state may be partial and corrupted, call force_sigsegv(SIGSEGV) instead of do_exit. This properly kills the process with SIGSEGV and does not let any more userspace code execute, rather than just killing one thread of the process and potentially confusing everything. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.") Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt") Link: https://lkml.kernel.org/r/20211020174406.17889-7-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25  signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)  (Eric W. Biederman)
Today the sh code allocates memory the first time a process uses the fpu. If that memory allocation fails, kill the affected task with force_sig(SIGKILL) rather than do_group_exit(SIGKILL). Calling do_group_exit from an exception handler can potentially lead to deadlocks, as do_group_exit is not designed to be called from interrupt context. Instead use force_sig(SIGKILL) to kill the userspace process. Sending signals in general, and force_sig in particular, has been tested from interrupt context, so there should be no problems. Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: linux-sh@vger.kernel.org Fixes: 0ea820cf9bf5 ("sh: Move over to dynamically allocated FPU context.") Link: https://lkml.kernel.org/r/20211020174406.17889-6-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
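The substitution itself is a one-liner; the point of the series is that the signal path is safe from exception context while do_group_exit() is not. A sketch of the shape of the change:

    /* FPU state allocation failed inside the exception handler: kill
     * the whole process via the signal path; do_group_exit(SIGKILL)
     * is not safe to call from this context */
    force_sig(SIGKILL);
    return;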
2021-10-25  signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT  (Eric W. Biederman)
When an instruction to save or restore a register from the stack fails in _save_fp_context or _restore_fp_context, return with -EFAULT. This change was made to r2300_fpu.S[1] but it looks like it got lost with the introduction of EX2[2]. This is also what the other implementation of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and what is needed for the callers to be able to handle the error. Furthermore, calling do_exit(SIGSEGV) from bad_stack is wrong because it does not terminate the entire process; it just terminates a single thread. As the changed code was the only caller of arch/mips/kernel/syscall.c:bad_stack, remove the problematic and now unused helper function. Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Maciej Rozycki <macro@orcam.me.uk> Cc: linux-mips@vger.kernel.org [1] 35938a00ba86 ("MIPS: Fix ISA I FP sigcontext access violation handling") [2] f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout") Cc: stable@vger.kernel.org Fixes: f92722dc4545 ("MIPS: Correct MIPS I FP sigcontext layout") Acked-by: Maciej W. Rozycki <macro@orcam.me.uk> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://lkml.kernel.org/r/20211020174406.17889-5-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25  net/mlx5: SF_DEV Add SF device trace points  (Parav Pandit)
Add trace points specific to SF device add and delete. echo mlx5:mlx5_sf_dev_add >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: SF, Add SF trace points  (Parav Pandit)
Add support for trace events for SFs to improve debugging. This covers (a) port add and free trace points, (b) device level trace points, (c) SF hardware context add and free trace points, and (d) SF function activate/deactivate and state trace points. SF events examples: echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_update_state >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_activate >> /sys/kernel/debug/tracing/set_event echo mlx5:mlx5_sf_deactivate >> /sys/kernel/debug/tracing/set_event Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Let user configure max_macs param  (Shay Drory)
Currently, max_macs is taking 70Kbytes of memory per function. This size is not needed in all use cases, and is critical with large scale. Hence, allow the user to configure the number of max_macs. For example, to reduce the number of max_macs to 1, execute: $ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \ cmode driverinit $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Let user configure event_eq_size param  (Shay Drory)
The event EQ is an EQ which receives notifications of almost all the events generated by the NIC. Currently, each event EQ is taking 512KB of memory. This size is not needed in most use cases, and is critical with large scale. Hence, allow the user to configure the size of the event EQ. For example, to reduce the event EQ size to 64, execute: $ devlink resource set pci/0000:00:0b.0 path /event_eq_size/ size 64 $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Let user configure io_eq_size param  (Shay Drory)
Currently, each I/O EQ is taking 128KB of memory. This size is not needed in all use cases, and is critical with large scale. Hence, allow the user to configure the size of the I/O EQs. For example, to reduce the I/O EQ size to 64, execute: $ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64 $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Bridge, support replacing existing FDB entry  (Vlad Buslov)
The SWITCHDEV_FDB_ADD_TO_DEVICE notification is used both for adding a new entry and for replacing an existing one. Implement support for replacing existing FDB entries in the mlx5 offload code. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Bridge, extract code to lookup and del/notify entry  (Vlad Buslov)
The following two patterns are duplicated in multiple places in the bridge code: - Look up an FDB entry in the hashtable by address+vid pair. - Notify the software bridge and then delete the existing FDB entry. In order to improve code quality and prepare for a following patch series that also uses these patterns, extract the code into dedicated helper functions. This commit doesn't change functionality. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Add periodic update of host time to firmware  (Aya Levin)
Firmware also logs its asserts to non-volatile memory. In order to reduce drift between the NIC and the host, the driver sets the host epoch time in the firmware every hour. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
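The usual kernel idiom for such an hourly sync is a self-rearming delayed work item; a sketch of the pattern with illustrative names (not necessarily the structure mlx5 uses):

    static struct delayed_work host_time_work;   /* illustrative */

    static void update_host_time_work(struct work_struct *work)
    {
        /* ... write the current epoch time to the device ... */

        /* re-arm for the next hour */
        schedule_delayed_work(&host_time_work, 3600 * HZ);
    }

    /* at driver init */
    INIT_DELAYED_WORK(&host_time_work, update_host_time_work);
    schedule_delayed_work(&host_time_work, 3600 * HZ);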
2021-10-25  net/mlx5: Print health buffer by log level  (Aya Levin)
Add a log macro which takes the log level as a parameter. Use the severity read from the health buffer and the new log macro to log the health buffer with the severity as the log level. Prior to this patch, the health buffer was printed at error log level regardless of its severity. Now the user may filter dmesg (--level) or change the kernel log level to focus on different severity levels of firmware errors. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Extend health buffer dump  (Aya Levin)
Enhance the health buffer to include: - assert_var5: expose the 6th assert variable. - time: the error's time-stamp in seconds (epoch time). - rfr: Recovery Flow Required. When set, indicates that the error cannot be recovered without a flow involving reset. - severity: the error's severity value, ranging from emergency to debug. Expose them in the health buffer dump (dmesg and devlink fw reporter). Health buffer in dmesg: mlx5_core 0000:08:00.0: print_health_info:425:(pid 912): Health issue observed, firmware internal error, severity(3) ERROR: mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[0] 0x08040700 mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[1] 0x00000000 mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[2] 0x00000000 mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[3] 0x00000000 mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[4] 0x00000000 mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[5] 0x00000000 mlx5_core 0000:08:00.0: print_health_info:432:(pid 912): assert_exit_ptr 0x00aaf800 mlx5_core 0000:08:00.0: print_health_info:434:(pid 912): assert_callra 0x00aaf70c mlx5_core 0000:08:00.0: print_health_info:436:(pid 912): fw_ver 16.32.492 mlx5_core 0000:08:00.0: print_health_info:437:(pid 912): time 1634819758 mlx5_core 0000:08:00.0: print_health_info:438:(pid 912): hw_id 0x0000020d mlx5_core 0000:08:00.0: print_health_info:439:(pid 912): rfr 0 mlx5_core 0000:08:00.0: print_health_info:440:(pid 912): severity 3 (ERROR) mlx5_core 0000:08:00.0: print_health_info:441:(pid 912): irisc_index 9 mlx5_core 0000:08:00.0: print_health_info:442:(pid 912): synd 0x1: firmware internal error mlx5_core 0000:08:00.0: print_health_info:444:(pid 912): ext_synd 0x802b mlx5_core 0000:08:00.0: print_health_info:445:(pid 912): raw fw_ver 0x102001ec Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Reduce flow counters bulk query buffer size for SFs  (Avihai Horon)
Currently, the flow counters bulk query buffer takes a little more than 512KB of memory, which is then aligned up to the next power of 2, i.e. 1MB. The buffer size determines the maximum number of flow counters that can be queried at a time. Thus, having a bigger buffer can improve performance for users that need to query many flow counters. SFs don't use many flow counters and don't need a big buffer. Since this size is critical with large scale, reduce the size of the bulk query buffer for SFs. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5: Fix unused function warning of mlx5i_flow_type_mask  (Shay Drory)
The cited commit causes an unused-function warning [1] when CONFIG_MLX5_EN_RXNFC is not set. Fix this by moving the function into the ifdef block where it is used. [1] warning: ‘mlx5i_flow_type_mask’ defined but not used [-Wunused-function] Fixes: 9fbe1c25ecca ("net/mlx5i: Enable Rx steering for IPoIB via ethtool") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
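The fix follows the standard pattern for helpers that are only referenced from conditionally-compiled code: define them under the same ifdef. Schematically (the function body here is an assumption, shown only to illustrate the guard):

    #ifdef CONFIG_MLX5_EN_RXNFC
    static u32 mlx5i_flow_type_mask(u32 flow_type)
    {
        /* illustrative body: strip the ethtool extension flags */
        return flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
    }

    /* ... the ethtool rxnfc callbacks that use it ... */
    #endif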
2021-10-25  net/mlx5: Remove unnecessary checks for slow path flag  (Paul Blakey)
After previous changes, the caller (mlx5e_tc_offload_fdb_rules()) already checks for the slow path flag and, if it is set, won't call the sample offload/unoffload functions. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25  net/mlx5e: don't write directly to netdev->dev_addr  (Jakub Kicinski)
Use a local buffer and eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
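The conversion pattern (part of the tree-wide move toward a const netdev->dev_addr) is to fill a local buffer and hand it to eth_hw_addr_set() instead of writing into netdev->dev_addr directly; a rough sketch, with the device query left as a placeholder:

    u8 addr[ETH_ALEN] = {};

    /* ... query the permanent MAC from the device into 'addr' ... */

    eth_hw_addr_set(netdev, addr);
    if (is_zero_ether_addr(netdev->dev_addr))
        eth_hw_addr_random(netdev);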