Age | Commit message (Collapse) | Author |
|
In preparation for fixing efi_memmap_alloc() leaks, add support for
recording whether the memmap was dynamically allocated from slab,
memblock, or is the original physical memmap provided by the platform.
Given this tracking is established in efi_memmap_alloc() and needs to be
carried to efi_memmap_install(), use 'struct efi_memory_map_data' to
convey the flags.
Some small cleanups result from this reorganization, specifically the
removal of local variables for 'phys' and 'size' that are already
tracked in @data.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-12-ardb@kernel.org
|
|
In preparation for garbage collecting dynamically allocated EFI memory
maps, where the allocation method of memblock vs slab needs to be
recalled, convert the existing 'late' flag into a 'flags' bitmask.
Arrange for the flag to be passed via 'struct efi_memory_map_data'. This
structure grows additional flags in follow-on changes.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-11-ardb@kernel.org
|
|
A previous commit f99afd08a45f ("efi: Update efi_mem_type() to return an
error rather than 0") changed the return value from EFI_RESERVED_TYPE to
-EINVAL when the searched physical address is not present in any memory
descriptor. But the comment preceding the function never changed. Let's
change the comment now to reflect the new return value -EINVAL.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-10-ardb@kernel.org
|
|
The new of_devlink support breaks PCIe probing on ARM platforms booting
via UEFI if the firmware exposes a EFI framebuffer that is backed by a
PCI device. The reason is that the probing order gets reversed,
resulting in a resource conflict on the framebuffer memory window when
the PCIe probes last, causing it to give up entirely.
Given that we rely on PCI quirks to deal with EFI framebuffers that get
moved around in memory, we cannot simply drop the memory reservation, so
instead, let's use the device link infrastructure to register this
dependency, and force the probing to occur in the expected order.
Co-developed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-9-ardb@kernel.org
|
|
We carry a quirk in the x86 EFI code to switch back to an older
method of mapping the EFI runtime services memory regions, because
it was deemed risky at the time to implement a new method without
providing a fallback to the old method in case problems arose.
Such problems did arise, but they appear to be limited to SGI UV1
machines, and so these are the only ones for which the fallback gets
enabled automatically (via a DMI quirk). The fallback can be enabled
manually as well, by passing efi=old_map, but there is very little
evidence that suggests that this is something that is being relied
upon in the field.
Given that UV1 support is not enabled by default by the distros
(Ubuntu, Fedora), there is no point in carrying this fallback code
all the time if there are no other users. So let's move it into the
UV support code, and document that efi=old_map now requires this
support code to be enabled.
Note that efi=old_map has been used in the past on other SGI UV
machines to work around kernel regressions in production, so we
keep the option to enable it by hand, but only if the kernel was
built with UV support.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-8-ardb@kernel.org
|
|
The EFI code creates RWX mappings for all memory regions that are
occupied after the stub completes, and in the mixed mode case, it
even creates RWX mappings for all of the remaining DRAM as well.
Let's try to avoid this, by setting the NX bit for all memory
regions except the ones that are marked as EFI runtime services
code [which means text+rodata+data in practice, so we cannot mark
them read-only right away]. For cases of buggy firmware where boot
services code is called during SetVirtualAddressMap(), map those
regions with exec permissions as well - they will be unmapped in
efi_free_boot_services().
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-7-ardb@kernel.org
|
|
The mixed mode thunking routine requires a part of it to be
mapped 1:1, and for this reason, we currently map the entire
kernel .text read/write in the EFI page tables, which is bad.
In fact, the kernel_map_pages_in_pgd() invocation that installs
this mapping is entirely redundant, since all of DRAM is already
1:1 mapped read/write in the EFI page tables when we reach this
point, which means that .rodata is mapped read-write as well.
So let's remap both .text and .rodata read-only in the EFI
page tables.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-6-ardb@kernel.org
|
|
The following commit:
15f003d20782 ("x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd()")
modified kernel_map_pages_in_pgd() to manage writable permissions
of memory mappings in the EFI page table in a different way, but
in the process, it removed the ability to clear NX attributes from
read-only mappings, by clobbering the clear mask if _PAGE_RW is not
being requested.
Failure to remove the NX attribute from read-only mappings is
unlikely to be a security issue, but it does prevent us from
tightening the permissions in the EFI page tables going forward,
so let's fix it now.
Fixes: 15f003d20782 ("x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd()
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-5-ardb@kernel.org
|
|
The only users of these got removed, so they also need to be
removed to avoid warnings:
arch/x86/boot/compressed/eboot.c: In function 'setup_efi_pci':
arch/x86/boot/compressed/eboot.c:117:16: error: unused variable 'nr_pci' [-Werror=unused-variable]
unsigned long nr_pci;
^~~~~~
arch/x86/boot/compressed/eboot.c: In function 'setup_uga':
arch/x86/boot/compressed/eboot.c:244:16: error: unused variable 'nr_ugas' [-Werror=unused-variable]
unsigned long nr_ugas;
^~~~~~~
Fixes: 2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-4-ardb@kernel.org
|
|
Reduce the stack frame of the EFI stub's mixed mode thunk routine by
8 bytes, by moving the GDT and return addresses to EBP and EBX, which
we need to preserve anyway, since their top halves will be cleared by
the call into 32-bit firmware code. Doing so results in the UEFI code
being entered with a 16 byte aligned stack, as mandated by the UEFI
spec, fixing the last occurrence in the 64-bit kernel where we violate
this requirement.
Also, move the saved GDT from a global variable to an unused part of the
stack frame, and touch up some other parts of the code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-3-ardb@kernel.org
|
|
Reshuffle the x86 stub code a bit so that we can tag the efi_is_64bit()
function with the 'const' attribute, which permits the compiler to
optimize away any redundant calls. Since we have two different entry
points for 32 and 64 bit firmware in the startup code, this also
simplifies the C code since we'll enter it with the efi_is64 variable
already set.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-2-ardb@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
sched_idle_cpu() isn't used for non SMP configuration and with a recent
change, we have started getting following warning:
kernel/sched/fair.c:5221:12: warning: ‘sched_idle_cpu’ defined but not used [-Wunused-function]
Fix that by defining sched_idle_cpu() only for SMP configurations.
Fixes: 323af6deaf70 ("sched/fair: Load balance aggressively for SCHED_IDLE CPUs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/f0554f590687478b33914a4aff9f0e6a62886d44.1579499907.git.viresh.kumar@linaro.org
|
|
|
|
Today, I erroneously built a time-travel configuration with btrfs
enabled, and noticed it cannot boot in time-travel=inf-cpu mode,
both xor and raid6 speed measurement gets stuck.
For xor, work around it by picking the first algorithm if inf-cpu
mode is enabled.
For raid6, I didn't find such a workaround, so disallow enabling
time-travel mode if RAID6_PQ_BENCHMARK is enabled.
With this, and RAID6_PQ_BENCHMARK disabled, I can boot a kernel
that has btrfs enabled in time-travel=inf-cpu mode.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This reverts commit 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS").
There are two issues with this commit, uncovered by Anton in tests
on some (Debian) systems:
1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
isn't set. Don't recall now if it just wasn't needed on my system, or
if I never tested this case.
2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
unexpected since whatever wanted to run is likely to have to run
before the kernel init etc. that calls the constructors in this case.
Basically, some constructors that gcc emits (libc has?) need to run
very early during init; the failure mode otherwise was that the ptrace
fork test already failed:
----------------------
$ ./linux mem=512M
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
Aborted
----------------------
Thinking more about this, it's clear that we simply cannot support
CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
involve not use of the __attribute__((constructor)), but instead
some constructor code/entry generated by gcc. Therefore, we cannot
distinguish between kernel constructors and system constructors.
Thus, revert this commit.
Cc: stable@vger.kernel.org [5.4+]
Fixes: 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS")
Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
UML_NET_VECTOR now supports filters compiled with pcap outside of UML;
it also supports: EoGRE, EoL2TPv3, RAW (+/- BPF), TAP and BESS.
While vector drivers are not 1:1 replacements for the existing drivers,
you can achieve the same topologies and the same connectivity at much
higher performance (2.5 to 9 Gbit on mid-range Ryzen desktop) - the old
drivers test out in the 500Mbit range on the same hardware.
For all these reasons, the non-vector based transports are now
unnecessary, and some, most notably pcap and vde are maintenance
burdens. Thus, it makes sense to at least start thinking about removing
the non-vector transports, so for now, mark them as obsolete.
Link: https://lore.kernel.org/lkml/15f048d3-07ab-61c1-c6e0-0712e626dd33@cambridgegreys.com/T/#u
Suggested-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
In some cases, for example when the program(s) running inside UML
isn't/aren't interactive (like the hwsim tests for wpa_supplicant)
there's really no value in having the serial lines configured to
be raw as they are now by default. Setting them to non-raw lets
one abort the whole UML with Ctrl-C, which is really the right
thing to do in these cases, basically the whole UML instance is
more like a single (testing) program.
Add a "ssl-non-raw" option to UML to support such a mode.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"Three fixes for RISC-V:
- Don't free and reuse memory containing the code that CPUs parked at
boot reside in.
- Fix rv64 build problems for ubsan and some modules by adding
logical and arithmetic shift helpers for 128-bit values. These are
from libgcc and are similar to what's present for ARM64.
- Fix vDSO builds to clean up their own temporary files"
* tag 'riscv/for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Less inefficient gcc tishift helpers (and export their symbols)
riscv: delete temporary files
riscv: make sure the cores stay looping in .Lsecondary_park
|
|
Pull networking fixes from David Miller:
1) Fix non-blocking connect() in x25, from Martin Schiller.
2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.
3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.
4) Limit size of TSO packets properly in lan78xx driver, from Eric
Dumazet.
5) r8152 probe needs an endpoint sanity check, from Johan Hovold.
6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
John Fastabend.
7) hns3 needs short frames padded on transmit, from Yunsheng Lin.
8) Fix netfilter ICMP header corruption, from Eyal Birger.
9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.
10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.
11) Fix memory leak in act_ctinfo, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
cxgb4: reject overlapped queues in TC-MQPRIO offload
cxgb4: fix Tx multi channel port rate limit
net: sched: act_ctinfo: fix memory leak
bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
bnxt_en: Fix ipv6 RFS filter matching logic.
bnxt_en: Fix NTUPLE firmware command failures.
net: systemport: Fixed queue mapping in internal ring map
net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
net: hns: fix soft lockup when there is not enough memory
net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
net/sched: act_ife: initalize ife->metalist earlier
netfilter: nat: fix ICMP header corruption on ICMP errors
net: wan: lapbether.c: Use built-in RCU list checking
netfilter: nf_tables: fix flowtable list del corruption
netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
netfilter: nft_tunnel: ERSPAN_VERSION must not be null
netfilter: nft_tunnel: fix null-attribute check
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two runtime PM fixes and one leak fix"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: iop3xx: Fix memory leak in probe error path
i2c: tegra: Properly disable runtime PM on driver's probe error
i2c: tegra: Fix suspending in active runtime PM state
|
|
Scenario 1, ARMv7
=================
If code in arch/arm/kernel/ftrace.c would operate on mcount() pointer
the following may be generated:
00000230 <prealloc_fixed_plts>:
230: b5f8 push {r3, r4, r5, r6, r7, lr}
232: b500 push {lr}
234: f7ff fffe bl 0 <__gnu_mcount_nc>
234: R_ARM_THM_CALL __gnu_mcount_nc
238: f240 0600 movw r6, #0
238: R_ARM_THM_MOVW_ABS_NC __gnu_mcount_nc
23c: f8d0 1180 ldr.w r1, [r0, #384] ; 0x180
FTRACE currently is not able to deal with it:
WARNING: CPU: 0 PID: 0 at .../kernel/trace/ftrace.c:1979 ftrace_bug+0x1ad/0x230()
...
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.116-... #1
...
[<c0314e3d>] (unwind_backtrace) from [<c03115e9>] (show_stack+0x11/0x14)
[<c03115e9>] (show_stack) from [<c051a7f1>] (dump_stack+0x81/0xa8)
[<c051a7f1>] (dump_stack) from [<c0321c5d>] (warn_slowpath_common+0x69/0x90)
[<c0321c5d>] (warn_slowpath_common) from [<c0321cf3>] (warn_slowpath_null+0x17/0x1c)
[<c0321cf3>] (warn_slowpath_null) from [<c038ee9d>] (ftrace_bug+0x1ad/0x230)
[<c038ee9d>] (ftrace_bug) from [<c038f1f9>] (ftrace_process_locs+0x27d/0x444)
[<c038f1f9>] (ftrace_process_locs) from [<c08915bd>] (ftrace_init+0x91/0xe8)
[<c08915bd>] (ftrace_init) from [<c0885a67>] (start_kernel+0x34b/0x358)
[<c0885a67>] (start_kernel) from [<00308095>] (0x308095)
---[ end trace cb88537fdc8fa200 ]---
ftrace failed to modify [<c031266c>] prealloc_fixed_plts+0x8/0x60
actual: 44:f2:e1:36
ftrace record flags: 0
(0) expected tramp: c03143e9
Scenario 2, ARMv4T
==================
ftrace: allocating 14435 entries in 43 pages
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2029 ftrace_bug+0x204/0x310
CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.5 #1
Hardware name: Cirrus Logic EDB9302 Evaluation Board
[<c0010a24>] (unwind_backtrace) from [<c000ecb0>] (show_stack+0x20/0x2c)
[<c000ecb0>] (show_stack) from [<c03c72e8>] (dump_stack+0x20/0x30)
[<c03c72e8>] (dump_stack) from [<c0021c18>] (__warn+0xdc/0x104)
[<c0021c18>] (__warn) from [<c0021d7c>] (warn_slowpath_null+0x4c/0x5c)
[<c0021d7c>] (warn_slowpath_null) from [<c0095360>] (ftrace_bug+0x204/0x310)
[<c0095360>] (ftrace_bug) from [<c04dabac>] (ftrace_init+0x3b4/0x4d4)
[<c04dabac>] (ftrace_init) from [<c04cef4c>] (start_kernel+0x20c/0x410)
[<c04cef4c>] (start_kernel) from [<00000000>] ( (null))
---[ end trace 0506a2f5dae6b341 ]---
ftrace failed to modify
[<c000c350>] perf_trace_sys_exit+0x5c/0xe8
actual: 1e:ff:2f:e1
Initializing ftrace call sites
ftrace record flags: 0
(0)
expected tramp: c000fb24
The analysis for this problem has been already performed previously,
refer to the link below.
Fix the above problems by allowing only selected reloc types in
__mcount_loc. The list itself comes from the legacy recordmcount.pl
script.
Link: https://lore.kernel.org/lkml/56961010.6000806@pengutronix.de/
Cc: stable@vger.kernel.org
Fixes: ed60453fa8f8 ("ARM: 6511/1: ftrace: add ARM support for C version of recordmcount")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|
Ido Schimmel says:
====================
mlxsw: Add tunnel devlink-trap support
This patch set from Amit adds support in mlxsw for tunnel traps and a
few additional layer 3 traps that can report drops and exceptions via
devlink-trap.
These traps allow the user to more quickly diagnose problems relating to
tunnel decapsulation errors, such as packet being too short to
decapsulate or a packet containing wrong GRE key in its GRE header.
Patch set overview:
Patches #1-#4 add three additional layer 3 traps. Two of which are
mlxsw-specific as they relate to hardware-specific errors. The patches
include documentation of each trap and selftests.
Patches #5-#8 are preparations. They ensure that the correct ECN bits
are set in the outer header during IPinIP encapsulation and that packets
with an invalid ECN combination in underlay and overlay are trapped to
the kernel and not decapsulated in hardware.
Patches #9-#15 add support for two tunnel related traps. Each trap is
documented and selftested using both VXLAN and IPinIP tunnels, if
applicable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test that the trap is triggered under the right conditions and that
devlink counters increase when action is trap.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a trap for NVE packets that the device decided to drop because their
overlay source MAC is multicast.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add packet trap that can report NVE packets that the device decided to
drop because their overlay source MAC is multicast.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test that the trap is triggered under the right conditions and that
devlink counters increase.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test that the trap is triggered under the right conditions and that
devlink counters increase.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the trap IDs and trap group used to report tunnel drops. Register
tunnel packet traps and associated tunnel trap group with devlink
during driver initialization.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add packet traps that can report packets that were dropped during tunnel
decapsulation.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move L3_DROPS case to appear after L2_DROPS case.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Initialize ECN mapping registers during router init according to
INET_ECN_encapsulate() and INET_ECN_decapsulate().
For IP-in-IP encapsulation, this is required to ensure that ECN bits in
the underlay are set in accordance with the kernel. For decapsulation,
this is required to ensure that packets with invalid ECN combination in
underlay and overlay are trapped to the kernel and not forwarded.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This register configures the actions that are done during IPinIP
decapsulation based on the ECN bits.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This register performs mapping from overlay ECN to underlay ECN during
IPinIP encapsulation.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a trap for packets that the device decided to drop because they are
not supposed to be routed. For example, IGMP queries can be flooded by
the device in layer 2 and reach the router. Such packets should not be
routed and instead dropped.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add test cases to check that packets routed through disabled RIFs and
packets routed from disabled RIFs are dropped and devlink counters
increase when the action is trap.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IRIF_DISABLED and ERIF_DISABLED are driver specific traps. Packets are
dropped for these reasons when they need to be routed through/from
existing router interfaces (RIF) which are disabled.
Add devlink driver-specific traps and mlxsw trap IDs used to report
these traps.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 E-Switch chains and prios
This series has two parts,
1) A merge commit with mlx5-next branch that include updates for mlx5
HW layouts needed for this and upcoming submissions.
2) From Paul, Increase the number of chains and prios
Currently the Mellanox driver supports offloading tc rules that
are defined on the first 4 chains and the first 16 priorities.
The restriction stems from the firmware flow level enforcement
requiring a flow table of a certain level to point to a flow
table of a higher level. This limitation may be ignored by setting
the ignore_flow_level bit when creating flow table entries.
Use unmanaged tables and ignore flow level to create more tables than
declared by fs_core steering. Manually manage the connections between the
tables themselves.
HW table is instantiated for every tc <chain,prio> tuple. The miss rule
of every table either jumps to the next <chain,prio> table, or continues
to slow_fdb. This logic is realized by following this sequence:
1. Create an auto-grouped flow table for the specified priority with
reserved entries
Reserved entries are allocated at the end of the flow table.
Flow groups are evaluated in sequence and therefore it is guaranteed
that the flow group defined on the last FTEs will be the last to evaluate.
Define a "match all" flow group on the reserved entries, providing
the platform to add table miss actions.
2. Set the miss rule action to jump to the next <chain,prio> table
or the slow_fdb.
3. Link the previous priority table to point to the new table by
updating its miss rule.
Please pull and let me know if there's any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A queue can't belong to multiple traffic classes. So, reject
any such configuration that results in overlapped queues for a
traffic class.
Fixes: b1396c2bd675 ("cxgb4: parse and configure TC-MQPRIO offload")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
T6 can support 2 egress traffic management channels per port to
double the total number of traffic classes that can be configured.
In this configuration, if the class belongs to the other channel,
then all the queues must be bound again explicitly to the new class,
for the rate limit parameters on the other channel to take effect.
So, always explicitly bind all queues to the port rate limit traffic
class, regardless of the traffic management channel that it belongs
to. Also, only bind queues to port rate limit traffic class, if all
the queues don't already belong to an existing different traffic
class.
Fixes: 4ec4762d8ec6 ("cxgb4: add TC-MATCHALL classifier egress offload")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
found a warning by the following command:
./scripts/checkpatch.pl -f drivers/net/phy/adin.c
WARNING: msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst
#628: FILE: drivers/net/phy/adin.c:628:
+ msleep(10);
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement a cleanup method to properly free ci->params
BUG: memory leak
unreferenced object 0xffff88811746e2c0 (size 64):
comm "syz-executor617", pid 7106, jiffies 4294943055 (age 14.250s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
c0 34 60 84 ff ff ff ff 00 00 00 00 00 00 00 00 .4`.............
backtrace:
[<0000000015aa236f>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<0000000015aa236f>] slab_post_alloc_hook mm/slab.h:586 [inline]
[<0000000015aa236f>] slab_alloc mm/slab.c:3320 [inline]
[<0000000015aa236f>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
[<000000002c946bd1>] kmalloc include/linux/slab.h:556 [inline]
[<000000002c946bd1>] kzalloc include/linux/slab.h:670 [inline]
[<000000002c946bd1>] tcf_ctinfo_init+0x21a/0x530 net/sched/act_ctinfo.c:236
[<0000000086952cca>] tcf_action_init_1+0x400/0x5b0 net/sched/act_api.c:944
[<000000005ab29bf8>] tcf_action_init+0x135/0x1c0 net/sched/act_api.c:1000
[<00000000392f56f9>] tcf_action_add+0x9a/0x200 net/sched/act_api.c:1410
[<0000000088f3c5dd>] tc_ctl_action+0x14d/0x1bb net/sched/act_api.c:1465
[<000000006b39d986>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
[<00000000fd6ecace>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
[<0000000047493d02>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
[<00000000bdcf8286>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
[<00000000bdcf8286>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
[<00000000fc5b92d9>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
[<00000000da84d076>] sock_sendmsg_nosec net/socket.c:639 [inline]
[<00000000da84d076>] sock_sendmsg+0x54/0x70 net/socket.c:659
[<0000000042fb2eee>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
[<000000008f23f67e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
[<00000000d838e4f6>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
[<00000000289a9cb1>] __do_sys_sendmsg net/socket.c:2426 [inline]
[<00000000289a9cb1>] __se_sys_sendmsg net/socket.c:2424 [inline]
[<00000000289a9cb1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir Oltean says:
====================
Rate adaptation for Felix DSA switch
When operating the MAC at 2.5Gbps (2500Base-X and USXGMII/QSXGMII) and
in combination with certain PHYs, it is possible that the copper side
may operate at lower link speeds. In this case, it is the PHY who has a
MAC inside of it that emits pause frames towards the switch's MAC,
telling it to slow down so that the transmission is lossless.
These patches are the support needed for the switch side of things to
work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the serdes link is set to 2500 using interfce type 2500base-X, lower
link speeds over on the line side should still be supported.
Rate adaptation is done out of band, in our case using AQR PHYs this is
done using flow control.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Flow control is used with 2500Base-X and AQR PHYs to do rate adaptation
between line side 100/1000 links and MAC running at 2.5G.
This is independent of the flow control configuration settled on line
side though AN.
In general, allowing the MAC to handle flow control even if not
negotiated with the link partner should not be a problem, so the patch
just enables it in all cases.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next, they are:
1) Incorrect uapi header comment in bitwise, from Jeremy Sowden.
2) Fetch flow statistics if flow is still active.
3) Restrict flow matching on hardware based on input device.
4) Add nf_flow_offload_work_alloc() helper function.
5) Remove the last client of the FLOW_OFFLOAD_DYING flag, use teardown
instead.
6) Use atomic bitwise operation to operate with flow flags.
7) Add nf_flowtable_hw_offload() helper function to check for the
NF_FLOWTABLE_HW_OFFLOAD flag.
8) Add NF_FLOW_HW_REFRESH to retry hardware offload from the flowtable
software datapath.
9) Remove indirect calls in xt_hashlimit, from Florian Westphal.
10) Add nf_flow_offload_tuple() helper to consolidate code.
11) Add nf_flow_table_offload_cmd() helper function.
12) A few whitespace cleanups in nf_tables in bitwise and the bitmap/hash
set types, from Jeremy Sowden.
13) Cleanup netlink attribute checks in bitwise, from Jeremy Sowden.
14) Replace goto by return in error path of nft_bitwise_dump(), from
Jeremy Sowden.
15) Add bitwise operation netlink attribute, also from Jeremy.
16) Add nft_bitwise_init_bool(), from Jeremy Sowden.
17) Add nft_bitwise_eval_bool(), also from Jeremy.
18) Add nft_bitwise_dump_bool(), from Jeremy Sowden.
19) Disallow hardware offload for other that NFT_BITWISE_BOOL,
from Jeremy Sowden.
20) Add NFTA_BITWISE_DATA netlink attribute, again from Jeremy.
21) Add support for bitwise shift operation, from Jeremy Sowden.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The existing __lshrti3 was really inefficient, and the other two helpers
are also needed to compile some modules.
Add the missing versions, and export all of the symbols like arm64
already does.
This code is based on the assembly generated by libgcc builds.
This fixes a build break triggered by ubsan:
riscv64-unknown-linux-gnu-ld: lib/ubsan.o: in function `.L2':
ubsan.c:(.text.unlikely+0x38): undefined reference to `__ashlti3'
riscv64-unknown-linux-gnu-ld: ubsan.c:(.text.unlikely+0x42): undefined reference to `__ashrti3'
Signed-off-by: Olof Johansson <olof@lixom.net>
[paul.walmsley@sifive.com: use SYM_FUNC_{START,END} instead of
ENTRY/ENDPROC; note libgcc origin]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
"Raw NAND:
- GPMI: Fix the suspend/resume
SPI-NOR:
- Fix quad enable on Spansion like flashes
- Fix selection of 4-byte addressing opcodes on Spansion"
* tag 'mtd/fixes-for-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: gpmi: Restore nfc timing setup after suspend/resume
mtd: rawnand: gpmi: Fix suspend/resume problem
mtd: spi-nor: Fix quad enable for Spansion like flashes
mtd: spi-nor: Fix selection of 4-byte addressing opcodes on Spansion
|