Age | Commit message (Collapse) | Author |
|
I found an ACPI cache leak in ACPI early termination and boot continuing case.
When early termination occurs due to malicious ACPI table, Linux kernel
terminates ACPI function and continues to boot process. While kernel terminates
ACPI function, kmem_cache_destroy() reports Acpi-Operand cache leak.
Boot log of ACPI operand cache leak is as follows:
>[ 0.464168] ACPI: Added _OSI(Module Device)
>[ 0.467022] ACPI: Added _OSI(Processor Device)
>[ 0.469376] ACPI: Added _OSI(3.0 _SCP Extensions)
>[ 0.471647] ACPI: Added _OSI(Processor Aggregator Device)
>[ 0.477997] ACPI Error: Null stack entry at ffff880215c0aad8 (20170303/exresop-174)
>[ 0.482706] ACPI Exception: AE_AML_INTERNAL, While resolving operands for [opcode_name unavailable] (20170303/dswexec-461)
>[ 0.487503] ACPI Error: Method parse/execution failed [\DBG] (Node ffff88021710ab40), AE_AML_INTERNAL (20170303/psparse-543)
>[ 0.492136] ACPI Error: Method parse/execution failed [\_SB._INI] (Node ffff88021710a618), AE_AML_INTERNAL (20170303/psparse-543)
>[ 0.497683] ACPI: Interpreter enabled
>[ 0.499385] ACPI: (supports S0)
>[ 0.501151] ACPI: Using IOAPIC for interrupt routing
>[ 0.503342] ACPI Error: Null stack entry at ffff880215c0aad8 (20170303/exresop-174)
>[ 0.506522] ACPI Exception: AE_AML_INTERNAL, While resolving operands for [opcode_name unavailable] (20170303/dswexec-461)
>[ 0.510463] ACPI Error: Method parse/execution failed [\DBG] (Node ffff88021710ab40), AE_AML_INTERNAL (20170303/psparse-543)
>[ 0.514477] ACPI Error: Method parse/execution failed [\_PIC] (Node ffff88021710ab18), AE_AML_INTERNAL (20170303/psparse-543)
>[ 0.518867] ACPI Exception: AE_AML_INTERNAL, Evaluating _PIC (20170303/bus-991)
>[ 0.522384] kmem_cache_destroy Acpi-Operand: Slab cache still has objects
>[ 0.524597] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc5 #26
>[ 0.526795] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006
>[ 0.529668] Call Trace:
>[ 0.530811] ? dump_stack+0x5c/0x81
>[ 0.532240] ? kmem_cache_destroy+0x1aa/0x1c0
>[ 0.533905] ? acpi_os_delete_cache+0xa/0x10
>[ 0.535497] ? acpi_ut_delete_caches+0x3f/0x7b
>[ 0.537237] ? acpi_terminate+0xa/0x14
>[ 0.538701] ? acpi_init+0x2af/0x34f
>[ 0.540008] ? acpi_sleep_proc_init+0x27/0x27
>[ 0.541593] ? do_one_initcall+0x4e/0x1a0
>[ 0.543008] ? kernel_init_freeable+0x19e/0x21f
>[ 0.546202] ? rest_init+0x80/0x80
>[ 0.547513] ? kernel_init+0xa/0x100
>[ 0.548817] ? ret_from_fork+0x25/0x30
>[ 0.550587] vgaarb: loaded
>[ 0.551716] EDAC MC: Ver: 3.0.0
>[ 0.553744] PCI: Probing PCI hardware
>[ 0.555038] PCI host bridge to bus 0000:00
> ... Continue to boot and log is omitted ...
I analyzed this memory leak in detail and found acpi_ns_evaluate() function
only removes Info->return_object in AE_CTRL_RETURN_VALUE case. But, when errors
occur, the status value is not AE_CTRL_RETURN_VALUE, and Info->return_object is
also not null. Therefore, this causes acpi operand memory leak.
This cache leak causes a security threat because an old kernel (<= 4.9) shows
memory locations of kernel functions in stack dump. Some malicious users
could use this information to neutralize kernel ASLR.
I made a patch to fix ACPI operand cache leak.
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
sparc:allmodconfig fails to build as follows.
ERROR: "mdesc_release" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_hvapi_register" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "mdesc_get_property" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "mdesc_node_by_name" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "mdesc_grab" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_ccb_info" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_ccb_submit" [drivers/sbus/char/oradax.ko] undefined!
ERROR: "sun4v_ccb_kill" [drivers/sbus/char/oradax.ko] undefined!
The symbols are only available with SPARC64 builds, thus the driver
depends on it.
Fixes: dd0273284c74 ("sparc64: Oracle DAX driver")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Khalid Aziz says:
====================
Application Data Integrity feature introduced by SPARC M7
V12 changes:
This series is same as v10 and v11 and was simply rebased on 4.16-rc2
kernel and patch 11 was added to update signal delivery code to use the
new helper functions added by Eric Biederman. Can mm maintainers please
review patches 2, 7, 8 and 9 which are arch independent, and
include/linux/mm.h and mm/ksm.c changes in patch 10 and ack these if
everything looks good?
SPARC M7 processor adds additional metadata for memory address space
that can be used to secure access to regions of memory. This additional
metadata is implemented as a 4-bit tag attached to each cacheline size
block of memory. A task can set a tag on any number of such blocks.
Access to such block is granted only if the virtual address used to
access that block of memory has the tag encoded in the uppermost 4 bits
of VA. Since sparc processor does not implement all 64 bits of VA, top 4
bits are available for ADI tags. Any mismatch between tag encoded in VA
and tag set on the memory block results in a trap. Tags are verified in
the VA presented to the MMU and tags are associated with the physical
page VA maps on to. If a memory page is swapped out and page frame gets
reused for another task, the tags are lost and hence must be saved when
swapping or migrating the page.
A userspace task enables ADI through mprotect(). This patch series adds
a page protection bit PROT_ADI and a corresponding VMA flag
VM_SPARC_ADI. VM_SPARC_ADI is used to trigger setting TTE.mcd bit in the
sparc pte that enables ADI checking on the corresponding page. MMU
validates the tag embedded in VA for every page that has TTE.mcd bit set
in its pte. After enabling ADI on a memory range, the userspace task can
set ADI version tags using stxa instruction with ASI_MCD_PRIMARY or
ASI_MCD_ST_BLKINIT_PRIMARY ASI.
Once userspace task calls mprotect() with PROT_ADI, kernel takes
following overall steps:
1. Find the VMAs covering the address range passed in to mprotect and
set VM_SPARC_ADI flag. If address range covers a subset of a VMA, the
VMA will be split.
2. When a page is allocated for a VA and the VMA covering this VA has
VM_SPARC_ADI flag set, set the TTE.mcd bit so MMU will check the
vwersion tag.
3. Userspace can now set version tags on the memory it has enabled ADI
on. Userspace accesses ADI enabled memory using a virtual address that
has the version tag embedded in the high bits. MMU validates this
version tag against the actual tag set on the memory. If tag matches,
MMU performs the VA->PA translation and access is granted. If there is a
mismatch, hypervisor sends a data access exception or precise memory
corruption detected exception depending upon whether precise exceptions
are enabled or not (controlled by MCDPERR register). Kernel sends
SIGSEGV to the task with appropriate si_code.
4. If a page is being swapped out or migrated, kernel must save any ADI
tags set on the page. Kernel maintains a page worth of tag storage
descriptors. Each descriptors pointsto a tag storage space and the
address range it covers. If the page being swapped out or migrated has
ADI enabled on it, kernel finds a tag storage descriptor that covers the
address range for the page or allocates a new descriptor if none of the
existing descriptors cover the address range. Kernel saves tags from the
page into the tag storage space descriptor points to.
5. When the page is swapped back in or reinstantiated after migration,
kernel restores the version tags on the new physical page by retrieving
the original tag from tag storage pointed to by a tag storage descriptor
for the virtual address range for new page.
User task can disable ADI by calling mprotect() again on the memory
range with PROT_ADI bit unset. Kernel clears the VM_SPARC_ADI flag in
VMAs, merges adjacent VMAs if necessary, and clears TTE.mcd bit in the
corresponding ptes.
IOMMU does not support ADI checking. Any version tags embedded in the
top bits of VA meant for IOMMU, are cleared and replaced with sign
extension of the first non-version tag bit (bit 59 for SPARC M7) for
IOMMU addresses.
This patch series adds support for this feature in 11 patches:
Patch 1/11
Tag mismatch on access by a task results in a trap from hypervisor as
data access exception or a precide memory corruption detected
exception. As part of handling these exceptions, kernel sends a
SIGSEGV to user process with special si_code to indicate which fault
occurred. This patch adds three new si_codes to differentiate between
various mismatch errors.
Patch 2/11
When a page is swapped or migrated, metadata associated with the page
must be saved so it can be restored later. This patch adds a new
function that saves/restores this metadata when updating pte upon a
swap/migration.
Patch 3/11
SPARC M7 processor adds new fields to control registers to support ADI
feature. It also adds a new exception for precise traps on tag
mismatch. This patch adds definitions for the new control register
fields, new ASIs for ADI and an exception handler for the precise trap
on tag mismatch.
Patch 4/11
New hypervisor fault types were added by sparc M7 processor to support
ADI feature. This patch adds code to handle these fault types for data
access exception handler.
Patch 5/11
When ADI is in use for a page and a tag mismatch occurs, processor
raises "Memory corruption Detected" trap. This patch adds a handler
for this trap.
Patch 6/11
ADI usage is governed by ADI properties on a platform. These
properties are provided to kernel by firmware. Thsi patch adds new
auxiliary vectors that provide these values to userpsace.
Patch 7/11
arch_validate_prot() is used to validate the new protection bits asked
for by the userspace app. Validating protection bits may need the
context of address space the bits are being applied to. One such
example is PROT_ADI bit on sparc processor that enables ADI protection
on an address range. ADI protection applies only to addresses covered
by physical RAM and not other PFN mapped addresses or device
addresses. This patch adds "address" to the parameters being passed to
arch_validate_prot() to provide that context.
Patch 8/11
When protection bits are changed on a page, kernel carries forward all
protection bits except for read/write/exec. Additional code was added
to allow kernel to clear PKEY bits on x86 but this requirement to
clear other bits is not unique to x86. This patch extends the existing
code to allow other architectures to clear any other protection bits
as well on protection bit change.
Patch 9/11
When a processor supports additional metadata on memory pages, that
additional metadata needs to be copied to new memory pages when those
pages are moved. This patch allows architecture specific code to
replace the default copy_highpage() routine with arch specific
version that copies the metadata as well besides the data on the page.
Patch 10/11
This patch adds support for a user space task to enable ADI and enable
tag checking for subsets of its address space. As part of enabling
this feature, this patch adds to support manipulation of precise
exception for memory corruption detection, adds code to save and
restore tags on page swap and migration, and adds code to handle ADI
tagged addresses for DMA.
Patch 11/11
Update signal delivery code in arch/sparc/kernel/traps_64.c to use
the new helper function force_sig_fault() added by commit
f8ec66014ffd ("signal: Add send_sig_fault and force_sig_fault").
Changelog v12:
- Rebased to 4.16-rc2
- Added patch 11 to update signal delivery functions
Changelog v11:
- Rebased to 4.15
Changelog v10:
- Patch 1/10: Updated si_codes definitions for SEGV to match 4.14
- Patch 2/10: No changes
- Patch 3/10: Updated copyright
- Patch 4/10: No changes
- Patch 5/10: No changes
- Patch 6/10: Updated copyright
- Patch 7/10: No changes
- Patch 8/10: No changes
- Patch 9/10: No changes
- Patch 10/10: Added code to return from kernel path to set
PSTATE.mcde if kernel continues execution in another thread
(Suggested by Anthony)
Changelog v9:
- Patch 1/10: No changes
- Patch 2/10: No changes
- Patch 3/10: No changes
- Patch 4/10: No changes
- Patch 5/10: No changes
- Patch 6/10: No changes
- Patch 7/10: No changes
- Patch 8/10: No changes
- Patch 9/10: New patch
- Patch 10/10: Patch 9 from v8. Added code to copy ADI tags when
pages are migrated. Updated code to detect overflow and underflow
of addresses when allocating tag storage.
Changelog v8:
- Patch 1/9: No changes
- Patch 2/9: Fixed and erroneous "}"
- Patch 3/9: Minor print formatting change
- Patch 4/9: No changes
- Patch 5/9: No changes
- Patch 6/9: Added AT_ADI_UEONADI back
- Patch 7/9: Added addr parameter to powerpc arch_validate_prot()
- Patch 8/9: No changes
- Patch 9/9:
- Documentation updates
- Added an IPI on mprotect(...PROT_ADI...) call and
restore of TSTATE.MCDE on context switch
- Removed restriction on enabling ADI on read-only
memory
- Changed kzalloc() for tag storage to use GFP_NOWAIT
- Added code to handle overflow and underflow when
allocating tag storage
- Replaced sun_m7_patch_1insn_range() with
sun4v_patch_1insn_range()
- Added membar after restoring ADI tags in
copy_user_highpage()
Changelog v7:
- Patch 1/9: No changes
- Patch 2/9: Updated parameters to arch specific swap in/out
handlers
- Patch 3/9: No changes
- Patch 4/9: new patch split off from patch 4/4 in v6
- Patch 5/9: new patch split off from patch 4/4 in v6
- Patch 6/9: new patch split off from patch 4/4 in v6
- Patch 7/9: new patch
- Patch 8/9: new patch
- Patch 9/9:
- Enhanced arch_validate_prot() to enable ADI only on
writable addresses backed by physical RAM
- Added support for saving/restoring ADI tags for each
ADI block size address range on a page on swap in/out
- copy ADI tags on COW
- Updated values for auxiliary vectors to not conflict
with values on other architectures to avoid conflict
in glibc
- Disable same page merging on ADI enabled pages
- Enable ADI only on writable addresses backed by
physical RAM
- Split parts of patch off into separate patches
Changelog v6:
- Patch 1/4: No changes
- Patch 2/4: No changes
- Patch 3/4: Added missing nop in the delay slot in
sun4v_mcd_detect_precise
- Patch 4/4: Eliminated instructions to read and write PSTATE
as well as MCDPER and PMCDPER on every access to userspace
addresses by setting PSTATE and PMCDPER correctly upon entry
into kernel
Changelog v5:
- Patch 1/4: No changes
- Patch 2/4: Replaced set_swp_pte_at() with new architecture
functions arch_do_swap_page() and arch_unmap_one() that
suppoprt architecture specific actions to be taken on page
swap and migration
- Patch 3/4: Fixed indentation issues in assembly code
- Patch 4/4:
- Fixed indentation issues and instrcuctions in assembly
code
- Removed CONFIG_SPARC64 from mdesc.c
- Changed to maintain state of MCDPER register in thread
info flags as opposed to in mm context. MCDPER is a
per-thread state and belongs in thread info flag as
opposed to mm context which is shared across threads.
Added comments to clarify this is a lazily maintained
state and must be updated on context switch and
copy_process()
- Updated code to use the new arch_do_swap_page() and
arch_unmap_one() functions
Testing:
- All functionality was tested with 8K normal pages as well as hugepages
using malloc, mmap and shm.
- Multiple long duration stress tests were run using hugepages over 2+
months. Normal pages were tested with shorter duration stress tests.
- Tested swapping with malloc and shm by reducing max memory and
allocating three times the available system memory by active processes
using ADI on allocated memory. Ran through multiple hours long runs of
this test.
- Tested page migration with malloc and shm by migrating data pages of
active ADI test process using migratepages, back and forth between two
nodes every few seconds over an hour long run. Verified page migration
through /proc/<pid>/numa_maps.
- Tested COW support using test that forks children that read from
ADI enabled pages shared with parent and other children and write to
them as well forcing COW.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit f8ec66014ffd ("signal: Add send_sig_fault and force_sig_fault")
added new helper functions to streamline signal delivery. This patch
updates signal delivery for new/updated handlers for ADI related
exceptions to use the helper function.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ADI is a new feature supported on SPARC M7 and newer processors to allow
hardware to catch rogue accesses to memory. ADI is supported for data
fetches only and not instruction fetches. An app can enable ADI on its
data pages, set version tags on them and use versioned addresses to
access the data pages. Upper bits of the address contain the version
tag. On M7 processors, upper four bits (bits 63-60) contain the version
tag. If a rogue app attempts to access ADI enabled data pages, its
access is blocked and processor generates an exception. Please see
Documentation/sparc/adi.txt for further details.
This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
MCD (Memory Corruption Detection) on selected memory ranges, enable
TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
version tags on page swap out/in or migration. ADI is not enabled by
default for any task. A task must explicitly enable ADI on a memory
range and set version tag for ADI to be effective for the task.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some architectures can support metadata for memory pages and when a
page is copied, its metadata must also be copied. Sparc processors
from M7 onwards support metadata for memory pages. This metadata
provides tag based protection for access to memory pages. To maintain
this protection, the tag data must be copied to the new page when a
page is migrated across NUMA nodes. This patch allows arch specific
code to override default copy_highpage() and copy metadata along
with page data upon migration.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When protection bits are changed on a VMA, some of the architecture
specific flags should be cleared as well. An examples of this are the
PKEY flags on x86. This patch expands the current code that clears
PKEY flags for x86, to support similar functionality for other
architectures as well.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A protection flag may not be valid across entire address space and
hence arch_validate_prot() might need the address a protection bit is
being set on to ensure it is a valid protection flag. For example, sparc
processors support memory corruption detection (as part of ADI feature)
flag on memory addresses mapped on to physical RAM but not on PFN mapped
pages or addresses mapped on to devices. This patch adds address to the
parameters being passed to arch_validate_prot() so protection bits can
be validated in the relevant context.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ADI feature on M7 and newer processors has three important properties
relevant to userspace apps using ADI capabilities - (1) Size of block of
memory an ADI version tag applies to, (2) Number of uppermost bits in
virtual address used to encode ADI tag, and (3) The value M7 processor
will force the ADI tags to if it detects uncorrectable error in an ADI
tagged cacheline. Kernel can retrieve these properties for a platform
through machine description provided by the firmware. This patch adds
code to retrieve these properties and report them to userspace through
auxiliary vectors.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
M7 and newer processors add a "Memory corruption Detected" trap with
the addition of ADI feature. This trap is vectored into kernel by HV
through resumable error trap with error attribute for the resumable
error set to 0x00000800.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ADI (Application Data Integrity) feature on M7 and newer processors
adds new fault types for hypervisor - Invalid ASI and MCD disabled.
This patch expands data access exception handler to handle these
faults.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SPARC M7 processor adds new control register fields, ASIs and a new
trap to support the ADI (Application Data Integrity) feature. This
patch adds definitions for these register fields, ASIs and a handler
for the new precise memory corruption detected trap.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a processor supports special metadata for a page, for example ADI
version tags on SPARC M7, this metadata must be saved when the page is
swapped out. The same metadata must be restored when the page is swapped
back in. This patch adds two new architecture specific functions -
arch_do_swap_page() to be called when a page is swapped in, and
arch_unmap_one() to be called when a page is being unmapped for swap
out. These architecture hooks allow page metadata to be saved if the
architecture supports it.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SPARC M7 processor introduces a new feature - Application Data
Integrity (ADI). ADI allows MMU to catch rogue accesses to memory.
When a rogue access occurs, MMU blocks the access and raises an
exception. In response to the exception, kernel sends the offending
task a SIGSEGV with si_code that indicates the nature of exception.
This patch adds three new signal codes specific to ADI feature:
1. ADI is not enabled for the address and task attempted to access
memory using ADI
2. Task attempted to access memory using wrong ADI tag and caused
a deferred exception.
3. Task attempted to access memory using wrong ADI tag and caused
a precise exception.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
batadv_check_unicast_ttvn may redirect a packet to itself or another
originator. This involves rewriting the ttvn and the destination address in
the batadv unicast header. These field were not yet pulled (with skb rcsum
update) and thus any change to them also requires a change in the receive
checksum.
Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Fixes: a73105b8d4c7 ("batman-adv: improved client announcement mechanism")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
Fixes: a9a08845e9ac ("vfs: do bulk POLL* -> EPOLL* replacement")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
Handle polled interrupts correctly when loading the module.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 294d711ee8c0 ("net: dsa: mv88e6xxx: Poll when no interrupt defined")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stefano Brivio says:
====================
selftests: pmtu: Add further vti/vti6 MTU and PMTU tests
Patches 5/10 to 10/10 add tests to verify default MTU assignment
for vti4 and vti6 interfaces, to check that MTU values set on new
link and link changes are properly taken and validated, and to
verify PMTU exceptions on vti4 interfaces.
Patch 1/10 reverses function return codes as suggested by David
Ahern.
Patch 2/10 fixes the helper to fetch exceptions MTU to run in the
passed namespace.
Patches 3/10 and 4/10 are preparation work to make it easier to
introduce those tests.
v2: Reverse return codes, and make output prettier in 4/9 by
using padded printf, test descriptions and buffered error
strings. Remove accidental output to /dev/kmsg from 10/10
(was 9/9).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This test checks that MTU configured from userspace is used on
link creation and changes, and that when it's not passed from
userspace, it's calculated properly from the MTU of the lower
layer.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Same as pmtu_vti4_link_add_mtu test, but for IPv6.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This test checks that MTU given on vti link creation is actually
configured, and that tunnel is not created with an invalid MTU
value.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This test checks that PMTU exceptions are created only when
needed on IPv4 routes with vti and xfrm, and their PMTU value is
checked as well.
We can't adopt the same approach as test_pmtu_vti6_exception()
here, because on IPv4 administrative MTU changes won't be
reflected directly on PMTU.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Same as pmtu_vti4_default_mtu, but on IPv6 with vti6.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This test checks that the MTU assigned by default to a vti (IPv4)
interface created on top of veth is simply veth's MTU minus the
length of the encapsulated IPv4 header.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce list of tests and their descriptions, and loop on it
in main body.
Tests will now just take care of calling setup with a list of
"units" they need, and return 0 on success, 1 on failure, 2 if
the test had to be skipped.
Main script body will take care of displaying results and
cleaning up after every test. Introduce guard variable so that
we don't clean up twice in case of interrupts or unexpected
failures.
The pmtu_vti6_exception test can now run its third step even if
the previous one failed, as we can return values from it.
Also introduce support to display test descriptions, and display
aligned OK/FAIL/SKIP test outcomes. Buffer error strings so that
in case of failure we can display them right under the outcome
for each test.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
...so that it can be used for any iproute command output.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In 7af137b72131 ("selftests: net: Introduce first PMTU test") I
accidentally assumed route_get_* helpers would run from a single
namespace. Make them a bit more generic, by passing the
namespace command prefix as a parameter instead.
Fixes: 7af137b72131 ("selftests: net: Introduce first PMTU test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David suggests it's more intuitive to return non-zero on
failures, and zero on success.
No need to introduce tail 'return 0' in functions, they will
return the exit code of the last command anyway.
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Thomas Falcon says:
====================
ibmvnic: Update TX pool and TX routines
This patch restructures the TX pool data structure and provides a
separate TX pool array for TSO transmissions. This is already used
in some way due to our unique DMA situation, namely that we cannot
use single DMA mappings for packet data. Previously, both buffer
arrays used the same pool entry. This restructuring allows for
some additional cleanup in the driver code, especially in some
places in the device transmit routine.
In addition, it allows us to more easily track the consumer
and producer indexes of a particular pool. This has been
further improved by better tracking of in-use buffers to
prevent possible data corruption in case an invalid buffer
entry is used.
v5: Fix bisectability mistake in the first patch. Removed
TSO-specific data in a later patch when it is no longer used.
v4: Fix error in 7th patch that causes an oops by using
the older fixed value for number of buffers instead
of the respective field in the tx pool data structure
v3: Forgot to update TX pool cleaning function to handle new data
structures. Included 7th patch for that.
v2: Fix typo in 3/6 commit subject line
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Finally, remove the TSO-specific fields in the TX pool
strcutures. These are no longer needed with the introduction
of separate buffer pools for TSO transmissions.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update routine that cleans up any outstanding transmits that
have not received completions when the device needs to close.
Introduces a helper function that cleans one TX pool to make
code more readable.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Improve TX pool buffer accounting to prevent the producer
index from overruning the consumer. First, set the next free
index to an invalid value if it is in use. If next buffer
to be consumed is in use, drop the packet.
Finally, if the transmit fails for some other reason, roll
back the consumer index and set the free map entry to its original
value. This should also be done if the DMA map fails.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update TX and TX completion routines to account for TX pool
restructuring. TX routine first chooses the pool depending
on whether a packet is GSO or not, then uses it accordingly.
For the completion routine to know which pool it needs to use,
set the most significant bit of the correlator index to one
if the packet uses the TSO pool. On completion, unset the bit
and use the correlator index to release the buffer pool entry.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce function that initializes one TX pool. Use that to
create each pool entry in both the standard TX pool and TSO
pool arrays.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce function that frees one TX pool. Use that to release
each pool in both the standard TX pool and TSO pool arrays.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update TX pool reset routine to accommodate new TSO pool array. Introduce
a function that resets one TX pool, and use that function to initialize
each pool in both pool arrays.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove some unused fields in the structure and include values
describing the individual buffer size and number of buffers in
a TX pool. This allows us to use these fields for TX pool buffer
accounting as opposed to using hard coded values. Include a new
pool array for TSO transmissions.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
use proc_remove_subtree() for subtree removal, both on setup failure
halfway through and on teardown. No need to make simple things
complex...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stephen Hemminger says:
====================
hv_netvsc: minor enhancements
A couple of small things for net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds tracepoints to the driver which has proved useful in
debugging startup and shutdown race conditions.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The caller has a valid pointer, pass it to rndis_filter_halt_device
and avoid any possible RCU races here.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'Commit 45dac1d6ea04 ("vmxnet3: Changes for vmxnet3 adapter version 2
(fwd)")' introduced a flag "lro" in structure vmxnet3_adapter which is
used to indicate whether LRO is enabled or not. However, the patch
did not set the flag and hence it was never exercised.
So, when LRO is enabled, it resulted in poor TCP performance due to
delayed acks. This issue is seen with packets which are larger than
the mss getting a delayed ack rather than an immediate ack, thus
resulting in high latency.
This patch removes the lro flag and directly uses device features
against NETIF_F_LRO to check if lro is enabled.
Fixes: 45dac1d6ea04 ("vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)")
Reported-by: Rachel Lunnon <rachel_lunnon@stormagic.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The field txNumDeferred is used by the driver to keep track of the number
of packets it has pushed to the emulation. The driver increments it on
pushing the packet to the emulation and the emulation resets it to 0 at
the end of the transmit.
There is a possibility of a race either when (a) ESX is under heavy load or
(b) workload inside VM is of low packet rate.
This race results in xmit hangs when network coalescing is disabled. This
change creates a local copy of txNumDeferred and uses it to perform ring
arithmetic.
Reported-by: Noriho Tanaka <ntanaka@vmware.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Davide Caratti says:
====================
net/sched: fix NULL dereference in the error path of .init()
with several TC actions it's possible to see NULL pointer dereference,
when the .init() function calls tcf_idr_alloc(), fails at some point and
then calls tcf_idr_release(): this series fixes all them introducing
non-NULL tests in the .cleanup() function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
when the following command
# tc action replace action skbmod swap mac index 100
is run for the first time, and tcf_skbmod_init() fails to allocate struct
tcf_skbmod_params, tcf_skbmod_cleanup() calls kfree_rcu(NULL), thus
causing the following error:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: __call_rcu+0x23/0x2b0
PGD 8000000034057067 P4D 8000000034057067 PUD 74937067 PMD 0
Oops: 0002 [#1] SMP PTI
Modules linked in: act_skbmod(E) psample ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache jbd2 crc32_pclmul snd_hda_core ghash_clmulni_intel snd_hwdep pcbc snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd glue_helper snd cryptd virtio_balloon joydev soundcore pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_net virtio_blk ata_piix libata crc32c_intel virtio_pci serio_raw virtio_ring virtio i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_skbmod]
CPU: 3 PID: 3144 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__call_rcu+0x23/0x2b0
RSP: 0018:ffffbd2e403e7798 EFLAGS: 00010246
RAX: ffffffffc0872080 RBX: ffff981d34bff780 RCX: 00000000ffffffff
RDX: ffffffff922a5f00 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000021f
R10: 000000003d003000 R11: 0000000000aaaaaa R12: 0000000000000000
R13: ffffffff922a5f00 R14: 0000000000000001 R15: ffff981d3b698c2c
FS: 00007f3678292740(0000) GS:ffff981d3fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000007c57a006 CR4: 00000000001606e0
Call Trace:
__tcf_idr_release+0x79/0xf0
tcf_skbmod_init+0x1d1/0x210 [act_skbmod]
tcf_action_init_1+0x2cc/0x430
tcf_action_init+0xd3/0x1b0
tc_ctl_action+0x18b/0x240
rtnetlink_rcv_msg+0x29c/0x310
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1b9/0x270
? rtnl_calcit.isra.28+0x100/0x100
netlink_rcv_skb+0xd2/0x110
netlink_unicast+0x17c/0x230
netlink_sendmsg+0x2cd/0x3c0
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x27a/0x290
? filemap_map_pages+0x34a/0x3a0
? __handle_mm_fault+0xbfd/0xe20
__sys_sendmsg+0x51/0x90
do_syscall_64+0x6e/0x1a0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f36776a3ba0
RSP: 002b:00007fff4703b618 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fff4703b740 RCX: 00007f36776a3ba0
RDX: 0000000000000000 RSI: 00007fff4703b690 RDI: 0000000000000003
RBP: 000000005aaaba36 R08: 0000000000000002 R09: 0000000000000000
R10: 00007fff4703b0a0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff4703b754 R14: 0000000000000001 R15: 0000000000669f60
Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
RIP: __call_rcu+0x23/0x2b0 RSP: ffffbd2e403e7798
CR2: 0000000000000008
Fix it in tcf_skbmod_cleanup(), ensuring that kfree_rcu(p, ...) is called
only when p is not NULL.
Fixes: 86da71b57383 ("net_sched: Introduce skbmod action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
when the following command
# tc action add action sample rate 100 group 100 index 100
is run for the first time, and psample_group_get(100) fails to create a
new group, tcf_sample_cleanup() calls psample_group_put(NULL), thus
causing the following error:
BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
IP: psample_group_put+0x15/0x71 [psample]
PGD 8000000075775067 P4D 8000000075775067 PUD 7453c067 PMD 0
Oops: 0002 [#1] SMP PTI
Modules linked in: act_sample(E) psample ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core mbcache jbd2 crct10dif_pclmul snd_hwdep crc32_pclmul snd_seq ghash_clmulni_intel pcbc snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer glue_helper snd cryptd joydev pcspkr i2c_piix4 soundcore virtio_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_net ata_piix virtio_console virtio_blk libata serio_raw crc32c_intel virtio_pci i2c_core virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_tunnel_key]
CPU: 2 PID: 5740 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:psample_group_put+0x15/0x71 [psample]
RSP: 0018:ffffb8a80032f7d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000024
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffffc06d93c0
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
R10: 00000000bd003000 R11: ffff979fba04aa59 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff979fbba3f22c
FS: 00007f7638112740(0000) GS:ffff979fbfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000001c CR3: 00000000734ea001 CR4: 00000000001606e0
Call Trace:
__tcf_idr_release+0x79/0xf0
tcf_sample_init+0x125/0x1d0 [act_sample]
tcf_action_init_1+0x2cc/0x430
tcf_action_init+0xd3/0x1b0
tc_ctl_action+0x18b/0x240
rtnetlink_rcv_msg+0x29c/0x310
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1b9/0x270
? rtnl_calcit.isra.28+0x100/0x100
netlink_rcv_skb+0xd2/0x110
netlink_unicast+0x17c/0x230
netlink_sendmsg+0x2cd/0x3c0
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x27a/0x290
? filemap_map_pages+0x34a/0x3a0
? __handle_mm_fault+0xbfd/0xe20
__sys_sendmsg+0x51/0x90
do_syscall_64+0x6e/0x1a0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f7637523ba0
RSP: 002b:00007fff0473ef58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fff0473f080 RCX: 00007f7637523ba0
RDX: 0000000000000000 RSI: 00007fff0473efd0 RDI: 0000000000000003
RBP: 000000005aaaac80 R08: 0000000000000002 R09: 0000000000000000
R10: 00007fff0473e9e0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff0473f094 R14: 0000000000000001 R15: 0000000000669f60
Code: be 02 00 00 00 48 89 df e8 a9 fe ff ff e9 7c ff ff ff 0f 1f 40 00 0f 1f 44 00 00 53 48 89 fb 48 c7 c7 c0 93 6d c0 e8 db 20 8c ef <83> 6b 1c 01 74 10 48 c7 c7 c0 93 6d c0 ff 14 25 e8 83 83 b0 5b
RIP: psample_group_put+0x15/0x71 [psample] RSP: ffffb8a80032f7d0
CR2: 000000000000001c
Fix it in tcf_sample_cleanup(), ensuring that calls to psample_group_put(p)
are done only when p is not NULL.
Fixes: cadb9c9fdbc6 ("net/sched: act_sample: Fix error path in init")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
when the following command
# tc action add action tunnel_key unset index 100
is run for the first time, and tunnel_key_init() fails to allocate struct
tcf_tunnel_key_params, tunnel_key_release() dereferences NULL pointers.
This causes the following error:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: tunnel_key_release+0xd/0x40 [act_tunnel_key]
PGD 8000000033787067 P4D 8000000033787067 PUD 74646067 PMD 0
Oops: 0000 [#1] SMP PTI
Modules linked in: act_tunnel_key(E) act_csum ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel pcbc snd_hda_codec snd_hda_core snd_hwdep snd_seq aesni_intel snd_seq_device crypto_simd glue_helper snd_pcm cryptd joydev snd_timer pcspkr virtio_balloon snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk drm virtio_console crc32c_intel ata_piix serio_raw i2c_core virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 3101 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:tunnel_key_release+0xd/0x40 [act_tunnel_key]
RSP: 0018:ffffba46803b7768 EFLAGS: 00010286
RAX: ffffffffc09010a0 RBX: 0000000000000000 RCX: 0000000000000024
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99ee336d7480
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
R10: 0000000000000220 R11: ffff99ee79d73131 R12: 0000000000000000
R13: ffff99ee32d67610 R14: ffff99ee7671dc38 R15: 00000000fffffff4
FS: 00007febcb2cd740(0000) GS:ffff99ee7fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000007c8e4005 CR4: 00000000001606e0
Call Trace:
__tcf_idr_release+0x79/0xf0
tunnel_key_init+0xd9/0x460 [act_tunnel_key]
tcf_action_init_1+0x2cc/0x430
tcf_action_init+0xd3/0x1b0
tc_ctl_action+0x18b/0x240
rtnetlink_rcv_msg+0x29c/0x310
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1b9/0x270
? rtnl_calcit.isra.28+0x100/0x100
netlink_rcv_skb+0xd2/0x110
netlink_unicast+0x17c/0x230
netlink_sendmsg+0x2cd/0x3c0
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x27a/0x290
__sys_sendmsg+0x51/0x90
do_syscall_64+0x6e/0x1a0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7febca6deba0
RSP: 002b:00007ffe7b0dd128 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffe7b0dd250 RCX: 00007febca6deba0
RDX: 0000000000000000 RSI: 00007ffe7b0dd1a0 RDI: 0000000000000003
RBP: 000000005aaa90cb R08: 0000000000000002 R09: 0000000000000000
R10: 00007ffe7b0dcba0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe7b0dd264 R14: 0000000000000001 R15: 0000000000669f60
Code: 44 00 00 8b 0d b5 23 00 00 48 8b 87 48 10 00 00 48 8b 3c c8 e9 a5 e5 d8 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 9f b0 00 00 00 <83> 7b 10 01 74 0b 48 89 df 31 f6 5b e9 f2 fa 7f c3 48 8b 7b 18
RIP: tunnel_key_release+0xd/0x40 [act_tunnel_key] RSP: ffffba46803b7768
CR2: 0000000000000010
Fix this in tunnel_key_release(), ensuring 'param' is not NULL before
dereferencing it.
Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
when the following command
# tc action add action csum udp continue index 100
is run for the first time, and tcf_csum_init() fails allocating struct
tcf_csum, tcf_csum_cleanup() calls kfree_rcu(NULL,...). This causes the
following error:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: __call_rcu+0x23/0x2b0
PGD 80000000740b4067 P4D 80000000740b4067 PUD 32e7f067 PMD 0
Oops: 0002 [#1] SMP PTI
Modules linked in: act_csum(E) act_vlan ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd glue_helper cryptd snd joydev pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console ata_piix crc32c_intel libata virtio_pci serio_raw i2c_core virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_vlan]
CPU: 2 PID: 5763 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__call_rcu+0x23/0x2b0
RSP: 0018:ffffb275803e77c0 EFLAGS: 00010246
RAX: ffffffffc057b080 RBX: ffff9674bc6f5240 RCX: 00000000ffffffff
RDX: ffffffff928a5f00 RSI: 0000000000000008 RDI: 0000000000000008
RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000044
R10: 0000000000000220 R11: ffff9674b9ab4821 R12: 0000000000000000
R13: ffffffff928a5f00 R14: 0000000000000000 R15: 0000000000000001
FS: 00007fa6368d8740(0000) GS:ffff9674bfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000073dec001 CR4: 00000000001606e0
Call Trace:
__tcf_idr_release+0x79/0xf0
tcf_csum_init+0xfb/0x180 [act_csum]
tcf_action_init_1+0x2cc/0x430
tcf_action_init+0xd3/0x1b0
tc_ctl_action+0x18b/0x240
rtnetlink_rcv_msg+0x29c/0x310
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1b9/0x270
? rtnl_calcit.isra.28+0x100/0x100
netlink_rcv_skb+0xd2/0x110
netlink_unicast+0x17c/0x230
netlink_sendmsg+0x2cd/0x3c0
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x27a/0x290
? filemap_map_pages+0x34a/0x3a0
? __handle_mm_fault+0xbfd/0xe20
__sys_sendmsg+0x51/0x90
do_syscall_64+0x6e/0x1a0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fa635ce9ba0
RSP: 002b:00007ffc185b0fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffc185b10f0 RCX: 00007fa635ce9ba0
RDX: 0000000000000000 RSI: 00007ffc185b1040 RDI: 0000000000000003
RBP: 000000005aaa85e0 R08: 0000000000000002 R09: 0000000000000000
R10: 00007ffc185b0a20 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc185b1104 R14: 0000000000000001 R15: 0000000000669f60
Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
RIP: __call_rcu+0x23/0x2b0 RSP: ffffb275803e77c0
CR2: 0000000000000010
fix this in tcf_csum_cleanup(), ensuring that kfree_rcu(param, ...) is
called only when param is not NULL.
Fixes: 9c5f69bbd75a ("net/sched: act_csum: don't use spinlock in the fast path")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
when the following command
# tc actions replace action vlan pop index 100
is run for the first time, and tcf_vlan_init() fails allocating struct
tcf_vlan_params, tcf_vlan_cleanup() calls kfree_rcu(NULL, ...). This causes
the following error:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: __call_rcu+0x23/0x2b0
PGD 80000000760a2067 P4D 80000000760a2067 PUD 742c1067 PMD 0
Oops: 0002 [#1] SMP PTI
Modules linked in: act_vlan(E) ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel mbcache snd_hda_codec jbd2 snd_hda_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer glue_helper snd cryptd joydev soundcore virtio_balloon pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_blk virtio_net ata_piix crc32c_intel libata virtio_pci i2c_core virtio_ring serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_vlan]
CPU: 3 PID: 3119 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__call_rcu+0x23/0x2b0
RSP: 0018:ffffaac3005fb798 EFLAGS: 00010246
RAX: ffffffffc0704080 RBX: ffff97f2b4bbe900 RCX: 00000000ffffffff
RDX: ffffffffabca5f00 RSI: 0000000000000010 RDI: 0000000000000010
RBP: 0000000000000010 R08: 0000000000000001 R09: 0000000000000044
R10: 00000000fd003000 R11: ffff97f2faab5b91 R12: 0000000000000000
R13: ffffffffabca5f00 R14: ffff97f2fb80202c R15: 00000000fffffff4
FS: 00007f68f75b4740(0000) GS:ffff97f2ffd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 0000000072b52001 CR4: 00000000001606e0
Call Trace:
__tcf_idr_release+0x79/0xf0
tcf_vlan_init+0x168/0x270 [act_vlan]
tcf_action_init_1+0x2cc/0x430
tcf_action_init+0xd3/0x1b0
tc_ctl_action+0x18b/0x240
rtnetlink_rcv_msg+0x29c/0x310
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1b9/0x270
? rtnl_calcit.isra.28+0x100/0x100
netlink_rcv_skb+0xd2/0x110
netlink_unicast+0x17c/0x230
netlink_sendmsg+0x2cd/0x3c0
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x27a/0x290
? filemap_map_pages+0x34a/0x3a0
? __handle_mm_fault+0xbfd/0xe20
__sys_sendmsg+0x51/0x90
do_syscall_64+0x6e/0x1a0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f68f69c5ba0
RSP: 002b:00007fffd79c1118 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fffd79c1240 RCX: 00007f68f69c5ba0
RDX: 0000000000000000 RSI: 00007fffd79c1190 RDI: 0000000000000003
RBP: 000000005aaa708e R08: 0000000000000002 R09: 0000000000000000
R10: 00007fffd79c0ba0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fffd79c1254 R14: 0000000000000001 R15: 0000000000669f60
Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
RIP: __call_rcu+0x23/0x2b0 RSP: ffffaac3005fb798
CR2: 0000000000000018
fix this in tcf_vlan_cleanup(), ensuring that kfree_rcu(p, ...) is called
only when p is not NULL.
Fixes: 4c5b9d9642c8 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Manish Kurup <manish.kurup@verizon.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In VLAN_AWARE mode CPSW can insert VLAN header encapsulation word on Host
port 0 egress (RX) before the packet data if RX_VLAN_ENCAP bit is set in
CPSW_CONTROL register. VLAN header encapsulation word has following format:
HDR_PKT_Priority bits 29-31 - Header Packet VLAN prio (Highest prio: 7)
HDR_PKT_CFI bits 28 - Header Packet VLAN CFI bit.
HDR_PKT_Vid bits 27-16 - Header Packet VLAN ID
PKT_Type bits 8-9 - Packet Type. Indicates whether the packet is
VLAN-tagged, priority-tagged, or non-tagged.
00: VLAN-tagged packet
01: Reserved
10: Priority-tagged packet
11: Non-tagged packet
This feature can be used to implement TX VLAN offload in case of
VLAN-tagged packets and to insert VLAN tag in case Non-tagged packet was
received on port with PVID set. As per documentation, CPSW never modifies
packet data on Host egress (RX) and as result, without this feature
enabled, Host port will not be able to receive properly packets which
entered switch non-tagged through external Port with PVID set (when
non-tagged packet forwarded from external Port with PVID set to another
external Port - packet will be VLAN tagged properly).
Implementation details:
- on RX driver will check CPDMA status bit RX_VLAN_ENCAP BIT(19) in CPPI
descriptor to identify when VLAN header encapsulation word is present.
- PKT_Type = 0x01 or 0x02 then ignore VLAN header encapsulation word and
pass packet as is;
- if HDR_PKT_Vid = 0 then ignore VLAN header encapsulation word and pass
packet as is;
- In dual mac mode traffic is separated between ports using default port
vlans, which are not be visible to Host and so should not be reported.
Hence, check for default port vlans in dual mac mode and ignore VLAN header
encapsulation word;
- otherwise fill SKB with VLAN info using __vlan_hwaccel_put_tag();
- PKT_Type = 0x00 (VLAN-tagged) then strip out VLAN header from SKB.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|