Age | Commit message (Collapse) | Author |
|
Broadcom's Wirespeed feature allows us to configure how auto-negotiation
should behave with fewer working pairs of wires on a cable. Add support
code for retrieving and setting such downshift counters using the
recently added ethtool downshift tunables.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to need these functions to implement support for Broadcom
Wirespeed, aka downshift.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
type for HID compass sensor
There are 2 usage types (Magnetic Flux and Heading data field) for HID
compass sensor, thus the values of offset, scale, and sensitivity should
be separated according to their respective usage type. The changes made
are as below:
1. Hysteresis: A struct hid_sensor_common rot_attributes is created in
struct magn_3d_state to contain the sensitivity for IIO_ROT.
2. Scale: scale_pre_decml and scale_post_decml are separated for IIO_MAGN
and IIO_ROT.
3. Offset: Same as scale, value_offset is separated for IIO_MAGN and
IIO_ROT.
For sensitivity, HID_USAGE_SENSOR_ORIENT_MAGN_FLUX and
HID_USAGE_SENSOR_ORIENT_MAGN_HEADING are used for sensivitity fields based
on the HID Sensor Usages specifications. Hence, these changes are added on
the sensitivity field.
Signed-off-by: Ooi, Joyce <joyce.ooi@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
|
In commit 2331ccc5b323 ("tcp: enhance tcp collapsing"),
we made a first step allowing copying right skb to left skb head.
Since all skbs in socket write queue are headless (but possibly the very
first one), this strategy often does not work.
This patch extends tcp_collapse_retrans() to perform frag shifting,
thanks to skb_shift() helper.
This helper needs to not BUG on non headless skbs, as callers are ok
with that.
Tested:
Following packetdrill test now passes :
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.100 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
+0 write(4, ..., 200) = 200
+0 > P. 1:201(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 201:401(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 401:601(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 601:801(200) ack 1
+.001 write(4, ..., 200) = 200
+0 > P. 801:1001(200) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1001:1101(100) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1101:1201(100) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1201:1301(100) ack 1
+.001 write(4, ..., 100) = 100
+0 > P. 1301:1401(100) ack 1
+.099 < . 1:1(0) ack 201 win 257
+.001 < . 1:1(0) ack 201 win 257 <nop,nop,sack 1001:1401>
+0 > P. 201:1001(800) ack 1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When busy polling while a link is down (during a link-flap test), TX
timeouts were observed as well as the following messages in the ring
buffer:
bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free tx failed. rc:-1
bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free rx failed. rc:-1
These were resolved by checking for link status and returning if link
was not up.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Rob Miller <rob.miller@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commits 93821778def10 ("udp: Fix rcv socket locking") and
f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.
This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets
before queueing")
Bug found by syzkaller team, thanks a lot guys !
Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9
Fixes: 93821778def1 ("udp: Fix rcv socket locking")
Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for the MV88E6097 switch. The change was tested on an Armada
based platform with a MV88E6097 switch.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a family type and associated ops for Fam17h. Define a struct to hold
all the UMC registers that we need. Make this a part of struct amd64_pvt
in order to maximize code reuse in the rest of the driver.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This array is supposed to have 10 elements. Smatch complains that with
the current code we can have n == max_ints and read beyond the end of
the array.
Fixes: ac4f6eee8fe8 ("staging: iio: TAOS tsl258x: Device driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Brian Masney <masneyb@onstation.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
|
|
Update the ecc_enabled() function to work on Fam17h. This entails
reading a different set of registers and using the SMN (System
Management Network) rather than PCI devices.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com
[ Fixup ecc_en assignment and get_umc_base(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Use the acpi cppc_lib interface to get CPPC performance limits and update
the per cpu priority for the ITMT scheduler. If the highest performance of
CPUs differs the ITMT feature is enabled.
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0998b98943bcdec7d1ddd4ff27358da555ea8e92.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Set the OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse
core support.
This is required to enable the BIOS support of the Intel Turbo Boost Max
Technology 3.0 feature.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a023623a727e86040a1715797055f6402caefd7e.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Need to set platform wide _OSC bits to enable CPPC and CPPC version 2.
If platform supports CPPC, then BIOS exposes CPPC tables.
The primary reason to enable CPPC support is to get the maximum
performance of each CPU to check and enable Intel Turbo Boost Max
Technology 3.0 (ITMT).
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a696f6b17843cee9a542482fae6abab087be9587.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Some Intel cores in a package can be boosted to a higher turbo frequency
with ITMT 3.0 technology. The scheduler can use the asymmetric packing
feature to move tasks to the more capable cores.
If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core
sched domains to enable asymmetric packing.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Intel Turbo Boost Max Technology 3.0 (ITMT) feature
allows some cores to be boosted to higher turbo
frequency than others.
Add /proc/sys/kernel/sched_itmt_enabled so operator
can enable/disable scheduling of tasks that favor cores
with higher turbo boost frequency potential.
By default, system that is ITMT capable and single
socket has this feature turned on. It is more likely
to be lightly loaded and operates in Turbo range.
When there is a change in the ITMT scheduling operation
desired, a rebuild of the sched domain is initiated
so the scheduler can set up sched domains with appropriate
flag to enable/disable ITMT scheduling operations.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package. In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.
To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency. In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmertic
packing approach. At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.
The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function. The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The scheduler calls arch_update_cpu_topology() to check whether the
scheduler domains have to be rebuilt.
So far x86 has no requirement for this, but the upcoming ITMT support
makes this necessary.
Request the rebuild when the x86 internal update flag is set.
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci-of-esdhc: Fix card detection
- dw_mmc: Fix DMA error path"
* tag 'mmc-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: dw_mmc: fix the error handling for dma operation
mmc: sdhci-of-esdhc: fixup PRESENT_STATE read
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few small USB fixes and new device ids for 4.9-rc7.
The majority of these fixes are in the musb driver, fixing a number of
regressions that have been reported but took a while to resolve. The
other fixes are all small ones, to resolve other reported minor
issues.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: f_fs: fix wrong parenthesis in ffs_func_req_match()
phy: twl4030-usb: Fix for musb session bit based PM
usb: musb: Drop pointless PM runtime code for dsps glue
usb: musb: Add missing pm_runtime_disable and drop 2430 PM timeout
usb: musb: Fix PM for hub disconnect
usb: musb: Fix sleeping function called from invalid context for hdrc glue
usb: musb: Fix broken use of static variable for multiple instances
USB: serial: cp210x: add ID for the Zone DPMX
usb: chipidea: move the lock initialization to core file
Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- DMA-on-stack fixes for a couple drivers, from Benjamin Tissoires
- small memory sanitization fix for sensor-hub driver, from Song
Hongyan
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hid-sensor-hub: clear memory to avoid random data
HID: rmi: make transfer buffers DMA capable
HID: magicmouse: make transfer buffers DMA capable
HID: lg: make transfer buffers DMA capable
HID: cp2112: make transfer buffers DMA capable
|
|
Split irqchip allows pic and ioapic routes to be used without them being
created, which results in NULL access. Check for NULL and avoid it.
(The setup is too racy for a nicer solutions.)
Found by syzkaller:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events irqfd_inject
task: ffff88006a06c7c0 task.stack: ffff880068638000
RIP: 0010:[...] [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221
RSP: 0000:ffff88006863ea20 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e
RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0
R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0
Stack:
ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000
ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082
0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000
Call Trace:
[...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
[...] __raw_spin_lock include/linux/spinlock_api_smp.h:144
[...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151
[...] spin_lock include/linux/spinlock.h:302
[...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379
[...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52
[...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101
[...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60
[...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096
[...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230
[...] kthread+0x328/0x3e0 kernel/kthread.c:209
[...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 49df6397edfc ("KVM: x86: Split the APIC from the rest of IRQCHIP.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be
bigger that the maximal number of VCPUs, resulting in out-of-bounds
access.
Found by syzkaller:
BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...]
Write of size 1 by task a.out/27101
CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905
[...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495
[...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86
[...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360
[...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222
[...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235
[...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670
[...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668
[...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999
[...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: af1bae5497b9 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.
Found by syzkaller:
WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
Kernel panic - not syncing: panic_on_warn set ...
CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __dump_stack lib/dump_stack.c:15
[...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[...] panic+0x1b7/0x3a3 kernel/panic.c:179
[...] __warn+0x1c4/0x1e0 kernel/panic.c:542
[...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
[...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
[...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
[...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
[...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
[...] complete_emulated_io arch/x86/kvm/x86.c:6870
[...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
[...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
[...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
[...] vfs_ioctl fs/ioctl.c:43
[...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
[...] SYSC_ioctl fs/ioctl.c:694
[...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[...] entry_SYSCALL_64_fastpath+0x1f/0xc2
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff.
With enabled KVM_X2APIC_API_USE_32BIT_IDS in KVM_CAP_X2APIC_API, a
userspace can send an interrupt with dest_id that results in
out-of-bounds access.
Found by syzkaller:
BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750
Read of size 8 by task syz-executor/22923
CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __dump_stack lib/dump_stack.c:15
[...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
[...] print_address_description mm/kasan/report.c:194
[...] kasan_report_error mm/kasan/report.c:283
[...] kasan_report+0x231/0x500 mm/kasan/report.c:303
[...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329
[...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824
[...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72
[...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157
[...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74
[...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015
[...] vfs_ioctl fs/ioctl.c:43
[...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
[...] SYSC_ioctl fs/ioctl.c:694
[...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[...] entry_SYSCALL_64_fastpath+0x1f/0xc2
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: e45115b62f9a ("KVM: x86: use physical LAPIC array for logical x2APIC")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Update the I/O interception support to add the kvm_fast_pio_in function
to speed up the in instruction similar to the out instruction.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
AMD hardware adds two additional bits to aid in nested page fault handling.
Bit 32 - NPF occurred while translating the guest's final physical address
Bit 33 - NPF occurred while translating the guest page tables
The guest page tables fault indicator can be used as an aid for nested
virtualization. Using V0 for the host, V1 for the first level guest and
V2 for the second level guest, when both V1 and V2 are using nested paging
there are currently a number of unnecessary instruction emulations. When
V2 is launched shadow paging is used in V1 for the nested tables of V2. As
a result, KVM marks these pages as RO in the host nested page tables. When
V2 exits and we resume V1, these pages are still marked RO.
Every nested walk for a guest page table is treated as a user-level write
access and this causes a lot of NPFs because the V1 page tables are marked
RO in the V0 nested tables. While executing V1, when these NPFs occur KVM
sees a write to a read-only page, emulates the V1 instruction and unprotects
the page (marking it RW). This patch looks for cases where we get a NPF due
to a guest page table walk where the page was marked RO. It immediately
unprotects the page and resumes the guest, leading to far fewer instruction
emulations when nested virtualization is used.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Otherwise each individual rotator char would be printed in a new line:
(...)
[ 0.642350] -
[ 0.644374] |
[ 0.646367] -
(...)
Signed-off-by: Nicolas Schichan <nicolas.schichan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When an ipv6 address has the tentative flag set, it can't be
used as source for egress traffic, while the associated route,
if any, can be looked up and even stored into some dst_cache.
In the latter scenario, the source ipv6 address selected and
stored in the cache is most probably wrong (e.g. with
link-local scope) and the entity using the dst_cache will
experience lack of ipv6 connectivity until said cache is
cleared or invalidated.
Overall this may cause lack of connectivity over most IPv6 tunnels
(comprising geneve and vxlan), if the first egress packet reaches
the tunnel before the DaD is completed for the used ipv6
address.
This patch bumps a new genid after that the IFA_F_TENTATIVE flag
is cleared, so that dst_cache will be invalidated on
next lookup and ipv6 connectivity restored.
Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd762 ("geneve: add dst caching support")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make it possible to generate trace events for mdio read and write accesses.
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The VMware VMCI transport supports loopback inside virtual machines.
This patch implements loopback for virtio-vsock.
Flow control is handled by the virtio-vsock protocol as usual. The
sending process stops transmitting on a connection when the peer's
receive buffer space is exhausted.
Cathy Avery <cavery@redhat.com> noticed this difference between VMCI and
virtio-vsock when a test case using loopback failed. Although loopback
isn't the main point of AF_VSOCK, it is useful for testing and
virtio-vsock must match VMCI semantics so that userspace programs run
regardless of the underlying transport.
My understanding is that loopback is not supported on the host side with
VMCI. Follow that by implementing it only in the guest driver, not the
vhost host driver.
Cc: Jorgen Hansen <jhansen@vmware.com>
Reported-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since MIPSr6 the Wired register is split into 2 fields, with the upper
16 bits of the register indicating a limit on the value that the wired
entry count in the bottom 16 bits of the register can take. This means
that simply reading the wired register doesn't get us a valid TLB entry
index any longer, and we instead need to retrieve only the lower 16 bits
of the register. Introduce a new num_wired_entries() function which does
this on MIPSr6 or higher and simply returns the value of the wired
register on older architecture revisions, and make use of it when
reading the number of wired entries.
Since commit e710d6668309 ("MIPS: tlb-r4k: If there are wired entries,
don't use TLBINVF") we have been using a non-zero number of wired
entries to determine whether we should avoid use of the tlbinvf
instruction (which would invalidate wired entries) and instead loop over
TLB entries in local_flush_tlb_all(). This loop begins with the number
of wired entries, or before this patch some large bogus TLB index on
MIPSr6 systems. Thus since the aforementioned commit some MIPSr6 systems
with FTLBs have been prone to leaving stale address translations in the
FTLB & crashing in various weird & wonderful ways when we later observe
the wrong memory.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14557/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
Use builtin_platform_driver() helper to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The GPIO input status was read from control register
(AXP20X_GPIO[210]_CTRL) instead of status register (AXP20X_GPIO20_SS).
Signed-off-by: Quentin Schulz <quentin.schulz@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
commit 43db289d00c6 ("gpio: stmpe: Rework registers access")
reworked the STMPE register access so as to use
[STMPE_IDX_*_LSB + i] to access the 8bit register for a
certain bank, assuming the CSB and MSB will follow after
the enumerator. For this to work the index needs to go from
(size-1) to 0 not 0 to (size-1).
However for the GPIO IRQ handler, the status registers we read
register MSB + 3 bytes ahead for the 24 bit GPIOs and index
registers from MSB upwards and run an index i over the
registers UNLESS we are STMPE1600.
This is not working when we get to clearing the interrupt
EDGE status register STMPE_IDX_GPEDR_[LCM]SB: it is indexed
like all other registers [STMPE_IDX_*_LSB + i] but in this
loop we index from 0 to get the right bank index for the
calculations, and we need to just add i to the MSB.
Before this, interrupts on the STMPE2401 were broken, this
patch fixes it so it works again.
Cc: stable@vger.kernel.org
Cc: Patrice Chotard <patrice.chotard@st.com>
Fixes: 43db289d00c6 ("gpio: stmpe: Rework registers access")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The GPIO_EM is part of the Renesas SoCs so depend on the arch.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
[Changed to depend on ARCH_EMEV2]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
When loading the TX fifo to receive bytes on the I2C bus, we incorrectly
count the number of bytes:
rx_limit = dev->rx_fifo_depth - dw_readl(dev, DW_IC_RXFLR);
while (buf_len > 0 && tx_limit > 0 && rx_limit > 0) {
if (rx_limit - dev->rx_outstanding <= 0)
break;
rx_limit--;
dev->rx_outstanding++;
}
DW_IC_RXFLR indicates how many bytes are available to be read in the
FIFO, dev->rx_fifo_depth is the FIFO size, and dev->rx_outstanding is
the number of bytes that we've requested to be read so far, but which
have not been read.
Firstly, increasing dev->rx_outstanding and decreasing rx_limit and then
comparing them results in each byte consuming "two" bytes in this
tracking, so this is obviously wrong.
Secondly, the number of bytes that _could_ be received into the FIFO at
any time is the number of bytes we have so far requested but not yet
read from the FIFO - in other words dev->rx_outstanding.
So, in order to request enough bytes to fill the RX FIFO, we need to
request dev->rx_fifo_depth - dev->rx_outstanding bytes.
Modifying the code thusly results in us reaching the maximum number of
bytes outstanding each time we queue more "receive" operations, provided
the transfer allows that to happen.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Rather than reporting success for a short transfer due to interrupt
latency, report an error both to the caller, as well as to the kernel
log.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Some class subsystems are open-coding CLASS_ATTR_WO because the driver
core never provided it. Add the macro to device.h so that we can go
around and fix up the individual subsystems as needed.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If the chip does not have an oscio pin, all pins are configured in
the same regmap register making it trivial to update all pins at
once, so do that. If an oscio pin is present, there needs to be
more locking in place to handle all cases correctly, so this is
skipped.
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Stas Nichiporovich reports oops in nf_nat_bysource_cmp(), trying to
access nf_conn struct at address 0xffffffffffffff50.
This is the result of fetching a null rhash list (struct embedded at
offset 176; 0 - 176 gets us ...fff50).
The problem is that conntrack entries are allocated from a
SLAB_DESTROY_BY_RCU cache, i.e. entries can be free'd and reused
on another cpu while nf nat bysource hash access the same conntrack entry.
Freeing is fine (we hold rcu read lock); zeroing rhlist_head isn't.
-> Move the rhlist struct outside of the memset()-inited area.
Fixes: 7c9664351980aaa6a ("netfilter: move nat hlist_head to nf_conn")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Otherwise, kernel panic will happen if the user does not specify
the related attributes.
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As Liping Zhang reports, after commit a8b1e36d0d1d ("netfilter: nft_dynset:
fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies,
while set->timeout was stored in milliseconds. This is inconsistent and
incorrect.
Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so
priv->timeout will be converted to jiffies twice.
Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr,
set->timeout will be used, but we forget to call msecs_to_jiffies
when do update elements.
Fix this by using jiffies internally for traditional sets and doing the
conversions to/from msec when interacting with userspace - as dynset
already does.
This is preferable to doing the conversions, when elements are inserted or
updated, because this can happen very frequently on busy dynsets.
Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Reported-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Acked-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
I got offlist bug report about failing connections and high cpu usage.
This happens because we hit 'elasticity' checks in rhashtable that
refuses bucket list exceeding 16 entries.
The nat bysrc hash unfortunately needs to insert distinct objects that
share same key and are identical (have same source tuple), this cannot
be avoided.
Switch to the rhlist interface which is designed for this.
The nulls_base is removed here, I don't think its needed:
A (unlikely) false positive results in unneeded port clash resolution,
a false negative results in packet drop during conntrack confirmation,
when we try to insert the duplicate into main conntrack hash table.
Tested by adding multiple ip addresses to host, then adding
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
... and then creating multiple connections, from same source port but
different addresses:
for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null & done
(all of these then get hashed to same bysource slot)
Then, to test that nat conflict resultion is working:
nc -s 10.0.0.1 -p 1234 192.168.7.1 2000
nc -s 10.0.0.2 -p 1234 192.168.7.1 2000
tcp .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED]
tcp .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED]
[..]
-> nat altered source ports to 1024 and 1025, respectively.
This can also be confirmed on destination host which shows
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1024
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1025
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1234
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The comparator works like memcmp, i.e. 0 means objects are equal.
In other words, when objects are distinct they are treated as identical,
when they are distinct they are allegedly the same.
The first case is rare (distinct objects are unlikely to get hashed to
same bucket).
The second case results in unneeded port conflict resolutions attempts.
Fixes: 870190a9ec907 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use the function nft_parse_u32_check() to fetch the value and validate
the u32 attribute into the hash len u8 field.
This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").
Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When we inject a level triggerered interrupt (and unless it
is backed by the physical distributor - timer style), we request
a maintenance interrupt. Part of the processing for that interrupt
is to feed to the rest of KVM (and to the eventfd subsystem) the
information that the interrupt has been EOIed.
But that notification only makes sense for SPIs, and not PPIs
(such as the PMU interrupt). Skip over the notification if
the interrupt is not an SPI.
Cc: stable@vger.kernel.org # 4.7+
Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend")
Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
We generalize the scheduler's asym packing to provide an ordering
of the cpu beyond just the cpu number. This allows the use of the
ASYM_PACKING scheduler machinery to move loads to preferred CPU in a
sched domain. The preference is defined with the cpu priority
given by arch_asym_cpu_priority(cpu).
We also record the most preferred cpu in a sched group when
we build the cpu's capacity for fast lookup of preferred cpu
during load balancing.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-pm@vger.kernel.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0e73ae12737dfaafa46c07066cc7c5d3f1675e46.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Since kernel 4.7 this defaults to off.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_send_reset6 is not considering the L3 domain and lookups are sent
to the wrong table. For example consider the following output rule:
ip6tables -A OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset
using perf to analyze lookups via the fib6_table_lookup tracepoint shows:
swapper 0 [001] 248.787816: fib6:fib6_table_lookup: table 255 oif 0 iif 1 src 2100:1::3 dst 2100:1:
ffffffff81439cdc perf_trace_fib6_table_lookup ([kernel.kallsyms])
ffffffff814c1ce3 trace_fib6_table_lookup ([kernel.kallsyms])
ffffffff814c3e89 ip6_pol_route ([kernel.kallsyms])
ffffffff814c40d5 ip6_pol_route_output ([kernel.kallsyms])
ffffffff814e7b6f fib6_rule_action ([kernel.kallsyms])
ffffffff81437f60 fib_rules_lookup ([kernel.kallsyms])
ffffffff814e7c79 fib6_rule_lookup ([kernel.kallsyms])
ffffffff814c2541 ip6_route_output_flags ([kernel.kallsyms])
528 nf_send_reset6 ([nf_reject_ipv6])
The lookup is directed to table 255 rather than the table associated with
the device via the L3 domain. Update nf_send_reset6 to pull the L3 domain
from the dst currently attached to the skb.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ip_route_me_harder is not considering the L3 domain and sending lookups
to the wrong table. For example consider the following output rule:
iptables -I OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset
using perf to analyze lookups via the fib_table_lookup tracepoint shows:
vrf-test 1187 [001] 46887.295927: fib:fib_table_lookup: table 255 oif 0 iif 0 src 0.0.0.0 dst 10.100.1.254 tos 0 scope 0 flags 0
ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
ffffffff8148dda3 __inet_dev_addr_type ([kernel.kallsyms])
ffffffff8148ddf6 inet_addr_type ([kernel.kallsyms])
ffffffff8149e344 ip_route_me_harder ([kernel.kallsyms])
and
vrf-test 1187 [001] 46887.295933: fib:fib_table_lookup: table 255 oif 0 iif 1 src 10.100.1.254 dst 10.100.1.2 tos 0 scope 0 flags
ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
ffffffff814998ff fib4_rule_action ([kernel.kallsyms])
ffffffff81437f35 fib_rules_lookup ([kernel.kallsyms])
ffffffff81499758 __fib_lookup ([kernel.kallsyms])
ffffffff8144f010 fib_lookup.constprop.34 ([kernel.kallsyms])
ffffffff8144f759 __ip_route_output_key_hash ([kernel.kallsyms])
ffffffff8144fc6a ip_route_output_flow ([kernel.kallsyms])
ffffffff8149e39b ip_route_me_harder ([kernel.kallsyms])
In both cases the lookups are directed to table 255 rather than the
table associated with the device via the L3 domain. Update both
lookups to pull the L3 domain from the dst currently attached to the
skb.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|