|
In case the kernel PCI error handlers are not called, we trigger our own
recovery flow.
The health work gives priority to the kernel PCI error handlers to
recover the PCI device by waiting for a short period; if the PCI error
handlers are not triggered within that time, the manual recovery flow is
executed.
We don't save the PCI state in the manual recovery case because doing so
would ruin the PCI configuration space and we would lose DMA sync.
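A minimal user-space sketch of the "give the kernel handlers a grace period,
then recover manually" idea; the names and the period are illustrative, not
the actual mlx5 code:

    #include <stdbool.h>
    #include <unistd.h>

    #define GRACE_PERIOD_SEC 60     /* hypothetical grace period */

    struct health {
        bool pci_handlers_triggered;    /* set once error_detected() has run */
    };

    /* Manual recovery deliberately does not save the PCI state: the config
     * space is no longer trustworthy and restoring it would break DMA sync. */
    static void manual_pci_recover(void)
    {
    }

    static void health_recover_work(struct health *h)
    {
        /* Give the kernel PCI error handlers priority for a while. */
        sleep(GRACE_PERIOD_SEC);

        if (h->pci_handlers_triggered)
            return;                 /* the AER flow owns the recovery */

        manual_pci_recover();       /* fall back to the manual flow */
    }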
Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently there is a race between the health care work and the kernel
PCI error handlers: both of them detect the error, and whichever is
called first does the error handling.
There is a chance that the health care work will disable the PCI device
after the PCI error handlers have already resumed the slot.
Also create a separate workqueue, because we now have two types of
health work: one for error detection and one for recovery.
Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The health sick status should be cleared when we start the health poll.
This is crucial for driver reload (unload + load) in order to behave
correctly in case of a health issue.
Fixes: fd76ee4da55a ('net/mlx5_core: Fix internal error detection conditions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Ingress/Egress ACL enable function may fail, so it should return its
status to the caller to avoid a NULL pointer dereference.
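A self-contained sketch of the intended pattern (illustrative names, not the
mlx5 E-Switch code): the enable helper returns an error code, and the caller
bails out instead of dereferencing an unallocated table:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct acl_table { int id; };

    /* May fail; the caller must check the return value before using *out. */
    static int ingress_acl_enable(struct acl_table **out)
    {
        *out = malloc(sizeof(**out));
        if (!*out)
            return -ENOMEM;
        (*out)->id = 1;
        return 0;
    }

    static int setup_vport(void)
    {
        struct acl_table *acl;
        int err = ingress_acl_enable(&acl);

        if (err) {
            fprintf(stderr, "ingress ACL enable failed: %d\n", err);
            return err;     /* propagate instead of touching a NULL table */
        }
        printf("ACL table %d ready\n", acl->id);
        free(acl);
        return 0;
    }

    int main(void) { return setup_vport() ? 1 : 0; }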
Fixes: f942380c1239 ('net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Detaching the netdev before unregistering it causes some netdev cleanup
ndos to fail, because they check for the presence of the netdev, so we
need to unregister the netdev first.
Fixes: 26e59d8077a3 ('net/mlx5e: Implement mlx5e interface attach/detach callbacks')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of predicting the index of the wanted LRO timeout value from
hardware capabilities, look for the nearest LRO timeout value.
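A self-contained sketch of such a nearest-value lookup; the timeout array
below is made up, not the real HCA capability values:

    #include <stdio.h>
    #include <stdlib.h>

    /* Example supported timeout values reported by the device (made up). */
    static const unsigned int lro_timeouts[] = { 8, 16, 32, 1024 };

    /* Return the index of the supported value nearest to the wanted one,
     * instead of guessing a fixed index from the capability layout. */
    static int nearest_lro_timeout_idx(unsigned int wanted)
    {
        unsigned int best_delta = ~0U;
        int best = 0;
        size_t i;

        for (i = 0; i < sizeof(lro_timeouts) / sizeof(lro_timeouts[0]); i++) {
            unsigned int delta = lro_timeouts[i] > wanted ?
                                 lro_timeouts[i] - wanted :
                                 wanted - lro_timeouts[i];
            if (delta < best_delta) {
                best_delta = delta;
                best = (int)i;
            }
        }
        return best;
    }

    int main(void)
    {
        printf("nearest to 20: index %d\n", nearest_lro_timeout_idx(20));
        return 0;
    }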
Fixes: 5c50368f3831 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the last-use timestamp is initialized to zero.
This is not the value expected by higher layers, for example
when we do TC action offloading. To fix that, set it to
the current time when the counter/rule is offloaded.
This matches the behaviour of non-offloaded TC actions.
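A tiny user-space analogue of the fix, with illustrative field names:
initialise the last-use timestamp to "now" at offload time rather than
leaving it zero:

    #include <stdio.h>
    #include <time.h>

    struct flow_counter {
        unsigned long packets;
        time_t lastuse;     /* last-use timestamp seen by upper layers */
    };

    static void counter_offload(struct flow_counter *c)
    {
        c->packets = 0;
        /* Initialise lastuse to "now", not 0, so a just-offloaded rule is
         * reported as recently used, matching non-offloaded TC actions. */
        c->lastuse = time(NULL);
    }

    int main(void)
    {
        struct flow_counter c;

        counter_offload(&c);
        printf("lastuse initialised to %ld\n", (long)c.lastuse);
        return 0;
    }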
Fixes: 43a335e055bb ('mlx5_core: Flow counters infrastructure')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The autogroup's group count is incremented when a new flow group is
created, but it is never decremented.
Decrement it when a flow group is deleted.
Fixes: f0d22d187473 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Finding a new autogroup range is done by going over a group list
sorted by each group's start index. The search stops after finding
the first free range. However, the newly created group is wrongly
added to the end of the list regardless of its start index, because
the parameter indicating where to insert it is ignored.
This commit makes sure that this previously unused parameter is used
to insert the group where requested.
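A small self-contained sketch of the intended behaviour, using a plain singly
linked list rather than the kernel list API: the new group is inserted at the
position the search found, keeping the list sorted by start index:

    #include <stdio.h>
    #include <stdlib.h>

    struct group {
        unsigned int start_index;
        struct group *next;
    };

    /* Insert @g at the link the free-range search found,
     * instead of unconditionally appending it to the tail. */
    static void insert_group_at(struct group **pos, struct group *g)
    {
        g->next = *pos;
        *pos = g;
    }

    int main(void)
    {
        struct group a = { .start_index = 0 }, c = { .start_index = 64 };
        struct group b = { .start_index = 32 };
        struct group *head = &a;

        a.next = &c;
        c.next = NULL;

        /* the search decided the new group belongs between a and c */
        insert_group_at(&a.next, &b);

        for (struct group *g = head; g; g = g->next)
            printf("group at %u\n", g->start_index);
        return 0;
    }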
Fixes: f0d22d187473 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Always query the HCA caps after setting them to update the capabilities
data structures. Not doing so results in incorrect capabilities being
reported, including max_dc, max_qp and several others.
Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On ARM systems with a 64-byte cache line, L1_CACHE_BYTES is set to 128.
cache_line_size() will return the correct size.
Fixes: cf50b5efa2fe ('net/mlx5_core/ib: New device capabilities handling.')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrey Konovalov reported that KASAN detected SCTP reading beyond the
boundaries of a slab object. The cause is that, when handling out of the
blue packets in sctp_sf_ootb(), the chunk length was checked only after
the first chunk had already been processed, so only the 2nd and
subsequent chunks were validated.
The fix is to simply move the check up so the 1st chunk is validated too.
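A simplified, self-contained sketch of the corrected loop shape (not the
actual SCTP code; the header layout is illustrative and the length field is
kept in host byte order here): the length is validated before the chunk is
touched, so the first chunk is covered too:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct chunk_hdr {
        uint8_t  type;
        uint8_t  flags;
        uint16_t length;    /* includes the header; host order in this sketch */
    };

    static int process_ootb(const uint8_t *buf, size_t len)
    {
        size_t off = 0;

        while (off + sizeof(struct chunk_hdr) <= len) {
            struct chunk_hdr ch;
            size_t chunk_len;

            memcpy(&ch, buf + off, sizeof(ch));
            chunk_len = ch.length;

            /* Validate the advertised length *before* processing the chunk,
             * so the very first chunk is checked as well. */
            if (chunk_len < sizeof(struct chunk_hdr) || off + chunk_len > len)
                return -1;              /* malformed, discard the packet */

            printf("chunk type %u, len %zu\n", (unsigned)ch.type, chunk_len);
            off += (chunk_len + 3) & ~(size_t)3;    /* chunks are 4-byte aligned */
        }
        return 0;
    }

    int main(void)
    {
        struct chunk_hdr ch = { .type = 6, .flags = 0, .length = sizeof(ch) };

        return process_ootb((const uint8_t *)&ch, sizeof(ch)) ? 1 : 0;
    }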
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In practice, none of the i915 developers Cc dri-devel for strictly i915
specific patches. Make MAINTAINERS reflect reality, and reduce random
i915 specific noise on dri-devel.
Also, we have a fairly large crowd reading and responding on intel-gfx,
and we're pretty good at involving dri-devel when that is appropriate.
Cc: dri-devel@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477498292-9808-1-git-send-email-jani.nikula@intel.com
|
|
The recent changes, which forced the registration of the boot cpu on UP
systems, which do not have ACPI tables, have been fixed for systems w/o
local APIC, but left a wreckage for systems which have neither ACPI nor
mptables, but the CPU has an APIC, e.g. virtualbox.
The boot process crashes in prefill_possible_map() as it wants to register
the boot cpu, which needs to access the local apic, but the local APIC is
not yet mapped.
There is no reason why init_apic_mapping() can't be invoked before
prefill_possible_map(). So instead of playing another silly early mapping
game, as the ACPI/mptables code does, we just move init_apic_mapping()
before the call to prefill_possible_map().
In hindsight, I should have noticed that combination earlier.
Sorry for the churn (also in stable)!
Fixes: ff8560512b8d ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com>
Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at>
Cc: prarit@redhat.com
Cc: ville.syrjala@linux.intel.com
Cc: michael.thayer@oracle.com
Cc: knut.osmundsen@oracle.com
Cc: frank.mehnert@oracle.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
One of the CI machines began to run into issues with the hpd poller
suddenly waking up in the midst of the late suspend phase. It looks like
this is caused by the fact that we now deinitialize power wells in
late suspend, which means that intel_hpd_poll_init() gets called in late
suspend, causing polling to get re-enabled. So, when deinitializing power
wells on Valleyview we now refrain from enabling polling in the midst of
suspend.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98040
Fixes: 19625e85c6ec ("drm/i915: Enable polling when we don't have hpd")
Signed-off-by: Lyude <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Saarinen <jani.saarinen@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477499769-1966-1-git-send-email-lyude@redhat.com
|
|
mddev->curr_resync usually records where the current resync is up to,
but during the starting phase it has some "magic" values.
1 - means that the array is trying to start a resync, but has yielded
to another array which shares physical devices, and also needs to
start a resync
2 - means the array is trying to start resync, but has found another
array which shares physical devices and has already started resync.
3 - means that resync has commenced, but it is possible that nothing
has actually been resynced yet.
It is important that this value not be visible to user-space and
particularly that it doesn't get written to the metadata, as the
resync or recovery checkpoint. In part, this is because it may be
slightly higher than the correct value, though this is very rare.
In part, because it is not a multiple of 4K, and some devices only
support 4K aligned accesses.
There are two places where this value is propagated into either
->curr_resync_completed or ->recovery_cp or ->recovery_offset.
These currently avoid the propagation of the values 1 and 2, but will
allow 3 to leak through.
Change them to only propagate the value if it is > 3.
As this can cause an array to fail, the patch is suitable for -stable.
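A minimal sketch of the guard being described (illustrative, not the md
code): the checkpoint is only updated once curr_resync is past all of the
magic starting-phase values:

    /* Sketch only: 1, 2 and 3 are "starting phase" markers that must never
     * reach the metadata as a resync/recovery checkpoint. */
    static void maybe_update_checkpoint(unsigned long long curr_resync,
                                        unsigned long long *recovery_cp)
    {
        if (curr_resync > 3)    /* previously 3 could still leak through */
            *recovery_cp = curr_resync;
    }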
Cc: stable@vger.kernel.org (v3.7+)
Reported-by: Viswesh <viswesh.vichu@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
|
|
If write is the first operation on a disk and it happens not to be
aligned to page size, block layer sends read request first. If read
operation fails, the disk is set as failed as no attempt to fix the
error is made because the array is in auto-readonly mode. Similarly, the
disk is set as failed for a read-only array.
Take the same approach as in raid10. Don't fail the disk if array is in
readonly or auto-readonly mode. Try to redirect the request first and if
unsuccessful, return a read error.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
|
|
As long as we recover at least one metadata block, we should write out the
empty metadata block. The original code could leave the recovery corrupted
if only one meta block is valid.
Reported-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Shaohua Li <shli@fb.com>
|
|
From Boris:
"""
Three simple fixes:
- the first one is fixing a non-critical bug in the gpmi driver
- the second one is fixing a bug in the 'automatic NAND timings
selection' feature introduced in 4.9-rc1
- the last one is fixing a false positive uninitialized-var warning
"""
Acked-by: Marek Vasut <marex@denx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix recent ACPICA regressions, an older PCI IRQ management
regression, and an incorrect return value of a function in the APEI
code.
Specifics:
- Fix three ACPICA issues related to the interpreter locking and
introduced by recent changes in that area (Lv Zheng).
- Fix a PCI IRQ management regression introduced during the 4.7 cycle
and related to the configuration of shared IRQs on systems with an
ISA bus (Sinan Kaya).
- Fix up a return value of one function in the APEI code (Punit
Agrawal)"
* tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
ACPICA: Dispatcher: Fix order issue of method termination
ACPI / APEI: Fix incorrect return value of ghes_proc()
ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
ACPI/PCI: pci_link: penalize SCI correctly
ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two intel_pstate issues related to the way it works when the
scaling_governor sysfs attribute is set to "performance" and fix up
messages in the system suspend core code.
Specifics:
- Fix a missing KERN_CONT in a system suspend message by converting
the affected code to using pr_info() and pr_cont() instead of the
"raw" printk() (Jon Hunter).
- Make intel_pstate set the CPU P-state from its .set_policy()
callback when the scaling_governor sysfs attribute is set to
"performance" so that it interacts with NOHZ_FULL more predictably
which was the case before 4.7 (Rafael Wysocki).
- Make intel_pstate always request the maximum allowed P-state when
the scaling_governor sysfs attribute is set to "performance" to
prevent it from effectively ignoring that setting in some
situations (Rafael Wysocki)"
* tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Always set max P-state in performance mode
PM / suspend: Fix missing KERN_CONT for suspend message
cpufreq: intel_pstate: Set P-state upfront in performance mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
- support IDU intc for UP builds
- support gz, lzma compressed uImage [Daniel Mentz]
- adjust /proc/cpuinfo for non-continuous cpu ids [Noam Camus]
- syscall for userspace cmpxchg assist for configs lacking hardware atomics
- rework of boot log printing mainly for identifying older arc700 cores
- retiring some old code, build toggles
* tag 'arc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: module: print pretty section names
ARC: module: elide loop to save reference to .eh_frame
ARC: mm: retire ARC_DBG_TLB_MISS_COUNT...
ARC: build: retire old toggles
ARC: boot log: refactor cpu name/release printing
ARC: boot log: remove awkward space comma from MMU line
ARC: boot log: don't assume SWAPE instruction support
ARC: boot log: refactor printing abt features not captured in BCRs
ARCv2: boot log: print IOC exists as well as enabled status
ARCv2: IOC: use @ioc_enable not @ioc_exist where intended
ARC: syscall for userspace cmpxchg assist
ARC: fix build warning in elf.h
ARC: Adjust cpuinfo for non-continuous cpu ids
ARC: [build] Support gz, lzma compressed uImage
ARCv2: intc: untangle SMP, MCIP and IDU
|
|
* acpica-fixes:
ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
ACPICA: Dispatcher: Fix order issue of method termination
* acpi-pci-fixes:
ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
ACPI/PCI: pci_link: penalize SCI correctly
ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
* acpi-apei-fixes:
ACPI / APEI: Fix incorrect return value of ghes_proc()
|
|
In the code path of acpi_ev_initialize_region(), there is namespace
modification code that runs unlocked. This patch tunes the code to make
sure such modifications are always performed with the lock held.
Fixes: 74f51b80a0c4 (ACPICA: Namespace: Fix dynamic table loading issues)
Tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_ds_auto_serialize_method()
There is an exit path with unbalanced locking in acpi_ds_initialize_method();
this patch corrects it.
Fixes: 441ad11d078f (ACPICA: Dispatcher: Fix a mutex issue for method auto serialization)
Tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The last step of the method termination should be the end of the method
serialization. Otherwise, the steps happening after it will face the race
issues that cannot be protected by the method serialization mechanism.
This patch fixes this issue by moving the per-method-object deletion code
to before the end of the method serialization. Otherwise, the possible
race issues may result in AE_ALREADY_EXISTS error in a parallel
environment.
Fixes: 74f51b80a0c4 (ACPICA: Namespace: Fix dynamic table loading issues)
Reported-and-tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Convert cmp to cmpd in idle enter sequence (Segher Boessenkool)
- cxl: Fix leaking pid refs in some error paths (Vaibhav Jain)
- Re-fix race condition between going idle and entering guest (Paul Mackerras)
- Fix race condition in setting lock bit in idle/wakeup code (Paul Mackerras)
- radix: Use tlbiel only if we ever ran on the current cpu (Aneesh Kumar K.V)
- relocation, register save fixes for system reset interrupt (Nicholas Piggin)
Fixes for code merged this cycle:
- Fix CONFIG_ALIVEC typo in restore_tm_state() (Valentin Rothberg)
- KVM: PPC: Book3S HV: Fix build error when SMP=n (Michael Ellerman)"
* tag 'powerpc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: relocation, register save fixes for system reset interrupt
powerpc/mm/radix: Use tlbiel only if we ever ran on the current cpu
powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state()
powerpc/64: Fix race condition in setting lock bit in idle/wakeup code
powerpc/64: Re-fix race condition between going idle and entering guest
cxl: Fix leaking pid refs in some error paths
powerpc: Convert cmp to cmpd in idle enter sequence
KVM: PPC: Book3S HV: Fix build error when SMP=n
|
|
* pm-cpufreq-fixes:
cpufreq: intel_pstate: Always set max P-state in performance mode
cpufreq: intel_pstate: Set P-state upfront in performance mode
* pm-sleep-fixes:
PM / suspend: Fix missing KERN_CONT for suspend message
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc kernel fixes: a virtualization environment related fix, an uncore
PMU driver removal handling fix, a PowerPC fix and new events for
Knights Landing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
perf/powerpc: Don't call perf_event_disable() from atomic context
perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
perf/x86/intel/cstate: Add C-state residency events for Knights Landing
|
|
We've been seeing some crashes in testing that look like this:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
PGD 212ca2067 PUD 212ca3067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio
CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff88020d7ed200 task.stack: ffff880211838000
RIP: 0010:[<ffffffff8135ce99>] [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
RSP: 0018:ffff88021183bdd0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000
RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000
RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8
R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800
FS: 0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0
Stack:
ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00
ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00
ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30
Call Trace:
[<ffffffffa01ea087>] ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc]
[<ffffffffa01f84d8>] svc_recv+0xad8/0xbd0 [sunrpc]
[<ffffffffa0262d5e>] nfsd+0xde/0x160 [nfsd]
[<ffffffffa0262c80>] ? nfsd_destroy+0x60/0x60 [nfsd]
[<ffffffff810a9418>] kthread+0xd8/0xf0
[<ffffffff816dbdbf>] ret_from_fork+0x1f/0x40
[<ffffffff810a9340>] ? kthread_park+0x60/0x60
Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 <4c> 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4
RIP [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
RSP <ffff88021183bdd0>
CR2: 0000000000000000
Both Bruce and Eryu ran a bisect here and found that the problematic
patch was 68778945e46 (SUNRPC: Separate buffer pointers for RPC Call and
Reply messages).
That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to
set up the receive buffer, but didn't change all of the necessary
codepaths to set it properly. In particular the backchannel setup was
missing.
We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is.
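A tiny sketch of the invariant being restored (field names follow the
description above; this is not the sunrpc allocation code):

    struct rpc_rqst_sketch {
        void *rq_buffer;    /* marshalling / send buffer */
        void *rq_rbuffer;   /* buffer used to set up the receive side */
    };

    /* Wherever rq_buffer is assigned (including the backchannel setup),
     * rq_rbuffer must be assigned as well, or the reply gets decoded
     * through a stale/NULL pointer. */
    static void assign_buffers(struct rpc_rqst_sketch *req, void *buf)
    {
        req->rq_buffer = buf;
        req->rq_rbuffer = buf;
    }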
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Fixes: 68778945e46 "SUNRPC: Separate buffer pointers..."
Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.
v2: Add a comment to indicate the xfer between timelines upon submission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
|
|
Defer the assignment of the global seqno on a request to its submission.
In the next patch, we will only allocate the global seqno at that time,
here we are just enabling the wait-for-submission before wait-for-seqno
paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-34-chris@chris-wilson.co.uk
|
|
A restriction on our global seqnos is that they cannot wrap, and that we
cannot use the value 0. This allows us to detect when a request has not
yet been submitted, its global seqno is still 0, and ensures that
hardware semaphores are monotonic as required by older hardware. To
meet these restrictions when we defer the assignment of the global
seqno, we must check that we have an available slot in the global seqno
space during request construction. If that test fails, we wait for all
requests to be completed and reset the hardware back to 0.
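A minimal sketch of that reservation check (illustrative; per the text above,
when the check fails the driver waits for all requests and resets back to 0):

    #include <stdbool.h>
    #include <stdint.h>

    /* 0 means "not yet submitted", and the counter must never wrap back to 0,
     * so refuse to hand out the last value and make the caller idle + reset. */
    static bool reserve_global_seqno(uint32_t *next_seqno)
    {
        if (*next_seqno + 1 == 0)
            return false;   /* out of seqno space: wait for idle, reset to 0 */
        *next_seqno += 1;
        return true;
    }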
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk
|
|
The breadcrumbs are about to be used from within IRQ context sections
(e.g. nouveau signals a fence from an interrupt handler causing us to
submit a new request) and/or from bottom-half tasklets (i.e.
intel_lrc_irq_handler), therefore we need to employ the irqsafe spinlock
variants.
For example, deferring the request submission to the
intel_lrc_irq_handler generates this trace:
[ 66.388639] =================================
[ 66.388650] [ INFO: inconsistent lock state ]
[ 66.388663] 4.9.0-rc2+ #56 Not tainted
[ 66.388672] ---------------------------------
[ 66.388682] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 66.388695] swapper/1/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 66.388706] (&(&b->lock)->rlock){+.?...} , at: [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150
[ 66.388761] {SOFTIRQ-ON-W} state was registered at:
[ 66.388772] [ 66.388783] [<ffffffff810bd842>] __lock_acquire+0x682/0x1870
[ 66.388795] [ 66.388803] [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0
[ 66.388814] [ 66.388824] [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40
[ 66.388835] [ 66.388845] [<ffffffff81401e41>] intel_engine_reset_breadcrumbs+0x21/0xb0
[ 66.388857] [ 66.388866] [<ffffffff81403ae7>] gen8_init_common_ring+0x67/0x100
[ 66.388878] [ 66.388887] [<ffffffff81403b92>] gen8_init_render_ring+0x12/0x60
[ 66.388903] [ 66.388912] [<ffffffff813f8707>] i915_gem_init_hw+0xf7/0x2a0
[ 66.388927] [ 66.388936] [<ffffffff813f899b>] i915_gem_init+0xbb/0xf0
[ 66.388950] [ 66.388959] [<ffffffff813b4980>] i915_driver_load+0x7e0/0x1330
[ 66.388978] [ 66.388988] [<ffffffff813c09d8>] i915_pci_probe+0x28/0x40
[ 66.389003] [ 66.389013] [<ffffffff812fa0db>] pci_device_probe+0x8b/0xf0
[ 66.389028] [ 66.389037] [<ffffffff8147737e>] driver_probe_device+0x21e/0x430
[ 66.389056] [ 66.389065] [<ffffffff8147766e>] __driver_attach+0xde/0xe0
[ 66.389080] [ 66.389090] [<ffffffff814751ad>] bus_for_each_dev+0x5d/0x90
[ 66.389105] [ 66.389113] [<ffffffff81477799>] driver_attach+0x19/0x20
[ 66.389134] [ 66.389144] [<ffffffff81475ced>] bus_add_driver+0x15d/0x260
[ 66.389159] [ 66.389168] [<ffffffff81477e3b>] driver_register+0x5b/0xd0
[ 66.389183] [ 66.389281] [<ffffffff812fa19b>] __pci_register_driver+0x5b/0x60
[ 66.389301] [ 66.389312] [<ffffffff81aed333>] i915_init+0x3e/0x45
[ 66.389326] [ 66.389336] [<ffffffff81ac2ffa>] do_one_initcall+0x8b/0x118
[ 66.389350] [ 66.389359] [<ffffffff81ac323a>] kernel_init_freeable+0x1b3/0x23b
[ 66.389378] [ 66.389387] [<ffffffff8160fc39>] kernel_init+0x9/0x100
[ 66.389402] [ 66.389411] [<ffffffff816180e7>] ret_from_fork+0x27/0x40
[ 66.389426] irq event stamp: 315865
[ 66.389438] hardirqs last enabled at (315864): [<ffffffff816178f1>] _raw_spin_unlock_irqrestore+0x31/0x50
[ 66.389469] hardirqs last disabled at (315865): [<ffffffff816176b3>] _raw_spin_lock_irqsave+0x13/0x50
[ 66.389499] softirqs last enabled at (315818): [<ffffffff8107a04c>] _local_bh_enable+0x1c/0x50
[ 66.389530] softirqs last disabled at (315819): [<ffffffff8107a50e>] irq_exit+0xbe/0xd0
[ 66.389559]
[ 66.389559] other info that might help us debug this:
[ 66.389580] Possible unsafe locking scenario:
[ 66.389580]
[ 66.389598] CPU0
[ 66.389609] ----
[ 66.389620] lock(&(&b->lock)->rlock);
[ 66.389650] <Interrupt>
[ 66.389661] lock(&(&b->lock)->rlock);
[ 66.389690]
[ 66.389690] *** DEADLOCK ***
[ 66.389690]
[ 66.389715] 2 locks held by swapper/1/0:
[ 66.389728] #0: (&(&tl->lock)->rlock){..-...}, at: [<ffffffff81403e01>] intel_lrc_irq_handler+0x201/0x3c0
[ 66.389785] #1: (&(&req->lock)->rlock/1){..-...}, at: [<ffffffff813fc0af>] __i915_gem_request_submit+0x8f/0x170
[ 66.389854]
[ 66.389854] stack backtrace:
[ 66.389959] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc2+ #56
[ 66.389976] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[ 66.389999] ffff88027fd03c58 ffffffff812beae5 ffff88027696e680 ffffffff822afe20
[ 66.390036] ffff88027fd03ca8 ffffffff810bb420 0000000000000001 0000000000000000
[ 66.390070] 0000000000000000 0000000000000006 0000000000000004 ffff88027696ee10
[ 66.390104] Call Trace:
[ 66.390117] <IRQ>
[ 66.390128] [<ffffffff812beae5>] dump_stack+0x68/0x93
[ 66.390147] [<ffffffff810bb420>] print_usage_bug+0x1d0/0x1e0
[ 66.390164] [<ffffffff810bb8a0>] mark_lock+0x470/0x4f0
[ 66.390181] [<ffffffff810ba9d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0
[ 66.390203] [<ffffffff810bd75d>] __lock_acquire+0x59d/0x1870
[ 66.390221] [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0
[ 66.390237] [<ffffffff810bedbc>] ? lock_acquire+0x6c/0xb0
[ 66.390255] [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150
[ 66.390273] [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40
[ 66.390291] [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150
[ 66.390309] [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150
[ 66.390327] [<ffffffff813fc170>] __i915_gem_request_submit+0x150/0x170
[ 66.390345] [<ffffffff81403e8b>] intel_lrc_irq_handler+0x28b/0x3c0
[ 66.390363] [<ffffffff81079d97>] tasklet_action+0x57/0xc0
[ 66.390380] [<ffffffff8107a249>] __do_softirq+0x119/0x240
[ 66.390396] [<ffffffff8107a50e>] irq_exit+0xbe/0xd0
[ 66.390414] [<ffffffff8101afd5>] do_IRQ+0x65/0x110
[ 66.390431] [<ffffffff81618806>] common_interrupt+0x86/0x86
[ 66.390446] <EOI>
[ 66.390457] [<ffffffff814ec6d1>] ? cpuidle_enter_state+0x151/0x200
[ 66.390480] [<ffffffff814ec7a2>] cpuidle_enter+0x12/0x20
[ 66.390498] [<ffffffff810b639e>] call_cpuidle+0x1e/0x40
[ 66.390516] [<ffffffff810b65ae>] cpu_startup_entry+0x10e/0x1f0
[ 66.390534] [<ffffffff81036133>] start_secondary+0x103/0x130
(This is split out of the defer global seqno allocation patch due to
realisation that we need a more complete conversion if we want to defer
request submission even further.)
v2: lockdep was warning about mixed SOFTIRQ contexts not HARDIRQ
contexts so we only need to use spin_lock_bh and not disable interrupts.
v3: We need full irq protection as we may be called from a third party
interrupt handler (via fences).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-32-chris@chris-wilson.co.uk
|
|
This will be used for communicating issues with this context to
userspace, so we want to identify the parent process and the individual
context. Note that the name isn't quite unique; it presumes that there is
only a single device fd per process.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-31-chris@chris-wilson.co.uk
|
|
Currently we try to reduce the number of synchronisations (now the
number of requests we need to wait upon) by noting that if we have
earlier waited upon a request, all subsequent requests in the timeline
will be after the wait. This only applies to requests in this timeline,
as other timelines will not be ordered by that waiter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk
|
|
Move the actual emission of the breadcrumb for closing the request from
i915_add_request() to the submit callback. (It can be moved later when
required.) This allows us to defer the allocation of the global_seqno
from request construction to actual submission, allowing us to emit the
requests out of order (wrt the order of their construction; they
still will only be executed once all of their dependencies are resolved,
including that all earlier requests on their timeline have been
submitted.) We have to specialise how we then emit the request in order
to write into the preallocated space, rather than at the tail of the
ringbuffer (which will have been advanced by the addition of new
requests).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk
|
|
In the next patch, we will use deferred breadcrumb emission. That requires
reserving sufficient space in the ringbuffer to emit the breadcrumb, which
first requires us to know how large the breadcrumb is.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk
|
|
Now that the emission of the request tail and its submission to hardware
are two separate steps, engine->emit_request() is confusing.
engine->emit_request() is called to emit the breadcrumb commands for the
request into the ring, name it such (engine->emit_breadcrumb).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk
|
|
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking execution of requests simple, and vital
for preserving a single waiter (i.e. so that we can order the waiters so
that only the earliest to wakeup need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
|
|
In future patches, we will no longer be able to wait on a static global
seqno and instead have to break our wait up into phases. First we wait
for the global seqno assignment (upon submission to hardware), and once
submitted we wait for the hardware to complete.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-25-chris@chris-wilson.co.uk
|
|
Before suspend, we wait for the switch to the kernel context. In order
for all the other context images to be complete upon suspend, that
switch must be the last operation by the GPU (i.e. this idling request
must not overtake any pending requests). To make this request execute last,
we make it depend on every other inflight request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk
|
|
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.
Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
|
|
After combining the dma-buf reservation object and the GEM reservation
object, we lost the ability to do a nonblocking wait on the i915 request
(as we blocked upon the reservation object during prepare_fb). We can
instead convert the reservation object into a fence upon which we can
asynchronously wait (including a forced timeout in case the DMA fence is
never signaled).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-22-chris@chris-wilson.co.uk
|
|
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
|
|
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
|
|
We want to hide the latency of releasing objects and their backing
storage from the submission, so we move the actual free to a worker.
This allows us to switch to struct_mutex freeing of the object in the
next patch.
Furthermore, if we know that the object we are dereferencing remains valid
for the duration of our access, we can forgo the usual synchronisation
barriers and atomic reference counting. To ensure this we defer freeing
an object until after an RCU grace period, such that any lookup of the
object within an RCU read critical section will remain valid until
after we exit that critical section. We also employ this delay for
rate-limiting the serialisation on reallocation - we have to slow down
object creation in order to prevent resource starvation (in particular,
files).
v2: Return early in i915_gem_tiling() ioctl to skip over superfluous
work on error.
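A kernel-style sketch of the deferral (illustrative structure, not the i915
object code, and it will not build outside a kernel tree), assuming the
standard kfree_rcu() helper:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct obj_sketch {
        struct rcu_head rcu;
        /* ... payload ... */
    };

    /* Lockless lookups run inside rcu_read_lock(); deferring the free past a
     * grace period guarantees they never dereference freed memory. */
    static void obj_sketch_free(struct obj_sketch *obj)
    {
        kfree_rcu(obj, rcu);    /* freed after the current RCU grace period */
    }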
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk
|
|
As we can locklessly (well struct_mutex-lessly) acquire the backing
storage, do so in set-domain-ioctl to reduce the contention on the
struct_mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-18-chris@chris-wilson.co.uk
|
|
We only need struct_mutex within pwrite for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk
|
|
We only need struct_mutex within pread for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk
|