Age | Commit message (Collapse) | Author |
|
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Trivial fix to clean up indentation issues, remove spaces
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Commit:
d44248a41337 ("perf/core: Rework memory accounting in perf_mmap()")
does a lot of things to the mlock accounting arithmetics, while the only
thing that actually needed to happen is subtracting the part that is
charged to the mm from the part that is charged to the user, so that the
former isn't charged twice.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: songliubraving@fb.com
Link: https://lkml.kernel.org/r/20191120170640.54123-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We can now safely read user and guest kcpustat fields on nohz_full CPUs.
Use the appropriate accessors.
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-5-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Now that we can read also user and guest time safely under vtime, use
the relevant accessor to fix frozen kcpustat values on nohz_full CPUs.
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-4-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Many callsites want to fetch the values of system, user, user_nice, guest
or guest_nice kcpustat fields altogether or at least a pair of these.
In that case calling kcpustat_field() for each requested field brings
unecessary overhead when we could fetch all of them in a row.
So provide kcpustat_cpu_fetch() that fetches the whole kcpustat array
in a vtime safe way under the same RCU and seqcount block.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-3-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Provide support for user, nice, guest and guest_nice fields through
kcpustat_field().
Whether we account the delta to a nice or not nice field is decided on
top of the nice value snapshot taken at the time we call kcpustat_field().
If the nice value of the task has been changed since the last vtime
update, we may have inacurrate distribution of the nice VS unnice
cputime.
However this is considered as a minor issue compared to the proper fix
that would involve interrupting the target on nice updates, which is
undesired on nohz_full CPUs.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-2-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add support for the soft status and control register, which allows
TX_FAULT and RX_LOS to be monitored and TX_DISABLE to be set. We
make use of this when the board does not support GPIOs for these
signals.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King says:
====================
Add rudimentary SFP module quirk support
The SFP module EEPROM describes the capabilities of the module, but
doesn't describe the host interface. We have a certain amount of
guess-work to work out how to configure the host - which works most
of the time.
However, there are some (such as GPON) modules which are able to
support different host interfaces, such as 1000BASE-X and 2500BASE-X.
The module will switch between each mode until it achieves link with
the host.
There is no defined way to describe this in the SFP EEPROM, so we can
only recognise the module and handle it appropriately. This series
adds the necessary recognition of the modules using a quirk system,
and tweaks the support mask to allow them to link with the host at
2500BASE-X, thereby allowing the user to achieve full line rate.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marc Micalizzi reports that Huawei MA5671A and Alcatel/Lucent G-010S-P
modules are capable of 2500base-X, but incorrectly report their
capabilities in the EEPROM. It seems rather common that GPON modules
mis-report.
Let's fix these modules by adding some quirks.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for applying module quirks to the list of supported
ethtool link modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
snprintf returns the number of chars that would be written, not number
of chars that were actually written. As such, 'offs' may get larger than
'tbl.maxlen', causing the 'tbl.maxlen - offs' being < 0, and since the
parameter is size_t, it would overflow.
Since using scnprintf may hide the limit error, while the buffer is still
enough now, let's just add a WARN_ON_ONCE in case it reach the limit
in future.
v2: Use WARN_ON_ONCE as Jiri and Eric suggested.
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently collect_md gre tunnel will store the tunnel info(metadata_dst)
to skb_dst.
And now the non-tun-dst gre tunnel already can add tunnel header through
lwtunnel.
When received a arp_request on the non-tun-dst gre tunnel. The packet of
arp response will send through the non-tun-dst tunnel without tunnel info
which will lead the arp response packet to be dropped.
If the non-tun-dst gre tunnel also store the tunnel info as metadata_dst,
The arp response packet will set the releted tunnel info in the
iptunnel_metadata_reply.
The following is the test script:
ip netns add cl
ip l add dev vethc type veth peer name eth0 netns cl
ifconfig vethc 172.168.0.7/24 up
ip l add dev tun1000 type gretap key 1000
ip link add user1000 type vrf table 1
ip l set user1000 up
ip l set dev tun1000 master user1000
ifconfig tun1000 10.0.1.1/24 up
ip netns exec cl ifconfig eth0 172.168.0.17/24 up
ip netns exec cl ip l add dev tun type gretap local 172.168.0.17 remote 172.168.0.7 key 1000
ip netns exec cl ifconfig tun 10.0.1.7/24 up
ip r r 10.0.1.7 encap ip id 1000 dst 172.168.0.17 key dev tun1000 table 1
With this patch
ip netns exec cl ping 10.0.1.1 can success
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kobject_put() should only be called in error path.
Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The headset on this machine is not defined, after applying the quirk
ALC256_FIXUP_ASUS_HEADSET_MIC, the headset-mic works well
BugLink: https://bugs.launchpad.net/bugs/1846148
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://lore.kernel.org/r/20191121025427.8856-1-hui.wang@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
We have a new Dell machine which needs to apply the quirk
ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, try to use the fallback table
to fix it this time. And we could remove all pintbls of alc236
for applying DELL1_MIC_NO_PRESENCE on Dell machines.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://lore.kernel.org/r/20191121022644.8078-2-hui.wang@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
We have a new Dell machine which needs to apply the quirk
ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, try to use the fallback table
to fix it this time. And we could remove all pintbls of alc256
for applying DELL1_MIC_NO_PRESENCE on Dell machines.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://lore.kernel.org/r/20191121022644.8078-1-hui.wang@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
We need to check the host page size is big enough to accomodate the
EQ. Let's do this before taking a reference on the EQ page to avoid
a potential leak if the check fails.
Cc: stable@vger.kernel.org # v5.2
Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
The EQ page is allocated by the guest and then passed to the hypervisor
with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page
before handing it over to the HW. This reference is dropped either when
the guest issues the H_INT_RESET hcall or when the KVM device is released.
But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times,
either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot).
In both cases the existing EQ page reference is leaked because we simply
overwrite it in the XIVE queue structure without calling put_page().
This is especially visible when the guest memory is backed with huge pages:
start a VM up to the guest userspace, either reboot it or unplug a vCPU,
quit QEMU. The leak is observed by comparing the value of HugePages_Free in
/proc/meminfo before and after the VM is run.
Ideally we'd want the XIVE code to handle the EQ page de-allocation at the
platform level. This isn't the case right now because the various XIVE
drivers have different allocation needs. It could maybe worth introducing
hooks for this purpose instead of exposing XIVE internals to the drivers,
but this is certainly a huge work to be done later.
In the meantime, for easier backport, fix both vCPU unplug and guest reboot
leaks by introducing a wrapper around xive_native_configure_queue() that
does the necessary cleanup.
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v5.2
Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
git://people.freedesktop.org/~agd5f/linux into drm-fixes
drm-fixes-5.4-2019-11-20:
amdgpu:
- Remove experimental flag for navi14
- Fix confusing power message failures on older VI parts
- Hang fix for gfxoff when using the read register interface
- Two stability regression fixes for Raven
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120235130.23755-1-alexander.deucher@amd.com
|
|
The spectre_v2 test must be built 64-bit, it includes hand-written asm
that is 64-bit only, and segfaults if built 32-bit.
Fixes: c790c3d2b0ec ("selftests/powerpc: Add a test of spectre_v2 mitigations")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191120023924.13130-1-mpe@ellerman.id.au
|
|
On PowerNV the PCIe topology is (currently) managed by the powernv platform
code in Linux in cooperation with the platform firmware. Linux's native
PCIe port service drivers operate independently of both and this can cause
problems.
The main issue is that the portbus driver will conflict with the platform
specific hotplug driver (pnv_php) over ownership of the MSI used to notify
the host when a hotplug event occurs. The portbus driver claims this MSI on
behalf of the individual port services because the same interrupt is used
for hotplug events, PMEs (on root ports), and link bandwidth change
notifications. The portbus driver will always claim the interrupt even if
the individual port service drivers, such as pciehp, are compiled out.
The second, bigger, problem is that the hotplug port service driver
fundamentally does not work on PowerNV. The platform assumes that all
PCI devices have a corresponding arch-specific handle derived from the DT
node for the device (pci_dn) and without one the platform will not allow
a PCI device to be enabled. This problem is largely due to historical
baggage, but it can't be resolved without significant re-factoring of the
platform PCI support.
We can fix these problems in the interim by setting the
"pcie_ports_disabled" flag during platform initialisation. The flag
indicates the platform owns the PCIe ports which stops the portbus driver
from being registered.
This does have the side effect of disabling all port services drivers
that is: AER, PME, BW notifications, hotplug, and DPC. However, this is
not a huge disadvantage on PowerNV since these services are either unused
or handled through other means.
Fixes: 66725152fb9f ("PCI/hotplug: PowerPC PowerNV PCI hotplug driver")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191118065553.30362-1-oohall@gmail.com
|
|
arch/powerpc/kernel/ contains 8 files dedicated to kexec.
Move them into a dedicated subdirectory.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Move to a/p/kexec, drop the 'machine' naming and use 'core' instead]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/afbef97ec6a978574a5cf91a4441000e0a9da42a.1572351221.git.christophe.leroy@c-s.fr
|
|
Almost half of misc_32.S is dedicated to kexec.
That's the relocation function for kexec.
Drop it into a dedicated kexec_relocate_32.S
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e235973a1198195763afd3b6baffa548a83f4611.1572351221.git.christophe.leroy@c-s.fr
|
|
There is a config item CONFIG_SIMPLE_GPIO which
provides simple memory mapped GPIOs specific to powerpc.
However, the only platform which selects this option is
mpc5200, and this platform doesn't use it.
There are three boards calling simple_gpiochip_init(), but
as they don't select CONFIG_SIMPLE_GPIO, this is just a nop.
Simple_gpio is just redundant with the generic MMIO GPIO
driver which can be found in driver/gpio/ and selected via
CONFIG_GPIO_GENERIC_PLATFORM, so drop simple_gpio driver.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bf930402613b41b42d0441b784e0cc43fc18d1fb.1572529632.git.christophe.leroy@c-s.fr
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-11-20
The following pull-request contains BPF updates for your *net-next* tree.
We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).
There are 3 trivial conflicts, resolve it by always taking the chunk from
196e8ca74886c433:
<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
<<<<<<< HEAD
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
/* kmalloc()'ed memory can't be mmap()'ed */
if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
The main changes are:
1) Addition of BPF trampoline which works as a bridge between kernel functions,
BPF programs and other BPF programs along with two new use cases: i) fentry/fexit
BPF programs for tracing with practically zero overhead to call into BPF (as
opposed to k[ret]probes) and ii) attachment of the former to networking related
programs to see input/output of networking programs (covering xdpdump use case),
from Alexei Starovoitov.
2) BPF array map mmap support and use in libbpf for global data maps; also a big
batch of libbpf improvements, among others, support for reading bitfields in a
relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.
3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.
4) Add BPF audit support and emit messages upon successful prog load and unload in
order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.
5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
(XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.
6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
call named bpf_get_link_xdp_info() for retrieving the full set of prog
IDs attached to XDP, from Toke Høiland-Jørgensen.
7) Add BTF support for array of int, array of struct and multidimensional arrays
and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.
8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.
9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid
xdping to be run as standalone, from Jiri Benc.
10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.
11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.
12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
- Fix ttm bo refcnt when using the new gem obj mmap hook (Thomas)
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120204946.GA120328@art_vandelay
|
|
On compat interfaces, the high order bits of nanoseconds should be zeroed
out. This is because the application code or the libc do not guarantee
zeroing of these. If used without zeroing, kernel might be at risk of using
timespec values incorrectly.
Originally it was handled correctly, but lost during is_compat_syscall()
cleanup. Revert the condition back to check CONFIG_64BIT.
Fixes: 98f76206b335 ("compat: Cleanup in_compat_syscall() callers")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191121000303.126523-1-dima@arista.com
|
|
When CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not set it's best to
have the stub functions return ENOTSUPP instead of ENODEV,
otherwise ENODEV is a valid error when ip is incorrect which is
indistinguishable from ftrace not compiled in.
Link: http://lkml.kernel.org/r/CAADnVQ+OzTikM9EhrfsC7NFsVYhATW1SVHxK64w3xn9qpk81pg@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
This reverts commit 1c4259159132ae4ceaf7c6db37a6cf76417f73d9.
S/G display is not stable with the IOMMU enabled on some
platforms.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205523
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
There are still combinations of sbios and firmware that
are not stable.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204689
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
When gfxoff is enabled, accessing gfx registers via MMIO
can lead to a hang.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205497
Acked-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
For fine grained dpm, there is only two levels supported. However
to reflect correctly the current clock frequency, there is an
intermediate level faked. Thus on forcing level setting, we
need to treat level 2 correctly as level 1.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Otherwise, the error message prompted will confuse user.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
5.4 and newer works fine with navi14.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move the definition of pci_dev_wait() above pci_power_up() so that it can
be called from the latter with no change in functionality. This is a pure
code move with no functional change.
Link: https://lore.kernel.org/r/20191120051743.23124-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Currently Linux does not follow PCIe spec regarding the required delays
after reset. A concrete example is a Thunderbolt add-in-card that consists
of a PCIe switch and two PCIe endpoints:
+-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
+-01.0-[04-36]-- DS hotplug port
+-02.0-[37]----00.0 xHCI controller
\-04.0-[38-6b]-- DS hotplug port
The root port (1b.0) and the PCIe switch downstream ports are all PCIe Gen3
so they support 8GT/s link speeds.
We wait for the PCIe hierarchy to enter D3cold (runtime):
pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
When it wakes up from D3cold, according to the PCIe 5.0 section 5.8 the
PCIe switch is put to reset and its power is re-applied. This means that we
must follow the rules in PCIe 5.0 section 6.6.1.
For the PCIe Gen3 ports we are dealing with here, the following applies:
With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
software must wait a minimum of 100 ms after Link training completes
before sending a Configuration Request to the device immediately below
that Port. Software can determine when Link training completes by polling
the Data Link Layer Link Active bit or by setting up an associated
interrupt (see Section 6.7.3.3).
Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):
0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
I've instrumented the kernel with some additional logging so we can see the
actual delays performed:
pcieport 0000:00:1b.0: power state changed by ACPI to D0
pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
For the switch upstream port (01:00.0 reachable through 00:1b.0 root port)
we wait for 100 ms but not taking into account the DLLLA requirement. We
then wait 10 ms for D3hot -> D0 transition of the root port and the two
downstream hotplug ports. This means that we deviate from what the spec
requires.
Performing the same check for system sleep (s2idle) transitions it turns
out to be even worse. None of the mandatory delays are performed. If this
would be S3 instead of s2idle then according to PCI FW spec 3.2 section
4.6.8. there is a specific _DSM that allows the OS to skip the delays but
this platform does not provide the _DSM and does not go to S3 anyway so no
firmware is involved that could already handle these delays.
On this particular platform these delays are not actually needed because
there is an additional delay as part of the ACPI power resource that is
used to turn on power to the hierarchy but since that additional delay is
not required by any of standards (PCIe, ACPI) it is not present in the
Intel Ice Lake, for example where missing the mandatory delays causes
pciehp to start tearing down the stack too early (links are not yet
trained). Below is an example how it looks like when this happens:
pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
pcieport 0000:87:04.0: PME# disabled
pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
pcieport 0000:86:00.0: Refused to change power state, currently in D3
pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
...
There is also one reported case (see the bugzilla link below) where the
missing delay causes xHCI on a Titan Ridge controller fail to runtime
resume when USB-C dock is plugged. This does not involve pciehp but instead
the PCI core fails to runtime resume the xHCI device:
pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
...
Add a new function pci_bridge_wait_for_secondary_bus() that is called on
PCI core resume and runtime resume paths accordingly if the bridge entered
D3cold (and thus went through reset).
This is second attempt to add the missing delays. The previous solution in
c2bf1fc212f7 ("PCI: Add missing link delays required by the PCIe spec") was
reverted because of two issues it caused:
1. One system become unresponsive after S3 resume due to PME service
spinning in pcie_pme_work_fn(). The root port in question reports that
the xHCI sent PME but the xHCI device itself does not have PME status
set. The PME status bit is never cleared in the root port resulting
the indefinite loop in pcie_pme_work_fn().
2. Slows down resume if the root/downstream port does not support Data
Link Layer Active Reporting because pcie_wait_for_link_delay() waits
1100 ms in that case.
This version should avoid the above issues because we restrict the delay to
happen only if the port went into D3cold.
Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
Link: https://lore.kernel.org/r/20191112091617.70282-3-mika.westerberg@linux.intel.com
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Add pcie_wait_for_link_delay(). Similar to pcie_wait_for_link() but allows
passing custom activation delay in milliseconds.
Link: https://lore.kernel.org/r/20191112091617.70282-2-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
pci_raw_set_power_state() uses the Power Management capability to change a
device's power state. The capability is in config space, which is
accessible in D0, D1, D2, and D3hot, but not in D3cold.
If we call pci_raw_set_power_state() on a device that's in D3cold, config
reads fail and return ~0 data, which we erroneously interpreted as "the
device is in D3hot", leading to messages like this:
pcieport 0000:03:00.0: Refused to change power state, currently in D3
The PCI_PM_CTRL has several RsvdP fields, so ~0 is never a valid register
value. If we get that value, print a more informative message and return
an error.
Changing the power state of a device from D3cold must be done by a platform
power management method or some other non-config space mechanism.
Link: https://lore.kernel.org/r/20190822200551.129039-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Use pci_power_name() to print pci_power_t correctly. This changes:
"state 0" or "D0" to "D0"
"state 1" or "D1" to "D1"
"state 2" or "D2" to "D2"
"state 3" or "D3" to "D3hot"
"state 4" or "D4" to "D3cold"
Changes dmesg logging only, no other functional change intended.
Link: https://lore.kernel.org/r/20190822200551.129039-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Because pci_set_power_state() has become the only caller of
__pci_complete_power_transition(), there is no need for the latter to
be a separate function any more, so fold it into the former, drop a
redundant check and reduce the number of lines of code somewhat.
Code rearrangement, no intentional functional impact.
Link: https://lore.kernel.org/r/15576968.k611qn3UU0@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Notice that radeon_set_suspend(), which is the only caller of
__pci_complete_power_transition() outside of pci.c, really only
cares about the pci_platform_power_transition() invoked by it,
so export the latter instead of it, update the radeon driver to
call pci_platform_power_transition() directly and make
__pci_complete_power_transition() static.
Code rearrangement, no intentional functional impact.
Link: https://lore.kernel.org/r/1731661.ykamz2Tiuf@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Because pci_power_up() has become the only caller of
__pci_start_power_transition(), there is no need for the latter to
be a separate function any more, so fold it into the former, drop a
redundant check and reduce the number of lines of code somewhat.
Code rearrangement, no intentional functional impact.
Link: https://lore.kernel.org/r/3458080.lsoDbfkST9@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Make it explicitly clear that the code to put devices into D0 in
pci_set_power_state() and in pci_pm_default_resume_early() is the
same by making the latter use pci_power_up() for transitions into D0.
Code rearrangement, no intentional functional impact.
Link: https://lore.kernel.org/r/2520019.OZ1nXS5aSj@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Move the invocation of pci_update_current_state() from pci_power_up() to
pci_pm_default_resume_early(), which is the only caller of that function.
Preparatory change, no functional impact.
Link: https://lore.kernel.org/r/37482337.udjOGdOKNb@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
The struct pci_driver.suspend_late() hook is one of the legacy PCI power
management callbacks, and there are no remaining users of it. Remove it.
Link: https://lore.kernel.org/r/20191101204558.210235-7-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The struct pci_driver.resume_early() hook is one of the legacy PCI power
management callbacks, and there are no remaining users of it. Remove it.
Link: https://lore.kernel.org/r/20191101204558.210235-6-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Convert xen-platform from the legacy PCI power management callbacks to the
generic operations. This is one step towards removing support for the
legacy PCI callbacks.
The generic .resume_noirq() operation is called by pci_pm_resume_noirq() at
the same point the legacy PCI .resume_early() callback was, so this patch
should not change the xen-platform behavior.
Link: https://lore.kernel.org/r/20191101204558.210235-5-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
|
|
Check for the PCI_DEV_FLAGS_NO_D3 quirk early, before calling
__pci_start_power_transition(). This way all the cases where we don't need
to do anything at all are checked up front.
This doesn't fix anything because if the caller requested D3hot or D3cold,
__pci_start_power_transition() is a no-op. But calling it is pointless and
makes the code harder to analyze.
Link: https://lore.kernel.org/r/20191101204558.210235-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
pci_pm_reset() resets a device by putting it in D3hot and bringing it back
to D0. Clarify related messages to mention "D3hot" explicitly instead of
just "D3".
Link: https://lore.kernel.org/r/20191101204558.210235-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|