Age | Commit message (Collapse) | Author |
|
Once a send context is taken down due to a link failure, any QPs waiting
for pio credits will stay on the waitlist indefinitely.
Fix by wakeing up all QPs linked to piowait list.
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Once an SDMA engine is taken down due to a link failure, any waiting QPs
that do not have outstanding descriptors in the ring will stay
on the dmawait list as long as the port is down.
Since there is no timer running, they will stay there for a long time.
The fix is to wake up all iowaits linked to dmawait. The send engine
will build and post packets that get flushed back.
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
SDMA and pio flushes will cause a lot of packets to be transmitted
after a link has gone down, using a lot of CPU to retransmit
packets.
Fix for RC QPs by recognizing the flush status and:
- Forcing a timer start
- Putting the QP into a "send one" mode
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This paves the way for another patch that reacts to a
flush sdma completion for RC.
Fixes: 81cd3891f021 ("IB/hfi1: Add support for 16B Management Packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The following warning can happen when a memory shortage
occurs during txreq allocation:
[10220.939246] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
[10220.939246] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
[10220.939247] cache: mnt_cache, object size: 384, buffer size: 384, default order: 2, min order: 0
[10220.939260] Workqueue: hfi0_0 _hfi1_do_send [hfi1]
[10220.939261] node 0: slabs: 1026568, objs: 43115856, free: 0
[10220.939262] Call Trace:
[10220.939262] node 1: slabs: 820872, objs: 34476624, free: 0
[10220.939263] dump_stack+0x5a/0x73
[10220.939265] warn_alloc+0x103/0x190
[10220.939267] ? wake_all_kswapds+0x54/0x8b
[10220.939268] __alloc_pages_slowpath+0x86c/0xa2e
[10220.939270] ? __alloc_pages_nodemask+0x2fe/0x320
[10220.939271] __alloc_pages_nodemask+0x2fe/0x320
[10220.939273] new_slab+0x475/0x550
[10220.939275] ___slab_alloc+0x36c/0x520
[10220.939287] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939299] ? __get_txreq+0x54/0x160 [hfi1]
[10220.939310] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939312] __slab_alloc+0x40/0x61
[10220.939323] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939325] kmem_cache_alloc+0x181/0x1b0
[10220.939336] hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939348] ? hfi1_verbs_send_dma+0x386/0xa10 [hfi1]
[10220.939359] ? find_prev_entry+0xb0/0xb0 [hfi1]
[10220.939371] hfi1_do_send+0x1d9/0x3f0 [hfi1]
[10220.939372] process_one_work+0x171/0x380
[10220.939374] worker_thread+0x49/0x3f0
[10220.939375] kthread+0xf8/0x130
[10220.939377] ? max_active_store+0x80/0x80
[10220.939378] ? kthread_bind+0x10/0x10
[10220.939379] ret_from_fork+0x35/0x40
[10220.939381] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
The shortage is handled properly so the message isn't needed. Silence by
adding the no warn option to the slab allocation.
Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Heavy contention of the sde flushlist_lock can cause hard lockups at
extreme scale when the flushing logic is under stress.
Mitigate by replacing the item at a time copy to the local list with
an O(1) list_splice_init() and using the high priority work queue to
do the flushes.
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
PME polling does not take into account that a device that is directly
connected to the host bridge may go into D3cold as well. This leads to a
situation where the PME poll thread reads from a config space of a
device that is in D3cold and gets incorrect information because the
config space is not accessible.
Here is an example from Intel Ice Lake system where two PCIe root ports
are in D3cold (I've instrumented the kernel to log the PMCSR register
contents):
[ 62.971442] pcieport 0000:00:07.1: Check PME status, PMCSR=0xffff
[ 62.971504] pcieport 0000:00:07.0: Check PME status, PMCSR=0xffff
Since 0xffff is interpreted so that PME is pending, the root ports will
be runtime resumed. This repeats over and over again essentially
blocking all runtime power management.
Prevent this from happening by checking whether the device is in D3cold
before its PME status is read.
Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: 3.6+ <stable@vger.kernel.org> # v3.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Currently Linux does not follow PCIe spec regarding the required delays
after reset. A concrete example is a Thunderbolt add-in-card that
consists of a PCIe switch and two PCIe endpoints:
+-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
+-01.0-[04-36]-- DS hotplug port
+-02.0-[37]----00.0 xHCI controller
\-04.0-[38-6b]-- DS hotplug port
The root port (1b.0) and the PCIe switch downstream ports are all PCIe
gen3 so they support 8GT/s link speeds.
We wait for the PCIe hierarchy to enter D3cold (runtime):
pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
PCIe switch is put to reset and its power is re-applied. This means that
we must follow the rules in PCIe 4.0 section 6.6.1.
For the PCIe gen3 ports we are dealing with here, the following applies:
With a Downstream Port that supports Link speeds greater than 5.0
GT/s, software must wait a minimum of 100 ms after Link training
completes before sending a Configuration Request to the device
immediately below that Port. Software can determine when Link training
completes by polling the Data Link Layer Link Active bit or by setting
up an associated interrupt (see Section 6.7.3.3).
Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):
pcieport 0000:00:1b.0: wait for 100ms after DLLLA is set before access to 0000:01:00.0
pcieport 0000:02:00.0: wait for 100ms after DLLLA is set before access to 0000:03:00.0
pcieport 0000:02:02.0: wait for 100ms after DLLLA is set before access to 0000:37:00.0
I've instrumented the kernel with additional logging so we can see the
actual delays the kernel performs:
pcieport 0000:00:1b.0: power state changed by ACPI to D0
pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
pcieport 0000:00:1b.0: waking up bus
pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
...
pcieport 0000:00:1b.0: PME# disabled
pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:01:00.0: PME# disabled
pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:00.0: PME# disabled
pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:01.0: PME# disabled
pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:02.0: PME# disabled
pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:04.0: PME# disabled
pcieport 0000:02:01.0: PME# enabled
pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
pcieport 0000:02:04.0: PME# enabled
pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)
...
thunderbolt 0000:03:00.0: PME# disabled
xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
...
xhci_hcd 0000:37:00.0: PME# disabled
For the switch upstream port (01:00.0) we wait for 100ms but not taking
into account the DLLLA requirement. We then wait 10ms for D3hot -> D0
transition of the root port and the two downstream hotplug ports. This
means that we deviate from what the spec requires.
Performing the same check for system sleep (s2idle) transitions we can
see following when resuming from s2idle:
pcieport 0000:00:1b.0: power state changed by ACPI to D0
pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
...
pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:02.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:02.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:02.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
pcieport 0000:02:01.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
pcieport 0000:02:02.0: restoring config space at offset 0x20 (was 0x0, writing 0x73f073f0)
pcieport 0000:02:04.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
pcieport 0000:02:01.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
pcieport 0000:02:00.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
pcieport 0000:02:02.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
pcieport 0000:02:04.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
pcieport 0000:02:01.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1ff10001)
pcieport 0000:02:00.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
pcieport 0000:02:02.0: restoring config space at offset 0x18 (was 0x0, writing 0x373702)
pcieport 0000:02:04.0: restoring config space at offset 0x24 (was 0x10001, writing 0x49f12001)
pcieport 0000:02:01.0: restoring config space at offset 0x20 (was 0x0, writing 0x73e05c00)
pcieport 0000:02:00.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
pcieport 0000:02:04.0: restoring config space at offset 0x20 (was 0x0, writing 0x89f07400)
pcieport 0000:02:01.0: restoring config space at offset 0x1c (was 0x101, writing 0x5151)
pcieport 0000:02:00.0: restoring config space at offset 0x20 (was 0x0, writing 0x8a008a00)
pcieport 0000:02:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:04.0: restoring config space at offset 0x1c (was 0x101, writing 0x6161)
pcieport 0000:02:01.0: restoring config space at offset 0x18 (was 0x0, writing 0x360402)
pcieport 0000:02:00.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
pcieport 0000:02:04.0: restoring config space at offset 0x18 (was 0x0, writing 0x6b3802)
pcieport 0000:02:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:00.0: restoring config space at offset 0x18 (was 0x0, writing 0x30302)
pcieport 0000:02:01.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:04.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:00.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:04.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
...
thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)
This is even worse. None of the mandatory delays are performed. If this
would be S3 instead of s2idle then according to PCI FW spec 3.2 section
4.6.8. there is a specific _DSM that allows the OS to skip the delays
but this platform does not provide the _DSM and does not go to S3 anyway
so no firmware is involved that could already handle these delays.
In this particular Intel Coffee Lake platform these delays are not
actually needed because there is an additional delay as part of the ACPI
power resource that is used to turn on power to the hierarchy but since
that additional delay is not required by any of standards (PCIe, ACPI)
it is not present in the Intel Ice Lake, for example where missing the
mandatory delays causes pciehp to start tearing down the stack too early
(links are not yet trained).
For this reason, change the PCIe portdrv PM resume hooks so that they
perform the mandatory delays before the downstream component gets
resumed. We perform the delays before port services are resumed because
otherwise pciehp might find that the link is not up (even if it is just
training) and tears-down the hierarchy.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Pull networking fixes from David Miller:
"Lots of bug fixes here:
1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer.
2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John
Crispin.
3) Use after free in psock backlog workqueue, from John Fastabend.
4) Fix source port matching in fdb peer flow rule of mlx5, from Raed
Salem.
5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet.
6) Network header needs to be set for packet redirect in nfp, from
John Hurley.
7) Fix udp zerocopy refcnt, from Willem de Bruijn.
8) Don't assume linear buffers in vxlan and geneve error handlers,
from Stefano Brivio.
9) Fix TOS matching in mlxsw, from Jiri Pirko.
10) More SCTP cookie memory leak fixes, from Neil Horman.
11) Fix VLAN filtering in rtl8366, from Linus Walluij.
12) Various TCP SACK payload size and fragmentation memory limit fixes
from Eric Dumazet.
13) Use after free in pneigh_get_next(), also from Eric Dumazet.
14) LAPB control block leak fix from Jeremy Sowden"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
lapb: fixed leak of control-blocks.
tipc: purge deferredq list for each grp member in tipc_group_delete
ax25: fix inconsistent lock state in ax25_destroy_timer
neigh: fix use-after-free read in pneigh_get_next
tcp: fix compile error if !CONFIG_SYSCTL
hv_sock: Suppress bogus "may be used uninitialized" warnings
be2net: Fix number of Rx queues used for flow hashing
net: handle 802.1P vlan 0 packets properly
tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
tcp: add tcp_min_snd_mss sysctl
tcp: tcp_fragment() should apply sane memory limits
tcp: limit payload size of sacked skbs
Revert "net: phylink: set the autoneg state in phylink_phy_change"
bpf: fix nested bpf tracepoints with per-cpu data
bpf: Fix out of bounds memory access in bpf_sk_storage
vsock/virtio: set SOCK_DONE on peer shutdown
net: dsa: rtl8366: Fix up VLAN filtering
net: phylink: set the autoneg state in phylink_phy_change
net: add high_order_alloc_disable sysctl/static key
tcp: add tcp_tx_skb_cache sysctl
...
|
|
A branch is needed here to get the fix into staging-next as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are some backward incompatible features pending
for months, mainly due to on-disk format expensions.
However, we should ensure that it cannot be mounted with
old kernels. Otherwise, it will causes unexpected behaviors.
Fixes: ba2b77a82022 ("staging: erofs: add super block operations")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Second set of IIO fixes for the 5.2 cycle.
* ad7150
- sense of bit for controlling adaptive vs fixed threshold was flipped.
* adt7316
- Fix a build issue due to wrong headers for gpio usage.
* lsm6dsx
- correctly suspend / resume i2c slaves when the host goes to sleep.
* mlx90632
- relax a compatability check to allow for newer devices.
Also one counters fix
* counter/ftm-quaddec
- missing dependencies in Kconfig.
* tag 'iio-fixes-for-5.2b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
counter/ftm-quaddec: Add missing dependencies in Kconfig
staging: iio: adt7316: Fix build errors when GPIOLIB is not set
iio: temperature: mlx90632 Relax the compatibility check
iio: imu: st_lsm6dsx: fix PM support for st_lsm6dsx i2c controller
staging:iio:ad7150: fix threshold mode config bit
|
|
spmi_regulator_common_get_mode and spmi_regulator_common_set_mode use
multi-level ifs which mirror a switch statement. Refactor to use a switch
statement to make the code flow more clear.
Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
We want to allow the parent lookup to happen even if the index is some
value less than 0. This may be the case if a clk provider only specifies
the .name member to match a string in the "clock-names" DT property. We
shouldn't require that the index be >= 0 to make this use case work.
Fixes: 601b6e93304a ("clk: Allow parents to be specified via clkspec index")
Reported-by: Alexandre Mergnat <amergnat@baylibre.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
An endpoint conflict occurs when the USB is working in device mode
during an isochronous communication. When the endpointA IN direction
is an isochronous IN endpoint, and the host sends an IN token to
endpointA on another device, then the OUT transaction may be missed
regardless the OUT endpoint number. Generally, this occurs when the
device is connected to the host through a hub and other devices are
connected to the same hub.
The affected OUT endpoint can be either control, bulk, isochronous, or
an interrupt endpoint. After the OUT endpoint is primed, if an IN token
to the same endpoint number on another device is received, then the OUT
endpoint may be unprimed (cannot be detected by software), which causes
this endpoint to no longer respond to the host OUT token, and thus, no
corresponding interrupt occurs.
There is no good workaround for this issue, the only thing the software
could do is numbering isochronous IN from the highest endpoint since we
have observed most of device number endpoint from the lowest.
Cc: <stable@vger.kernel.org> #v3.14+
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Jun Li <jun.li@nxp.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The device_for_each_child() doesn't require the returned value to be checked.
Thus, drop the dummy variable completely and have no warning anymore:
drivers/spi/spi.c: In function ‘spi_unregister_controller’:
drivers/spi/spi.c:2480:6: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable]
int dummy;
^~~~~
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
With both the direct-addressed and indirect-addressed CCW paths
simplified to this point, the amount of shared code between them is
(hopefully) more easily visible. Move the processing of IDA-specific
bits into the direct-addressed path, and add some useful commentary of
what the individual pieces are doing. This allows us to remove the
entire ccwchain_fetch_idal() routine and maintain a single function
for any non-TIC CCW.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-10-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
This is purely deck furniture, to help understand the merge of the
direct and indirect handlers.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-9-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
Now that both CCW codepaths build this nested array:
ccwchain->pfn_array_table[1]->pfn_array[#idaws/#pages]
We can collapse this into simply:
ccwchain->pfn_array[#idaws/#pages]
Let's do that, so that we don't have to continually navigate two
nested arrays when the first array always has a count of one.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-8-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
Now that pfn_array_table[] is always an array of 1, it seems silly to
check for the very first entry in an array in the middle of two nested
loops, since we know it'll only ever happen once.
Let's move this outside the loops to simplify things, even though
the "k" variable is still necessary.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-7-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
While processing a channel program, we currently have two nested
arrays that carry a slightly different structure. The direct CCW
path creates this:
ccwchain->pfn_array_table[1]->pfn_array[#pages]
while an IDA CCW creates:
ccwchain->pfn_array_table[#idaws]->pfn_array[1]
The distinction appears to state that each pfn_array_table entry
points to an array of contiguous pages, represented by a pfn_array,
um, array. Since the direct-addressed scenario can ONLY represent
contiguous pages, it makes the intermediate array necessary but
difficult to recognize. Meanwhile, since an IDAL can contain
non-contiguous pages and there is no logic in vfio-ccw to detect
adjacent IDAWs, it is the second array that is necessary but appearing
to be superfluous.
I am not aware of any documentation that states the pfn_array[] needs
to be of contiguous pages; it is just what the code does today.
I don't see any reason for this either, let's just flip the IDA
codepath around so that it generates:
ch_pat->pfn_array_table[1]->pfn_array[#idaws]
This will bring it in line with the direct-addressed codepath,
so that we can understand the behavior of this memory regardless
of what type of CCW is being processed. And it means the casual
observer does not need to know/care whether the pfn_array[]
represents contiguous pages or not.
NB: The existing vfio-ccw code only supports 4K-block Format-2 IDAs,
so that "#pages" == "#idaws" in this area. This means that we will
have difficulty with this overlap in terminology if support for
Format-1 or 2K-block Format-2 IDAs is ever added. I don't think that
this patch changes our ability to make that distinction.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-6-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
It is now pretty apparent that ccwchain_handle_ccw()
(nee ccwchain_handle_tic()) does everything that cp_init()
wants to do.
Let's remove that duplicated code from cp_init() and let
ccwchain_handle_ccw() handle it itself.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-5-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
Refactor ccwchain_handle_tic() into a routine that handles a channel
program address (which itself is a CCW pointer), rather than a CCW pointer
that is only a TIC CCW. This will make it easier to reuse this code for
other CCW commands.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-4-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
Extract the "does the target of this TIC already exist?" check from
ccwchain_handle_tic(), so that it's easier to refactor that function
into one that cp_init() is able to use.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-3-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
The routine cp_free() does nothing but call cp_unpin_free(), and while
most places call cp_free() there is one caller of cp_unpin_free() used
when the cp is guaranteed to have not been marked initialized.
This seems like a dubious way to make a distinction, so let's combine
these routines and make cp_free() do all the work.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190606202831.44135-2-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
|
|
If cmd19 timeout or response crcerr occurs during execute_tuning(),
it need invoke msdc_reset_hw(). Otherwise SDIO IRQ can't be detected.
Signed-off-by: jjian zhou <jjian.zhou@mediatek.com>
Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
Signed-off-by: Yong Mao <yong.mao@mediatek.com>
Fixes: 5215b2e952f3 ("mmc: mediatek: Add MMC_CAP_SDIO_IRQ support")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
SDIO IRQ is triggered by low level. It need disable SDIO IRQ
detected function. Otherwise the interrupt register can't be cleared.
It will process the interrupt more.
Signed-off-by: Jjian Zhou <jjian.zhou@mediatek.com>
Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
Signed-off-by: Yong Mao <yong.mao@mediatek.com>
Fixes: 5215b2e952f3 ("mmc: mediatek: Add MMC_CAP_SDIO_IRQ support")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
We don't have a reproducible error case, yet our BSP team suggested that
the mmc_switch_status() command in mmc_select_hs400() should come after
the callback into the driver completing HS400 setup. It makes sense to
me because we want the status of a fully setup HS400, so it will
increase the reliability of the mmc_switch_status() command.
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Fixes: ba6c7ac3a2f4 ("mmc: core: more fine-grained hooks for HS400 tuning")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The code in pci_dev_keep_suspended() is relatively hard to follow due
to the negative checks in it and in its callers and the function has
a possible side-effect (disabling the PME) which doesn't really match
its role.
For this reason, move the PME disabling from pci_dev_keep_suspended()
to a separate function and change the semantics (and name) of the
rest of it, so that 'true' is returned when the device needs to be
resumed (and not the other way around). Change the callers of
pci_dev_keep_suspended() accordingly.
While at it, make the code flow in pci_pm_poweroff() reflect the
pci_pm_suspend() more closely to avoid arbitrary differences between
them.
This is a cosmetic change with no intention to alter behavior.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
The current code resumes devices in D3hot during system suspend if
the target power state for them is D3cold, but that is not necessary
in general. It only is necessary to do that if the platform firmware
requires the device to be resumed, but that should be covered by
the platform_pci_need_resume() check anyway, so rework
pci_dev_keep_suspended() to avoid returning 'false' for devices
in D3hot which need not be resumed due to platform firmware
requirements.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Make pcc_cpufreq_init() return error codes when the driver cannot be
registered. Otherwise the driver can shows up loaded via lsmod even
though it failed initialization. This is confusing to the user.
Signed-off-by: David Arcari <darcari@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
lockdep_assert_held_write()
All callers of lockdep_assert_held_exclusive() use it to verify the
correct locking state of either a semaphore (ldisc_sem in tty,
mmap_sem for perf events, i_rwsem of inode for dax) or rwlock by
apparmor. Thus it makes sense to rename _exclusive to _write since
that's the semantics callers care. Additionally there is already
lockdep_assert_held_read(), which this new naming is more consistent with.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190531100651.3969-1-nborisov@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The problem is that on 64bit systems then we don't clear the higher
bits of the "pending" variable. So when we do:
ack = pending & ~BIT(STMFX_REG_IRQ_SRC_EN_GPIO);
if (ack) {
the if (ack) condition relies on uninitialized data. The fix it that
I've changed "pending" from an unsigned long to a u32. I changed "n" as
well, because that's a number in the 0-10 range and it fits easily
inside an int. We do need to add a cast to "pending" when we use it in
the for_each_set_bit() loop, but that doesn't cause a problem, it's
fine.
Fixes: 06252ade9156 ("mfd: Add ST Multi-Function eXpander (STMFX) core driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amelie Delaunay <amelie.delaunay@st.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
|
|
CONFIG_ARM_GIC_MAX_NR is enabled by default.
It is redundant in x86 and IA-64 where is
without GIC.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
There is pvinfo writing come from vgpu might be unexpected, like
writing to one unknown address, GVT-g should do as reserved register
to discard any invalid write. Now GVT-g lets it write to the vreg
without prompt error message, should ignore the unexpected pvinfo
write access and leave the vreg as the default value.
For possible guest query GVT-g host feature, this returned proper
value instead of wrong guest setting.
v2: ignore unexpected pvinfo write instead of return predefined value
Fixes: e39c5add3221 ("drm/i915/gvt: vGPU MMIO virtualization")
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
For devices with performance state, we use dev_pm_opp_set_rate() to set
the appropriate clk rate and the performance state.
We do need a way to remove the performance state vote when we idle the
device and turn the clocks off. Use dev_pm_opp_set_rate() with freq = 0
to achieve this.
Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
[ Viresh: Updated _set_required_opps() to handle the !opp case ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
The OPP table normally contains 'fmax' values corresponding to the
voltage or performance levels of each OPP, but we don't necessarily want
all the devices to run at fmax all the time. Running at fmax makes sense
for devices like CPU/GPU, which have a finite amount of work to do and
since a specific amount of energy is consumed at an OPP, its better to
run at the highest possible frequency for that voltage value.
On the other hand, we have IO devices which need to run at specific
frequencies only for their proper functioning, instead of maximum
possible frequency.
The OPP core currently roundup to the next possible OPP for a frequency
and select the fmax value. To support the IO devices by the OPP core,
lets do the roundup to fetch the voltage or performance state values,
but not use the OPP frequency value. Rather use the value returned by
clk_round_rate().
The current user, cpufreq, of dev_pm_opp_set_rate() already does the
rounding to the next OPP before calling this routine and it won't
have any side affects because of this change.
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
[ Viresh: Massaged changelog, added comment and use temp_opp variable
instead ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Number of Rx queues used for flow hashing returned by the driver is
incorrect and this bug prevents user to use the last Rx queue in
indirection table.
Let's say we have a NIC with 6 combined queues:
[root@sm-03 ~]# ethtool -l enp4s0f0
Channel parameters for enp4s0f0:
Pre-set maximums:
RX: 5
TX: 5
Other: 0
Combined: 6
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 6
Default indirection table maps all (6) queues equally but the driver
reports only 5 rings available.
[root@sm-03 ~]# ethtool -x enp4s0f0
RX flow hash indirection table for enp4s0f0 with 5 RX ring(s):
0: 0 1 2 3 4 5 0 1
8: 2 3 4 5 0 1 2 3
16: 4 5 0 1 2 3 4 5
24: 0 1 2 3 4 5 0 1
...
Now change indirection table somehow:
[root@sm-03 ~]# ethtool -X enp4s0f0 weight 1 1
[root@sm-03 ~]# ethtool -x enp4s0f0
RX flow hash indirection table for enp4s0f0 with 6 RX ring(s):
0: 0 0 0 0 0 0 0 0
...
64: 1 1 1 1 1 1 1 1
...
Now it is not possible to change mapping back to equal (default) state:
[root@sm-03 ~]# ethtool -X enp4s0f0 equal 6
Cannot set RX flow hash configuration: Invalid argument
Fixes: 594ad54a2c3b ("be2net: Add support for setting and getting rx flow hash options")
Reported-by: Tianhao <tizhao@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Fixes for omap variants for dra7 mmc voltage and boot issues
This series contains dra7 mmc voltage fixes, and fixes to the recent
changes to probe devices with device tree data insteas of legacy
platform data:
- Two fixes for dra7 mmc that needs 1.8V mode disabled as in case of a
reset, the bootrom will try to access the mmc card at 3.3V potentially
damaging the card
- Two regression fixes for am335x d_can. We must allow devices with no
control registers for ti-sysc interconnect target module driver for
at least d_can, and we remove the incorrect control registers for
d_can. And we must configure the osc clock for d_can as otherwise
register access may fail depending on the bootloader version
- Four regression fixes for dra7 variant dts files to tag rtc and usb4
as disabled for dra71x and dra76x. These SoC variants do not have
these devices, and got accidentally enabled when the L4 interconnect
got defined in the dra7-l4.dtsi for the dra7 SoC family
* tag 'omap-for-v5.2/fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: dra71x: Disable usb4_tm target module
ARM: dts: dra71x: Disable rtc target module
ARM: dts: dra76x: Disable usb4_tm target module
ARM: dts: dra76x: Disable rtc target module
ARM: dts: dra76x: Update MMC2_HS200_MANUAL1 iodelay values
ARM: dts: am57xx-idk: Remove support for voltage switching for SD card
bus: ti-sysc: Handle devices with no control registers
ARM: dts: Configure osc clock for d_can on am335x
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
i.MX fixes for 5.2:
- A build fix for soc-imx8 driver which needs SOC_BUS support. To
avoid dealing with the dependency for every single i.MX SoC bus
driver, we selects at from architecture level.
- A fix on i.MX SCU firmware driver to ensure SCU irq is enabled only
after IPC is ready.
- A regression fix on cpuidle-imx6sx driver, which causes some
characters loss on serial communication.
* tag 'imx-fixes-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx: cpuidle-imx6sx: Restrict the SW2ISO increase to i.MX6SX
firmware: imx: SCU irq should ONLY be enabled after SCU IPC is ready
arm64: imx: Fix build error without CONFIG_SOC_BUS
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
https://github.com/Broadcom/stblinux into fixes
This pull request contains Broadcom ARM/ARM64/MIPS SoCs device drivers
fixes for 5.2-rc1, please pull the following:
- Florian fixes the biuctrl driver not to create an error condition/path
upon unsupported CPU and also fixes the biuctrl driver writes to used
a data barrier which is necessary given the HW block design
* tag 'arm-soc/for-5.2/drivers-fixes' of https://github.com/Broadcom/stblinux:
soc: bcm: brcmstb: biuctrl: Register writes require a barrier
soc: brcmstb: Fix error path for unsupported CPUs
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A set of small fixes:
- Repair the ktime_get_coarse() functions so they actually deliver
what they are supposed to: tick granular time stamps. The current
code missed to add the accumulated nanoseconds part of the
timekeeper so the resulting granularity was 1 second.
- Prevent the tracer from infinitely recursing into time getter
functions in the arm architectured timer by marking these functions
notrace
- Fix a trivial compiler warning caused by wrong qualifier ordering"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Repair ktime_get_coarse*() granularity
clocksource/drivers/arm_arch_timer: Don't trace count reader functions
clocksource/drivers/timer-ti-dm: Change to new style declaration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
"Two small fixes for RAS:
- Use a proper search algorithm to find the correct element in the
CEC array. The replacement was a better choice than fixing the
crash causes by the original search function with horrible duct
tape.
- Move the timer based decay function into thread context so it can
actually acquire the mutex which protects the CEC array to prevent
corruption"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
RAS/CEC: Convert the timer callback to a workqueue
RAS/CEC: Fix binary search function
|
|
This reverts commit ef7bfa84725d891bbdb88707ed55b2cbf94942bb.
Russell King espressed some strong opposition to this
change, explaining that this is trying to make phylink
behave outside of how it has been designed.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We get this regression when using RTL8366RB as part of a bridge
with OpenWrt:
WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291
switchdev_port_attr_set_now+0x80/0xa4
lan0: Commit of attribute (id=7) failed.
(...)
realtek-smi switch lan0: failed to initialize vlan filtering on this port
This is because it is trying to disable VLAN filtering
on VLAN0, as we have forgot to add 1 to the port number
to get the right VLAN in rtl8366_vlan_filtering(): when
we initialize the VLAN we associate VLAN1 with port 0,
VLAN2 with port 1 etc, so we need to add 1 to the port
offset.
Fixes: d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The phy_state field of phylink should carry only valid information
especially when this can be passed to the .mac_config callback.
Update the an_enabled field with the autoneg state in the
phylink_phy_change function.
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|