Age | Commit message (Collapse) | Author |
|
When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode),
current driver disables interrupt remapping when it updates the IRTE
so that the upper and lower 64-bit values can be updated safely.
However, this creates a small window, where the interrupt could
arrive and result in IO_PAGE_FAULT (for interrupt) as shown below.
IOMMU Driver Device IRQ
============ ===========
irte.RemapEn=0
...
change IRTE IRQ from device ==> IO_PAGE_FAULT !!
...
irte.RemapEn=1
This scenario has been observed when changing irq affinity on a system
running I/O-intensive workload, in which the destination APIC ID
in the IRTE is updated.
Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
disabling the interrupt remapping. However, this means several features,
which require GA (128-bit IRTE) support will also be affected if cmpxchg16b
is not supported (which is unprecedented for AMD processors w/ IOMMU).
Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure")
Reported-by: Sean Osborne <sean.m.osborne@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Erik Rockstrom <erik.rockstrom@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20200903093822.52012-3-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Currently, the RemapEn (valid) bit is accidentally cleared when
programming IRTE w/ guestMode=0. It should be restored to
the prior state.
Fixes: b9fc6b56f478 ("iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20200903093822.52012-2-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The user-after-free bug in thermal_zone_device_unregister() is reported by
KASAN. It happens because struct thermal_zone_device is released during of
device_unregister() invocation, and hence the "tz" variable shouldn't be
touched by thermal_notify_tz_delete(tz->id).
Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200817235854.26816-1-digetx@gmail.com
|
|
Currently driver is suppressing the negative temperature
readings from the vadc. Consumers of the thermal zones need
to read the negative temperature too. Don't suppress the
readings.
Fixes: c610afaa21d3c6e ("thermal: Add QPNP PMIC temperature alarm driver")
Signed-off-by: Veera Vegivada <vvegivad@codeaurora.org>
Signed-off-by: Guru Das Srinagesh <gurus@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/944856eb819081268fab783236a916257de120e4.1596040416.git.gurus@codeaurora.org
|
|
For the obscure cases where PMD and PUD are the same size
(64kB pages with 42bit VA, for example, which results in only
two levels of page tables), we can't map anything as a PUD,
because there is... erm... no PUD to speak of. Everything is
either a PMD or a PTE.
So let's only try and map a PUD when its size is different from
that of a PMD.
Cc: stable@vger.kernel.org
Fixes: b8e0ba7c8bea ("KVM: arm64: Add support for creating PUD hugepages at stage 2")
Reported-by: Gavin Shan <gshan@redhat.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We can sometimes get bogus thermal shutdowns on omap4430 at least with
droid4 running idle with a battery charger connected:
thermal thermal_zone0: critical temperature reached (143 C), shutting down
Dumping out the register values shows we can occasionally get a 0x7f value
that is outside the TRM listed values in the ADC conversion table. And then
we get a normal value when reading again after that. Reading the register
multiple times does not seem help avoiding the bogus values as they stay
until the next sample is ready.
Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we
should have values from 13 to 107 listed with a total of 95 values. But
looking at the omap4430_adc_to_temp array, the values are off, and the
end values are missing. And it seems that the 4430 ADC table is similar
to omap3630 rather than omap4460.
Let's fix the issue by using values based on the omap3630 table and just
ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has
the missing values added while the TRM table only shows every second
value.
Note that sometimes the ADC register values within the valid table can
also be way off for about 1 out of 10 values. But it seems that those
just show about 25 C too low values rather than too high values. So those
do not cause a bogus thermal shutdown.
Fixes: 1a31270e54d7 ("staging: omap-thermal: add OMAP4 data structures")
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200706183338.25622-1-tony@atomide.com
|
|
The dev_iommu_priv_set() must be called after probe_device(). This fixes
a NULL pointer deference bug when booting a system with kernel cmdline
"intel_iommu=on,igfx_off", where the dev_iommu_priv_set() is abused.
The following stacktrace was produced:
Command line: BOOT_IMAGE=/isolinux/bzImage console=tty1 intel_iommu=on,igfx_off
...
DMAR: Host address width 39
DMAR: DRHD base: 0x000000fed90000 flags: 0x0
DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
DMAR: DRHD base: 0x000000fed91000 flags: 0x1
DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
DMAR: RMRR base: 0x0000009aa9f000 end: 0x0000009aabefff
DMAR: RMRR base: 0x0000009d000000 end: 0x0000009f7fffff
DMAR: No ATSR found
BUG: kernel NULL pointer dereference, address: 0000000000000038
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.9.0-devel+ #2
Hardware name: LENOVO 20HGS0TW00/20HGS0TW00, BIOS N1WET46S (1.25s ) 03/30/2018
RIP: 0010:intel_iommu_init+0xed0/0x1136
Code: fe e9 61 02 00 00 bb f4 ff ff ff e9 57 02 00 00 48 63 d1 48 c1 e2 04 48
03 50 20 48 8b 12 48 85 d2 74 0b 48 8b 92 d0 02 00 00 48 89 7a 38 ff c1
e9 15 f5 ff ff 48 c7 c7 60 99 ac a7 49 c7 c7 a0
RSP: 0000:ffff96d180073dd0 EFLAGS: 00010282
RAX: ffff8c91037a7d20 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
RBP: ffff96d180073e90 R08: 0000000000000001 R09: ffff8c91039fe3c0
R10: 0000000000000226 R11: 0000000000000226 R12: 000000000000000b
R13: ffff8c910367c650 R14: ffffffffa8426d60 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8c9107480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 00000004b100a001 CR4: 00000000003706e0
Call Trace:
? _raw_spin_unlock_irqrestore+0x1f/0x30
? call_rcu+0x10e/0x320
? trace_hardirqs_on+0x2c/0xd0
? rdinit_setup+0x2c/0x2c
? e820__memblock_setup+0x8b/0x8b
pci_iommu_init+0x16/0x3f
do_one_initcall+0x46/0x1e4
kernel_init_freeable+0x169/0x1b2
? rest_init+0x9f/0x9f
kernel_init+0xa/0x101
ret_from_fork+0x22/0x30
Modules linked in:
CR2: 0000000000000038
---[ end trace 3653722a6f936f18 ]---
Fixes: 01b9d4e21148c ("iommu/vt-d: Use dev_iommu_priv_get/set()")
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Link: https://lore.kernel.org/linux-iommu/96717683-70be-7388-3d2f-61131070a96a@secunet.com/
Link: https://lore.kernel.org/r/20200903065132.16879-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The VT-d spec requires (10.4.4 Global Command Register, GCMD_REG General
Description) that:
If multiple control fields in this register need to be modified, software
must serialize the modifications through multiple writes to this register.
However, in irq_remapping.c, modifications of IRE and CFI are done in one
write. We need to do two separate writes with STS checking after each. It
also checks the status register before writing command register to avoid
unnecessary register write.
Fixes: af8d102f999a4 ("x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20200828000615.8281-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Commit e86d1aa8b60f ("iommu/arm-smmu: Move Arm SMMU drivers into their own
subdirectory") moved drivers/iommu/qcom_iommu.c to
drivers/iommu/arm/arm-smmu/qcom_iommu.c amongst other moves, adjusted some
sections in MAINTAINERS, but missed adjusting the QUALCOMM IOMMU section.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:
warning: no file matches F: drivers/iommu/qcom_iommu.c
Update the file entry in MAINTAINERS to the new location.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200825053828.4166-1-lukas.bulwahn@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Unlike we previously thought, the per-pixel alpha is just as broken on the
A20 as it is on the A10. Remove the quirk that says we can use it.
Fixes: dcf496a6a608 ("drm/sun4i: sun4i: Introduce a quirk for lowest plane alpha support")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728134810.883457-2-maxime@cerno.tech
|
|
Unlike what we previously thought, only the per-pixel alpha is broken on
the lowest plane and the per-plane alpha isn't. Remove the check on the
alpha property being set on the lowest plane to reject a mode.
Fixes: dcf496a6a608 ("drm/sun4i: sun4i: Introduce a quirk for lowest plane alpha support")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728134810.883457-1-maxime@cerno.tech
|
|
Function sun8i_vi_layer_get_csc_mode() is supposed to return CSC mode
but due to inproper return type (bool instead of u32) it returns just 0
or 1. Colors are wrong for YVU formats because of that.
Fixes: daab3d0e8e2b ("drm/sun4i: de2: csc_mode in de2 format struct is mostly redundant")
Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200901220305.6809-1-jernej.skrabec@siol.net
|
|
To be used in order to create foreign mappings. This is based on the
ZONE_DEVICE facility which is used by persistent memory devices in
order to create struct pages and kernel virtual mappings for the IOMEM
areas of such devices. Note that on kernels without support for
ZONE_DEVICE Xen will fallback to use ballooned pages in order to
create foreign mappings.
The newly added helpers use the same parameters as the existing
{alloc/free}_xenballooned_pages functions, which allows for in-place
replacement of the callers. Once a memory region has been added to be
used as scratch mapping space it will no longer be released, and pages
returned are kept in a linked list. This allows to have a buffer of
pages and prevents resorting to frequent additions and removals of
regions.
If enabled (because ZONE_DEVICE is supported) the usage of the new
functionality untangles Xen balloon and RAM hotplug from the usage of
unpopulated physical memory ranges to map foreign pages, which is the
correct thing to do in order to avoid mappings of foreign pages depend
on memory hotplug.
Note the driver is currently not enabled on Arm platforms because it
would interfere with the identity mapping required on some platforms.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20200901083326.21264-4-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
This is in preparation for the logic behind MEMORY_DEVICE_DEVDAX also
being used by non DAX devices.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: https://lore.kernel.org/r/20200901083326.21264-3-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
In order to protect against the header being included multiple times
on the same compilation unit.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20200901083326.21264-2-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
syzbot reports,
WARNING: inconsistent lock state
5.9.0-rc2-syzkaller #0 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
syz-executor.0/26715 takes:
(padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220
{IN-SOFTIRQ-W} state was registered at:
spin_lock include/linux/spinlock.h:354 [inline]
padata_do_parallel kernel/padata.c:220
...
__do_softirq kernel/softirq.c:298
...
sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581
Possible unsafe locking scenario:
CPU0
----
lock(padata_works_lock);
<Interrupt>
lock(padata_works_lock);
padata_do_parallel() takes padata_works_lock with softirqs enabled, so a
deadlock is possible if, on the same CPU, the lock is acquired in
process context and then softirq handling done in an interrupt leads to
the same path.
Fix by leaving softirqs disabled while do_parallel holds
padata_works_lock.
Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com
Fixes: 4611ce2246889 ("padata: allocate work structures for parallel jobs from a pool")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Commit:
cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability")
uses "-1" as the starting node ID, which causes the strange kernel log as
follows, when "numa=fake=32G" is added to the kernel command line:
Faking node -1 at [mem 0x0000000000000000-0x0000000893ffffff] (35136MB)
Faking node 0 at [mem 0x0000001840000000-0x000000203fffffff] (32768MB)
Faking node 1 at [mem 0x0000000894000000-0x000000183fffffff] (64192MB)
Faking node 2 at [mem 0x0000002040000000-0x000000283fffffff] (32768MB)
Faking node 3 at [mem 0x0000002840000000-0x000000303fffffff] (32768MB)
And finally the kernel crashes:
BUG: Bad page state in process swapper pfn:00011
page:(____ptrval____) refcount:0 mapcount:1 mapping:(____ptrval____) index:0x55cd7e44b270 pfn:0x11
failed to read mapping contents, not a valid kernel address?
flags: 0x5(locked|uptodate)
raw: 0000000000000005 000055cd7e44af30 000055cd7e44af50 0000000100000006
raw: 000055cd7e44b270 000055cd7e44b290 0000000000000000 000055cd7e44b510
page dumped because: page still charged to cgroup
page->mem_cgroup:000055cd7e44b510
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc2 #1
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
Call Trace:
dump_stack+0x57/0x80
bad_page.cold+0x63/0x94
__free_pages_ok+0x33f/0x360
memblock_free_all+0x127/0x195
mem_init+0x23/0x1f5
start_kernel+0x219/0x4f5
secondary_startup_64+0xb6/0xc0
Fix this bug via using 0 as the starting node ID. This restores the
original behavior before cc9aec03e58f.
[ mingo: Massaged the changelog. ]
Fixes: cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200904061047.612950-1-ying.huang@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools fixes from Arnaldo Carvalho de Melo:
- Use uintptr_t when casting numbers to pointers
- Keep output expected by 3rd parties: Turn off summary for interval
mode by default.
- BPF is in kernel space, make sure do_validate_kcore_modules() knows
about that.
- Explicitly call out event modifiers in the documentation.
- Fix jevents() allocation of space for regular expressions.
- Address libtraceevent build warnings on 32-bit arches.
- Fix checking of functions returns using ERR_PTR() in 'perf bench'.
* tag 'perf-tools-fixes-for-v5.9-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Add bpf image check to __map__is_kmodule
perf record/stat: Explicitly call out event modifiers in the documentation
perf bench: The do_run_multi_threaded() function must use IS_ERR(perf_session__new())
perf stat: Turn off summary for interval mode by default
libtraceevent: Fix build warning on 32-bit arches
perf jevents: Fix suspicious code in fixregex()
perf parse-events: Use uintptr_t when casting numbers to pointers
|
|
Pull networking fixes from David Miller:
1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
Kivilinna.
2) Fix loss of RTT samples in rxrpc, from David Howells.
3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.
4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.
5) We disable BH for too lokng in sctp_get_port_local(), add a
cond_resched() here as well, from Xin Long.
6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.
7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
Yonghong Song.
8) Missing of_node_put() in mt7530 DSA driver, from Sumera
Priyadarsini.
9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.
10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.
11) Memory leak in rxkad_verify_response, from Dinghao Liu.
12) In tipc, don't use smp_processor_id() in preemptible context. From
Tuong Lien.
13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.
14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.
15) Fix ABI mismatch between driver and firmware in nfp, from Louis
Peens.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
net/smc: fix sock refcounting in case of termination
net/smc: reset sndbuf_desc if freed
net/smc: set rx_off for SMCR explicitly
net/smc: fix toleration of fake add_link messages
tg3: Fix soft lockup when tg3_reset_task() fails.
doc: net: dsa: Fix typo in config code sample
net: dp83867: Fix WoL SecureOn password
nfp: flower: fix ABI mismatch between driver and firmware
tipc: fix shutdown() of connectionless socket
ipv6: Fix sysctl max for fib_multipath_hash_policy
drivers/net/wan/hdlc: Change the default of hard_header_len to 0
net: gemini: Fix another missing clk_disable_unprepare() in probe
net: bcmgenet: fix mask check in bcmgenet_validate_flow()
amd-xgbe: Add support for new port mode
net: usb: dm9601: Add USB ID of Keenetic Plus DSL
vhost: fix typo in error message
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
pktgen: fix error message with wrong function name
net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
cxgb4: fix thermal zone device registration
...
|
|
Merge gate page refcount fix from Dave Hansen:
"During the conversion over to pin_user_pages(), gate pages were missed.
The fix is pretty simple, and is accompanied by a new test from Andy
which probably would have caught this earlier"
* emailed patches from Dave Hansen <dave.hansen@linux.intel.com>:
selftests/x86/test_vsyscall: Improve the process_vm_readv() test
mm: fix pin vs. gup mismatch with gate pages
|
|
The existing code accepted process_vm_readv() success or failure as long
as it didn't return garbage. This is too weak: if the vsyscall page is
readable, then process_vm_readv() should succeed and, if the page is not
readable, then it should fail.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Gate pages were missed when converting from get to pin_user_pages().
This can lead to refcount imbalances. This is reliably and quickly
reproducible running the x86 selftests when vsyscall=emulate is enabled
(the default). Fix by using try_grab_page() with appropriate flags
passed.
The long story:
Today, pin_user_pages() and get_user_pages() are similar interfaces for
manipulating page reference counts. However, "pins" use a "bias" value
and manipulate the actual reference count by 1024 instead of 1 used by
plain "gets".
That means that pin_user_pages() must be matched with unpin_user_pages()
and can't be mixed with a plain put_user_pages() or put_page().
Enter gate pages, like the vsyscall page. They are pages usually in the
kernel image, but which are mapped to userspace. Userspace is allowed
access to them, including interfaces using get/pin_user_pages(). The
refcount of these kernel pages is manipulated just like a normal user
page on the get/pin side so that the put/unpin side can work the same
for normal user pages or gate pages.
get_gate_page() uses try_get_page() which only bumps the refcount by
1, not 1024, even if called in the pin_user_pages() path. If someone
pins a gate page, this happens:
pin_user_pages()
get_gate_page()
try_get_page() // bump refcount +1
... some time later
unpin_user_pages()
page_ref_sub_and_test(page, 1024))
... and boom, we get a refcount off by 1023. This is reliably and
quickly reproducible running the x86 selftests when booted with
vsyscall=emulate (the default). The selftests use ptrace(), but I
suspect anything using pin_user_pages() on gate pages could hit this.
To fix it, simply use try_grab_page() instead of try_get_page(), and
pass 'gup_flags' in so that FOLL_PIN can be respected.
This bug traces back to the very beginning of the FOLL_PIN support in
commit 3faa52c03f44 ("mm/gup: track FOLL_PIN pages"), which showed up in
the 5.7 release.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A couple of minor fixes to the display changes that went in for 5.9.
The most important of which is a workaround for a HW bug that was
exposed by better push buffer space management, leading to
random(ish...) display engine hangs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ <CACAvsv5QDxyMihrxbPk+-sORnaYtjR6_dbM68gEhb2wxht_G1w@mail.gmail.com
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.9-rc4:
- Clang build warning fix
- HDCP fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87sgbz2pnx.fsf@intel.com
|
|
git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.9-2020-09-03:
amdgpu:
- Fix for 32bit systems
- SW CTF fix
- Update for Sienna Cichlid
- CIK bug fixes
radeon:
- PLL fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903050022.3960-1-alexander.deucher@amd.com
|
|
Yonghong Song says:
====================
Currently, the bpf hashmap iterator takes a bucket_lock, a spin_lock,
before visiting each element in the bucket. This will cause a deadlock
if a map update/delete operates on an element with the same
bucket id of the visited map.
To avoid the deadlock, let us just use rcu_read_lock instead of
bucket_lock. This may result in visiting stale elements, missing some elements,
or repeating some elements, if concurrent map delete/update happens for the
same map. I think using rcu_read_lock is a reasonable compromise.
For users caring stale/missing/repeating element issues, bpf map batch
access syscall interface can be used.
Note that another approach is during bpf_iter link stage, we check
whether the iter program might be able to do update/delete to the visited
map. If it is, reject the link_create. Verifier needs to record whether
an update/delete operation happens for each map for this approach.
I just feel this checking is too specialized, hence still prefer
rcu_read_lock approach.
Patch #1 has the kernel implementation and Patch #2 added a selftest
which can trigger deadlock without Patch #1.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Added bpf_{updata,delete}_map_elem to the very map element the
iter program is visiting. Due to rcu protection, the visited map
elements, although stale, should still contain correct values.
$ ./test_progs -n 4/18
#4/18 bpf_hash_map:OK
#4 bpf_iter:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
|
|
Currently, for hashmap, the bpf iterator will grab a bucket lock, a
spinlock, before traversing the elements in the bucket. This can ensure
all bpf visted elements are valid. But this mechanism may cause
deadlock if update/deletion happens to the same bucket of the
visited map in the program. For example, if we added bpf_map_update_elem()
call to the same visited element in selftests bpf_iter_bpf_hash_map.c,
we will have the following deadlock:
============================================
WARNING: possible recursive locking detected
5.9.0-rc1+ #841 Not tainted
--------------------------------------------
test_progs/1750 is trying to acquire lock:
ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410
but task is already holding lock:
ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&htab->buckets[i].raw_lock);
lock(&htab->buckets[i].raw_lock);
*** DEADLOCK ***
...
Call Trace:
dump_stack+0x78/0xa0
__lock_acquire.cold.74+0x209/0x2e3
lock_acquire+0xba/0x380
? htab_map_update_elem+0x1cf/0x410
? __lock_acquire+0x639/0x20c0
_raw_spin_lock_irqsave+0x3b/0x80
? htab_map_update_elem+0x1cf/0x410
htab_map_update_elem+0x1cf/0x410
? lock_acquire+0xba/0x380
bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c
? find_held_lock+0x34/0xa0
bpf_iter_run_prog+0x81/0x16e
__bpf_hash_map_seq_show+0x145/0x180
bpf_seq_read+0xff/0x3d0
vfs_read+0xad/0x1c0
ksys_read+0x5f/0xe0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
The bucket_lock first grabbed in seq_ops->next() called by bpf_seq_read(),
and then grabbed again in htab_map_update_elem() in the bpf program, causing
deadlocks.
Actually, we do not need bucket_lock here, we can just use rcu_read_lock()
similar to netlink iterator where the rcu_read_{lock,unlock} likes below:
seq_ops->start():
rcu_read_lock();
seq_ops->next():
rcu_read_unlock();
/* next element */
rcu_read_lock();
seq_ops->stop();
rcu_read_unlock();
Compared to old bucket_lock mechanism, if concurrent updata/delete happens,
we may visit stale elements, miss some elements, or repeat some elements.
I think this is a reasonable compromise. For users wanting to avoid
stale, missing/repeated accesses, bpf_map batch access syscall interface
can be used.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com
|
|
The returned value of bpf_object__open_file() should be checked with
libbpf_get_error() rather than NULL. This fix prevents test_progs from
crash when test_global_data.o is not present.
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200903200528.747884-1-haoluo@google.com
|
|
Andrii Nakryiko says:
====================
Currently, libbpf supports a limited form of BPF-to-BPF subprogram calls. The
restriction is that entry-point BPF program should use *all* of defined
sub-programs in BPF .o file. If any of the subprograms is not used, such
entry-point BPF program will be rejected by verifier as containing unreachable
dead code. This is not a big limitation for cases with single entry-point BPF
programs, but is quite a heavy restriction for multi-programs that use only
partially overlapping set of subprograms.
This patch set removes all such restrictions and adds complete support for
using BPF sub-program calls on BPF side. This is achieved through libbpf
tracking subprograms individually and detecting which subprograms are used by
any given entry-point BPF program, and subsequently only appending and
relocating code for just those used subprograms.
In addition, libbpf now also supports multiple entry-point BPF programs within
the same ELF section. This allows to structure code so that there are few
variants of BPF programs of the same type and attaching to the same target
(e.g., for tracepoints and kprobes) without the need to worry about ELF
section name clashes.
This patch set opens way for more wider adoption of BPF subprogram calls,
especially for real-world production use-cases with complicated net of
subprograms. This will allow to further scale BPF verification process through
good use of global functions, which can be verified independently. This is
also important prerequisite for static linking which allows static BPF
libraries to not worry about naming clashes for section names, as well as use
static non-inlined functions (subprograms) without worries of verifier
rejecting program due to dead code.
Patch set is structured as follows:
- patched 1-6 contain all the libbpf changes necessary to support multi-prog
sections and bpf2bpf subcalls;
- patch 7 adds dedicated selftests validating all combinations of possible
sub-calls (within and across sections, static vs global functions);
- patch 8 deprecated bpf_program__title() in favor of
bpf_program__section_name(). The intent was to also deprecate
bpf_object__find_program_by_title() as it's now non-sensical with multiple
programs per section. But there were too many selftests uses of this and
I didn't want to delay this patches further and make it even bigger, so left
it for a follow up cleanup;
- patches 9-10 remove uses for title-related APIs from bpftool and
bpf_program__title() use from selftests;
- patch 11 is converting fexit_bpf2bpf to have explicit subtest (it does
contain 4 subtests, which are not handled as sub-tests);
- patches 12-14 convert few complicated BPF selftests to use __noinline
functions to further validate correctness of libbpf's bpf2bpf processing
logic.
v2->v3:
- explained subprog relocation algorithm in more details (Alexei);
- pyperf, strobelight and cls_redirect got new subprog variants, leaving
other modes intact (Alexei);
v1->v2:
- rename DEPRECATED to LIBBPF_DEPRECATED to avoid name clashes;
- fix test_subprogs build;
- convert a bunch of complicated selftests to __noinline (Alexei).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
As one of the most complicated and close-to-real-world programs, cls_redirect
is a good candidate to exercise libbpf's logic of handling bpf2bpf calls. So
add variant with using explicit __noinline for majority of functions except
few most basic ones. If those few functions are inlined, verifier starts to
complain about program instruction limit of 1mln instructions being exceeded,
most probably due to instruction overhead of doing a sub-program call.
Convert user-space part of selftest to have to sub-tests: with and without
inlining.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200903203542.15944-15-andriin@fb.com
|
|
Update xdp_noinline to use BPF skeleton and force __noinline on helper
sub-programs. Also, split existing logic into v4- and v6-only to complicate
sub-program calling patterns (partially overlapped sets of functions for
entry-level BPF programs).
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-14-andriin@fb.com
|
|
Add use of non-inlined subprogs to few bigger selftests to excercise libbpf's
bpf2bpf handling logic. Also split l4lb_all selftest into two sub-tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-13-andriin@fb.com
|
|
There are clearly 4 subtests, so make it official.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-12-andriin@fb.com
|
|
BPF program title is ambigious and misleading term. It is ELF section name, so
let's just call it that and deprecate bpf_program__title() API in favor of
bpf_program__section_name().
Additionally, using bpf_object__find_program_by_title() is now inherently
dangerous and ambiguous, as multiple BPF program can have the same section
name. So deprecate this API as well and recommend to switch to non-ambiguous
bpf_object__find_program_by_name().
Internally, clean up usage and mis-usage of BPF program section name for
denoting BPF program name. Shorten the field name to prog->sec_name to be
consistent with all other prog->sec_* variables.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-11-andriin@fb.com
|
|
Remove all uses of bpf_program__title() and
bpf_program__find_program_by_title().
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-10-andriin@fb.com
|
|
bpf_program__title() is deprecated, switch to bpf_program__section_name() and
avoid compilation warnings.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-9-andriin@fb.com
|
|
Add a selftest excercising bpf-to-bpf subprogram calls, as well as multiple
entry-point BPF programs per section. Also make sure that BPF CO-RE works for
such set ups both for sub-programs and for multi-entry sections.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-8-andriin@fb.com
|
|
Adjust struct_ops handling code to work with multi-program ELF sections
properly.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-7-andriin@fb.com
|
|
Complete multi-prog sections and multi sub-prog support in libbpf by properly
adjusting .BTF.ext's line and function information. Mark exposed
btf_ext__reloc_func_info() and btf_ext__reloc_func_info() APIs as deprecated.
These APIs have simplistic assumption that all sub-programs are going to be
appended to all main BPF programs, which doesn't hold in real life. It's
unlikely there are any users of this API, as it's very libbpf
internals-specific.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-6-andriin@fb.com
|
|
This patch implements general and correct logic for bpf-to-bpf sub-program
calls. Only sub-programs used (called into) from entry-point (main) BPF
program are going to be appended at the end of main BPF program. This ensures
that BPF verifier won't encounter any dead code due to copying unreferenced
sub-program. This change means that each entry-point (main) BPF program might
have a different set of sub-programs appended to it and potentially in
different order. This has implications on how sub-program call relocations
need to be handled, described below.
All relocations are now split into two categores: data references (maps and
global variables) and code references (sub-program calls). This distinction is
important because data references need to be relocated just once per each BPF
program and sub-program. These relocation are agnostic to instruction
locations, because they are not code-relative and they are relocating against
static targets (maps, variables with fixes offsets, etc).
Sub-program RELO_CALL relocations, on the other hand, are highly-dependent on
code position, because they are recorded as instruction-relative offset. So
BPF sub-programs (those that do calls into other sub-programs) can't be
relocated once, they need to be relocated each time such a sub-program is
appended at the end of the main entry-point BPF program. As mentioned above,
each main BPF program might have different subset and differen order of
sub-programs, so call relocations can't be done just once. Splitting data
reference and calls relocations as described above allows to do this
efficiently and cleanly.
bpf_object__find_program_by_name() will now ignore non-entry BPF programs.
Previously one could have looked up '.text' fake BPF program, but the
existence of such BPF program was always an implementation detail and you
can't do much useful with it. Now, though, all non-entry sub-programs get
their own BPF program with name corresponding to a function name, so there is
no more '.text' name for BPF program. This means there is no regression,
effectively, w.r.t. API behavior. But this is important aspect to highlight,
because it's going to be critical once libbpf implements static linking of BPF
programs. Non-entry static BPF programs will be allowed to have conflicting
names, but global and main-entry BPF program names should be unique. Just like
with normal user-space linking process. So it's important to restrict this
aspect right now, keep static and non-entry functions as internal
implementation details, and not have to deal with regressions in behavior
later.
This patch leaves .BTF.ext adjustment as is until next patch.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-5-andriin@fb.com
|
|
Fix up CO-RE relocation code to handle relocations against ELF sections
containing multiple BPF programs. This requires lookup of a BPF program by its
section name and instruction index it contains. While it could have been done
as a simple loop, it could run into performance issues pretty quickly, as
number of CO-RE relocations can be quite large in real-world applications, and
each CO-RE relocation incurs BPF program look up now. So instead of simple
loop, implement a binary search by section name + insn offset.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200903203542.15944-4-andriin@fb.com
|
|
Teach libbpf how to parse code sections into potentially multiple bpf_program
instances, based on ELF FUNC symbols. Each BPF program will keep track of its
position within containing ELF section for translating section instruction
offsets into program instruction offsets: regardless of BPF program's location
in ELF section, it's first instruction is always at local instruction offset
0, so when libbpf is working with relocations (which use section-based
instruction offsets) this is critical to make proper translations.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200903203542.15944-3-andriin@fb.com
|
|
libbpf ELF parsing logic might need symbols available before ELF parsing is
completed, so we need to make sure that symbols table section is found in
a separate pass before all the subsequent sections are processed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200903203542.15944-2-andriin@fb.com
|
|
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'smsc9420_probe()', GFP_KERNEL can be used
because it is a probe function and no lock is acquired.
While at it, rewrite the size passed to 'dma_alloc_coherent()' the same way
as the one passed to 'dma_free_coherent()'. This form is less verbose:
sizeof(struct smsc9420_dma_desc) * RX_RING_SIZE +
sizeof(struct smsc9420_dma_desc) * TX_RING_SIZE,
vs
sizeof(struct smsc9420_dma_desc) * (RX_RING_SIZE + TX_RING_SIZE)
@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
@@
- PCI_DMA_NONE
+ DMA_NONE
@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'epic_init_one()', GFP_KERNEL can be used
because it is a probe function and no lock is acquired.
@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
@@
- PCI_DMA_NONE
+ DMA_NONE
@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Karsten Graul says:
====================
net/smc: fixes 2020-09-03
Please apply the following patch series for smc to netdev's net tree.
Patch 1 fixes the toleration of older SMC implementations. Patch 2
takes care of a problem that happens when SMCR is used after SMCD
initialization failed. Patch 3 fixes a problem with freed send buffers,
and patch 4 corrects refcounting when SMC terminates due to device
removal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When an ISM device is removed, all its linkgroups are terminated,
i.e. all the corresponding connections are killed.
Connection killing invokes smc_close_active_abort(), which decreases
the sock refcount for certain states to simulate passive closing.
And it cancels the close worker and has to give up the sock lock for
this timeframe. This opens the door for a passive close worker or a
socket close to run in between. In this case smc_close_active_abort() and
passive close worker resp. smc_release() might do a sock_put for passive
closing. This causes:
[ 1323.315943] refcount_t: underflow; use-after-free.
[ 1323.316055] WARNING: CPU: 3 PID: 54469 at lib/refcount.c:28 refcount_warn_saturate+0xe8/0x130
[ 1323.316069] Kernel panic - not syncing: panic_on_warn set ...
[ 1323.316084] CPU: 3 PID: 54469 Comm: uperf Not tainted 5.9.0-20200826.rc2.git0.46328853ed20.300.fc32.s390x+debug #1
[ 1323.316096] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
[ 1323.316108] Call Trace:
[ 1323.316125] [<00000000c0d4aae8>] show_stack+0x90/0xf8
[ 1323.316143] [<00000000c15989b0>] dump_stack+0xa8/0xe8
[ 1323.316158] [<00000000c0d8344e>] panic+0x11e/0x288
[ 1323.316173] [<00000000c0d83144>] __warn+0xac/0x158
[ 1323.316187] [<00000000c1597a7a>] report_bug+0xb2/0x130
[ 1323.316201] [<00000000c0d36424>] monitor_event_exception+0x44/0xc0
[ 1323.316219] [<00000000c195c716>] pgm_check_handler+0x1da/0x238
[ 1323.316234] [<00000000c151844c>] refcount_warn_saturate+0xec/0x130
[ 1323.316280] ([<00000000c1518448>] refcount_warn_saturate+0xe8/0x130)
[ 1323.316310] [<000003ff801f2e2a>] smc_release+0x192/0x1c8 [smc]
[ 1323.316323] [<00000000c169f1fa>] __sock_release+0x5a/0xe0
[ 1323.316334] [<00000000c169f2ac>] sock_close+0x2c/0x40
[ 1323.316350] [<00000000c1086de0>] __fput+0xb8/0x278
[ 1323.316362] [<00000000c0db1e0e>] task_work_run+0x76/0xb8
[ 1323.316393] [<00000000c0d8ab84>] do_exit+0x26c/0x520
[ 1323.316408] [<00000000c0d8af08>] do_group_exit+0x48/0xc0
[ 1323.316421] [<00000000c0d8afa8>] __s390x_sys_exit_group+0x28/0x38
[ 1323.316433] [<00000000c195c32c>] system_call+0xe0/0x2b4
[ 1323.316446] 1 lock held by uperf/54469:
[ 1323.316456] #0: 0000000044125e60 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x44/0xe0
The patch rechecks sock state in smc_close_active_abort() after
smc_close_cancel_work() to avoid duplicate decrease of sock
refcount for the same purpose.
Fixes: 611b63a12732 ("net/smc: cancel tx worker in case of socket aborts")
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When an SMC connection is created, and there is a problem to
create an RMB or DMB, the previously created send buffer is
thrown away as well including buffer descriptor freeing.
Make sure the connection no longer references the freed
buffer descriptor, otherwise bugs like this are possible:
[71556.835148] =============================================================================
[71556.835168] BUG kmalloc-128 (Tainted: G B OE ): Poison overwritten
[71556.835172] -----------------------------------------------------------------------------
[71556.835179] INFO: 0x00000000d20894be-0x00000000aaef63e9 @offset=2724. First byte 0x0 instead of 0x6b
[71556.835215] INFO: Allocated in __smc_buf_create+0x184/0x578 [smc] age=0 cpu=5 pid=46726
[71556.835234] ___slab_alloc+0x5a4/0x690
[71556.835239] __slab_alloc.constprop.0+0x70/0xb0
[71556.835243] kmem_cache_alloc_trace+0x38e/0x3f8
[71556.835250] __smc_buf_create+0x184/0x578 [smc]
[71556.835257] smc_buf_create+0x2e/0xe8 [smc]
[71556.835264] smc_listen_work+0x516/0x6a0 [smc]
[71556.835275] process_one_work+0x280/0x478
[71556.835280] worker_thread+0x66/0x368
[71556.835287] kthread+0x17a/0x1a0
[71556.835294] ret_from_fork+0x28/0x2c
[71556.835301] INFO: Freed in smc_buf_create+0xd8/0xe8 [smc] age=0 cpu=5 pid=46726
[71556.835307] __slab_free+0x246/0x560
[71556.835311] kfree+0x398/0x3f8
[71556.835318] smc_buf_create+0xd8/0xe8 [smc]
[71556.835324] smc_listen_work+0x516/0x6a0 [smc]
[71556.835328] process_one_work+0x280/0x478
[71556.835332] worker_thread+0x66/0x368
[71556.835337] kthread+0x17a/0x1a0
[71556.835344] ret_from_fork+0x28/0x2c
[71556.835348] INFO: Slab 0x00000000a0744551 objects=51 used=51 fp=0x0000000000000000 flags=0x1ffff00000010200
[71556.835352] INFO: Object 0x00000000563480a1 @offset=2688 fp=0x00000000289567b2
[71556.835359] Redzone 000000006783cde2: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835363] Redzone 00000000e35b876e: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835367] Redzone 0000000023074562: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835372] Redzone 00000000b9564b8c: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835376] Redzone 00000000810c6362: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835380] Redzone 0000000065ef52c3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835384] Redzone 00000000c5dd6984: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835388] Redzone 000000004c480f8f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[71556.835392] Object 00000000563480a1: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[71556.835397] Object 000000009c479d06: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[71556.835401] Object 000000006e1dce92: 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b kkkk....kkkkkkkk
[71556.835405] Object 00000000227f7cf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[71556.835410] Object 000000009a701215: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[71556.835414] Object 000000003731ce76: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[71556.835418] Object 00000000f7085967: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[71556.835422] Object 0000000007f99927: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
[71556.835427] Redzone 00000000579c4913: bb bb bb bb bb bb bb bb ........
[71556.835431] Padding 00000000305aef82: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[71556.835435] Padding 00000000b1cdd722: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[71556.835438] Padding 00000000c7568199: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[71556.835442] Padding 00000000fad4c4d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[71556.835451] CPU: 0 PID: 47939 Comm: kworker/0:15 Tainted: G B OE 5.9.0-rc1uschi+ #54
[71556.835456] Hardware name: IBM 3906 M03 703 (LPAR)
[71556.835464] Workqueue: events smc_listen_work [smc]
[71556.835470] Call Trace:
[71556.835478] [<00000000d5eaeb10>] show_stack+0x90/0xf8
[71556.835493] [<00000000d66fc0f8>] dump_stack+0xa8/0xe8
[71556.835499] [<00000000d61a511c>] check_bytes_and_report+0x104/0x130
[71556.835504] [<00000000d61a57b2>] check_object+0x26a/0x2e0
[71556.835509] [<00000000d61a59bc>] alloc_debug_processing+0x194/0x238
[71556.835514] [<00000000d61a8c14>] ___slab_alloc+0x5a4/0x690
[71556.835519] [<00000000d61a9170>] __slab_alloc.constprop.0+0x70/0xb0
[71556.835524] [<00000000d61aaf66>] kmem_cache_alloc_trace+0x38e/0x3f8
[71556.835530] [<000003ff80549bbc>] __smc_buf_create+0x184/0x578 [smc]
[71556.835538] [<000003ff8054a396>] smc_buf_create+0x2e/0xe8 [smc]
[71556.835545] [<000003ff80540c16>] smc_listen_work+0x516/0x6a0 [smc]
[71556.835549] [<00000000d5f0f448>] process_one_work+0x280/0x478
[71556.835554] [<00000000d5f0f6a6>] worker_thread+0x66/0x368
[71556.835559] [<00000000d5f18692>] kthread+0x17a/0x1a0
[71556.835563] [<00000000d6abf3b8>] ret_from_fork+0x28/0x2c
[71556.835569] INFO: lockdep is turned off.
[71556.835573] FIX kmalloc-128: Restoring 0x00000000d20894be-0x00000000aaef63e9=0x6b
[71556.835577] FIX kmalloc-128: Marking all objects used
Fixes: fd7f3a746582 ("net/smc: remove freed buffer from list")
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SMC tries to make use of SMCD first. If a problem shows up,
it tries to switch to SMCR. If the SMCD initializing problem shows
up after the SMCD connection has already been initialized, field
rx_off keeps the wrong SMCD value for SMCR, which results in corrupted
data at the receiver.
This patch adds an explicit (re-)setting of field rx_off to zero if the
connection uses SMCR.
Fixes: be244f28d22f ("net/smc: add SMC-D support in data transfer")
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|