Age | Commit message (Collapse) | Author |
|
Fix a regression introduced when removing bi_phys_segments for Write Zeroes
requests, which need to have a segment count of zero, as they don't have a
payload.
Fixes: 14ccb66b3f58 ("block: remove the bi_phys_segments field in struct bio")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Device that bound to XDP socket will not have zero refcount until the
userspace application will not close it. This leads to hang inside
'netdev_wait_allrefs()' if device unregistering requested:
# ip link del p1
< hang on recvmsg on netlink socket >
# ps -x | grep ip
5126 pts/0 D+ 0:00 ip link del p1
# journalctl -b
Jun 05 07:19:16 kernel:
unregister_netdevice: waiting for p1 to become free. Usage count = 1
Jun 05 07:19:27 kernel:
unregister_netdevice: waiting for p1 to become free. Usage count = 1
...
Fix that by implementing NETDEV_UNREGISTER event notification handler
to properly clean up all the resources and unref device.
This should also allow socket killing via ss(8) utility.
Fixes: 965a99098443 ("xsk: add support for bind for Rx")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Device pointer stored in umem regardless of zero-copy mode,
so we heed to hold the device in all cases.
Fixes: c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Selftests are reporting this failure in test_lwt_seg6local.sh:
+ ip netns exec ns2 ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o sec encap_srh dev veth2
Error fetching program/map!
Failed to parse eBPF program: Operation not permitted
The problem is __attribute__((always_inline)) alone is not enough to prevent
clang from inserting those functions in .text. In that case, .text is not
marked as relocateable.
See the output of objdump -h test_lwt_seg6local.o:
Idx Name Size VMA LMA File off Algn
0 .text 00003530 0000000000000000 0000000000000000 00000040 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
This causes the iproute bpf loader to fail in bpf_fetch_prog_sec:
bpf_has_call_data returns true but bpf_fetch_prog_relo fails as there's no
relocateable .text section in the file.
To fix this, convert to 'static __always_inline'.
v2: Use 'static __always_inline' instead of 'static inline
__attribute__((always_inline))'
Fixes: c99a84eac026 ("selftests/bpf: test for seg6local End.BPF action")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The progs for bpf selftests use several different notations to force
function inlining. Standardize to what most of them use,
static __always_inline.
Suggested-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Adds support for fq's Earliest Departure Time to HBM (Host Bandwidth
Manager). Includes a new BPF program supporting EDT, and also updates
corresponding programs.
It will drop packets with an EDT of more than 500us in the future
unless the packet belongs to a flow with less than 2 packets in flight.
This is done so each flow has at least 2 packets in flight, so they
will not starve, and also to help prevent delayed ACK timeouts.
It will also work with ECN enabled traffic, where the packets will be
CE marked if their EDT is more than 50us in the future.
The table below shows some performance numbers. The flows are back to
back RPCS. One server sending to another, either 2 or 4 flows.
One flow is a 10KB RPC, the rest are 1MB RPCs. When there are more
than one flow of a given RPC size, the numbers represent averages.
The rate limit applies to all flows (they are in the same cgroup).
Tests ending with "-edt" ran with the new BPF program supporting EDT.
Tests ending with "-hbt" ran on top HBT qdisc with the specified rate
(i.e. no HBM). The other tests ran with the HBM BPF program included
in the HBM patch-set.
EDT has limited value when using DCTCP, but it helps in many cases when
using Cubic. It usually achieves larger link utilization and lower
99% latencies for the 1MB RPCs.
HBM ends up queueing a lot of packets with its default parameter values,
reducing the goodput of the 10KB RPCs and increasing their latency. Also,
the RTTs seen by the flows are quite large.
Aggr 10K 10K 10K 1MB 1MB 1MB
Limit rate drops RTT rate P90 P99 rate P90 P99
Test rate Flows Mbps % us Mbps us us Mbps ms ms
-------- ---- ----- ---- ----- --- ---- ---- ---- ---- ---- ----
cubic 1G 2 904 0.02 108 257 511 539 647 13.4 24.5
cubic-edt 1G 2 982 0.01 156 239 656 967 743 14.0 17.2
dctcp 1G 2 977 0.00 105 324 408 744 653 14.5 15.9
dctcp-edt 1G 2 981 0.01 142 321 417 811 660 15.7 17.0
cubic-htb 1G 2 919 0.00 1825 40 2822 4140 879 9.7 9.9
cubic 200M 2 155 0.30 220 81 532 655 74 283 450
cubic-edt 200M 2 188 0.02 222 87 1035 1095 101 84 85
dctcp 200M 2 188 0.03 111 77 912 939 111 76 325
dctcp-edt 200M 2 188 0.03 217 74 1416 1738 114 76 79
cubic-htb 200M 2 188 0.00 5015 8 14ms 15ms 180 48 50
cubic 1G 4 952 0.03 110 165 516 546 262 38 154
cubic-edt 1G 4 973 0.01 190 111 1034 1314 287 65 79
dctcp 1G 4 951 0.00 103 180 617 905 257 37 38
dctcp-edt 1G 4 967 0.00 163 151 732 1126 272 43 55
cubic-htb 1G 4 914 0.00 3249 13 7ms 8ms 300 29 34
cubic 5G 4 4236 0.00 134 305 490 624 1310 10 17
cubic-edt 5G 4 4865 0.00 156 306 425 759 1520 10 16
dctcp 5G 4 4936 0.00 128 485 221 409 1484 7 9
dctcp-edt 5G 4 4924 0.00 148 390 392 623 1508 11 26
v1 -> v2: Incorporated Andrii's suggestions
v2 -> v3: Incorporated Yonghong's suggestions
v3 -> v4: Removed credit update that is not needed
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Rewrite gfs2_allocate_page_backing to call gfs2_iomap_get_alloc and operate on
struct iomap directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
No need to indirect through get_blocks and buffer_heads when we can just use
the iomap version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
There is no need to keep these two functions separate.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The only difference between the two is that gfs2_ordered_aops sets the
set_page_dirty method to __set_page_dirty_buffers, but given that
__set_page_dirty_buffers is the default, if no method is set, there is no need
to to do that. Merge the two sets of operations into one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The dev_info() call already prints the device name, so there's
no need to explicitly include it in the message for second time.
Signed-off-by: Enrico Weigelt <info@metux.net>
Link: https://lore.kernel.org/r/1562146944-4162-1-git-send-email-info@metux.net
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Add the optional reset line handling which is present on the new SoC
families, such as the g12a. Triggering this reset is not critical but
it helps solve a channel shift issue on the g12a.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20190703120749.32341-3-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add an optional reset property to the tdm formatter bindings. The
dedicated reset line is present on some SoC families, such as the g12a.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20190703120749.32341-2-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Provide a keyctl() operation to grant/remove permissions. The grant
operation, wrapped by libkeyutils, looks like:
int ret = keyctl_grant_permission(key_serial_t key,
enum key_ace_subject_type type,
unsigned int subject,
unsigned int perm);
Where key is the key to be modified, type and subject represent the subject
to which permission is to be granted (or removed) and perm is the set of
permissions to be granted. 0 is returned on success. SET_SECURITY
permission is required for this.
The subject type currently must be KEY_ACE_SUBJ_STANDARD for the moment
(other subject types will come along later).
For subject type KEY_ACE_SUBJ_STANDARD, the following subject values are
available:
KEY_ACE_POSSESSOR The possessor of the key
KEY_ACE_OWNER The owner of the key
KEY_ACE_GROUP The key's group
KEY_ACE_EVERYONE Everyone
perm lists the permissions to be granted:
KEY_ACE_VIEW Can view the key metadata
KEY_ACE_READ Can read the key content
KEY_ACE_WRITE Can update/modify the key content
KEY_ACE_SEARCH Can find the key by searching/requesting
KEY_ACE_LINK Can make a link to the key
KEY_ACE_SET_SECURITY Can set security
KEY_ACE_INVAL Can invalidate
KEY_ACE_REVOKE Can revoke
KEY_ACE_JOIN Can join this keyring
KEY_ACE_CLEAR Can clear this keyring
If an ACE already exists for the subject, then the permissions mask will be
overwritten; if perm is 0, it will be deleted.
Currently, the internal ACL is limited to a maximum of 16 entries.
For example:
int ret = keyctl_grant_permission(key,
KEY_ACE_SUBJ_STANDARD,
KEY_ACE_OWNER,
KEY_ACE_VIEW | KEY_ACE_READ);
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Intel Elkhart Lake has the same LPSS than Intel Broxton. Add support for
it.
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/20190703114603.22301-1-jarkko.nikula@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Define a MODULE_ALIAS() in the regulator sub-driver for max77650 so that
the appropriate module gets loaded together with the core mfd driver.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20190703084849.9668-1-brgl@bgdev.pl
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The variable ret is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20190703082009.18779-1-colin.king@canonical.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next
Felipe writes:
USB: more changes for v5.3 merge window
Turns out a few more important changes came about. We have the new
Cadence DRD Driver being added here and that's the biggest, most
important part.
Together with that we have suport for new imx7ulp phy. Support for
TigerLake Devices on dwc3. Also a couple important fixes which weren't
completed in time for the -rc cycle.
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
* tag 'usb-for-v5.3-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: renesas_usbhs: add a workaround for a race condition of workqueue
usb: gadget: udc: renesas_usb3: remove redundant assignment to ret
usb: dwc2: use a longer AHB idle timeout in dwc2_core_reset()
USB: gadget: function: fix issue Unneeded variable: "value"
usb: phy: phy-mxs-usb: add imx7ulp support
doc: dt-binding: mxs-usb-phy: add compatible for 7ulp
usb:cdns3 Fix for stuck packets in on-chip OUT buffer.
usb:cdns3 Add Cadence USB3 DRD Driver
usb:gadget Simplify usb_decode_get_set_descriptor function.
usb:gadget Patch simplify usb_decode_set_clear_feature function.
usb:gadget Separated decoding functions from dwc3 driver.
dt-bindings: add binding for USBSS-DRD controller.
usb: dwc3: pci: add support for TigerLake Devices
|
|
"git diff" says:
\ No newline at end of file
after modifying the files.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There is no need to remove address space handler twice,
because removal is idempotent.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The pointer clk is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The intel-int3496.txt file is a documentation for an ACPI driver.
There's no reason to keep it on a separate directory.
So, instead of keeping it on some random location, move it
to a sub-directory inside the ACPI documentation dir,
renaming it to .rst.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Based on the following report from Smatch, fix the potential NULL
pointer dereference check:
tools/lib/bpf/libbpf.c:3493
bpf_prog_load_xattr() warn: variable dereferenced before check 'attr'
(see line 3483)
3479 int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
3480 struct bpf_object **pobj, int *prog_fd)
3481 {
3482 struct bpf_object_open_attr open_attr = {
3483 .file = attr->file,
3484 .prog_type = attr->prog_type,
^^^^^^
3485 };
At the head of function, it directly access 'attr' without checking
if it's NULL pointer. This patch moves the values assignment after
validating 'attr' and 'attr->file'.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
GCC8 started emitting warning about using strncpy with number of bytes
exactly equal destination size, which is generally unsafe, as can lead
to non-zero terminated string being copied. Use IFNAMSIZ - 1 as number
of bytes to ensure name is always zero-terminated.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates for Linux 5.3 from Marc Zyngier:
- ACPI support for the exiu and mb86s7x drivers
- New Renesas RZ/A1, Amazon al-fic drivers
- Add quirk for Amazon Graviton GICv2m widget
- Large Renesas driver cleanup
- CSky mpintc trigger type fixes
- Meson G12A driver support
- Various minor cleanups
|
|
There are currently no tests for ALU64 shift operations when the shift
amount is 0. This adds 6 new tests to make sure they are equivalent
to a no-op. The x32 JIT had such bugs that could have been caught by
these tests.
Cc: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The current x32 BPF JIT does not correctly compile shift operations when
the immediate shift amount is 0. The expected behavior is for this to
be a no-op.
The following program demonstrates the bug. The expexceted result is 1,
but the current JITed code returns 2.
r0 = 1
r1 = 1
r1 <<= 0
if r1 == 1 goto end
r0 = 2
end:
exit
This patch simplifies the code and fixes the bug.
Fixes: 03f5781be2c7 ("bpf, x86_32: add eBPF JIT compiler for ia32")
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The current x32 BPF JIT for shift operations is not correct when the
shift amount in a register is 0. The expected behavior is a no-op, whereas
the current implementation changes bits in the destination register.
The following example demonstrates the bug. The expected result of this
program is 1, but the current JITed code returns 2.
r0 = 1
r1 = 1
r2 = 0
r1 <<= r2
if r1 == 1 goto end
r0 = 2
end:
exit
The bug is caused by an incorrect assumption by the JIT that a shift by
32 clear the register. On x32 however, shifts use the lower 5 bits of
the source, making a shift by 32 equivalent to a shift by 0.
This patch fixes the bug using double-precision shifts, which also
simplifies the code.
Fixes: 03f5781be2c7 ("bpf, x86_32: add eBPF JIT compiler for ia32")
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When equivalent state is found the current state needs to propagate precision marks.
Otherwise the verifier will prune the search incorrectly.
There is a price for correctness:
before before broken fixed
cnst spill precise precise
bpf_lb-DLB_L3.o 1923 8128 1863 1898
bpf_lb-DLB_L4.o 3077 6707 2468 2666
bpf_lb-DUNKNOWN.o 1062 1062 544 544
bpf_lxc-DDROP_ALL.o 166729 380712 22629 36823
bpf_lxc-DUNKNOWN.o 174607 440652 28805 45325
bpf_netdev.o 8407 31904 6801 7002
bpf_overlay.o 5420 23569 4754 4858
bpf_lxc_jit.o 39389 359445 50925 69631
Overall precision tracking is still very effective.
Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Reported-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Continue consolidating Hyper-V clock and timer code into an ISA
independent Hyper-V clocksource driver.
Move the existing clocksource code under drivers/hv and arch/x86 to the new
clocksource driver while separating out the ISA dependencies. Update
Hyper-V initialization to call initialization and cleanup routines since
the Hyper-V synthetic clock is not independently enumerated in ACPI.
Update Hyper-V clocksource users in KVM and VDSO to get definitions from
the new include file.
No behavior is changed and no new functionality is added.
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "bp@alien8.de" <bp@alien8.de>
Cc: "will.deacon@arm.com" <will.deacon@arm.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
Cc: "olaf@aepfle.de" <olaf@aepfle.de>
Cc: "apw@canonical.com" <apw@canonical.com>
Cc: "jasowang@redhat.com" <jasowang@redhat.com>
Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: "sashal@kernel.org" <sashal@kernel.org>
Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
Cc: "arnd@arndb.de" <arnd@arndb.de>
Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
Cc: "paul.burton@mips.com" <paul.burton@mips.com>
Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
Cc: "salyzyn@android.com" <salyzyn@android.com>
Cc: "pcc@google.com" <pcc@google.com>
Cc: "shuah@kernel.org" <shuah@kernel.org>
Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
Cc: "huw@codeweavers.com" <huw@codeweavers.com>
Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
Link: https://lkml.kernel.org/r/1561955054-1838-3-git-send-email-mikelley@microsoft.com
|
|
Hyper-V clock/timer code and data structures are currently mixed
in with other code in the ISA independent drivers/hv directory as
well as the ISA dependent Hyper-V code under arch/x86.
Consolidate this code and data structures into a Hyper-V clocksource driver
to better follow the Linux model. In doing so, separate out the ISA
dependent portions so the new clocksource driver works for x86 and for the
in-process Hyper-V on ARM64 code.
To start, move the existing clockevents code to create the new clocksource
driver. Update the VMbus driver to call initialization and cleanup routines
since the Hyper-V synthetic timers are not independently enumerated in
ACPI.
No behavior is changed and no new functionality is added.
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "bp@alien8.de" <bp@alien8.de>
Cc: "will.deacon@arm.com" <will.deacon@arm.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
Cc: "olaf@aepfle.de" <olaf@aepfle.de>
Cc: "apw@canonical.com" <apw@canonical.com>
Cc: "jasowang@redhat.com" <jasowang@redhat.com>
Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: "sashal@kernel.org" <sashal@kernel.org>
Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
Cc: "arnd@arndb.de" <arnd@arndb.de>
Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
Cc: "paul.burton@mips.com" <paul.burton@mips.com>
Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
Cc: "salyzyn@android.com" <salyzyn@android.com>
Cc: "pcc@google.com" <pcc@google.com>
Cc: "shuah@kernel.org" <shuah@kernel.org>
Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
Cc: "huw@codeweavers.com" <huw@codeweavers.com>
Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
|
|
The following commands produce a backtrace and return an error but the xfrm
interface is created (in the wrong netns):
$ ip netns add foo
$ ip netns add bar
$ ip -n foo netns set bar 0
$ ip -n foo link add xfrmi0 link-netnsid 0 type xfrm dev lo if_id 23
RTNETLINK answers: Invalid argument
$ ip -n bar link ls xfrmi0
2: xfrmi0@lo: <NOARP,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00
Here is the backtrace:
[ 79.879174] WARNING: CPU: 0 PID: 1178 at net/core/dev.c:8172 rollback_registered_many+0x86/0x3c1
[ 79.880260] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace sunrpc fscache button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic ide_cd_mod ide_gd_mod cdrom ata_$
eneric ata_piix libata scsi_mod 8139too piix psmouse i2c_piix4 ide_core 8139cp mii i2c_core floppy
[ 79.883698] CPU: 0 PID: 1178 Comm: ip Not tainted 5.2.0-rc6+ #106
[ 79.884462] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 79.885447] RIP: 0010:rollback_registered_many+0x86/0x3c1
[ 79.886120] Code: 01 e8 d7 7d c6 ff 0f 0b 48 8b 45 00 4c 8b 20 48 8d 58 90 49 83 ec 70 48 8d 7b 70 48 39 ef 74 44 8a 83 d0 04 00 00 84 c0 75 1f <0f> 0b e8 61 cd ff ff 48 b8 00 01 00 00 00 00 ad de 48 89 43 70 66
[ 79.888667] RSP: 0018:ffffc900015ab740 EFLAGS: 00010246
[ 79.889339] RAX: ffff8882353e5700 RBX: ffff8882353e56a0 RCX: ffff8882353e5710
[ 79.890174] RDX: ffffc900015ab7e0 RSI: ffffc900015ab7e0 RDI: ffff8882353e5710
[ 79.891029] RBP: ffffc900015ab7e0 R08: ffffc900015ab7e0 R09: ffffc900015ab7e0
[ 79.891866] R10: ffffc900015ab7a0 R11: ffffffff82233fec R12: ffffc900015ab770
[ 79.892728] R13: ffffffff81eb7ec0 R14: ffff88822ed6cf00 R15: 00000000ffffffea
[ 79.893557] FS: 00007ff350f31740(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
[ 79.894581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 79.895317] CR2: 00000000006c8580 CR3: 000000022c272000 CR4: 00000000000006f0
[ 79.896137] Call Trace:
[ 79.896464] unregister_netdevice_many+0x12/0x6c
[ 79.896998] __rtnl_newlink+0x6e2/0x73b
[ 79.897446] ? __kmalloc_node_track_caller+0x15e/0x185
[ 79.898039] ? pskb_expand_head+0x5f/0x1fe
[ 79.898556] ? stack_access_ok+0xd/0x2c
[ 79.899009] ? deref_stack_reg+0x12/0x20
[ 79.899462] ? stack_access_ok+0xd/0x2c
[ 79.899927] ? stack_access_ok+0xd/0x2c
[ 79.900404] ? __module_text_address+0x9/0x4f
[ 79.900910] ? is_bpf_text_address+0x5/0xc
[ 79.901390] ? kernel_text_address+0x67/0x7b
[ 79.901884] ? __kernel_text_address+0x1a/0x25
[ 79.902397] ? unwind_get_return_address+0x12/0x23
[ 79.903122] ? __cmpxchg_double_slab.isra.37+0x46/0x77
[ 79.903772] rtnl_newlink+0x43/0x56
[ 79.904217] rtnetlink_rcv_msg+0x200/0x24c
In fact, each time a xfrm interface was created, a netdev was allocated
by __rtnl_newlink()/rtnl_create_link() and then another one by
xfrmi_newlink()/xfrmi_create(). Only the second one was registered, it's
why the previous commands produce a backtrace: dev_change_net_namespace()
was called on a netdev with reg_state set to NETREG_UNINITIALIZED (the
first one).
CC: Lorenzo Colitti <lorenzo@google.com>
CC: Benedict Wong <benedictwong@google.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Shannon Nelson <shannon.nelson@oracle.com>
CC: Antony Antony <antony@phenome.org>
CC: Eyal Birger <eyal.birger@gmail.com>
Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Reported-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
syzbot reported following spat:
BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221
BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455
BUG: KASAN: use-after-free in xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
Write of size 8 at addr ffff888095e79c00 by task kworker/1:3/8066
Workqueue: events xfrm_hash_rebuild
Call Trace:
__write_once_size include/linux/compiler.h:221 [inline]
hlist_del_rcu include/linux/rculist.h:455 [inline]
xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
process_one_work+0x814/0x1130 kernel/workqueue.c:2269
Allocated by task 8064:
__kmalloc+0x23c/0x310 mm/slab.c:3669
kzalloc include/linux/slab.h:742 [inline]
xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21
xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline]
xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120
ops_init+0x336/0x420 net/core/net_namespace.c:130
setup_net+0x212/0x690 net/core/net_namespace.c:316
The faulting address is the address of the old chain head,
free'd by xfrm_hash_resize().
In xfrm_hash_rehash(), chain heads get re-initialized without
any hlist_del_rcu:
for (i = hmask; i >= 0; i--)
INIT_HLIST_HEAD(odst + i);
Then, hlist_del_rcu() gets called on the about to-be-reinserted policy
when iterating the per-net list of policies.
hlist_del_rcu() will then make chain->first be nonzero again:
static inline void __hlist_del(struct hlist_node *n)
{
struct hlist_node *next = n->next; // address of next element in list
struct hlist_node **pprev = n->pprev;// location of previous elem, this
// can point at chain->first
WRITE_ONCE(*pprev, next); // chain->first points to next elem
if (next)
next->pprev = pprev;
Then, when we walk chainlist to find insertion point, we may find a
non-empty list even though we're supposedly reinserting the first
policy to an empty chain.
To fix this first unlink all exact and inexact policies instead of
zeroing the list heads.
Add the commands equivalent to the syzbot reproducer to xfrm_policy.sh,
without fix KASAN catches the corruption as it happens, SLUB poisoning
detects it a bit later.
Reported-by: syzbot+0165480d4ef07360eeda@syzkaller.appspotmail.com
Fixes: 1548bc4e0512 ("xfrm: policy: delete inexact policies from inexact list on hash rebuild")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
so the hyper-v clocksource update can be applied.
|
|
gic-pm driver does not use pm-clk interface now and hence the dependency
is removed from Kconfig.
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
We need to convert all old gpio irqchips to pass the irqchip
setup along when adding the gpio_chip.
For chained irqchips this is a pretty straight-forward
conversion.
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Tien Hock Loh <thloh@altera.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
By using devm_gpiochip_add_data() we can get rid of the
remove() callback. As this driver doesn't use the
gpiochip data pointer we simply pass in NULL.
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This makes the code easier to read.
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
We need to convert all old gpio irqchips to pass the irqchip
setup along when adding the gpio_chip.
For chained irqchips this is a pretty straight-forward
conversion.
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Controller Driver
The Amazon's Annapurna Labs Fabric Interrupt Controller has 32 inputs.
A FIC (Fabric Interrupt Controller) may be cascaded into another FIC or
directly to the main CPU Interrupt Controller (e.g. GIC).
Signed-off-by: Talel Shenhar <talel@amazon.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Document Amazon's Annapurna Labs Fabric Interrupt Controller SoC binding.
Signed-off-by: Talel Shenhar <talel@amazon.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Quite some time ago the interrupt entry stubs for unused vectors in the
system vector range got removed and directly mapped to the spurious
interrupt vector entry point.
Sounds reasonable, but it's subtly broken. The spurious interrupt vector
entry point pushes vector number 0xFF on the stack which makes the whole
logic in __smp_spurious_interrupt() pointless.
As a consequence any spurious interrupt which comes from a vector != 0xFF
is treated as a real spurious interrupt (vector 0xFF) and not
acknowledged. That subsequently stalls all interrupt vectors of equal and
lower priority, which brings the system to a grinding halt.
This can happen because even on 64-bit the system vector space is not
guaranteed to be fully populated. A full compile time handling of the
unused vectors is not possible because quite some of them are conditonally
populated at runtime.
Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
but gains the proper handling back. There is no point to selectively spare
some of the stubs which are known at compile time as the required code in
the IDT management would be way larger and convoluted.
Do not route the spurious entries through common_interrupt and do_IRQ() as
the original code did. Route it to smp_spurious_interrupt() which evaluates
the vector number and acts accordingly now that the real vector numbers are
handed in.
Fixup the pr_warn so the actual spurious vector (0xff) is clearly
distiguished from the other vectors and also note for the vectored case
whether it was pending in the ISR or not.
"Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
"Spurious interrupt vector 0xed on CPU#1. Acked."
"Spurious interrupt vector 0xee on CPU#1. Not pending!."
Fixes: 2414e021ac8d ("x86: Avoid building unused IRQ entry stubs")
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.de
|
|
Since the rework of the vector management, warnings about spurious
interrupts have been reported. Robert provided some more information and
did an initial analysis. The following situation leads to these warnings:
CPU 0 CPU 1 IO_APIC
interrupt is raised
sent to CPU1
Unable to handle
immediately
(interrupts off,
deep idle delay)
mask()
...
free()
shutdown()
synchronize_irq()
clear_vector()
do_IRQ()
-> vector is clear
Before the rework the vector entries of legacy interrupts were statically
assigned and occupied precious vector space while most of them were
unused. Due to that the above situation was handled silently because the
vector was handled and the core handler of the assigned interrupt
descriptor noticed that it is shut down and returned.
While this has been usually observed with legacy interrupts, this situation
is not limited to them. Any other interrupt source, e.g. MSI, can cause the
same issue.
After adding proper synchronization for level triggered interrupts, this
can only happen for edge triggered interrupts where the IO-APIC obviously
cannot provide information about interrupts in flight.
While the spurious warning is actually harmless in this case it worries
users and driver developers.
Handle it gracefully by marking the vector entry as VECTOR_SHUTDOWN instead
of VECTOR_UNUSED when the vector is freed up.
If that above late handling happens the spurious detector will not complain
and switch the entry to VECTOR_UNUSED. Any subsequent spurious interrupt on
that line will trigger the spurious warning as before.
Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>-
Tested-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.459647741@linutronix.de
|
|
When an interrupt is shut down in free_irq() there might be an inflight
interrupt pending in the IO-APIC remote IRR which is not yet serviced. That
means the interrupt has been sent to the target CPUs local APIC, but the
target CPU is in a state which delays the servicing.
So free_irq() would proceed to free resources and to clear the vector
because synchronize_hardirq() does not see an interrupt handler in
progress.
That can trigger a spurious interrupt warning, which is harmless and just
confuses users, but it also can leave the remote IRR in a stale state
because once the handler is invoked the interrupt resources might be freed
already and therefore acknowledgement is not possible anymore.
Implement the irq_get_irqchip_state() callback for the IO-APIC irq chip. The
callback is invoked from free_irq() via __synchronize_hardirq(). Check the
remote IRR bit of the interrupt and return 'in flight' if it is set and the
interrupt is configured in level mode. For edge mode the remote IRR has no
meaning.
As this is only meaningful for level triggered interrupts this won't cure
the potential spurious interrupt warning for edge triggered interrupts, but
the edge trigger case does not result in stale hardware state. This has to
be addressed at the vector/interrupt entry level seperately.
Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.370295517@linutronix.de
|
|
free_irq() ensures that no hardware interrupt handler is executing on a
different CPU before actually releasing resources and deactivating the
interrupt completely in a domain hierarchy.
But that does not catch the case where the interrupt is on flight at the
hardware level but not yet serviced by the target CPU. That creates an
interesing race condition:
CPU 0 CPU 1 IRQ CHIP
interrupt is raised
sent to CPU1
Unable to handle
immediately
(interrupts off,
deep idle delay)
mask()
...
free()
shutdown()
synchronize_irq()
release_resources()
do_IRQ()
-> resources are not available
That might be harmless and just trigger a spurious interrupt warning, but
some interrupt chips might get into a wedged state.
Utilize the existing irq_get_irqchip_state() callback for the
synchronization in free_irq().
synchronize_hardirq() is not using this mechanism as it might actually
deadlock unter certain conditions, e.g. when called with interrupts
disabled and the target CPU is the one on which the synchronization is
invoked. synchronize_irq() uses it because that function cannot be called
from non preemtible contexts as it might sleep.
No functional change intended and according to Marc the existing GIC
implementations where the driver supports the callback should be able
to cope with that core change. Famous last words.
Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.279463375@linutronix.de
|
|
The function might sleep, so it cannot be called from interrupt
context. Not even with care.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.189241552@linutronix.de
|
|
When interrupts are shutdown, they are immediately deactivated in the
irqdomain hierarchy. While this looks obviously correct there is a subtle
issue:
There might be an interrupt in flight when free_irq() is invoking the
shutdown. This is properly handled at the irq descriptor / primary handler
level, but the deactivation might completely disable resources which are
required to acknowledge the interrupt.
Split the shutdown code and deactivate the interrupt after synchronization
in free_irq(). Fixup all other usage sites where this is not an issue to
invoke the combined shutdown_and_deactivate() function instead.
This still might be an issue if the interrupt in flight servicing is
delayed on a remote CPU beyond the invocation of synchronize_irq(), but
that cannot be handled at that level and needs to be handled in the
synchronize_irq() context.
Fixes: f8264e34965a ("irqdomain: Introduce new interfaces to support hierarchy irqdomains")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.098196390@linutronix.de
|
|
The struct resource field is statically initialized
and may never change. Therefore make it const.
Signed-off-by: Enrico Weigelt <info@metux.net>
Link: https://lore.kernel.org/r/1560787211-15443-1-git-send-email-info@metux.net
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Fix minimum encryption key size check so that HCI_MIN_ENC_KEY_SIZE is
also allowed as stated in the comment.
This bug caused connection problems with devices having maximum
encryption key size of 7 octets (56-bit).
Fixes: 693cd8ce3f88 ("Bluetooth: Fix regression with minimum encryption key size alignment")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203997
Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|