Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
"The main changes are to move the ORC unwind table sorting from early
init to build-time - this speeds up booting.
No change in functionality intended"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Fix !CONFIG_MODULES build warning
x86/unwind/orc: Remove boot-time ORC unwind tables sorting
scripts/sorttable: Implement build-time ORC unwind table sorting
scripts/sorttable: Rename 'sortextable' to 'sorttable'
scripts/sortextable: Refactor the do_func() function
scripts/sortextable: Remove dead code
scripts/sortextable: Clean up the code to meet the kernel coding style better
scripts/sortextable: Rewrite error/success handling
|
|
with phandles"
This reverts upstream commit 18d7b2f4ee45fec422b7d82bab0b3c762ee907e4. A
revert in upstream dtc is pending.
This commit didn't work for properties such as 'interrupt-map' that have
phandle in the middle of an entry. It would also not work for a 0 or -1
phandle value that acts as a NULL.
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull header cleanup from Ingo Molnar:
"This is a treewide cleanup, mostly (but not exclusively) with x86
impact, which breaks implicit dependencies on the asm/realtime.h
header and finally removes it from asm/acpi.h"
* 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ACPI/sleep: Move acpi_get_wakeup_address() into sleep.c, remove <asm/realmode.h> from <asm/acpi.h>
ACPI/sleep: Convert acpi_wakeup_address into a function
x86/ACPI/sleep: Remove an unnecessary include of asm/realmode.h
ASoC: Intel: Skylake: Explicitly include linux/io.h for virt_to_phys()
vmw_balloon: Explicitly include linux/io.h for virt_to_phys()
virt: vbox: Explicitly include linux/io.h to pick up various defs
efi/capsule-loader: Explicitly include linux/io.h for page_to_phys()
perf/x86/intel: Explicitly include asm/io.h to use virt_to_phys()
x86/kprobes: Explicitly include vmalloc.h for set_vm_flush_reset_perms()
x86/ftrace: Explicitly include vmalloc.h for set_vm_flush_reset_perms()
x86/boot: Explicitly include realmode.h to handle RM reservations
x86/efi: Explicitly include realmode.h to handle RM trampoline quirk
x86/platform/intel/quark: Explicitly include linux/io.h for virt_to_phys()
x86/setup: Enhance the comments
x86/setup: Clean up the header portion of setup.c
|
|
It was requested to remove the cond_func check but the follow up patch was
overlooked. Remove it now.
Fixes: 67719ef25eeb ("smp: Add a smp_cond_func_t argument to smp_call_function_many()")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200127083915.434tdkztorkklpdu@linutronix.de
|
|
There's an OF helper called of_dma_is_coherent(), which checks if a
device has a "dma-coherent" property to see if the device is coherent
for DMA.
But on some platforms devices are coherent by default, and on some
platforms it's not possible to update existing device trees to add the
"dma-coherent" property.
So add a Kconfig symbol to allow arch code to tell
of_dma_is_coherent() that devices are coherent by default, regardless
of the presence of the property.
Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
when the system has a coherent cache.
Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper")
Cc: stable@vger.kernel.org # v3.16+
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
This reverts commit e955537e3262de8e56f070b13817f525f472fa00.
Before patch e955537e32, tr_num_revoke tracked the number of revokes
added to the transaction, and tr_num_revoke_rm tracked how many
revokes were removed. But since revokes are queued off the sdp
(superblock) pointer, some transactions could remove more revokes
than they added. (e.g. revokes added by a different process).
Commit e955537e32 eliminated transaction variable tr_num_revoke_rm,
but in order to do so, it changed the accounting to always use
tr_num_revoke for its math. Since you can remove more revokes than
you add, tr_num_revoke could now become a negative value.
This negative value broke the assert in function gfs2_trans_end:
if (gfs2_assert_withdraw(sdp, (nbuf <=3D tr->tr_blocks) &&
(tr->tr_num_revoke <=3D tr->tr_revokes)))
One way to fix this is to simply remove the tr_num_revoke clause
from the assert and allow the value to become negative. Andreas
didn't like that idea, so instead, we decided to revert e955537e32.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This adds MSI support to the Broadcom STB PCIe host controller. The MSI
controller is physically located within the PCIe block, however, there
is no reason why the MSI controller could not be moved elsewhere in the
future. MSIX is not supported by the HW.
Since the internal Brcmstb MSI controller is intertwined with the PCIe
controller, it is not its own platform device but rather part of the
PCIe platform device.
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Co-developed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
|
|
This adds a basic driver for Broadcom's STB PCIe controller, for now
aimed at Raspberry Pi 4's SoC, bcm2711.
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Co-developed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
[lorenzo.pieralisi@arm.com: updated brcm_pcie_get_rc_bar2_size_and_offset()according to https://lore.kernel.org/linux-pci/be8ddb33a7360af1815cf686f77f3f0913d02be3.camel@suse.de]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
|
|
We currently allocate redistributor region structures for
individual redistributors when ACPI doesn't present us with
compact MMIO regions covering multiple redistributors.
It turns out that we allocate these structures even when
the redistributor is flagged as disabled by ACPI. It works
fine until someone actually tries to tarse one of these
structures, and access the corresponding MMIO region.
Instead, track the number of enabled redistributors, and
only allocate what is required. This makes sure that there
is no invalid data to misuse.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Heyi Guo <guoheyi@huawei.com>
Tested-by: Heyi Guo <guoheyi@huawei.com>
Link: https://lore.kernel.org/r/20191216062745.63397-1-guoheyi@huawei.com
|
|
According to the ARM ARM, registers CNT{P,V}_TVAL_EL0 have bits [63:32]
RES0 [1]. When reading the register, the value is truncated to the least
significant 32 bits [2], and on writes, TimerValue is treated as a signed
32-bit integer [1, 2].
When the guest behaves correctly and writes 32-bit values, treating TVAL
as an unsigned 64 bit register works as expected. However, things start
to break down when the guest writes larger values, because
(u64)0x1_ffff_ffff = 8589934591. but (s32)0x1_ffff_ffff = -1, and the
former will cause the timer interrupt to be asserted in the future, but
the latter will cause it to be asserted now. Let's treat TVAL as a
signed 32-bit register on writes, to match the behaviour described in
the architecture, and the behaviour experimentally exhibited by the
virtual timer on a non-vhe host.
[1] Arm DDI 0487E.a, section D13.8.18
[2] Arm DDI 0487E.a, section D11.2.4
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
[maz: replaced the read-side mask with lower_32_bits]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 8fa761624871 ("KVM: arm/arm64: arch_timer: Fix CNTP_TVAL calculation")
Link: https://lore.kernel.org/r/20200127103652.2326-1-alexandru.elisei@arm.com
|
|
Let the code never use unsupported event counters. Change
kvm_pmu_handle_pmcr() to only reset supported counters and
kvm_pmu_vcpu_reset() to only stop supported counters.
Other actions are filtered on the supported counters in
kvm/sysregs.c
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200124142535.29386-5-eric.auger@redhat.com
|
|
The newly added metadata documentation title was not properly indented
resulting in doc build break:
Sphinx parallel build error:
docutils.utils.SystemMessage: /linux/Documentation/driver-api/dmaengine/client.rst:155: (SEVERE/4) Unexpected section title.
Optional: per descriptor metadata
---------------------------------
Fix this by doing the right indent
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 7d083ae98357 ("dmaengine: doc: Add sections for per descriptor metadata support")
Link: https://lore.kernel.org/r/20200128125032.1650816-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
At the moment a SW_INCR counter always overflows on 32-bit
boundary, independently on whether the n+1th counter is
programmed as CHAIN.
Check whether the SW_INCR counter is a 64b counter and if so,
implement the 64b logic.
Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200124142535.29386-4-eric.auger@redhat.com
|
|
At the moment we update the chain bitmap on type setting. This
does not take into account the enable state of the odd register.
Let's make sure a counter is never considered as chained if
the high counter is disabled.
We recompute the chain state on enable/disable and type changes.
Also let create_perf_event() use the chain bitmap and not use
kvm_pmu_idx_has_chain_evtype().
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200124142535.29386-3-eric.auger@redhat.com
|
|
The specification says PMSWINC increments PMEVCNTR<n>_EL1 by 1
if PMEVCNTR<n>_EL0 is enabled and configured to count SW_INCR.
For PMEVCNTR<n>_EL0 to be enabled, we need both PMCNTENSET to
be set for the corresponding event counter but we also need
the PMCR.E bit to be set.
Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200124142535.29386-2-eric.auger@redhat.com
|
|
Implement user_access_save() and user_access_restore()
On 8xx and radix:
- On save, get the value of the associated special register then
prevent user access.
- On restore, set back the saved value to the associated special
register.
On book3s/32:
- On save, get the value stored in current->thread.kuap and prevent
user access.
- On restore, regenerate address range from the stored value and
reopen read/write access for that range.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/54f2f74938006b33c55a416674807b42ef222068.1579866752.git.christophe.leroy@c-s.fr
|
|
Today, when a function like strncpy_from_user() is called,
the userspace access protection is de-activated and re-activated
for every word read.
By implementing user_access_begin and friends, the protection
is de-activated at the beginning of the copy and re-activated at the
end.
Implement user_access_begin(), user_access_end() and
unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user()
For the time being, we keep user_access_save() and
user_access_restore() as nops.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/36d4fbf9e56a75994aca4ee2214c77b26a5a8d35.1579866752.git.christophe.leroy@c-s.fr
|
|
In preparation of implementing user_access_begin and friends
on powerpc, the book3s/32 version of prevent_user_access() need
to be prepared for user_access_end().
user_access_end() doesn't provide the address and size which
were passed to user_access_begin(), required by prevent_user_access()
to know which segment to modify.
The list of segments which where unprotected by allow_user_access()
are available in current->kuap. But we don't want prevent_user_access()
to read this all the time, especially everytime it is 0 (for instance
because the access was not a write access).
Implement a special direction named KUAP_CURRENT. In this case only,
the addr and end are retrieved from current->kuap.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/55bcc1f25d8200892a31f67a0b024ff3b816c3cc.1579866752.git.christophe.leroy@c-s.fr
|
|
NULL addr is a user address. Don't waste time checking it. If
someone tries to access it, it will SIGFAULT the same way as for
address 1, so no need to make it special.
The special case is when not doing a write, in that case we want
to drop the entire function. This is now handled by 'dir' param
and not by the nulity of 'to' anymore.
Also make beginning of prevent_user_access() similar
to beginning of allow_user_access(), and tell the compiler
that writing in kernel space or with a 0 length is unlikely
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/85e971223dfe6ace734637db1841678939a76155.1579866752.git.christophe.leroy@c-s.fr
|
|
__builtin_constant_p() always return 0 for pointers, so on RADIX
we always end up opening both direction (by writing 0 in SPR29):
0000000000000170 <._copy_to_user>:
...
1b0: 4c 00 01 2c isync
1b4: 39 20 00 00 li r9,0
1b8: 7d 3d 03 a6 mtspr 29,r9
1bc: 4c 00 01 2c isync
1c0: 48 00 00 01 bl 1c0 <._copy_to_user+0x50>
1c0: R_PPC64_REL24 .__copy_tofrom_user
...
0000000000000220 <._copy_from_user>:
...
2ac: 4c 00 01 2c isync
2b0: 39 20 00 00 li r9,0
2b4: 7d 3d 03 a6 mtspr 29,r9
2b8: 4c 00 01 2c isync
2bc: 7f c5 f3 78 mr r5,r30
2c0: 7f 83 e3 78 mr r3,r28
2c4: 48 00 00 01 bl 2c4 <._copy_from_user+0xa4>
2c4: R_PPC64_REL24 .__copy_tofrom_user
...
Use an explicit parameter for direction selection, so that GCC
is able to see it is a constant:
00000000000001b0 <._copy_to_user>:
...
1f0: 4c 00 01 2c isync
1f4: 3d 20 40 00 lis r9,16384
1f8: 79 29 07 c6 rldicr r9,r9,32,31
1fc: 7d 3d 03 a6 mtspr 29,r9
200: 4c 00 01 2c isync
204: 48 00 00 01 bl 204 <._copy_to_user+0x54>
204: R_PPC64_REL24 .__copy_tofrom_user
...
0000000000000260 <._copy_from_user>:
...
2ec: 4c 00 01 2c isync
2f0: 39 20 ff ff li r9,-1
2f4: 79 29 00 04 rldicr r9,r9,0,0
2f8: 7d 3d 03 a6 mtspr 29,r9
2fc: 4c 00 01 2c isync
300: 7f c5 f3 78 mr r5,r30
304: 7f 83 e3 78 mr r3,r28
308: 48 00 00 01 bl 308 <._copy_from_user+0xa8>
308: R_PPC64_REL24 .__copy_tofrom_user
...
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Spell out the directions, s/KUAP_R/KUAP_READ/ etc.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f4e88ec4941d5facb35ce75026b0112f980086c3.1579866752.git.christophe.leroy@c-s.fr
|
|
At the moment, bad_kuap_fault() reports a fault only if a bad access
to userspace occurred while access to userspace was not granted.
But if a fault occurs for a write outside the allowed userspace
segment(s) that have been unlocked, bad_kuap_fault() fails to
detect it and the kernel loops forever in do_page_fault().
Fix it by checking that the accessed address is within the allowed
range.
Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f48244e9485ada0a304ed33ccbb8da271180c80d.1579866752.git.christophe.leroy@c-s.fr
|
|
Commit f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and
rename TI_CPU") broke the CPU wake-up from sleep mode (i.e. when
_TLF_SLEEPING is set) by delaying the tovirt(r2, r2).
This is because r2 is not restored by fast_exception_return. It used
to work (by chance ?) because CPU wake-up interrupt never comes from
user, so r2 is expected to point to 'current' on return.
Commit e2fb9f544431 ("powerpc/32: Prepare for Kernel Userspace Access
Protection") broke it even more by clobbering r0 which is not
restored by fast_exception_return either.
Use r6 instead of r0. This is possible because r3-r6 are restored by
fast_exception_return and only r3-r5 are used for exception arguments.
For r2 it could be converted back to virtual address, but stay on the
safe side and restore it from the stack instead. It should be live
in the cache at that moment, so loading from the stack should make
no difference compared to converting it from phys to virt.
Fixes: f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU")
Fixes: e2fb9f544431 ("powerpc/32: Prepare for Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6d02c3ae6ad77af34392e98117e44c2bf6d13ba1.1580121710.git.christophe.leroy@c-s.fr
|
|
According to reviews, Time Namespace source was moved from
kernel/time_namespace.c to kernel/time/namespace.c between patchset
versions, while the path in MAINTERNERS file wasn't adjusted properly.
Correct it, so get_maintainer.pl produces a correct emails list again.
Fixes: 769071ac9f20 ("ns: Introduce Time Namespace")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200127143748.268515-1-dima@arista.com
|
|
Add default MDIO_BCM_IPROC Kconfig setting such that it is default
on for IPROC family of devices.
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Multicast and broadcast packets can be looped from egress to ingress
pre segmentation with dev_loopback_xmit. That function unconditionally
sets ip_summed to CHECKSUM_UNNECESSARY.
udp_rcv_segment segments gso packets in the udp rx path. Segmentation
usually executes on egress, and does not expect packets of this type.
__udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But
the offsets are not correct for gso_make_checksum.
UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set
to the correct pseudo header checksum. Reset ip_summed to this type.
(CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h)
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The old netem mailing list was inactive and recently was targeted by
spammers. Switch to just using netdev mailing list which is where all
the real change happens.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tegra HDA has FIFO size which can hold upto 10 audio frames to support
DVFS. When HDA DMA RUN bit is set to 0 to stop the stream, the DMA RUN
bit will be cleared to 0 only after transferring all the remaining audio
frames queued up in the fifo. This is not in sync with spec which states
the controller will stop transmitting(output) in the beginning of the
next frame for the relevant stream.
The above behavior with Tegra HDA was resulting in machine check error
during the system suspend flow with active audio playback with below kernel
error logs.
[ 33.524583] mc-err: [mcerr] (hda) csr_hdar: EMEM address decode error
[ 33.531088] mc-err: [mcerr] status = 0x20000015; addr = 0x00000000
[ 33.537431] mc-err: [mcerr] secure: no, access-type: read, SMMU fault: none
This was due to the fifo has more than one audio frame when the DMA
RUN bit is set to 0 during system suspend flow and the timeout handling in
snd_hdac_stream_sync() was not designed to handle this scenario. So the
DMA will continue running even after timeout hit until all remaining
audio frames in the fifo are transferred, but the suspend flow will try
to reset the controller and turn off the hda clocks without the knowledge
of the DMA is still running and could result in mc-err.
The above issue can be resolved by doing stream reset with the help of
snd_hdac_stream_reset() which would ensure the DMA RUN bit is cleared
if the timeout was hit in snd_hdac_stream_sync().
Signed-off-by: Mohan Kumar <mkumard@nvidia.com>
Link: https://lore.kernel.org/r/20200128051508.26064-1-mkumard@nvidia.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
amd nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace deamons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send SG IO or read/write IO to figure out the state of paths and re-set
them up.
In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
The device is then in a state of deadlock, because to execute IO the
device needs to allocate memory, but to allocate memory the memory
layers want execute IO to the device.
Here is an example with nbd using a local userspace daemon that performs
network IO to a remote server. We are using XFS on top of the nbd device,
but it can happen with any FS or other modules layered on top of the nbd
device that can write out data to free memory. Here a nbd daemon helper
thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
a request. This kicks off a reclaim operation which results in a WRITE to
the nbd device and the nbd thread calling back into the mm layer.
[ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000
[ 1626.609193] Call Trace:
[ 1626.609195] ? __schedule+0x29b/0x630
[ 1626.609197] ? wait_for_completion+0xe0/0x170
[ 1626.609198] schedule+0x30/0xb0
[ 1626.609200] schedule_timeout+0x1f6/0x2f0
[ 1626.609202] ? blk_finish_plug+0x21/0x2e
[ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410
[ 1626.609206] ? wait_for_completion+0xe0/0x170
[ 1626.609208] wait_for_completion+0x108/0x170
[ 1626.609210] ? wake_up_q+0x70/0x70
[ 1626.609212] ? __xfs_buf_submit+0x12e/0x250
[ 1626.609214] ? xfs_bwrite+0x25/0x60
[ 1626.609215] xfs_buf_iowait+0x22/0xf0
[ 1626.609218] __xfs_buf_submit+0x12e/0x250
[ 1626.609220] xfs_bwrite+0x25/0x60
[ 1626.609222] xfs_reclaim_inode+0x2e8/0x310
[ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300
[ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40
[ 1626.609228] super_cache_scan+0x152/0x1a0
[ 1626.609231] do_shrink_slab+0x12c/0x2d0
[ 1626.609233] shrink_slab+0x9c/0x2a0
[ 1626.609235] shrink_node+0xd7/0x470
[ 1626.609237] do_try_to_free_pages+0xbf/0x380
[ 1626.609240] try_to_free_pages+0xd9/0x1f0
[ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30
[ 1626.609251] ? ___slab_alloc+0x238/0x560
[ 1626.609254] __alloc_pages_nodemask+0x30c/0x350
[ 1626.609259] skb_page_frag_refill+0x97/0xd0
[ 1626.609274] sk_page_frag_refill+0x1d/0x80
[ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0
[ 1626.609304] tcp_sendmsg+0x27/0x40
[ 1626.609307] sock_sendmsg+0x54/0x60
[ 1626.609308] ___sys_sendmsg+0x29f/0x320
[ 1626.609313] ? sock_poll+0x66/0xb0
[ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0
[ 1626.609320] ? ep_send_events_proc+0xe6/0x230
[ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0
[ 1626.609324] ? ep_read_events_proc+0xc0/0xc0
[ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20
[ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230
[ 1626.609329] ? __hrtimer_init+0xb0/0xb0
[ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20
[ 1626.609334] ? ep_poll+0x26c/0x4a0
[ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0
[ 1626.609339] ? release_sock+0x43/0x90
[ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20
[ 1626.609342] __sys_sendmsg+0x47/0x80
[ 1626.609347] do_syscall_64+0x5f/0x1c0
[ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0
[ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch adds a new prctl command that daemons can use after they have
done their initial setup, and before they start to do allocations that
are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
flags so both userspace block and FS threads can use it to avoid the
allocation recursion and try to prevent from being throttled while
writing out data to free up memory.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Masato Suzuki <masato.suzuki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
"The performance deterioration departement provides a few non-scary
fixes and improvements:
- Update the cached HLE state when the TSX state is changed via the
new control register. This ensures feature bit consistency.
- Exclude the new Zhaoxin CPUs from Spectre V2 and SWAPGS
vulnerabilities"
* tag 'x86-pti-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/swapgs: Exclude Zhaoxin CPUs from SWAPGS vulnerability
x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2
x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The interrupt departement provides:
- A mechanism to shield isolated tasks from managed interrupts:
The affinity of managed interrupts is completely controlled by the
kernel and user space has no influence on them. The reason is that
the automatically assigned affinity correlates to the multi-queue
CPU handling of block devices.
If the generated affinity mask spaws both housekeeping and isolated
CPUs the interrupt could be routed to an isolated CPU which would
then be disturbed by I/O submitted by a housekeeping CPU.
The new mechamism ensures that as long as one housekeeping CPU is
online in the assigned affinity mask the interrupt is routed to a
housekeeping CPU.
If there is no online housekeeping CPU in the affinity mask, then
the interrupt is routed to an isolated CPU to keep the device queue
intact, but unless the isolated CPU submits I/O by itself these
interrupts are not raised.
- A small addon to the device tree irqdomain core code to avoid
duplication in irq chip drivers
- Conversion of the SiFive PLIC to hierarchical domains
- The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI,
NXP INTMUX, Meson A1 GPIO
- The first cut of support for the new ARM GICv4.1
- The usual pile of fixes and improvements in core and driver code"
* tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
genirq, sched/isolation: Isolate from handling managed interrupts
irqchip/gic-v4.1: Allow direct invalidation of VLPIs
irqchip/gic-v4.1: Suppress per-VLPI doorbell
irqchip/gic-v4.1: Add VPE INVALL callback
irqchip/gic-v4.1: Add VPE eviction callback
irqchip/gic-v4.1: Add VPE residency callback
irqchip/gic-v4.1: Add mask/unmask doorbell callbacks
irqchip/gic-v4.1: Plumb skeletal VPE irqchip
irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP
irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set
irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP
irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation
irqchip/gic-v3: Add GICv4.1 VPEID size discovery
irqchip/gic-v3: Detect GICv4.1 supporting RVPEID
irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells
irqdomain: Fix a memory leak in irq_domain_push_irq()
irqchip: Add NXP INTMUX interrupt multiplexer support
dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer
irqchip: Define EXYNOS_IRQ_COMBINER
irqchip/meson-gpio: Add support for meson a1 SoCs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core SMP updates from Thomas Gleixner:
"A small set of SMP core code changes:
- Rework the smp function call core code to avoid the allocation of
an additional cpumask
- Remove the not longer required GFP argument from on_each_cpu_cond()
and on_each_cpu_cond_mask() and fixup the callers"
* tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Remove allocation mask from on_each_cpu_cond.*()
smp: Add a smp_cond_func_t argument to smp_call_function_many()
smp: Use smp_cond_func_t as type for the conditional function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timekeeping and timers departement provides:
- Time namespace support:
If a container migrates from one host to another then it expects
that clocks based on MONOTONIC and BOOTTIME are not subject to
disruption. Due to different boot time and non-suspended runtime
these clocks can differ significantly on two hosts, in the worst
case time goes backwards which is a violation of the POSIX
requirements.
The time namespace addresses this problem. It allows to set offsets
for clock MONOTONIC and BOOTTIME once after creation and before
tasks are associated with the namespace. These offsets are taken
into account by timers and timekeeping including the VDSO.
Offsets for wall clock based clocks (REALTIME/TAI) are not provided
by this mechanism. While in theory possible, the overhead and code
complexity would be immense and not justified by the esoteric
potential use cases which were discussed at Plumbers '18.
The overhead for tasks in the root namespace (ie where host time
offsets = 0) is in the noise and great effort was made to ensure
that especially in the VDSO. If time namespace is disabled in the
kernel configuration the code is compiled out.
Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
feature and kept on for more than a year addressing review
comments, finding better solutions. A pleasant experience.
- Overhaul of the alarmtimer device dependency handling to ensure
that the init/suspend/resume ordering is correct.
- A new clocksource/event driver for Microchip PIT64
- Suspend/resume support for the Hyper-V clocksource
- The usual pile of fixes, updates and improvements mostly in the
driver code"
* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
alarmtimer: Use wakeup source from alarmtimer platform device
alarmtimer: Make alarmtimer platform device child of RTC device
alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
hrtimer: Add missing sparse annotation for __run_timer()
lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
clocksource/drivers/exynos_mct: Rename Exynos to lowercase
clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
clocksource/drivers/bcm2835_timer: Fix memory leak of timer
clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects update from Thomas Gleixner:
"A single commit for debug objects which fixes a pile of potential data
races detected by KCSAN"
* tag 'core-debugobjects-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Fix various data races
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog updates from Thomas Gleixner:
"A set of watchdog/softlockup related improvements:
- Enforce that the watchdog timestamp is always valid on boot. The
original implementation caused a watchdog disabled gap of one
second in the boot process due to truncation of the underlying
sched clock.
The sched clock is divided by 1e9 to convert nanoseconds to
seconds. So for the first second of the boot process the result is
0 which is at the same time the indicator to disable the watchdog.
The trivial fix is to change the disabled indicator to ULONG_MAX.
- Two cleanup patches removing unused and redundant code which got
forgotten to be cleaned up in previous changes"
* tag 'core-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/softlockup: Enforce that timestamp is valid on boot
watchdog/softlockup: Remove obsolete check of last reported task
watchdog: Remove soft_lockup_hrtimer_cnt and related code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the generic VDSO code which missed 5.5:
- Make the update to the coarse timekeeper unconditional.
This is required because the coarse timekeeper interfaces in the
VDSO do not depend on a VDSO capable clocksource. If the system
does not have a VDSO capable clocksource and the update is
depending on the VDSO capable clocksource, the coarse VDSO
interfaces would operate on stale data forever.
- Invert the logic of __arch_update_vdso_data() to avoid further head
scratching.
Tripped over this several times while analyzing the update problem
above"
* tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/vdso: Update coarse timekeeper unconditionally
lib/vdso: Make __arch_update_vdso_data() logic understandable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux update from Paul Moore:
"This is one of the bigger SELinux pull requests in recent years with
28 patches. Everything is passing our test suite and the highlights
are below:
- Mark CONFIG_SECURITY_SELINUX_DISABLE as deprecated. We're some time
away from actually attempting to remove this in the kernel, but the
only distro we know that still uses it (Fedora) is working on
moving away from this so we want to at least let people know we are
planning to remove it.
- Reorder the SELinux hooks to help prevent bad things when SELinux
is disabled at runtime. The proper fix is to remove the
CONFIG_SECURITY_SELINUX_DISABLE functionality (see above) and just
take care of it at boot time (e.g. "selinux=0").
- Add SELinux controls for the kernel lockdown functionality,
introducing a new SELinux class/permissions: "lockdown { integrity
confidentiality }".
- Add a SELinux control for move_mount(2) that reuses the "file {
mounton }" permission.
- Improvements to the SELinux security label data store lookup
functions to speed up translations between our internal label
representations and the visible string labels (both directions).
- Revisit a previous fix related to SELinux inode auditing and
permission caching and do it correctly this time.
- Fix the SELinux access decision cache to cleanup properly on error.
In some extreme cases this could limit the cache size and result in
a decrease in performance.
- Enable SELinux per-file labeling for binderfs.
- The SELinux initialized and disabled flags were wrapped with
accessors to ensure they are accessed correctly.
- Mark several key SELinux structures with __randomize_layout.
- Changes to the LSM build configuration to only build
security/lsm_audit.c when needed.
- Changes to the SELinux build configuration to only build the IB
object cache when CONFIG_SECURITY_INFINIBAND is enabled.
- Move a number of single-caller functions into their callers.
- Documentation fixes (/selinux -> /sys/fs/selinux).
- A handful of cleanup patches that aren't worth mentioning on their
own, the individual descriptions have plenty of detail"
* tag 'selinux-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (28 commits)
selinux: fix regression introduced by move_mount(2) syscall
selinux: do not allocate ancillary buffer on first load
selinux: remove redundant allocation and helper functions
selinux: remove redundant selinux_nlmsg_perm
selinux: fix wrong buffer types in policydb.c
selinux: reorder hooks to make runtime disable less broken
selinux: treat atomic flags more carefully
selinux: make default_noexec read-only after init
selinux: move ibpkeys code under CONFIG_SECURITY_INFINIBAND.
selinux: remove redundant msg_msg_alloc_security
Documentation,selinux: fix references to old selinuxfs mount point
selinux: deprecate disabling SELinux and runtime
selinux: allow per-file labelling for binderfs
selinuxfs: use scnprintf to get real length for inode
selinux: remove set but not used variable 'sidtab'
selinux: ensure the policy has been loaded before reading the sidtab stats
selinux: ensure we cleanup the internal AVC counters on error in avc_update()
selinux: randomize layout of key structures
selinux: clean up selinux_enabled/disabled/enforcing_boot
selinux: remove unnecessary selinux cred request
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"One small audit patch for the Linux v5.6 merge window, and
unsurprisingly it passes our test suite with flying colors"
* tag 'audit-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: Add __rcu annotation to RCU pointer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cgroup2 interface for hugetlb controller. I think this was the last
remaining bit which was missing from cgroup2
- fixes for race and a spurious warning in threaded cgroup handling
- other minor changes
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
iocost: Fix iocost_monitor.py due to helper type mismatch
cgroup: Prevent double killing of css when enabling threaded cgroup
cgroup: fix function name in comment
mm: hugetlb controller for cgroups v2
|
|
Pull workqueue updates from Tejun Heo:
"Just a couple tracepoint patches"
* 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove workqueue_work event class
workqueue: add worker function to workqueue_execute_end tracepoint
|
|
In preparation for sharing an io-wq across different users, add a
reference count that manages destruction of it.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Merely enabling I2C and RTC selects REGMAP_I2C and REGMAP_SPI, even when
no driver needs it. While the former can be moduler, the latter cannot,
and thus becomes built-in.
Fix this by moving the select statements for REGMAP_I2C and REGMAP_SPI
from the RTC_I2C_AND_SPI helper to the individual drivers that depend on
it.
Note that the comment for RTC_I2C_AND_SPI refers to SND_SOC_I2C_AND_SPI
for more information, but the latter does not select REGMAP_{I2C,SPI}
itself, and defers that to the individual drivers, too.
Fixes: 080481f54ef62121 ("rtc: merge ds3232 and ds3234")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: kbuild test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20200112171349.22268-1-geert@linux-m68k.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
In case of out of memory the second argument of percpu_ref_put_many() in
io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly
wrong.
Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Draining the middle of a link is tricky, so leave a comment there
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For the non-vectored variant of READV/WRITEV, we don't need to setup an
async io context, and we flag that appropriately in the io_op_defs
array. However, in fixing this for the 5.5 kernel in commit 74566df3a71c
we didn't have these opcodes, so the check there was added just for the
READ_FIXED and WRITE_FIXED opcodes. Replace that check with just a
single check for needing async context, that covers all four of these
read/write variants that don't use an iovec.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some i2c RTC drivers are using regmap but are not selecting REGMAP_I2C
which may lead to build failures.
Link: https://lore.kernel.org/r/20200127221724.10160-2-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
The RTC_DRV_SD3078 is indented using spaces, use tabs instead.
Link: https://lore.kernel.org/r/20200127221724.10160-1-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Convert the Goodix binding to DT schema format using json-schema
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20200108091118.5130-3-benjamin.gaignard@st.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Add touchscreen schema for common properties
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20200108091118.5130-2-benjamin.gaignard@st.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|