Age | Commit message (Collapse) | Author |
|
We need to have a synchronize_rcu before free'ing the sockmap because
any outstanding psock references will have a pointer to the map and
when they use this could trigger a use after free.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
__sock_map_delete() may be called from a tcp event such as unhash or
close from the following trace,
tcp_bpf_close()
tcp_bpf_remove()
sk_psock_unlink()
sock_map_delete_from_link()
__sock_map_delete()
In this case the sock lock is held but this only protects against
duplicate removals on the TCP side. If the map is free'd then we have
this trace,
sock_map_free
xchg() <- replaces map entry
sock_map_unref()
sk_psock_put()
sock_map_del_link()
The __sock_map_delete() call however uses a read, test, null over the
map entry which can result in both paths trying to free the map
entry.
To fix use xchg in TCP paths as well so we avoid having two references
to the same map entry.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close which
would then call the tls close callback. Because of this a user could
disconnect a socket then put it in a LISTEN state which would break
our assumptions about sockets always being ESTABLISHED state.
More directly because close() can call unhash() and unhash is
implemented by sockmap if a sockmap socket has TLS enabled we can
incorrectly destroy the psock from unhash() and then call its close
handler again. But because the psock (sockmap socket representation)
is already destroyed we call close handler in sk->prot. However,
in some cases (TLS BASE/BASE case) this will still point at the
sockmap close handler resulting in a circular call and crash reported
by syzbot.
To fix both above issues implement the unhash() routine for TLS.
v4:
- add note about tls offload still needing the fix;
- move sk_proto to the cold cache line;
- split TX context free into "release" and "free",
otherwise the GC work itself is in already freed
memory;
- more TX before RX for consistency;
- reuse tls_ctx_free();
- schedule the GC work after we're done with context
to avoid UAF;
- don't set the unhash in all modes, all modes "inherit"
TLS_BASE's callbacks anyway;
- disable the unhash hook for TLS_HW.
Fixes: 3c4d7559159bf ("tls: kernel TLS support")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The tls close() callback currently drops the sock lock to call
strp_done(). Split up the RX cleanup into stopping the strparser
and releasing most resources, syncing strparser and finally
freeing the context.
To avoid the need for a strp_done() call on the cleanup path
of device offload make sure we don't arm the strparser until
we are sure init will be successful.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The tls close() callback currently drops the sock lock, makes a
cancel_delayed_work_sync() call, and then relocks the sock.
By restructuring the code we can avoid droping lock and then
reclaiming it. To simplify this we do the following,
tls_sk_proto_close
set_bit(CLOSING)
set_bit(SCHEDULE)
cancel_delay_work_sync() <- cancel workqueue
lock_sock(sk)
...
release_sock(sk)
strp_done()
Setting the CLOSING bit prevents the SCHEDULE bit from being
cleared by any workqueue items e.g. if one happens to be
scheduled and run between when we set SCHEDULE bit and cancel
work. Then because SCHEDULE bit is set now no new work will
be scheduled.
Tested with net selftests and bpf selftests.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The deprecated TOE offload doesn't actually do anything in
tls_sk_proto_close() - all TLS code is skipped and context
not freed. Remove the callback to make it easier to refactor
tls_sk_proto_close().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
In tls_set_device_offload_rx() we prepare the software context
for RX fallback and proceed to add the connection to the device.
Unfortunately, software context prep includes arming strparser
so in case of a later error we have to release the socket lock
to call strp_done().
In preparation for not releasing the socket lock half way through
callbacks move arming strparser into a separate function.
Following patches will make use of that.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
tsk->exit_state = EXIT_DEAD
pidfd_poll
if (tsk->exit_state)
However nothing prevents the following sequence:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
pidfd_poll
if (tsk->exit_state)
tsk->exit_state = EXIT_DEAD
This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.
To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.
Fixes: b53b0b9d9a6 ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>
|
|
Update MAINTAINERS and .mailmap with my @linaro.org address, since I
don't have access to my @arm.com address anymore.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Picking up 7aaddd96d5f ("drm/modes: Don't apply cmdline's rotation if it wasn't specified")
from drm-misc-next-fixes. It missed the merge window.
Signed-off-by: Sean Paul <seanpaul@chromium.org>
|
|
When a pin is active-low, logical trigger edge should be inverted to match
the same interrupt opportunity.
For example, a button pushed triggers falling edge in ACTIVE_HIGH case; in
ACTIVE_LOW case, the button pushed triggers rising edge. For user space the
IRQ requesting doesn't need to do any modification except to configuring
GPIOHANDLE_REQUEST_ACTIVE_LOW.
For example, we want to catch the event when the button is pushed. The
button on the original board drives level to be low when it is pushed, and
drives level to be high when it is released.
In user space we can do:
req.handleflags = GPIOHANDLE_REQUEST_INPUT;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
while (1) {
read(fd, &dat, sizeof(dat));
if (dat.id == GPIOEVENT_EVENT_FALLING_EDGE)
printf("button pushed\n");
}
Run the same logic on another board which the polarity of the button is
inverted; it drives level to be high when pushed, and level to be low when
released. For this inversion we add flag GPIOHANDLE_REQUEST_ACTIVE_LOW:
req.handleflags = GPIOHANDLE_REQUEST_INPUT |
GPIOHANDLE_REQUEST_ACTIVE_LOW;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
At the result, there are no any events caught when the button is pushed.
By the way, button releasing will emit a "falling" event. The timing of
"falling" catching is not expected.
Cc: stable@vger.kernel.org
Signed-off-by: Michael Wu <michael.wu@vatics.com>
Tested-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
|
|
While using the mmc_spi driver occasionally errors like this popped up:
mmcblk0: error -84 transferring data end_request: I/O error, dev mmcblk0, sector 581756
I looked on the Internet for occurrences of the same problem and came
across a helpful post [1]. It includes source code to reproduce the bug.
There is also an analysis about the cause. During transmission data in the
supplied buffer is being modified. Thus the previously calculated checksum
is not correct anymore.
After some digging I found out that device drivers are supposed to report
they need stable writes. To fix this I set the appropriate flag at queue
initialization if CRC checksumming is enabled for that SPI host.
[1]
https://groups.google.com/forum/#!msg/sim1/gLlzWeXGFr8/KevXinUXfc8J
Signed-off-by: Andreas Koop <andreas.koop@zf.com>
[shihpo: Rebase on top of v5.3-rc1]
Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
CC: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
In some cases initial bind of scm memory for an lpar can fail if
previously it wasn't released using a scm-unbind hcall. This situation
can arise due to panic of the previous kernel or forced lpar
fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
To mitigate such cases the patch updates papr_scm_probe() to force a
call to drc_pmem_unbind() in case the initial bind of scm memory fails
with EBUSY error. In case scm-bind operation again fails after the
forced scm-unbind then we follow the existing error path. We also
update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
and indicate it as a EBUSY error back to the caller.
Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
|
|
The new hcall named H_SCM_UNBIND_ALL has been introduce that can
unbind all or specific scm memory assigned to an lpar. This is
more efficient than using H_SCM_UNBIND_MEM as currently we don't
support partial unbind of scm memory.
Hence this patch proposes following changes to drc_pmem_unbind():
* Update drc_pmem_unbind() to replace hcall H_SCM_UNBIND_MEM to
H_SCM_UNBIND_ALL.
* Update drc_pmem_unbind() to handles cases when PHYP asks the guest
kernel to wait for specific amount of time before retrying the
hcall via the 'LONG_BUSY' return value.
* Ensure appropriate error code is returned back from the function
in case of an error.
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
|
|
Update the hvcalls.h to include op-codes for new hcalls introduce to
manage SCM memory. Also update existing hcall definitions to reflect
current papr specification for SCM.
The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were
transient proposals and there support was never implemented by
Power-VM nor they were used anywhere in Linux kernel. Hence we don't
expect anyone to be impacted by this change.
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
|
|
My @arm.com address will stop working in a couple of weeks. Update
MAINTAINERS and .mailmap files with an address I'll have access to.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fix a typo in my name for the KVM-ARM reviewers entry.
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
I will soon lose access to my @arm.com email address, so let's
update the MAINTAINERS file to point to my @kernel.org address,
as well as .mailmap for good measure.
Note that my @arm.com address will still work, but someone else
will be reading whatever is sent there. Don't say you didn't know!
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Arguments are supposed to be ordered high then low.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Fixes: ed80a13bb4c4 ("mmc: meson-mx-sdio: Add a driver for the Amlogic
Meson8 and Meson8b SoCs")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
In commit 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after
response errors.") we fixed a tuning-induced hang that I saw when
stress testing tuning on certain SD cards. I won't re-hash that whole
commit, but the summary is that as a normal part of tuning you need to
deal with transfer errors and there were cases where these transfer
errors was putting my system into a bad state causing all future
transfers to fail. That commit fixed handling of the transfer errors
for me.
In downstream Chrome OS my fix landed and had the same behavior for
all SD/MMC commands. However, it looks like when the commit landed
upstream we limited it to only SD tuning commands. Presumably this
was to try to get around problems that Alim Akhtar reported on exynos
[1].
Unfortunately while stress testing reboots (and suspend/resume) on
some rk3288-based Chromebooks I found the same problem on the eMMC on
some of my Chromebooks (the ones with Hynix eMMC). Since the eMMC
tuning command is different (MMC_SEND_TUNING_BLOCK_HS200
vs. MMC_SEND_TUNING_BLOCK) we were basically getting back into the
same situation.
I'm hoping that whatever problems exynos was having in the past are
somehow magically fixed now and we can make the behavior the same for
all commands.
[1] https://lkml.kernel.org/r/CAGOxZ53WfNbaMe0_AM0qBqU47kAfgmPBVZC8K8Y-_J3mDMqW4A@mail.gmail.com
Fixes: 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after response errors.")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Alim Akhtar <alim.akhtar@gmail.com>
Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
When the SD host controller tries to probe again due to the derferred
probe mechanism, it will always keep the SD host device as runtime
resume state due to missing the runtime put operation in error path
last time.
Thus add the pm_runtime_put_noidle() in error path to make the PM runtime
counter balance, which can make the SD host device's PM runtime work well.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: fb8bd90f83c4 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
git://people.freedesktop.org/~gabbayo/linux into char-misc-linux
Oded writes:
This tag contains the following fixes:
- Fix to VRHOT event handling from the ASIC. No need to reset the ASIC,
just notify it in dmesg.
- Fix printk specifier for printing dma address
* tag 'misc-habanalabs-fixes-2019-07-22' of git://people.freedesktop.org/~gabbayo/linux:
habanalabs: don't reset device when getting VRHOT
habanalabs: use %pad for printing a dma_addr_t
|
|
Don't undo the PM initialization if we error out before we managed to
initialize it. The call to pm_runtime_disable() without being preceded
by pm_runtime_enable() would disturb the balance of the Force.
In practice, this happens if we fail to allocate any of the GPIOS ("cs",
"ready") due to -EPROBE_DEFER because we're getting probled before the
GPIO driver.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Link: https://lore.kernel.org/r/20190719122713.3444318-1-lkundrak@v3.sk
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
When CONFIG_UAPI_HEADER_TEST=y, exported headers are compile-tested to
make sure they can be included from user-space.
Currently, header.h and fw.h are excluded from the test coverage.
To make them join the compile-test, we need to fix the build errors
attached below.
For a case like this, we decided to use __u{8,16,32,64} variable types
in this discussion:
https://lkml.org/lkml/2019/6/5/18
Build log:
CC usr/include/sound/sof/header.h.s
CC usr/include/sound/sof/fw.h.s
In file included from <command-line>:32:0:
./usr/include/sound/sof/header.h:19:2: error: unknown type name ‘uint32_t’
uint32_t magic; /**< 'S', 'O', 'F', '\0' */
^~~~~~~~
./usr/include/sound/sof/header.h:20:2: error: unknown type name ‘uint32_t’
uint32_t type; /**< component specific type */
^~~~~~~~
./usr/include/sound/sof/header.h:21:2: error: unknown type name ‘uint32_t’
uint32_t size; /**< size in bytes of data excl. this struct */
^~~~~~~~
./usr/include/sound/sof/header.h:22:2: error: unknown type name ‘uint32_t’
uint32_t abi; /**< SOF ABI version */
^~~~~~~~
./usr/include/sound/sof/header.h:23:2: error: unknown type name ‘uint32_t’
uint32_t reserved[4]; /**< reserved for future use */
^~~~~~~~
./usr/include/sound/sof/header.h:24:2: error: unknown type name ‘uint32_t’
uint32_t data[0]; /**< Component data - opaque to core */
^~~~~~~~
In file included from <command-line>:32:0:
./usr/include/sound/sof/fw.h:49:2: error: unknown type name ‘uint32_t’
uint32_t size; /* bytes minus this header */
^~~~~~~~
./usr/include/sound/sof/fw.h:50:2: error: unknown type name ‘uint32_t’
uint32_t offset; /* offset from base */
^~~~~~~~
./usr/include/sound/sof/fw.h:64:2: error: unknown type name ‘uint32_t’
uint32_t size; /* bytes minus this header */
^~~~~~~~
./usr/include/sound/sof/fw.h:65:2: error: unknown type name ‘uint32_t’
uint32_t num_blocks; /* number of blocks */
^~~~~~~~
./usr/include/sound/sof/fw.h:73:2: error: unknown type name ‘uint32_t’
uint32_t file_size; /* size of file minus this header */
^~~~~~~~
./usr/include/sound/sof/fw.h:74:2: error: unknown type name ‘uint32_t’
uint32_t num_modules; /* number of modules */
^~~~~~~~
./usr/include/sound/sof/fw.h:75:2: error: unknown type name ‘uint32_t’
uint32_t abi; /* version of header format */
^~~~~~~~
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/r/20190721142308.30306-1-yamada.masahiro@socionext.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The TS3A227E says that the headset keypress detection needs the MICBIAS
power in order to report the key events to ensure proper operation
The headset keypress detection needs the MICBIAS power in order to report
the key events all the time as long as MIC is present. So MICBIAS pin
is forced on when a MICROPHONE is detected.
On Veyron Minnie I observed that if the MICBIAS power is not present and
the key press detection is activated (just because it is enabled when you
insert a headset), it randomly reports a keypress on insert.
E.g. (KEY_PLAYPAUSE)
Event: (SW_HEADPHONE_INSERT), value 1
Event: (SW_MICROPHONE_INSERT), value 1
Event: -------------- SYN_REPORT ------------
Event: (KEY_PLAYPAUSE), value 1
Userspace thinks that KEY_PLAYPAUSE is pressed and produces the annoying
effect that the media player starts a play/pause loop.
Note that, although most of the time the key reported is the one
associated with BTN_0, not always this is true. On my tests I also saw
different keys reported
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: https://lore.kernel.org/r/20190719173929.24065-1-enric.balletbo@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
When sample rate of TX is different with sample rate of RX in
async mode, the MFreq selection will be wrong.
For example, sysclk = 24.576MHz, TX rate = 96000Hz, RX rate = 48000Hz.
Then ratio of TX = 256, ratio of RX = 512, For MFreq is shared by TX
and RX instance, the correct value of MFreq is 2 for both TX and RX.
But original method will cause MFreq = 0 for TX, MFreq = 2 for RX.
If TX is started after RX, RX will be impacted, RX work abnormal with
MFreq = 0.
This patch is to select proper MFreq value according to TX rate and
RX rate.
Fixes: 0c516b4ff85c ("ASoC: cs42xx8: Add codec driver support for CS42448/CS42888")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/20190716094547.46787-1-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
DPCM uses snd_soc_dapm_dai_get_connected_widgets to build a
list of the widgets connected to a specific front end DAI so it
can search through this list for available back end DAIs. The
custom_stop_condition was added to is_connected_ep to facilitate this
list not containing more widgets than is necessary. Doing so both
speeds up the DPCM handling as less widgets need to be searched and
avoids issues with CODEC to CODEC links as these would be confused
with back end DAIs if they appeared in the list of available widgets.
custom_stop_condition was implemented by aborting the graph walk
when the condition is triggered, however there is an issue with this
approach. Whilst walking the graph is_connected_ep should update the
endpoints cache on each widget, if the walk is aborted the number
of attached end points is unknown for that sub-graph. When the stop
condition triggered, the original patch ignored the triggering widget
and returned zero connected end points; a later patch updated this
to set the triggering widget's cache to 1 and return that. Both of
these approaches result in inaccurate values being stored in various
end point caches as the values propagate back through the graph,
which can result in later issues with widgets powering/not powering
unexpectedly.
As the original goal was to reduce the size of the widget list passed
to the DPCM code, the simplest solution is to limit the functionality
of the custom_stop_condition to the widget list. This means the rest
of the graph will still be processed resulting in correct end point
caches, but only widgets up to the stop condition will be added to the
returned widget list.
Fixes: 6742064aef7f ("ASoC: dapm: support user-defined stop condition in dai_get_connected_widgets")
Fixes: 5fdd022c2026 ("ASoC: dpcm: play nice with CODEC<->CODEC links")
Fixes: 09464974eaa8 ("ASoC: dapm: Fix to return correct path list in is_connected_ep.")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20190718084333.15598-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Shall help finding use-after-free bugs earlier.
Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After reverting commit 240c35a3783a (kvm: x86: Use task structs fpu field
for user), struct kvm_vcpu is 19456 bytes on my server, PAGE_ALLOC_COSTLY_ORDER(3)
is the order at which allocations are deemed costly to service. In serveless
scenario, one host can service hundreds/thoudands firecracker/kata-container
instances, howerver, new instance will fail to launch after memory is too
fragmented to allocate kvm_vcpu struct on host, this was observed in some
cloud provider product environments.
This patch dynamically allocates user_fpu, kvm_vcpu is 15168 bytes now on my
Skylake server.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The idea before commit 240c35a37 (which has just been reverted)
was that we have the following FPU states:
userspace (QEMU) guest
---------------------------------------------------------------------------
processor vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
vcpu->arch.user_fpu processor
>>> preempt out
vcpu->arch.user_fpu current->thread.fpu
>>> preempt in
vcpu->arch.user_fpu processor
>>> back to userspace
>>> kvm_put_guest_fpu
processor vcpu->arch.guest_fpu
---------------------------------------------------------------------------
With the new lazy model we want to get the state back to the processor
when schedule in from current->thread.fpu.
Reported-by: Thomas Lambertz <mail@thomaslambertz.de>
Reported-by: anthony <antdev66@gmail.com>
Tested-by: anthony <antdev66@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lambertz <mail@thomaslambertz.de>
Cc: anthony <antdev66@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add a comment in front of the warning. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This reverts commit 240c35a3783ab9b3a0afaba0dde7291295680a6b
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.
Fixes: 240c35a3783a ("kvm: x86: Use task structs fpu field for user")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Letting this pend may cause nested_get_vmcs12_pages to run against an
invalid state, corrupting the effective vmcs of L1.
This was triggerable in QEMU after a guest corruption in L2, followed by
a L1 reset.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 7f7f1ba33cf2 ("KVM: x86: do not load vmcs12 pages while still in SMM")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This helper is required from generic huge_pte_alloc() which is available
when arch subscribes ARCH_WANT_GENERAL_HUGETLB. arm64 implements it's own
huge_pte_alloc() and does not depend on the generic definition. Drop this
helper which is redundant on arm64.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <Steve.Capper@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
There are some hand-written instances of "32" to express the number
of SVE Z-registers.
Since this code was written a #define was added for this, so
convert trivial instances of this magic number as appropriate.
No functional change.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently we convert from FPSIMD to SVE register state in memory in
two places.
To ease future maintenance, let's consolidate this in one place.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The arm64 stacktrace code is careful to only dereference frame records
in valid stack ranges, ensuring that a corrupted frame record won't
result in a faulting access.
However, it's still possible for corrupt frame records to result in
infinite loops in the stacktrace code, which is also undesirable.
This patch ensures that we complete a stacktrace in finite time, by
keeping track of which stacks we have already completed unwinding, and
verifying that if the next frame record is on the same stack, it is at a
higher address.
As this has turned out to be particularly subtle, comments are added to
explain the procedure.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tengfei Fan <tengfeif@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Some common code is required by each stacktrace user to initialise
struct stackframe before the first call to unwind_frame().
In preparation for adding to the common code, this patch factors it
out into a separate function start_backtrace(), and modifies the
stacktrace callers appropriately.
No functional change.
Signed-off-by: Dave Martin <dave.martin@arm.com>
[Mark: drop tsk argument, update more callsites]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
on_accessible_stack() and on_task_stack() shouldn't (and don't)
modify their task argument, so it can be const.
This patch adds the appropriate modifiers. Whitespace violations in the
parameter lists are fixed at the same time.
No functional change.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
[Mark: fixup const location, whitespace]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The recent changes to the vdso library for arm64 and the introduction of
the compat vdso library have generated some misalignment in the
Makefiles.
Cleanup the Makefiles for vdso and vdso32 libraries:
* Removing unused rules.
* Unifying the displayed compilation messages.
* Simplifying the generic library inclusion path for
arm64 vdso.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Running "make" on an already compiled kernel tree will rebuild the kernel
even without any modifications:
$ make ARCH=arm64 CROSS_COMPILE=/usr/bin/aarch64-unknown-linux-gnu-
arch/arm64/Makefile:58: CROSS_COMPILE_COMPAT not defined or empty, the compat vDSO will not be built
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
VDSOCHK arch/arm64/kernel/vdso/vdso.so.dbg
VDSOSYM include/generated/vdso-offsets.h
CHK include/generated/compile.h
CC arch/arm64/kernel/signal.o
CC arch/arm64/kernel/vdso.o
CC arch/arm64/kernel/signal32.o
LD arch/arm64/kernel/vdso/vdso.so.dbg
OBJCOPY arch/arm64/kernel/vdso/vdso.so
AS arch/arm64/kernel/vdso/vdso.o
AR arch/arm64/kernel/vdso/built-in.a
AR arch/arm64/kernel/built-in.a
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
AR init/built-in.a
LD vmlinux.o
This is the same bug fixed in commit 92a4728608a8 ("x86/boot: Fix
if_changed build flip/flop bug"). We cannot use two "if_changed" in one
target. Fix this build bug by merging two commands into one function.
Fixes: a7f71a2c8903 ("arm64: compat: Add vDSO")
Fixes: 28b1a824a4f4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
[will: merged in compat fix from Vincenzo and made rule names consistent]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Prior to the introduction of Unified vDSO support and compat layer for
vDSO on arm64, AT_SYSINFO_EHDR was not defined for compat tasks.
In the current implementation, AT_SYSINFO_EHDR is defined even if the
compat vdso layer is not built, which has been shown to break Android
applications using bionic:
| 01-01 01:22:14.097 755 755 F libc : Fatal signal 11 (SIGSEGV),
| code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 755 (cameraserver),
| pid 755 (cameraserver)
| 01-01 01:22:14.112 759 759 F libc : Fatal signal 11 (SIGSEGV),
| code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 759
| (android.hardwar), pid 759 (android.hardwar)
| 01-01 01:22:14.120 756 756 F libc : Fatal signal 11 (SIGSEGV)
| code 1 (SEGV_MAPERR), fault addr 0x3cf2c96c in tid 756 (drmserver),
| pid 756 (drmserver)
Restore the old behaviour by making sure that AT_SYSINFO_EHDR for compat
tasks is defined only when CONFIG_COMPAT_VDSO is enabled.
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The MMIO RAPL interface driver depends on both powercap subsystem and
the intel_rapl_common code.
But when all of them are built-in, the MMIO RAPL interface driver can
be loaded before the other two and this breaks the system during boot.
Fix this by adjusting the init order of the powercap subsystem and the
intel_rapl_common code, so that it can be initialized first.
Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support for MMIO RAPL")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The generic SMP variant of arch_futex_atomic_op_inuser() returns always
-ENOSYS so the switch case and surrounding code are pointless. Remove it
and just return -ENOSYS.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/12bdaca8-99eb-e576-f842-5970ab1d6a92@virtuozzo.com
|
|
Commit e6401c130931 ("x86/irq/64: Split the IRQ stack into its own pages")
missed to update one piece of comment as it did to its peer in Xen, which
will confuse people who still need to read comment.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190719081635.26528-1-caoj.fnst@cn.fujitsu.com
|
|
Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but
advertise a landscape resolution and pitch, resulting in a messed up
display if the kernel tries to show anything on the efifb (because of the
wrong pitch).
Fix this by adding a new DMI match table for devices which need to have
their width and height swapped.
At first it was tried to use the existing table for overriding some of the
efifb parameters, but some of the affected devices have variants with
different LCD resolutions which will not work with hardcoded override
values.
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.com
|
|
When arch_stack_walk_user() is called from atomic contexts, access_ok() can
trigger the following warning if compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y.
Reproducer:
// CONFIG_DEBUG_ATOMIC_SLEEP=y
# cd /sys/kernel/debug/tracing
# echo 1 > options/userstacktrace
# echo 1 > events/irq/irq_handler_entry/enable
WARNING: CPU: 0 PID: 2649 at arch/x86/kernel/stacktrace.c:103 arch_stack_walk_user+0x6e/0xf6
CPU: 0 PID: 2649 Comm: bash Not tainted 5.3.0-rc1+ #99
RIP: 0010:arch_stack_walk_user+0x6e/0xf6
Call Trace:
<IRQ>
stack_trace_save_user+0x10a/0x16d
trace_buffer_unlock_commit_regs+0x185/0x240
trace_event_buffer_commit+0xec/0x330
trace_event_raw_event_irq_handler_entry+0x159/0x1e0
__handle_irq_event_percpu+0x22d/0x440
handle_irq_event_percpu+0x70/0x100
handle_irq_event+0x5a/0x8b
handle_edge_irq+0x12f/0x3f0
handle_irq+0x34/0x40
do_IRQ+0xa6/0x1f0
common_interrupt+0xf/0xf
</IRQ>
Fix it by calling __range_not_ok() directly instead of access_ok() as
copy_from_user_nmi() does. This is fine here because the actual copy is
inside a pagefault disabled region.
Reported-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190722083216.16192-2-devel@etsukata.com
|
|
On x86-32 with PTI enabled, parts of the kernel page-tables are not shared
between processes. This can cause mappings in the vmalloc/ioremap area to
persist in some page-tables after the region is unmapped and released.
When the region is re-used the processes with the old mappings do not fault
in the new mappings but still access the old ones.
This causes undefined behavior, in reality often data corruption, kernel
oopses and panics and even spontaneous reboots.
Fix this problem by activly syncing unmaps in the vmalloc/ioremap area to
all page-tables in the system before the regions can be re-used.
References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-4-joro@8bytes.org
|
|
With huge-page ioremap areas the unmappings also need to be synced between
all page-tables. Otherwise it can cause data corruption when a region is
unmapped and later re-used.
Make the vmalloc_sync_one() function ready to sync unmappings and make sure
vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD
is found.
Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.org
|
|
Do not require a struct page for the mapped memory location because it
might not exist. This can happen when an ioremapped region is mapped with
2MB pages.
Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-2-joro@8bytes.org
|
|
A few boards set clock frequency of their I2C buses with
"clock_frequency" property. The right property is "clock-frequency".
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|