summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-02-24arm64: Add missing ISB after invalidating TLB in __primary_switchMarc Zyngier
Although there has been a bit of back and forth on the subject, it appears that invalidating TLBs requires an ISB instruction when FEAT_ETS is not implemented by the CPU. From the bible: | In an implementation that does not implement FEAT_ETS, a TLB | maintenance instruction executed by a PE, PEx, can complete at any | time after it is issued, but is only guaranteed to be finished for a | PE, PEx, after the execution of DSB by the PEx followed by a Context | synchronization event Add the missing ISB in __primary_switch, just in case. Fixes: 3c5e9f238bc4 ("arm64: head.S: move KASLR processing out of __enable_mmu()") Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210224093738.3629662-3-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-02-24arm64: VHE: Enable EL2 MMU from the idmapMarc Zyngier
Enabling the MMU requires the write to SCTLR_ELx (and the ISB that follows) to live in some identity-mapped memory. Otherwise, the translation will result in something totally unexpected (either fetching the wrong instruction stream, or taking a fault of some sort). This is exactly what happens in mutate_to_vhe(), as this code lives in the .hyp.text section, which isn't identity-mapped. With the right configuration, this explodes badly. Extract the MMU-enabling part of mutate_to_vhe(), and move it to its own function that lives in the idmap. This ensures nothing bad happens. Fixes: f359182291c7 ("arm64: Provide an 'upgrade to VHE' stub hypercall") Reported-by: "kernelci.org bot" <bot@kernelci.org> Tested-by: Guillaume Tucker <guillaume.tucker@collabora.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210224093738.3629662-2-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-02-24KVM: arm64: make the hyp vector table entries localJoey Gouly
Make the hyp vector table entries local functions so they are not accidentally referred to outside of this file. Using SYM_CODE_START_LOCAL matches the other vector tables (in hyp-stub.S, hibernate-asm.S and entry.S) Signed-off-by: Joey Gouly <joey.gouly@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210222164956.43514-1-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-02-24initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRDGeert Uytterhoeven
Commit be1859bdc660 ("initramfs: remove redundant dependency on BLK_DEV_INITRD") removed all redundant dependencies on BLK_DEV_INITRD, but the recent addition of zstd support introduced a new one. Fixes: a30d8a39f057 ("usr: Add support for zstd compressed initramfs") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kbuild: remove deprecated 'always' and 'hostprogs-y/m'Masahiro Yamada
These have no more user in the upstream code. The use of them has been warned for a while for external modules. The migration is finished. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kbuild: parse C= and M= before changing the working directoryMasahiro Yamada
If Kbuild recurses to the top Makefile (for example, 'make deb-pkg'), C= and M= are parsed over again, needlessly. Parse them before changing the working directory. After that, sub_make_done is set to 1, so they are parsed just once. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kbuild: reuse this-makefile to define abs_srctreeMasahiro Yamada
Move this-makefile up, and reuse it to define abs_srctree. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfigMasahiro Yamada
Unify the similar build rules. This supports 'make build_config', which builds scripts/kconfig/conf but does not invoke it. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kconfig: omit --oldaskconfig option for 'make config'Masahiro Yamada
scripts/kconfig/conf.c line 39 defines the default of input_mode as oldaskconfig. Hence, 'make config' works in the same way even without the --oldaskconfig option given. Note this in the help message. This will be helpful to unify build rules in Makefile in the next commit. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kconfig: fix 'invalid option' for help optionMasahiro Yamada
scripts/kconfig/conf supports -? option to show the help message. This is not wired up to Makefile, so nobody would notice this, but it also shows 'invalid option' message. $ ./scripts/kconfig/conf -? ./scripts/kconfig/conf: invalid option -- '?' Usage: ./scripts/kconfig/conf [-s] [option] <kconfig-file> [option] is _one_ of the following: --listnewconfig List new options --helpnewconfig List new options and help text --oldaskconfig Start a new configuration using a line-oriented program ... The reason is the '?' is missing in the short option list passed to getopt_long(). While I fixed this issue, I also changed the option '?' to 'h'. I prefer -h (or --help, if a long option is also desired). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kconfig: remove dead code in conf_askvalue()Masahiro Yamada
conf_askvalue() is only called for oldconfig, syncconfig, and oldaskconfig. If it is called for other cases, it is a bug. So, the code after the switch statement is unreachable. Remove the dead code, and clean up the switch statement. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kconfig: clean up nested if-conditionals in check_conf()Masahiro Yamada
Unify the outer two if-conditionals into one. This decreases the indent level by one. Also, change the if-else blocks: if (input_mode == listnewconfig) { ... } else if (input_mode == helpnewconfig) { ... } else { ... } into the switch statement. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24kconfig: Remove duplicate call to sym_get_string_value()Mickaël Salaün
Use the saved returned value of sym_get_string_value() instead of calling it twice. Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Link: https://lore.kernel.org/r/20210215181511.2840674-2-mic@digikod.net Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24Makefile: Remove # characters from compiler stringNathan Chancellor
When using AMD's Optimizing C/C++ Compiler (AOCC), the build fails due to a # character in the version string, which is interpreted as a comment: $ make CC=clang defconfig init/main.o include/config/auto.conf.cmd:1374: *** invalid syntax in conditional. Stop. $ sed -n 1374p include/config/auto.conf.cmd ifneq "$(CC_VERSION_TEXT)" "AMD clang version 11.0.0 (CLANG: AOCC_2.3.0-Build#85 2020_11_10) (based on LLVM Mirror.Version.11.0.0)" Remove all # characters in the version string so that the build does not fail unexpectedly. Link: https://github.com/ClangBuiltLinux/linux/issues/1298 Reported-by: Michael Fuckner <michael@fuckner.net> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-24Makefile: reuse CC_VERSION_TEXTNick Desaulniers
I noticed we're invoking $(CC) via $(shell) more than once to check the version. Let's reuse the first string captured in $CC_VERSION_TEXT. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> [masahiro.yamada: CC_VERSION_TEXT is assigned by = instead of :=, so this $(shell ) is evaluated multiple times anyway. The number of $(CC) invocations will be still the same. Replacing 'grep' with the built-in $(findstring ) will give real performance benefit.] Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-23dts: drop dangling c6x symlinkMichal Kubecek
With c6x architecture removal, scripts/dtc/include-prefixes/c6x symlink lost its target. Drop the dangling symlink which triggers some distribution check scripts. Fixes: a579fcfa8e49 ("c6x: remove architecture") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210223204114.E7F55E0155@unicorn.suse.cz
2021-02-23dt-bindings: bcm2711-hdmi: Fix broken schemaMaxime Ripard
For some reason, unevaluatedProperties doesn't work and additionalProperties is required. Fix it by switching to additionalProperties. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://lore.kernel.org/r/20210218152837.1080819-1-maxime@cerno.tech Signed-off-by: Rob Herring <robh@kernel.org>
2021-02-23io-wq: fix race around io_worker grabbingJens Axboe
There's a small window between lookup dropping the reference to the worker and calling wake_up_process() on the worker task, where the worker itself could have exited. We ensure that the worker struct itself is valid, but worker->task may very well be gone by the time we issue the wakeup. Fix the race by using a completion triggered by the reference going to zero, and having exit wait for that completion before proceeding. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23io-wq: fix races around manager/worker creation and task exitJens Axboe
These races have always been there, they are just more apparent now that we do early cancel of io-wq when the task exits. Ensure that the io-wq manager sets task state correctly to not miss wakeups for task creation. This is important if we get a wakeup after having marked ourselves as TASK_INTERRUPTIBLE. If we do end up creating workers, then we flip the state back to running, making the subsequent schedule() a no-op. Also increment the wq ref count before forking the thread, to avoid a use-after-free. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23io_uring: ensure io-wq context is always destroyed for tasksJens Axboe
If the task ends up doing no IO, the context list is empty and we don't call into __io_uring_files_cancel() when the task exits. This can cause a leak of the io-wq structures. Ensure we always call __io_uring_files_cancel(), even if the task context list is empty. Fixes: 5aa75ed5b93f ("io_uring: tie async worker side to the task context") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()Jens Axboe
In the arch addition of PF_IO_WORKER, I missed parisc and powerpc for some reason. Fix that up, ensuring they handle PF_IO_WORKER like they do PF_KTHREAD in copy_thread(). Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Fixes: 4727dc20e042 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23io_uring: cleanup ->user usageJens Axboe
At this point we're only using it for memory accounting, so there's no need to have an extra ->limit_mem - we can just set ->user if we do the accounting, or leave it at NULL if we don't. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23io-wq: remove nr_process accountingJens Axboe
We're now just using fork like we would from userspace, so there's no need to try and impose extra restrictions or accounting on the user side of things. That's already being done for us. That also means we don't have to pass in the user_struct anymore, that's correctly inherited through ->creds on fork. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERSJens Axboe
A few reasons to do this: - The naming of the manager and worker have changed. That's a user visible change, so makes sense to flag it. - Opening certain files that use ->signal (like /proc/self or /dev/tty) now works, and the flag tells the application upfront that this is the case. - Related to the above, using signalfd will now work as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23net: remove cmsg restriction from io_uring based send/recvmsg callsJens Axboe
No need to restrict these anymore, as the worker threads are direct clones of the original task. Hence we know for a fact that we can support anything that the regular task can. Since the only user of proto_ops->flags was to flag PROTO_CMSG_DATA_ONLY, kill the member and the flag definition too. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23Revert "proc: don't allow async path resolution of /proc/self components"Jens Axboe
This reverts commit 8d4c3e76e3be11a64df95ddee52e99092d42fc19. No longer needed, as the io-wq worker threads have the right identity. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23Revert "proc: don't allow async path resolution of /proc/thread-self components"Jens Axboe
This reverts commit 0d4370cfe36b7f1719123b621a4ec4d9c7a25f89. No longer needed, as the io-wq worker threads have the right identity. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23block: fix logging on capacity changeMing Lei
Local variable of 'capacity' stores the previous disk capacity, and 'size' variable records the latest disk capacity, so swap them for fixing logging on capacity change. Cc: Christoph Hellwig <hch@lst.de> Fixes: a782483cc1f8 ("block: remove the nr_sects field in struct hd_struct") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23blk-settings: align max_sectors on "logical_block_size" boundaryMikulas Patocka
We get I/O errors when we run md-raid1 on the top of dm-integrity on the top of ramdisk. device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1 device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1 device-mapper: integrity: Bio not aligned on 8 sectors: 0x8048, 0xff device-mapper: integrity: Bio not aligned on 8 sectors: 0x8147, 0xff device-mapper: integrity: Bio not aligned on 8 sectors: 0x8246, 0xff device-mapper: integrity: Bio not aligned on 8 sectors: 0x8345, 0xbb The ramdisk device has logical_block_size 512 and max_sectors 255. The dm-integrity device uses logical_block_size 4096 and it doesn't affect the "max_sectors" value - thus, it inherits 255 from the ramdisk. So, we have a device with max_sectors not aligned on logical_block_size. The md-raid device sees that the underlying leg has max_sectors 255 and it will split the bios on 255-sector boundary, making the bios unaligned on logical_block_size. In order to fix the bug, we round down max_sectors to logical_block_size. Cc: stable@vger.kernel.org Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23block: reopen the device in blkdev_reread_partChristoph Hellwig
Historically the BLKRRPART ioctls called into the now defunct ->revalidate method, which caused the sd driver to check if any media is present. When the ->revalidate method was removed this revalidation was lost, leading to lots of I/O errors when using the eject command. Fix this by reopening the device to rescan the partitions, and thus calling the revalidation logic in the sd driver. Fixes: 471bd0af544b ("sd: use bdev_check_media_change") Reported--by: Tom Seewald <tseewald@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Tom Seewald <tseewald@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23io_uring: fix locked_free_list caches_free()Pavel Begunkov
Don't forget to zero locked_free_nr, it's not a disaster but makes it attempting to flush it with extra locking when there is nothing in the list. Also, don't traverse a potentially long list freeing requests under spinlock, splice the list and do it afterwards. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23io_uring: don't attempt IO reissue from the ring exit pathJens Axboe
If we're exiting the ring, just let the IO fail with -EAGAIN as nobody will care anyway. It's not the right context to reissue from. Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23Merge branch 'for-5.12/dax' into for-5.12/libnvdimmDan Williams
Pick up device-dax updates to merge with libnvdimm device updates for 5.12. * Fix the polarity of EINVAL in a sysfs return code * Drop the unused return code for driver remove() callbacks
2021-02-23Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-02-22 Dave corrects reporting of max TCs to use the value from hardware capabilities and setting of DCBx capability bits when changing between SW and FW LLDP. Brett fixes trusted VF multicast promiscuous not receiving expected packets and corrects VF max packet size when a port VLAN is configured. Henry updates available RSS queues following a change in channel count with a user defined LUT. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: update the number of available RSS queues ice: Fix state bits on LLDP mode switch ice: Account for port VLAN in VF max packet size calculation ice: Set trusted VF as default VSI when setting allmulti on ice: report correct max number of TCs ==================== Link: https://lore.kernel.org/r/20210222235814.834282-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23Merge tag 'keys-misc-20210126' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring updates from David Howells: "Here's a set of minor keyrings fixes/cleanups that I've collected from various people for the upcoming merge window. A couple of them might, in theory, be visible to userspace: - Make blacklist_vet_description() reject uppercase letters as they don't match the all-lowercase hex string generated for a blacklist search. This may want reconsideration in the future, but, currently, you can't add to the blacklist keyring from userspace and the only source of blacklist keys generates lowercase descriptions. - Fix blacklist_init() to use a new KEY_ALLOC_* flag to indicate that it wants KEY_FLAG_KEEP to be set rather than passing KEY_FLAG_KEEP into keyring_alloc() as KEY_FLAG_KEEP isn't a valid alloc flag. This isn't currently a problem as the blacklist keyring isn't currently writable by userspace. The rest of the patches are cleanups and I don't think they should have any visible effect" * tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: watch_queue: rectify kernel-doc for init_watch() certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID certs: Fix blacklist flag type confusion PKCS#7: Fix missing include certs: Fix blacklisted hexadecimal hash string check certs/blacklist: fix kernel doc interface issue crypto: public_key: Remove redundant header file from public_key.h keys: remove trailing semicolon in macro definition crypto: pkcs7: Use match_string() helper to simplify the code PKCS#7: drop function from kernel-doc pkcs7_validate_trust_one encrypted-keys: Replace HTTP links with HTTPS ones crypto: asymmetric_keys: fix some comments in pkcs7_parser.h KEYS: remove redundant memset security: keys: delete repeated words in comments KEYS: asymmetric: Fix kerneldoc security/keys: use kvfree_sensitive() watch_queue: Drop references to /dev/watch_queue keys: Remove outdated __user annotations security: keys: Fix fall-through warnings for Clang
2021-02-23Merge branch 'wireguard-fixes-for-5-12-rc1'Jakub Kicinski
Jason Donenfeld says: ==================== wireguard fixes for 5.12-rc1 This series has a collection of fixes that have piled up for a little while now, that I unfortunately didn't get a chance to send out earlier. 1) Removes unlikely() from IS_ERR(), since it's already implied. 2) Remove a bogus sparse annotation that hasn't been needed for years. 3) Addition test in the test suite for stressing parallel ndo_start_xmit. 4) Slight struct reordering in preparation for subsequent fix. 5) If skb->protocol is bogus, we no longer attempt to send icmp messages. 6) Massive memory usage fix, hit by larger deployments. 7) Fix typo in kconfig dependency logic. (1) and (2) are tiny cleanups, and (3) is just a test, so if you're trying to reduce churn, you could not backport these. But (4), (5), (6), and (7) fix problems and should be applied to stable. IMO, it's probably easiest to just apply them all to stable. ==================== Link: https://lore.kernel.org/r/20210222162549.3252778-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23wireguard: kconfig: use arm chacha even with no neonJason A. Donenfeld
The condition here was incorrect: a non-neon fallback implementation is available on arm32 when NEON is not supported. Reported-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23wireguard: queueing: get rid of per-peer ring buffersJason A. Donenfeld
Having two ring buffers per-peer means that every peer results in two massive ring allocations. On an 8-core x86_64 machine, this commit reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which is an 90% reduction. Ninety percent! With some single-machine deployments approaching 500,000 peers, we're talking about a reduction from 7 gigs of memory down to 700 megs of memory. In order to get rid of these per-peer allocations, this commit switches to using a list-based queueing approach. Currently GSO fragments are chained together using the skb->next pointer (the skb_list_* singly linked list approach), so we form the per-peer queue around the unused skb->prev pointer (which sort of makes sense because the links are pointing backwards). Use of skb_queue_* is not possible here, because that is based on doubly linked lists and spinlocks. Multiple cores can write into the queue at any given time, because its writes occur in the start_xmit path or in the udp_recv path. But reads happen in a single workqueue item per-peer, amounting to a multi-producer, single-consumer paradigm. The MPSC queue is implemented locklessly and never blocks. However, it is not linearizable (though it is serializable), with a very tight and unlikely race on writes, which, when hit (some tiny fraction of the 0.15% of partial adds on a fully loaded 16-core x86_64 system), causes the queue reader to terminate early. However, because every packet sent queues up the same workqueue item after it is fully added, the worker resumes again, and stopping early isn't actually a problem, since at that point the packet wouldn't have yet been added to the encryption queue. These properties allow us to avoid disabling interrupts or spinning. The design is based on Dmitry Vyukov's algorithm [1]. Performance-wise, ordinarily list-based queues aren't preferable to ringbuffers, because of cache misses when following pointers around. However, we *already* have to follow the adjacent pointers when working through fragments, so there shouldn't actually be any change there. A potential downside is that dequeueing is a bit more complicated, but the ptr_ring structure used prior had a spinlock when dequeueing, so all and all the difference appears to be a wash. Actually, from profiling, the biggest performance hit, by far, of this commit winds up being atomic_add_unless(count, 1, max) and atomic_ dec(count), which account for the majority of CPU time, according to perf. In that sense, the previous ring buffer was superior in that it could check if it was full by head==tail, which the list-based approach cannot do. But all and all, this enables us to get massive memory savings, allowing WireGuard to scale for real world deployments, without taking much of a performance hit. [1] http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23wireguard: device: do not generate ICMP for non-IP packetsJason A. Donenfeld
If skb->protocol doesn't match the actual skb->data header, it's probably not a good idea to pass it off to icmp{,v6}_ndo_send, which is expecting to reply to a valid IP packet. So this commit has that early mismatch case jump to a later error label. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23wireguard: peer: put frequently used members above cache linesJason A. Donenfeld
The is_dead boolean is checked for every single packet, while the internal_id member is used basically only for pr_debug messages. So it makes sense to hoist up is_dead into some space formerly unused by a struct hole, while demoting internal_api to below the lowest struct cache line. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23wireguard: selftests: test multiple parallel streamsJason A. Donenfeld
In order to test ndo_start_xmit being called in parallel, explicitly add separate tests, which should all run on different cores. This should help tease out bugs associated with queueing up packets from different cores in parallel. Currently, it hasn't found those types of bugs, but given future planned work, this is a useful regression to avoid. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23wireguard: socket: remove bogus __be32 annotationJann Horn
The endpoint->src_if4 has nothing to do with fixed-endian numbers; remove the bogus annotation. This was introduced in https://git.zx2c4.com/wireguard-monolithic-historical/commit?id=14e7d0a499a676ec55176c0de2f9fcbd34074a82 in the historical WireGuard repo because the old code used to zero-initialize multiple members as follows: endpoint->src4.s_addr = endpoint->src_if4 = fl.saddr = 0; Because fl.saddr is fixed-endian and an assignment returns a value with the type of its left operand, this meant that sparse detected an assignment between values of different endianness. Since then, this assignment was already split up into separate statements; just the cast survived. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23wireguard: avoid double unlikely() notation when using IS_ERR()Antonio Quartulli
The definition of IS_ERR() already applies the unlikely() notation when checking the error status of the passed pointer. For this reason there is no need to have the same notation outside of IS_ERR() itself. Clean up code by removing redundant notation. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-23io_uring: move SQPOLL thread io-wq forked workerJens Axboe
Don't use a kthread for SQPOLL, use a forked worker just like the io-wq workers. With that done, we can drop the various context grabbing we do for SQPOLL, it already has everything it needs. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23net: qrtr: Fix memory leak in qrtr_tun_openTakeshi Misawa
If qrtr_endpoint_register() failed, tun is leaked. Fix this, by freeing tun in error path. syzbot report: BUG: memory leak unreferenced object 0xffff88811848d680 (size 64): comm "syz-executor684", pid 10171, jiffies 4294951561 (age 26.070s) hex dump (first 32 bytes): 80 dd 0a 84 ff ff ff ff 00 00 00 00 00 00 00 00 ................ 90 d6 48 18 81 88 ff ff 90 d6 48 18 81 88 ff ff ..H.......H..... backtrace: [<0000000018992a50>] kmalloc include/linux/slab.h:552 [inline] [<0000000018992a50>] kzalloc include/linux/slab.h:682 [inline] [<0000000018992a50>] qrtr_tun_open+0x22/0x90 net/qrtr/tun.c:35 [<0000000003a453ef>] misc_open+0x19c/0x1e0 drivers/char/misc.c:141 [<00000000dec38ac8>] chrdev_open+0x10d/0x340 fs/char_dev.c:414 [<0000000079094996>] do_dentry_open+0x1e6/0x620 fs/open.c:817 [<000000004096d290>] do_open fs/namei.c:3252 [inline] [<000000004096d290>] path_openat+0x74a/0x1b00 fs/namei.c:3369 [<00000000b8e64241>] do_filp_open+0xa0/0x190 fs/namei.c:3396 [<00000000a3299422>] do_sys_openat2+0xed/0x230 fs/open.c:1172 [<000000002c1bdcef>] do_sys_open fs/open.c:1188 [inline] [<000000002c1bdcef>] __do_sys_openat fs/open.c:1204 [inline] [<000000002c1bdcef>] __se_sys_openat fs/open.c:1199 [inline] [<000000002c1bdcef>] __x64_sys_openat+0x7f/0xe0 fs/open.c:1199 [<00000000f3a5728f>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 [<000000004b38b7ec>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 28fb4e59a47d ("net: qrtr: Expose tunneling endpoint to user space") Reported-by: syzbot+5d6e4af21385f5cfc56a@syzkaller.appspotmail.com Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com> Link: https://lore.kernel.org/r/20210221234427.GA2140@DESKTOP Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-24s390/cpumf: Add support for complete counter set extractionThomas Richter
Add support to the CPU Measurement counter facility device driver to extract complete counter sets per CPU and per counter set from user space. This includes a new device named /dev/hwctr and support for the device driver functions open, close and ioctl. Other functions are not supported. The ioctl command supports 3 subcommands: S390_HWCTR_START: enables counter sets on a list of CPUs. S390_HWCTR_STOP: disables counter sets on a list of CPUs. S390_HWCTR_READ: reads counter sets on a list of CPUs. The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which returns data. It requires member data_bytes to be positive and indicates the maximum amount of data available to store counter set data. The other ioctl() subcommands do not use this member and it should be set to zero. The S390_HWCTR_READ subcommand returns the following data: The cpuset data is flattened using the following scheme, stored in member data: 0x0 0x8 0xc 0x10 0x10 0x18 0x20 0x28 0xU-1 +---------+-----+---------+-----+---------+-----+-----+------+------+ | no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n | +---------+-----+---------+-----+---------+-----+-----+------+------+ 0xU 0xU+4 0xU+8 0xU+10 0xV-1 +-----+---------+-----+-----+------+------+ | set | no_cnts | cv1 | cv2 | .... | cv_n | +-----+---------+-----+-----+------+------+ 0xV 0xV+4 0xV+8 0xV+c +-----+---------+-----+---------+-----+-----+------+------+ | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n | +-----+---------+-----+---------+-----+-----+------+------+ U and V denote arbitrary hexadezimal addresses. The first integer represents the number of CPUs data was extracted from. This is followed by CPU number and number of counter sets extracted. Both are two integer values. This is followed by the set identifer and number of counters extracted. Both are two integer values. This is followed by the counter values, each element is eight bytes in size. The S390_HWCTR_READ ioctl subcommand is also limited to one call per minute. This ensures that an application does not read out the counter sets too often and reduces the overall CPU performance. The complete counter set extraction is an expensive operation. Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24virtio/s390: implement virtio-ccw revision 2 correctlyCornelia Huck
CCW_CMD_READ_STATUS was introduced with revision 2 of virtio-ccw, and drivers should only rely on it being implemented when they negotiated at least that revision with the device. However, virtio_ccw_get_status() issued READ_STATUS for any device operating at least at revision 1. If the device accepts READ_STATUS regardless of the negotiated revision (which some implementations like QEMU do, even though the spec currently does not allow it), everything works as intended. While a device rejecting the command should also be handled gracefully, we will not be able to see any changes the device makes to the status, such as setting NEEDS_RESET or setting the status to zero after a completed reset. We negotiated the revision to at most 1, as we never bumped the maximum revision; let's do that now and properly send READ_STATUS only if we are operating at least at revision 2. Cc: stable@vger.kernel.org Fixes: 7d3ce5ab9430 ("virtio/s390: support READ_STATUS command for virtio-ccw") Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20210216110645.1087321-1-cohuck@redhat.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24s390/smp: implement arch_irq_work_raise()Ilya Leoshkevich
The immediate need to have this is to have bpf_send_signal() send the signal ASAP instead of during the next hrtimer interrupt. However, it should also improve irq_work_queue() latencies in general, as well as get s390 out of the lame architectures list [1]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq_work.c?h=v5.11#n45 Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24s390/topology: move cpumasks away from stackHeiko Carstens
Make cpumasks static variables to avoid potential large stack frames. There shouldn't be any concurrent callers since all current callers are serialized with the cpu hotplug lock. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-24s390/smp: smp_emergency_stop() - move cpumask away from stackHeiko Carstens
Make "cpumask_t cpumask" a static variable to avoid a potential large stack frame. Also protect against potential concurrent callers by introducing a local lock. Note: smp_emergency_stop() gets only called with irqs and machine checks disabled, therefore a cpu local deadlock is not possible. For concurrent callers the first cpu which enters the critical section wins and will stop all other cpus. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>