summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-06-24KVM: stats: Support binary stats retrieval for a VCPUJing Zhang
Add a VCPU ioctl to get a statistics file descriptor by which a read functionality is provided for userspace to read out VCPU stats header, descriptors and data. Define VCPU statistics descriptors and header for all architectures. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-5-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: stats: Support binary stats retrieval for a VMJing Zhang
Add a VM ioctl to get a statistics file descriptor by which a read functionality is provided for userspace to read out VM stats header, descriptors and data. Define VM statistics descriptors and header for all architectures. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-4-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24sctp: do black hole detection in search complete stateXin Long
Currently the PLPMUTD probe will stop for a long period (interval * 30) after it enters search complete state. If there's a pmtu change on the route path, it takes a long time to be aware if the ICMP TooBig packet is lost or filtered. As it says in rfc8899#section-4.3: "A DPLPMTUD method MUST NOT rely solely on this method." (ICMP PTB message). This patch is to enable the other method for search complete state: "A PL can use the DPLPMTUD probing mechanism to periodically generate probe packets of the size of the current PLPMTU." With this patch, the probe will continue with the current pmtu every 'interval' until the PMTU_RAISE_TIMER 'timeout', which we implement by adding raise_count to raise the probe size when it counts to 30 and removing the SCTP_PL_COMPLETE check for PMTU_RAISE_TIMER. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24net: macsec: fix the length used to copy the key for offloadingAntoine Tenart
The key length used when offloading macsec to Ethernet or PHY drivers was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length accepted in uAPI) and secy->key_len to copy it. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24PCI: Export pci_dev_trylock() and pci_dev_unlock()Luis Chamberlain
Other places in the kernel use this form, and so just provide a common path for it. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20210623022824.308041-2-mcgrof@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-24libceph: set global_id as soon as we get an auth ticketIlya Dryomov
Commit 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") delayed the setting of global_id too much. It is set only after all tickets are received, but in pre-nautilus clusters an auth ticket and the service tickets are obtained in separate steps (for a total of three MAuth replies). When the service tickets are requested, global_id is used to build an authorizer; if global_id is still 0 we never get them and fail to establish the session. Moving the setting of global_id into protocol implementations. This way global_id can be set exactly when an auth ticket is received, not sooner nor later. Fixes: 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-06-24libceph: don't pass result into ac->ops->handle_reply()Ilya Dryomov
There is no result to pass in msgr2 case because authentication failures are reported through auth_bad_method frame and in MAuth case an error is returned immediately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-06-24net: mdiobus: fix fwnode_mdbiobus_register() fallback caseMarcin Wojtas
The fallback case of fwnode_mdbiobus_register() (relevant for !CONFIG_FWNODE_MDIO) was defined with wrong argument name, causing a compilation error. Fix that. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24net: retrieve netns cookie via getsocketoptMartynas Pumputis
It's getting more common to run nested container environments for testing cloud software. One of such examples is Kind [1] which runs a Kubernetes cluster in Docker containers on a single host. Each container acts as a Kubernetes node, and thus can run any Pod (aka container) inside the former. This approach simplifies testing a lot, as it eliminates complicated VM setups. Unfortunately, such a setup breaks some functionality when cgroupv2 BPF programs are used for load-balancing. The load-balancer BPF program needs to detect whether a request originates from the host netns or a container netns in order to allow some access, e.g. to a service via a loopback IP address. Typically, the programs detect this by comparing netns cookies with the one of the init ns via a call to bpf_get_netns_cookie(NULL). However, in nested environments the latter cannot be used given the Kubernetes node's netns is outside the init ns. To fix this, we need to pass the Kubernetes node netns cookie to the program in a different way: by extending getsockopt() with a SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes node netns can retrieve the cookie and pass it to the program instead. Thus, this is following up on Eric's commit 3d368ab87cf6 ("net: initialize net->net_cookie at netns setup") to allow retrieval via SO_NETNS_COOKIE. This is also in line in how we retrieve socket cookie via SO_COOKIE. [1] https://kind.sigs.k8s.io/ Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24block: pass a gendisk to bdev_disk_changedChristoph Hellwig
bdev_disk_changed can only operate on whole devices. Make that clear by passing a gendisk instead of the struct block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210624123240.441814-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-24block: move bdev_disk_changedChristoph Hellwig
Move bdev_disk_changed to block/partitions/core.c, together with the rest of the partition scanning code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210624123240.441814-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-24xdp: Add proper __rcu annotations to redirect map entriesToke Høiland-Jørgensen
XDP_REDIRECT works by a three-step process: the bpf_redirect() and bpf_redirect_map() helpers will lookup the target of the redirect and store it (along with some other metadata) in a per-CPU struct bpf_redirect_info. Next, when the program returns the XDP_REDIRECT return code, the driver will call xdp_do_redirect() which will use the information thus stored to actually enqueue the frame into a bulk queue structure (that differs slightly by map type, but shares the same principle). Finally, before exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will flush all the different bulk queues, thus completing the redirect. Pointers to the map entries will be kept around for this whole sequence of steps, protected by RCU. However, there is no top-level rcu_read_lock() in the core code; instead drivers add their own rcu_read_lock() around the XDP portions of the code, but somewhat inconsistently as Martin discovered[0]. However, things still work because everything happens inside a single NAPI poll sequence, which means it's between a pair of calls to local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could document this intention by using rcu_dereference_check() with rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and lockdep to verify that everything is done correctly. This patch does just that: we add an __rcu annotation to the map entry pointers and remove the various comments explaining the NAPI poll assurance strewn through devmap.c in favour of a longer explanation in filter.c. The goal is to have one coherent documentation of the entire flow, and rely on the RCU annotations as a "standard" way of communicating the flow in the map code (which can additionally be understood by sparse and lockdep). The RCU annotation replacements result in a fairly straight-forward replacement where READ_ONCE() becomes rcu_dereference_check(), WRITE_ONCE() becomes rcu_assign_pointer() and xchg() and cmpxchg() gets wrapped in the proper constructs to cast the pointer back and forth between __rcu and __kernel address space (for the benefit of sparse). The one complication is that xskmap has a few constructions where double-pointers are passed back and forth; these simply all gain __rcu annotations, and only the final reference/dereference to the inner-most pointer gets changed. With this, everything can be run through sparse without eliciting complaints, and lockdep can verify correctness even without the use of rcu_read_lock() in the drivers. Subsequent patches will clean these up from the drivers. [0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/ [1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-6-toke@redhat.com
2021-06-24rcu: Create an unrcu_pointer() to remove __rcu from a pointerPaul E. McKenney
The xchg() and cmpxchg() functions are sometimes used to carry out RCU updates. Unfortunately, this can result in sparse warnings for both the old-value and new-value arguments, as well as for the return value. The arguments can be dealt with using RCU_INITIALIZER(): old_p = xchg(&p, RCU_INITIALIZER(new_p)); But a sparse warning still remains due to assigning the __rcu pointer returned from xchg to the (most likely) non-__rcu pointer old_p. This commit therefore provides an unrcu_pointer() macro that strips the __rcu. This macro can be used as follows: old_p = unrcu_pointer(xchg(&p, RCU_INITIALIZER(new_p))); Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-2-toke@redhat.com
2021-06-24KVM: stats: Add fd-based API to read binary stats dataJing Zhang
This commit defines the API for userspace and prepare the common functionalities to support per VM/VCPU binary stats data readings. The KVM stats now is only accessible by debugfs, which has some shortcomings this change series are supposed to fix: 1. The current debugfs stats solution in KVM could be disabled when kernel Lockdown mode is enabled, which is a potential rick for production. 2. The current debugfs stats solution in KVM is organized as "one stats per file", it is good for debugging, but not efficient for production. 3. The stats read/clear in current debugfs solution in KVM are protected by the global kvm_lock. Besides that, there are some other benefits with this change: 1. All KVM VM/VCPU stats can be read out in a bulk by one copy to userspace. 2. A schema is used to describe KVM statistics. From userspace's perspective, the KVM statistics are self-describing. 3. With the fd-based solution, a separate telemetry would be able to read KVM stats in a less privileged environment. 4. After the initial setup by reading in stats descriptors, a telemetry only needs to read the stats data itself, no more parsing or setup is needed. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-3-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: stats: Separate generic stats from architecture specific onesJing Zhang
Generic KVM stats are those collected in architecture independent code or those supported by all architectures; put all generic statistics in a separate structure. This ensures that they are defined the same way in the statistics API which is being added, removing duplication among different architectures in the declaration of the descriptors. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24fs: remove bdev_try_to_free_page callbackZhang Yi
After remove the unique user of sop->bdev_try_to_free_page() callback, we could remove the callback and the corresponding blkdev_releasepage() at all. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-9-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2,ext4: add a shrinker to release checkpointed buffersZhang Yi
Current metadata buffer release logic in bdev_try_to_free_page() have a lot of use-after-free issues when umount filesystem concurrently, and it is difficult to fix directly because ext4 is the only user of s_op->bdev_try_to_free_page callback and we may have to add more special refcount or lock that is only used by ext4 into the common vfs layer, which is unacceptable. One better solution is remove the bdev_try_to_free_page callback, but the real problem is we cannot easily release journal_head on the checkpointed buffer, so try_to_free_buffers() cannot release buffers and page under memory pressure, which is more likely to trigger out-of-memory. So we cannot remove the callback directly before we find another way to release journal_head. This patch introduce a shrinker to free journal_head on the checkpointed transaction. After the journal_head got freed, try_to_free_buffers() could free buffer properly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: ensure abort the journal if detect IO error when writing original ↵Zhang Yi
buffer back Although we merged c044f3d8360 ("jbd2: abort journal if free a async write error metadata buffer"), there is a race between jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still fail to detect the buffer write io error flag which may lead to filesystem inconsistency. jbd2_journal_try_to_free_buffers() ext4_put_super() jbd2_journal_destroy() __jbd2_journal_remove_checkpoint() detect buffer write error jbd2_log_do_checkpoint() jbd2_cleanup_journal_tail() <--- lead to inconsistency jbd2_journal_abort() Fix this issue by introducing a new atomic flag which only have one JBD2_CHECKPOINT_IO_ERROR bit now, and set it in __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer which has write_io_error flag. Then jbd2_journal_destroy() will detect this mark and abort the journal to prevent updating log tail. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24stm class: Spelling fixRandy Dunlap
Drop the repeated word "the" in a comment. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [alexander.shishkin: fixed the commit message] Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20210621151246.31891-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24HID: input: Add support for Programmable ButtonsThomas Weißschuh
Map them to KEY_MACRO# event codes. These buttons are defined by HID as follows: "The user defines the function of these buttons to control software applications or GUI objects." This matches the semantics of the KEY_MACRO# input event codes that Linux supports. Also add support for HID "Named Array" collections. Also add hid-debug support for KEY_MACRO#. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-06-24Merge branch 'for-next/smccc' into for-next/coreWill Deacon
Add support for versions v1.2 and 1.3 of the SMC calling convention. * for-next/smccc: arm64: smccc: Support SMCCC v1.3 SVE register saving hint arm64: smccc: Add support for SMCCCv1.2 extended input/output registers
2021-06-24Merge branch 'for-next/perf' into for-next/coreWill Deacon
PMU driver cleanups for managing IRQ affinity and exposing event attributes via sysfs. * for-next/perf: (36 commits) drivers/perf: fix the missed ida_simple_remove() in ddr_perf_probe() perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number arm64: perf: Simplify EVENT ATTR macro in perf_event.c drivers/perf: Simplify EVENT ATTR macro in fsl_imx8_ddr_perf.c drivers/perf: Simplify EVENT ATTR macro in xgene_pmu.c drivers/perf: Simplify EVENT ATTR macro in qcom_l3_pmu.c drivers/perf: Simplify EVENT ATTR macro in qcom_l2_pmu.c drivers/perf: Simplify EVENT ATTR macro in SMMU PMU driver perf: Add EVENT_ATTR_ID to simplify event attributes perf/smmuv3: Don't trample existing events with global filter perf/hisi: Constify static attribute_group structs perf: qcom: Remove redundant dev_err call in qcom_l3_cache_pmu_probe() drivers/perf: hisi: Fix data source control arm64: perf: Add more support on caps under sysfs perf: qcom_l2_pmu: move to use request_irq by IRQF_NO_AUTOEN flag arm_pmu: move to use request_irq by IRQF_NO_AUTOEN flag perf: arm_spe: use DEVICE_ATTR_RO macro perf: xgene_pmu: use DEVICE_ATTR_RO macro perf: qcom: use DEVICE_ATTR_RO macro perf: arm_pmu: use DEVICE_ATTR_RO macro ...
2021-06-24media: Fix Media Controller API config checksShuah Khan
Smatch static checker warns that "mdev" can be null: sound/usb/media.c:287 snd_media_device_create() warn: 'mdev' can also be NULL If CONFIG_MEDIA_CONTROLLER is disabled, this file should not be included in the build. The below conditions in the sound/usb/Makefile are in place to ensure that media.c isn't included in the build. sound/usb/Makefile: snd-usb-audio-$(CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER) += media.o select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO) The following config check in include/media/media-dev-allocator.h is in place to enable the API only when CONFIG_MEDIA_CONTROLLER and CONFIG_USB are enabled. #if defined(CONFIG_MEDIA_CONTROLLER) && defined(CONFIG_USB) This check doesn't work as intended when CONFIG_USB=m. When CONFIG_USB=m, CONFIG_USB_MODULE is defined and CONFIG_USB is not. The above config check doesn't catch that CONFIG_USB is defined as a module and disables the API. This results in sound/usb enabling Media Controller specific ALSA driver code, while Media disables the Media Controller API. Fix the problem requires two changes: 1. Change the check to use IS_ENABLED to detect when CONFIG_USB is enabled as a module or static. Since CONFIG_MEDIA_CONTROLLER is a bool, leave the check unchanged to be consistent with drivers/media/Makefile. 2. Change the drivers/media/mc/Makefile to include mc-dev-allocator.o in mc-objs when CONFIG_USB is enabled. Link: https://lore.kernel.org/alsa-devel/YLeAvT+R22FQ%2FEyw@mwanda/ Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-06-24dmaengine: imx-sdma: Remove platform data headerVladimir Zapolskiy
Since commit 6c5f05a6cd88 ("ARM: imx3: Remove imx3 soc_init()") there are no more users of struct sdma_script_start_addrs outside of the driver itself, thus let's move the struct declaration just to the driver source code and remove the header file as unused one. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Cc: Fabio Estevam <festevam@gmail.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20210620191103.156626-1-vz@mleia.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-06-24Merge tag 'usb-serial-5.14-rc1' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next Johan writes: USB-serial updates for 5.14-rc1 Here are the USB-serial updates for 5.14-rc1, including: - gpio support for CP2108 - chars_in_buffer and write_room return-value updates - chars_in_buffer and write_room clean ups Included are also various clean ups. All have been in linux-next with no reported issues. * tag 'usb-serial-5.14-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: cp210x: add support for GPIOs on CP2108 USB: serial: drop irq-flags initialisations USB: serial: mos7840: drop buffer-callback return-value comments USB: serial: mos7720: drop buffer-callback sanity checks USB: serial: io_edgeport: drop buffer-callback sanity checks USB: serial: digi_acceleport: add chars_in_buffer locking USB: serial: digi_acceleport: reduce chars_in_buffer over-reporting USB: serial: make usb_serial_driver::chars_in_buffer return uint USB: serial: make usb_serial_driver::write_room return uint
2021-06-24Merge tag 'asoc-fix-v5.13-rc7' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next ASoC: Fixes for v5.13 A final batch of fixes for v5.13, this is larger than I'd like due to the fixes for a series of suspend issues that Intel turned up in their testing this week.
2021-06-24sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flagBeata Michalska
Introducing new, complementary to SD_ASYM_CPUCAPACITY, sched_domain topology flag, to distinguish between shed_domains where any CPU capacity asymmetry is detected (SD_ASYM_CPUCAPACITY) and ones where a full set of CPU capacities is visible to all domain members (SD_ASYM_CPUCAPACITY_FULL). With the distinction between full and partial CPU capacity asymmetry, brought in by the newly introduced flag, the scope of the original SD_ASYM_CPUCAPACITY flag gets shifted, still maintaining the existing behaviour when one is detected on a given sched domain, allowing misfit migrations within sched domains that do not observe full range of CPU capacities but still do have members with different capacity values. It loses though it's meaning when it comes to the lowest CPU asymmetry sched_domain level per-cpu pointer, which is to be now denoted by SD_ASYM_CPUCAPACITY_FULL flag. Signed-off-by: Beata Michalska <beata.michalska@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20210603140627.8409-2-beata.michalska@arm.com
2021-06-24crypto: api - Move crypto attr definitions out of crypto.hHerbert Xu
The definitions for crypto_attr-related types and enums are not needed by most Crypto API users. This patch moves them out of crypto.h and into algapi.h/internal.h depending on the extent of their use. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-06-23Merge tag 'v5.14-rockchip-drivers1' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/drivers Yaml conversion of grf, pmu and power-domain bindings, Power-domains for rk3568 + necessary plumbing, Fixes for the usbphy bindings. * tag 'v5.14-rockchip-drivers1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: dt-bindings: soc: rockchip: drop unnecessary #phy-cells from grf.yaml dt-bindings: soc: rockchip: grf: add compatible for RK3308 USB grf dt-bindings: phy: rename phy nodename in phy-rockchip-inno-usb2.yaml dt-bindings: soc: rockchip: convert grf.txt to YAML soc: rockchip: power-domain: add rk3568 powerdomains dt-bindings: power: rockchip: Add bindings for RK3568 Soc dt-bindings: power: rockchip: Convert to json-schema dt-bindings: arm: rockchip: add more compatible strings to pmu.yaml dt-bindings: arm: rockchip: convert pmu.txt to YAML soc: rockchip: power-domain: Add a meaningful power domain name dt-bindings: add power-domain header for RK3568 SoCs Link: https://lore.kernel.org/r/4647955.GXAFRqVoOG@phil Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-23Merge tag 'ixp4xx-arm-soc-v5.14' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik into arm/soc This is a major chunk of IXP4xx modernization: - Fist we move some registers around to make room for the predetermined PCI I/O space. - Then we add some Kconfig options to make it possible to use the old PCI driver in parallell with the new shiny one. - Then we add the new PCI driver and some bindings for it. - On top of this we add an (ages old) patch from Arnd that centralize the CPU/SoC detection in drivers/soc and make the header a standard Linux header to avoid the <mach/*> business in drivers. - Then we split out and modernize some platform data headers for pata, and hwrandom, and top it up with DT bindings and support for hwrandom. * tag 'ixp4xx-arm-soc-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik: ixp4xx: fix spelling mistake in Kconfig "Devce" -> "Device" hw_random: ixp4xx: Add OF support hw_random: ixp4xx: Add DT bindings hw_random: ixp4xx: Turn into a module hw_random: ixp4xx: Use SPDX license tag hw_random: ixp4xx: enable compile-testing pata: ixp4xx: split platform data to its own header soc: ixp4xx: move cpu detection to linux/soc/ixp4xx/cpu.h PCI: ixp4xx: Add a new driver for IXP4xx PCI: ixp4xx: Add device tree bindings for IXP4xx ARM/ixp4xx: Make NEED_MACH_IO_H optional ARM/ixp4xx: Move the virtual IObases Link: https://lore.kernel.org/r/CACRpkdbw6HSpp7k6q1FYGmtafLmdAu8bFnpHQOdfBDYYsdLbkw@mail.gmail.com Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-23Merge tag 'omap-for-v5.14/fixes-not-urgent-signed' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/soc Non-urgent fixes for omaps for v5.14 merge window Warn and block suspend for am335x unless the PM related modules and firmware are loaded and warn otherwise. Otherwise we easily end up with a suspended system with nothing capable of waking it up. We also drop a duplicated prototype for am33xx_init_early(). * tag 'omap-for-v5.14/fixes-not-urgent-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: Block suspend for am3 and am4 if PM is not configured ARM: OMAP2+: remove duplicated prototype ARM: dts: dra7: Fix duplicate USB4 target module node ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act bus: ti-sysc: Fix am335x resume hang for usb otg module ARM: OMAP2+: Fix build warning when mmc_omap is not built ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function ARM: OMAP1: Fix use of possibly uninitialized irq variable bus: ti-sysc: Fix missing quirk flags for sata Link: https://lore.kernel.org/r/pull-1624002812-396117@atomide.com Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-23Merge tag 'hisi-arm64-dt-for-5.14' of git://github.com/hisilicon/linux-hisi ↵Olof Johansson
into arm/dt ARM64: DT: HiSilicon ARM64 DT updates for 5.14 - Correct the HiSilicon copyright * tag 'hisi-arm64-dt-for-5.14' of git://github.com/hisilicon/linux-hisi: arm64: dts: hisilicon: use the correct HiSilicon copyright Link: https://lore.kernel.org/r/60CBF4AE.7040301@hisilicon.com Signed-off-by: Olof Johansson <olof@lixom.net>
2021-06-23kunit: Add gnu_printf specifiersDavid Gow
Some KUnit functions use variable arguments to implement a printf-like format string. Use the __printf() attribute to let the compiler warn if invalid format strings are passed in. If the kernel is build with W=1, it complained about the lack of these specifiers, e.g.: ../lib/kunit/test.c:72:2: warning: function ‘kunit_log_append’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Acked-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-23kunit: Assign strings to 'const char*' in STREQ assertionsDavid Gow
Currently, the KUNIT_EXPECT_STREQ() and related macros assign both string arguments to variables of their own type (via typeof()). This seems to be to prevent the macro argument from being evaluated multiple times. However, this doesn't work if one of these is a fixed-length character array, rather than a character pointer, as (for example) char[16] will always allocate a new string. By always using 'const char*' (the type strcmp expects), we're always just taking a pointer to the string, which works even with character arrays. Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-23kunit: Do not typecheck binary assertionsDavid Gow
The use of typecheck() in KUNIT_EXPECT_EQ() and friends is causing more problems than I think it's worth. Things like enums need to have their values explicitly cast, and literals all need to be very precisely typed, else a large warning will be printed. While typechecking does have its uses, the additional overhead of having lots of needless casts -- combined with the awkward error messages which don't mention which types are involved -- makes tests less readable and more difficult to write. By removing the typecheck() call, the two arguments still need to be of compatible types, but don't need to be of exactly the same time, which seems a less confusing and more useful compromise. Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-24Merge tag 'drm-msm-next-2021-06-23b' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-next * devcoredump support for display errors * dpu: irq cleanup/refactor * dpu: dt bindings conversion to yaml * dsi: dt bindings conversion to yaml * mdp5: alpha/blend_mode/zpos support * a6xx: cached coherent buffer support * a660 support * gpu iova fault improvements: - info about which block triggered the fault, etc - generation of gpu devcoredump on fault * assortment of other cleanups and fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs4=qsGBBbyn-4JWqW4-YUSTKh67X3DsPQ=T2D9aXKqNA@mail.gmail.com
2021-06-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2021-06-23 The following pull-request contains BPF updates for your *net* tree. We've added 14 non-merge commits during the last 6 day(s) which contain a total of 13 files changed, 137 insertions(+), 64 deletions(-). Note that when you merge net into net-next, there is a small merge conflict between 9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy") from bpf with c49661aa6f70 ("skmsg: Remove unused parameters of sk_msg_wait_data()") from net-next. Resolution is to: i) net/ipv4/udp_bpf.c: take udp_msg_wait_data() and remove err parameter from the function, ii) net/ipv4/tcp_bpf.c: take tcp_msg_wait_data() and remove err parameter from the function, iii) for net/core/skmsg.c and include/linux/skmsg.h: remove the sk_msg_wait_data() implementation and its prototype in header. The main changes are: 1) Fix BPF poke descriptor adjustments after insn rewrite, from John Fastabend. 2) Fix regression when using BPF_OBJ_GET with non-O_RDWR flags, from Maciej Żenczykowski. 3) Various bug and error handling fixes for UDP-related sock_map, from Cong Wang. 4) Fix patching of vmlinux BTF IDs with correct endianness, from Tony Ambardar. 5) Two fixes for TX descriptor validation in AF_XDP, from Magnus Karlsson. 6) Fix overflow in size calculation for bpf_map_area_alloc(), from Bui Quang Minh. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net/tls: Remove the __TLS_DEC_STATS() macro.Kuniyuki Iwashima
The commit d26b698dd3cd ("net/tls: add skeleton of MIB statistics") introduced __TLS_DEC_STATS(), but it is not used and __SNMP_DEC_STATS() is not defined also. Let's remove it. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23tcp: Add stats for socket migration.Kuniyuki Iwashima
This commit adds two stats for the socket migration feature to evaluate the effectiveness: LINUX_MIB_TCPMIGRATEREQ(SUCCESS|FAILURE). If the migration fails because of the own_req race in receiving ACK and sending SYN+ACK paths, we do not increment the failure stat. Then another CPU is responsible for the req. Link: https://lore.kernel.org/bpf/CAK6E8=cgFKuGecTzSCSQ8z3YJ_163C0uwO9yRvfDSE7vOe9mJA@mail.gmail.com/ Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23Merge tag 'mlx5-net-next-2021-06-22' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-net-next-2021-06-22 1) Various minor cleanups and fixes from net-next branch 2) Optimize mlx5 feature check on tx and a fix to allow Vxlan with Ipsec offloads ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-06-23 1) Don't return a mtu smaller than 1280 on IPv6 pmtu discovery. From Sabrina Dubroca 2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype for the PREEMPT_RT case. From Varad Gautam. 3) Remove a repeated declaration of xfrm_parse_spi. From Shaokun Zhang. 4) IPv4 beet mode can't handle fragments, but IPv6 does. commit 68dc022d04eb ("xfrm: BEET mode doesn't support fragments for inner packets") handled IPv4 and IPv6 the same way. Relax the check for IPv6 because fragments are possible here. From Xin Long. 5) Memory allocation failures are not reported for XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct. Fix this by moving both cases in front of the function. 6) Fix a missing initialization in the xfrm offload fallback fail case for bonding devices. From Ayush Sawal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Skip non-SCTP packets in the new SCTP chunk support for nft_exthdr, from Phil Sutter. 2) Simplify TCP option sanity check for TCP packets, also from Phil. 3) Add a new expression to store when the rule has been used last time. 4) Pass the hook state object to log function, from Florian Westphal. 5) Document the new sysctl knobs to tune the flowtable timeouts, from Oz Shlomo. 6) Fix snprintf error check in the new nfnetlink_hook infrastructure, from Dan Carpenter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net: sched: remove qdisc->empty for lockless qdiscYunsheng Lin
As MISSED and DRAINING state are used to indicate a non-empty qdisc, qdisc->empty is not longer needed, so remove it. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net: sched: implement TCQ_F_CAN_BYPASS for lockless qdiscYunsheng Lin
Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK flag set, but queue discipline by-pass does not work for lockless qdisc because skb is always enqueued to qdisc even when the qdisc is empty, see __dev_xmit_skb(). This patch calls sch_direct_xmit() to transmit the skb directly to the driver for empty lockless qdisc, which aviod enqueuing and dequeuing operation. As qdisc->empty is not reliable to indicate a empty qdisc because there is a time window between enqueuing and setting qdisc->empty. So we use the MISSED state added in commit a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc"), which indicate there is lock contention, suggesting that it is better not to do the qdisc bypass in order to avoid packet out of order problem. In order to make MISSED state reliable to indicate a empty qdisc, we need to ensure that testing and clearing of MISSED state is within the protection of qdisc->seqlock, only setting MISSED state can be done without the protection of qdisc->seqlock. A MISSED state testing is added without the protection of qdisc->seqlock to aviod doing unnecessary spin_trylock() for contention case. As the enqueuing is not within the protection of qdisc->seqlock, there is still a potential data race as mentioned by Jakub [1]: thread1 thread2 thread3 qdisc_run_begin() # true qdisc_run_begin(q) set(MISSED) pfifo_fast_dequeue clear(MISSED) # recheck the queue qdisc_run_end() enqueue skb1 qdisc empty # true qdisc_run_begin() # true sch_direct_xmit() # skb2 qdisc_run_begin() set(MISSED) When above happens, skb1 enqueued by thread2 is transmited after skb2 is transmited by thread3 because MISSED state setting and enqueuing is not under the qdisc->seqlock. If qdisc bypass is disabled, skb1 has better chance to be transmited quicker than skb2. This patch does not take care of the above data race, because we view this as similar as below: Even at the same time CPU1 and CPU2 write the skb to two socket which both heading to the same qdisc, there is no guarantee that which skb will hit the qdisc first, because there is a lot of factor like interrupt/softirq/cache miss/scheduling afffecting that. There are below cases that need special handling: 1. When MISSED state is cleared before another round of dequeuing in pfifo_fast_dequeue(), and __qdisc_run() might not be able to dequeue all skb in one round and call __netif_schedule(), which might result in a non-empty qdisc without MISSED set. In order to avoid this, the MISSED state is set for lockless qdisc and __netif_schedule() will be called at the end of qdisc_run_end. 2. The MISSED state also need to be set for lockless qdisc instead of calling __netif_schedule() directly when requeuing a skb for a similar reason. 3. For netdev queue stopped case, the MISSED case need clearing while the netdev queue is stopped, otherwise there may be unnecessary __netif_schedule() calling. So a new DRAINING state is added to indicate this case, which also indicate a non-empty qdisc. 4. As there is already netif_xmit_frozen_or_stopped() checking in dequeue_skb() and sch_direct_xmit(), which are both within the protection of qdisc->seqlock, but the same checking in __dev_xmit_skb() is without the protection, which might cause empty indication of a lockless qdisc to be not reliable. So remove the checking in __dev_xmit_skb(), and the checking in the protection of qdisc->seqlock seems enough to avoid the cpu consumption problem for netdev queue stopped case. 1. https://lkml.org/lkml/2021/5/29/215 Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net: sched: avoid unnecessary seqcount operation for lockless qdiscYunsheng Lin
qdisc->running seqcount operation is mainly used to do heuristic locking on q->busylock for locked qdisc, see qdisc_is_running() and __dev_xmit_skb(). So avoid doing seqcount operation for qdisc with TCQ_F_NOLOCK flag. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23soc: qcom: smem_state: Add devm_qcom_smem_state_get()Stephan Gerhold
It is easy to forget to call qcom_smem_state_put() after a qcom_smem_state_get(). Introduce a devm_qcom_smem_state_get() helper function that automates this so that qcom_smem_state_put() is automatically called when a device is removed. Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210618111556.53416-1-stephan@gerhold.net Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2021-06-23cfg80211: Add wiphy_info_once()Dmitry Osipenko
Add wiphy_info_once() helper that prints info message only once. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210511211549.30571-1-digetx@gmail.com
2021-06-23x86/fpu: Use pkru_write_default() in copy_init_fpstate_to_fpregs()Thomas Gleixner
There is no point in using copy_init_pkru_to_fpregs() which in turn calls write_pkru(). write_pkru() tries to fiddle with the task's xstate buffer for nothing because the XRSTOR[S](init_fpstate) just cleared the xfeature flag in the xstate header which makes get_xsave_addr() fail. It's a useless exercise anyway because the reinitialization activates the FPU so before the task's xstate buffer can be used again a XRSTOR[S] must happen which in turn dumps the PKRU value. Get rid of the now unused copy_init_pkru_to_fpregs(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.732508792@linutronix.de
2021-06-23mac80211: Switch to a virtual time-based airtime schedulerToke Høiland-Jørgensen
This switches the airtime scheduler in mac80211 to use a virtual time-based scheduler instead of the round-robin scheduler used before. This has a couple of advantages: - No need to sync up the round-robin scheduler in firmware/hardware with the round-robin airtime scheduler. - If several stations are eligible for transmission we can schedule both of them; no need to hard-block the scheduling rotation until the head of the queue has used up its quantum. - The check of whether a station is eligible for transmission becomes simpler (in ieee80211_txq_may_transmit()). The drawback is that scheduling becomes slightly more expensive, as we need to maintain an rbtree of TXQs sorted by virtual time. This means that ieee80211_register_airtime() becomes O(logN) in the number of currently scheduled TXQs because it can change the order of the scheduled stations. We mitigate this overhead by only resorting when a station changes position in the tree, and hopefully N rarely grows too big (it's only TXQs currently backlogged, not all associated stations), so it shouldn't be too big of an issue. To prevent divisions in the fast path, we maintain both station sums and pre-computed reciprocals of the sums. This turns the fast-path operation into a multiplication, with divisions only happening as the number of active stations change (to re-compute the current sum of all active station weights). To prevent this re-computation of the reciprocal from happening too frequently, we use a time-based notion of station activity, instead of updating the weight every time a station gets scheduled or de-scheduled. As queues can oscillate between empty and occupied quite frequently, this can significantly cut down on the number of re-computations. It also has the added benefit of making the station airtime calculation independent on whether the queue happened to have drained at the time an airtime value was accounted. Co-developed-by: Yibo Zhao <yiboz@codeaurora.org> Signed-off-by: Yibo Zhao <yiboz@codeaurora.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20210623134755.235545-1-toke@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-23Merge remote-tracking branch 'regulator/for-5.14' into regulator-nextMark Brown