summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2020-10-02bus: mhi: core: Introduce counters to track MHI device state transitionsBhaumik Bhatt
Use counters to track MHI device state transitions such as those to M0, M2, or M3 states. This can help in better debug, allowing the user to see the number of transitions to a certain MHI state when queried using debugfs entries or via other mechanisms. Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20200929175218.8178-9-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02bus: mhi: core: Use generic name field for an MHI deviceBhaumik Bhatt
An MHI device is not necessarily associated with only channels as we can have one associated with the controller itself. Hence, the chan_name field within the mhi_device structure should instead be replaced with a generic name to accurately reflect any type of MHI device. Reviewed-by: Jeffrey Hugo <jhugo@codeaurora.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20200929175218.8178-7-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02bus: mhi: fix doubled words and struct image_info kernel-docRandy Dunlap
Drop doubled word "table" in kernel-doc. Fix syntax for the kernel-doc notation for struct image_info. Note that the bhi_vec field is private and not part of the kernel-doc. Drop doubled word "device" in a comment. Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Hemant Kumar <hemantk@codeaurora.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [mani: Added bus: prefix to the commit subject] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20200929175218.8178-2-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02USB: UDC: Expand device model API interfaceAlan Stern
The routines used by the UDC core to interface with the kernel's device model, namely usb_add_gadget_udc(), usb_add_gadget_udc_release(), and usb_del_gadget_udc(), provide access to only a subset of the device model's full API. They include functionality equivalent to device_register() and device_unregister() for gadgets, but they omit device_initialize(), device_add(), device_del(), get_device(), and put_device(). This patch expands the UDC API by adding usb_initialize_gadget(), usb_add_gadget(), usb_del_gadget(), usb_get_gadget(), and usb_put_gadget() to fill in the gap. It rewrites the existing routines to call the new ones. CC: Anton Vasilyev <vasilyev@ispras.ru> CC: Evgeny Novikov <novikov@ispras.ru> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-10-01pipe: remove pipe_wait() and fix wakeup race with spliceLinus Torvalds
The pipe splice code still used the old model of waiting for pipe IO by using a non-specific "pipe_wait()" that waited for any pipe event to happen, which depended on all pipe IO being entirely serialized by the pipe lock. So by checking the state you were waiting for, and then adding yourself to the wait queue before dropping the lock, you were guaranteed to see all the wakeups. Strictly speaking, the actual wakeups were not done under the lock, but the pipe_wait() model still worked, because since the waiter held the lock when checking whether it should sleep, it would always see the current state, and the wakeup was always done after updating the state. However, commit 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing") split the single wait-queue into two, and in the process also made the "wait for event" code wait for _two_ wait queues, and that then showed a race with the wakers that were not serialized by the pipe lock. It's only splice that used that "pipe_wait()" model, so the problem wasn't obvious, but Josef Bacik reports: "I hit a hang with fstest btrfs/187, which does a btrfs send into /dev/null. This works by creating a pipe, the write side is given to the kernel to write into, and the read side is handed to a thread that splices into a file, in this case /dev/null. The box that was hung had the write side stuck here [pipe_write] and the read side stuck here [splice_from_pipe_next -> pipe_wait]. [ more details about pipe_wait() scenario ] The problem is we're doing the prepare_to_wait, which sets our state each time, however we can be woken up either with reads or writes. In the case above we race with the WRITER waking us up, and re-set our state to INTERRUPTIBLE, and thus never break out of schedule" Josef had a patch that avoided the issue in pipe_wait() by just making it set the state only once, but the deeper problem is that pipe_wait() depends on a level of synchonization by the pipe mutex that it really shouldn't. And the whole "wait for any pipe state change" model really isn't very good to begin with. So rather than trying to work around things in pipe_wait(), remove that legacy model of "wait for arbitrary pipe event" entirely, and actually create functions that wait for the pipe actually being readable or writable, and can do so without depending on the pipe lock serializing everything. Fixes: 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing") Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/ Reported-by: Josef Bacik <josef@toxicpanda.com> Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-01RDMA/uverbs: Expose the new GID query API to user spaceAvihai Horon
Expose the query GID table and entry API to user space by adding two new methods and method handlers to the device object. This API provides a faster way to query a GID table using single call and will be used in libibverbs to improve current approach that requires multiple calls to open, close and read multiple sysfs files for a single GID table entry. Link: https://lore.kernel.org/r/20200923165015.2491894-5-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/core: Introduce new GID table query APIAvihai Horon
Introduce rdma_query_gid_table which enables querying all the GID tables of a given device and copying the attributes of all valid GID entries to a provided buffer. This API provides a faster way to query a GID table using single call and will be used in libibverbs to improve current approach that requires multiple calls to open, close and read multiple sysfs files for a single GID table entry. Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/core: Modify enum ib_gid_type and enum rdma_network_typeAvihai Horon
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so enum ib_gid_type will match the gid types of the new query GID table API which will be introduced in the following patches. This change in enum ib_gid_type requires to change also enum rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1 values. Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01overflow: Include header file with SIZE_MAX declarationLeon Romanovsky
The various array_size functions use SIZE_MAX define, but missed limits.h causes to failure to compile code that needs overflow.h. In file included from drivers/infiniband/core/uverbs_std_types_device.c:6: ./include/linux/overflow.h: In function 'array_size': ./include/linux/overflow.h:258:10: error: 'SIZE_MAX' undeclared (first use in this function) 258 | return SIZE_MAX; | ^~~~~~~~ Fixes: 610b15c50e86 ("overflow.h: Add allocation size calculation helpers") Link: https://lore.kernel.org/r/20200913102928.134985-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01spi: pxa2xx: Add SSC2 and SSPSP2 SSP registersCezary Rojewski
Update list of SSP registers with SSC2 and SSPSP2. These registers are utilized by LPT/WPT AudioDSP architecture. While SSC2 shares the same offset (0x40) as SSACDD, description of this register for SSP device present on mentioned AudioDSP is different so define separate constant to avoid any ambiguity. Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200825201743.4926-1-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-02power: supply: bq27xxx: add support for TI bq34z100Krzysztof Kozlowski
Add support for new device: the TI bq34z100-G1, a Wide Range Fuel Gauge for Li-Ion, PbA, NiMH, and NiCd batteries. The device shares a lot with other models, although it has its own differences requiring new quirks. This patch was tested on a system equipped with NiMH batteries. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2020-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-10-01 The following pull-request contains BPF updates for your *net-next* tree. We've added 90 non-merge commits during the last 8 day(s) which contain a total of 103 files changed, 7662 insertions(+), 1894 deletions(-). Note that once bpf(/net) tree gets merged into net-next, there will be a small merge conflict in tools/lib/bpf/btf.c between commit 1245008122d7 ("libbpf: Fix native endian assumption when parsing BTF") from the bpf tree and the commit 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness") from the bpf-next tree. Correct resolution would be to stick with bpf-next, it should look like: [...] /* check BTF magic */ if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) { err = -EIO; goto err_out; } if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) { /* definitely not a raw BTF */ err = -EPROTO; goto err_out; } /* get file size */ [...] The main changes are: 1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying BTF-based kernel data structures out of BPF programs, from Alan Maguire. 2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race conditions exposed by it. It was discussed to take these via BPF and networking tree to get better testing exposure, from Paul E. McKenney. 3) Support multi-attach for freplace programs, needed for incremental attachment of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen. 4) libbpf support for appending new BTF types at the end of BTF object, allowing intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko. 5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add a redirect helper into neighboring subsys, from Daniel Borkmann. 6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps from one to another, from Lorenz Bauer. 7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused a verifier issue due to type downgrade to scalar, from John Fastabend. 8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue and epilogue sections, from Maciej Fijalkowski. 9) Add an option to perf RB map to improve sharing of event entries by avoiding remove- on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu. 10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails, from Magnus Karlsson. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-01Merge tag 'soundwire-5.10-rc1' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire into char-misc-next Vinod writes: soundwire updates for 5.10-rc1 This round of update includes: - Generic bandwidth allocation algorithm from Intel folks - PM support for Intel chipsets - Updates to Intel drivers which makes sdw usable on latest laptops - Support for MMIO SDW controllers found in QC chipsets - Update to subsystem to use helpers in bitfield.h to manage register bits * tag 'soundwire-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (66 commits) soundwire: sysfs: add slave status and device number before probe soundwire: bus: add enumerated Slave device to device list soundwire: remove an unnecessary NULL check soundwire: cadence: add data port test fail interrupt soundwire: intel: enable test modes soundwire: enable Data Port test modes soundwire: intel: use {u32|u16}p_replace_bits soundwire: cadence: use u32p_replace_bits soundwire: qcom: get max rows and cols info from compatible soundwire: qcom: add support to block packing mode soundwire: qcom: clear BIT FIELDs before value set. soundwire: Add generic bandwidth allocation algorithm soundwire: cadence: add parity error injection through debugfs soundwire: bus: export broadcast read/write capability for tests ASoC: codecs: realtek-soundwire: ignore initial PARITY errors soundwire: bus: use quirk to filter out invalid parity errors soundwire: slave: add first_interrupt_done status soundwire: bus: filter-out unwanted interrupt reports ASoC/soundwire: bus: use property to set interrupt masks soundwire: qcom: fix SLIBMUS/SLIMBUS typo ...
2020-10-01regulator: qcom_smd: Add PM660/PM660L regulator supportAngeloGioacchino Del Regno
The PM660 and PM660L are a very very common PMIC combo, found on boards using the SDM630, SDM636, SDM660 (and SDA variants) SoC. PM660 provides 6 SMPS and 19 LDOs (of which one is unaccesible), while PM660L provides 5 SMPS (of which S3 and S4 are combined), 10 LDOs and a Buck-or-Boost (BoB) regulator. The PM660L IC also provides other regulators that are very specialized (for example, for the display) and will be managed in the other appropriate drivers (for example, labibb). Signed-off-by: AngeloGioacchino Del Regno <kholk11@gmail.com> Link: https://lore.kernel.org/r/20200926125549.13191-6-kholk11@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-01RDMA/mlx5: Extend advice MR to support non faulting modeYishai Hadas
Extend advice MR to support non faulting mode, this can improve performance by increasing the populated page tables in the device. Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01IB/core: Enable ODP sync without faultingYishai Hadas
Enable ODP sync without faulting, this improves performance by reducing the number of page faults in the system. The gain from this option is that the device page table can be aligned with the presented pages in the CPU page table without causing page faults. As of that, the overhead on data path from hardware point of view to trigger a fault which end-up by calling the driver to bring the pages will be dropped. Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01IB/core: Improve ODP to use hmm_range_fault()Yishai Hadas
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve performance in a few aspects: This includes: - Dropping the need to allocate and free memory to hold its output - No need any more to use put_page() to unpin the pages - The logic to detect contiguous pages is done based on the returned order, no need to run per page and evaluate. In addition, moving to use hmm_range_fault() enables to reduce page faults in the system with it's snapshot mode, this will be introduced in next patches from this series. As part of this, cleanup some flows and use the required data structures to work with hmm_range_fault(). Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01dm: export dm_copy_name_and_uuidMike Snitzer
Allow DM targets to access the configured name and uuid. Also, bump DM ioctl version. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-10-01Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "A previous commit to prevent AML memory opregions from accessing the kernel memory turned out to be too restrictive. Relax the permission check to permit the ACPI core to map kernel memory used for table overrides" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: permit ACPI core to map kernel memory used for table overrides
2020-10-01debugobjects: Free per CPU pool after CPU unplugZqiang
If a CPU is offlined the debug objects per CPU pool is not cleaned up. If the CPU is never onlined again then the objects in the pool are wasted. Add a CPU hotplug callback which is invoked after the CPU is dead to free the pool. [ tglx: Massaged changelog and added comment about remote access safety ] Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20200908062709.11441-1-qiang.zhang@windriver.com
2020-10-01pipe: Fix memory leaks in create_pipe_files()Qian Cai
Calling pipe2() with O_NOTIFICATION_PIPE could results in memory leaks unless watch_queue_init() is successful. In case of watch_queue_init() failure in pipe2() we are left with inode and pipe_inode_info instances that need to be freed. That failure exit has been introduced in commit c73be61cede5 ("pipe: Add general notification queue support") and its handling should've been identical to nearby treatment of alloc_file_pseudo() failures - it is dealing with the same situation. As it is, the mainline kernel leaks in that case. Another problem is that CONFIG_WATCH_QUEUE and !CONFIG_WATCH_QUEUE cases are treated differently (and the former leaks just pipe_inode_info, the latter - both pipe_inode_info and inode). Fixed by providing a dummy wacth_queue_init() in !CONFIG_WATCH_QUEUE case and by having failures of wacth_queue_init() handled the same way we handle alloc_file_pseudo() ones. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Qian Cai <cai@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-01iommu/vt-d: Check UAPI data processed by IOMMU coreJacob Pan
IOMMU generic layer already does sanity checks on UAPI data for version match and argsz range based on generic information. This patch adjusts the following data checking responsibilities: - removes the redundant version check from VT-d driver - removes the check for vendor specific data size - adds check for the use of reserved/undefined flags Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/1601051567-54787-7-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01iommu/uapi: Handle data and argsz filled by usersJacob Pan
IOMMU user APIs are responsible for processing user data. This patch changes the interface such that user pointers can be passed into IOMMU code directly. Separate kernel APIs without user pointers are introduced for in-kernel users of the UAPI functionality. IOMMU UAPI data has a user filled argsz field which indicates the data length of the structure. User data is not trusted, argsz must be validated based on the current kernel data size, mandatory data size, and feature flags. User data may also be extended, resulting in possible argsz increase. Backward compatibility is ensured based on size and flags (or the functional equivalent fields) checking. This patch adds sanity checks in the IOMMU layer. In addition to argsz, reserved/unused fields in padding, flags, and version are also checked. Details are documented in Documentation/userspace-api/iommu.rst Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/1601051567-54787-6-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01iommu/uapi: Rename uapi functionsJacob Pan
User APIs such as iommu_sva_unbind_gpasid() may also be used by the kernel. Since we introduced user pointer to the UAPI functions, in-kernel callers cannot share the same APIs. In-kernel callers are also trusted, there is no need to validate the data. We plan to have two flavors of the same API functions, one called through ioctls, carrying a user pointer and one called directly with valid IOMMU UAPI structs. To differentiate both, let's rename existing functions with an iommu_uapi_ prefix. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/1601051567-54787-5-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01iommu/uapi: Use named union for user dataJacob Pan
IOMMU UAPI data size is filled by the user space which must be validated by the kernel. To ensure backward compatibility, user data can only be extended by either re-purpose padding bytes or extend the variable sized union at the end. No size change is allowed before the union. Therefore, the minimum size is the offset of the union. To use offsetof() on the union, we must make it named. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/linux-iommu/20200611145518.0c2817d6@x1.home/ Link: https://lore.kernel.org/r/1601051567-54787-4-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-10-01iommu/uapi: Add argsz for user filled dataJacob Pan
As IOMMU UAPI gets extended, user data size may increase. To support backward compatibiliy, this patch introduces a size field to each UAPI data structures. It is *always* the responsibility for the user to fill in the correct size. Padding fields are adjusted to ensure 8 byte alignment. Specific scenarios for user data handling are documented in: Documentation/userspace-api/iommu.rst As there is no current users of the API, struct version is not incremented. Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/1601051567-54787-3-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-09-30bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event arraySong Liu
Currently, perf event in perf event array is removed from the array when the map fd used to add the event is closed. This behavior makes it difficult to the share perf events with perf event array. Introduce perf event map that keeps the perf event open with a new flag BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not removed when the original map fd is closed. Instead, the perf event will stay in the map until 1) it is explicitly removed from the array; or 2) the array is freed. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
2020-09-30net/mlx5: E-switch, Use PF num in metadata reg c0sunils
Currently only 256 vports can be supported as only 8 bits are reserved for them and 8 bits are reserved for vhca_ids in metadata reg c0. To support more than 256 vports, replace vhca_id with a unique shorter 4-bit PF number which covers upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095. This will continue to generate unique metadata even if multiple PCI devices have same switch_id. Signed-off-by: sunils <sunils@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-30io_uring: provide IORING_ENTER_SQ_WAIT for SQPOLL SQ ring waitsJens Axboe
When using SQPOLL, applications can run into the issue of running out of SQ ring entries because the thread hasn't consumed them yet. The only option for dealing with that is checking later, or busy checking for the condition. Provide IORING_ENTER_SQ_WAIT if applications want to wait on this condition. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30fs: align IOCB_* flags with RWF_* flagsJens Axboe
We have a set of flags that are shared between the two and inherired in kiocb_set_rw_flags(), but we check and set these individually. Reorder the IOCB flags so that the bottom part of the space is synced with the RWF flag space, and then we can do them all in one mask and set operation. The only exception is RWF_SYNC, which needs to mark IOCB_SYNC and IOCB_DSYNC. Do that one separately. This shaves 15 bytes of text from kiocb_set_rw_flags() for me. Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30io_uring: allow disabling rings during the creationStefano Garzarella
This patch adds a new IORING_SETUP_R_DISABLED flag to start the rings disabled, allowing the user to register restrictions, buffers, files, before to start processing SQEs. When IORING_SETUP_R_DISABLED is set, SQE are not processed and SQPOLL kthread is not started. The restrictions registration are allowed only when the rings are disable to prevent concurrency issue while processing SQEs. The rings can be enabled using IORING_REGISTER_ENABLE_RINGS opcode with io_uring_register(2). Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30io_uring: add IOURING_REGISTER_RESTRICTIONS opcodeStefano Garzarella
The new io_uring_register(2) IOURING_REGISTER_RESTRICTIONS opcode permanently installs a feature allowlist on an io_ring_ctx. The io_ring_ctx can then be passed to untrusted code with the knowledge that only operations present in the allowlist can be executed. The allowlist approach ensures that new features added to io_uring do not accidentally become available when an existing application is launched on a newer kernel version. Currently is it possible to restrict sqe opcodes, sqe flags, and register opcodes. IOURING_REGISTER_RESTRICTIONS can only be made once. Afterwards it is not possible to change restrictions anymore. This prevents untrusted code from removing restrictions. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30io_uring: use an enumeration for io_uring_register(2) opcodesStefano Garzarella
The enumeration allows us to keep track of the last io_uring_register(2) opcode available. Behaviour and opcodes names don't change. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30io_uring: move io_uring_get_socket() into io_uring.hJens Axboe
Now we have a io_uring kernel header, move this definition out of fs.h and into io_uring.h where it belongs. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30io_uring: don't rely on weak ->files referencesJens Axboe
Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking. With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed. Cc: stable@vger.kernel.org # v5.5+ Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30drop_monitor: Filter control packets in drop monitorIdo Schimmel
Previously, devlink called into drop monitor in order to report hardware originated drops / exceptions. devlink intentionally filtered control packets and did not pass them to drop monitor as they were not dropped by the underlying hardware. Now drop monitor registers its probe on a generic 'devlink_trap_report' tracepoint and should therefore perform this filtering itself instead of having devlink do that. Add the trap type as metadata and have drop monitor ignore control packets. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30drop_monitor: Convert to using devlink tracepointIdo Schimmel
Convert drop monitor to use the recently introduced 'devlink_trap_report' tracepoint instead of having devlink call into drop monitor. This is both consistent with software originated drops ('kfree_skb' tracepoint) and also allows drop monitor to be built as a module and still report hardware originated drops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30devlink: Add a tracepoint for trap reportsIdo Schimmel
Add a tracepoint for trap reports so that drop monitor could register its probe on it. Use trace_devlink_trap_report_enabled() to avoid wasting cycles setting the trap metadata if the tracepoint is not enabled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30arm64: permit ACPI core to map kernel memory used for table overridesArd Biesheuvel
Jonathan reports that the strict policy for memory mapped by the ACPI core breaks the use case of passing ACPI table overrides via initramfs. This is due to the fact that the memory type used for loading the initramfs in memory is not recognized as a memory type that is typically used by firmware to pass firmware tables. Since the purpose of the strict policy is to ensure that no AML or other ACPI code can manipulate any memory that is used by the kernel to keep its internal state or the state of user tasks, we can relax the permission check, and allow mappings of memory that is reserved and marked as NOMAP via memblock, and therefore not covered by the linear mapping to begin with. Fixes: 1583052d111f ("arm64/acpi: disallow AML memory opregions to access kernel memory") Fixes: 325f5585ec36 ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-30tcp: add exponential backoff in __tcp_send_ack()Eric Dumazet
Whenever host is under very high memory pressure, __tcp_send_ack() skb allocation fails, and we setup a 200 ms (TCP_DELACK_MAX) timer before retrying. On hosts with high number of TCP sockets, we can spend considerable amount of cpu cycles in these attempts, add high pressure on various spinlocks in mm-layer, ultimately blocking threads attempting to free space from making any progress. This patch adds standard exponential backoff to avoid adding fuel to the fire. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30inet: remove icsk_ack.blockedEric Dumazet
TCP has been using it to work around the possibility of tcp_delack_timer() finding the socket owned by user. After commit 6f458dfb4092 ("tcp: improve latencies of timer triggered events") we added TCP_DELACK_TIMER_DEFERRED atomic bit for more immediate recovery, so we can get rid of icsk_ack.blocked This frees space that following patch will reuse. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30net: macb: move pdata to private headerAlexandre Belloni
struct macb_platform_data is only used by macb_pci to register the platform device, move its definition to cadence/macb.h and remove platform_data/macb.h Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30HID: add vivaldi HID driverSean O'Brien
Add vivaldi HID driver. This driver allows us to read and report the top row layout of keyboards which provide a vendor-defined (Google) HID usage. Signed-off-by: Sean O'Brien <seobrien@chromium.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-09-30bpf: Add redirect_neigh helper as redirect drop-inDaniel Borkmann
Add a redirect_neigh() helper as redirect() drop-in replacement for the xmit side. Main idea for the helper is to be very similar in semantics to the latter just that the skb gets injected into the neighboring subsystem in order to let the stack do the work it knows best anyway to populate the L2 addresses of the packet and then hand over to dev_queue_xmit() as redirect() does. This solves two bigger items: i) skbs don't need to go up to the stack on the host facing veth ingress side for traffic egressing the container to achieve the same for populating L2 which also has the huge advantage that ii) the skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing layer on the host stack. Given that skb->sk neither gets orphaned when crossing the netns as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.") the helper can then push the skbs directly to the phys device where FQ scheduler can do its work and TCP stack gets proper backpressure given we hold on to skb->sk as long as skb is still residing in queues. With the helper used in BPF data path to then push the skb to the phys device, I observed a stable/consistent TCP_STREAM improvement on veth devices for traffic going container -> host -> host -> container from ~10Gbps to ~15Gbps for a single stream in my test environment. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, net: Rework cookie generator as per-cpu oneDaniel Borkmann
With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
2020-09-30bpf: Add classid helper only based on skb->skDaniel Borkmann
Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid from v2 hooks"), add a helper to retrieve cgroup v1 classid solely based on the skb->sk, so it can be used as key as part of BPF map lookups out of tc from host ns, in particular given the skb->sk is retained these days when crossing net ns thanks to 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb."). This is similar to bpf_skb_cgroup_id() which implements the same for v2. Kubernetes ecosystem is still operating on v1 however, hence net_cls needs to be used there until this can be dropped in with the v2 helper of bpf_skb_cgroup_id(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
2020-09-30RDMA/core: Remove ucontext->closingJason Gunthorpe
Nothing reads this any more, and the reason for its existence has passed due to the deferred fput() scheme. Fixes: 8ea1f989aa07 ("drivers/IB,usnic: reduce scope of mmap_sem") Link: https://lore.kernel.org/r/0-v1-df64ff042436+42-uctx_closing_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30leds: pca963x: register LEDs immediately after parsing, get rid of platdataMarek Behún
Register LEDs immediately after parsing their properties. This allows us to get rid of platdata, and since no one in tree uses header linux/platform_data/leds-pca963x.h, remove it. Signed-off-by: Marek Behún <marek.behun@nic.cz> Cc: Peter Meerwald <p.meerwald@bct-electronic.com> Cc: Ricardo Ribalda <ribalda@kernel.org> Cc: Zahari Petkov <zahari@balena.io> Signed-off-by: Pavel Machek <pavel@ucw.cz>
2020-09-30leds: tca6507: Absorb platform dataMarek Behún
The only in-tree usage of this driver is via device-tree. No on else includes linux/leds-tca6507.h, so absorb the definition of platdata structure. Signed-off-by: Marek Behún <marek.behun@nic.cz> Cc: NeilBrown <neilb@suse.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
2020-09-30media: v4l2-subdev.h: fix a kernel-doc markupMauro Carvalho Chehab
As reported by Sphinx: ./Documentation/driver-api/media/v4l2-subdev:490: ./include/media/v4l2-subdev.h:384: WARNING: Unparseable C cross-reference: 'struct' Invalid C declaration: Expected identifier in nested name, got keyword: struct [error at 6] struct ------^ The markup there is wrong: &struct &v4l2_input -> &struct v4l2_input Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>