summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-07-28rhashtable: Fix unprotected RCU dereference in __rht_ptrHerbert Xu
The rcu_dereference call in rht_ptr_rcu is completely bogus because we've already dereferenced the value in __rht_ptr and operated on it. This causes potential double readings which could be fatal. The RCU dereference must occur prior to the comparison in __rht_ptr. This patch changes the order of RCU dereference so that it is done first and the result is then fed to __rht_ptr. The RCU marking changes have been minimised using casts which will be removed in a follow-up patch. Fixes: ba6306e3f648 ("rhashtable: Remove RCU marking from...") Reported-by: "Gong, Sishuai" <sishuai@purdue.edu> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28Add pldmfw library for PLDM firmware updateJacob Keller
The pldmfw library is used to implement common logic needed to flash devices based on firmware files using the format described by the PLDM for Firmware Update standard. This library consists of logic to parse the PLDM file format from a firmware file object, as well as common logic for sending the relevant PLDM header data to the device firmware. A simple ops table is provided so that device drivers can implement device specific hardware interactions while keeping the common logic to the pldmfw library. This library will be used by the Intel ice networking driver as part of implementing device flash update via devlink. The library aims to be vendor and device agnostic. For this reason, it has been placed in lib/pldmfw, in the hopes that other devices which use the PLDM firmware file format may benefit from it in the future. However, do note that not all features defined in the PLDM standard have been implemented. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28of_address: Add bus type match for pci ranges parserJiaxun Yang
So the parser can be used to parse range property of ISA bus. As they're all using PCI-like method of range property, there is no need start a new parser. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-07-28net: improve the user pointer check in init_user_sockptrChristoph Hellwig
Make sure not just the pointer itself but the whole range lies in the user address space. For that pass the length and then use the access_ok helper to do the check. Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net: remove sockptr_advanceChristoph Hellwig
sockptr_advance never properly worked. Replace it with _offset variants of copy_from_sockptr and copy_to_sockptr. Fixes: ba423fdaa589 ("net: add a new sockptr_t type") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net: make sockptr_is_null strict aliasing safeChristoph Hellwig
While the kernel in general is not strict aliasing safe we can trivially do that in sockptr_is_null without affecting code generation, so always check the actually assigned union member. Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net/mlx5e: Modify uplink state on interface up/downRon Diskin
When setting the PF interface up/down, notify the firmware to update uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled. This behavior will prevent sending traffic out on uplink port when PF is down, such as sending traffic from a VF interface which is still up. Currently when calling mlx5e_open/close(), the driver only sends PAOS command to notify the firmware to set the physical port state to up/down, however, it is not sufficient. When VF is in "auto" state, it follows the uplink state, which was not updated on mlx5e_open/close() before this patch. When switchdev mode is enabled and uplink representor is first enabled, set the uplink port state value back to its FW default "AUTO". Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down") Signed-off-by: Ron Diskin <rondi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28mm/notifier: add migration invalidation typeRalph Campbell
Currently migrate_vma_setup() calls mmu_notifier_invalidate_range_start() which flushes all device private page mappings whether or not a page is being migrated to/from device private memory. In order to not disrupt device mappings that are not being migrated, shift the responsibility for clearing device private mappings to the device driver and leave CPU page table unmapping handled by migrate_vma_setup(). To support this, the caller of migrate_vma_setup() should always set struct migrate_vma::pgmap_owner to a non NULL value that matches the device private page->pgmap->owner. This value is then passed to the struct mmu_notifier_range with a new event type which the driver's invalidation function can use to avoid device MMU invalidations. Link: https://lore.kernel.org/r/20200723223004.9586-4-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-28mm/migrate: add a flags parameter to migrate_vmaRalph Campbell
The src_owner field in struct migrate_vma is being used for two purposes, it acts as a selection filter for which types of pages are to be migrated and it identifies device private pages owned by the caller. Split this into separate parameters so the src_owner field can be used just to identify device private pages owned by the caller of migrate_vma_setup(). Rename the src_owner field to pgmap_owner to reflect it is now used only to identify which device private pages to migrate. Link: https://lore.kernel.org/r/20200723223004.9586-3-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-28block: Remove callback typedefs for blk_mq_opsDaniel Wagner
No need to define typedefs for the callbacks, because there is not a single user except blk_mq_ops. Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-28Merge tag 'usb-serial-5.9-rc1' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next Johan writes: USB-serial updates for 5.9-rc1 Here are the USB-serial updates for 5.9-rc1, including: - console flow-control support - simulated line-breaks on some ch341 - hardware flow-control fixes for cp210x - break-detection and sysrq fixes for ftdi_sio - sysrq optimisations - input parity checking for cp210x Included are also some new device ids and various clean ups. All have been in linux-next with no reported issues. * tag 'usb-serial-5.9-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: (31 commits) USB: serial: qcserial: add EM7305 QDL product ID USB: serial: iuu_phoenix: fix led-activity helpers USB: serial: sierra: clean up special-interface handling USB: serial: cp210x: use in-kernel types in port data USB: serial: cp210x: drop unnecessary packed attributes USB: serial: cp210x: add support for TIOCGICOUNT USB: serial: cp210x: add support for line-status events USB: serial: cp210x: disable interface on errors in open USB: serial: drop redundant transfer-buffer casts USB: serial: drop extern keyword from function declarations USB: serial: drop unnecessary sysrq include USB: serial: add sysrq break-handler dummy USB: serial: inline sysrq dummy function USB: serial: only process sysrq when enabled USB: serial: only set sysrq timestamp for consoles USB: serial: ftdi_sio: fix break and sysrq handling USB: serial: ftdi_sio: clean up receive processing USB: serial: ftdi_sio: make process-packet buffer unsigned USB: serial: use fallthrough pseudo-keyword USB: serial: ch341: fix missing simulated-break margin ...
2020-07-28of/irq: Make of_msi_map_rid() PCI bus agnosticLorenzo Pieralisi
There is nothing PCI bus specific in the of_msi_map_rid() implementation other than the requester ID tag for the input ID space. Rename requester ID to a more generic ID so that the translation code can be used by all busses that require input/output ID translations. No functional change intended. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200619082013.13661-11-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-28of/irq: make of_msi_map_get_device_domain() bus agnosticDiana Craciun
of_msi_map_get_device_domain() is PCI specific but it need not be and can be easily changed to be bus agnostic in order to be used by other busses by adding an IRQ domain bus token as an input parameter. Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci/msi.c Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200619082013.13661-10-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-28of/device: Add input id to of_dma_configure()Lorenzo Pieralisi
Devices sitting on proprietary busses have a device ID space that is owned by the respective bus and related firmware bindings. In order to let the generic OF layer handle the input translations to an IOMMU id, for such busses the current of_dma_configure() interface should be extended in order to allow the bus layer to provide the device input id parameter - that is retrieved/assigned in bus specific code and firmware. Augment of_dma_configure() to add an optional input_id parameter, leaving current functionality unchanged. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Link: https://lore.kernel.org/r/20200619082013.13661-8-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-28of/iommu: Make of_map_rid() PCI agnosticLorenzo Pieralisi
There is nothing PCI specific (other than the RID - requester ID) in the of_map_rid() implementation, so the same function can be reused for input/output IDs mapping for other busses just as well. Rename the RID instances/names to a generic "id" tag. No functionality change intended. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Joerg Roedel <jroedel@suse.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200619082013.13661-7-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-28ACPI/IORT: Add an input ID to acpi_dma_configure()Lorenzo Pieralisi
Some HW devices are created as child devices of proprietary busses, that have a bus specific policy defining how the child devices wires representing the devices ID are translated into IOMMU and IRQ controllers device IDs. Current IORT code provides translations for: - PCI devices, where the device ID is well identified at bus level as the requester ID (RID) - Platform devices that are endpoint devices where the device ID is retrieved from the ACPI object IORT mappings (Named components single mappings). A platform device is represented in IORT as a named component node For devices that are child devices of proprietary busses the IORT firmware represents the bus node as a named component node in IORT and it is up to that named component node to define in/out bus specific ID translations for the bus child devices that are allocated and created in a bus specific manner. In order to make IORT ID translations available for proprietary bus child devices, the current ACPI (and IORT) code must be augmented to provide an additional ID parameter to acpi_dma_configure() representing the child devices input ID. This ID is bus specific and it is retrieved in bus specific code. By adding an ID parameter to acpi_dma_configure(), the IORT code can map the child device ID to an IOMMU stream ID through the IORT named component representing the bus in/out ID mappings. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://lore.kernel.org/r/20200619082013.13661-6-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-28ACPI/IORT: Make iort_msi_map_rid() PCI agnosticLorenzo Pieralisi
There is nothing PCI specific in iort_msi_map_rid(). Rename the function using a bus protocol agnostic name, iort_msi_map_id(), and convert current callers to it. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Will Deacon <will@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://lore.kernel.org/r/20200619082013.13661-4-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-28ACPI/IORT: Make iort_get_device_domain IRQ domain agnosticLorenzo Pieralisi
iort_get_device_domain() is PCI specific but it need not be, since it can be used to retrieve IRQ domain nexus of any kind by adding an irq_domain_bus_token input to it. Make it PCI agnostic by also renaming the requestor ID input to a more generic ID name. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci/msi.c Cc: Will Deacon <will@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://lore.kernel.org/r/20200619082013.13661-3-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-28Merge tag 'v5.8-rc7' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-28net/mlx5: Hold pages RB tree per VFEran Ben Elisha
Per page request event, FW request to allocated or release pages for a single function. Driver maintains FW pages object per function, so there is no need to hold one global page data-base. Instead, have a page data-base per function, which will improve performance release flow in all cases, especially for "release all pages". As the range of function IDs is large and not sequential, use xarray to store a per function ID page data-base, where the function ID is the key. Upon first allocation of a page to a function ID, create the page data-base per function. This data-base will be released only at pagealloc mechanism cleanup. NIC: ConnectX-4 Lx CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz Test case: 32 VFs, measure release pages on one VF as part of FLR Before: 0.021 Sec After: 0.014 Sec The improvement depends on amount of VFs and memory utilization by them. Time measurements above were taken from idle system. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28lockdep: Move list.h inclusion into lockdep.hHerbert Xu
Currently lockdep_types.h includes list.h without actually using any of its macros or functions. All it needs are the type definitions which were moved into types.h long ago. This potentially causes inclusion loops because both are included by many core header files. This patch moves the list.h inclusion into lockdep.h. Note that we could probably remove it completely but that could potentially result in compile failures should any end users not include list.h directly and also be unlucky enough to not get list.h via some other header file. Reported-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lkml.kernel.org/r/20200716063649.GA23065@gondor.apana.org.au
2020-07-28usb: chipidea: add query_available_role interfacePeter Chen
The glue layer may need to know current available role to do some setting, eg, the wakeup setting. So we add ci_hdrc_query_available_role for that. Signed-off-by: Peter Chen <peter.chen@nxp.com>
2020-07-28math64: New DIV_S64_ROUND_CLOSEST helperChunyan Zhang
Provide DIV_S64_ROUND_CLOSEST helper which uses div_s64 to perform division rounded to the closest integer using signed 64bit dividend and signed 32bit divisor. Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2020-07-27sched: Fix a typo in a comment王文虎
Change the comment typo: "direcly" -> "directly". Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/AAcAXwBTDSpsKN-5iyIOtaqk.1.1595857191899.Hmail.wenhu.wang@vivo.com
2020-07-27fsnotify: create method handle_inode_event() in fsnotify_operationsAmir Goldstein
The method handle_event() grew a lot of complexity due to the design of fanotify and merging of ignore masks. Most backends do not care about this complex functionality, so we can hide this complexity from them. Introduce a method handle_inode_event() that serves those backends and passes a single inode mark and less arguments. This change converts all backends except fanotify and inotify to use the simplified handle_inode_event() method. In pricipal, inotify could have also used the new method, but that would require passing more arguments on the simple helper (data, data_type, cookie), so we leave it with the handle_event() method. Link: https://lore.kernel.org/r/20200722125849.17418-9-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: add support for FAN_REPORT_NAMEAmir Goldstein
Introduce a new fanotify_init() flag FAN_REPORT_NAME. It requires the flag FAN_REPORT_DIR_FID and there is a constant for setting both flags named FAN_REPORT_DFID_NAME. For a group with flag FAN_REPORT_NAME, the parent fid and name are reported for directory entry modification events (create/detete/move) and for events on non-directory objects. Events on directories themselves are reported with their own fid and "." as the name. The parent fid and name are reported with an info record of type FAN_EVENT_INFO_TYPE_DFID_NAME, similar to the way that parent fid is reported with into type FAN_EVENT_INFO_TYPE_DFID, but with an appended null terminated name string. Link: https://lore.kernel.org/r/20200716084230.30611-21-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: add basic support for FAN_REPORT_DIR_FIDAmir Goldstein
For now, the flag is mutually exclusive with FAN_REPORT_FID. Events include a single info record of type FAN_EVENT_INFO_TYPE_DFID with a directory file handle. For now, events are only reported for: - Directory modification events - Events on children of a watching directory - Events on directory objects Soon, we will add support for reporting the parent directory fid for events on non-directories with filesystem/mount mark and support for reporting both parent directory fid and child fid. Link: https://lore.kernel.org/r/20200716084230.30611-19-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: remove check that source dentry is positiveAmir Goldstein
Remove the unneeded check for positive source dentry in fsnotify_move(). fsnotify_move() hook is mostly called from vfs_rename() under lock_rename() and vfs_rename() starts with may_delete() test that verifies positive source dentry. The only other caller of fsnotify_move() - debugfs_rename() also verifies positive source. Link: https://lore.kernel.org/r/20200716084230.30611-17-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: send event with parent/name info to sb/mount/non-dir marksAmir Goldstein
Similar to events "on child" to watching directory, send event with parent/name info if sb/mount/non-dir marks are interested in parent/name info. The FS_EVENT_ON_CHILD flag can be set on sb/mount/non-dir marks to specify interest in parent/name info for events on non-directory inodes. Events on "orphan" children (disconnected dentries) are sent without parent/name info. Events on directories are sent with parent/name info only if the parent directory is watching. After this change, even groups that do not subscribe to events on children could get an event with mark iterator type TYPE_CHILD and without mark iterator type TYPE_INODE if fanotify has marks on the same objects. dnotify and inotify event handlers can already cope with that situation. audit does not subscribe to events that are possible on child, so won't get to this situation. nfsd does not access the marks iterator from its event handler at the moment, so it is not affected. This is a bit too fragile, so we should prepare all groups to cope with mark type TYPE_CHILD preferably using a generic helper. Link: https://lore.kernel.org/r/20200716084230.30611-16-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: pass dir and inode arguments to fsnotify()Amir Goldstein
The arguments of fsnotify() are overloaded and mean different things for different event types. Replace the to_tell argument with separate arguments @dir and @inode, because we may be sending to both dir and child. Using the @data argument to pass the child is not enough, because dirent events pass this argument (for audit), but we do not report to child. Document the new fsnotify() function argumenets. Link: https://lore.kernel.org/r/20200722125849.17418-7-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: create helper fsnotify_inode()Amir Goldstein
Simple helper to consolidate biolerplate code. Link: https://lore.kernel.org/r/20200722125849.17418-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27PCI: Add Intel QuickAssist device IDsGiovanni Cabiddu
Add device IDs for the following Intel QuickAssist devices: DH895XCC, C3XXX and C62X. The defines in this patch are going to be referenced in two independent drivers, qat and vfio-pci. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27fsnotify: add object type "child" to object type iteratorAmir Goldstein
The object type iterator is used to collect all the marks of a specific group that have interest in an event. It is used by fanotify to get a single handle_event callback when an event has a match to either of inode/sb/mount marks of the group. The nature of fsnotify events is that they are associated with at most one sb at most one mount and at most one inode. When a parent and child are both watching, two events are sent to backend, one associated to parent inode and one associated to the child inode. This results in duplicate events in fanotify, which usually get merged before user reads them, but this is sub-optimal. It would be better if the same event is sent to backend with an object type iterator that has both the child inode and its parent, and let the backend decide if the event should be reported once (fanotify) or twice (inotify). Link: https://lore.kernel.org/r/20200716084230.30611-9-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: generalize test for FAN_REPORT_FIDAmir Goldstein
As preparation for new flags that report fids, define a bit set of flags for a group reporting fids, currently containing the only bit FAN_REPORT_FID. Link: https://lore.kernel.org/r/20200716084230.30611-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: remove event FAN_DIR_MODIFYAmir Goldstein
It was never enabled in uapi and its functionality is about to be superseded by events FAN_CREATE, FAN_DELETE, FAN_MOVE with group flag FAN_REPORT_NAME. Keep a place holder variable name_event instead of removing the name recording code since it will be used by the new events. Link: https://lore.kernel.org/r/20200708111156.24659-17-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fs: define inode flags using bit numbersEric Biggers
Define the VFS inode flags using bit numbers instead of hardcoding powers of 2, which has become unwieldy now that we're up to 65536. No change in the actual values. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27regset: kill user_regset_copyout{,_zero}()Al Viro
no callers left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27regset(): kill ->get_size()Al Viro
not used anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27regset: kill ->get()Al Viro
no instances left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27regset: new method and helpers for itAl Viro
->regset_get() takes task+regset+buffer, returns the amount of free space left in the buffer on success and -E... on error. buffer is represented as struct membuf - a pair of (kernel) pointer and amount of space left Primitives for writing to such: * membuf_write(buf, data, size) * membuf_zero(buf, size) * membuf_store(buf, value) These are implemented as inlines (in case of membuf_store - a macro). All writes are sequential; they become no-ops when there's no space left. Return value of all primitives is the amount of space left after the operation, so they can be used as return values of ->regset_get(). Example of use: // stores pt_regs of task + 64 bytes worth of zeroes + 32bit PID of task int foo_get(struct task_struct *task, const struct regset *regset, struct membuf to) { membuf_write(&to, task_pt_regs(task), sizeof(struct pt_regs)); membuf_zero(&to, 64); return membuf_store(&to, (u32)task_tgid_vnr(task)); } regset_get()/regset_get_alloc() taught to use that thing if present. By the end of the series all users of ->get() will be converted; then ->get() and ->get_size() can go. Note that unlike ->get() this thing always starts at offset 0 and, since it only writes to kernel buffer, can't fail on copyout. It can, of course, fail for other reasons, but those tend to be less numerous. The caller guarantees that the buffer size won't be bigger than regset->n * regset->size. That simplifies life for quite a few instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27copy_regset_to_user(): do all copyout at once.Al Viro
Turn copy_regset_to_user() into regset_get_alloc() + copy_to_user(). Now all ->get() calls have a kernel buffer as destination. Note that we'd already eliminated the callers of copy_regset_to_user() with non-zero offset; now that argument is simply unused. Uninlined, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27kill elf_fpxregs_tAl Viro
all uses are conditional upon ELF_CORE_COPY_XFPREGS, which has not been defined on any architecture since 2010 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27take fdpic-related parts of elf_prstatus outAl Viro
The only architecture where we might end up using both is arm, and there we definitely don't want fdpic-related fields in elf_prstatus - coredump layout of ELF binaries should not depend upon having the kernel built with the support of ELF_FDPIC ones. Just move the fdpic-modified variant into binfmt_elf_fdpic.c (and call it elf_prstatus_fdpic there) [name stolen from nico] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27unexport linux/elfcore.hAl Viro
It's unusable from userland - it uses elf_gregset_t, which is not provided by exported headers. glibc has it in sys/procfs.h, but the same file defines struct elf_prstatus, so linux/elfcore.h can't be included once sys/procfs.h has been pulled. Same goes for uclibc and dietlibc simply doesn't have elf_gregset_t defined anywhere. IOW, no userland source is including that thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27introduction of regset ->get() wrappers, switching ELF coredumps to thoseAl Viro
Two new helpers: given a process and regset, dump into a buffer. regset_get() takes a buffer and size, regset_get_alloc() takes size and allocates a buffer. Return value in both cases is the amount of data actually dumped in case of success or -E... on error. In both cases the size is capped by regset->n * regset->size, so ->get() is called with offset 0 and size no more than what regset expects. binfmt_elf.c callers of ->get() are switched to using those; the other caller (copy_regset_to_user()) will need some preparations to switch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27i2c: also convert placeholder function to return errnoWolfram Sang
All i2c_new_device-alike functions return ERR_PTR these days, but this fallback function was missed. Fixes: 2dea645ffc21 ("i2c: acpi: Return error pointers from i2c_acpi_new_device()") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [wsa: changed from 'ENOSYS' to 'ENODEV'] Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-07-27fsnotify: pass dir argument to handle_event() callbackAmir Goldstein
The 'inode' argument to handle_event(), sometimes referred to as 'to_tell' is somewhat obsolete. It is a remnant from the times when a group could only have an inode mark associated with an event. We now pass an iter_info array to the callback, with all marks associated with an event. Most backends ignore this argument, with two exceptions: 1. dnotify uses it for sanity check that event is on directory 2. fanotify uses it to report fid of directory on directory entry modification events Remove the 'inode' argument and add a 'dir' argument. The callback function signature is deliberately changed, because the meaning of the argument has changed and the arguments have been documented. The 'dir' argument is set to when 'file_name' is specified and it is referring to the directory that the 'file_name' entry belongs to. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27linux/kernel.h: Add PTR_ALIGN_DOWN macroKishon Vijay Abraham I
Add a macro for aligning down a pointer. This is useful to get an aligned register address when a device allows only word access and doesn't allow half word or byte access. Link: https://lore.kernel.org/r/20200722110317.4744-4-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Rob Herring <robh@kernel.org>
2020-07-27genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner
John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
2020-07-27RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7Meir Lichtinger
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access flag. ConnectX-7 supports setting relaxed ordering read/write mkey attribute by UMR, indicated by new HCA capabilities. With ConnectX-7 driver uses UMR when user set relaxed ordering access flag, in contrast to previous silicon models. Specifically it includes setting relvant flags of mkey context mask in UMR control segment, and relaxed ordering write and read flags in UMR mkey context segment. Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>