summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core/uverbs_main.c
AgeCommit message (Collapse)Author
2020-01-16RDMA/uverbs: Add ioctl command to get a device contextJason Gunthorpe
Allow future extensions of the get context command through the uverbs ioctl kabi. Unlike the uverbs version this does not return an async_fd as well, that has to be done with another command. Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOCJason Gunthorpe
Allow the async FD to be allocated separately from the context. This is necessary to introduce the ioctl to create a context, as an ioctl should only ever create a single uobject at a time. If multiple async FDs are created then the first one is used to deliver affiliated events from any ib_uevent_object, with all subsequent ones will receive only unaffiliated events. Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Use READ_ONCE for ib_ufile.async_fileJason Gunthorpe
The writer for async_file holds the ucontext_lock, while the readers are left unlocked. Most readers rely on an implicit locking, either by having a uobject (which cannot be created before a context) or by holding the ib_ufile kref. However ib_uverbs_free_hw_resources() has no implicit lock and has a possible race. Make this all clear and sane by using READ_ONCE consistently. Link: https://lore.kernel.org/r/1578504126-9400-15-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Make ib_uverbs_async_event_file into a uobjectJason Gunthorpe
This makes async events aligned with completion events as both are full uobjects of FD type and use the same uobject lifecycle. A bunch of duplicate code is consolidated and the general flow between the two FDs is now very similar. Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Simplify type usage for ib_uverbs_async_handler()Jason Gunthorpe
This function works on an ib_uverbs_async_file. Accept that as a parameter instead of the struct ib_uverbs_file. Consoldiate all the callers working from an ib_uevent_object to a single function and locate the async_file directly from the struct ib_uobject instead of using context_ptr. Link: https://lore.kernel.org/r/1578504126-9400-11-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_wq.uobjectJason Gunthorpe
This is a struct ib_uwq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-10-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_srq.uobjectJason Gunthorpe
This is a struct ib_usrq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-9-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_qp.uobjectJason Gunthorpe
This is a struct ib_uqp_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_cq.uobjectJason Gunthorpe
This is a struct ib_ucq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-7-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Make ib_ucq_object use ib_uevent_objectJason Gunthorpe
Any uobject that sends events into the async_event_file should be using ib_uevent_object so it can use the standard uevent based helper functions. CQ pushes events into both the async_event and the comp_channel in an open coded way. Move the async events related stuff to ib_uevent_object. Link: https://lore.kernel.org/r/1578504126-9400-6-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Simplify destruction of FD uobjectsJason Gunthorpe
FD uobjects have a weird split between the struct file and uobject world. Simplify this to make them pure uobjects and use a generic release method for all struct file operations. This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always called before erasing the linked list contents to make the concurrancy simpler to understand. For this to work the uobject destruction must fence anything that it is cleaning up - the design must not rely on struct file lifetime. Only deliver_event() relies on the struct file to when adding new events to the queue, add a is_destroyed check under lock to block it. Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10RDMA/core: Fix locking in ib_uverbs_event_readJason Gunthorpe
This should not be using ib_dev to test for disassociation, during disassociation is_closed is set under lock and the waitq is triggered. Instead check is_closed and be sure to re-obtain the lock to test the value after the wait_event returns. Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Link: https://lore.kernel.org/r/1578504126-9400-12-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-06RDMA: Connect between the mmap entry and the umap_priv structureMichal Kalderon
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-05RDMA/core: Move core content from ib_uverbs to ib_coreMichal Kalderon
Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step in later enabling ib_uverbs to be optional. vm_ops is now initialized in ib_uverbs_mmap instead of priv_init to avoid having to move all the rdma_umap functions as well. Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23compat_ioctl: move more drivers to compat_ptr_ioctlArnd Bergmann
The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull RDMA subsystem updates from Jason Gunthorpe: "This cycle mainly saw lots of bug fixes and clean up code across the core code and several drivers, few new functional changes were made. - Many cleanup and bug fixes for hns - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed, bnxt_re, efa - Share the query_port code between all the iWarp drivers - General rework and cleanup of the ODP MR umem code to fit better with the mmu notifier get/put scheme - Support rdma netlink in non init_net name spaces - mlx5 support for XRC devx and DC ODP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA: Fix double-free in srq creation error flow RDMA/efa: Fix incorrect error print IB/mlx5: Free mpi in mp_slave mode IB/mlx5: Use the original address for the page during free_pages RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp" RDMA/hns: Package operations of rq inline buffer into separate functions RDMA/hns: Optimize cmd init and mode selection for hip08 IB/hfi1: Define variables as unsigned long to fix KASAN warning IB/{rdmavt, hfi1, qib}: Add a counter for credit waits IB/hfi1: Add traces for TID RDMA READ RDMA/siw: Relax from kmap_atomic() use in TX path IB/iser: Support up to 16MB data transfer in a single command RDMA/siw: Fix page address mapping in TX path RDMA: Fix goto target to release the allocated memory RDMA/usnic: Avoid overly large buffers on stack RDMA/odp: Add missing cast for 32 bit RDMA/hns: Use devm_platform_ioremap_resource() to simplify code Documentation/infiniband: update name of some functions RDMA/cma: Fix false error message RDMA/hns: Fix wrong assignment of qp_access_flags ...
2019-08-21RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'Jason Gunthorpe
This is a significant simplification, no extra list is kept per FD, and the interval tree is now shared between all the ucontexts, reducing overhead if there are multiple ucontexts active. Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25RDMA/core: Annotate destroy of mutex to ensure that it is released as unlockedParav Pandit
While compiled with CONFIG_DEBUG_MUTEXES, the kernel ensures that mutex is not held during destroy. Hence add mutex_destroy() for mutexes used in RDMA modules. Link: https://lore.kernel.org/r/20190723065733.4899-2-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-18RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEVJason Gunthorpe
Update the struct ib_client for all modules exporting cdevs related to the ibdevice to also implement RDMA_NLDEV_CMD_GET_CHARDEV. All cdevs are now autoloadable and discoverable by userspace over netlink instead of relying on sysfs. uverbs also exposes the DRIVER_ID for drivers that are able to support driver id binding in rdma-core. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-10RDMA: Move owner into struct ib_device_opsJason Gunthorpe
This more closely follows how other subsytems work, with owner being a member of the structure containing the function pointers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move uverbs_abi_ver into struct ib_device_opsJason Gunthorpe
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...
2019-05-06*: convert stream-like files from nonseekable_open -> stream_openKirill Smelkov
Using scripts/coccinelle/api/stream_open.cocci added in 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock"), search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there is no other methods in file_operations which assume @offset access. I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c) Among cases converted 14 were potentially vulnerable to read vs write deadlock (see details in 10dce8af3422): drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/infiniband/core/user_mad.c:988:1-17: ERROR: umad_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/input/misc/uinput.c:401:1-17: ERROR: uinput_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. and the rest were just safe to convert to stream_open because their read and write do not use ppos at all and corresponding file_operations do not have methods that assume @offset file access(*): arch/powerpc/platforms/52xx/mpc52xx_gpt.c:631:8-24: WARNING: mpc52xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/um/drivers/harddog_kern.c:88:8-24: WARNING: harddog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. arch/x86/kernel/cpu/microcode/core.c:430:33-49: WARNING: microcode_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/ds1620.c:215:8-24: WARNING: ds1620_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/dtlk.c:301:1-17: WARNING: dtlk_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/ipmi/ipmi_watchdog.c:840:9-25: WARNING: ipmi_wdog_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/pcmcia/scr24x_cs.c:95:8-24: WARNING: scr24x_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/tb0219.c:246:9-25: WARNING: tb0219_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/firewire/nosy.c:306:8-24: WARNING: nosy_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/hwmon/fschmd.c:840:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/hwmon/w83793.c:1344:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/infiniband/core/ucma.c:1747:8-24: WARNING: ucma_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/infiniband/core/ucm.c:1178:8-24: WARNING: ucm_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/infiniband/core/uverbs_main.c:1086:8-24: WARNING: uverbs_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/input/joydev.c:282:1-17: WARNING: joydev_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/pci/switch/switchtec.c:393:1-17: WARNING: switchtec_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/platform/chrome/cros_ec_debugfs.c:135:8-24: WARNING: cros_ec_console_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/rtc/rtc-ds1374.c:470:9-25: WARNING: ds1374_wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/rtc/rtc-m41t80.c:805:9-25: WARNING: wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/s390/char/tape_char.c:293:2-18: WARNING: tape_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/s390/char/zcore.c:194:8-24: WARNING: zcore_reipl_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/s390/crypto/zcrypt_api.c:528:8-24: WARNING: zcrypt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/spi/spidev.c:594:1-17: WARNING: spidev_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/staging/pi433/pi433_if.c:974:1-17: WARNING: pi433_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/acquirewdt.c:203:8-24: WARNING: acq_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/advantechwdt.c:202:8-24: WARNING: advwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/alim1535_wdt.c:252:8-24: WARNING: ali_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/alim7101_wdt.c:217:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ar7_wdt.c:166:8-24: WARNING: ar7_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/at91rm9200_wdt.c:113:8-24: WARNING: at91wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ath79_wdt.c:135:8-24: WARNING: ath79_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/bcm63xx_wdt.c:119:8-24: WARNING: bcm63xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/cpu5wdt.c:143:8-24: WARNING: cpu5wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/cpwd.c:397:8-24: WARNING: cpwd_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/eurotechwdt.c:319:8-24: WARNING: eurwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/f71808e_wdt.c:528:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/gef_wdt.c:232:8-24: WARNING: gef_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/geodewdt.c:95:8-24: WARNING: geodewdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ib700wdt.c:241:8-24: WARNING: ibwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ibmasr.c:326:8-24: WARNING: asr_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/indydog.c:80:8-24: WARNING: indydog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/intel_scu_watchdog.c:307:8-24: WARNING: intel_scu_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/iop_wdt.c:104:8-24: WARNING: iop_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/it8712f_wdt.c:330:8-24: WARNING: it8712f_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ixp4xx_wdt.c:68:8-24: WARNING: ixp4xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ks8695_wdt.c:145:8-24: WARNING: ks8695wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/m54xx_wdt.c:88:8-24: WARNING: m54xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/machzwd.c:336:8-24: WARNING: zf_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/mixcomwd.c:153:8-24: WARNING: mixcomwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/mtx-1_wdt.c:121:8-24: WARNING: mtx1_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/mv64x60_wdt.c:136:8-24: WARNING: mv64x60_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/nuc900_wdt.c:134:8-24: WARNING: nuc900wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/nv_tco.c:164:8-24: WARNING: nv_tco_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pc87413_wdt.c:289:8-24: WARNING: pc87413_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd.c:698:8-24: WARNING: pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd.c:737:8-24: WARNING: pcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_pci.c:581:8-24: WARNING: pcipcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_pci.c:623:8-24: WARNING: pcipcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_usb.c:488:8-24: WARNING: usb_pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_usb.c:527:8-24: WARNING: usb_pcwd_temperature_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pika_wdt.c:121:8-24: WARNING: pikawdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pnx833x_wdt.c:119:8-24: WARNING: pnx833x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/rc32434_wdt.c:153:8-24: WARNING: rc32434_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/rdc321x_wdt.c:145:8-24: WARNING: rdc321x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/riowd.c:79:1-17: WARNING: riowd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sa1100_wdt.c:62:8-24: WARNING: sa1100dog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc60xxwdt.c:211:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc7240_wdt.c:139:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc8360.c:274:8-24: WARNING: sbc8360_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc_epx_c3.c:81:8-24: WARNING: epx_c3_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc_fitpc2_wdt.c:78:8-24: WARNING: fitpc2_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sb_wdog.c:108:1-17: WARNING: sbwdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sc1200wdt.c:181:8-24: WARNING: sc1200wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sc520_wdt.c:261:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sch311x_wdt.c:319:8-24: WARNING: sch311x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/scx200_wdt.c:105:8-24: WARNING: scx200_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/smsc37b787_wdt.c:369:8-24: WARNING: wb_smsc_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/w83877f_wdt.c:227:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/w83977f_wdt.c:301:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wafer5823wdt.c:200:8-24: WARNING: wafwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/watchdog_dev.c:828:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdrtas.c:379:8-24: WARNING: wdrtas_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdrtas.c:445:8-24: WARNING: wdrtas_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt285.c:104:1-17: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt977.c:276:8-24: WARNING: wdt977_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt.c:424:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt.c:484:8-24: WARNING: wdt_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt_pci.c:464:8-24: WARNING: wdtpci_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt_pci.c:527:8-24: WARNING: wdtpci_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. net/batman-adv/log.c:105:1-17: WARNING: batadv_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. sound/core/control.c:57:7-23: WARNING: snd_ctl_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. sound/core/rawmidi.c:385:7-23: WARNING: snd_rawmidi_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. sound/core/seq/seq_clientmgr.c:310:7-23: WARNING: snd_seq_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. sound/core/timer.c:1428:7-23: WARNING: snd_timer_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. One can also recheck/review the patch via generating it with explanation comments included via $ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci SPFLAGS="-D explain" (*) This second group also contains cases with read/write deadlocks that stream_open.cocci don't yet detect, but which are still valid to convert to stream_open since ppos is not used. For example drivers/pci/switch/switchtec.c calls wait_for_completion_interruptible() in its .read, but stream_open.cocci currently detects only "wait_event*" as blocking. Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yongzhi Pan <panyongzhi@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Tejun Heo <tj@kernel.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Nikolaus Rath <Nikolaus@rath.org> Cc: Han-Wen Nienhuys <hanwen@google.com> Cc: Anatolij Gustschin <agust@denx.de> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James R. Van Zandt" <jrv@vanzandt.mv.com> Cc: Corey Minyard <minyard@acm.org> Cc: Harald Welte <laforge@gnumonks.org> Acked-by: Lubomir Rintel <lkundrak@v3.sk> [scr24x_cs] Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Johan Hovold <johan@kernel.org> Cc: David Herrmann <dh.herrmann@googlemail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: Jean Delvare <jdelvare@suse.com> Acked-by: Guenter Roeck <linux@roeck-us.net> [watchdog/* hwmon/*] Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Logan Gunthorpe <logang@deltatee.com> [drivers/pci/switch/switchtec] Acked-by: Bjorn Helgaas <bhelgaas@google.com> [drivers/pci/switch/switchtec] Cc: Benson Leung <bleung@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> [platform/chrome] Cc: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> [rtc/*] Cc: Mark Brown <broonie@kernel.org> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: bcm-kernel-feedback-list@broadcom.com Cc: Wan ZongShun <mcuos.com@gmail.com> Cc: Zwane Mwaikambo <zwanem@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
2019-04-29rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE useLinus Torvalds
The parameter to ZERO_PAGE() was wrong, but since all architectures except for MIPS and s390 ignore it, it wasn't noticed until 0-day reported the build error. Fixes: 67f269b37f9b ("RDMA/ucontext: Fix regression with disassociate") Cc: stable@vger.kernel.org Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "One core bug fix and a few driver ones - FRWR memory registration for hfi1/qib didn't work with with some iovas causing a NFSoRDMA failure regression due to a fix in the NFS side - A command flow error in mlx5 allowed user space to send a corrupt command (and also smash the kernel stack we've since learned) - Fix a regression and some bugs with device hot unplug that was discovered while reviewing Andrea's patches - hns has a failure if the user asks for certain QP configurations" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Bugfix for mapping user db RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page IB/mlx5: Fix scatter to CQE in DCT QP creation IB/rdmavt: Fix frwr memory registration
2019-04-24Merge branch 'rdma_mmap' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== Upon review it turns out there are some long standing problems in BAR mapping area: * BAR pages intended for read-only can be switched to writable via mprotect. * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page. * Disassociate causes SIGBUS when touching the pages. * CPU pages are being mapped through to the process via remap_pfn_range instead of the more appropriate vm_insert_page, causing weird behaviors during disassociation. This series adds the missing VM_* flag manipulation, adds faulting a zero page for disassociation and revises the CPU page mappings to use vm_insert_page. ==================== For dependencies this branch is based on for-rc from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git * branch 'rdma_mmap': RDMA: Remove rdma_user_mmap_page RDMA/mlx5: Use get_zeroed_page() for clock_info RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24RDMA: Remove rdma_user_mmap_pageJason Gunthorpe
Upon further research drivers that want this should simply call the core function vm_insert_page(). The VMA holds a reference on the page and it will be automatically freed when the last reference drops. No need for disassociate to sequence the cleanup. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24RDMA/ucontext: Fix regression with disassociateJason Gunthorpe
When this code was consolidated the intention was that the VMA would become backed by anonymous zero pages after the zap_vma_pte - however this very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED bits to transform the VMA into an anonymous VMA. Since the vm_ops was removed this broke. Now userspace gets a SIGBUS if it touches the vma after disassociation. Instead of converting the VMA to anonymous provide a fault handler that puts a zero'd page into the VMA when user-space touches it after disassociation. Cc: stable@vger.kernel.org Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-19coredump: fix race condition between mmget_not_zero()/get_task_mm() and core ↵Andrea Arcangeli
dumping The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-28RDMA: Check net namespace access for uverbs, umad, cma and nldevParav Pandit
Introduce an API rdma_dev_access_netns() to check whether a rdma device can be accessed from the specified net namespace or not. Use rdma_dev_access_netns() while opening character uverbs, umad network device and also check while rdma cm_id binds to rdma device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28RDMA/uverbs: Add a __user annotation to a pointerBart Van Assche
This patch avoids that sparse and smatch report the following: warning: cast removes address space of expression Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: 3a6532c9af1a ("RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flowsShamir Rabinovitch
Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows as soon as the flow has ib_uobject. In addition, remove rdma_get_ucontext helper function that is only used by ib_umem_get. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04Merge tag 'v5.0-rc5' into rdma.git for-nextJason Gunthorpe
Linux 5.0-rc5 Needed to merge the include/uapi changes so we have an up to date single-tree for these files. Patches already posted are also expected to need this for dependencies.
2019-01-30RDMA: Add indication for in kernel API support to IB deviceGal Pressman
Drivers that do not provide kernel verbs support should not be used by ib kernel clients at all. In case a device does not implement all mandatory verbs for kverbs usage mark it as a non kverbs provider and prevent its usage for all clients except for uverbs. The device is marked as a non kverbs provider using the 'kverbs_provider' flag which should only be set by the core code. The clients can choose whether kverbs are requested for its usage using the 'no_kverbs_req' flag which is currently set for uverbs only. This patch allows drivers to remove mandatory verbs stubs and simply set the callbacks to NULL. The IB device will be registered as a non-kverbs provider. Note that verbs that are required for the device registration process must be implemented. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociateYishai Hadas
The vma->vm_mm can become impossible to get before rdma_umap_close() is called, in this case we must not try to get an mm that is already undergoing process exit. In this case there is no need to wait for anything as the VMA will be destroyed by another thread soon and is already effectively 'unreachable' by userspace. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__rb_erase_color+0xb9/0x280 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89 58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d 10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000 FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0 Call Trace: unlink_file_vma+0x3b/0x50 free_pgtables+0xa1/0x110 exit_mmap+0xca/0x1a0 ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib] mmput+0x54/0x140 uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs] uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: <stable@vger.kernel.org> # 4.19 Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25IB/uverbs: Fix OOPs upon device disassociationYishai Hadas
The async_file might be freed before the disassociation has been ended, causing qp shutdown to use after free on it. Since uverbs_destroy_ufile_hw is not a fence, it returns if a disassociation is ongoing in another thread. It has to be written this way to avoid deadlock. However this means that the ufile FD close cannot destroy anything that may still be used by an active kref, such as the the async_file. To fix that move the kref_put() to be in ib_uverbs_release_file(). BUG: unable to handle kernel paging request at ffffffffba682787 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061 Oops: 0003 [#1] SMP PTI CPU: 1 PID: 32410 Comm: bash Tainted: G OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85 d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00 FS: 00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0 Call Trace: _raw_spin_lock_irq+0x27/0x30 ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs] uverbs_free_qp+0x7e/0x90 [ib_uverbs] destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs] __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fac551aac60 Cc: <stable@vger.kernel.org> # 4.2 Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUTJason Gunthorpe
When the ioctl interface for the write commands was introduced it did not mark the core response with UVERBS_ATTR_F_VALID_OUTPUT. This causes rdma-core in userspace to not mark the buffers as written for valgrind. Along the same lines it turns out we have always missed marking the driver data. Fixing both of these makes valgrind work properly with rdma-core and ioctl. Fixes: 4785860e04bc ("RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-10IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udataJason Gunthorpe
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-12RDMA: Start use ib_device_opsKamal Heib
Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-03RDMA/uverbs: Use only attrs for the write() handler signatureJason Gunthorpe
All of the old arguments can be derived from the uverbs_attr_bundle structure, so get rid of the redundant arguments. Most of the prior work has been removing users of the arguments to allow this to be a simple patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Use uverbs_attr_bundle to pass ucore for write/write_exJason Gunthorpe
This creates a consistent way to access the two core buffers across write and write_ex handlers. Remove the open coded ucore conversion in the write/ex compatibility handlers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-11-26RDMA/uverbs: Use uverbs_attr_bundle to pass udata for writeJason Gunthorpe
Now that we have metadata describing the command format the core code can directly compute the udata pointers and all the really ugly ib_uverbs_init_udata() calls can be removed from the handlers. This means all the write() handlers are no longer sensitive to the layout of the command buffer. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write_exJason Gunthorpe
The core code needs to compute the udata so we may as well pass it in the uverbs_attr_bundle instead of on the stack. This converts the simple case of write_ex() which already has a core calculation. Also change the write() path to use the attrs for ib_uverbs_init_udata() instead of on the stack. This lets the write to write_ex compatibility path continue to follow the lead of the _ex path. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Prohibit write() calls with too small buffersJason Gunthorpe
The size meta-data in the prior patch describes the smallest acceptable buffer for the write() interface. Globally check this in the core code. This is necessary in the case of write() methods that have a driver udata to prevent computing a negative udata buffer length. The return code of -ENOSPC is chosen here as some of the handlers already use this code, however many other handler use EINVAL. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Make write() handlers return 0 on successJason Gunthorpe
Currently they return the command length, while all other handlers return 0. This makes the write path closer to the write_ex and ioctl path. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for writeJason Gunthorpe
Now that we can add meta-data to the description of write() methods we need to pass the uverbs_attr_bundle into all write based handlers so future patches can use it as a container for any new data transferred out of the core. This is the first step to bringing the write() and ioctl() methods to a common interface signature. This is a simple search/replace, and we push the attr down into the uobj and other APIs to keep changes minimal. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-22RDMA/uverbs: Convert the write interface to use uverbs_apiJason Gunthorpe
This organizes the write commands into objects and links them to the uverbs_api data structure. The command path is reworked to use uapi instead of its internal structures. The command mask is moved from a runtime check to a registration time check in the uapi. Since the write interface does not have the object ID as part of the command, the radix bins are converted into linear lists to support the lookup. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-22RDMA/uverbs: Add helpers to mark uapi functions as unsupportedJason Gunthorpe
We have many cases where parts of the uapi are not supported in a driver, needs a certain protocol, or whatever. It is best to reflect this directly into the struct uverbs_api when it is built so that everything is simply blocked off, and future introspection can report a proper supported list. This is done by adding some additional helpers to the definition list language that disable objects based on a 'supported' call back, and a helper that disables based on a NULL struct ib_device function pointer. Disablement is global. For instance, if a driver disables an object then everything connected to that object is removed, including core methods. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>