summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2021-11-01Merge tag 'for-5.16/cdrom-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull CDROM updates from Jens Axboe: "On behalf of Phillip, here are the CDROM updates for the 5.16-rc1 merge window: - Add ioctl for improved media change detection (Lukas) - Reformat some documentation (Phillip) - Redundant variable removal (luo)" * tag 'for-5.16/cdrom-2021-10-29' of git://git.kernel.dk/linux-block: cdrom: Remove redundant variable and its assignment cdrom: docs: reformat table in Documentation/userspace-api/ioctl/cdrom.rst drivers/cdrom: improved ioctl for media change detection
2021-11-01Merge tag 'for-5.16/scsi-ma-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull SCSI multi-actuator support from Jens Axboe: "This adds SCSI support for the recently merged block multi-actuator support. Since this was sitting on top of the block tree, the SCSI side asked me to queue it up." * tag 'for-5.16/scsi-ma-2021-10-29' of git://git.kernel.dk/linux-block: doc: Fix typo in request queue sysfs documentation doc: document sysfs queue/independent_access_ranges attributes libata: support concurrent positioning ranges log scsi: sd: add concurrent positioning ranges support
2021-11-01Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull bdev size cleanups from Jens Axboe: "Clean up the bdev size handling with new bdev_nr_bytes() helper" * tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits) partitions/ibm: use bdev_nr_sectors instead of open coding it partitions/efi: use bdev_nr_bytes instead of open coding it block/ioctl: use bdev_nr_sectors and bdev_nr_bytes block: cache inode size in bdev udf: use sb_bdev_nr_blocks reiserfs: use sb_bdev_nr_blocks ntfs: use sb_bdev_nr_blocks jfs: use sb_bdev_nr_blocks ext4: use sb_bdev_nr_blocks block: add a sb_bdev_nr_blocks helper block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate squashfs: use bdev_nr_bytes instead of open coding it reiserfs: use bdev_nr_bytes instead of open coding it pstore/blk: use bdev_nr_bytes instead of open coding it ntfs3: use bdev_nr_bytes instead of open coding it nilfs2: use bdev_nr_bytes instead of open coding it nfs/blocklayout: use bdev_nr_bytes instead of open coding it jfs: use bdev_nr_bytes instead of open coding it hfsplus: use bdev_nr_sectors instead of open coding it hfs: use bdev_nr_sectors instead of open coding it ...
2021-11-01cgroup: bpf: Move wrapper for __cgroup_bpf_*() to kernel/bpf/cgroup.cHe Fengqing
In commit 324bda9e6c5a("bpf: multi program support for cgroup+bpf") cgroup_bpf_*() called from kernel/bpf/syscall.c, but now they are only used in kernel/bpf/cgroup.c, so move these function to kernel/bpf/cgroup.c, like cgroup_bpf_replace(). Signed-off-by: He Fengqing <hefengqing@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...
2021-11-01Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...
2021-11-01skmsg: Lose offset info in sk_psock_skb_ingressLiu Jian
If sockmap enable strparser, there are lose offset info in sk_psock_skb_ingress(). If the length determined by parse_msg function is not skb->len, the skb will be converted to sk_msg multiple times, and userspace app will get the data multiple times. Fix this by get the offset and length from strp_msg. And as Cong suggested, add one bit in skb->_sk_redir to distinguish enable or disable strparser. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211029141216.211899-1-liujian56@huawei.com
2021-11-01Merge tag 'tpmdd-next-v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "Only bug fixes" * tag 'tpmdd-next-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm_tis_spi: Add missing SPI ID tpm: fix Atmel TPM crash caused by too frequent queries tpm: Check for integer overflow in tpm2_map_response_body() tpm: tis: Kconfig: Add helper dependency on COMPILE_TEST
2021-11-01Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull memory folios from Matthew Wilcox: "Add memory folios, a new type to represent either order-0 pages or the head page of a compound page. This should be enough infrastructure to support filesystems converting from pages to folios. The point of all this churn is to allow filesystems and the page cache to manage memory in larger chunks than PAGE_SIZE. The original plan was to use compound pages like THP does, but I ran into problems with some functions expecting only a head page while others expect the precise page containing a particular byte. The folio type allows a function to declare that it's expecting only a head page. Almost incidentally, this allows us to remove various calls to VM_BUG_ON(PageTail(page)) and compound_head(). This converts just parts of the core MM and the page cache. For 5.17, we intend to convert various filesystems (XFS and AFS are ready; other filesystems may make it) and also convert more of the MM and page cache to folios. For 5.18, multi-page folios should be ready. The multi-page folios offer some improvement to some workloads. The 80% win is real, but appears to be an artificial benchmark (postgres startup, which isn't a serious workload). Real workloads (eg building the kernel, running postgres in a steady state, etc) seem to benefit between 0-10%. I haven't heard of any performance losses as a result of this series. Nobody has done any serious performance tuning; I imagine that tweaking the readahead algorithm could provide some more interesting wins. There are also other places where we could choose to create large folios and currently do not, such as writes that are larger than PAGE_SIZE. I'd like to thank all my reviewers who've offered review/ack tags: Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil Babka, William Kucharski, Yu Zhao and Zi Yan. I'd also like to thank those who gave feedback I incorporated but haven't offered up review tags for this part of the series: Nick Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard, Hugh Dickins, and probably a few others who I forget" * tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits) mm/writeback: Add folio_write_one mm/filemap: Add FGP_STABLE mm/filemap: Add filemap_get_folio mm/filemap: Convert mapping_get_entry to return a folio mm/filemap: Add filemap_add_folio() mm/filemap: Add filemap_alloc_folio mm/page_alloc: Add folio allocation functions mm/lru: Add folio_add_lru() mm/lru: Convert __pagevec_lru_add_fn to take a folio mm: Add folio_evictable() mm/workingset: Convert workingset_refault() to take a folio mm/filemap: Add readahead_folio() mm/filemap: Add folio_mkwrite_check_truncate() mm/filemap: Add i_blocks_per_folio() mm/writeback: Add folio_redirty_for_writepage() mm/writeback: Add folio_account_redirty() mm/writeback: Add folio_clear_dirty_for_io() mm/writeback: Add folio_cancel_dirty() mm/writeback: Add folio_account_cleaned() mm/writeback: Add filemap_dirty_folio() ...
2021-11-01Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-29 This series contains updates to ice and iavf drivers and virtchnl header file. Brett removes vlan_promisc argument from a function call for ice driver. In the virtchnl header file he removes an unused, reserved define and converts raw value defines to instead use the BIT macro. Marcin adds syncing of MAC addresses when creating switchdev VFs to remove error messages on link up and stops showing buffer information for port representors to remove duplicated entries being displayed for ice driver. Karen introduces a helper to go from pci_dev to iavf_adapter in the iavf driver. Przemyslaw fixes an issue where iavf was attempting to free IRQs before calling disable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01vdpa: Enable user to set mac and mtu of vdpa deviceParav Pandit
$ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 9000 $ vdpa dev config show bar: mac 00:11:22:33:44:55 link up link_announce false mtu 9000 $ vdpa dev config show -jp { "config": { "bar": { "mac": "00:11:22:33:44:55", "link ": "up", "link_announce ": false, "mtu": 9000, } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-5-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-11-01vdpa: Use kernel coding style for structure commentsParav Pandit
As subsequent patch adds new structure field with comment, move the structure comment to follow kernel coding style. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-4-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01vdpa: Introduce query of device config layoutParav Pandit
Introduce a command to query a device config layout. An example query of network vdpa device: $ vdpa dev add name bar mgmtdev vdpasim_net $ vdpa dev config show bar: mac 00:35:09:19:48:05 link up link_announce false mtu 1500 $ vdpa dev config show -jp { "config": { "bar": { "mac": "00:35:09:19:48:05", "link ": "up", "link_announce ": false, "mtu": 1500, } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-3-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01vdpa: Introduce and use vdpa device get, set config helpersParav Pandit
Subsequent patches enable get and set configuration either via management device or via vdpa device' config ops. This requires synchronization between multiple callers to get and set config callbacks. Features setting also influence the layout of the configuration fields endianness. To avoid exposing synchronization primitives to callers, introduce helper for setting the configuration and use it. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-2-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_ring: validate used buffer lengthJason Wang
This patch validate the used buffer length provided by the device before trying to use it. This is done by record the in buffer length in a new field in desc_state structure during virtqueue_add(), then we can fail the virtqueue_get_buf() when we find the device is trying to give us a used buffer length which is greater than the in buffer length. Since some drivers have already done the validation by themselves, this patch tries to makes the core validation optional. For the driver that doesn't want the validation, it can set the suppress_used_validation to be true (which could be overridden by force_used_validation module parameter). To be more efficient, a dedicate array is used for storing the validate used length, this helps to eliminate the cache stress if validation is done by the driver. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027022107.14357-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_config: introduce a new .enable_cbs methodJason Wang
This patch introduces a new method to enable the callbacks for config and virtqueues. This will be used for making sure the virtqueue callbacks are only enabled after virtio_device_ready() if transport implements this method. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211019070152.8236-4-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01vdpa: add new callback get_vq_num_min in vdpa_config_opsWu Zongyong
This callback is optional. For vdpa devices that not support to change virtqueue size, get_vq_num_min and get_vq_num_max will return the same value, so that users can choose a correct value for that device. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/f4af5b0abd660d9a29ab6b2f67bd6df10284a230.1635493219.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01vdpa: fix typoWu Zongyong
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/4b5153262e4ba64986bb567d7425ad4829ca7bcc.1635493219.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio-pci: introduce legacy device moduleWu Zongyong
Split common codes from virtio-pci-legacy so vDPA driver can reuse it later. Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/71605acde5e97fcb2760a6973e406279fb1bbd33.1635493219.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-31platform/chrome: cros_ec_proto: Use EC struct for featuresPrashant Malani
The Chrome EC's features are returned through an ec_response_get_features struct, but they are stored in an independent array. Although the two are effectively the same at present (2 unsigned 32 bit ints), there is the possibility that they could go out of sync. Avoid this by only using the EC struct to store the features. Signed-off-by: Prashant Malani <pmalani@chromium.org> Link: https://lore.kernel.org/r/20211004170716.86601-1-pmalani@chromium.org Signed-off-by: Benson Leung <bleung@chromium.org>
2021-10-31Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/smmu', ↵Joerg Roedel
'arm/tegra', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next
2021-10-31sched/fair: Wait before decaying max_newidle_lb_costVincent Guittot
Decay max_newidle_lb_cost only when it has not been updated for a while and ensure to not decay a recently changed value. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20211019123537.17146-4-vincent.guittot@linaro.org
2021-10-31Merge tag 'kvmarm-5.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us
2021-10-30task_stack: Fix end_of_stack() for architectures with upwards-growing stackHelge Deller
The function end_of_stack() returns a pointer to the last entry of a stack. For architectures like parisc where the stack grows upwards return the pointer to the highest address in the stack. Without this change I faced a crash on parisc, because the stackleak functionality wrote STACKLEAK_POISON to the lowest address and thus overwrote the first 4 bytes of the task_struct which included the TIF_FLAGS. Signed-off-by: Helge Deller <deller@gmx.de>
2021-10-30locking: Remove spin_lock_flags() etcArnd Bergmann
parisc, ia64 and powerpc32 are the only remaining architectures that provide custom arch_{spin,read,write}_lock_flags() functions, which are meant to re-enable interrupts while waiting for a spinlock. However, none of these can actually run into this codepath, because it is only called on architectures without CONFIG_GENERIC_LOCKBREAK, or when CONFIG_DEBUG_LOCK_ALLOC is set without CONFIG_LOCKDEP, and none of those combinations are possible on the three architectures. Going back in the git history, it appears that arch/mn10300 may have been able to run into this code path, but there is a good chance that it never worked. On the architectures that still exist, it was already impossible to hit back in 2008 after the introduction of CONFIG_GENERIC_LOCKBREAK, and possibly earlier. As this is all dead code, just remove it and the helper functions built around it. For arch/ia64, the inline asm could be cleaned up, but it seems safer to leave it untouched. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Helge Deller <deller@gmx.de> # parisc Link: https://lore.kernel.org/r/20211022120058.1031690-1-arnd@kernel.org
2021-10-30tty: Fix extra "not" in TTY_DRIVER_REAL_RAW descriptionAnssi Hannula
TTY_DRIVER_REAL_RAW flag (which is always set for e.g. serial ports) documentation says that driver must always set special character handling flags in certain conditions. However, as the following sentence makes clear, what is actually intended is the opposite. Fix that by removing the unintended double negation. Acked-by: Johan Hovold <johan@kernel.org> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Link: https://lore.kernel.org/r/20211027102124.3049414-1-anssi.hannula@bitwise.fi Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-29mailbox: imx: support i.MX8ULP S4 MUPeng Fan
Like i.MX8 SCU, i.MX8ULP S4 also has vendor specific protocol. - bind SCU/S4 MU part to share one tx/rx/init API to make code simple. - S4 msg max size is very large, so alloc the space at driver probe, not use local on stack variable. - S4 MU has 8 TR and 4 RR which is different with i.MX8 MU, so adapt code to reflect this. Tested on i.MX8MP, i.MX8ULP Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29mailbox: apple: Add driver for Apple mailboxesSven Peter
Apple SoCs such as the M1 come with various co-processors. Mailboxes are used to communicate with those. This driver adds support for two variants of those mailboxes. Signed-off-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-10-29net/mlx5: Allow skipping counter refresh on creationPaul Blakey
CT creates a counter for each CT rule, and for each such counter, fs_counters tries to queue mlx5_fc_stats_work() work again via mod_delayed_work(0) call to refresh all counters. This call has a large performance impact when reaching high insertion rate and accounts for ~8% of the insertion time when using software steering. Allow skipping the refresh of all counters during counter creation. Change CT to use this refresh skipping for it's counters. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29virtchnl: Use the BIT() macro for capability/offload flagsBrett Creeley
Currently raw hex values are used to define specific bits for each capability/offload in virtchnl.h. Using raw hex values makes it unclear which bits are used/available. Fix this by using the BIT() macro so it's immediately obvious which bits are used/available. Also, move the VIRTCHNL_VF_CAP_ADV_LINK_SPEED define in the correct place to line up with the other bit values and add a comment for its purpose. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29virtchnl: Remove unused VIRTCHNL_VF_OFFLOAD_RSVD defineBrett Creeley
Remove unused define that is currently marked as reserved. This will open up space for a new feature if/when it's introduced. Also, there is no reason to keep unused defines around. Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29signal: Implement force_fatal_sigEric W. Biederman
Add a simple helper force_fatal_sig that causes a signal to be delivered to a process as if the signal handler was set to SIG_DFL. Reimplement force_sigsegv based upon this new helper. This fixes force_sigsegv so that when it forces the default signal handler to be used the code now forces the signal to be unblocked as well. Reusing the tested logic in force_sig_info_to_task that was built for force_sig_seccomp this makes the implementation trivial. This is interesting both because it makes force_sigsegv simpler and because there are a couple of buggy places in the kernel that call do_exit(SIGILL) or do_exit(SIGSYS) because there is no straight forward way today for those places to simply force the exit of a process with the chosen signal. Creating force_fatal_sig allows those places to be implemented with normal signal exits. Link: https://lkml.kernel.org/r/20211020174406.17889-13-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29PCI: Add pci_find_dvsec_capability to find designated VSECBen Widawsky
Add pci_find_dvsec_capability to locate a Designated Vendor-Specific Extended Capability with the specified Vendor ID and Capability ID. The Designated Vendor-Specific Extended Capability (DVSEC) allows one or more "vendor" specific capabilities that are not tied to the Vendor ID of the PCI component. Where the DVSEC Vendor may be a standards body like CXL. Cc: David E. Box <david.e.box@linux.intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-pci@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Andrew Donnellan <ajd@linux.ibm.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163379787943.692348.6814373487017444007.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29block: remove blk_{get,put}_requestChristoph Hellwig
These are now pointless wrappers around blk_mq_{alloc,free}_request, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29mctp: Add flow extension to skbJeremy Kerr
This change adds a new skb extension for MCTP, to represent a request/response flow. The intention is to use this in a later change to allow i2c controllers to correctly configure a multiplexer over a flow. Since we have a cleanup function in the core path (if an extension is present), we'll need to make CONFIG_MCTP a bool, rather than a tristate. Includes a fix for a build warning with clang: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28mm: filemap: check if THP has hwpoisoned subpage for PMD page faultYang Shi
When handling shmem page fault the THP with corrupted subpage could be PMD mapped if certain conditions are satisfied. But kernel is supposed to send SIGBUS when trying to map hwpoisoned page. There are two paths which may do PMD map: fault around and regular fault. Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths") the thing was even worse in fault around path. The THP could be PMD mapped as long as the VMA fits regardless what subpage is accessed and corrupted. After this commit as long as head page is not corrupted the THP could be PMD mapped. In the regular fault path the THP could be PMD mapped as long as the corrupted page is not accessed and the VMA fits. This loophole could be fixed by iterating every subpage to check if any of them is hwpoisoned or not, but it is somewhat costly in page fault path. So introduce a new page flag called HasHWPoisoned on the first tail page. It indicates the THP has hwpoisoned subpage(s). It is set if any subpage of THP is found hwpoisoned by memory failure and after the refcount is bumped successfully, then cleared when the THP is freed or split. The soft offline path doesn't need this since soft offline handler just marks a subpage hwpoisoned when the subpage is migrated successfully. But shmem THP didn't get split then migrated at all. Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@gmail.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28bpf: Add bpf_kallsyms_lookup_name helperKumar Kartikeya Dwivedi
This helper allows us to get the address of a kernel symbol from inside a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can relocate typeless ksym vars. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211028063501.2239335-2-memxor@gmail.com
2021-10-28bpf: Add bloom filter map implementationJoanne Koong
This patch adds the kernel-side changes for the implementation of a bpf bloom filter map. The bloom filter map supports peek (determining whether an element is present in the map) and push (adding an element to the map) operations.These operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_UPDATE_ELEM -> push The bloom filter map does not have keys, only values. In light of this, the bloom filter map's API matches that of queue stack maps: user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM which correspond internally to bpf_map_peek_elem/bpf_map_push_elem, and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem APIs to query or add an element to the bloom filter map. When the bloom filter map is created, it must be created with a key_size of 0. For updates, the user will pass in the element to add to the map as the value, with a NULL key. For lookups, the user will pass in the element to query in the map as the value, with a NULL key. In the verifier layer, this requires us to modify the argument type of a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in the syscall layer, we need to copy over the user value so that in bpf_map_peek_elem, we know which specific value to query. A few things to please take note of: * If there are any concurrent lookups + updates, the user is responsible for synchronizing this to ensure no false negative lookups occur. * The number of hashes to use for the bloom filter is configurable from userspace. If no number is specified, the default used will be 5 hash functions. The benchmarks later in this patchset can help compare the performance of using different number of hashes on different entry sizes. In general, using more hashes decreases both the false positive rate and the speed of a lookup. * Deleting an element in the bloom filter map is not supported. * The bloom filter map may be used as an inner map. * The "max_entries" size that is specified at map creation time is used to approximate a reasonable bitmap size for the bloom filter, and is not otherwise strictly enforced. If the user wishes to insert more entries into the bloom filter than "max_entries", they may do so but they should be aware that this may lead to a higher false positive rate. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
2021-10-28Merge branch irq/misc-5.16 into irq/irqchip-nextMarc Zyngier
* irq/misc-5.16: : . : Misc irqchip fixes for 5.16: : - MAINTAINERS update for the ARM VIC DT binding : - Allow drivers using the IRQCHIP_PLATFORM_DRIVER_BEGIN/END : infrastructure to use COMPILE_TEST without CONFIG_OF : - DT updates : - Detangle h8300 linux/irqchip.h inclusion : . h8300: Fix linux/irqchip.h include mess dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings irqchip: Fix compile-testing without CONFIG_OF MAINTAINERS: update arm,vic.yaml reference Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/net/sock.h 7b50ecfcc6cd ("net: Rename ->stream_memory_read to ->sock_is_readable") 4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout") drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.") Adjacent code addition in both cases, keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28Merge branch irq/irq_cpu_offline into irq/irqchip-nextMarc Zyngier
* irq/irq_cpu_offline: : . : Make irq_cpu_{on,off}line() deprecated kernel API, and only : enable it for some obscure Cavium platform after having : moved all the other users away from it. : : Next step, drop the platform itself. : . genirq: Hide irq_cpu_{on,off}line() behind a deprecated option irqchip/mips-gic: Get rid of the reliance on irq_cpu_online() MIPS: loongson64: Drop call to irq_cpu_offline() Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-28Merge branch irq/remove-handle-domain-irq-20211026 into irq/irqchip-nextMarc Zyngier
* irq/remove-handle-domain-irq-20211026: : Large rework of the architecture entry code from Mark Rutland. : From the cover letter: : : <quote> : The handle_domain_{irq,nmi}() functions were oringally intended as a : convenience, but recent rework to entry code across the kernel tree has : demonstrated that they cause more pain than they're worth and prevent : architectures from being able to write robust entry code. : : This series reworks the irq code to remove them, handling the necessary : entry work consistently in entry code (be it architectural or generic). : </quote> MIPS: irq: Avoid an unused-variable error irq: remove handle_domain_{irq,nmi}() irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY irq: riscv: perform irqentry in entry code irq: openrisc: perform irqentry in entry code irq: csky: perform irqentry in entry code irq: arm64: perform irqentry in entry code irq: arm: perform irqentry in entry code irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ irq: add generic_handle_arch_irq() irq: unexport handle_irq_desc() irq: simplify handle_domain_{irq,nmi}() irq: mips: simplify do_domain_IRQ() irq: mips: stop (ab)using handle_domain_irq() irq: mips: simplify bcm6345_l1_irq_handle() irq: mips: avoid nested irq_enter() Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-28Merge tag 'mlx5-net-next-5.15-rc7' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Merge mlx5-next into net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28Drivers: hv: vmbus: Remove unused code to check for subchannelsMichael Kelley
The last caller of vmbus_are_subchannels_present() was removed in commit c967590457ca ("scsi: storvsc: Fix a race in sub-channel creation that can cause panic"). Remove this dead code, and the utility function invoke_sc_cb() that it is the only caller of. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1635191674-34407-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-28Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VMTianyu Lan
Mark vmbus ring buffer visible with set_memory_decrypted() when establish gpadl handle. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-5-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-28Merge remote-tracking branch 'tip/x86/cc' into hyperv-nextWei Liu
2021-10-28BackMerge tag 'v5.15-rc7' into drm-nextDave Airlie
The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in. Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-10-27Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27inet: remove races in inet{6}_getname()Eric Dumazet
syzbot reported data-races in inet_getname() multiple times, it is time we fix this instead of pretending applications should not trigger them. getsockname() and getpeername() are not really considered fast path. v2: added the missing BPF_CGROUP_RUN_SA_PROG() declaration needed when CONFIG_CGROUP_BPF=n, as reported by kernel test robot <lkp@intel.com> syzbot typical report: BUG: KCSAN: data-race in __inet_hash_connect / inet_getname write to 0xffff888136d66cf8 of 2 bytes by task 14374 on cpu 1: __inet_hash_connect+0x7ec/0x950 net/ipv4/inet_hashtables.c:831 inet_hash_connect+0x85/0x90 net/ipv4/inet_hashtables.c:853 tcp_v4_connect+0x782/0xbb0 net/ipv4/tcp_ipv4.c:275 __inet_stream_connect+0x156/0x6e0 net/ipv4/af_inet.c:664 inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:728 __sys_connect_file net/socket.c:1896 [inline] __sys_connect+0x254/0x290 net/socket.c:1913 __do_sys_connect net/socket.c:1923 [inline] __se_sys_connect net/socket.c:1920 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1920 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888136d66cf8 of 2 bytes by task 14408 on cpu 0: inet_getname+0x11f/0x170 net/ipv4/af_inet.c:790 __sys_getsockname+0x11d/0x1b0 net/socket.c:1946 __do_sys_getsockname net/socket.c:1961 [inline] __se_sys_getsockname net/socket.c:1958 [inline] __x64_sys_getsockname+0x3e/0x50 net/socket.c:1958 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x0000 -> 0xdee0 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 14408 Comm: syz-executor.3 Not tainted 5.15.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20211026213014.3026708-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27block: Add a helper to validate the block sizeXie Yongji
There are some duplicated codes to validate the block size in block drivers. This limitation actually comes from block layer, so this patch tries to add a new block layer helper for that. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20211026144015.188-2-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>