linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-03-16	svcrdma: De-duplicate code that locates Write and Reply chunks	Chuck Lever
	Cache the locations of the Requester-provided Write list and Reply chunk so that the Send path doesn't need to parse the Call header again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Use struct xdr_stream to decode ingress transport headers	Chuck Lever
	The logic that checks incoming network headers has to be scrupulous. De-duplicate: replace open-coded buffer overflow checks with the use of xdr_stream helpers that are used most everywhere else XDR decoding is done. One minor change to the sanity checks: instead of checking the length of individual segments, cap the length of the whole chunk to be sure it can fit in the set of pages available in rq_pages. This should be a better test of whether the server can handle the chunks in each request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Remove svcrdma_cm_event() trace point	Chuck Lever
	Clean up. This trace point is no longer needed because the RDMA/core CMA code has an equivalent trace point that was added by commit ed999f820a6c ("RDMA/cma: Add trace points in RDMA Connection Manager"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Create a generic tracing class for displaying xdr_buf layout	Chuck Lever
	This class can be used to create trace points in either the RPC client or RPC server paths. It simply displays the length of each part of an xdr_buf, which is useful to determine that the transport and XDR codecs are operating correctly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	SUNRPC: Clean up: Replace dprintk and BUG_ON call sites in svcauth_gss.c	Chuck Lever
	Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	SUNRPC: Add xdr_pad_size() helper	Chuck Lever
	Introduce a helper function to compute the XDR pad size of a variable-length XDR object. Clean up: Replace open-coded calculation of XDR pad sizes. I'm sure I haven't found every instance of this calculation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	nfsd: Fix NFSv4 READ on RDMA when using readv	Chuck Lever
	svcrdma expects that the payload falls precisely into the xdr_buf page vector. This does not seem to be the case for nfsd4_encode_readv(). This code is called only when fops->splice_read is missing or when RQ_SPLICE_OK is clear, so it's not a noticeable problem in many common cases. Add new transport method: ->xpo_read_payload so that when a READ payload does not fit exactly in rq_res's page vector, the XDR encoder can inform the RPC transport exactly where that payload is, without the payload's XDR pad. That way, when a Write chunk is present, the transport knows what byte range in the Reply message is supposed to be matched with the chunk. Note that the Linux NFS server implementation of NFS/RDMA can currently handle only one Write chunk per RPC-over-RDMA message. This simplifies the implementation of this fix. Fixes: b04209806384 ("nfsd4: allow exotic read compounds") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	sunrpc: Replace zero-length array with flexible-array member	Gustavo A. R. Silva
	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	irqchip/sifive-plic: Enable/Disable external interrupts upon cpu online/offline	Atish Patra
	Currently, PLIC threshold is only initialized once in the beginning. However, threshold can be set to disabled if a CPU is marked offline with CPU hotplug feature. This will not allow to change the irq affinity to a CPU that just came online. Add PLIC specific CPU hotplug callbacks and enable the threshold when a CPU comes online. Take this opportunity to move the external interrupt enable code from trap init to PLIC driver as well. On cpu offline path, the driver performs the exact opposite operations i.e. disable the interrupt and the threshold. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20200302231146.15530-2-atish.patra@wdc.com
2020-03-16	floppy: separate the FDC's base address from its registers	Willy Tarreau
	FDC registers FD_STATUS, FD_DATA, FD_DOR, FD_DIR and FD_DCR used to be defined relative to FD_IOPORT, which is the FDC's base address, itself a macro depending on the "fdc" local or global variable. This patch changes this so that the register macros above now only reference the address offset, and that the FDC's address is explicitly passed in each call to fd_inb() and fd_outb(), thus removing the macro. With this change there is no more implicit usage of the local/global "fdc" variable. One place in the ARM code used to check if the port was equal to FD_DOR, this was changed to testing the register by applying a mask to the port, as was already done in the sparc code. There are still occurrences of fd_inb() and fd_outb() in the PARISC code and these ones remain unaffected since they already used to work with a base address and a register offset. The sparc, m68k and parisc code could now be slightly cleaned up to benefit from the macro definitions above instead of the equivalent hard-coded values. Link: https://lore.kernel.org/r/20200301195555.11154-6-w@1wt.eu Cc: Ian Molton <spyro@f2s.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16	SUNRPC: Remove xdr_buf_read_mic()	Chuck Lever
	Clean up: this function is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16	SUNRPC: Add a flag to avoid reference counts on credentials	Trond Myklebust
	Add a flag to signal to the RPC layer that the credential is already pinned for the duration of the RPC call. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16	Merge branch 'timers/drivers/timer-ti-dm' into timers/drivers/next	Daniel Lezcano

2020-03-16	clocksource/drivers/timer-ti-dm: Enable autoreload in set_pwm	Lokesh Vutla
	dm timer ops set_load() api allows to configure the load value and to set the auto reload feature. But auto reload feature is independent of load value and should be part of configuring pwm. This way pwm can be disabled by disabling auto reload feature using set_pwm() so that the current pwm cycle will be completed. Else pwm disabling causes the cycle to be stopped abruptly. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200305082715.15861-7-lokeshvutla@ti.com
2020-03-16	clocksource/drivers/timer-ti-dm: Add support to get pwm current status	Lokesh Vutla
	omap_dm_timer_ops provide support to configure the pwm but there is no support to get the current status. For configuring pwm it is advised to check the current hw status instead of relying on pwm framework. So implement a new timer ops to get the current status of pwm. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Tony Lindgen <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200305082715.15861-6-lokeshvutla@ti.com
2020-03-16	clocksource/drivers/timer-ti-dm: Implement cpu_pm notifier for context save ↵	Lokesh Vutla
	and restore omap_dm_timer_enable() restores the entire context(including counter) based on 2 conditions: - If get_context_loss_count is populated and context is lost. - If get_context_loss_count is not populated update unconditionally. Case2 has a side effect of updating the counter register even though context is not lost. When timer is configured in pwm mode, this is causing undesired behaviour in the pwm period. Instead of using get_context_loss_count call back, implement cpu_pm notifier with context save and restore support. And delete the get_context_loss_count callback all together. Suggested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> [tony@atomide.com: removed pm_runtime calls from cpuidle calls] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200316111453.15441-1-lokeshvutla@ti.com
2020-03-16	clocksource/drivers/timer-ti-dm: Prepare for using cpuidle	Tony Lindgren
	Let's add runtime_suspend and resume functions and atomic enabled flag. This way we can use these when converting to use cpuidle for saving and restoring device context. And we need to maintain the driver state in the driver as documented in "9. Autosuspend, or automatically-delayed suspends" in the Documentation/power/runtime_pm.rst document related to using driver private lock and races with runtime_suspend(). Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200305082715.15861-3-lokeshvutla@ti.com
2020-03-16	drm/mm: Allow drm_mm_initialized() to be used outside of the locks	Chris Wilson
	Mark up the potential racy read in drm_mm_initialized(), as we want a cheap and cheerful check: [ 121.098731] BUG: KCSAN: data-race in _i915_gem_object_create_stolen [i915] / rm_hole [ 121.098766] [ 121.098789] write (marked) to 0xffff8881f01ed330 of 8 bytes by task 3568 on cpu 3: [ 121.098831] rm_hole+0x64/0x140 [ 121.098860] drm_mm_insert_node_in_range+0x3d3/0x6c0 [ 121.099254] i915_gem_stolen_insert_node_in_range+0x91/0xe0 [i915] [ 121.099646] _i915_gem_object_create_stolen+0x9d/0x100 [i915] [ 121.100047] i915_gem_object_create_region+0x7a/0xa0 [i915] [ 121.100451] i915_gem_object_create_stolen+0x33/0x50 [i915] [ 121.100849] intel_engine_create_ring+0x1af/0x280 [i915] [ 121.101242] __execlists_context_alloc+0xce/0x3d0 [i915] [ 121.101635] execlists_context_alloc+0x25/0x40 [i915] [ 121.102030] intel_context_alloc_state+0xb6/0xf0 [i915] [ 121.102420] __intel_context_do_pin+0x1ff/0x220 [i915] [ 121.102815] i915_gem_do_execbuffer+0x46b4/0x4c20 [i915] [ 121.103211] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 121.103244] drm_ioctl_kernel+0xe4/0x120 [ 121.103269] drm_ioctl+0x297/0x4c7 [ 121.103296] ksys_ioctl+0x89/0xb0 [ 121.103321] __x64_sys_ioctl+0x42/0x60 [ 121.103349] do_syscall_64+0x6e/0x2c0 [ 121.103377] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 121.103403] [ 121.103426] read to 0xffff8881f01ed330 of 8 bytes by task 3109 on cpu 1: [ 121.103819] _i915_gem_object_create_stolen+0x30/0x100 [i915] [ 121.104228] i915_gem_object_create_region+0x7a/0xa0 [i915] [ 121.104631] i915_gem_object_create_stolen+0x33/0x50 [i915] [ 121.105025] intel_engine_create_ring+0x1af/0x280 [i915] [ 121.105420] __execlists_context_alloc+0xce/0x3d0 [i915] [ 121.105818] execlists_context_alloc+0x25/0x40 [i915] [ 121.106202] intel_context_alloc_state+0xb6/0xf0 [i915] [ 121.106595] __intel_context_do_pin+0x1ff/0x220 [i915] [ 121.106985] i915_gem_do_execbuffer+0x46b4/0x4c20 [i915] [ 121.107375] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 121.107409] drm_ioctl_kernel+0xe4/0x120 [ 121.107437] drm_ioctl+0x297/0x4c7 [ 121.107464] ksys_ioctl+0x89/0xb0 [ 121.107489] __x64_sys_ioctl+0x42/0x60 [ 121.107511] do_syscall_64+0x6e/0x2c0 [ 121.107535] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200309121529.16497-1-chris@chris-wilson.co.uk
2020-03-16	dma-direct: provide a arch_dma_clear_uncached hook	Christoph Hellwig
	This allows the arch code to reset the page tables to cached access when freeing a dma coherent allocation that was set to uncached using arch_dma_set_uncached. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-03-16	dma-direct: make uncached_kernel_address more general	Christoph Hellwig
	Rename the symbol to arch_dma_set_uncached, and pass a size to it as well as allow an error return. That will allow reusing this hook for in-place pagetable remapping. As the in-place remap doesn't always require an explicit cache flush, also detangle ARCH_HAS_DMA_PREP_COHERENT from ARCH_HAS_DMA_SET_UNCACHED. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-03-16	dma-direct: remove the cached_kernel_address hook	Christoph Hellwig
	dma-direct now finds the kernel address for coherent allocations based on the dma address, so the cached_kernel_address hooks is unused and can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-03-16	netlink: add nl_set_extack_cookie_u32()	Michal Kubecek
	Similar to existing nl_set_extack_cookie_u64(), add new helper nl_set_extack_cookie_u32() which sets extack cookie to a u32 value. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16	macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)	Era Mayflower
	Netlink support of extended packet number cipher suites, allows adding and updating XPN macsec interfaces. Added support in: * Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites. * Setting and getting 64bit packet numbers with of SAs. * Setting (only on SA creation) and getting ssci of SAs. * Setting salt when installing a SAK. Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1: * MACSEC_CIPHER_ID_GCM_AES_XPN_128 * MACSEC_CIPHER_ID_GCM_AES_XPN_256 In addition, added 2 new netlink attribute types: * MACSEC_SA_ATTR_SSCI * MACSEC_SA_ATTR_SALT Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw. Signed-off-by: Era Mayflower <mayflowerera@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16	macsec: Support XPN frame handling - IEEE 802.1AEbw	Era Mayflower
	Support extended packet number cipher suites (802.1AEbw) frames handling. This does not include the needed netlink patches. * Added xpn boolean field to `struct macsec_secy`. * Added ssci field to `struct_macsec_tx_sa` (802.1AE figure 10-5). * Added ssci field to `struct_macsec_rx_sa` (802.1AE figure 10-5). * Added salt field to `struct macsec_key` (802.1AE 10.7 NOTE 1). * Created pn_t type for easy access to lower and upper halves. * Created salt_t type for easy access to the "ssci" and "pn" parts. * Created `macsec_fill_iv_xpn` function to create IV in XPN mode. * Support in PN recovery and preliminary replay check in XPN mode. In addition, according to IEEE 802.1AEbw figure 10-5, the PN of incoming frame can be 0 when XPN cipher suite is used, so fixed the function `macsec_validate_skb` to fail on PN=0 only if XPN is off. Signed-off-by: Era Mayflower <mayflowerera@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16	Merge tag 'usb-for-v5.7' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next Felipe writes: USB: changes for v5.7 merge window Lots of changes on dwc3 this time, most of them from Thinh fixing a bunch of really old mishaps on the driver. DWC2 got support for STM32MP15 and a couple RockChip SoCs while DWC3 learned about Amlogic A1 family. Apart from these, we have a few spelling fixes and other minor non-critical fixes all over the place. Signed-off-by: Felipe Balbi <balbi@kernel.org> * tag 'usb-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb: (41 commits) dt-bindings: usb: add documentation for aspeed usb-vhub ARM: dts: aspeed-g4: add vhub port and endpoint properties ARM: dts: aspeed-g5: add vhub port and endpoint properties ARM: dts: aspeed-g6: add usb functions usb: gadget: aspeed: add ast2600 vhub support usb: gadget: aspeed: read vhub properties from device tree usb: gadget: aspeed: support per-vhub usb descriptors usb: gadget: f_phonet: Replace zero-length array with flexible-array member usb: gadget: composite: Inform controller driver of self-powered usb: gadget: amd5536udc: fix spelling mistake "reserverd" -> "reserved" udc: s3c-hsudc: Silence warning about supplies during deferred probe usb: dwc2: Silence warning about supplies during deferred probe dt-bindings: usb: dwc2: add compatible property for rk3368 usb dt-bindings: usb: dwc2: add compatible property for rk3328 usb usb: gadget: add raw-gadget interface usb: dwc2: Implement set_selfpowered() usb: dwc3: qcom: Replace <linux/clk-provider.h> by <linux/of_clk.h> usb: dwc3: core: don't do suspend for device mode if already suspended usb: dwc3: Rework resets initialization to be more flexible usb: dwc3: Rework clock initialization to be more flexible ...
2020-03-16	clk: imx7d: Add PXP clock	Laurent Pinchart
	The PXP has a single CCGR clock gate, gating both the IPG_CLK_ROOT and the MAIN_AXI_CLK_ROOT. Add a single clock to cover both. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2020-03-15	net: mii: add linkmode_adv_to_mii_adv_x()	Russell King
	Add a helper to convert a linkmode advertisement to a clause 37 advertisement value for 1000base-x and 2500base-x. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15	net: mii: convert mii_lpa_to_ethtool_lpa_x() to linkmode variant	Russell King
	Add a LPA to linkmode decoder for 1000BASE-X protocols; this decoder only provides the modify semantics similar to other such decoders. This replaces the unused mii_lpa_to_ethtool_lpa_x() helper. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15	Merge tag 'locking-urgent-2020-03-15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex fix from Thomas Gleixner: "Fix for yet another subtle futex issue. The futex code used ihold() to prevent inodes from vanishing, but ihold() does not guarantee inode persistence. Replace the inode pointer with a per boot, machine wide, unique inode identifier. The second commit fixes the breakage of the hash mechanism which causes a 100% performance regression" * tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Unbreak futex hashing futex: Fix inode life-time issue
2020-03-15	Merge tag 'iommu-fixes-v5.6-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - Intel VT-d fixes: - RCU list handling fixes - Replace WARN_TAINT with pr_warn + add_taint for reporting firmware issues - DebugFS fixes - Fix for hugepage handling in iova_to_phys implementation - Fix for handling VMD devices, which have a domain number which doesn't fit into 16 bits - Warning message fix - MSI allocation fix for iommu-dma code - Sign-extension fix for io page-table code - Fix for AMD-Vi to properly update the is-running bit when AVIC is used * tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Populate debugfs if IOMMUs are detected iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE iommu/vt-d: Ignore devices with out-of-spec domain number iommu/vt-d: Fix the wrong printing in RHSA parsing iommu/vt-d: Fix debugfs register reads iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: Silence RCU-list debugging warnings iommu/vt-d: Fix RCU-list bugs in intel_iommu_init() iommu/dma: Fix MSI reservation allocation iommu/io-pgtable-arm: Fix IOVA validation for 32-bit iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page iommu/vt-d: Fix RCU list debugging warnings
2020-03-15	netfilter: nf_tables: add nft_set_elem_update_expr() helper function	Pablo Neira Ayuso
	This helper function runs the eval path of the stateful expression of an existing set element. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: nf_tables: statify nft_expr_init()	Pablo Neira Ayuso
	Not exposed anymore to modules, statify this function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: nf_tables: add nft_set_elem_expr_alloc()	Pablo Neira Ayuso
	Add helper function to create stateful expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	nft_set_pipapo: Introduce AVX2-based lookup implementation	Stefano Brivio
	If the AVX2 set is available, we can exploit the repetitive characteristic of this algorithm to provide a fast, vectorised version by using 256-bit wide AVX2 operations for bucket loads and bitwise intersections. In most cases, this implementation consistently outperforms rbtree set instances despite the fact they are configured to use a given, single, ranged data type out of the ones used for performance measurements by the nft_concat_range.sh kselftest. That script, injecting packets directly on the ingoing device path with pktgen, reports, averaged over five runs on a single AMD Epyc 7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below. CONFIG_RETPOLINE was not set here. Note that this is not a fair comparison over hash and rbtree set types: non-ranged entries (used to have a reference for hash types) would be matched faster than this, and matching on a single field only (which is the case for rbtree) is also significantly faster. However, it's not possible at the moment to choose this set type for non-ranged entries, and the current implementation also needs a few minor adjustments in order to match on less than two fields. ---------------.-----------------------------------.------------. AMD Epyc 7402 \| baselines, Mpps \| this patch \| 1 thread \|___________________________________\|____________\| 3.35GHz \| \| \| \| \| \| 768KiB L1D$ \| netdev \| hash \| rbtree \| \| \| ---------------\| hook \| no \| single \| \| pipapo \| type entries \| drop \| ranges \| field \| pipapo \| AVX2 \| ---------------\|--------\|--------\|--------\|--------\|------------\| net,port \| \| \| \| \| \| 1000 \| 19.0 \| 10.4 \| 3.8 \| 4.0 \| 7.5 +87% \| ---------------\|--------\|--------\|--------\|--------\|------------\| port,net \| \| \| \| \| \| 100 \| 18.8 \| 10.3 \| 5.8 \| 6.3 \| 8.1 +29% \| ---------------\|--------\|--------\|--------\|--------\|------------\| net6,port \| \| \| \| \| \| 1000 \| 16.4 \| 7.6 \| 1.8 \| 2.1 \| 4.8 +128% \| ---------------\|--------\|--------\|--------\|--------\|------------\| port,proto \| \| \| \| \| \| 30000 \| 19.6 \| 11.6 \| 3.9 \| 0.5 \| 2.6 +420% \| ---------------\|--------\|--------\|--------\|--------\|------------\| net6,port,mac \| \| \| \| \| \| 10 \| 16.5 \| 5.4 \| 4.3 \| 3.4 \| 4.7 +38% \| ---------------\|--------\|--------\|--------\|--------\|------------\| net6,port,mac, \| \| \| \| \| \| proto 1000 \| 16.5 \| 5.7 \| 1.9 \| 1.4 \| 3.6 +26% \| ---------------\|--------\|--------\|--------\|--------\|------------\| net,mac \| \| \| \| \| \| 1000 \| 19.0 \| 8.4 \| 3.9 \| 2.5 \| 6.4 +156% \| ---------------'--------'--------'--------'--------'------------' A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: flowtable: add tunnel match offload support	wenxu
	This patch support both ipv4 and ipv6 tunnel_id, tunnel_src and tunnel_dst match for flowtable offload Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: Replace zero-length array with flexible-array member	Gustavo A. R. Silva
	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix checkpatch.pl warning WARNING: __aligned(size) is preferred over __attribute__((aligned(size))) in net/bridge/netfilter/ebtables.c This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: nf_tables: make all set structs const	Florian Westphal
	They do not need to be writeable anymore. v2: remove left-over __read_mostly annotation in set_pipapo.c (Stefano) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: nf_tables: make sets built-in	Florian Westphal
	Placing nftables set support in an extra module is pointless: 1. nf_tables needs dynamic registeration interface for sake of one module 2. nft heavily relies on sets, e.g. even simple rule like "nft ... tcp dport { 80, 443 }" will not work with _SETS=n. IOW, either nftables isn't used or both nf_tables and nf_tables_set modules are needed anyway. With extra module: 307K net/netfilter/nf_tables.ko 79K net/netfilter/nf_tables_set.ko text data bss dec filename 146416 3072 545 150033 nf_tables.ko 35496 1817 0 37313 nf_tables_set.ko This patch: 373K net/netfilter/nf_tables.ko 178563 4049 545 183157 nf_tables.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: nft_tunnel: add support for geneve opts	Xin Long
	Like vxlan and erspan opts, geneve opts should also be supported in nft_tunnel. The difference is geneve RFC (draft-ietf-nvo3-geneve-14) allows a geneve packet to carry multiple geneve opts. So with this patch, nftables/libnftnl would do: # nft add table ip filter # nft add chain ip filter input { type filter hook input priority 0 \; } # nft add tunnel filter geneve_02 { type geneve\; id 2\; \ ip saddr 192.168.1.1\; ip daddr 192.168.1.2\; \ sport 9000\; dport 9001\; dscp 1234\; ttl 64\; flags 1\; \ opts \"1:1:34567890,2:2:12121212,3:3:1212121234567890\"\; } # nft list tunnels table filter table ip filter { tunnel geneve_02 { id 2 ip saddr 192.168.1.1 ip daddr 192.168.1.2 sport 9000 dport 9001 tos 18 ttl 64 flags 1 geneve opts 1:1:34567890,2:2:12121212,3:3:1212121234567890 } } v1->v2: - no changes, just post it separately. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	netfilter: xtables: Add snapshot of hardidletimer target	Manoj Basapathi
	This is a snapshot of hardidletimer netfilter target. This patch implements a hardidletimer Xtables target that can be used to identify when interfaces have been idle for a certain period of time. Timers are identified by labels and are created when a rule is set with a new label. The rules also take a timeout value (in seconds) as an option. If more than one rule uses the same timer label, the timer will be restarted whenever any of the rules get a hit. One entry for each timer is created in sysfs. This attribute contains the timer remaining for the timer to expire. The attributes are located under the xt_idletimer class: /sys/class/xt_idletimer/timers/<label> When the timer expires, the target module sends a sysfs notification to the userspace, which can then decide what to do (eg. disconnect to save power) Compared to IDLETIMER, HARDIDLETIMER can send notifications when CPU is in suspend too, to notify the timer expiry. v1->v2: Moved all functionality into IDLETIMER module to avoid code duplication per comment from Florian. Signed-off-by: Manoj Basapathi <manojbm@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15	usb: gadget: add raw-gadget interface	Andrey Konovalov
	USB Raw Gadget is a kernel module that provides a userspace interface for the USB Gadget subsystem. Essentially it allows to emulate USB devices from userspace. Enabled with CONFIG_USB_RAW_GADGET. Raw Gadget is currently a strictly debugging feature and shouldn't be used in production. Raw Gadget is similar to GadgetFS, but provides a more low-level and direct access to the USB Gadget layer for the userspace. The key differences are: 1. Every USB request is passed to the userspace to get a response, while GadgetFS responds to some USB requests internally based on the provided descriptors. However note, that the UDC driver might respond to some requests on its own and never forward them to the Gadget layer. 2. GadgetFS performs some sanity checks on the provided USB descriptors, while Raw Gadget allows you to provide arbitrary data as responses to USB requests. 3. Raw Gadget provides a way to select a UDC device/driver to bind to, while GadgetFS currently binds to the first available UDC. 4. Raw Gadget uses predictable endpoint names (handles) across different UDCs (as long as UDCs have enough endpoints of each required transfer type). 5. Raw Gadget has ioctl-based interface instead of a filesystem-based one. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
2020-03-14	soc: qcom: apr: Add avs/audio tracking functionality	Sibi Sankar
	Use PDR helper functions to track the protection domains that the apr services are dependent upon on SDM845 SoC, specifically the "avs/audio" service running on ADSP Q6. Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Sibi Sankar <sibis@codeaurora.org> Link: https://lore.kernel.org/r/20200312120842.21991-4-sibis@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-03-14	soc: qcom: Introduce Protection Domain Restart helpers	Sibi Sankar
	Qualcomm SoCs (starting with MSM8998) allow for multiple protection domains to run on the same Q6 sub-system. This allows for services like ATH10K WLAN FW to have their own separate address space and crash/recover without disrupting the modem and other PDs running on the same sub-system. The PDR helpers introduces an abstraction that allows for tracking/controlling the life cycle of protection domains running on various Q6 sub-systems. Signed-off-by: Sibi Sankar <sibis@codeaurora.org> Link: https://lore.kernel.org/r/20200312120842.21991-2-sibis@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-03-14	net: sched: RED: Introduce an ECN nodrop mode	Petr Machata
	When the RED Qdisc is currently configured to enable ECN, the RED algorithm is used to decide whether a certain SKB should be marked. If that SKB is not ECN-capable, it is early-dropped. It is also possible to keep all traffic in the queue, and just mark the ECN-capable subset of it, as appropriate under the RED algorithm. Some switches support this mode, and some installations make use of it. To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is configured with this flag, non-ECT traffic is enqueued instead of being early-dropped. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14	net: sched: Allow extending set of supported RED flags	Petr Machata
	The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool of global RED flags. These are passed in tc_red_qopt.flags. However none of these qdiscs validate the flag field, and just copy it over wholesale to internal structures, and later dump it back. (An exception is GRED, which does validate for VQs -- however not for the main setup.) A broken userspace can therefore configure a qdisc with arbitrary unsupported flags, and later expect to see the flags on qdisc dump. The current ABI therefore allows storage of several bits of custom data to qdisc instances of the types mentioned above. How many bits, depends on which flags are meaningful for the qdisc in question. E.g. SFQ recognizes flags ECN and HARDDROP, and the rest is not interpreted. If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it, and at the same time it needs to retain the possibility to store 6 bits of uninterpreted data. Likewise RED, which adds a new flag later in this patchset. To that end, this patch adds a new function, red_get_flags(), to split the passed flags of RED-like qdiscs to flags and user bits, and red_validate_flags() to validate the resulting configuration. It further adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14	net: phy: Add XLGMII interface define	Jose Abreu
	Add a define for XLGMII interface. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14	Merge tag 'clk-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A small collection of fixes. I'll make another sweep soon to look for more fixes for this -rc series. - Mark device node const in of_clk_get_parent APIs to ease landing changes in users later - Fix flag for Qualcomm SC7180 video clocks where we thought it would never turn off but actually hardware takes care of it - Remove disp_cc_mdss_rscc_ahb_clk on Qualcomm SC7180 SoCs because this clk is always on anyway - Correct some bad dt-binding numbers for i.MX8MN SoCs" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx8mn: Fix incorrect clock defines clk: qcom: dispcc: Remove support of disp_cc_mdss_rscc_ahb_clk clk: qcom: videocc: Update the clock flag for video_cc_vcodec0_core_clk of: clk: Make of_clk_get_parent_{count,name}() parameter const
2020-03-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller
	Daniel Borkmann says: ==================== pull-request: bpf-next 2020-03-13 The following pull-request contains BPF updates for your net-next tree. We've added 86 non-merge commits during the last 12 day(s) which contain a total of 107 files changed, 5771 insertions(+), 1700 deletions(-). The main changes are: 1) Add modify_return attach type which allows to attach to a function via BPF trampoline and is run after the fentry and before the fexit programs and can pass a return code to the original caller, from KP Singh. 2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher objects to be visible in /proc/kallsyms so they can be annotated in stack traces, from Jiri Olsa. 3) Extend BPF sockmap to allow for UDP next to existing TCP support in order in order to enable this for BPF based socket dispatch, from Lorenz Bauer. 4) Introduce a new bpftool 'prog profile' command which attaches to existing BPF programs via fentry and fexit hooks and reads out hardware counters during that period, from Song Liu. Example usage: bpftool prog profile id 337 duration 3 cycles instructions llc_misses 4228 run_cnt 3403698 cycles (84.08%) 3525294 instructions # 1.04 insn per cycle (84.05%) 13 llc_misses # 3.69 LLC misses per million isns (83.50%) 5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition of a new bpf_link abstraction to keep in particular BPF tracing programs attached even when the applicaion owning them exits, from Andrii Nakryiko. 6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering and which returns the PID as seen by the init namespace, from Carlos Neira. 7) Refactor of RISC-V JIT code to move out common pieces and addition of a new RV32G BPF JIT compiler, from Luke Nelson. 8) Add gso_size context member to __sk_buff in order to be able to know whether a given skb is GSO or not, from Willem de Bruijn. 9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output implementation but can be called from tracepoint programs, from Eelco Chaudron. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13	sanitize handling of nd->last_type, kill LAST_BIND	Al Viro
	->last_type values are set in 3 places: path_init() (sets to LAST_ROOT), link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND). The are checked in walk_component(), lookup_last() and do_last(). They also get copied to the caller by filename_parentat(). In the last 3 cases the value is what we had at the return from link_path_walk(). In case of walk_component() it's either directly downstream from assignment in link_path_walk() or, when called by lookup_last(), the value we have at the return from link_path_walk(). The value at the entry into link_path_walk() can survive to return only if the pathname contains nothing but slashes. Note that pick_link() never returns such - pure jumps are handled directly. So for the calls of link_path_walk() for trailing symlinks it does not matter what value had been there at the entry; the value at the return won't depend upon it. There are 3 call chains that might have pick_link() storing LAST_BIND: 1) pick_link() from step_into() from walk_component() from link_path_walk(). In that case we will either be parsing the next component immediately after return into link_path_walk(), which will overwrite the ->last_type before anyone has a chance to look at it, or we'll fail, in which case nobody will be looking at ->last_type at all. 2) pick_link() from step_into() from walk_component() from lookup_last(). The value is never looked at due to the above; it won't affect the value seen at return from any link_path_walk(). 3) pick_link() from step_into() from do_last(). Ditto. In other words, assignemnt in pick_link() is pointless, and so is LAST_BIND itself; nothing ever looks at that value. Kill it off. And make link_path_walk() _always_ assign ->last_type - in the only case when the value at the entry might survive to the return that value is always LAST_ROOT, inherited from path_init(). Move that assignment from path_init() into the beginning of link_path_walk(), to consolidate the things. Historical note: LAST_BIND used to be used for the kludge with trailing pure jump symlinks (extra iteration through the top-level loop). No point keeping it anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13	LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()	Al Viro
	New LOOKUP flag, telling path_lookupat() to act as path_mountpointat(). IOW, traverse mounts at the final point and skip revalidation of the location where it ends up. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>