linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-11-08	Merge branch 'ila-make-identifier-format-optional-and-other-fixes'	David S. Miller
	Tom Herbert says: ==================== ila: make identifier format optional and other fixes The identifier type and checksum neutral mapping bits are optional in identifier formats. This patch set fixes the implementation to make them optional and configurable. Specific items: - Clean up checksum diff code in ILA - Add checksum neutral mapping auto so that checksum neutral mapping can be configured without requiring use of the C-bit - Add identifier type configuration and allow identifier type to be configured so that the identifier type field does not need to be present - Added ILA documention: ila.txt I have fixes for ILA in iproute2 that will be poseted separately. Tested: Ran netperf TCP_RR on various combinations of checksum mode and the two supported identifier types. v2: - Add proper sign off - In ILA LWT, only check prefix length includes identifier type if identifier type is enabled (ILA_ATYPE_USE_FORMAT). - Add a hook type so that it can be specified whether ILA translation is done on input or output route funciton in LWT. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	ila: Add ila.txt	Tom Herbert
	Add documenation for kernel ILA. This describes ILA, features, configuration gives some examples. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	ila: Add a hook type for LWT routes	Tom Herbert
	In LWT tunnels both an input and output route method is defined. If both of these are executed in the same path then double translation happens and the effect is not correct. This patch adds a new attribute that indicates the hook type. Two values are defined for route output and route output. ILA translation is only done for the one that is set. The default is to enable ILA on route output. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	ila: allow configuration of identifier type	Tom Herbert
	Allow identifier to be explicitly configured for a mapping. This can either be one of the identifier types specified in the ILA draft or a value of ILA_ATYPE_USE_FORMAT which means the identifier type is inferred from the identifier type field. If a value other than ILA_ATYPE_USE_FORMAT is set for a mapping then it is assumed that the identifier type field is not present in an identifier. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	ila: add checksum neutral map auto	Tom Herbert
	Add checksum neutral auto that performs checksum neutral mapping without using the C-bit. This is enabled by configuration of a mapping. The checksum neutral function has been split into ila_csum_do_neutral_fmt and ila_csum_do_neutral_nofmt. The former handles the C-bit and includes it in the adjustment value. The latter just sets the adjustment value on the locator diff only. Added configuration for checksum neutral map aut in ila_lwt and ila_xlat. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	ila: cleanup checksum diff	Tom Herbert
	Consolidate computing checksum diff into one function. Add get_csum_diff_iaddr that computes the checksum diff between an address argument and locator being written. get_csum_diff calls this using the destination address in the IP header as the argument. Also moved ila_init_saved_csum to be close to the checksum diff functions. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-07	Input: st1232 - remove obsolete platform device support	Geert Uytterhoeven
	Commit 1fa59bda21c7fa36 ("ARM: shmobile: Remove legacy board code for Armadillo-800 EVA"), removed the last user of st1232_pdata and the "st1232-ts" platform device. All remaining users use DT. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-11-07	Merge tag 'v4.14-rc8' into next	Dmitry Torokhov
	Merge with mainline to bring in SPDX markings to avoid annoying merge problems when some header files get deleted.
2017-11-07	Input: synaptics-rmi4 - RMI4 can also use SMBUS version 3	Yiannis Marangos
	Some Synaptics devices, such as LEN0073, use SMBUS version 3. Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com> Acked-by: Benjamin Tissoires <benjamion.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-11-07	Input: tsc200x-core - set INPUT_PROP_DIRECT	Martin Kepplinger
	If INPUT_PROP_DIRECT is set, userspace doesn't have to fall back to old ways of identifying touchscreen devices. In order to identify a touchscreen device, Android for example, seems to already depend on INPUT_PROP_DIRECT to be present in drivers. udev still checks for either BTN_TOUCH or INPUT_PROP_DIRECT. Checking for BTN_TOUCH however can quite easily lead to false positives; it's a code that not only touchscreen device drivers use. According to the documentation, touchscreen drivers should have this property set and in order to make life easy for userspace, let's set it. Signed-off-by: Martin Kepplinger <martink@posteo.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-11-07	Input: elan_i2c - add ELAN060C to the ACPI table	Kai-Heng Feng
	ELAN060C touchpad uses elan_i2c as its driver. It can be found on Lenovo ideapad 320-14AST. BugLink: https://bugs.launchpad.net/bugs/1727544 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-11-08	net/mlx5e/core/en_fs: fix pointer dereference after free in ↵	Gustavo A. R. Silva
	mlx5e_execute_l2_action hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced by accessing hn->ai.addr Fix this by copying the MAC address into a local variable for its safe use in all possible execution paths within function mlx5e_execute_l2_action. Addresses-Coverity-ID: 1417789 Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	xdp: Sample xdp program implementing ip forward	Christina Jacob
	Implements port to port forwarding with route table and arp table lookup for ipv4 packets using bpf_redirect helper function and lpm_trie map. Signed-off-by: Christina Jacob <Christina.Jacob@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	net: fec: Let fec_ptp have its own interrupt routine	Troy Kisky
	This is better for code locality and should slightly speed up normal interrupts. This also allows PPS clock output to start working for i.mx7. This is because i.mx7 was already using the limit of 3 interrupts, and needed another. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	net: mvpp2: Prevent userspace from changing TX affinities	Marc Zyngier
	The mvpp2 driver can't cope at all with the TX affinities being changed from userspace, and spit an endless stream of [ 91.779920] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing [ 91.779930] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780402] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780406] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780415] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780418] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing rendering the box completely useless (I've measured around 600k interrupts/s on a 8040 box) once irqbalance kicks in and start doing its job. Obviously, the driver was never designed with this in mind. So let's work around the problem by preventing userspace from interacting with these interrupts altogether. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	Merge branch 'hv_netvsc-fix-a-hang-on-channel-mtu-changes'	David S. Miller
	Vitaly Kuznetsov says: ==================== hv_netvsc: fix a hang on channel/mtu changes It was found that netvsc driver doesn't survive e.g. test. I was able to identify a hang in guest/host communication, it is fixed by PATCH1 of this series. PATCH2 is a cosmetic change masking unneeded messages. Changes since v1: - Throw away patches 2 and 3 of the original series as one is unneeded and the other is not justified [Eric Dumazet, Stephen Hemminger] so I'm only fixing the hang now, the crash doesn't reproduce. Will keep an eye on it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	hv_netvsc: hide warnings about uninitialized/missing rndis device	Vitaly Kuznetsov
	Hyper-V hosts are known to send RNDIS messages even after we halt the device in rndis_filter_halt_device(). Remove user visible messages as they are not really useful. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	hv_netvsc: netvsc_teardown_gpadl() split	Vitaly Kuznetsov
	It was found that in some cases host refuses to teardown GPADL for send/ receive buffers (probably when some work with these buffere is scheduled or ongoing). Change the teardown logic to be: 1) Send NVSP_MSG1_TYPE_REVOKE_* messages 2) Close the channel 3) Teardown GPADLs. This seems to work reliably. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	net: phy: leds: Add support for "link" trigger	Maciej S. Szmigiero
	Currently, we create a LED trigger for any link speed known to a PHY. These triggers only fire when their exact link speed had been negotiated (they aren't cumulative, that is, they don't fire for "their or any higher" link speed). What we are missing, however, is a trigger which will fire on any link speed known to the PHY. Such trigger can then be used for implementing a poor man's substitute of the "link" LED on boards that lack it. Let's add it. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08	net: phy: leds: Refactor "no link" handler into a separate function	Maciej S. Szmigiero
	Currently, phy_led_trigger_change_speed() is handling a "no link" condition like it was some kind of an error (using "goto" to a code at the function end). However, having no link at PHY is an ordinary operational state, so let's handle it in an appropriately named separate function so it is more obvious what the code is doing. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-07	PCI: Move PCI_QUIRKS to the PCI bus menu	Randy Dunlap
	Localize PCI_QUIRKS in the PCI bus menu. Move PCI_QUIRKS to the PCI bus menu instead of the (often broken) General Setup EXPERT menu. The prompt still depends on EXPERT. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-11-07	alpha/PCI: Make pdev_save_srm_config() static	Bjorn Helgaas
	pdev_save_srm_config() and struct pdev_srm_saved_conf are only used in arch/alpha/kernel/pci.c, so make them static there. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2017-11-07	PCI: Remove unused declarations	Bjorn Helgaas
	Remove these unused declarations: pcibios_config_init() # never defined anywhere pcibios_scan_root() # only defined by x86 pcibios_get_irq_routing_table() # only defined by x86 pcibios_set_irq_routing() # only defined by x86 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2017-11-07	PCI: Remove redundant pci_dev, pci_bus, resource declarations	Bjorn Helgaas
	<linux/pci.h> defines struct pci_bus and struct pci_dev and includes the struct resource definition before including <asm/pci.h>. Nobody includes <asm/pci.h> directly, so they don't need their own declarations. Remove the redundant struct pci_dev, pci_bus, resource declarations. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> # CRIS Acked-by: Ralf Baechle <ralf@linux-mips.org> # MIPS
2017-11-07	PCI: Remove redundant pcibios_set_master() declarations	Bjorn Helgaas
	All users of pcibios_set_master() include <linux/pci.h>, which already has a declaration. Remove the unnecessary declarations from the <asm/pci.h> files. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> # CRIS Acked-by: Ralf Baechle <ralf@linux-mips.org> # MIPS
2017-11-07	PCI/PME: Handle invalid data when reading Root Status	Qiang
	PCIe PME and native hotplug share the same interrupt number, so hotplug interrupts are also processed by PME. In some cases, e.g., a Link Down interrupt, a device may be present but unreachable, so when we try to read its Root Status register, the read fails and we get all ones data (0xffffffff). Previously, we interpreted that data as PCI_EXP_RTSTA_PME being set, i.e., "some device has asserted PME," so we scheduled pcie_pme_work_fn(). This caused an infinite loop because pcie_pme_work_fn() tried to handle PME requests until PCI_EXP_RTSTA_PME is cleared, but with the link down, PCI_EXP_RTSTA_PME can't be cleared. Check for the invalid 0xffffffff data everywhere we read the Root Status register. 1469d17dd341 ("PCI: pciehp: Handle invalid data when reading from non-existent devices") added similar checks in the hotplug driver. Signed-off-by: Qiang Zheng <zhengqiang10@huawei.com> [bhelgaas: changelog, also check in pcie_pme_work_fn(), use "~0" to follow other similar checks] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-11-07	MAINTAINERS: Remove Gabriele Paoloni as HiSilicon PCI maintainer	Gabriele Paoloni
	Gabriele is now moving to a different role, so remove him as HiSilicon PCI maintainer. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> [bhelgaas: Thanks for all your help, Gabriele, and best wishes!] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Zhou Wang <wangzhou1@hisilicon.com>
2017-11-07	MAINTAINERS: Remove Stephen Bates as Microsemi Switchtec maintainer	Sebastian Andrzej Siewior
	Just sent an email there and received an autoreply because he no longer works there. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-11-07	PCI: hv: Use effective affinity mask	Dexuan Cui
	The effective_affinity_mask is always set when an interrupt is assigned in __assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic apic_physflat: -> default_cpu_mask_to_apicid() -> irq_data_update_effective_affinity(), but it looks d->common->affinity remains all-1's before the user space or the kernel changes it later. In the early allocation/initialization phase of an IRQ, we should use the effective_affinity_mask, otherwise Hyper-V may not deliver the interrupt to the expected CPU. Without the patch, if we assign 7 Mellanox ConnectX-3 VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts. Tested-by: Adrian Suhov <v-adsuho@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com> Cc: stable@vger.kernel.org Cc: Jork Loeser <jloeser@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com>
2017-11-08	PM / Domains: Remove gpd_dev_ops.active_wakeup() callback	Geert Uytterhoeven
	There are no more users left of the gpd_dev_ops.active_wakeup() callback. All have been converted to GENPD_FLAG_ACTIVE_WAKEUP. Hence remove the callback. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-08	soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP	Geert Uytterhoeven
	Set the newly introduced GENPD_FLAG_ACTIVE_WAKEUP, which allows to remove the driver's own flag-based callback. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-08	soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP	Geert Uytterhoeven
	Set the newly introduced GENPD_FLAG_ACTIVE_WAKEUP, which allows to remove the driver's own flag-based callback. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-08	ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP	Geert Uytterhoeven
	Set the newly introduced GENPD_FLAG_ACTIVE_WAKEUP, which allows to remove the driver's own "always true" callback. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-08	PM / Domains: Allow genpd users to specify default active wakeup behavior	Geert Uytterhoeven
	It is quite common for PM Domains to require slave devices to be kept active during system suspend if they are to be used as wakeup sources. To enable this, currently each PM Domain or driver has to provide its own gpd_dev_ops.active_wakeup() callback. Introduce a new flag GENPD_FLAG_ACTIVE_WAKEUP to consolidate this. If specified, all slave devices configured as wakeup sources will be kept active during system suspend. PM Domains that need more fine-grained controls, based on the slave device, can still provide their own callbacks, as before. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-07	MIPS: BCM47XX: Fix LED inversion for WRT54GSv1	Mirko Parthey
	The WLAN LED on the Linksys WRT54GSv1 is active low, but the software treats it as active high. Fix the inverted logic. Fixes: 7bb26b169116 ("MIPS: BCM47xx: Fix LEDs on WRT54GS V1.0") Signed-off-by: Mirko Parthey <mirko.parthey@web.de> Looks-ok-by: Rafał Miłecki <zajec5@gmail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.17+ Patchwork: https://patchwork.linux-mips.org/patch/16071/ Signed-off-by: James Hogan <jhogan@kernel.org>
2017-11-07	MIPS: Lantiq: Fix ASC0/ASC1 clocks	Martin Schiller
	ASC1 is available on every Lantiq SoC (also AmazonSE) and should be enabled like the other generic xway clocks instead of ASC0, which is only available for AR9 and Danube. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: John Crispin <john@phrozen.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Felix Fietkau <nbd@nbd.name> Cc: Martin Schiller <ms@dev.tdt.de> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16145/ [jhogan@kernel.org: Drop braces] Signed-off-by: James Hogan <jhogan@kernel.org>
2017-11-07	SUNRPC: Improve ordering of transport processing	Trond Myklebust
	Since it can take a while before a specific thread gets scheduled, it is better to just implement a first come first served queue mechanism. That way, if a thread is already scheduled and is idle, it can pick up the work to do from the queue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	nfsd: deal with revoked delegations appropriately	Andrew Elble
	If a delegation has been revoked by the server, operations using that delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1 case, and NFS4ERR_BAD_STATEID otherwise. The server needs NFSv4.1 clients to explicitly free revoked delegations. If the server returns NFS4ERR_DELEG_REVOKED, the client will do that; otherwise it may just forget about the delegation and be unable to recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a SEQUENCE reply. That can cause the Linux 4.1 client to loop in its stage manager. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	svcrdma: Enqueue after setting XPT_CLOSE in completion handlers	Chuck Lever
	I noticed the server was sometimes not closing the connection after a flushed Send. For example, if the client responds with an RNR NAK to a Reply from the server, that client might be deadlocked, and thus wouldn't send any more traffic. Thus the server wouldn't have any opportunity to notice the XPT_CLOSE bit has been set. Enqueue the transport so that svcxprt notices the bit even if there is no more transport activity after a flushed completion, QP access error, or device removal event. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	nfsd: use nfs->ns.inum as net ID	Vasily Averin
	Publishing of net pointer is not safe, let's use nfs->ns.inum instead Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	rpc: remove some BUG()s	J. Bruce Fields
	It would be kinder to WARN() and recover in several spots here instead of BUG()ing. Also, it looks like the read_u32_from_xdr_buf() call could actually fail, though it might require a broken (or malicious) client, so convert that to just an error return. Reported-by: Weston Andros Adamson <dros@monkey.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	svcrdma: Preserve CB send buffer across retransmits	Chuck Lever
	During each NFSv4 callback Call, an RDMA Send completion frees the page that contains the RPC Call message. If the upper layer determines that a retransmit is necessary, this is too soon. One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 callback request, the following BUG fires on the NFS server: kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 kernel: flags: 0x2fffff80000000() kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 kernel: page dumped because: nonzero _refcount kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] kernel: Call Trace: kernel: dump_stack+0x62/0x80 kernel: bad_page+0xfe/0x11a kernel: free_pages_check_bad+0x76/0x78 kernel: free_pcppages_bulk+0x364/0x441 kernel: ? ttwu_do_activate.isra.61+0x71/0x78 kernel: free_hot_cold_page+0x1c5/0x202 kernel: __put_page+0x2c/0x36 kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] This issue exists all the way back to v4.5, but refactoring and code re-organization prevents this simple patch from applying to kernels older than v4.12. The fix is the same, however, if someone needs to backport it. Reported-by: Ben Coddington <bcodding@redhat.com> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314 Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') Cc: stable@vger.kernel.org # v4.12 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	nfds: avoid gettimeofday for nfssvc_boot time	Arnd Bergmann
	do_gettimeofday() is deprecated and we should generally use time64_t based functions instead. In case of nfsd, all three users of nfssvc_boot only use the initial time as a unique token, and are not affected by it overflowing, so they are not affected by the y2038 overflow. This converts the structure to timespec64 anyway and adds comments to all uses, to document that we have thought about it and avoid having to look at it again. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t	Elena Reshetova
	atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_file.fi_ref is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t	Elena Reshetova
	atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_cntl_odstate.co_odcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t	Elena Reshetova
	atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_stid.sc_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	lockd: double unregister of inetaddr notifiers	Vasily Averin
	lockd_up() can call lockd_unregister_notifiers twice: inside lockd_start_svc() when it calls lockd_svc_exit_thread() and then in error path of lockd_up() Patch forces lockd_start_svc() to unregister notifiers in all error cases and removes extra unregister in error path of lockd_up(). Fixes: cb7d224f82e4 "lockd: unregister notifier blocks if the service ..." Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	nfsd4: catch some false session retries	J. Bruce Fields
	The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that the client is making a call that matches a previous (slot, seqid) pair but that isn't actually a replay, because some detail of the call doesn't actually match the previous one. Catching every such case is difficult, but we may as well catch a few easy ones. This also handles the case described in the previous patch, in a different way. The spec does however require us to catch the case where the difference is in the rpc credentials. This prevents somebody from snooping another user's replies by fabricating retries. (But the practical value of the attack is limited by the fact that the replies with the most sensitive data are READ replies, which are not normally cached.) Tested-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	nfsd4: fix cached replies to solo SEQUENCE compounds	J. Bruce Fields
	Currently our handling of 4.1+ requests without "cachethis" set is confusing and not quite correct. Suppose a client sends a compound consisting of only a single SEQUENCE op, and it matches the seqid in a session slot (so it's a retry), but the previous request with that seqid did not have "cachethis" set. The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP, but the protocol only allows that to be returned on the op following the SEQUENCE, and there is no such op in this case. The protocol permits us to cache replies even if the client didn't ask us to. And it's easy to do so in the case of solo SEQUENCE compounds. So, when we get a solo SEQUENCE, we can either return the previously cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some way from the original call. Currently, we're returning a corrupt reply in the case a solo SEQUENCE matches a previous compound with more ops. This actually matters because the Linux client recently started doing this as a way to recover from lost replies to idempotent operations in the case the process doing the original reply was killed: in that case it's difficult to keep the original arguments around to do a real retry, and the client no longer cares what the result is anyway, but it would like to make sure that the slot's sequence id has been incremented, and the solo SEQUENCE assures that: if the server never got the original reply, it will increment the sequence id. If it did get the original reply, it won't increment, and nothing else that about the reply really matters much. But we can at least attempt to return valid xdr! Tested-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07	sunrcp: make function _svc_create_xprt static	Colin Ian King
	The function _svc_create_xprt is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol '_svc_create_xprt' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>