linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-03-09	libnvdimm, pmem: clear poison on write	Dan Williams
	If a write is directed at a known bad block perform the following: 1/ write the data 2/ send a clear poison command 3/ invalidate the poison out of the cache hierarchy Cc: <x86@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	Doc: nfs: Fix typos in Documentation/filesystems/nfs	Masanari Iida
	This patch fix spelling typos found in Documentation/filesystems/nfs Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-03-09	libnvdimm, pmem: fix kmap_atomic() leak in error path	Dan Williams
	When we enounter a bad block we need to kunmap_atomic() before returning. Cc: <stable@vger.kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	nvdimm/btt: don't allocate unused major device number	NeilBrown
	alloc_disk(0) does not require or use a ->major number, all devices are allocated with a major of BLOCK_EXT_MAJOR. So don't allocate btt_major. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	nvdimm/blk: don't allocate unused major device number	NeilBrown
	When alloc_disk(0) is used ->major is completely ignored, all devices are allocated with a "major" of BLOCK_EXT_MAJOR. So don't allocate nd_blk_major Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	ACPI / OSL: Add support to install tables via initrd	Lv Zheng
	This patch adds support to install tables from initrd. If a table in the initrd wasn't used by the override mechanism, the table would be installed after initializing all RSDT/XSDT tables. Link: https://lkml.org/lkml/2014/2/28/368 Reported-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI / OSL: Clean up initrd table override code	Lv Zheng
	This patch cleans up the initrd table override code by merging redundant logics and re-ordering code blocks. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	PNP / ACPI: add ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid type	Harb Abdulhamid
	An error message is printed for resources of type 19, which is a valid supported resource type. The Firmware Test Suite tool (fwts) reports this as a test failure. This change fixes the false test failures for ASL that use type 19 (ACPI_RESOURCE_TYPE_SERIAL_BUS) resources. Signed-off-by: Harb Abdulhamid <harba@codeaurora.org> Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI / util: remove redundant check if element is NULL	Colin Ian King
	element is &package->package.elements[i] which can never be NULL so the check to see if it is NULL is redundant and can be removed. Detected with static analysis by CoverityScan Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI: Add acpi_force_32bit_fadt_addr option to force 32 bit FADT addresses	Colin Ian King
	Some HP laptops seem to have invalid 64 bit FADT X_PM* addresses which are causing various boot issues. In these cases, it would be useful to force ACPI to use the valid legacy 32 bit equivalent PM addresses. Add a acpi_force_32bit_fadt_addr to set the ACPICA acpi_gbl_use32_bit_fadt_addresses to TRUE to force this override. Link: https://bugs.launchpad.net/bugs/1529381 Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	drivers/acpi: make pmic/intel_pmic_crc.c explicitly non-modular	Paul Gortmaker
	The Kconfig currently controlling compilation of this code is: drivers/acpi/Kconfig:config CRC_PMIC_OPREGION drivers/acpi/Kconfig: bool "ACPI operation region support for CrystalCove PMIC" ...meaning that it currently is not being built as a module by anyone. Lets remove the couple modular references, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We also delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	drivers/acpi: make apei/ghes.c more explicitly non-modular	Paul Gortmaker
	The Kconfig currently controlling compilation of this code is: config ACPI_APEI_GHES bool "APEI Generic Hardware Error Source" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We replace module.h with moduleparam.h as we are keeping the pre-existing module_param that the file has, as currently that is the easiest way to maintain compatibility with the existing boot arg use cases. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	drivers/acpi: make bgrt driver explicitly non-modular	Paul Gortmaker
	The Kconfig for this driver is currently: config ACPI_BGRT bool "Boottime Graphics Resource Table support" ...meaning that it currently is not being built as a module by anyone. Lets remove all modular references, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We also delete the MODULE_LICENSE tag etc. since all that information was (or is now) contained at the top of the file in the comments. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI / CPPC: use MRTT/MPAR to decide if/when a req can be sent	Prakash, Prashanth
	The ACPI spec defines Minimum Request Turnaround Time(MRTT) and Maximum Periodic Access Rate(MPAR) to prevent the OSPM from sending too many requests than the platform can handle. For further details on these parameters please refer to section 14.1.3 of ACPI 6.0 spec. This patch includes MRTT/MPAR in deciding if or when a CPPC request can be sent to the platform to make sure CPPC implementation is compliant to the spec. Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org> Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI / CPPC: replace writeX/readX to PCC with relaxed version	Prakash, Prashanth
	We do not have a strict read/write order requirement while accessing PCC subspace. The only requirement is all access should be committed before triggering the PCC doorbell to transfer the ownership of PCC to the platform and this requirement is enforced by the PCC driver. Profiling on a many core system shows improvement of about 1.8us on average per freq change request(about 10% improvement on average). Since these operations are executed while holding the pcc_lock, reducing this time helps the CPPC implementation to scale much better as the number of cores increases. Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org> Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	mailbox: pcc: optimized pcc_send_data	Prakash, Prashanth
	pcc_send_data() can be invoked during the execution of performance critical code as in cppc_cpufreq driver. With acpi_* APIs, the doorbell register accessed in pcc_send_data() if present in system memory will be searched (in cached virt to phys addr mapping), mapped, read/written and then unmapped. These operations take significant amount of time. This patch maps the performance critical doorbell register during init and then reads/writes to it directly using the mapped virtual address. This patch + similar changes to CPPC acpi driver reduce the time per freq. transition from around 200us to about 20us for the CPPC cpufreq driver Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org> Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI / CPPC: optimized cpc_read and cpc_write	Prakash, Prashanth
	cpc_read and cpc_write are used while holding the pcc_lock spin_lock, so they need to be as fast as possible. acpi_os_read/write_memory APIs linearly search through a list for cached mapping which is quite expensive. Since the PCC subspace is already mapped into virtual address space during initialization, we can just add the offset and access the necessary CPPC registers. This patch + similar changes to PCC driver reduce the time per freq. transition from around 200us to about 20us for the CPPC cpufreq driver. Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org> Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI / CPPC: Optimize PCC Read Write operations	Ashwin Chaugule
	Previously the send_pcc_cmd() code checked if the PCC operation had completed before returning from the function. This check was performed regardless of the PCC op type (i.e. Read/Write). Knowing the type of cmd can be used to optimize the check and avoid needless waiting. e.g. with Write ops, the actual Writing is done before calling send_pcc_cmd(). And the subsequent Writes will check if the channel is free at the entry of send_pcc_cmd() anyway. However, for Read cmds, we need to wait for the cmd completion bit to be flipped, since the actual Read ops follow after returning from the send_pcc_cmd(). So, only do the looping check at the end for Read ops. Also, instead of using udelay() calls, use ktime as a means to check for deadlines. The current deadline in which the Remote should flip the cmd completion bit is defined as N * Nominal latency. Where N is arbitrary and large enough to work on slow emulators and Nominal latency comes from the ACPI table (PCCT). This helps in working around the CONFIG_HZ effects on udelay() and also avoids needing different ACPI tables for Silicon and Emulation platforms. Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	Documentation: kselftest: Remove duplicate word	Zhiyi Sun
	Remove duplicate word "for" in kselftest.txt. Signed-off-by: Zhiyi Sun <zhiyisun@msn.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-03-09	doc: fix grammar	Javi Merino
	Some minor typos: - make is unbindable -> make it unbindable - a underlying -> an underlying - different version -> different versions Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-03-09	Documentation: Howto: Fixed subtitles style	Philippe Loctaux
	Fixed subtitles style, aligned them with their header. Signed-off-by: Philippe Loctaux <phil@philippeloctaux.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-03-09	ACPI / video: remove unused device_decode array	Colin Ian King
	device_decode is now no longer used, so we may as well remove it. Fixes gcc 6 warning: drivers/acpi/acpi_video.c:221:19: warning: ‘device_decode’ defined but not used [-Wunused-const-variable] static const char device_decode[][30] = { ^~~~~~~~~~~~~ Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	ACPI / EC: Deny write access unless requested by module param	Oleg Drokin
	In debugfs it's not enough to just set file mode to read-only to deny write access to a file, instead just don't provide the write method unless write access is really requested. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Acked-by: Thomas Renninger <trenn@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	NFC: microread: Drop platform data header file	Jean Delvare
	Originally I only wanted to drop the unneeded inclusion of <linux/i2c.h>, but then noticed that struct microread_nfc_platform_data isn't actually used, and MICROREAD_DRIVER_NAME is redefined in the only file where it is used, so we can get rid of the header file and dead code altogether. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-03-09	ACPI / fan: Make struct dev_pm_ops const	Kaiyen Chang
	Silence the following checkpatch warning: WARNING: struct dev_pm_ops should normally be const. Signed-off-by: Kaiyen Chang <kaiyen.chang@intel.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	Merge tag 'for-v4.5-rc/omap-critical-fixes-a' of ↵	Olof Johansson
	git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending into fixes ARM: OMAP2+: critical DRA7xx fix for v4.5-rc Force the DRA7xx Ethernet internal clock source to stay enabled per TI erratum i877: http://www.ti.com/lit/er/sprz429h/sprz429h.pdf Otherwise, if the Ethernet internal clock source is disabled, the chip will age prematurely, and the RGMII I/O timing will soon fail to meet the delay time and skew specifications for 1000Mbps Ethernet. This fix should go in as soon as possible. Basic build, boot, and PM test results are available here: http://www.pwsan.com/omap/testlogs/omap-critical-fixes-for-v4.5-rc/20160307014209/ * tag 'for-v4.5-rc/omap-critical-fixes-a' of git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending: ARM: dts: dra7: do not gate cpsw clock due to errata i877 ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property Signed-off-by: Olof Johansson <olof@lixom.net>
2016-03-09	Merge branch 'kcm'	David S. Miller
	Tom Herbert says: ==================== kcm: Kernel Connection Multiplexor (KCM) Kernel Connection Multiplexor (KCM) is a facility that provides a message based interface over TCP for generic application protocols. The motivation for this is based on the observation that although TCP is byte stream transport protocol with no concept of message boundaries, a common use case is to implement a framed application layer protocol running over TCP. To date, most TCP stacks offer byte stream API for applications, which places the burden of message delineation, message I/O operation atomicity, and load balancing in the application. With KCM an application can efficiently send and receive application protocol messages over TCP using a datagram interface. In order to delineate message in a TCP stream for receive in KCM, the kernel implements a message parser. For this we chose to employ BPF which is applied to the TCP stream. BPF code parses application layer messages and returns a message length. Nearly all binary application protocols are parsable in this manner, so KCM should be applicable across a wide range of applications. Other than message length determination in receive, KCM does not require any other application specific awareness. KCM does not implement any other application protocol semantics-- these are are provided in userspace or could be implemented in a kernel module layered above KCM. KCM implements an NxM multiplexor in the kernel as diagrammed below: +------------+ +------------+ +------------+ +------------+ \| KCM socket \| \| KCM socket \| \| KCM socket \| \| KCM socket \| +------------+ +------------+ +------------+ +------------+ \| \| \| \| +-----------+ \| \| +----------+ \| \| \| \| +----------------------------------+ \| Multiplexor \| +----------------------------------+ \| \| \| \| \| +---------+ \| \| \| ------------+ \| \| \| \| \| +----------+ +----------+ +----------+ +----------+ +----------+ \| Psock \| \| Psock \| \| Psock \| \| Psock \| \| Psock \| +----------+ +----------+ +----------+ +----------+ +----------+ \| \| \| \| \| +----------+ +----------+ +----------+ +----------+ +----------+ \| TCP sock \| \| TCP sock \| \| TCP sock \| \| TCP sock \| \| TCP sock \| +----------+ +----------+ +----------+ +----------+ +----------+ The KCM sockets provide the datagram interface to applications, Psocks are the state for each attached TCP connection (i.e. where message delineation is performed on receive). A description of the APIs and design can be found in the included Documentation/networking/kcm.txt. In this patch set: - Add MSG_BATCH flag. This is used in sendmsg msg_hdr flags to indicate that more messages will be sent on the socket. The stack may batch messages up if it is beneficial for transmission. - In sendmmsg, set MSG_BATCH in all sub messages except for the last one. - In order to allow sendmmsg to contain multiple messages with SOCK_SEQPAKET we allow each msg_hdr in the sendmmsg to set MSG_EOR. - Add KCM module - This supports SOCK_DGRAM and SOCK_SEQPACKET. - KCM documentation v2: - Added splice and page operations. - Assemble receive messages in place on TCP socket (don't have a separate assembly queue. - Based on above, enforce maxmimum receive message to be the size of the recceive socket buffer. - Support message assembly timeout. Use the timeout value in sk_rcvtimeo on the TCP socket. - Tested some with a couple of other production applications, see ~5% improvement in application latency. Testing: Dave Watson has integrated KCM into Thrift and we intend to put these changes into open source. Example of this is in: https://github.com/djwatson/fbthrift/commit/ dd7e0f9cf4e80912fdb90f6cd394db24e61a14cc Some initial KCM Thrift benchmark numbers (comment from Dave) Thrift by default ties a single connection to a single thread. KCM is instead able to load balance multiple connections across multiple epoll loops easily. A test sending ~5k bytes of data to a kcm thrift server, dropping the bytes on recv: QPS Latency / std dev Latency without KCM 70336 209/123 with KCM 70353 191/124 A test sending a small request, then doing work in the epoll thread, before serving more requests: QPS Latency / std dev Latency without KCM 14282 559/602 with KCM 23192 344/234 At the high end, there's definitely some additional kernel overhead: Cranking the pipelining way up, with lots of small requests QPS Latency / std dev Latency without KCM 1863429 127/119 with KCM 1337713 192/241 --- So for a "realistic" workload, KCM performs pretty well (second case). Under extreme conditions of highest tps we still have some work to do. In its nature a multiplexor will spread work between CPUs which is logically good for load balancing but coan conflict with the goal promoting affinity. Batching messages on both send and receive are the means to recoup performance. Future support: - Integration with TLS (TLS-in-kernel is a separate initiative). - Page operations/splice support - Unconnected KCM sockets. Will be able to attach sockets to different destinations, AF_KCM addresses with be used in sendmsg and recvmsg to indicate destination - Explore more utility in performing BPF inline with a TCP data stream (setting SO_MARK, rxhash for messages being sent received on KCM sockets). - Performance work - Diagnose performance issues under high message load FAQ (Questions posted on LWN) Q: Why do this in the kernel? A: Because the kernel is good at scheduling threads and steering packets to threads. KCM fits well into this model since it allows the unit of work for scheduling and steering to be the application layer messages themselves. KCM should be thought of as generic application protocol acceleration. It to the philosophy that the kernel provides generic and extensible interfaces. Q: How can adding code in the path yield better performance? A: It is true that for just sending receiving a single message there would be some performance loss since the code path is longer (for instance comparing netperf to KCM). But for real production applications performance takes on many dynamics. Parallelism, context switching, affinity, granularity of locking, and load balancing are all relevant. The theory of KCM is that by an application-centric interface, the kernel can provide better support for these performance characteristics. Q: Why not use an existing message-oriented protocol such as RUDP, DCCP, SCTP, RDS, and others? A: Because that would entail using a completely new transport protocol. Deploying a new protocol at scale is either a huge undertaking or fundamentally infeasible. This is true in either the Internet and in the data center due in a large part to protocol ossification. Besides, KCM we want KCM to work existing, well deployed application protocols that we couldn't change even if we wanted to (e.g. http/2). KCM simply defines a new interface method, it does not redefine any aspect of the transport protocol nor application protocol, nor set any new requirements on these. Neither does KCM attempt to implement any application protocol logic other than message deliniation in the stream. These are fundamental requirement of KCM. Q: How does this affect TCP? A: It doesn't, not in the slightest. The use of KCM can be one-sided, KCM has no effect on the wire. Q: Why force TCP into doing something it's not designed for? A: TCP is defined as transport protocol and there is no standard that says the API into TCP must be stream based sockets, or for that matter sockets at all (or even that TCP needs to be implemented in a kernel). KCM is not inconsistent with the design of TCP just because to makes an message based interface over TCP, if it were then every application protocol sending messages over TCP would also be! :-) Q: What about the problem of a connections with very slow rate of incoming data? As a result your application can get storms of very short reads. And it actually happens a lot with connection from mobile devices and it is a problem for servers handling a lot of connections. A: The storm of short reads will occur regardless of whether KCM is used or not. KCM does have one advantage in this scenario though, it will only wake up the application when a full message has been received, not for each packet that makes up part of a bigger messages. If a bunch of small messages are received, the application can receive messages in batches using recvmmsg. Q: Why not just use DPDK, or at least provide KCM like functionality in DPDK? A: DPDK, or more generally OS bypass presumably with a TCP stack in userland, presents a different model of load balancing than that of KCM (and the kernel). KCM implements load balancing of messages across the threads of an application, whereas DPDK load balances based on queues which are more static and coarse-grained since multiple connections are bound to queues. DPDK works best when processing of packets is silo'ed in a thread on the CPU processing a queue, and packet processing (for both the stack and application) is fairly uniform. KCM works well for applications where the amount of work to process messages varies an application work is commonly delegated to worker threads often on different CPUs. The message based interface over TCP is something that could be provide by a DPDK or OS bypass library. Q: I'm not quite seeing this for HTTP. Maybe for HTTP/2, I guess, or web sockets? A: Yes. KCM is most appropriate for message based protocols over TCP where is easy to deduce the message length (e.g. a length field) and the protocol implements its own message ordering semantics. Fortunately this encompasses many modern protocols. Q: How is memory limited and controlled? A: In v2 all data for messages is now kept in socket buffers, either those for TCP or KCM, so socket buffer limits are applicable. This includes receive messages assembly which is now done ont teh TCP socket buffer instead of a separate queue-- this has the consequence that the TCP socket buffer limit provides an enforceable maxmimum message size. Additionally, a timeout may be set for messages assembly. The value used for this is taken from sk_rcvtimeo of the TCP socket. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	kcm: Add description in Documentation	Tom Herbert
	Add kcm.txt to desribe KCM and interfaces. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	kcm: Add receive message timeout	Tom Herbert
	This patch adds receive timeout for message assembly on the attached TCP sockets. The timeout is set when a new messages is started and the whole message has not been received by TCP (not in the receive queue). If the completely message is subsequently received the timer is cancelled, if the timer expires the RX side is aborted. The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is set on a TCP socket (i.e. set by get sockopt before attaching a TCP socket to KCM. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	kcm: Add memory limit for receive message construction	Tom Herbert
	Message assembly is performed on the TCP socket. This is logically equivalent of an application that performs a peek on the socket to find out how much memory is needed for a receive buffer. The receive socket buffer also provides the maximum message size which is checked. The receive algorithm is something like: 1) Receive the first skbuf for a message (or skbufs if multiple are needed to determine message length). 2) Check the message length against the number of bytes in the TCP receive queue (tcp_inq()). - If all the bytes of the message are in the queue (incluing the skbuf received), then proceed with message assembly (it should complete with the tcp_read_sock) - Else, mark the psock with the number of bytes needed to complete the message. 3) In TCP data ready function, if the psock indicates that we are waiting for the rest of the bytes of a messages, check the number of queued bytes against that. - If there are still not enough bytes for the message, just return - Else, clear the waiting bytes and proceed to receive the skbufs. The message should now be received in one tcp_read_sock Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	kcm: Sendpage support	Tom Herbert
	Implement kcm_sendpage. Set in sendpage to kcm_sendpage in both dgram and seqpacket ops. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	kcm: Splice support	Tom Herbert
	Implement kcm_splice_read. This is supported only for seqpacket. Add kcm_seqpacket_ops and set splice read to kcm_splice_read. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	kcm: Add statistics and proc interfaces	Tom Herbert
	This patch adds various counters for KCM. These include counters for messages and bytes received or sent, as well as counters for number of attached/unattached TCP sockets and other error or edge events. The statistics are exposed via a proc interface. /proc/net/kcm provides statistics per KCM socket and per psock (attached TCP sockets). /proc/net/kcm_stats provides aggregate statistics. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	kcm: Kernel Connection Multiplexor module	Tom Herbert
	This module implements the Kernel Connection Multiplexor. Kernel Connection Multiplexor (KCM) is a facility that provides a message based interface over TCP for generic application protocols. With KCM an application can efficiently send and receive application protocol messages over TCP using datagram sockets. For more information see the included Documentation/networking/kcm.txt Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	tcp: Add tcp_inq to get available receive bytes on socket	Tom Herbert
	Create a common kernel function to get the number of bytes available on a TCP socket. This is based on code in INQ getsockopt and we now call the function for that getsockopt. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	net: Walk fragments in __skb_splice_bits	Tom Herbert
	Add walking of fragments in __skb_splice_bits. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	net: Add MSG_BATCH flag	Tom Herbert
	Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to indicate that more messages will follow (i.e. a batch of messages is being sent). This is similar to MSG_MORE except that the following messages are not merged into one packet, they are sent individually. sendmmsg is updated so that each contained message except for the last one is marked as MSG_BATCH. MSG_BATCH is a performance optimization in cases where a socket implementation can benefit by transmitting packets in a batch. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	net: Allow MSG_EOR in each msghdr of sendmmsg	Tom Herbert
	This patch allows setting MSG_EOR in each individual msghdr passed in sendmmsg. This allows a sendmmsg to send multiple messages when using SOCK_SEQPACKET. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	net: Make sock_alloc exportable	Tom Herbert
	Export it for cases where we want to create sockets by hand. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	rcu: Add list_next_or_null_rcu	Tom Herbert
	This is a convenience function that returns the next entry in an RCU list or NULL if at the end of the list. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09	Merge tag 'pci-v4.5-fixes-5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Here's another fix for v4.5. It fixes an ARM regression in v4.0 that causes many boxes to crash on boot, including cns3xxx, dove, footbridge, iopl13xx, ip32x, iop33x, ixp4xx, ks8695, mv78xx0, orion5x, pxa, sa1100, etc. The change is in code that's only built for ARM and ARM64. Summary: Enumeration: Allow generic PCI domains without bridge "parent" pointer (Krzysztof Hałasa)" * tag 'pci-v4.5-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Allow a NULL "parent" pointer in pci_bus_assign_domain_nr()
2016-03-09	livepatch: Fix the error message about unresolvable ambiguity	Petr Mladek
	klp_find_callback() stops the search when sympos is not defined and a second symbol of the same name is found. It means that the current error message about the unresolvable ambiguity always prints "(2 matches)". Let's remove this information. The total number of occurrences is not much helpful. The author of the patch still must put a non-trivial effort into searching the right position in the object file. [jkosina@suse.cz: fixed grammar as suggested by Josh] Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-09	Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"	Viresh Kumar
	Revert commit 3510fac45492 (cpufreq: postfix policy directory with the first CPU in related_cpus). Earlier, the policy->kobj was added to the kobject core, before ->init() callback was called for the cpufreq drivers. Which allowed those drivers to add or remove, driver dependent, sysfs files/directories to the same kobj from their ->init() and ->exit() callbacks. That isn't possible anymore after commit 3510fac45492. Now, there is no other clean alternative that people can adopt. Its better to revert the earlier commit to allow cpufreq drivers to create/remove sysfs files from ->init() and ->exit() callbacks. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09	pmem: don't allocate unused major device number	NeilBrown
	When alloc_disk(0) or alloc_disk-node(0, XX) is used, the ->major number is completely ignored: all devices are allocated with a major of BLOCK_EXT_MAJOR. So there is no point allocating pmem_major. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	ACPI: Change NFIT driver to insert new resource	Toshi Kani
	ACPI 6 defines persistent memory (PMEM) ranges in multiple firmware interfaces, e820, EFI, and ACPI NFIT table. This EFI change, however, leads to hit a bug in the grub bootloader, which treats EFI_PERSISTENT_MEMORY type as regular memory and corrupts stored user data [1]. Therefore, BIOS may set generic reserved type in e820 and EFI to cover PMEM ranges. The kernel can initialize PMEM ranges from ACPI NFIT table alone. This scheme causes a problem in the iomem table, though. On x86, for instance, e820_reserve_resources() initializes top-level entries (iomem_resource.child) from the e820 table at early boot-time. This creates "reserved" entry for a PMEM range, which does not allow region_intersects() to check with PMEM type. Change acpi_nfit_register_region() to call acpi_nfit_insert_resource(), which calls insert_resource() to insert a PMEM entry from NFIT when the iomem table does not have a PMEM entry already. That is, when a PMEM range is marked as reserved type in e820, it inserts "Persistent Memory" entry, which results as follows. + "Persistent Memory" + "reserved" This allows the EINJ driver, which calls region_intersects() to check PMEM ranges, to work continuously even if BIOS sets reserved type (or sets nothing) to PMEM ranges in e820 and EFI. [1]: https://lists.gnu.org/archive/html/grub-devel/2015-11/msg00209.html Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	resource: Export insert_resource and remove_resource	Toshi Kani
	insert_resource() and remove_resouce() are called by producers of resources, such as FW modules and bus drivers. These modules may be implemented as loadable modules. Export insert_resource() and remove_resouce() so that they can be called from such modules. link: https://lkml.org/lkml/2016/3/8/872 Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	resource: Add remove_resource interface	Toshi Kani
	insert_resource() and insert_resource_conflict() are called by resource producers to insert a new resource. When there is any conflict, they move conflicting resources down to the children of the new resource. There is no destructor of these interfaces, however. Add remove_resource(), which removes a resource previously inserted by insert_resource() or insert_resource_conflict(), and moves the children up to where they were before. __release_resource() is changed to have @release_child, so that this function can be used for remove_resource() as well. Also add comments to clarify that these functions are intended for producers of resources to avoid any confusion with request/release_resource() for consumers. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	resource: Change __request_region to inherit from immediate parent	Toshi Kani
	__request_region() sets 'flags' of a new resource from @parent as it inherits the parent's attribute. When a target resource has a conflict, this function inserts the new resource entry under the conflicted entry by updating @parent. In this case, the new resource entry needs to inherit attribute from the updated parent. This conflict is a typical case since __request_region() is used to allocate a new resource from a specific resource range. For instance, request_mem_region() calls __request_region() with @parent set to &iomem_resource, which is the root entry of the whole iomem range. When this request results in inserting a new entry "DEV-A" under "BUS-1", "DEV-A" needs to inherit from the immediate parent "BUS-1" as it holds specific attribute for the range. root (&iomem_resource) : + "BUS-1" + "DEV-A" Change __request_region() to set 'flags' and 'desc' of a new entry from the immediate parent. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09	iwlwifi: mvm: update GSCAN capabilities	Ayala Beker
	Gscan capabilities were updated with new capabilities supported by the device. While at it, simplify the firmware support conditional and move both conditions into the WARN() to make it easier to undertand and use the unlikely() for both. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2016-03-09	iwlwifi: mvm: don't try to offload AES-CMAC in AP/IBSS modes	Johannes Berg
	The firmware/hardware only supports checking AES-CMAC on RX, not using it on TX. For station mode this is fine, since it's the only thing it will ever do. For AP mode, it never receives such frames, but must be able to transmit them. This is currently broken since we try to enable them for hardware crypto (for RX only) and then treat them as TX_CMD_SEC_EXT, leading to FIFO underruns during TX so the frames never go out to the air. To fix this, simply use software on TX in AP (and IBSS) mode. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>