summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-09-23clk: change the type of clk_hw_onecell_data.num to unsigned intMasahiro Yamada
The "num" is the number of clk_hw entries in the structure, so "unsigned int" would be a better fit. (size_t looks like data size we count by byte.) Besides, struct clk_onecell_data already uses unsigned int for "clk_num". Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-09-23IB/core: remove ib_get_dma_mrChristoph Hellwig
We now only use it from ib_alloc_pd to create a local DMA lkey if the device doesn't provide one, or a global rkey if the ULP requests it. This patch removes ib_get_dma_mr and open codes the functionality in ib_alloc_pd so that we can simplify the code and prevent abuse of the functionality. As a side effect we can also simplify things by removing the valid access bit check, and the PD refcounting. In the future I hope to also remove the per-PD global MR entirely by shifting this work into the HW drivers, as one step towards avoiding the struct ib_mr overload for various different use cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: rename pd->local_mr to pd->__internal_mrChristoph Hellwig
This has two reasons: a) to clearly mark that drivers don't have any business using it, and b) because we're going to use it for the (dangerous) global rkey soon, so that drivers don't create on themselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23blkcg: Annotate blkg_hint correctlyBart Van Assche
Avoid that sparse complains about blkg_hint manipulations. Fixes: a637120e4902 ("blkcg: use radix tree to index blkgs from blkcg") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23rxrpc: Add a tracepoint to log which packets will be retransmittedDavid Howells
Add a tracepoint to log in rxrpc_resend() which packets will be retransmitted. Note that if a positive ACK comes in whilst we have dropped the lock to retransmit another packet, the actual retransmission may not happen, though some of the effects will (such as altering the congestion management). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add tracepoint for ACK proposalDavid Howells
Add a tracepoint to log proposed ACKs, including whether the proposal is used to update a pending ACK or is discarded in favour of an easlier, higher priority ACK. Whilst we're at it, get rid of the rxrpc_acks() function and access the name array directly. We do, however, need to validate the ACK reason number given to trace_rxrpc_rx_ack() to make sure we don't overrun the array. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint to log injected Rx packet lossDavid Howells
Add a tracepoint to log received packets that get discarded due to Rx packet loss. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add data Tx tracepoint and adjust Tx ACK tracepointDavid Howells
Add a tracepoint to log transmission of DATA packets (including loss injection). Adjust the ACK transmission tracepoint to include the packet serial number and to line this up with the DATA transmission display. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint for the call timerDavid Howells
Add a tracepoint to log call timer initiation, setting and expiry. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23svcrdma: support Remote InvalidationChuck Lever
Support Remote Invalidation. A private message is exchanged with the client upon RDMA transport connect that indicates whether Send With Invalidation may be used by the server to send RPC replies. The invalidate_rkey is arbitrarily chosen from among rkeys present in the RPC-over-RDMA header's chunk lists. Send With Invalidate improves performance only when clients can recognize, while processing an RPC reply, that an rkey has already been invalidated. That has been submitted as a separate change. In the future, the RPC-over-RDMA protocol might support Remote Invalidation properly. The protocol needs to enable signaling between peers to indicate when Remote Invalidation can be used for each individual RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-23rpcrdma: RDMA/CM private message data structureChuck Lever
Introduce data structure used by both client and server to exchange implementation details during RDMA/CM connection establishment. This is an experimental out-of-band exchange between Linux RPC-over-RDMA Version One implementations, replacing the deprecated CCP (see RFC 5666bis). The purpose of this extension is to enable prototyping of features that might be introduced in a subsequent version of RPC-over-RDMA. Suggested by Christoph Hellwig and Devesh Sharma. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-23svcrdma: Tail iovec leaves an orphaned DMA mappingChuck Lever
The ctxt's count field is overloaded to mean the number of pages in the ctxt->page array and the number of SGEs in the ctxt->sge array. Typically these two numbers are the same. However, when an inline RPC reply is constructed from an xdr_buf with a tail iovec, the head and tail often occupy the same page, but each are DMA mapped independently. In that case, ->count equals the number of pages, but it does not equal the number of SGEs. There's one more SGE, for the tail iovec. Hence there is one more DMA mapping than there are pages in the ctxt->page array. This isn't a real problem until the server's iommu is enabled. Then each RPC reply that has content in that iovec orphans a DMA mapping that consists of real resources. krb5i and krb5p always populate that tail iovec. After a couple million sent krb5i/p RPC replies, the NFS server starts behaving erratically. Reboot is needed to clear the problem. Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-23Merge tag 'v4.8-rc6' into develLinus Walleij
Linux 4.8-rc6
2016-09-23Merge branch 'gpio-irq-validmask' of /home/linus/linux-pinctrl into develLinus Walleij
2016-09-23gpiolib: Make it possible to exclude GPIOs from IRQ domainMika Westerberg
When using GPIO irqchip helpers to setup irqchip for a gpiolib based driver, it is not possible to select which GPIOs to add to the IRQ domain. Instead it just adds all GPIOs which is not always desired. For example there might be GPIOs that for some reason cannot generated normal interrupts at all. To support this we add a flag irq_need_valid_mask to struct gpio_chip. When this flag is set the core allocates irq_valid_mask that holds one bit for each GPIO the chip has. By default all bits are set but drivers can manipulate this using set_bit() and clear_bit() accordingly. Then when gpiochip_irqchip_add() is called, this mask is checked and all GPIOs with bit is set are added to the IRQ domain created for the GPIO chip. Suggested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-09-23bpf: add helper to invalidate hashDaniel Borkmann
Add a small helper that complements 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs") for invalidating the current skb->hash after mangling on headers via direct packet write. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net: dsa: add port fast ageingVivien Didelot
Today the DSA drivers are in charge of flushing the MAC addresses associated to a port when its STP state changes from Learning or Forwarding, to Disabled or Blocking or Listening. This makes the drivers more complex and hides the generic switch logic. Introduce a new optional port_fast_age operation to dsa_switch_ops, to move this logic to the DSA layer and keep drivers simple. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net_sched: act_vlan: add helper inlines to access tcf_vlan infoOr Gerlitz
Needed e.g for offloading drivers to pick the relevant attributes. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net_sched: sch_fq: account for schedule/timers driftsEric Dumazet
It looks like the following patch can make FQ very precise, even in VM or stressed hosts. It matters at high pacing rates. We take into account the difference between the time that was programmed when last packet was sent, and current time (a drift of tens of usecs is often observed) Add an EWMA of the unthrottle latency to help diagnostics. This latency is the difference between current time and oldest packet in delayed RB-tree. This accounts for the high resolution timer latency, but can be different under stress, as fq_check_throttled() can be opportunistically be called from a dequeue() called after an enqueue() for a different flow. Tested: // Start a 10Gbit flow $ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr & Before patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17106.04 756876.84 1102.75 1119049.02 0.00 0.00 0.52 After patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17867.00 800245.90 1151.77 1183172.12 0.00 0.00 0.52 A new iproute2 tc can output the 'unthrottle latency' : $ tc -s qd sh dev eth0 | grep latency 0 gc, 0 highprio, 32490767 throttled, 2382 ns latency Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23Merge tag 'linux-can-fixes-for-4.8-20160922' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2016-09-22 this is a pull request of one patch for the upcoming linux-4.8 release. The patch by Sergei Miroshnichenko fixes a potential deadlock in the generic CAN device code that cann occour after a bus-off. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23sctp: improve how SSN, TSN and ASCONF serial are comparedMarcelo Ricardo Leitner
Make it similar to time_before() macros: - easier to understand - make use of typecheck() to avoid working on unexpected variable types (made the issue on previous patch visible) - for _[lg]te versions, slighly faster, as the compiler used to generate a sequence of cmp/je/cmp/js instructions and now it's sub/test/jle (for _lte): Before, for sctp_outq_sack: if (primary->cacc.changeover_active) { 1f01: 80 b9 84 02 00 00 00 cmpb $0x0,0x284(%rcx) 1f08: 74 6e je 1f78 <sctp_outq_sack+0xe8> u8 clear_cycling = 0; if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) { 1f0a: 8b 81 80 02 00 00 mov 0x280(%rcx),%eax return ((s) - (t)) & TSN_SIGN_BIT; } static inline int TSN_lte(__u32 s, __u32 t) { return ((s) == (t)) || (((s) - (t)) & TSN_SIGN_BIT); 1f10: 8b 7d bc mov -0x44(%rbp),%edi 1f13: 39 c7 cmp %eax,%edi 1f15: 74 25 je 1f3c <sctp_outq_sack+0xac> 1f17: 39 f8 cmp %edi,%eax 1f19: 78 21 js 1f3c <sctp_outq_sack+0xac> primary->cacc.changeover_active = 0; After: if (primary->cacc.changeover_active) { 1ee7: 80 b9 84 02 00 00 00 cmpb $0x0,0x284(%rcx) 1eee: 74 73 je 1f63 <sctp_outq_sack+0xf3> u8 clear_cycling = 0; if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) { 1ef0: 8b 81 80 02 00 00 mov 0x280(%rcx),%eax 1ef6: 2b 45 b4 sub -0x4c(%rbp),%eax 1ef9: 85 c0 test %eax,%eax 1efb: 7e 26 jle 1f23 <sctp_outq_sack+0xb3> primary->cacc.changeover_active = 0; *_lt() generated pretty much the same code. Tested with gcc (GCC) 6.1.1 20160621. This patch also removes SSN_lte as it is not used and cleanups some comments. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-09-23mtd: nand: Provide nand_cleanup() function to free NAND related resourcesRichard Weinberger
Provide a nand_cleanup() function to free all nand related resources without unregistering the mtd device. This should allow drivers to call mtd_device_unregister() and handle its return value and still being able to cleanup all nand related resources. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Daniel Walter <dwalter@sigma-star.at> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Add an option to maximize the ECC strengthBoris Brezillon
The generic NAND DT bindings allows one to tweak the ECC strength and step size to their need. It can be used to lower the ECC strength to match a bootloader/firmware config, but might also be used to get a better reliability. In the latter case, the user might want to use the maximum ECC strength without having to explicitly calculate the exact value (this value not only depends on the OOB size, but also on the NAND controller, and can be tricky to extract). Add a generic 'nand-ecc-maximize' DT property and the associated NAND_ECC_MAXIMIZE flag, to let ECC controller drivers select the best ECC strength and step-size on their own. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Rob Herring <robh@kernel.org>
2016-09-23mtd: nand: automate NAND timings selectionBoris Brezillon
The NAND framework provides several helpers to query timing modes supported by a NAND chip, but this implies that all NAND controller drivers have to implement the same timings selection dance. Also currently NAND devices can be resetted at arbitrary places which also resets the timing for ONFI chips to timing mode 0. Provide a common logic to select the best timings based on ONFI or ->onfi_timing_mode_default information. Hook this into nand_reset() to make sure the new timing is applied each time during a reset. NAND controller willing to support timings adjustment should just implement the ->setup_data_interface() method. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2016-09-23mtd: nand: Expose data interface for ONFI mode 0Sascha Hauer
The nand layer will need ONFI mode 0 to use it as timing mode before and right after reset. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Add function to convert ONFI mode to data_interfaceSascha Hauer
onfi_init_data_interface() initializes a data interface with values from a given ONFI mode. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Introduce nand_data_interfaceSascha Hauer
Currently we have no data structure to fully describe a NAND timing. We only have struct nand_sdr_timings for NAND timings in SDR mode, but nothing for DDR mode and also no container to store both types of timing. This patch adds struct nand_data_interface which stores the timing type and a union of different timings. This can be used to pass to drivers in order to configure the timing. Add kerneldoc for struct nand_sdr_timings while touching it anyway. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Create a NAND reset functionSascha Hauer
When NAND devices are resetted some initialization may have to be done, like for example they have to be configured for the timing mode that shall be used. To get a common place where this initialization can be implemented create a nand_reset() function. This currently only issues a NAND_CMD_RESET to the NAND device. The places issuing this command manually are replaced with a call to nand_reset(). Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: remove unnecessary 'extern' from function declarationsSascha Hauer
'extern' is not necessary for function declarations. To prevent people from adding the keyword to new declarations remove the existing ones. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: import nand_hw_control_init()Marc Gonzalez
The code to initialize a struct nand_hw_control is duplicated across several drivers. Factorize it using an inline function. Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23netfilter: nf_queue: improve queue range support for bridge familyLiping Zhang
After commit ac2863445686 ("netfilter: bridge: add nf_afinfo to enable queuing to userspace"), we can queue packets to the user space in bridge family. But when the user specify the queue range, packets will be only delivered to the first queue num. Because in nfqueue_hash, we only support ipv4 and ipv6 family. Now add support for bridge family too. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23netfilter: nft_queue: add _SREG_QNUM attr to select the queue numberLiping Zhang
Currently, the user can specify the queue numbers by _QUEUE_NUM and _QUEUE_TOTAL attributes, this is enough in most situations. But acctually, it is not very flexible, for example: tcp dport 80 mapped to queue0 tcp dport 81 mapped to queue1 tcp dport 82 mapped to queue2 In order to do this thing, we must add 3 nft rules, and more mapping meant more rules ... So take one register to select the queue number, then we can add one simple rule to mapping queues, maybe like this: queue num tcp dport map { 80:0, 81:1, 82:2 ... } Florian Westphal also proposed wider usage scenarios: queue num jhash ip saddr . ip daddr mod ... queue num meta cpu ... queue num meta mark ... The last point is how to load a queue number from sreg, although we can use *(u16*)&regs->data[reg] to load the queue number, just like nat expr to load its l4port do. But we will cooperate with hash expr, meta cpu, meta mark expr and so on. They all store the result to u32 type, so cast it to u16 pointer and dereference it will generate wrong result in the big endian system. So just keep it simple, we treat queue number as u32 type, although u16 type is already enough. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23netfilter: nf_tables: validate maximum value of u32 netlink attributesLaura Garcia Liebana
Fetch value and validate u32 netlink attribute. This validation is usually required when the u32 netlink attributes are being stored in a field whose size is smaller. This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes"). Fixes: 96518518cc41 ("netfilter: add nftables") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-22drm: Fix plane type uabi breakageDaniel Vetter
Turns out assuming that only stuff in uabi is uabi is a bit naive, and we have a bunch of properties for which the enum values are placed in random headers. A proper fix would be to split out uapi include headers, but meanwhile sprinkle at least some warning over them. Fixes: 532b36712ddf ("drm/doc: Polish for drm_plane.[hc]") Cc: Archit Taneja <architt@codeaurora.org> Cc: Sean Paul <seanpaul@chromium.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/1474612525-9488-1-git-send-email-daniel.vetter@ffwll.ch
2016-09-23Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22Merge branch 'nsfs-ioctls' into HEADEric W. Biederman
From: Andrey Vagin <avagin@openvz.org> Each namespace has an owning user namespace and now there is not way to discover these relationships. Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships too. Why we may want to know relationships between namespaces? One use would be visualization, in order to understand the running system. Another would be to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? One more use-case (which usually called abnormal) is checkpoint/restart. In CRIU we are going to dump and restore nested namespaces. There [1] was a discussion about which interface to choose to determing relationships between namespaces. Eric suggested to add two ioctl-s [2]: > Grumble, Grumble. I think this may actually a case for creating ioctls > for these two cases. Now that random nsfs file descriptors are bind > mountable the original reason for using proc files is not as pressing. > > One ioctl for the user namespace that owns a file descriptor. > One ioctl for the parent namespace of a namespace file descriptor. Here is an implementaions of these ioctl-s. $ man man7/namespaces.7 ... Since Linux 4.X, the following ioctl(2) calls are supported for namespace file descriptors. The correct syntax is: fd = ioctl(ns_fd, ioctl_type); where ioctl_type is one of the following: NS_GET_USERNS Returns a file descriptor that refers to an owning user names‐ pace. NS_GET_PARENT Returns a file descriptor that refers to a parent namespace. This ioctl(2) can be used for pid and user namespaces. For user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same meaning. In addition to generic ioctl(2) errors, the following specific ones can occur: EINVAL NS_GET_PARENT was called for a nonhierarchical namespace. EPERM The requested namespace is outside of the current namespace scope. [1] https://lkml.org/lkml/2016/7/6/158 [2] https://lkml.org/lkml/2016/7/9/101 Changes for v2: * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing outside of the init namespace, so we can return EPERM in this case too. > The fewer special cases the easier the code is to get > correct, and the easier it is to read. // Eric Changes for v3: * rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: "W. Trevor King" <wking@tremily.us> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@canonical.com>
2016-09-22nsfs: add ioctl to get a parent namespaceAndrey Vagin
Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships. In a future we will use this interface to dump and restore nested namespaces. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22nsfs: add ioctl to get an owning user namespace for ns file descriptorAndrey Vagin
Each namespace has an owning user namespace and now there is not way to discover these relationships. Understending namespaces relationships allows to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? After a long discussion, Eric W. Biederman proposed to use ioctl-s for this purpose. The NS_GET_USERNS ioctl returns a file descriptor to an owning user namespace. It returns EPERM if a target namespace is outside of a current user namespace. v2: rename parent to relative v3: Add a missing mntput when returning -EAGAIN --EWB Acked-by: Serge Hallyn <serge@hallyn.com> Link: https://lkml.org/lkml/2016/7/6/158 Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22kernel: add a helper to get an owning user namespace for a namespaceAndrey Vagin
Return -EPERM if an owning user namespace is outside of a process current user namespace. v2: In a first version ns_get_owner returned ENOENT for init_user_ns. This special cases was removed from this version. There is nothing outside of init_user_ns, so we can return EPERM. v3: rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22PCI: pciehp: Allow exclusive userspace control of indicatorsKeith Busch
PCIe hotplug supports optional Attention and Power Indicators, which are used internally by pciehp. Users can't control the Power Indicator, but they can control the Attention Indicator by writing to a sysfs "attention" file. The Slot Control register has two bits for each indicator, and the PCIe spec defines the encodings for each as (Reserved/On/Blinking/Off). For sysfs "attention" writes, pciehp_set_attention_status() maps into these encodings, so the only useful write values are 0 (Off), 1 (On), and 2 (Blinking). However, some platforms use all four bits for platform-specific indicators, and they need to allow direct user control of them while preventing pciehp from using them at all. Add a "hotplug_user_indicators" flag to the pci_dev structure. When set, pciehp does not use either the Attention Indicator or the Power Indicator, and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values are written directly to the Attention Indicator Control and Power Indicator Control fields. [bhelgaas: changelog, rename flag and accessors to s/attention/indicator/] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-22blk-mq: add flag for drivers wanting blocking ->queue_rq()Jens Axboe
If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its ->queue_rq() handler. For that case, blk-mq ensures that we always calls it from a safe context. Signed-off-by: Jens Axboe <axboe@fb.com> Tested-by: Josef Bacik <jbacik@fb.com>
2016-09-22nfs: allow blocking locks to be awoken by lock callbacksJeff Layton
Add a waitqueue head to the client structure. Have clients set a wait on that queue prior to requesting a lock from the server. If the lock is blocked, then we can use that to wait for wakeups. Note that we do need to do this "manually" since we need to set the wait on the waitqueue prior to requesting the lock, but requesting a lock can involve activities that can block. However, only do that for NFSv4.1 locks, either by compiling out all of the waitqueue handling when CONFIG_NFS_V4_1 is disabled, or skipping all of it at runtime if we're dealing with v4.0, or v4.1 servers that don't send lock callbacks. Note too that even when we expect to get a lock callback, RFC5661 section 20.11.4 is pretty clear that we still need to poll for them, so we do still sleep on a timeout. We do however always poll at the longest interval in that case. Signed-off-by: Jeff Layton <jlayton@redhat.com> [Anna: nfs4_retry_setlk() "status" should default to -ERESTARTSYS] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constantJeff Layton
As defined in RFC 5661, section 18.16. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22percpu: improve generic percpu modify-return implementationNicholas Piggin
Some architectures require an additional load to find the address of percpu pointers. In some implemenatations, the C aliasing rules do not allow the result of that load to be kept over the store that modifies the percpu variable, which causes additional loads. Work around this by finding the pointer first, then operating on that. It's also possible to mark things as restrict and those kind of games, but that can require larger and arch specific changes. On powerpc, __this_cpu_inc_return compiles to: ld 10,48(13) ldx 9,3,10 addi 9,9,1 stdx 9,3,10 ld 9,48(13) ldx 3,9,3 With this patch it compiles to: ld 10,48(13) ldx 9,3,10 addi 9,9,1 stdx 9,3,10 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> To: Tejun Heo <tj@kernel.org> To: Christoph Lameter <cl@linux.com> Cc: linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-22Merge tag 'media/v4.8-7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - several fixes for new drivers added for Kernel 4.8 addition (cec core, pulse8 cec driver and Mediatek vcodec) - a regression fix for cx23885 and saa7134 drivers - an important fix for rcar-fcp, making rcar_fcp_enable() return 0 on success * tag 'media/v4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (25 commits) [media] cx23885/saa7134: assign q->dev to the PCI device [media] rcar-fcp: Make sure rcar_fcp_enable() returns 0 on success [media] cec: fix ioctl return code when not registered [media] cec: don't Feature Abort broadcast msgs when unregistered [media] vcodec:mediatek: Refine VP8 encoder driver [media] vcodec:mediatek: Refine H264 encoder driver [media] vcodec:mediatek: change H264 profile default to profile high [media] vcodec:mediatek: Add timestamp and timecode copy for V4L2 Encoder [media] vcodec:mediatek: Fix visible_height larger than coded_height issue in s_fmt_out [media] vcodec:mediatek: Fix fops_vcodec_release flow for V4L2 Encoder [media] vcodec:mediatek:code refine for v4l2 Encoder driver [media] cec-funcs.h: add missing vendor-specific messages [media] cec-edid: check for IEEE identifier [media] pulse8-cec: fix error handling [media] pulse8-cec: set correct Signal Free Time [media] mtk-vcodec: add HAS_DMA dependency [media] cec: ignore messages when log_addr_mask == 0 [media] cec: add item to TODO [media] cec: set unclaimed addresses to CEC_LOG_ADDR_INVALID [media] cec: add CEC_LOG_ADDRS_FL_ALLOW_UNREG_FALLBACK flag ...
2016-09-22libata: remove <asm-generic/libata-portmap.h>Christoph Hellwig
asm-generic is only intended for architecture defaults, and we can simply kill it off by moving the two defintions directly to <linux/libata.h>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Mostly small bits scattered all over the place, which is usually how things go this late in the -rc series. 1) Proper driver init device resets in bnx2, from Baoquan He. 2) Fix accounting overflow in __tcp_retransmit_skb(), sk_forward_alloc, and ip_idents_reserve, from Eric Dumazet. 3) Fix crash in bna driver ethtool stats handling, from Ivan Vecera. 4) Missing check of skb_linearize() return value in mac80211, from Johannes Berg. 5) Endianness fix in nf_table_trace dumps, from Liping Zhang. 6) SSN comparison fix in SCTP, from Marcelo Ricardo Leitner. 7) Update DSA and b44 MAINTAINERS entries. 8) Make input path of vti6 driver work again, from Nicolas Dichtel. 9) Off-by-one in mlx4, from Sebastian Ott. 10) Fix fallback route lookup handling in ipv6, from Vincent Bernat. 11) Fix stack corruption on probe in qed driver, from Yuval Mintz. 12) PHY init fixes in r8152 from Hayes Wang. 13) Missing SKB free in irda_accept error path, from Phil Turnbull" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits) tcp: properly account Fast Open SYN-ACK retrans tcp: fix under-accounting retransmit SNMP counters MAINTAINERS: Update b44 maintainer. net: get rid of an signed integer overflow in ip_idents_reserve() net/mlx4_core: Fix to clean devlink resources net: can: ifi: Configure transmitter delay vti6: fix input path ipmr, ip6mr: return lastuse relative to now r8152: disable ALDPS and EEE before setting PHY r8152: remove r8153_enable_eee r8152: move PHY settings to hw_phy_cfg r8152: move enabling PHY r8152: move some functions cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter qed: Fix stack corruption on probe MAINTAINERS: Add an entry for the core network DSA code net: ipv6: fallback to full lookup if table lookup is unsuitable net/mlx5: E-Switch, Handle mode change failures net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code net/mlx5: Fix flow counter bulk command out mailbox allocation ...
2016-09-22timekeeping: Include the correct header for errno definitionsChristoph Hellwig
asm-generic headers are only defaults for architectures. We need to get the proper defintion, which goes through <linux/errno.h> and <asm/errno.h>. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: john.stultz@linaro.org Link: http://lkml.kernel.org/r/1474555697-8206-1-git-send-email-hch@lst.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>