summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-09-24net: Update API for VF vlan protocol 802.1ad supportMoshe Shemesh
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving the ability for user-space application to specify it for the VF, as an option to support 802.1ad. We adjusted IP Link tool to support this option. For future use cases, the new UAPI supports multiple vlans. For now we limit the list size to a single vlan in kernel. Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward compatibility with older versions of IP Link tool. Add a vlan protocol parameter to the ndo_set_vf_vlan callback. We kept 802.1Q as the drivers' default vlan protocol. Suitable ip link tool command examples: Set vf vlan protocol 802.1ad: ip link set eth0 vf 1 vlan 100 proto 802.1ad Set vf to VST (802.1Q) mode: ip link set eth0 vf 1 vlan 100 proto 802.1Q Or by omitting the new parameter ip link set eth0 vf 1 vlan 100 Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24net/mlx4_core: Preparation for VF vlan protocol 802.1adMoshe Shemesh
Check device capability to support VF vlan protocol 802.1ad mode. Add vport attribute vlan protocol. Init vport vlan protocol by default to 802.1Q. Add update QP support for VF vlan protocol 802.1ad. Add func capability vlan_offload_disable to disable all vlan HW acceleration on VF while the VF is set to VF vlan protocol 802.1ad mode. No change in VF vlan protocol 802.1Q (VST) mode. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24lockdep: make MAX_LOCKDEP_SUBCLASSES unconditionally visibleBartosz Golaszewski
This define is needed by i2c_adapter_depth() to detect if we don't exceed the maximum number of lock subclasses. Make it visible even if lockdep is disabled. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Peter Rosin <peda@axentia.se> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-24i2c: export i2c_adapter_depth()Bartosz Golaszewski
For crazy setups in which an i2c gpio expander is behind an i2c gpio multiplexer controlled by a gpio provided a second expander using the same device driver we need to explicitly tell lockdep how to handle nested locking. Export i2c_adapter_depth() as public API to be reused outside of i2c core code. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Peter Rosin <peda@axentia.se> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-24thread_info: Use unsigned long for flagsMark Rutland
The generic THREAD_INFO_IN_TASK definition of thread_info::flags is a u32, matching x86 prior to the introduction of THREAD_INFO_IN_TASK. However, common helpers like test_ti_thread_flag() implicitly assume that thread_info::flags has at least the size and alignment of unsigned long, and relying on padding and alignment provided by other elements of task_struct is somewhat fragile. Additionally, some architectures use more that 32 bits for thread_info::flags, and others may need to in future. With THREAD_INFO_IN_TASK, task struct follows thread_info with a long field, and thus we no longer save any space as we did back in commit: affa219b60a11b32 ("x86: change thread_info's flag field back to 32 bits") Given all this, it makes more sense for the generic thread_info::flags to be an unsigned long. In fact given <linux/thread_info.h> contains/uses the helpers mentioned above, BE arches *must* use unsigned long (or something of the same size) today, or they wouldn't work. Make it so. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1474651447-30447-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-24watchdog: add pretimeout support to the coreWolfram Sang
Since the watchdog framework centrializes the IOCTL interfaces of device drivers now, SETPRETIMEOUT and GETPRETIMEOUT need to be added in the common code. Signed-off-by: Robin Gong <b38343@freescale.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> [vzapolskiy: added conditional pretimeout sysfs attribute visibility] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2016-09-24ACPI / watchdog: Add support for WDAT hardware watchdogMika Westerberg
Starting from Intel Skylake the iTCO watchdog timer registers were moved to reside in the same register space with SMBus host controller. Not all needed registers are available though and we need to unhide P2SB (Primary to Sideband) device briefly to be able to read status of required NO_REBOOT bit. The i2c-i801.c SMBus driver used to handle this and creation of the iTCO watchdog platform device. Windows, on the other hand, does not use the iTCO watchdog hardware directly even if it is available. Instead it relies on ACPI Watchdog Action Table (WDAT) table to describe the watchdog hardware to the OS. This table contains necessary information about the the hardware and also set of actions which are executed by a driver as needed. This patch implements a new watchdog driver that takes advantage of the ACPI WDAT table. We split the functionality into two parts: first part enumerates the WDAT table and if found, populates resources and creates platform device for the actual driver. The second part is the driver itself. The reason for the split is that this way we can make the driver itself to be a module and loaded automatically if the WDAT table is found. Otherwise the module is not loaded. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-23clk: change the type of clk_hw_onecell_data.num to unsigned intMasahiro Yamada
The "num" is the number of clk_hw entries in the structure, so "unsigned int" would be a better fit. (size_t looks like data size we count by byte.) Besides, struct clk_onecell_data already uses unsigned int for "clk_num". Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-09-23IB/core: remove ib_get_dma_mrChristoph Hellwig
We now only use it from ib_alloc_pd to create a local DMA lkey if the device doesn't provide one, or a global rkey if the ULP requests it. This patch removes ib_get_dma_mr and open codes the functionality in ib_alloc_pd so that we can simplify the code and prevent abuse of the functionality. As a side effect we can also simplify things by removing the valid access bit check, and the PD refcounting. In the future I hope to also remove the per-PD global MR entirely by shifting this work into the HW drivers, as one step towards avoiding the struct ib_mr overload for various different use cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: rename pd->local_mr to pd->__internal_mrChristoph Hellwig
This has two reasons: a) to clearly mark that drivers don't have any business using it, and b) because we're going to use it for the (dangerous) global rkey soon, so that drivers don't create on themselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23blkcg: Annotate blkg_hint correctlyBart Van Assche
Avoid that sparse complains about blkg_hint manipulations. Fixes: a637120e4902 ("blkcg: use radix tree to index blkgs from blkcg") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23rxrpc: Add a tracepoint to log which packets will be retransmittedDavid Howells
Add a tracepoint to log in rxrpc_resend() which packets will be retransmitted. Note that if a positive ACK comes in whilst we have dropped the lock to retransmit another packet, the actual retransmission may not happen, though some of the effects will (such as altering the congestion management). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add tracepoint for ACK proposalDavid Howells
Add a tracepoint to log proposed ACKs, including whether the proposal is used to update a pending ACK or is discarded in favour of an easlier, higher priority ACK. Whilst we're at it, get rid of the rxrpc_acks() function and access the name array directly. We do, however, need to validate the ACK reason number given to trace_rxrpc_rx_ack() to make sure we don't overrun the array. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint to log injected Rx packet lossDavid Howells
Add a tracepoint to log received packets that get discarded due to Rx packet loss. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add data Tx tracepoint and adjust Tx ACK tracepointDavid Howells
Add a tracepoint to log transmission of DATA packets (including loss injection). Adjust the ACK transmission tracepoint to include the packet serial number and to line this up with the DATA transmission display. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23rxrpc: Add a tracepoint for the call timerDavid Howells
Add a tracepoint to log call timer initiation, setting and expiry. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23svcrdma: support Remote InvalidationChuck Lever
Support Remote Invalidation. A private message is exchanged with the client upon RDMA transport connect that indicates whether Send With Invalidation may be used by the server to send RPC replies. The invalidate_rkey is arbitrarily chosen from among rkeys present in the RPC-over-RDMA header's chunk lists. Send With Invalidate improves performance only when clients can recognize, while processing an RPC reply, that an rkey has already been invalidated. That has been submitted as a separate change. In the future, the RPC-over-RDMA protocol might support Remote Invalidation properly. The protocol needs to enable signaling between peers to indicate when Remote Invalidation can be used for each individual RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-23rpcrdma: RDMA/CM private message data structureChuck Lever
Introduce data structure used by both client and server to exchange implementation details during RDMA/CM connection establishment. This is an experimental out-of-band exchange between Linux RPC-over-RDMA Version One implementations, replacing the deprecated CCP (see RFC 5666bis). The purpose of this extension is to enable prototyping of features that might be introduced in a subsequent version of RPC-over-RDMA. Suggested by Christoph Hellwig and Devesh Sharma. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-23svcrdma: Tail iovec leaves an orphaned DMA mappingChuck Lever
The ctxt's count field is overloaded to mean the number of pages in the ctxt->page array and the number of SGEs in the ctxt->sge array. Typically these two numbers are the same. However, when an inline RPC reply is constructed from an xdr_buf with a tail iovec, the head and tail often occupy the same page, but each are DMA mapped independently. In that case, ->count equals the number of pages, but it does not equal the number of SGEs. There's one more SGE, for the tail iovec. Hence there is one more DMA mapping than there are pages in the ctxt->page array. This isn't a real problem until the server's iommu is enabled. Then each RPC reply that has content in that iovec orphans a DMA mapping that consists of real resources. krb5i and krb5p always populate that tail iovec. After a couple million sent krb5i/p RPC replies, the NFS server starts behaving erratically. Reboot is needed to clear the problem. Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-23Merge tag 'v4.8-rc6' into develLinus Walleij
Linux 4.8-rc6
2016-09-23Merge branch 'gpio-irq-validmask' of /home/linus/linux-pinctrl into develLinus Walleij
2016-09-23gpiolib: Make it possible to exclude GPIOs from IRQ domainMika Westerberg
When using GPIO irqchip helpers to setup irqchip for a gpiolib based driver, it is not possible to select which GPIOs to add to the IRQ domain. Instead it just adds all GPIOs which is not always desired. For example there might be GPIOs that for some reason cannot generated normal interrupts at all. To support this we add a flag irq_need_valid_mask to struct gpio_chip. When this flag is set the core allocates irq_valid_mask that holds one bit for each GPIO the chip has. By default all bits are set but drivers can manipulate this using set_bit() and clear_bit() accordingly. Then when gpiochip_irqchip_add() is called, this mask is checked and all GPIOs with bit is set are added to the IRQ domain created for the GPIO chip. Suggested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-09-23bpf: add helper to invalidate hashDaniel Borkmann
Add a small helper that complements 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs") for invalidating the current skb->hash after mangling on headers via direct packet write. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net: dsa: add port fast ageingVivien Didelot
Today the DSA drivers are in charge of flushing the MAC addresses associated to a port when its STP state changes from Learning or Forwarding, to Disabled or Blocking or Listening. This makes the drivers more complex and hides the generic switch logic. Introduce a new optional port_fast_age operation to dsa_switch_ops, to move this logic to the DSA layer and keep drivers simple. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net_sched: act_vlan: add helper inlines to access tcf_vlan infoOr Gerlitz
Needed e.g for offloading drivers to pick the relevant attributes. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23net_sched: sch_fq: account for schedule/timers driftsEric Dumazet
It looks like the following patch can make FQ very precise, even in VM or stressed hosts. It matters at high pacing rates. We take into account the difference between the time that was programmed when last packet was sent, and current time (a drift of tens of usecs is often observed) Add an EWMA of the unthrottle latency to help diagnostics. This latency is the difference between current time and oldest packet in delayed RB-tree. This accounts for the high resolution timer latency, but can be different under stress, as fq_check_throttled() can be opportunistically be called from a dequeue() called after an enqueue() for a different flow. Tested: // Start a 10Gbit flow $ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr & Before patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17106.04 756876.84 1102.75 1119049.02 0.00 0.00 0.52 After patch : $ sar -n DEV 10 5 | grep eth0 | grep Average Average: eth0 17867.00 800245.90 1151.77 1183172.12 0.00 0.00 0.52 A new iproute2 tc can output the 'unthrottle latency' : $ tc -s qd sh dev eth0 | grep latency 0 gc, 0 highprio, 32490767 throttled, 2382 ns latency Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23Merge tag 'linux-can-fixes-for-4.8-20160922' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2016-09-22 this is a pull request of one patch for the upcoming linux-4.8 release. The patch by Sergei Miroshnichenko fixes a potential deadlock in the generic CAN device code that cann occour after a bus-off. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23sctp: improve how SSN, TSN and ASCONF serial are comparedMarcelo Ricardo Leitner
Make it similar to time_before() macros: - easier to understand - make use of typecheck() to avoid working on unexpected variable types (made the issue on previous patch visible) - for _[lg]te versions, slighly faster, as the compiler used to generate a sequence of cmp/je/cmp/js instructions and now it's sub/test/jle (for _lte): Before, for sctp_outq_sack: if (primary->cacc.changeover_active) { 1f01: 80 b9 84 02 00 00 00 cmpb $0x0,0x284(%rcx) 1f08: 74 6e je 1f78 <sctp_outq_sack+0xe8> u8 clear_cycling = 0; if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) { 1f0a: 8b 81 80 02 00 00 mov 0x280(%rcx),%eax return ((s) - (t)) & TSN_SIGN_BIT; } static inline int TSN_lte(__u32 s, __u32 t) { return ((s) == (t)) || (((s) - (t)) & TSN_SIGN_BIT); 1f10: 8b 7d bc mov -0x44(%rbp),%edi 1f13: 39 c7 cmp %eax,%edi 1f15: 74 25 je 1f3c <sctp_outq_sack+0xac> 1f17: 39 f8 cmp %edi,%eax 1f19: 78 21 js 1f3c <sctp_outq_sack+0xac> primary->cacc.changeover_active = 0; After: if (primary->cacc.changeover_active) { 1ee7: 80 b9 84 02 00 00 00 cmpb $0x0,0x284(%rcx) 1eee: 74 73 je 1f63 <sctp_outq_sack+0xf3> u8 clear_cycling = 0; if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) { 1ef0: 8b 81 80 02 00 00 mov 0x280(%rcx),%eax 1ef6: 2b 45 b4 sub -0x4c(%rbp),%eax 1ef9: 85 c0 test %eax,%eax 1efb: 7e 26 jle 1f23 <sctp_outq_sack+0xb3> primary->cacc.changeover_active = 0; *_lt() generated pretty much the same code. Tested with gcc (GCC) 6.1.1 20160621. This patch also removes SSN_lte as it is not used and cleanups some comments. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-09-23mtd: nand: Provide nand_cleanup() function to free NAND related resourcesRichard Weinberger
Provide a nand_cleanup() function to free all nand related resources without unregistering the mtd device. This should allow drivers to call mtd_device_unregister() and handle its return value and still being able to cleanup all nand related resources. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Daniel Walter <dwalter@sigma-star.at> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Add an option to maximize the ECC strengthBoris Brezillon
The generic NAND DT bindings allows one to tweak the ECC strength and step size to their need. It can be used to lower the ECC strength to match a bootloader/firmware config, but might also be used to get a better reliability. In the latter case, the user might want to use the maximum ECC strength without having to explicitly calculate the exact value (this value not only depends on the OOB size, but also on the NAND controller, and can be tricky to extract). Add a generic 'nand-ecc-maximize' DT property and the associated NAND_ECC_MAXIMIZE flag, to let ECC controller drivers select the best ECC strength and step-size on their own. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Rob Herring <robh@kernel.org>
2016-09-23mtd: nand: automate NAND timings selectionBoris Brezillon
The NAND framework provides several helpers to query timing modes supported by a NAND chip, but this implies that all NAND controller drivers have to implement the same timings selection dance. Also currently NAND devices can be resetted at arbitrary places which also resets the timing for ONFI chips to timing mode 0. Provide a common logic to select the best timings based on ONFI or ->onfi_timing_mode_default information. Hook this into nand_reset() to make sure the new timing is applied each time during a reset. NAND controller willing to support timings adjustment should just implement the ->setup_data_interface() method. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2016-09-23mtd: nand: Expose data interface for ONFI mode 0Sascha Hauer
The nand layer will need ONFI mode 0 to use it as timing mode before and right after reset. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Add function to convert ONFI mode to data_interfaceSascha Hauer
onfi_init_data_interface() initializes a data interface with values from a given ONFI mode. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Introduce nand_data_interfaceSascha Hauer
Currently we have no data structure to fully describe a NAND timing. We only have struct nand_sdr_timings for NAND timings in SDR mode, but nothing for DDR mode and also no container to store both types of timing. This patch adds struct nand_data_interface which stores the timing type and a union of different timings. This can be used to pass to drivers in order to configure the timing. Add kerneldoc for struct nand_sdr_timings while touching it anyway. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: Create a NAND reset functionSascha Hauer
When NAND devices are resetted some initialization may have to be done, like for example they have to be configured for the timing mode that shall be used. To get a common place where this initialization can be implemented create a nand_reset() function. This currently only issues a NAND_CMD_RESET to the NAND device. The places issuing this command manually are replaced with a call to nand_reset(). Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: remove unnecessary 'extern' from function declarationsSascha Hauer
'extern' is not necessary for function declarations. To prevent people from adding the keyword to new declarations remove the existing ones. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23mtd: nand: import nand_hw_control_init()Marc Gonzalez
The code to initialize a struct nand_hw_control is duplicated across several drivers. Factorize it using an inline function. Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-09-23netfilter: nf_queue: improve queue range support for bridge familyLiping Zhang
After commit ac2863445686 ("netfilter: bridge: add nf_afinfo to enable queuing to userspace"), we can queue packets to the user space in bridge family. But when the user specify the queue range, packets will be only delivered to the first queue num. Because in nfqueue_hash, we only support ipv4 and ipv6 family. Now add support for bridge family too. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23netfilter: nft_queue: add _SREG_QNUM attr to select the queue numberLiping Zhang
Currently, the user can specify the queue numbers by _QUEUE_NUM and _QUEUE_TOTAL attributes, this is enough in most situations. But acctually, it is not very flexible, for example: tcp dport 80 mapped to queue0 tcp dport 81 mapped to queue1 tcp dport 82 mapped to queue2 In order to do this thing, we must add 3 nft rules, and more mapping meant more rules ... So take one register to select the queue number, then we can add one simple rule to mapping queues, maybe like this: queue num tcp dport map { 80:0, 81:1, 82:2 ... } Florian Westphal also proposed wider usage scenarios: queue num jhash ip saddr . ip daddr mod ... queue num meta cpu ... queue num meta mark ... The last point is how to load a queue number from sreg, although we can use *(u16*)&regs->data[reg] to load the queue number, just like nat expr to load its l4port do. But we will cooperate with hash expr, meta cpu, meta mark expr and so on. They all store the result to u32 type, so cast it to u16 pointer and dereference it will generate wrong result in the big endian system. So just keep it simple, we treat queue number as u32 type, although u16 type is already enough. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23netfilter: nf_tables: validate maximum value of u32 netlink attributesLaura Garcia Liebana
Fetch value and validate u32 netlink attribute. This validation is usually required when the u32 netlink attributes are being stored in a field whose size is smaller. This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check on u8 nft_exthdr attributes"). Fixes: 96518518cc41 ("netfilter: add nftables") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-22drm: Fix plane type uabi breakageDaniel Vetter
Turns out assuming that only stuff in uabi is uabi is a bit naive, and we have a bunch of properties for which the enum values are placed in random headers. A proper fix would be to split out uapi include headers, but meanwhile sprinkle at least some warning over them. Fixes: 532b36712ddf ("drm/doc: Polish for drm_plane.[hc]") Cc: Archit Taneja <architt@codeaurora.org> Cc: Sean Paul <seanpaul@chromium.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/1474612525-9488-1-git-send-email-daniel.vetter@ffwll.ch
2016-09-23Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22Merge branch 'nsfs-ioctls' into HEADEric W. Biederman
From: Andrey Vagin <avagin@openvz.org> Each namespace has an owning user namespace and now there is not way to discover these relationships. Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships too. Why we may want to know relationships between namespaces? One use would be visualization, in order to understand the running system. Another would be to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? One more use-case (which usually called abnormal) is checkpoint/restart. In CRIU we are going to dump and restore nested namespaces. There [1] was a discussion about which interface to choose to determing relationships between namespaces. Eric suggested to add two ioctl-s [2]: > Grumble, Grumble. I think this may actually a case for creating ioctls > for these two cases. Now that random nsfs file descriptors are bind > mountable the original reason for using proc files is not as pressing. > > One ioctl for the user namespace that owns a file descriptor. > One ioctl for the parent namespace of a namespace file descriptor. Here is an implementaions of these ioctl-s. $ man man7/namespaces.7 ... Since Linux 4.X, the following ioctl(2) calls are supported for namespace file descriptors. The correct syntax is: fd = ioctl(ns_fd, ioctl_type); where ioctl_type is one of the following: NS_GET_USERNS Returns a file descriptor that refers to an owning user names‐ pace. NS_GET_PARENT Returns a file descriptor that refers to a parent namespace. This ioctl(2) can be used for pid and user namespaces. For user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same meaning. In addition to generic ioctl(2) errors, the following specific ones can occur: EINVAL NS_GET_PARENT was called for a nonhierarchical namespace. EPERM The requested namespace is outside of the current namespace scope. [1] https://lkml.org/lkml/2016/7/6/158 [2] https://lkml.org/lkml/2016/7/9/101 Changes for v2: * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing outside of the init namespace, so we can return EPERM in this case too. > The fewer special cases the easier the code is to get > correct, and the easier it is to read. // Eric Changes for v3: * rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: "W. Trevor King" <wking@tremily.us> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@canonical.com>
2016-09-22nsfs: add ioctl to get a parent namespaceAndrey Vagin
Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships. In a future we will use this interface to dump and restore nested namespaces. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22nsfs: add ioctl to get an owning user namespace for ns file descriptorAndrey Vagin
Each namespace has an owning user namespace and now there is not way to discover these relationships. Understending namespaces relationships allows to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? After a long discussion, Eric W. Biederman proposed to use ioctl-s for this purpose. The NS_GET_USERNS ioctl returns a file descriptor to an owning user namespace. It returns EPERM if a target namespace is outside of a current user namespace. v2: rename parent to relative v3: Add a missing mntput when returning -EAGAIN --EWB Acked-by: Serge Hallyn <serge@hallyn.com> Link: https://lkml.org/lkml/2016/7/6/158 Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22kernel: add a helper to get an owning user namespace for a namespaceAndrey Vagin
Return -EPERM if an owning user namespace is outside of a process current user namespace. v2: In a first version ns_get_owner returned ENOENT for init_user_ns. This special cases was removed from this version. There is nothing outside of init_user_ns, so we can return EPERM. v3: rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-09-22PCI: pciehp: Allow exclusive userspace control of indicatorsKeith Busch
PCIe hotplug supports optional Attention and Power Indicators, which are used internally by pciehp. Users can't control the Power Indicator, but they can control the Attention Indicator by writing to a sysfs "attention" file. The Slot Control register has two bits for each indicator, and the PCIe spec defines the encodings for each as (Reserved/On/Blinking/Off). For sysfs "attention" writes, pciehp_set_attention_status() maps into these encodings, so the only useful write values are 0 (Off), 1 (On), and 2 (Blinking). However, some platforms use all four bits for platform-specific indicators, and they need to allow direct user control of them while preventing pciehp from using them at all. Add a "hotplug_user_indicators" flag to the pci_dev structure. When set, pciehp does not use either the Attention Indicator or the Power Indicator, and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values are written directly to the Attention Indicator Control and Power Indicator Control fields. [bhelgaas: changelog, rename flag and accessors to s/attention/indicator/] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-22blk-mq: add flag for drivers wanting blocking ->queue_rq()Jens Axboe
If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its ->queue_rq() handler. For that case, blk-mq ensures that we always calls it from a safe context. Signed-off-by: Jens Axboe <axboe@fb.com> Tested-by: Josef Bacik <jbacik@fb.com>