linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-02-11	Merge branch 'net-mitigate-kmem_free-slowpath'	David S. Miller
	Jesper Dangaard Brouer says: ==================== net: mitigating kmem_cache free slowpath This patchset is the first real use-case for kmem_cache bulk _free_. The use of bulk _alloc_ is NOT included in this patchset. The full use have previously been posted here [1]. The bulk free side have the largest benefit for the network stack use-case, because network stack is hitting the kmem_cache/SLUB slowpath when freeing SKBs, due to the amount of outstanding SKBs. This is solved by using the new API kmem_cache_free_bulk(). Introduce new API napi_consume_skb(), that hides/handles bulk freeing for the caller. The drivers simply need to use this call when freeing SKBs in NAPI context, e.g. replacing their calles to dev_kfree_skb() / dev_consume_skb_any(). Driver ixgbe is the first user of this new API. [1] http://thread.gmane.org/gmane.linux.network/384302/focus=397373 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	ixgbe: bulk free SKBs during TX completion cleanup cycle	Jesper Dangaard Brouer
	There is an opportunity to bulk free SKBs during reclaiming of resources after DMA transmit completes in ixgbe_clean_tx_irq. Thus, bulk freeing at this point does not introduce any added latency. Simply use napi_consume_skb() which were recently introduced. The napi_budget parameter is needed by napi_consume_skb() to detect if it is called from netpoll. Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost) Single CPU/flow numbers: before: 1982144 pps -> after : 2064446 pps Improvement: +82302 pps, -20 nanosec, +4.1% (SLUB and GCC version 5.1.1 20150618 (Red Hat 5.1.1-4)) Joint work with Alexander Duyck. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: bulk free SKBs that were delay free'ed due to IRQ context	Jesper Dangaard Brouer
	The network stack defers SKBs free, in-case free happens in IRQ or when IRQs are disabled. This happens in __dev_kfree_skb_irq() that writes SKBs that were free'ed during IRQ to the softirq completion queue (softnet_data.completion_queue). These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ in function net_tx_action(). Take advantage of this a use the skb defer and flush API, as we are already in softirq context. For modern drivers this rarely happens. Although most drivers do call dev_kfree_skb_any(), which detects the situation and calls __dev_kfree_skb_irq() when needed. This due to netpoll can call from IRQ context. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: bulk free infrastructure for NAPI context, use napi_consume_skb	Jesper Dangaard Brouer
	Discovered that network stack were hitting the kmem_cache/SLUB slowpath when freeing SKBs. Doing bulk free with kmem_cache_free_bulk can speedup this slowpath. NAPI context is a bit special, lets take advantage of that for bulk free'ing SKBs. In NAPI context we are running in softirq, which gives us certain protection. A softirq can run on several CPUs at once. BUT the important part is a softirq will never preempt another softirq running on the same CPU. This gives us the opportunity to access per-cpu variables in softirq context. Extend napi_alloc_cache (before only contained page_frag_cache) to be a struct with a small array based stack for holding SKBs. Introduce a SKB defer and flush API for accessing this. Introduce napi_consume_skb() as replacement for e.g. dev_consume_skb_any() when running in NAPI context. A small trick to handle/detect if we are called from netpoll is to see if budget is 0. In that case, we need to invoke dev_consume_skb_irq(). Joint work with Alexander Duyck. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	Merge branch 'virtio_net-ethtool-validation'	David S. Miller
	Nikolay Aleksandrov says: ==================== virtio_net: better ethtool setting validation This small set is a follow-up for the recent patches that added ethtool get/set settings. Patch 1 changes the speed validation routine to check if the speed is between 0 and INT_MAX (or SPEED_UNKNOWN) and patch 2 adds port validation to virtio_net and better validation comment. This set is on top of Michael's patch which explains that speeds from 0 to INT_MAX are valid: http://patchwork.ozlabs.org/patch/578911/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	virtio_net: validate ethtool port setting and explain the user validation	Nikolay Aleksandrov
	We should validate the port setting that we got from the user and check if it's what we've set it to (PORT_OTHER), also add explanation that ignoring advertising is good as long as we don't have autonegotiation. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	ethtool: make validate_speed accept all speeds between 0 and INT_MAX	Nikolay Aleksandrov
	Devices these days can have any speed and as was recently pointed out any speed from 0 to INT_MAX is valid so adjust speed validation to accept such values. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	Merge branch 'dp83848-TLK10x'	David S. Miller
	Andrew F. Davis says: ==================== net: phy: dp83848: Add support for TI TLK10x Ethernet PHYs This series is [0] split into its logical components. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: dp83848: Add comments for register definitions	Andrew F. Davis
	Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: dp83848: Add support for TI TLK10x Ethernet PHYs	Andrew F. Davis
	The TI TLK10x Ethernet PHYs are similar in the interrupt relevant registers and so are compatible with the DP83848x devices already supported. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: dp83848: Reorganize code for readability and safety	Andrew F. Davis
	Reorganize code by moving the desired interrupt mask definition out of function. Also rearrange the enable/disable interrupt function to prevent accidental over-writing of values in registers. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: dp83848: Add PHY ID for TI version of DP83848C	Andrew F. Davis
	After acquiring National Semiconductor, TI appears to have changed the Vendor Model Number for the DP83848C PHYs, add this new ID to supported IDs. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: dp83848: Add macro for dp83848 compatible devices	Andrew F. Davis
	Add a helper macro for defining dp83848 compatible phy devices. Update copyright info. Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	be2net: don't report EVB for older chipsets when SR-IOV is disabled	Ivan Vecera
	The EVB (virtual bridge) functionality should be disabled on older BE3 and Lancer chips if SR-IOV is disabled in the NIC's BIOS. This setting is identified by the zero value of total VFs reported by the card. The GET_HSW_CONFIG command cannot be used as it is not supported by these older chipset's FW. v2: added the comment Cc: Sathya Perla <sathya.perla@broadcom.com> Cc: Ajit Khaparde <ajit.khaparde@broadcom.com> Cc: Padmanabh Ratnakar <padmanabh.ratnakar@broadcom.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	Merge branch 'spi_ks8995'	David S. Miller
	Helmut Buchsbaum says: ==================== Add support for MICREL KSZ8795CLX 5-port switch This patch series refactors the spi-ks8995 driver to finally add support for the MICREL KSZ8795CLX. Additionally support for controlling a GPIO line for resetting the switch is added. Helmut Changes since v2: - use GPIO_ACTIVE_LOW according to Andrew's remark. - use ePAPR compliant node name in example, thanks to Sergei for pointing out Changes since v1: - removed initializing registers from Device Tree following Florian's advice - fixed GPIO handling for reset according to Andrew's remark. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	dt-bindings: net: ks8995: add bindings documentation for ks8995	Helmut Buchsbaum
	Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: spi_ks8995: add support for MICREL KSZ8795CLX	Helmut Buchsbaum
	Add support for MICREL KSZ8795CLX Integrated 5-Port, 10-/100-Managed Ethernet Switch with Gigabit GMII/RGMII and MII/RMII interfaces. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: spi_ks8995: generalize creation of SPI commands	Helmut Buchsbaum
	Prepare creating SPI reads and writes for other switch families. The KS8995 family uses the straight forward <8bit CMD><8bit ADDR> sequence. To be able to support KSZ8795 family, which uses <3bit CMD><12bit ADDR><1 bit TR> make the SPI command creation chip variant dependent. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: spi_ks8995: add support for resetting switch using GPIO	Helmut Buchsbaum
	When using device tree it is no more possible to reset the PHY at board level. Furthermore, doing in the driver allows to power down the switch when it is not used any more. The patch introduces a new optional property "reset-gpios" denoting an appropriate GPIO handle, e.g.: reset-gpios = <&gpio0 46 GPIO_ACTIVE_LOW> Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: spi_ks8995: verify chip and determine revision	Helmut Buchsbaum
	Since the chip variant is now determined by spi_device_id, verify family and chip id and determine the revision id. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: phy: spi_ks8995: introduce spi_device_id table	Helmut Buchsbaum
	Refactor to use spi_device_id table to facilitate easy extendability. Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	Merge branch 'thunderx-irq-hints'	David S. Miller
	Sunil Goutham says: ==================== net: thunderx: Setting IRQ affinity hints and other optimizations This patch series contains changes - To add support for virtual function's irq affinity hint - Replace napi_schedule() with napi_schedule_irqoff() - Reduce page allocation overhead by allocating pages of higher order when pagesize is 4KB. - Add couple of stats which helps in debugging - Some miscellaneous changes to BGX driver. Changes from v1: - As suggested changed MAC address invalid log message to dev_err() instead of dev_warn(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: thunderx: Alloc higher order pages when pagesize is small	Sunil Goutham
	Allocate higher order pages when pagesize is small, this will reduce number of calls to page allocator and wastage of memory. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: thunderx: bgx: Add log message when setting mac address	Robert Richter
	Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: thunderx: bgx: Use standard firmware node infrastructure.	David Daney
	In the case of OF device tree, the firmware information is attached to the BGX device structure in the standard manner, so use the firmware iterators and accessors where possible. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: thunderx: Assign affinity hints to vf's interrupts	Sunil Goutham
	This affinity hint can be used by user space irqbalance tool to set preferred CPU mask for irqs registered by this VF. Irqbalance needs to be in 'exact' mode to set irq affinity same as indicated by affinity hint. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: thunderx: Use napi_schedule_irqoff()	Sunil Goutham
	napi_schedule is being called from hard irq context, hence switch to napi_schedule_irqoff which avoids unneeded call to local_irq_save and local_irq_restore. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net, thunderx: Add TX timeout and RX buffer alloc failure stats.	Thanneeru Srinivasulu
	When system is low on atomic memory, too many error messages are logged. Since this is not a total failure but a simple switch to non-atomic allocation better to have a stat. Also add a stat for reset, kicked due to transmit watchdog timeout. Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	IB/core: Fix reading capability mask of the port info class	Eran Ben Elisha
	When checking specific attribute from a bit mask, need to use bitwise AND and not logical AND, fixed that. Fixes: 145d9c541032 ('IB/core: Display extended counter set if available') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-11	net/mlx4: fix some error handling in mlx4_multi_func_init()	Rasmus Villemoes
	The while loop after err_slaves should use post-decrement; otherwise we'll fail to do the kfrees for i==0, and will run into out-of-bounds accesses if the setup above failed already at i==0. [I'm not sure why one even bothers populating the ->vlan_filter array: mlx4.h isn't #included by anything outside drivers/net/ethernet/mellanox/mlx4/, and "git grep -C2 -w vlan_filter drivers/net/ethernet/mellanox/mlx4/" seems to suggest that the vlan_filter elements aren't used at all.] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-11	null_blk: oops when initializing without lightnvm	Matias Bjørling
	If the LightNVM subsystem is not compiled into the kernel, and the null_blk device driver requests lightnvm to be initialized. The call to nvm_register fails and the null_add_dev function cleans up the initialization. However, at this point the null block device has already been added to the nullb_list and thus a second cleanup will occur when the function has returned, that leads to a double call to blk_cleanup_queue. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-11	Revert "mmc: block: don't use parameter prefix if built as module"	Ulf Hansson
	This reverts commit 829b6962f7e3cfc06f7c5c26269fd47ad48cf503. Revert this change as it causes a sysfs path to change and therefore introduces and ABI regression. More precisely Android's vold is not being able to access /sys/module/mmcblk/parameters/perdev_minors any more, since the path becomes changed to: "/sys/module/mmc_block/..." Fixes: 829b6962f7e3 ("mmc: block: don't use parameter prefix if built as module") Reported-by: John Stultz <john.stultz@linaro.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-11	thermal: cpu_cooling: fix out of bounds access in time_in_idle	Javi Merino
	In __cpufreq_cooling_register() we allocate the arrays for time_in_idle and time_in_idle_timestamp to be as big as the number of cpus in this cpufreq device. However, in get_load() we access this array using the cpu number as index, which can result in an out of bound access. Index time_in_idle{,_timestamp} using the index in the cpufreq_device's allowed_cpus mask, as we do for the load_cpu array in cpufreq_get_requested_power() Reported-by: Nicolas Boichat <drinkcat@chromium.org> Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Eduardo Valentin <edubezval@gmail.com> Tested-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-02-11	btrfs: properly set the termination value of ctx->pos in readdir	David Sterba
	The value of ctx->pos in the last readdir call is supposed to be set to INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a larger value, then it's LLONG_MAX. There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++" overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a 64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before the increment. We can get to that situation like that: * emit all regular readdir entries * still in the same call to readdir, bump the last pos to INT_MAX * next call to readdir will not emit any entries, but will reach the bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX Normally this is not a problem, but if we call readdir again, we'll find 'pos' set to LLONG_MAX and the unconditional increment will overflow. The report from Victor at (http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging print shows that pattern: Overflow: e Overflow: 7fffffff Overflow: 7fffffffffffffff PAX: size overflow detected in function btrfs_real_readdir fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0; context: dir_context; CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015 ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48 ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78 ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8 Call Trace: [<ffffffff81742f0f>] dump_stack+0x4c/0x7f [<ffffffff811cb706>] report_size_overflow+0x36/0x40 [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0 [<ffffffff811dafc8>] iterate_dir+0xa8/0x150 [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70 [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0 Overflow: 1a [<ffffffff811db070>] ? iterate_dir+0x150/0x150 [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83 The jump from 7fffffff to 7fffffffffffffff happens when new dir entries are not yet synced and are processed from the delayed list. Then the code could go to the bump section again even though it might not emit any new dir entries from the delayed list. The fix avoids entering the "bump" section again once we've finished emitting the entries, both for synced and delayed entries. References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284 Reported-by: Victor <services@swwu.com> CC: stable@vger.kernel.org Signed-off-by: David Sterba <dsterba@suse.com> Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-02-11	Merge branch 'igmp-ns'	David S. Miller
	Nikolay Borisov says: ==================== Make igmp sysctl knobs namespace aware This series continue making more of the net related sysctls namespace aware. The first 2 and last patches are straight forward and convert sysctls which weren't defined to be namespace aware. The only thing in them is that each removes a define which is used in only one place (to initialise the respective sysctl) so I don't think this is a huge loss. The third patch however, converts igmp_llm_reports which was already defined in the ipv4_net_table but wasn't using any of the net namespace infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	igmp: Namespacify igmp_qrv sysctl knob	Nikolay Borisov
	Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	igmp: Namespaceify igmp_llm_reports sysctl knob	Nikolay Borisov
	This was initially introduced in df2cf4a78e488d26 ("IGMP: Inhibit reports for local multicast groups") by defining the sysctl in the ipv4_net_table array, however it was never implemented to be namespace aware. Fix this by changing the code accordingly. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	igmp: Namespaceify igmp_max_msf sysctl knob	Nikolay Borisov
	Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	igmp: Namespaceify igmp_max_memberships sysctl knob	Nikolay Borisov
	Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	bonding: use return instead of goto	Zhang Shengju
	Replace 'goto' with 'return' to remove unnecessary check at label: err_undo_flags. The reason is that 'err_undo_flags' do two things for the first slave device: 1.revert bond mac address if it is set by the slave device. 2.revert bond device type if it's not ARPHRD_ETHER. It's not necessary for the following three places, they changed neither bond mac address nor type. It's straightforward to return directly. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: macb: add wake-on-lan support via magic packet	Sergio Prado
	Tested on Acqua A5 SoM (http://www.acmesystems.it/acqua). Signed-off-by: Sergio Prado <sergio.prado@e-labworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	net: hamradio: baycom_ser_fdx: Replace timeval with timespec64	Amitoj Kaur Chawla
	32 bit systems using 'struct timeval' will break in the year 2038, so we replace the code appropriately. However, this driver is not broken in 2038 since we are only using microseconds portion of the time. This patch replaces 'struct timeval' with 'struct timespec64'. We only need to find elapsed microseconds rather than absolute time, so it's better to use monotonic time, so using ktime_get_ts64() makes the code more efficient and more robust against concurrent settimeofday() calls. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	openvswitch: allow management from inside user namespaces	Tycho Andersen
	Operations with the GENL_ADMIN_PERM flag fail permissions checks because this flag means we call netlink_capable, which uses the init user ns. Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations which should be allowed inside a user namespace. The motivation for this is to be able to run openvswitch in unprivileged containers. I've tested this and it seems to work, but I really have no idea about the security consequences of this patch, so thoughts would be much appreciated. v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one massive one Reported-by: James Page <james.page@canonical.com> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> CC: Eric Biederman <ebiederm@xmission.com> CC: Pravin Shelar <pshelar@ovn.org> CC: Justin Pettit <jpettit@nicira.com> CC: "David S. Miller" <davem@davemloft.net> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	ethtool: future-proof interface for speed extensions	Michael S. Tsirkin
	Many virtual and not quite virtual devices allow any speed to be set through ethtool. In particular, this applies to the virtio-net devices. Document this fact to make sure people don't assume the enum lists all possible values. Reserve values greater than INT_MAX for future extension and to avoid conflict with SPEED_UNKNOWN. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	vrf: duplicate include of rtnetlink.h	stephen hemminger
	Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	vxlan: udp_tunnel duplicate include net/udp_tunnel.h	stephen hemminger
	Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	rds: duplicate include net/tcp.h	stephen hemminger
	Duplicate include detected. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	bonding: Return correct error code	Amitoj Kaur Chawla
	The return value of kzalloc on failure of allocation of memory should be -ENOMEM and not -1. Found using Coccinelle. A simplified version of the semantic patch used is: //<smpl> @@ expression *e; @@ e = kzalloc(...); if (e == NULL) { ... return - -1 + -ENOMEM ; } //</smpl> The single call site only checks that the return value is not 0, hence no change is required at the call site. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11	ARM: 8519/1: ICST: try other dividends than 1	Linus Walleij
	Since the dawn of time the ICST code has only supported divide by one or hang in an eternal loop. Luckily we were always dividing by one because the reference frequency for the systems using the ICSTs is 24MHz and the [min,max] values for the PLL input if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop will always terminate immediately without assigning any divisor for the reference frequency. But for the code to make sense, let's insert the missing i++ Reported-by: David Binderman <dcb314@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11	Merge branch 'gso-checksums'	David S. Miller
	Alexander Duyck says: ==================== Add GSO support for outer checksum w/ inner checksum offloads This patch series updates the existing segmentation offload code for tunnels to make better use of existing and updated GSO checksum computation. This is done primarily through two mechanisms. First we maintain a separate checksum in the GSO context block of the sk_buff. This allows us to maintain two checksum values, one offloaded with values stored in csum_start and csum_offset, and one computed and tracked in SKB_GSO_CB(skb)->csum. By maintaining these two values we are able to take advantage of the same sort of math used in local checksum offload so that we can provide both inner and outer checksums with minimal overhead. Below is the performance for a netperf session between an ixgbe PF and VF on the same host but in different namespaces. As can be seen a significant gain in performance can be had from allowing the use of Tx checksum offload on the inner headers while performing a software offload on the outer header computation: Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB Before: 87380 16384 16384 10.00 12844.38 9.30 -1.00 0.712 -1.00 After: 87380 16384 16384 10.00 13216.63 6.78 -1.00 0.504 -1.000 Changes from v1: * Dropped use of CHECKSUM_UNNECESSARY for remote checksum offload * Left encap_hdr_csum as it will likely be needed in future for SCTP GSO * Broke the changes out over many more patches * Updated GRE segmentation to more closely match UDP tunnel segmentation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>