summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-11-03drm/nouveau/pmu/gk107: enable PGOB codepathsBen Skeggs
Reported to be needed as per fdo#70354 comment #61. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/pmu/gk104: check fuse to determine presence of PGOBBen Skeggs
Not 100% confirmed, but seems to match from the few boards I've looked at so far. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/pci: prepare for chipset-specific initialisation tasksBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/pci/nv46: attempt to fix msi, and re-enable by defaultBen Skeggs
Was not able to obtain a trace of NVRM due to kernel version annoyances, however, experimentally confirmed that the WAR we use on NV50/G8x boards works here too. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/pci/g94: split implementation from nv40Ben Skeggs
An upcoming patch will implement functionality that we don't use on any NV40 chipset. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/pci/g84: split implementation from nv50Ben Skeggs
An upcoming patch will implement functionality that we don't use on the original NV50. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/ibus/gf100: increase wait timeout to avoid read faultsSamuel Pitoiset
Increase clock timeout of some unknown engines in order to avoid failure at high gpcclk rate. This fixes IBUS read faults on my GF119 when reclocking is manually enabled. Note that memory reclocking is completely broken and NvMemExec has to be disabled to allow core clock reclocking only. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/gm204/6: add voltage control using the new gk104 volt classMartin Peres
I got confirmation that we can read and change the voltage with the same code. The divider is also computed correctly on the gm204 we got our hands on. Thanks to Yoshimo on IRC for executing the tests on his gm204! Signed-off-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/gm107: add voltage control using the new gk104 volt classMartin Peres
Let's ignore the other desktop Maxwells until I get my hands on one and confirm that we still can change the voltage. Signed-off-by: Martin Peres <martin.peres@free.fr>
2015-11-03drm/nouveau/volt/gk104: add support for pwm and gpio modesMartin Peres
Most Keplers actually use the GPIO-based voltage management instead of the new PWM-based one. Use the GPIO mode as a fallback as it already gracefully handles the case where no GPIOs exist. All the Maxwells seem to use the PWM method though. v2: - Do not forget to commit the PWM configuration change! Signed-off-by: Martin Peres <martin.peres@free.fr>
2015-11-03drm/nouveau/volt: add support for non-vid-based voltage controllersMartin Peres
This patch is not ideal but it definitely beats a rewrite of the current interface and is very self-contained. Signed-off-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/bios/volt: add support for pwm-based volt managementMartin Peres
Signed-off-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/ttm: set the DMA mask for platform devicesAlexandre Courbot
So far the DMA mask was not set for platform devices, which limited them to a 32-bit physical space. Allow dma_set_mask() to be called for non-PCI devices, and also take the IOMMU bit into account since it could restrict the physically addressable space. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/ttm: convert to DMA APIAlexandre Courbot
The pci_dma_* functions are now superseeded in the kernel by the DMA API. Make the conversion to this more generic API. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/instmem/gk20a: make use of the IOMMU bitAlexandre Courbot
Use the IOMMU bit specified in platform data instead of hardcoding it to the bit used by current Tegra GPUs. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/platform: allow to specify the IOMMU bitAlexandre Courbot
Current Tegra code taking advantage of the IOMMU assumes a hardcoded value for the IOMMU bit. Make it a platform property instead for flexibility. v2 (Ben Skeggs): remove nvkm dependence on drm structures Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/instmem/gk20a: use direct CPU accessAlexandre Courbot
The Great Nouveau Refactoring Take II brought us a lot of goodness, including acquire/release methods that are called before and after an instobj is modified. These functions can be used as synchronization points to manage CPU/GPU coherency if we modify an instobj using the CPU. This patch replaces the legacy and slow PRAMIN access for gk20a instmem with CPU mappings and writes. A LRU list is used to unmap unused mappings after a certain threshold (currently 1MB) of mapped instobjs is reached. This allows mappings to be reused most of the time. Accessing instobjs using the CPU requires to maintain the GPU L2 cache, which we do in the acquire/release functions. This triggers a lot of L2 flushes/invalidates, but most of them are performed on an empty cache (and thus return immediately), and overall context setup performance greatly benefits from this (from 250ms to 160ms on Jetson TK1 for a simple libdrm program). Making L2 management more explicit should allow us to grab some more performance in the future. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau: remove unnecessary usage of object handlesBen Skeggs
No longer required in a lot of cases, as objects are identified over NVIF via an alternate mechanism since the rework. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/ltc/gf100: add flush/invalidate functionsAlexandre Courbot
Allow clients to manually flush and invalidate L2. This will be useful for Tegra systems for which we want to write instmem using the CPU. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/ltc: add hooks for invalidate and flushAlexandre Courbot
These are useful for systems without a coherent CPU/GPU bus. For such systems we may need to maintain the L2 ourselves. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/timer: re-introduce nvkm_wait_xsec macrosAlexandre Courbot
Reintroduce macros allowing us to test a register against a certain mask, since this is the most common usage pattern for the more generic nvkm_xsec macros and makes the code more concise and readable. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/pmu: do not assume a PMU is presentAlexandre Courbot
Some devices may not have a PMU. Avoid a NULL pointer dereference in such cases by checking whether the pointer given to nvkm_pmu_pgob() is valid. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-03drm/nouveau/gem: return only valid domain when there's only oneIlia Mirkin
On nv50+, we restrict the valid domains to just the one where the buffer was originally created. However after the buffer is evicted to system memory, we might move it back to a different domain that was not originally valid. When sharing the buffer and retrieving its GEM_INFO data, we still want the domain that will be valid for this buffer in a pushbuf, not the one where it currently happens to be. This resolves fdo#92504 and several others. These are due to suspend evicting all buffers, making it more likely that they temporarily end up in the wrong place. Cc: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92504 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-02sg: Fix double-free when drives detach during SG_IOCalvin Owens
In sg_common_write(), we free the block request and return -ENODEV if the device is detached in the middle of the SG_IO ioctl(). Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we end up freeing rq->cmd in the already free rq object, and then free the object itself out from under the current user. This ends up corrupting random memory via the list_head on the rq object. The most common crash trace I saw is this: ------------[ cut here ]------------ kernel BUG at block/blk-core.c:1420! Call Trace: [<ffffffff81281eab>] blk_put_request+0x5b/0x80 [<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg] [<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg] [<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70 [<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg] [<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg] [<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520 [<ffffffff81258967>] ? file_has_perm+0x97/0xb0 [<ffffffff811714a1>] SyS_ioctl+0x91/0xb0 [<ffffffff81602afb>] tracesys+0xdd/0xe2 RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0 The solution is straightforward: just set srp->rq to NULL in the failure branch so that sg_finish_rem_req() doesn't attempt to re-free it. Additionally, since sg_rq_end_io() will never be called on the object when this happens, we need to free memory backing ->cmd if it isn't embedded in the object itself. KASAN was extremely helpful in finding the root cause of this bug. Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02net: phy: fix a bug in get_phy_c45_idsShaohui Xie
When probing devices-in-package for a c45 phy, device zero is the last device to probe, however, if driver reads 0 from device zero, c45_ids->devices_in_package is set to '0', the loop condition of probing will be matched again, see codes below: for (i = 1;i < num_ids && c45_ids->devices_in_package == 0;i++) driver will run in a dead loop. This patch restructures the bug and confusing loop, it provides a helper function get_phy_c45_devs_in_pkg which to read devices-in-package registers of a MMD, and rewrites the loop with using the helper function. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02net/core: generic support for disabling netdev features down stackJarod Wilson
There are some netdev features, which when disabled on an upper device, such as a bonding master or a bridge, must be disabled and cannot be re-enabled on underlying devices. This is a rework of an earlier more heavy-handed appraoch, which simply disables and prevents re-enabling of netdev features listed in a new define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper device that disables a flag in that feature mask, the disabling will propagate down the stack, and any lower device that has any upper device with one of those flags disabled should not be able to enable said flag. Initially, only LRO is included for proof of concept, and because this code effectively does the same thing as dev_disable_lro(), though it will also activate from the ethtool path, which was one of the goals here. [root@dell-per730-01 ~]# ethtool -k bond0 |grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -K bond0 lro off [root@dell-per730-01 ~]# ethtool -k bond0 |grep large large-receive-offload: off [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large large-receive-offload: off dmesg dump: [ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2. [ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83 [ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1. [ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71 This has been successfully tested with bnx2x, qlcnic and netxen network cards as slaves in a bond interface. Turning LRO on or off on the master also turns it on or off on each of the slaves, new slaves are added with LRO in the same state as the master, and LRO can't be toggled on the slaves. Also, this should largely remove the need for dev_disable_lro(), and most, if not all, of its call sites can be replaced by simply making sure NETIF_F_LRO isn't included in the relevant device's feature flags. Note that this patch is driven by bug reports from users saying it was confusing that bonds and slaves had different settings for the same features, and while it won't be 100% in sync if a lower device doesn't support a feature like LRO, I think this is a good step in the right direction. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Nikolay Aleksandrov <razor@blackwall.org> CC: Michal Kubecek <mkubecek@suse.cz> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02pm80xx: avoid a panic if MSI(X) interrupts are disabledBenjamin Rood
If MSI(X) interrupts are disabled via the kernel command line (pci=nomsi), the pm8001 driver will kernel panic because it does not detect that MSI interrupts are disabled and will soldier on and attempt to configure MSI interrupts anyways. This leads to a kernel panic, most likely because a required data structure is not available down the line. Using the pci_msi_enabled() function in order to detect if MSI interrupts are enabled before configuring them resolves this issue and avoids a kernel panic when the module is loaded. Additionally, the irq_vector structure must be initialized when legacy interrupts are being used otherwise legacy interrupts will simply not function and result in another panic. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02pm80xx: wait a minimum of 500ms before issuing commands to SPCvBenjamin Rood
The documentation for the 8070 and 8072 SPCv chip explicitly states that a minimum of 500ms must elapse before issuing commands, otherwise the SPCv may not process them and the firmware may get into an unrecoverable state requiring a reboot. While the Linux guys will probably think this is 'racy', it is called out in the chip documentation and inserting this delay makes power management function properly. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02pm80xx: do not examine registers for iButton feature if ATTO adapterBenjamin Rood
ATTO adapters do not support this feature. If the firmware fails to be ready, it should not check the examined registers in order to examine the state of the feature in order to prevent undefined behavior. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02pm80xx: set PHY profiles for ATTO 12Gb SAS controllersBenjamin Rood
PHY profiles are not saved in NVRAM on ATTO 12Gb SAS controllers. Therefore, in order for the controller to function in a wide range of configurations, the PHY profiles must be statically set. This patch provides the necessary functionality to do so. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02pm80xx: add support for ATTO devices during SAS address initiailizationBenjamin Rood
ATTO SAS controllers retrieve the SAS address from the NVRAM in a location different from non-ATTO PMC Sierra SAS controllers. This patch makes the necessary adjustments in order to retrieve the SAS address on these types of adapters. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02pm80xx: add ATTO PCI IDs to pm8001_pci_tableBenjamin Rood
These PCI IDs allow the pm8001 driver to load against ATTO 12Gb SAS controllers that use PMC Sierra 8070 and PMC Sierra 8072 SAS chips. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02pm80xx: add support for PMC Sierra 8070 and PMC Sierra 8072 SAS controllersBenjamin Rood
These SAS controllers support speeds up to 12Gb. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02pm80xx: configure PHY settings based on subsystem vendor IDBenjamin Rood
Previuosly, all PMC Sierra 80xx controllers are assumed to be a motherboard controller, except if the subsystem vendor ID was equal to PCI_VENDOR_ID_ADAPTEC. The driver then attempts to load PHY settings from NVRAM. While this may be correct behavior for most controllers, it does not work with Adaptec and ATTO controllers since they do not store PHY settings in NVRAM and choose to use either custom PHY settings or chip defaults. Loading random values from NVRAM may cause the controllers to malfunction in this edge case. Signed-off-by: Benjamin Rood <brood@attotech.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02SCSI: Increase REPORT_LUNS timeoutBrian King
This patch fixes an issue seen with an IBM 2145 (SVC) where, following an error injection test which results in paths going offline, when they came back online, the path would timeout the REPORT_LUNS issued during the scan. This timeout situation continued until retries were expired, resulting in falling back to a sequential LUN scan. Then, since the target responds with PQ=1, PDT=0 for all possible LUNs, due to the way the sequential LUN scan code works, we end up adding 512 LUNs for each target, when there is really only a small handful of LUNs that are actually present. This patch increases the timeout used on the REPORT_LUNS to 30 seconds. This patch solves the issue of 512 non existent LUNs showing up after this event. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02sfc: push partner queue for skb->xmit_moreMartin Habets
When the IP stack passes SKBs the sfc driver puts them in 2 different TX queues (called partners), one for checksummed and one for not checksummed. If the SKB has xmit_more set the driver will delay pushing the work to the NIC. When later it does decide to push the buffers this patch ensures it also pushes the partner queue, if that also has any delayed work. Before this fix the work in the partner queue would be left for a long time and cause a netdev watchdog. Fixes: 70b33fb ("sfc: add support for skb->xmit_more") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02sh_eth: fix typo in RX descriptor bit nameSergei Shtylyov
The correct name of the RX descriptor 0 bit 30 is RDLE (receive descriptor list end), not RDEL. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02megaraid_sas : Remove debug print from function megasas_update_span_setsumit.saxena@avagotech.com
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02megaraid_sas : Driver version upgradesumit.saxena@avagotech.com
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02megaraid_sas : SMAP restriction--do not access user memory from IOCTL codesumit.saxena@avagotech.com
This is an issue on SMAP enabled CPUs and 32 bit apps running on 64 bit OS. Do not access user memory from kernel code. The SMAP bit restricts accessing user memory from kernel code. Cc: <stable@vger.kernel.org> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02sit: fix sit0 percpu double allocationsEric Dumazet
sit0 device allocates its percpu storage twice : - One time in ipip6_tunnel_init() - One time in ipip6_fb_tunnel_init() Thus we leak 48 bytes per possible cpu per network namespace dismantle. ipip6_fb_tunnel_init() can be much simpler and does not return an error, and should be called after register_netdev() Note that ipip6_tunnel_clone_6rd() also needs to be called after register_netdev() (calling ipip6_tunnel_init()) Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02Merge branch 'bonding-actor-updates'David S. Miller
Mahesh Bandewar says: ==================== re-org actor admin/oper key updates I was observing machines entering into weird LACP state when the partner is in passive mode. This issue is not because of the partners in passive state but probably because of some operational key update which is pushing the state-machine is that weird state. This was happening randomly on about 1% of the machine (when the sample size is a large set of machines with a variety of NICs/ports bonded). In this patch-series I'm attempting to unify the logic of actor-key / operational-key changes to one place to avoid possible errors in update. Also this eliminates the need for the event-handler to decide if the key needs update. After this patch-set none of the machines (from same sample set) were exhibiting LACP-weirdness that was observed earlier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bonding: simplify / unify event handling code for 3ad mode.Mahesh Bandewar
Old logic of updating state-machine is not required since ad_update_actor_keys() does it implicitly. The only loss is the notification differentiation between speed vs. duplex change. Now only one unified notification is printed. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bonding: unify all places where actor-oper key needs to be updated.Mahesh Bandewar
actor_admin, and actor_oper key is changed at multiple locations in the code. This patch brings all those updates into one location in an attempt to avoid possible inconsistent updates causing LACP state machine to go in weird state. The unified place is ad_update_actor_key() with simple state-machine logic - (a) If port is "duplex" then only it can participate in LACP (b) Speed change reinitializes the LACP state-machine. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bonding: Simplify __get_duplex function.Mahesh Bandewar
Eliminate 'else' clause by simply initializing variable Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02Merge branch 'bpf-persistent'David S. Miller
Daniel Borkmann says: ==================== BPF updates This set adds support for persistent maps/progs. Please see individual patches for further details. A man-page update to bpf(2) will be sent later on, also a iproute2 patch for support in tc. v1 -> v2: - Reworked most of patch 4 and 5 - Rebased to latest net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bpf: add sample usages for persistent maps/progsDaniel Borkmann
This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN and BPF_OBJ_GET commands can be used. Example with maps: # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42 bpf: map fd:3 (Success) bpf: pin ret:(0,Success) bpf: fd:3 u->(1:42) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):42 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24 bpf: get fd:3 (Success) bpf: fd:3 u->(1:24) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):24 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -P -m bpf: map fd:3 (Success) bpf: pin ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):0 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m bpf: get fd:3 (Success) Example with progs: # ./fds_example -F /sys/fs/bpf/p -P -p bpf: prog fd:3 (Success) bpf: pin ret:(0,Success) bpf sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o bpf: prog fd:5 (Success) bpf: pin ret:(0,Success) bpf: sock:3 <- fd:5 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bpf: add support for persistent maps/progsDaniel Borkmann
This work adds support for "persistent" eBPF maps/programs. The term "persistent" is to be understood that maps/programs have a facility that lets them survive process termination. This is desired by various eBPF subsystem users. Just to name one example: tc classifier/action. Whenever tc parses the ELF object, extracts and loads maps/progs into the kernel, these file descriptors will be out of reach after the tc instance exits. So a subsequent tc invocation won't be able to access/relocate on this resource, and therefore maps cannot easily be shared, f.e. between the ingress and egress networking data path. The current workaround is that Unix domain sockets (UDS) need to be instrumented in order to pass the created eBPF map/program file descriptors to a third party management daemon through UDS' socket passing facility. This makes it a bit complicated to deploy shared eBPF maps or programs (programs f.e. for tail calls) among various processes. We've been brainstorming on how we could tackle this issue and various approches have been tried out so far, which can be read up further in the below reference. The architecture we eventually ended up with is a minimal file system that can hold map/prog objects. The file system is a per mount namespace singleton, and the default mount point is /sys/fs/bpf/. Any subsequent mounts within a given namespace will point to the same instance. The file system allows for creating a user-defined directory structure. The objects for maps/progs are created/fetched through bpf(2) with two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor along with a pathname is being passed to bpf(2) that in turn creates (we call it eBPF object pinning) the file system nodes. Only the pathname is being passed to bpf(2) for getting a new BPF file descriptor to an existing node. The user can use that to access maps and progs later on, through bpf(2). Removal of file system nodes is being managed through normal VFS functions such as unlink(2), etc. The file system code is kept to a very minimum and can be further extended later on. The next step I'm working on is to add dump eBPF map/prog commands to bpf(2), so that a specification from a given file descriptor can be retrieved. This can be used by things like CRIU but also applications can inspect the meta data after calling BPF_OBJ_GET. Big thanks also to Alexei and Hannes who significantly contributed in the design discussion that eventually let us end up with this architecture here. Reference: https://lkml.org/lkml/2015/10/15/925 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bpf: consolidate bpf_prog_put{, _rcu} dismantle pathsDaniel Borkmann
We currently have duplicated cleanup code in bpf_prog_put() and bpf_prog_put_rcu() cleanup paths. Back then we decided that it was not worth it to make it a common helper called by both, but with the recent addition of resource charging, we could have avoided the fix in commit ac00737f4e81 ("bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put") if we would have had only a single, common path. We can simplify it further by assigning aux->prog only once during allocation time. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02bpf: align and clean bpf_{map,prog}_get helpersDaniel Borkmann
Add a bpf_map_get() function that we're going to use later on and align/clean the remaining helpers a bit so that we have them a bit more consistent: - __bpf_map_get() and __bpf_prog_get() that both work on the fd struct, check whether the descriptor is eBPF and return the pointer to the map/prog stored in the private data. Also, we can return f.file->private_data directly, the function signature is enough of a documentation already. - bpf_map_get() and bpf_prog_get() that both work on u32 user fd, call their respective __bpf_map_get()/__bpf_prog_get() variants, and take a reference. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>