2016-12-07Merge branch 'bnxt_en-RDMA'David S. Miller
Michael Chan says: ==================== bnxt_en: Add interface to support RDMA driver. This series adds an interface to support a brand new RDMA driver bnxt_re. The first step is to re-arrange some code so that pci_enable_msix() can be called during pci probe. The purpose is to allow the RDMA driver to initialize and stay initialized whether the netdev is up or down. Then we make some changes to VF resource allocation so that there is enough resources to support RDMA. Finally the last patch adds a simple interface to allow the RDMA driver to probe and register itself with any bnxt_en devices that support RDMA. Once registered, the RDMA driver can request MSIX, send fw messages, and receive some notifications. v2: Fixed kbuild test robot warnings. David, please consider this series for net-next. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07bnxt_en: Add interface to support RDMA driver.Michael Chan
Since the network driver and RDMA driver operate on the same PCI function, we need to create an interface to allow the RDMA driver to share resources with the network driver. 1. Create a new bnxt_en_dev struct which will be returned by bnxt_ulp_probe() upon success. After that, all calls from the RDMA driver to bnxt_en will pass a pointer to this struct. 2. This struct contains additional function pointers to register, request msix, send fw messages, register for async events. 3. If the RDMA driver wants to enable RDMA on the function, it needs to call the function pointer bnxt_register_device(). A ulp_ops structure is passed for RCU protected upcalls from bnxt_en to the RDMA driver. 4. The RDMA driver can call firmware APIs using the bnxt_send_fw_msg() function pointer. 5. 1 stats context is reserved when the RDMA driver registers. MSIX and completion rings are reserved when the RDMA driver calls bnxt_request_msix() function pointer. 6. When the RDMA driver calls bnxt_unregister_device(), all RDMA resources will be cleaned up. v2: Fixed 2 uninitialized variable warnings. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
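As an illustration of the interface shape described above, here is a minimal sketch; the field names and signatures below are assumptions for clarity, not the actual bnxt_en definitions.

    struct net_device;
    struct pci_dev;

    /* Per-MSIX-vector info handed back to the RDMA driver (illustrative). */
    struct bnxt_msix_entry {
            unsigned int vector;
            unsigned int ring_idx;
    };

    /* Upcalls from bnxt_en into the registered RDMA (ULP) driver, made
     * under RCU protection as described above. */
    struct bnxt_ulp_ops {
            void (*ulp_async_notifier)(void *handle, unsigned int event_id);
            void (*ulp_stop)(void *handle);
            void (*ulp_start)(void *handle);
    };

    /* Handle returned by bnxt_ulp_probe(); all later calls from the RDMA
     * driver into bnxt_en go through these function pointers. */
    struct bnxt_en_dev {
            struct net_device *net;
            struct pci_dev *pdev;
            int (*register_device)(struct bnxt_en_dev *edev,
                                   struct bnxt_ulp_ops *ops, void *handle);
            int (*unregister_device)(struct bnxt_en_dev *edev);
            int (*request_msix)(struct bnxt_en_dev *edev,
                                struct bnxt_msix_entry *ent, int num_msix);
            int (*send_fw_msg)(struct bnxt_en_dev *edev, void *msg, int len);
    };

    /* Exported by bnxt_en; returns NULL if the device does not support RDMA. */
    struct bnxt_en_dev *bnxt_ulp_probe(struct net_device *dev);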
2016-12-07bnxt_en: Refactor the driver registration function with firmware.Michael Chan
The driver register function with firmware consists of passing version information and registering for async events. To support the RDMA driver, the async events that we need to register may change. Separate the driver register function into 2 parts so that we can just update the async events for the RDMA driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07bnxt_en: Reserve RDMA resources by default.Michael Chan
If the device supports RDMA, we'll setup network default rings so that there are enough minimum resources for RDMA, if possible. However, the user can still increase network rings to the max if he wants. The actual RDMA resources won't be reserved until the RDMA driver registers. v2: Fix compile warning when BNXT_CONFIG_SRIOV is not set. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07bnxt_en: Improve completion ring allocation for VFs.Michael Chan
All available remaining completion rings not used by the PF should be made available for the VFs so that there are enough rings in the VF to support RDMA. The earlier workaround code of capping the rings by the statistics context is removed. When SRIOV is disabled, call a new function bnxt_restore_pf_fw_resources() to restore FW resources. Later on we need to add some logic to account for RDMA resources. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07bnxt_en: Move function reset to bnxt_init_one().Michael Chan
Now that MSIX is enabled in bnxt_init_one(), resources may be allocated by the RDMA driver before the network device is opened. So we cannot do function reset in bnxt_open() which will clear all the resources. The proper place to do function reset now is in bnxt_init_one(). If we get AER, we'll do function reset as well. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07bnxt_en: Enable MSIX early in bnxt_init_one().Michael Chan
To better support the new RDMA driver, we need to move pci_enable_msix() from bnxt_open() to bnxt_init_one(). This way, MSIX vectors are available to the RDMA driver whether the network device is up or down. Part of the existing bnxt_setup_int_mode() function is now refactored into a new bnxt_init_int_mode(). bnxt_init_int_mode() is called during bnxt_init_one() to enable MSIX. The remaining logic in bnxt_setup_int_mode() to map the IRQs to the completion rings is called during bnxt_open(). v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07bnxt_en: Add bnxt_set_max_func_irqs().Michael Chan
By refactoring existing code into this new function. The new function will be used in subsequent patches. v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07net: sock_rps_record_flow() is for connected socketsEric Dumazet
Paolo noticed a cache line miss in UDP recvmsg() to access sk_rxhash, sharing a cache line with sk_drops. sk_drops might be heavily incremented by cpus handling a flood targeting this socket. We might place sk_drops on a separate cache line, but let's try to avoid wasting 64 bytes per socket just for this, since we have other bottlenecks to take care of. sock_rps_record_flow() should only access sk_rxhash for connected flows. Testing sk_state for TCP_ESTABLISHED covers most of the cases for connected sockets, for a zero cost, since system calls using sock_rps_record_flow() also access sk->sk_prot which is on the same cache line. A follow-up patch will provide a static_key (Jump Label) since most hosts do not even use RFS. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
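A sketch of the guarded access this describes (kernel-context fragment; the helper names follow the existing RPS code and are shown only as an illustration):

    #ifdef CONFIG_RPS
    static inline void sock_rps_record_flow(const struct sock *sk)
    {
            /* Only connected sockets record the flow hash; sk_state sits on
             * a cache line we already touch via sk_prot, so the test is free
             * and unconnected UDP sockets never pull in the line shared with
             * sk_drops. */
            if (sk->sk_state == TCP_ESTABLISHED)
                    sock_rps_record_flow_hash(sk->sk_rxhash);
    }
    #endif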
2016-12-07perf tools: Explicitly document that --children is enabled by defaultYannick Brosseau
The fact that the --children option is enabled by default is buried deep at the end of the help page, in the overhead calculation section. This makes it explicit right where the option is listed, following the same way other default options are described. Signed-off-by: Yannick Brosseau <scientist@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/20161202160732.29058-1-scientist@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-07perf sched timehist: Cleanup idle_max_cpu handlingNamhyung Kim
The idle_max_cpu handling is a little bit confusing IMHO. Let's make it more straightforward. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161206034010.6499-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-07perf sched timehist: Handle zero sample->tid properlyNamhyung Kim
Sometimes samples have a tid of 0 but a non-zero pid. This ends up creating a new thread with tid/pid of 0 (instead of referring to the idle task) since tid is used to search for the matching task. But I guess it's wrong to use 0 as a tid when pid is set. This patch uses tid only if it has a non-zero value or if it is the same as pid (i.e. both are 0). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161206034010.6499-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-07perf callchain: Introduce callchain_cursor__copy()Namhyung Kim
The callchain_cursor__copy() function is to save current callchain captured by a cursor. It'll be used to keep callchains when switching to idle task for each cpu. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161206034010.6499-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-07perf sched: Cleanup option processingNamhyung Kim
The -D/--dump-raw-trace option is in the parent options so there is no need to repeat it. Also move the -f/--force option to the parent as it's commonly used for handling the data file. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161206034010.6499-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-07perf sched timehist: Improve error message when analyzing wrong fileDavid Ahern
Arnaldo reported an unhelpful error message when running perf sched timehist on a file that did not contain sched tracepoints: [root@jouet ~]# perf sched timehist No trace sample to read. Did you call 'perf record -R'? [root@jouet ~]# perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 Change the has_traces check to look for the sched_switch event. Analysis for perf sched timehist requires at least this event. Now when analyzing a file without sched tracepoints you get: root@f21-vbox:/tmp$ perf sched timehist No sched_switch events found. Have you run 'perf sched record'? Signed-off-by: David Ahern <dsahern@gmail.com> Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1480451988-43673-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-07netfilter: nft_quota: allow to restore consumed quotaPablo Neira Ayuso
Allow to restore consumed quota, this is useful to restore the quota state across reboots. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: xt_bpf: support ebpfWillem de Bruijn
Add support for attaching an eBPF object by file descriptor. The iptables binary can be called with a path to an elf object or a pinned bpf object. Also pass the mode and path to the kernel to be able to return it later for iptables dump and save. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: x_tables: avoid warn and OOM killer on vmalloc callMarcelo Ricardo Leitner
Andrey Konovalov reported that this vmalloc call is based on a userspace request and that it's spewing traces, which may flood the logs and cause DoS if abused. Florian Westphal also mentioned that this call should not trigger the OOM killer. This patch brings the vmalloc call in sync with kmalloc, disabling the warn trace on allocation failure and also disabling OOM killer invocation. Note, however, that under such a stress situation, other places may trigger OOM killer invocation. Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
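A sketch of the allocation policy this describes (kernel-context fragment; the helper name is hypothetical and the exact flag set in x_tables may differ):

    /* Hypothetical helper: the size comes from a userspace request, so fail
     * quietly instead of warning, and tell the allocator not to retry hard
     * enough to wake the OOM killer. */
    static void *xt_table_alloc(size_t size)
    {
            gfp_t flags = GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY;

            if (size <= PAGE_SIZE)
                    return kmalloc(size, flags);

            /* vmalloc fallback with the same "no warn, no OOM kill" policy. */
            return __vmalloc(size, flags, PAGE_KERNEL);
    }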
2016-12-07netfilter: nf_tables: support for set flushingPablo Neira Ayuso
This patch adds support for set flushing, that consists of walking over the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set. This patch requires the following changes: 1) Add set->ops->deactivate_one() operation: This allows us to deactivate an element from the set element walk path, given we can skip the lookup that happens in ->deactivate(). 2) Add a new nft_trans_alloc_gfp() function since we need to allocate transactions using GFP_ATOMIC given the set walk path happens with held rcu_read_lock. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
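A sketch of what the per-element flush step could look like given the two changes above (kernel-context fragment; names and signatures are approximations, not necessarily the committed code):

    /* Called for each element from the set walk, which runs under
     * rcu_read_lock(), hence GFP_ATOMIC for the transaction. */
    static int nft_flush_set_elem(const struct nft_ctx *ctx, struct nft_set *set,
                                  struct nft_set_elem *elem)
    {
            struct nft_trans *trans;

            trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM,
                                        sizeof(struct nft_trans_elem), GFP_ATOMIC);
            if (!trans)
                    return -ENOMEM;

            /* Deactivate straight from the walk path; ->deactivate_one()
             * skips the lookup that a plain ->deactivate() would perform. */
            if (!set->ops->deactivate_one(ctx->net, set, elem->priv)) {
                    kfree(trans);
                    return -ENOENT;
            }

            nft_trans_elem(trans).elem = *elem;
            list_add_tail(&trans->list, &ctx->net->nft.commit_list);
            return 0;
    }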
2016-12-07netfilter: nft_set: introduce nft_{hash, rbtree}_deactivate_one()Pablo Neira Ayuso
This new function allows us to deactivate one single element, this is required by the set flush command that comes in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: constify struct nft_ctx * parameter in nft_trans_alloc()Pablo Neira Ayuso
Context is not modified by nft_trans_alloc(), so constify it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nat: skip checksum on offload SCTP packetsDavide Caratti
SCTP GSO and hardware can do CRC32c computation after netfilter processing, so we can avoid calling sctp_compute_checksum() on skb if skb->ip_summed is equal to CHECKSUM_PARTIAL. Moreover, set skb->ip_summed to CHECKSUM_NONE when the NAT code computes the CRC, to prevent offloaders from computing it again (on ixgbe this resulted in a transmission with wrong L4 checksum). Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
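A sketch of the resulting decision (kernel-context fragment, simplified from the SCTP NAT manip path; the checksum helper follows the in-tree sctp_compute_cksum(), other names are illustrative):

    /* If GSO or hardware will compute the CRC32c later, leave it alone.
     * Otherwise compute it here and mark the skb CHECKSUM_NONE so an
     * offload (e.g. ixgbe) doesn't recompute it and corrupt the packet. */
    if (skb->ip_summed != CHECKSUM_PARTIAL) {
            sctph->checksum = sctp_compute_cksum(skb, hdroff);
            skb->ip_summed = CHECKSUM_NONE;
    }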
2016-12-07netfilter: rpfilter: bypass ipv4 lbcast packets with zeronet sourceLiping Zhang
Otherwise, DHCP Discover packets(0.0.0.0->255.255.255.255) may be dropped incorrectly. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: allow to filter stateful object dumps by typePablo Neira Ayuso
This patch adds the netlink code to filter out dump of stateful objects, through the NFTA_OBJ_TYPE netlink attribute. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nft_objref: support for stateful object mapsPablo Neira Ayuso
This patch allows us to refer to stateful object dictionaries; the source register indicates the key data used to look up the corresponding stateful object. We can refer to these maps through names or, alternatively, the map transaction id. This allows us to refer to both anonymous and named maps. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: add stateful object reference to set elementsPablo Neira Ayuso
This patch allows you to refer to stateful objects from set elements. This provides the infrastructure to create maps where the right hand side of the mapping is a stateful object. This allows us to build dictionaries of stateful objects, that you can use to perform fast lookups using any arbitrary key combination. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nft_quota: add depleted flag for objectsPablo Neira Ayuso
Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag indicates we have reached overquota. Add pointer to table from nft_object, so we can use it when sending the depletion notification to userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: notify internal updates of stateful objectsPablo Neira Ayuso
Introduce nf_tables_obj_notify() to notify internal state changes in stateful objects. This is used by the quota object to report depletion in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: atomic dump and reset for stateful objectsPablo Neira Ayuso
This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic dump-and-reset of a stateful object. It also adds support for atomic dump and reset for counter and quota objects. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07crypto: mcryptd - Check mcryptd algorithm compatibilitytim
Algorithms not compatible with mcryptd could be spawned by mcryptd with a direct crypto_alloc_tfm invocation using a "mcryptd(alg)" name construct. This causes mcryptd to crash the kernel if an arbitrary "alg" is incompatible and not intended to be used with mcryptd. It is an issue if AF_ALG tries to spawn mcryptd(alg) to expose it externally. But such algorithms must be used internally and not be exposed. We added a check to enforce that only internal algorithms are allowed with mcryptd at the time mcryptd is spawning an algorithm. Link: http://marc.info/?l=linux-crypto-vger&m=148063683310477&w=2 Cc: stable@vger.kernel.org Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
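A sketch of such a compatibility check, close in spirit to what the text describes (the exact committed code may differ):

    /* Only algorithms marked CRYPTO_ALG_INTERNAL may be wrapped by mcryptd;
     * reject a user-triggered "mcryptd(alg)" instantiation of anything else. */
    static bool mcryptd_check_internal(struct rtattr **tb, u32 *type, u32 *mask)
    {
            struct crypto_attr_type *algt;

            algt = crypto_get_attr_type(tb);
            if (IS_ERR(algt))
                    return false;

            *type |= algt->type & CRYPTO_ALG_INTERNAL;
            *mask |= algt->mask & CRYPTO_ALG_INTERNAL;
            if ((*type & CRYPTO_ALG_INTERNAL) == (*mask & CRYPTO_ALG_INTERNAL))
                    return true;

            return false;
    }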
2016-12-07crypto: algif_aead - fix AEAD tag memory handlingStephan Mueller
For encryption, the AEAD ciphers require AAD || PT as input and generate AAD || CT || Tag as output, and vice versa for decryption. Prior to this patch, the AF_ALG interface for AEAD ciphers required the buffer to be present as input for encryption. Similarly, the output buffer for decryption required the presence of the tag buffer too. This implies that the kernel reads / writes data buffers from/to kernel space even though this operation is not required. This patch changes the AF_ALG AEAD interface to be consistent with the in-kernel AEAD cipher requirements. Due to this handling, the changes are transparent to user space with one exception: the return code of recv indicates the amount of output data. That output buffer has a different size compared to before the patch, which implies that the return code of recv will also be different. For example, if a decryption operation uses 16 bytes of AAD, 16 bytes of CT and a 16-byte tag, the AF_ALG AEAD interface previously showed a recv return code of 48 (bytes), whereas after this patch the return code is 32 since the tag is not returned any more. Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07crypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernelHoria Geantă
Start with a clean slate before dealing with bit 16 (pointer size) of Master Configuration Register. This fixes the case of AArch64 boot loader + AArch32 kernel, when the boot loader might set MCFGR[PS] and kernel would fail to clear it. Cc: <stable@vger.kernel.org> Reported-by: Alison Wang <alison.wang@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Reviewed-By: Alison Wang <Alison.wang@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07crypto: marvell - Don't corrupt state of an STD req for re-stepped ahashRomain Perier
mv_cesa_hash_std_step() copies the creq->state into the SRAM at each step, but this is only required on the first one. By doing that, we overwrite the engine state, and get erroneous results when the crypto request is split in several chunks to fit in the internal SRAM. This commit changes the function to copy the state only on the first step. Fixes: commit 2786cee8e50b ("crypto: marvell - Move SRAM I/O op...") Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07crypto: marvell - Don't copy hash operation twice into the SRAMRomain Perier
No need to copy the template of a hash operation twice into the SRAM from the step function. Fixes: commit 85030c5168f1 ("crypto: marvell - Add support for chai...") Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07netfilter: nft_quota: dump consumed quotaPablo Neira Ayuso
Add a new attribute NFTA_QUOTA_CONSUMED that displays the amount of quota that has already been consumed. This allows us to restore the internal state of the quota object between reboots as well as to monitor how much of it has been used. This patch changes the logic to account for the consumed bytes, instead of the bytes that remain to be consumed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07can: raw: raw_setsockopt: limit number of can_filter that can be setMarc Kleine-Budde
This patch adds a check to limit the number of can_filters that can be set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER are not prevented resulting in a warning. Reference: https://lkml.org/lkml/2016/12/2/230 Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
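A sketch of the kind of bound check this adds (kernel-context fragment; the 512 cap is an assumed value for illustration):

    #define CAN_RAW_FILTER_MAX 512  /* assumed limit, shown for illustration */

    /* Validate the CAN_RAW_FILTER optlen before allocating: a bogus or huge
     * length would otherwise lead to an allocation above MAX_ORDER. */
    static int raw_check_filter_optlen(int optlen)
    {
            if (optlen % sizeof(struct can_filter) != 0)
                    return -EINVAL;
            if (optlen > CAN_RAW_FILTER_MAX * sizeof(struct can_filter))
                    return -EINVAL;
            return 0;
    }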
2016-12-07parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asmJohn David Anglin
We have four routines in pacache.S that use temporary alias pages: copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and flush_icache_page_asm(). copy_user_page_asm() and clear_user_page_asm() don't purge the TLB entry used for the operation. flush_dcache_page_asm() and flush_icache_page_asm() do purge the entry. Presumably, this was thought to optimize TLB use. However, the operation is quite heavyweight on PA 1.X processors as we need to take the TLB lock and a TLB broadcast is sent to all processors. This patch removes the purges from flush_dcache_page_asm() and flush_icache_page_asm(). Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Helge Deller <deller@gmx.de>
2016-12-07parisc: Purge TLB before setting PTEJohn David Anglin
The attached change interchanges the order of purging the TLB and setting the corresponding page table entry. TLB purges are strongly ordered. It occurred to me one night that setting the PTE first might have subtle ordering issues on SMP machines and cause random memory corruption. A TLB lock guards the insertion of user TLB entries. So after the TLB is purged, a new entry can't be inserted until the lock is released. This ensures that the new PTE value is used when the lock is released. Since making this change, no random segmentation faults have been observed on the Debian hppa buildd servers. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Helge Deller <deller@gmx.de>
2016-12-06i40e: move all updates for VLAN mode into i40e_sync_vsi_filtersJacob Keller
In a similar fashion to how we handled exiting VLAN mode, move the logic in i40e_vsi_add_vlan into i40e_sync_vsi_filters. Extract this logic into its own function for ease of understanding as it will become quite complex. The new function, i40e_correct_mac_vlan_filters() correctly updates all filters for when we need to enter VLAN mode, exit VLAN mode, and also enforces the PVID when assigned. Call i40e_correct_mac_vlan_filters from i40e_sync_vsi_filters passing it the number of active VLAN filters, and the two temporary lists. Remove the function for updating VLAN=0 filters from i40e_vsi_add_vlan. The end result is that the logic for entering and exiting VLAN mode is in one location which has the most knowledge about all filters. This ensures that we always correctly have the non-VLAN filters assigned to VID=0 or VID=-1 regardless of how we ended up getting to this result. Additionally this enforces the PVID at sync time so that we know for certain that an assigned PVID results in only filters with that PVID will be added to the firmware. Change-ID: I895cee81e9c92d0a16baee38bd0ca51bbb14e372 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: use (add|rm)_vlan_all_mac helper functions when changing PVIDJacob Keller
The current flow for adding or updating the PVID for a VF uses i40e_vsi_add_vlan and i40e_vsi_kill_vlan which each take, then release the hash lock. In addition the two functions also must take special care that they do not perform VLAN mode changes as this will make the code in i40e_ndo_set_vf_port_vlan behave incorrectly. Fix these issues by using the new helper functions i40e_add_vlan_all_mac and i40e_rm_vlan_all_mac which expect the hash lock to already be taken. Additionally these functions do not perform any state updates in regards to VLAN mode, so they are safe to use in the PVID update flow. It should be noted that we don't need the VLAN mode update code here, because there are only a few flows here. (a) we're adding a new PVID In this case, if we already had VLAN filters the VSI is knocked offline so we don't need to worry about pre-existing VLAN filters (b) we're replacing an existing PVID In this case, we can't have any VLAN filters except those with the old PVID which we already take care of manually. (c) we're removing an existing PVID Similarly to above, we can't have any existing VLAN filters except those with the old PVID which we already take care of correctly. Because of this, we do not need (or even want) the special accounting done in i40e_vsi_add_vlan, so use of the helpers is a saner alternative. It also opens the door for a future patch which will refactor the flow of i40e_vsi_add_vlan now that it is not needed in this function. Change-ID: Ia841f63da94e12b106f41cf7d28ce8ce92f2ad99 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: factor out addition/deletion of VLAN per each MAC addressJacob Keller
A future refactor of how the PF assigns a PVID to a VF will want to be able to add and remove a block of filters by VLAN without worrying about accidentally triggering the accounting for I40E_VLAN_ANY. Additionally the PVID assignment would like to be able to batch several changes under one use of the mac_filter_hash_lock. Factor out the addition and deletion of a VLAN on all MACs into their own function which i40e_vsi_(add|kill)_vlan can use. These new functions expect the caller to take the hash lock, as well as perform any necessary accounting for updating I40E_VLAN_ANY filters if we are now operating under VLAN mode. Change-ID: If79e5b60b770433275350a74b3f1880333a185d5 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: delete filter after adding its replacement when convertingJacob Keller
Fix a subtle issue with the code for converting VID=-1 filters into VID=0 filters when adding a new VLAN. Previously the code deleted the VID=-1 filter, and then added a new VID=0 filter. In the rare case that the addition fails due to -ENOMEM, we end up completely deleting the filter which prevents recovery if memory pressure subsides. While it is not strictly an issue because it is likely that memory issues would result in many other problems, we shouldn't delete the filter until after the addition succeeds. Change-ID: Icba07ddd04ecc6a3b27c2e29f2c1c8673d266826 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: refactor i40e_update_filter_state to avoid passing aq_errJacob Keller
The current caller of i40e_update_filter_state incorrectly passes aq_ret, an i40e_status variable, instead of the expected aq_err. This happens to work because i40e_status is actually just a typedef integer, and 0 is still the successful return. However i40e_update_filter_state has special handling for ENOSPC which is currently being ignored. Also notice that firmware does not update the per-filter response for many types of errors, such as EINVAL. Thus, modify the filter setup so that the firmware response memory is pre-set with I40E_AQC_MM_ERR_NO_RES. This enables us to refactor i40e_update_filter_state, removing the need to pass aq_err and avoiding a need for having 3 different flows for checking the filter state. The resulting code for i40e_update_filter_state is much simpler, only a single loop and we always check each filter response value every time. Since we pre-set the response value to match our expected error this correctly works for all success and error flows. Change-ID: Ie292c9511f34ee18c6ef40f955ad13e28b7aea7d Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
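To illustrate the simplified flow, here is a hypothetical, userspace-compilable sketch; the types, names and constants are placeholders, not the real i40e definitions:

    #include <stddef.h>

    #define FW_RESP_NO_RES 0xff   /* pre-set "no resources" placeholder */
    #define FW_RESP_OK     0x00   /* firmware wrote success for this filter */

    enum filter_state { FILTER_FAILED, FILTER_ACTIVE };
    struct filter { enum filter_state state; };

    /* Pre-set every response slot to FW_RESP_NO_RES before issuing the
     * request; firmware that only writes responses on success then still
     * leaves a meaningful value behind, and one loop handles every case
     * without needing the aq_err parameter or ENOSPC special-casing. */
    static int update_filter_state(const unsigned char *fw_resp,
                                   struct filter *filters, size_t count)
    {
            size_t i;
            int active = 0;

            for (i = 0; i < count; i++) {
                    if (fw_resp[i] == FW_RESP_OK) {
                            filters[i].state = FILTER_ACTIVE;
                            active++;
                    } else {
                            filters[i].state = FILTER_FAILED;
                    }
            }
            return active;  /* how many filters firmware actually accepted */
    }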
2016-12-06i40e: recalculate vsi->active_filters from hash contentsJacob Keller
Previous code refactors have accidentally caused issues with the counting of active_filters. Avoid similar issues in the future by simply re-counting the active filters every time after we handle add and delete of all the filters. Additionally this allows us to simplify the check for when we exit promiscuous mode since we can combine the check for failed filters at the same time. Additionally since we recount filters at the end we need to set vsi->promisc_threshold as well. The resulting code takes a bit longer since we do have to loop over filters again. However, the result is more readable and less likely to become incorrect due to failed accounting of filters in the future. Finally, this ensures that it is not possible for vsi->active_filters to ever underflow since we never decrement it. Change-ID: Ib4f3a377e60eb1fa6c91ea86cc02238c08edd102 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
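A sketch of the recount (kernel-context fragment; the hash and state names mirror the usual i40e ones but are shown only as an illustration):

    /* After processing all adds/deletes, walk the filter hash once and
     * count what is actually active instead of adjusting a counter on
     * every path; failed filters are tallied in the same pass for the
     * promiscuous-mode check. */
    vsi->active_filters = 0;
    failed_count = 0;
    hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) {
            if (f->state == I40E_FILTER_ACTIVE)
                    vsi->active_filters++;
            else if (f->state == I40E_FILTER_FAILED)
                    failed_count++;
    }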
2016-12-06i40e: defeature support for PTP L4 frame detection on XL710Jacob Keller
A product decision has been made to defeature detection of PTP frames over L4 (UDP) on the XL710 MAC. Do not advertise support for L4 timestamping. Change-ID: I41fbb0f84ebb27c43e23098c08156f2625c6ee06 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: lock service task correctlyMitch Williams
The service task lock was being set in the scheduling function, not the actual service task. This would potentially leave the bit set for a long time before the task actually ran. Furthermore, if the service task takes too long, it calls the schedule function to reschedule itself - which would fail to take the lock and do nothing. Instead, set and clear the lock bit in the service task itself. In the process, get rid of the i40e_service_event_complete() function, which is really just two lines of code that can be put right in the service task itself. Change-ID: I83155e682b686121e2897f4429eb7d3f7c669168 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
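A sketch of the relocated locking (kernel-context fragment, simplified; the state-bit name follows the existing i40e code):

    static void i40e_service_task(struct work_struct *work)
    {
            struct i40e_pf *pf = container_of(work, struct i40e_pf,
                                              service_task);

            /* Take the "in progress" bit here, in the task itself, so it is
             * only held while work is actually running; bail if another
             * invocation is already in flight. */
            if (test_and_set_bit(__I40E_SERVICE_SCHED, &pf->state))
                    return;

            /* ... the actual service work goes here ... */

            clear_bit(__I40E_SERVICE_SCHED, &pf->state);
    }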
2016-12-06i40e: Add functions which apply correct PHY access method for read and write operationMichal Kosiarz
Depending on the external PHY type, the register access method should be different: Clause22 or Clause45 can be chosen for different PHYs. The implemented functions apply the correct access method for the device in use. Change-ID: If39d5f0da9c0b905a8cbdc1ab89885535e7d0426 Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Add FEC for 25gCarolyn Wyborny
This patch adds adminq support for Forward Error Correction ("FEC") for 25G products. Change-ID: Iaff4910737c239d2c730e5c22a313ce9c37d3964 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jacek Naczyk <jacek.naczyk@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Add support for 25G devicesCarolyn Wyborny
Add support for 25G devices - defines and data structures. One tricky part here is that the firmware support for these devices introduces a mismatch between the PHY type enum and the bitfields for the PHY types. This change creates a macro and uses it to increment the 25G PHY values when creating 25G bitfields. Change-ID: I69b24d837d44cf9220bf5cb8dd46c5be89ce490b Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: use unsigned printf format specifier for active_filters countJacob Keller
Replace the %d specifier used for printing vsi->active_filters and vsi->promisc_threshold with an unsigned %u format specifier. While it is unlikely in practice that these values will ever reach such a large number they are unsigned values and thus should not be interpreted as negative numbers. Change-ID: Iff050fad5a1c8537c4c57fcd527441cd95cfc0d4 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
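A standalone illustration of why the specifier matters (plain C with hypothetical values):

    #include <stdio.h>

    int main(void)
    {
            unsigned int active_filters = 3000000000u;  /* above INT_MAX */

            /* With %d the value is reinterpreted as signed and prints as a
             * negative number on typical systems; %u prints it correctly. */
            printf("as %%d: %d\n", active_filters);   /* e.g. -1294967296 */
            printf("as %%u: %u\n", active_filters);   /* 3000000000 */
            return 0;
    }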