summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-06Changed version from 1.6.21 to 1.6.25Bimmy Pujari
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Blink LED on 1G BaseT boardsHenry Tieman
Before this patch "ethtool -p" was not blinking the LEDs on boards with 1G BaseT PHYs. This commit identifies 1G BaseT boards as having the LEDs connected to the MAC. Also, renamed the flag to be more descriptive of usage. The flag is now I40E_FLAG_PHY_CONTROLS_LEDS. Change-ID: I4eb741da9780da7849ddf2dc4c0cb27ffa42a801 Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: remove code to handle dev_addr speciallyJacob Keller
The netdev->dev_addr MAC filter already exists in the MAC/VLAN hash table, as it is added when we configure the netdev in i40e_configure_netdev. Because we already know that this address will be updated in the hash_for_each loops, we do not need to handle it specially. This removes duplicate code and simplifies the i40e_vsi_add_vlan and i40e_vsi_kill_vlan functions. Because we know these filters must be part of the MAC/VLAN hash table, this should not have any functional impact on what filters are included and is merely a code simplification. Change-ID: I5e648302dbdd7cc29efc6d203b7019c11f0b5705 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e/i40evf: napi_poll must return the work doneAlexander Duyck
Currently the function i40e_napi-poll() returns 0 when it clean completely the Rx rings, but this foul budget accounting in core code. Fix this by returning the actual work done, capped to budget - 1, since the core doesn't allow to return the full budget when the driver modifies the NAPI status This is based on a similar change that was made for the ixgbe driver by Paolo Abeni. Change-ID: Ic3d93ad2fa2fc8ce3164bc461e69367da0f9173b Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: restore workaround for removing default MAC filterJacob Keller
A previous commit 53cb6e9e8949 ("i40e: Removal of workaround for simple MAC address filter deletion") removed a workaround for some firmware versions which was reported to not be necessary in production NICs. Unfortunately this workaround is necessary in some configurations, specifically the Ethernet Controller XL710 for 40GbE QSFP+ (8086:1583). Without this patch, the mentioned NICs with current firmware exhibit issues when adding VLANs, as outlined by the following reproduction: $modprobe i40e $ip link set <device> up $ip link add link <device> vlan100 type vlan id 100 $dmesg | tail <snip> kernel: i40e 0000:82:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on This results in filters being marked as FAILED and setting the device in promiscuous mode. The root cause of receiving the -EINVAL error response appears to be due to a conflict with the default MAC filter which still exists on the default firmware for this device. Attempting to add a new VLAN filter on the default MAC address conflicts with the IGNORE_VLAN setting on the default rule. Change-ID: I4d8f6d48ac5f60cfe981b3baad30eb4d7c170d61 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: simplify txd use count calculationMitch Williams
The i40e_txd_use_count function was fast but confusing. In the comments, it even admits that it's ugly. So replace it with a new function that is (very) slightly faster and has extensive commenting to help the thicker among us (including the author, who will forget in a week) understand how it works. Change-ID: Ifb533f13786a0bf39cb29f77969a5be2c83d9a87 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Driver prints log message on link speed changeFilip Sadowski
This patch makes the driver log link speed change. Before applying the patch link messages were printed only on state change. Now message is printed when link is brought up or down and when speed changes. Change-ID: Ifbee14b4b16c24967450b3cecac6e8351dcc8f74 Signed-off-by: Filip Sadowski <filip.sadowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06tun: Use netif_receive_skb instead of netif_rxAndrey Konovalov
This patch changes tun.c to call netif_receive_skb instead of netif_rx when a packet is received (if CONFIG_4KSTACKS is not enabled to avoid stack exhaustion). The difference between the two is that netif_rx queues the packet into the backlog, and netif_receive_skb proccesses the packet in the current context. This patch is required for syzkaller [1] to collect coverage from packet receive paths, when a packet being received through tun (syzkaller collects coverage per process in the process context). As mentioned by Eric this change also speeds up tun/tap. As measured by Peter it speeds up his closed-loop single-stream tap/OVS benchmark by about 23%, from 700k packets/second to 867k packets/second. A similar patch was introduced back in 2010 [2, 3], but the author found out that the patch doesn't help with the task he had in mind (for cgroups to shape network traffic based on the original process) and decided not to go further with it. The main concern back then was about possible stack exhaustion with 4K stacks. [1] https://github.com/google/syzkaller [2] https://www.spinics.net/lists/netdev/thrd440.html#130570 [3] https://www.spinics.net/lists/netdev/msg130570.html Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06Merge branch 'w83977af_ir-neatening'David S. Miller
Joe Perches says: ==================== irda: w83977af_ir: Neatening Originally on top of Arnd's overly long udelay patches because I noticed a misindented block. That's now already fixed along with some other whitespace problems. These patches are the remainder style issues from my original series. Even though I haven't turned on the netwinder in a box in the garage in who knows how long, if this device is still used somewhere, might as well neaten the code too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Neaten loggingJoe Perches
Use more common logging style, standardize function output logging use. Miscellanea: o Add and use pr_fmt o Convert printks to pr_<level> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Parenthesis alignmentJoe Perches
Neaten function declaration and definition arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Use the common brace styleJoe Perches
Add braces where appropriate and remove an unnecessary else. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Neaten pointer comparisonsJoe Perches
Convert pointer comparisons to NULL. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Remove and add blank linesJoe Perches
Use a more typical vertical spacing style. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: More whitespace neateningJoe Perches
Add spaces around operators. git diff -w shows no differences. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Whitespace neateningJoe Perches
Remove leading and trailing whitespace. git diff -w shows no differences. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-12-06device-dax: fix private mapping restriction, permit read-onlyDan Williams
Hugh notes in response to commit 4cb19355ea19 "device-dax: fail all private mapping attempts": "I think that is more restrictive than you intended: haven't tried, but I believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no way to mmap /dev/dax without write permission to it." Indeed it does restrict read-only mappings, switch to checking VM_MAYSHARE, not VM_SHARED. Cc: <stable@vger.kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Pawel Lebioda <pawel.lebioda@intel.com> Fixes: 4cb19355ea19 ("device-dax: fail all private mapping attempts") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06tools/testing/nvdimm: unit test acpi_nfit_ctl()Dan Williams
A recent flurry of bug discoveries in the nfit driver's DSM marshalling routine has highlighted the fact that we do not have unit test coverage for this routine. Add a self-test of acpi_nfit_ctl() routine before probing the "nfit_test.0" device. This mocks stimulus to acpi_nfit_ctl() and if any of the tests fail "nfit_test.0" will be unavailable causing the rest of the tests to not run / fail. This unit test will also be a place to land reproductions of quirky BIOS behavior discovered in the field and ensure the kernel does not regress against implementations it has seen in practice. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06acpi, nfit: fix bus vs dimm confusion in xlat_statusDan Williams
Given dimms and bus commands share the same command number space we need to be careful that we are translating status in the correct context. Otherwise we can, for example, fail an ND_CMD_GET_CONFIG_SIZE command because max_xfer is zero. It fails because that condition erroneously correlates with the 'cleared == 0' failure of ND_CMD_CLEAR_ERROR. Cc: <stable@vger.kernel.org> Fixes: aef253382266 ("libnvdimm, nfit: centralize command status translation") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06acpi, nfit: validate ars_status output buffer sizeDan Williams
If an ARS Status command returns truncated output, do not process partial records or otherwise consume non-status fields. Cc: <stable@vger.kernel.org> Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06acpi, nfit, libnvdimm: fix / harden ars_status output length handlingDan Williams
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)" field of the ARS (Address Range Scrub) Status command, a firmware implementation may in practice return 0, 4, or 8 to indicate that there is no output payload to process. The specification states "Size of Output Buffer in bytes, including this field.". However, 'Output Buffer' is also the name of the entire payload, and earlier in the specification it states "Max Query ARS Status Output Buffer Size: Maximum size of buffer (including the Status and Extended Status fields)". Without this fix if the BIOS happens to return 0 it causes memory corruption as evidenced by this result from the acpi_nfit_ctl() unit test. ars_status00000000: 00020000 00000000 ........ BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff) kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC task: ffff8803332d2ec0 task.stack: ffffc9000174c000 RIP: 0010:[<ffffffff814cfe72>] [<ffffffff814cfe72>] __memcpy+0x12/0x20 RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0 FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0 Stack: ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac 0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000 0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246 Call Trace: [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit] [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test] Cc: <stable@vger.kernel.org> Fixes: 747ffe11b440 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06acpi, nfit: fix extended status translations for ACPI DSMsVishal Verma
ACPI DSMs can have an 'extended' status which can be non-zero to convey additional information about the command. In the xlat_status routine, where we translate the command statuses, we were returning an error for a non-zero extended status, even if the primary status indicated success. Return from each command's 'case' once we have verified both its status and extend status are good. Cc: <stable@vger.kernel.org> Fixes: 11294d63ac91 ("nfit: fail DSMs that return non-zero status by default") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06netfilter: nf_tables: add stateful object reference expressionPablo Neira Ayuso
This new expression allows us to refer to existing stateful objects from rules. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_quota: add stateful object typePablo Neira Ayuso
Register a new quota stateful object type into the new stateful object infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_counter: add stateful object typePablo Neira Ayuso
Register a new percpu counter stateful object type into the stateful object infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nf_tables: add stateful objectsPablo Neira Ayuso
This patch augments nf_tables to support stateful objects. This new infrastructure allows you to create, dump and delete stateful objects, that are identified by a user-defined name. This patch adds the generic infrastructure, follow up patches add support for two stateful objects: counters and quotas. This patch provides a native infrastructure for nf_tables to replace nfacct, the extended accounting infrastructure for iptables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: add and use nf_fwd_netdev_egressFlorian Westphal
... so we can use current skb instead of working with a clone. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: ingress: translate 0 nf_hook_slow retval to -1Florian Westphal
The caller assumes that < 0 means that skb was stolen (or free'd). All other return values continue skb processing. nf_hook_slow returns 3 different return value types: A) a (negative) errno value: the skb was dropped (NF_DROP, e.g. by iptables '-j DROP' rule). B) 0. The skb was stolen by the hook or queued to userspace. C) 1. all hooks returned NF_ACCEPT so the caller should invoke the okfn so packet processing can continue. nft ingress facility currently doesn't have the 'okfn' that the NF_HOOK() macros use; there is no nfqueue support either. So 1 means that nf_hook_ingress() caller should go on processing the skb. In order to allow use of NF_STOLEN from ingress we need to translate this to an errno number, else we'd crash because we continue with already-free'd (or about to be free-d) skb. The errno value isn't checked, its just important that its less than 0, so return -1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: xt_multiport: Fix wrong unmatch result with multiple portsGao Feng
I lost one test case in the last commit for xt_multiport. For example, the rule is "-m multiport --dports 22,80,443". When first port is unmatched and the second is matched, the curent codes could not return the right result. It would return false directly when the first port is unmatched. Fixes: dd2602d00f80 ("netfilter: xt_multiport: Use switch case instead of multiple condition checks") Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fieldsPablo Neira Ayuso
This patch adds a new flag that signals the kernel to update layer 4 checksum if the packet field belongs to the layer 4 pseudoheader. This implicitly provides stateless NAT 1:1 that is useful under very specific usecases. Since rules mangling layer 3 fields that are part of the pseudoheader may potentially convey any layer 4 packet, we have to deal with the layer 4 checksum adjustment using protocol specific code. This patch adds support for TCP, UDP and ICMPv6, since they include the pseudoheader in the layer 4 checksum calculation. ICMP doesn't, so we can skip it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_fib_ipv4: initialize *dest to zeroLiping Zhang
Otherwise, if fib lookup fail, *dest will be filled with garbage value, so reverse path filtering will not work properly: # nft add rule x prerouting fib saddr oif eq 0 drop Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_fib: convert htonl to ntohl properlyLiping Zhang
Acctually ntohl and htonl are identical, so this doesn't affect anything, but it is conceptually wrong. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pack percpu counter allocationsFlorian Westphal
instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pass xt_counters struct to counter allocatorFlorian Westphal
Keeps some noise away from a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pass xt_counters struct instead of packet counterFlorian Westphal
On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: convert while loops to for loopsAaron Conole
This is to facilitate converting from a singly-linked list to an array of elements. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: decouple nf_hook_entry and nf_hook_opsAaron Conole
During nfhook traversal we only need a very small subset of nf_hook_ops members. We need: - next element - hook function to call - hook function priv argument Bridge netfilter also needs 'thresh'; can be obtained via ->orig_ops. nf_hook_entry struct is now 32 bytes on x86_64. A followup patch will turn the run-time list into an array that only stores hook functions plus their priv arguments, eliminating the ->next element. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: introduce accessor functions for hook entriesAaron Conole
This allows easier future refactoring. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: defrag: only register defrag functionality if neededFlorian Westphal
nf_defrag modules for ipv4 and ipv6 export an empty stub function. Any module that needs the defragmentation hooks registered simply 'calls' this empty function to create a phony module dependency -- modprobe will then load the defrag module too. This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook registration until the functionality is requested within a network namespace instead of module load time for all namespaces. Hooks are only un-registered on module unload or when a namespace that used such defrag functionality exits. We have to use struct net for this as the register hooks can be called before netns initialization here from the ipv4/ipv6 conntrack module init path. There is no unregister functionality support, defrag will always be active once it was requested inside a net namespace. The reason is that defrag has impact on nft and iptables rulesets (without defrag we might see framents). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fix from David Miller: "A use-before-NULL-check from Dan Carpenter" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: dbri: move dereference after check for NULL
2016-12-06dbri: move dereference after check for NULLDan Carpenter
We accidentally introduced a dereference before the NULL check in xmit_descs() as part of silencing a GCC warning. Fixes: 16f46050e709 ("dbri: Fix compiler warning") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) When dcbnl_cee_fill() fails to be able to push a new netlink attribute, it return 0 instead of an error code. From Pan Bian. 2) Two suffix handling fixes to FIB trie code, from Alexander Duyck. 3) bnxt_hwrm_stat_ctx_alloc() goes through all the trouble of setting and maintaining a return code 'rc' but fails to actually return it. Also from Pan Bian. 4) ping socket ICMP handler needs to validate ICMP header length, from Kees Cook. 5) caif_sktinit_module() has this interesting logic: int err = sock_register(...); if (!err) return err; return 0; Just return sock_register()'s return value directly which is the only possible correct thing to do. 6) Two bnx2x driver fixes from Yuval Mintz, return a reasonable estimate from get_ringparam() ethtool op when interface is down and avoid trying to use UDP port based tunneling on 577xx chips. 7) Fix ep93xx_eth crash on module unload from Florian Fainelli. 8) Missing uapi exports, from Stephen Hemminger. 9) Don't schedule work from sk_destruct(), because the socket will be freed upon return from that function. From Herbert Xu. 10) Buggy drivers, of which we know there is at least one, can send a huge packet into the TCP stack but forget to set the gso_size in the SKB, which causes all kinds of problems. Correct this when it happens, and emit a one-time warning with the device name included so that it can be diagnosed more easily. From Marcelo Ricardo Leitner. 11) virtio-net does DMA off the stack causes hiccups with VMAP_STACK, fix from Andy Lutomirski. 12) Fix fec driver compilation with CONFIG_M5272, from Nikita Yushchenko. 13) mlx5 fixes from Kamal Heib, Saeed Mahameed, and Mohamad Haj Yahia. (erroneously flushing queues on error, module parameter validation, etc) * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits) net/mlx5e: Change the SQ/RQ operational state to positive logic net/mlx5e: Don't flush SQ on error net/mlx5e: Don't notify HW when filling the edge of ICO SQ net/mlx5: Fix query ISSI flow net/mlx5: Remove duplicate pci dev name print net/mlx5: Verify module parameters net: fec: fix compile with CONFIG_M5272 be2net: Add DEVSEC privilege to SET_HSW_CONFIG command. virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address() tcp: warn on bogus MSS and try to amend it uapi glibc compat: fix outer guard of net device flags enum net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before writing netlink: Do not schedule work from sk_destruct uapi: export nf_log.h uapi: export tc_skbmod.h net: ep93xx_eth: Do not crash unloading module bnx2x: Prevent tunnel config for 577xx bnx2x: Correct ringparam estimate when DOWN isdn: hisax: set error code on failure net: bnx2x: fix improper return value ...
2016-12-06shmem: fix shm fallocate() list corruptionLinus Torvalds
The shmem hole punching with fallocate(FALLOC_FL_PUNCH_HOLE) does not want to race with generating new pages by faulting them in. However, the wait-queue used to delay the page faulting has a serious problem: the wait queue head (in shmem_fallocate()) is allocated on the stack, and the code expects that "wake_up_all()" will make sure that all the queue entries are gone before the stack frame is de-allocated. And that is not at all necessarily the case. Yes, a normal wake-up sequence will remove the wait-queue entry that caused the wakeup (see "autoremove_wake_function()"), but the key wording there is "that caused the wakeup". When there are multiple possible wakeup sources, the wait queue entry may well stay around. And _particularly_ in a page fault path, we may be faulting in new pages from user space while we also have other things going on, and there may well be other pending wakeups. So despite the "wake_up_all()", it's not at all guaranteed that all list entries are removed from the wait queue head on the stack. Fix this by introducing a new wakeup function that removes the list entry unconditionally, even if the target process had already woken up for other reasons. Use that "synchronous" function to set up the waiters in shmem_fault(). This problem has never been seen in the wild afaik, but Dave Jones has reported it on and off while running trinity. We thought we fixed the stack corruption with the blk-mq rq_list locking fix (commit 7fe311302f7d: "blk-mq: update hardware and software queues for sleeping alloc"), but it turns out there was _another_ stack corruptor hiding in the trinity runs. Vegard Nossum (also running trinity) was able to trigger this one fairly consistently, and made us look once again at the shmem code due to the faults often being in that area. Reported-and-tested-by: Vegard Nossum <vegard.nossum@oracle.com>. Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-06Merge branch 'mlx5-fixes'David S. Miller
Saeed Mahameed says: ==================== Mellanox 100G mlx5 fixes 2016-12-04 Some bug fixes for mlx5 core and mlx5e driver. v1->v2: - replace "uint" with "unsigned int" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5e: Change the SQ/RQ operational state to positive logicMohamad Haj Yahia
When using the negative logic (i.e. FLUSH state), after the RQ/SQ reopen we will have a time interval that the RQ/SQ is not really ready and the state indicates that its not in FLUSH state because the initial SQ/RQ struct memory starts as zeros. Now we changed the state to indicate if the SQ/RQ is opened and we will set the READY state after finishing preparing all the SQ/RQ resources. Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Fixes: f2fde18c52a7 ("net/mlx5e: Don't wait for RQ completions on close") Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5e: Don't flush SQ on errorSaeed Mahameed
We are doing SQ descriptors cleanup in driver. Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5e: Don't notify HW when filling the edge of ICO SQSaeed Mahameed
We are going to do this a couple of steps ahead anyway. Fixes: d3c9bc2743dc ("net/mlx5e: Added ICO SQs") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Fix query ISSI flowKamal Heib
In old FWs query ISSI command is not supported and for some of those FWs it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR". In such case instead of failing the driver load, we will treat any FW status other than 0 for Query ISSI FW command as ISSI not supported and assume ISSI=0 (most basic driver/FW interface). In case of driver syndrom (query ISSI failure by driver) we will fail driver load. Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Remove duplicate pci dev name printKamal Heib
Remove duplicate pci dev name printing from mlx5_core_warn/dbg. Fixes: 5a7883989b1c ('net/mlx5_core: Improve mlx5 messages') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>