Age | Commit message | Author
2019-02-01 | mm, memory_hotplug: don't bail out in do_migrate_range() prematurely | Oscar Salvador
do_migrate_range() takes a memory range and tries to isolate the pages to put them into a list. This list will later be used by migrate_pages() to know which pages we need to migrate. Currently, if we fail to isolate a single page, we put all already isolated pages back to their LRU and bail out from the function. This is quite suboptimal, as it forces us to start over again because scan_movable_pages() will give us the same range. If there is no chance that we can isolate that page, we will loop here forever. The issue debugged in [1] proved that. During the debugging of that issue, it was noticed that if do_migrate_range() fails to isolate a single page, we just discard the work we have done so far and bail out, which means that scan_movable_pages() will find the same set of pages again. Instead, we can just skip the error, keep isolating as many pages as possible and then proceed with the call to migrate_pages(). This will allow us to do as much work as possible at once. [1] https://lkml.org/lkml/2018/12/6/324 Michal said: "I still think that this doesn't give us a whole picture. Looping forever is a bug. Failing the isolation is quite possible and it should be an ephemeral condition (e.g. a race with freeing the page or somebody else isolating the page for whatever reason). And here comes the disadvantage of the current implementation. We simply throw everything on the floor just because of an ephemeral condition. The racy page_count check is quite dubious to prevent from that." Link: http://lkml.kernel.org/r/20181211135312.27034-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
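The change in policy can be illustrated with a small userspace model. This is a sketch, not the kernel code: `struct fake_page`, `try_isolate()` and `collect_isolated()` are hypothetical stand-ins for page isolation and the migration list. Where the old behaviour undid all work at the first failure, the model skips the failing page and keeps collecting, then reports how many pages would be handed to migrate_pages().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page frame: only the bit that matters here. */
struct fake_page {
	bool isolatable; /* false models a (possibly transient) isolation failure */
};

/* Hypothetical stand-in for the real isolation helpers. */
static bool try_isolate(const struct fake_page *p)
{
	return p->isolatable;
}

/*
 * Collect the indices of every page in [start, end) that can be isolated.
 * A failure is skipped instead of aborting the whole range, so one stubborn
 * page no longer throws away the work already done on its neighbours.
 * Returns the number of entries written to 'isolated'.
 */
static size_t collect_isolated(const struct fake_page *pages, size_t start,
			       size_t end, size_t *isolated)
{
	size_t n = 0;

	for (size_t pfn = start; pfn < end; pfn++) {
		if (!try_isolate(&pages[pfn]))
			continue; /* old behaviour: put everything back and bail out */
		isolated[n++] = pfn;
	}
	return n; /* the caller would now pass this list to migration */
}
```

Pages that fail are simply left for a later pass, which matches the argument that isolation failures should be ephemeral.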
2019-02-01 | Merge branch 'devlink-add-device-driver-information-API' | David S. Miller
Jakub Kicinski says:
====================
devlink: add device (driver) information API
The fw_version field in ethtool -i does not suit modern needs, with its 31 characters being quite limiting on more complex systems. There is also no distinction between the running and flashed versions of the firmware. Since the driver information pertains to the entire device, rather than a particular netdev, it seems wise to move it to devlink, at the same time fixing the aforementioned issues. The new API allows exposing the device serial number and the versions of the card's components, both hardware and firmware (running and flashed). Driver authors can choose descriptive identifiers for the version fields. A few version identifiers which seemed relevant for most devices have been added to the global devlink header. Example:
$ devlink dev info pci/0000:05:00.0
pci/0000:05:00.0:
  driver nfp
  serial_number 16240145
  versions:
      fixed:
        board.id AMDA0099-0001
        board.rev 07
        board.vendor SMA
        board.model carbon
      running:
        fw.mgmt: 010156.010156.010156
        fw.cpld: 0x44
        fw.app: sriov-2.1.16
      stored:
        fw.mgmt: 010158.010158.010158
        fw.cpld: 0x44
        fw.app: sriov-2.1.20
The last patch also includes compat code for ethtool. If a driver reports no fw_version via the traditional ethtool API, ethtool can call into devlink and try to cram as many versions as possible into the 31 characters.
v4:
 - use IS_REACHABLE instead of IS_ENABLED in last patch.
v3 (Jiri):
 - rename various functions and attributes;
 - break out the version helpers per-type;
 - make the compat code parse a dump instead of special casing in each helper;
 - move generic version defines to a separate patch.
v2:
 - rebase.
this non-RFC, v3 some would say:
 - add three more versions in the NFP patches;
 - add last patch (ethtool compat) - Andrew & Michal.
RFCv2:
 - use one driver op;
 - allow longer serial number;
 - wrap the skb into an opaque request struct;
 - add some common identifier into the devlink header.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | ethtool: add compat for devlink info | Jakub Kicinski
If the driver did not fill in the fw_version field, try to call into the new devlink get_info op and collect the versions that way. We assume ethtool was always reporting running versions. v4: - use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot). v3 (Jiri): - do a dump and then parse it instead of special handling; - concatenate all versions (well, all that fit :)). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
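"Concatenate all versions (well, all that fit)" can be sketched as below. This is an illustrative policy, not the kernel's implementation: `FWVERS_LEN` is assumed to mirror the 32-byte ethtool fw_version buffer (31 characters plus NUL), and `cram_versions()` is a hypothetical helper that appends whole, space-separated values and skips any value that would not fit.

```c
#include <stddef.h>
#include <string.h>

#define FWVERS_LEN 32 /* assumption: 31 printable characters + NUL */

/*
 * Append running-version strings to 'buf' (FWVERS_LEN bytes), separated by
 * single spaces. A value that would not fit whole is skipped, so the result
 * never ends with a truncated version. Returns the number of values used.
 */
static size_t cram_versions(char buf[FWVERS_LEN], const char *const *vers,
			    size_t count)
{
	size_t used = 0, len = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < count; i++) {
		size_t vlen = strlen(vers[i]);
		size_t need = vlen + (len ? 1 : 0); /* separator unless first */

		if (len + need > FWVERS_LEN - 1)
			continue; /* too long; a later, shorter value may still fit */
		if (len)
			buf[len++] = ' ';
		memcpy(buf + len, vers[i], vlen);
		len += vlen;
		buf[len] = '\0';
		used++;
	}
	return used;
}
```

With the NFP running versions from the merge description (010156.010156.010156, 0x44, sriov-2.1.16), this policy keeps the first two and drops sriov-2.1.16, since all three would need 38 characters.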
2019-02-01 | nfp: devlink: report the running and flashed versions | Jakub Kicinski
Report versions of firmware components using the new NSP command. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | nfp: nsp: add support for versions command | Jakub Kicinski
Retrieve the FW versions with the new command. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | nfp: devlink: report fixed versions | Jakub Kicinski
Report information about the hardware. RFCv2: - add defines for board IDs which are likely to be reusable for other drivers (Jiri). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | nfp: devlink: report driver name and serial number | Jakub Kicinski
Report the basic info through the new devlink info API. RFCv2: - add driver name; - align serial to core changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | devlink: add generic info version names | Jakub Kicinski
Add defines and docs for generic info versions. v3: - add docs; - separate patch (Jiri). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | devlink: add version reporting to devlink info API | Jakub Kicinski
ethtool -i has a few fixed-size fields which can be used to report firmware version and expansion ROM version. Unfortunately, modern hardware has more firmware components. There is usually some datapath microcode, management controller, PXE drivers, and a CPLD load. Running ethtool -i on modern controllers reveals the fact that vendors cram multiple values into the firmware version field. Here are some examples from systems I could lay my hands on quickly:
 tg3:  "FFV20.2.17 bc 5720-v1.39"
 i40e: "6.01 0x800034a4 1.1747.0"
 nfp:  "0.0.3.5 0.25 sriov-2.1.16 nic"
Add a new devlink API to allow retrieving multiple versions, and provide user-readable names for those versions. While at it, break down the versions into three categories:
 - fixed - this is the board/fixed component version; vendors usually report information like the board version in the PCI VPD, but it will benefit from naming and a common API as well;
 - running - this is the running firmware version;
 - stored - this is the firmware in the flash; after a firmware update this value will reflect the flashed version, while the running version may only be updated after reboot.
v3:
 - add per-type helpers instead of using the special argument (Jiri).
RFCv2:
 - remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now versions are mixed with other info attrs);
 - have the driver report versions from the same callback as other info.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | devlink: add device information API | Jakub Kicinski
ethtool -i has served us well for a long time, but it's showing its limitations more and more. The device information should also be reported per device, not per netdev. Lay the foundation for a simple devlink-based way of reading device info. Add driver name and device serial number as initial pieces of information exposed via this new API. v3: - rename helpers (Jiri); - rename driver name attr (Jiri); - remove double spacing in commit message (Jiri). RFC v2: - wrap the skb into an opaque structure (Jiri); - allow the serial number to be any length (Jiri & Andrew); - add driver name (Jonathan). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | David S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2019-01-31 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) disable preemption in sender side of socket filters, from Alexei. 2) fix two potential deadlocks in syscall bpf lookup and prog_register, from Martin and Alexei. 3) fix BTF to allow typedef on func_proto, from Yonghong. 4) two bpftool fixes, from Jiri and Paolo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | Merge branch 'selftests-Various-fixes' | David S. Miller
Petr Machata says: ==================== selftests: Various fixes This patch set contains various fixes whose common denominator is improving quality of forwarding and mlxsw selftests. Most of the fixes are improvements in determinism (such that timing and latency don't impact the test performance). These were prompted by regular runs of the test suite on a hardware emulator, the performance of which is necessarily lower than that of the real device. Patches #1 (from Ido), #2 and #3 make changes to ping limits. Patches #4 and #5 add more sleep in places where things need more time to finish. Patches #6 and #7 fix two tests in the suite of mirror-to-gretap tests where underlay involves a VLAN device over an 802.1q bridge. Patches #8, #9 and #10 fix bugs in mirror-to-gretap test where underlay involves a LAG device. Patch #11 fixes a missed RET initialization in mirror-to-gretap flower test. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_gre_flower: Fix test result handling | Petr Machata
The global variable RET needs to be initialized before each call to log_test. This test case sets it once before running the tests, but then calls log_test for every individual test. Thus a failure in one of the tests causes spurious failures in follow-up tests as well. Fix by moving the initialization of RET from test_all() to full_test_span_gre_dir_acl(), a function that implements the test. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_gre_bridge_1q_lag: Ignore ARP | Petr Machata
This test sets up mirroring such that it mirrors all overlay traffic. That includes ARP, which causes occasional miscounts and spurious failures. Ignore ARP explicitly to avoid these problems. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_gre_bridge_1q_lag: Enable forwarding | Petr Machata
This test relies on routing in the primary traffic path, but neglects to enable forwarding. Do so. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_gre_bridge_1q_lag: Flush neighbors | Petr Machata
After one LAG slave is downed and another upped, it takes a while for the neighbor on a bridge to time out and get renegotiated. The test does prompt update of FDB entries by arpinging. But because the neighbor still references another address, offloading is not possible, and some packets may end up not being mirrored. To force the neighbor renegotiation, simply flush the neighbor table at the bridge. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix roaming test | Petr Machata
ARP or ND traffic can cause spurious migration of FDB back to $swp3. Mirroring is then updated in accordance with the change, and mirrored packets are seen at h3, causing a failure. Detect the case of this spurious roaming, and retry the test. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix untagged test | Petr Machata
The untagged egress test sets up mirroring to {,ip6}gretap such that the underlay goes through a bridge. Then VLAN flags are manipulated to test that the traffic leaves the bridge 802.1q-tagged or not, as appropriate. However, when a neighbor expires at the time that the bridge VLAN is configured as PVID and egress untagged, the following discovery process can't finish, because the IP address on H3 is still at the VLAN-tagged netdevice. This manifests as occasional failures where only several of the 10 required packets get through. Therefore, when reconfiguring the VLAN flags, move the IP address to the appropriate device in the H3 VRF. In addition to that, take this opportunity to embed an ASCII art diagram to make the topology more obvious. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_lib: Wait for tardy mirrored packets | Petr Machata
When running in an environment with poor performance (such as a simulator), processing mirrored packets can take a while. Evaluating the condition too soon leads to spurious "seen 9, expected 10" failures as the last packet doesn't have enough time to get mirrored and the mirror to arrive and bump the observed counters. Wait for one ping interval before evaluating the test. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_gre_changes: Fix TTL test | Petr Machata
When running in a simulator, the TTL change takes a while to settle and during this time the performance of the packet processing is lowered. The resulting instability leads to ping sending more packets as it assumes some have been dropped. This then leads to regular spurious failures as more packets than expected are observed. Sleep a bit to give the system time to stabilize. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: mlxsw: Update ping limits | Petr Machata
The current ping intervals are too short for running mirroring tests in a simulator. This leads to ping sending a follow-up ping before the reply arrives, thus sending more than the requested 10 ICMP requests. This traffic is seen at the counters, and causes spurious failures. Bump interval and timeout numbers 5x in mirroring tests to address the spurious failures. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: mirror_lib: Update ping limits | Petr Machata
The current ping intervals are too short for running mirroring tests in a simulator. This leads to ping sending a follow-up ping before the reply arrives, thus sending more than the requested 10 ICMP requests. Those are mirrored, and over a certain threshold the test case run is considered a failure, because too much traffic is observed. Bump interval and timeout numbers 5x in mirroring tests to address the spurious failures. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | selftests: forwarding: Make ping timeout configurable | Ido Schimmel
The current timeout (2 seconds) proved to be too low for some (emulated) systems where we run the tests. Make the timeout configurable and default to 5 seconds. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | ipconfig: add carrier_timeout kernel parameter | Martin Kepplinger
commit 3fb72f1e6e61 ("ipconfig wait for carrier") added a "wait for carrier" policy, with a fixed worst-case maximum wait of two minutes. Now make the wait-for-carrier timeout configurable on the kernel command line and use 120s as the default. The timeout messages introduced with commit 5e404cd65860 ("ipconfig: add informative timeout messages while waiting for carrier") are done in a fixed interval of 20 seconds, just like they were before (240/12). Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | ipv4: fib: use struct_size() in kzalloc() | Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:
    struct foo {
        int stuff;
        struct boo entry[];
    };
    instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:
    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
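The point of struct_size() is that the kernel helper saturates to SIZE_MAX when the multiplication or addition overflows, so an oversized request fails instead of silently allocating too little. The sketch below is a userspace approximation under that assumption: `sat_ab_c()` and `STRUCT_SIZE()` are hypothetical names, calloc() stands in for kzalloc(), and the kernel's compile-time check that `member` is really an array is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Saturating a * b + c: returns SIZE_MAX instead of wrapping on overflow. */
static size_t sat_ab_c(size_t a, size_t b, size_t c)
{
	if (b != 0 && a > (SIZE_MAX - c) / b)
		return SIZE_MAX;
	return a * b + c;
}

/* Userspace approximation of struct_size(p, member, n); 'p' is not evaluated. */
#define STRUCT_SIZE(p, member, n) \
	sat_ab_c((n), sizeof(*(p)->member), sizeof(*(p)))

struct boo { int x; };
struct foo { int stuff; struct boo entry[]; };

static struct foo *alloc_foo(size_t count)
{
	struct foo *instance;

	/* sizeof() does not evaluate 'instance', so it may be uninitialised here. */
	instance = calloc(1, STRUCT_SIZE(instance, entry, count));
	return instance; /* NULL if the size saturated or memory ran out */
}
```

The same pattern applies unchanged to the kmalloc() and kvzalloc() conversions further down the log; only the allocator differs.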
2019-02-01 | nfp: use struct_size() in kzalloc() | Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:
    struct foo {
        int stuff;
        struct boo entry[];
    };
    instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:
    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | tulip: eeprom: use struct_size() in kmalloc() | Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:
    struct foo {
        int stuff;
        struct boo entry[];
    };
    instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:
    instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | cxgb4: smt: use struct_size() in kvzalloc() | Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:
    struct foo {
        int stuff;
        struct boo entry[];
    };
    instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:
    instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | cxgb4: sched: use struct_size() in kvzalloc() | Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:
    struct foo {
        int stuff;
        struct boo entry[];
    };
    instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:
    instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESS | Dave Watson
Currently we don't zerocopy if the crypto framework async bit is set. However some crypto algorithms (such as x86 AESNI) support async, but in the context of sendmsg, will never run asynchronously. Instead, check for actual EINPROGRESS return code before assuming algorithm is async. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | scsi: aic94xx: fix module loading | James Bottomley
The aic94xx driver is currently failing to load with errors like "sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:02:00.3/0000:07:02.0/revision'" because the PCI code had recently added a file named 'revision' to every PCI device. Fix this by renaming the aic94xx revision file to aic_revision. This is safe to do for us because, as far as I can tell, nothing in userspace relies on the current aic94xx revision file, so it can be renamed without breaking anything. Fixes: 702ed3be1b1b (PCI: Create revision file in sysfs) Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-01 | Merge branch 'tls-1.3-support' | David S. Miller
Dave Watson says: ==================== net: tls: TLS 1.3 support This patchset adds 256-bit keys and TLS 1.3 support to the kernel TLS socket. TLS 1.3 is requested by passing TLS_1_3_VERSION in the setsockopt call, which changes the framing as required for TLS 1.3. 256-bit keys are requested by passing TLS_CIPHER_AES_GCM_256 in the sockopt. This is a fairly straightforward passthrough to the crypto framework. 256-bit keys work with both TLS 1.2 and TLS 1.3. TLS 1.3 requires a different AAD layout, necessitating some minor refactoring. It also moves the message type byte to the encrypted portion of the message, instead of the cleartext header as it was in TLS 1.2. This requires moving the control message handling to after decryption, but is otherwise similar. V1 -> V2: The first two patches were dropped and sent separately, one as a bugfix to the net tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net: tls: Add tests for TLS 1.3 | Dave Watson
Change most tests to TLS 1.3, while adding tests for previous TLS 1.2 behavior. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net: tls: Add tls 1.3 support | Dave Watson
TLS 1.3 has minor changes from TLS 1.2 at the record layer.
 * The header now hardcodes the same version and application content type.
 * The real content type is appended after the data, before encryption (or after decryption).
 * The IV is XORed with the sequence number, instead of concatenating four bytes of IV with the explicit IV.
 * Zero-padding: no explicit length is given; we search backwards from the end of the decrypted data for the first non-zero byte, which is the content type.
Currently recv supports reading zero-padding, but there is no way for send to add zero padding. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
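The nonce construction and the padding scan described in this commit can be sketched in plain C, following RFC 8446 (sections 5.3 and 5.4). Assumptions: a 12-byte IV as used by AES-GCM, and the helper names `tls13_nonce()` and `tls13_strip_padding()` are hypothetical, not the kernel's functions.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS13_IV_LEN 12 /* AES-GCM nonce length */

/* Per-record nonce: static IV XOR the zero-padded big-endian sequence number. */
static void tls13_nonce(const uint8_t iv[TLS13_IV_LEN], uint64_t seq,
			uint8_t nonce[TLS13_IV_LEN])
{
	for (int i = 0; i < TLS13_IV_LEN; i++)
		nonce[i] = iv[i];
	/* The sequence number covers the last 8 bytes, most significant first. */
	for (int i = 0; i < 8; i++)
		nonce[TLS13_IV_LEN - 1 - i] ^= (uint8_t)(seq >> (8 * i));
}

/*
 * Strip zero padding from a decrypted record: scan backwards for the first
 * non-zero byte, which is the real content type. Returns the length of the
 * content preceding it, or -1 if the record is all zeros (malformed).
 */
static long tls13_strip_padding(const uint8_t *buf, size_t len, uint8_t *type)
{
	while (len > 0 && buf[len - 1] == 0)
		len--;
	if (len == 0)
		return -1;
	*type = buf[len - 1];
	return (long)(len - 1);
}
```

Because the sequence number is XORed rather than sent, no explicit IV travels on the wire, which is one reason the TLS 1.3 record overhead differs from TLS 1.2.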
2019-02-01 | net: tls: Refactor control message handling on recv | Dave Watson
For TLS 1.3, the control message is encrypted. Handle control message checks after decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net: tls: Refactor tls aad space size calculation | Dave Watson
TLS 1.3 has a different AAD size; use a variable in the code to make TLS 1.3 support easy. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
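Why the AAD size needs to be a variable is easiest to see by building both layouts. Per RFC 5246 the TLS 1.2 AEAD additional data is seq_num(8) + type(1) + version(2) + length(2) = 13 bytes; per RFC 8446 the TLS 1.3 additional data is just the 5-byte record header. The sketch below assumes those layouts; `build_aad()` and the size macros are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS12_AAD_SIZE 13 /* seq_num(8) + type(1) + version(2) + length(2) */
#define TLS13_AAD_SIZE 5  /* record header: type(1) + version(2) + length(2) */

/* Write a 16-bit value in network byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Build the AEAD additional data for one record and return its size.
 * TLS 1.2: 'len' is the plaintext length. TLS 1.3: 'len' is the length of
 * the encrypted record (ciphertext plus tag) as carried in the header, and
 * 'type' is ignored because the outer type is always application_data.
 */
static size_t build_aad(int tls13, uint64_t seq, uint8_t type, uint16_t len,
			uint8_t aad[TLS12_AAD_SIZE])
{
	uint8_t *p = aad;

	if (!tls13) {
		for (int i = 7; i >= 0; i--)
			*p++ = (uint8_t)(seq >> (8 * i));
		*p++ = type;
	} else {
		*p++ = 23; /* application_data */
	}
	put_be16(p, 0x0303); /* TLS 1.2 version / TLS 1.3 legacy_record_version */
	p += 2;
	put_be16(p, len);
	p += 2;
	return (size_t)(p - aad);
}
```

The buffer is sized for the larger TLS 1.2 layout, and the returned length is what the kernel change stores in a variable instead of hardcoding.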
2019-02-01 | net: tls: Support 256 bit keys | Dave Watson
Wire up support for 256 bit keys from the setsockopt to the crypto framework Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | dccp: fool proof ccid_hc_[rt]x_parse_options() | Eric Dumazet
Similarly to commit 276bdb82dedb ("dccp: check ccid before dereferencing") it is wise to test for a NULL ccid. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ #37 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline] RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233 Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000 RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001 RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80 R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0 kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5' DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654 dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688 sk_backlog_rcv include/net/sock.h:936 [inline] __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473 dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880 ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234 NF_HOOK 
include/linux/netfilter.h:289 [inline] NF_HOOK include/linux/netfilter.h:283 [inline] ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414 NF_HOOK include/linux/netfilter.h:289 [inline] NF_HOOK include/linux/netfilter.h:283 [inline] ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524 __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083 process_backlog+0x206/0x750 net/core/dev.c:5923 napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x76d/0x1930 net/core/dev.c:6412 __do_softirq+0x30b/0xb11 kernel/softirq.c:292 run_ksoftirqd kernel/softirq.c:654 [inline] run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646 smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Modules linked in: ---[ end trace 58a0ba03bea2c376 ]--- RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline] RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233 Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000 RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001 RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80 R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Signed-off-by: Eric 
Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
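The fix amounts to checking the ccid pointer itself, not only its callback, before dispatching. A minimal sketch of that guard follows; `struct ccid`, `struct ccid_ops` and `ccid_tx_parse_options()` here are hypothetical, trimmed-down models, not the definitions in net/dccp/ccid.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of a CCID ops table. */
struct ccid_ops {
	int (*tx_parse_options)(void *sk, uint8_t pkt, uint8_t opt,
				uint8_t *val, uint8_t len);
};

struct ccid {
	const struct ccid_ops *ops;
};

/*
 * Guarded dispatch: a missing ccid (the NULL dereference in the trace) or a
 * missing callback is treated as "nothing to parse" instead of crashing.
 */
static int ccid_tx_parse_options(struct ccid *ccid, void *sk, uint8_t pkt,
				 uint8_t opt, uint8_t *val, uint8_t len)
{
	if (ccid == NULL || ccid->ops->tx_parse_options == NULL)
		return 0;
	return ccid->ops->tx_parse_options(sk, pkt, opt, val, len);
}
```

Returning 0 keeps option parsing going for the packet, mirroring how a CCID without a parse hook is already handled.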
2019-02-01 | Merge branch 'smc-fixes' | David S. Miller
Ursula Braun says: ==================== net/smc: fixes 2019-01-30 here are some fixes in different areas of the smc code for the net tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net/smc: fix use of variable in cleared area | Karsten Graul
Do not use pend->idx as index for the arrays because its value is located in the cleared area. Use the existing local variable instead. Without this fix the wrong area might be cleared. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
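The bug pattern is generic: an index is read from a structure after that structure has already been zeroed. The model below illustrates it; `struct pend`, `struct slots` and both `put_slot_*()` functions are hypothetical and do not reproduce the smc code, only the "read the field before clearing" rule the commit applies.

```c
#include <stdbool.h>
#include <string.h>

#define NSLOTS 4

/* Hypothetical pending-send record whose index lives inside the cleared area. */
struct pend {
	int idx;
	char data[8];
};

struct slots {
	bool in_use[NSLOTS];
	struct pend pend[NSLOTS];
};

/* Buggy: pend->idx is read after memset() zeroed it, so slot 0 is released. */
static void put_slot_buggy(struct slots *s, struct pend *pend)
{
	memset(pend, 0, sizeof(*pend));
	s->in_use[pend->idx] = false;
}

/* Fixed: keep the index in a local variable before clearing the record. */
static void put_slot_fixed(struct slots *s, struct pend *pend)
{
	int idx = pend->idx;

	memset(pend, 0, sizeof(*pend));
	s->in_use[idx] = false;
}
```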
2019-02-01 | net/smc: use device link provided in qp_context | Karsten Graul
The device field of the IB event structure does not always point to the SMC IB device. Load the pointer from the qp_context, which is always provided to smc_ib_qp_event_handler() in the priv field. For qp events, the affected port is given in the qp structure of the ibevent; derive it from there. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net/smc: call smc_cdc_msg_send() under send_lock | Karsten Graul
Call smc_cdc_msg_send() under the connection send_lock to make sure all send operations for one connection are serialized. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net/smc: do not wait under send_lock | Karsten Graul
smc_cdc_get_free_slot() might wait for free transfer buffers when using SMC-R. This wait should not be done under the send_lock, which is a spin_lock. This fixes a cpu loop in parallel threads waiting for the send_lock. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net/smc: recvmsg and splice_read should return 0 after shutdown | Karsten Graul
When a socket was connected and is now shut down for read, return 0 to indicate end of data in recvmsg and splice_read (like TCP) and do not return ENOTCONN. This behavior is required by the socket api. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net/smc: don't wait for send buffer space when data was already sent | Karsten Graul
When there is no more send buffer space and at least 1 byte was already sent then return to user space. The wait is only done when no data was sent by the sendmsg() call. This fixes smc_tx_sendmsg() which tried to always send all user data and started to wait for free send buffer space when needed. During this wait the user space program was blocked in the sendmsg() call and hence not able to receive incoming data. When both sides were in such a situation then the connection stalled forever. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
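The new sendmsg rule (return a short count once anything has been queued, and only wait when nothing was sent) can be modelled in a few lines. This is a single-threaded sketch: `struct sndbuf` and `model_sendmsg()` are hypothetical, and the place where the real code would sleep for buffer space is represented by returning -EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of a connection's send buffer. */
struct sndbuf {
	size_t space; /* free bytes currently available */
};

/*
 * Queue up to 'len' bytes. Once at least one byte has been queued, a full
 * buffer ends the call with a short count instead of blocking, like TCP.
 * Only a call that has sent nothing would wait (modelled as -EAGAIN).
 */
static ssize_t model_sendmsg(struct sndbuf *sb, size_t len)
{
	size_t sent = 0;

	while (sent < len) {
		if (sb->space == 0) {
			if (sent > 0)
				return (ssize_t)sent; /* partial send */
			return -EAGAIN; /* real code would wait for space here */
		}
		size_t chunk = len - sent < sb->space ? len - sent : sb->space;

		sb->space -= chunk;
		sent += chunk;
	}
	return (ssize_t)sent;
}
```

Returning early lets the application go back to receiving, which is what breaks the mutual stall described in the commit.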
2019-02-01 | net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() | Karsten Graul
To prevent races between smc_lgr_terminate() and smc_conn_free() add an extra check of the lgr field before accessing it, and cancel a delayed free_work when a new smc connection is created. This fixes the problem that free_work cleared the lgr variable but smc_lgr_terminate() or smc_conn_free() still access it in parallel. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net/smc: allow 16 byte pnetids in netlink policy | Hans Wippel
Currently, users can only send pnetids with a maximum length of 15 bytes over the SMC netlink interface although the maximum pnetid length is 16 bytes. This patch changes the SMC netlink policy to accept 16 byte pnetids. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | net/smc: fix another sizeof to int comparison | Ursula Braun
Comparing an int to a size, which is unsigned, causes the int to become unsigned, giving the wrong result. kernel_sendmsg can return a negative error code. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
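The pitfall is C's usual arithmetic conversions: in `rc < some_size_t`, a negative `int` is converted to the unsigned type and becomes a huge value, so an error code no longer looks "smaller than expected". The two hypothetical helpers below contrast the broken comparison with one that rules out negative values first; they illustrate the class of bug rather than the exact smc condition.

```c
#include <stddef.h>

/* Intended meaning of both helpers: "the send failed or was short". */

/* Buggy: rc is converted to size_t, so -5 compares as a huge value. */
static int failed_or_short_buggy(int rc, size_t want)
{
	return rc < want;
}

/* Fixed: handle negative error codes before the unsigned comparison. */
static int failed_or_short_fixed(int rc, size_t want)
{
	return rc < 0 || (size_t)rc < want;
}
```

Building with -Wsign-compare (part of -Wextra for C) flags the buggy form at compile time.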
2019-02-01 | enic: fix checksum validation for IPv6 | Govindarajulu Varadarajan
In case of IPv6 pkts, ipv4_csum_ok is 0. Because of this, driver does not set skb->ip_summed. So IPv6 rx checksum is not offloaded. Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 | Merge branch 'bpf-xdp-sample-libbpf' | Daniel Borkmann
Maciej Fijalkowski says: ==================== This patchset tries to address the situation where: * the user loads a particular xdp sample application that does stats polling * the user loads another sample application on the same interface * then, the user sends SIGINT/SIGTERM to the app that was attached first * the second application ends up with an unloaded xdp program. The 1st patch contains a helper libbpf function for getting the map fd by a given map name. In patch 2 Jesper removes the read_trace_pipe usage from xdp_redirect_cpu, which was a blocker for converting this sample to libbpf usage. The 3rd patch updates a bunch of xdp samples to make use of libbpf. Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset. In patch 5 extack messages are added for cases where dev_change_xdp_fd returns with an error, so the user has an idea why the xdp program was not attached to the interface. Patch 6 makes the samples' behavior similar to what iproute2 does when loading an xdp prog: the "force" flag is introduced. Patch 7 introduces the libbpf function that will query the driver from userspace about the currently attached xdp prog id. Patch 8 uses it in the samples that do polling, checking the prog id in the signal handler and comparing it with the previously stored one. Thanks! v1->v2: * add a libbpf helper for getting a prog via relative index * include xdp_redirect_cpu into conversion v2->v3: mostly addressing Daniel's/Jesper's comments * get rid of the helper from v1->v2 * feed the xdp_redirect_cpu with program name instead of number v3->v4: * fix help message in xdp_sample_pkts v4->v5: * in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link returned with 0 * add extack messages in dev_change_xdp_fd * check the return value of bpf_get_link_xdp_id when exiting from sample progs v5->v6: * rebase ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>