summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-24net: stmmac: sun8i: add support for Allwinner H6 EMACIcenowy Zheng
The EMAC on Allwinner H6 is just like the one on A64. The "internal PHY" on H6 is on a co-packaged AC200 chip, and it's not really internal (it's connected via RMII at PA GPIO bank). Add support for the Allwinner H6 EMAC in the dwmac-sun8i driver. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Ondrej Jirman <megous@megous.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-25Merge tag 'mfd-fixes-5.2-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull mfd bugfix from Lee Jones. Fix stmfx type confusion between regmap_read() (which takes an "u32") and the bitmap operations (which take an "unsigned long" array). * tag 'mfd-fixes-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: stmfx: Fix an endian bug in stmfx_irq_handler() mfd: stmfx: Uninitialized variable in stmfx_irq_handler()
2019-06-24Merge branch 'cached-route-listings'David S. Miller
Stefano Brivio says: ==================== Fix listing (IPv4, IPv6) and flushing (IPv6) of cached route exceptions For IPv6 cached routes, the commands 'ip -6 route list cache' and 'ip -6 route flush cache' don't work at all after route exceptions have been moved to a separate hash table in commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"). For IPv4 cached routes, the command 'ip route list cache' has also stopped working in kernel 3.5 after commit 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") introduced storage for route exceptions as a separate entity. Fix this by allowing userspace to clearly request cached routes with the RTM_F_CLONED flag used as a filter (in conjuction with strict checking) and by retrieving and dumping cached routes if requested. If strict checking is not requested (iproute2 < 5.0.0), we don't have a way to consistently filter results on other selectors (e.g. on tables), so skip filtering entirely and dump both regular routes and exceptions. For IPv4, cache flushing uses a completely different mechanism, so it wasn't affected. Listing of exception routes (modified routes pre-3.5) was tested against these versions of kernel and iproute2: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 3.5-rc4 + + + + + 4.4 4.9 4.14 4.15 4.19 5.0 5.1 fixed + + + + + For IPv6, a separate iproute2 patch is required. Versions of iproute2 and kernel tested: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 5.1.0, patched 3.18 list + + + + + + flush + + + + + + 4.4 list + + + + + + flush + + + + + + 4.9 list + + + + + + flush + + + + + + 4.14 list + + + + + + flush + + + + + + 4.15 list flush 4.19 list flush 5.0 list flush 5.1 list flush with list + + + + + + fix flush + + + + v7: Make sure r->rtm_tos is initialised in 3/11, move loop over nexthop objects in 4/11, add comments about usage of "skip" counters in commit messages of 4/11 and 8/11 v6: Target for net-next, rebase and adapt to nexthop objects for IPv6 paths. Merge selftests into this series (as they were addressed for net-next). A number of minor changes detailed in logs of single patches. v5: Skip filtering altogether if no strict checking is requested: selecting routes or exceptions only would be inconsistent with the fact we can't filter on tables. Drop 1/8 (non-strict dump filter function no longer needed), replace 2/8 (don't use NLM_F_MATCH, decide to skip routes or exceptions in filter function), drop 6/8 (2/8 is enough for IPv6 too). Introduce dump_routes and dump_exceptions flags in filter, adapt other patches to that. v4: Fix the listing issue also for IPv4, making the behaviour consistent with IPv6. Honour NLM_F_MATCH as per RFC 3549 and allow usage of RTM_F_CLONED filter. Split patches into smaller logical changes. v3: Drop check on RTM_F_CLONED and rework logic of return values of rt6_dump_route() v2: Add count of routes handled in partial dumps, and skip them, in patch 1/2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24selftests: pmtu: Make list_flush_ipv6_exception test more demandingStefano Brivio
Instead of just listing and flushing two cached exceptions, create a relatively big number of them, and count how many are listed. Single netlink dump messages contain approximately 25 entries each, and this way we can make sure the partial dump tracking mechanism is working properly. While at it, also ensure that no cached routes can be listed after flush, and remove 'sleep 1' calls, they are not actually needed. v7: No changes v6: - Merge this patch into series including fix, as it's also targeted for net-next. No actual changes Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24selftests: pmtu: Introduce list_flush_ipv4_exception test caseStefano Brivio
This test checks that route exceptions can be successfully listed and flushed using ip -6 route {list,flush} cache. v7: No changes v6: - Merge this patch into series including fix, as it's also targeted for net-next - Drop left-over print of 'ip route list cache | wc -l' Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()Stefano Brivio
When we perform an inexact match on FIB nodes via fib6_locate_1(), longer prefixes will be preferred to shorter ones. However, it might happen that a node, with higher fn_bit value than some other, has no valid routing information. In this case, we'll pick that node, but it will be discarded by the check on RTN_RTINFO in fib6_locate(), and we might miss nodes with valid routing information but with lower fn_bit value. This is apparent when a routing exception is created for a default route: # ip -6 route list fc00:1::/64 dev veth_A-R1 proto kernel metric 256 pref medium fc00:2::/64 dev veth_A-R2 proto kernel metric 256 pref medium fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 pref medium fe80::/64 dev veth_A-R1 proto kernel metric 256 pref medium fe80::/64 dev veth_A-R2 proto kernel metric 256 pref medium default via fc00:1::2 dev veth_A-R1 metric 1024 pref medium # ip -6 route list cache fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 expires 593sec mtu 1500 pref medium fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 593sec mtu 1500 pref medium # ip -6 route flush cache # node for default route is discarded Failed to send flush request: No such process # ip -6 route list cache fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 586sec mtu 1500 pref medium Check right away if the node has a RTN_RTINFO flag, before replacing the 'prev' pointer, that indicates the longest matching prefix found so far. Fixes: 38fbeeeeccdb ("ipv6: prepare fib6_locate() for exception table") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6: Dump route exceptions if requestedStefano Brivio
Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"), route exceptions reside in a separate hash table, and won't be found by walking the FIB, so they won't be dumped to userspace on a RTM_GETROUTE message. This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to have no function anymore: # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium # ip -6 route list cache # ip -6 route flush cache # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium because iproute2 lists cached routes using RTM_GETROUTE, and flushes them by listing all the routes, and deleting them with RTM_DELROUTE one by one. If cached routes are requested using the RTM_F_CLONED flag together with strict checking, or if no strict checking is requested (and hence we can't consistently apply filters), look up exceptions in the hash table associated with the current fib6_info in rt6_dump_route(), and, if present and not expired, add them to the dump. We might be unable to dump all the entries for a given node in a single message, so keep track of how many entries were handled for the current node in fib6_walker, and skip that amount in case we start from the same partially dumped node. When a partial dump restarts, as the starting node might change when 'sernum' changes, we have no guarantee that we need to skip the same amount of in-node entries. Therefore, we need two counters, and we need to zero the in-node counter if the node from which the dump is resumed differs. Note that, with the current version of iproute2, this only fixes the 'ip -6 route list cache': on a flush command, iproute2 doesn't pass RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is still unable to fetch the routes to be flushed. This will be addressed in a patch for iproute2. To flush cached routes, a procfs entry could be introduced instead: that's how it works for IPv4. We already have a rt6_flush_exception() function ready to be wired to it. However, this would not solve the issue for listing. Versions of iproute2 and kernel tested: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 5.1.0, patched 3.18 list + + + + + + flush + + + + + + 4.4 list + + + + + + flush + + + + + + 4.9 list + + + + + + flush + + + + + + 4.14 list + + + + + + flush + + + + + + 4.15 list flush 4.19 list flush 5.0 list flush 5.1 list flush with list + + + + + + fix flush + + + + v7: - Explain usage of "skip" counters in commit message (suggested by David Ahern) v6: - Rebase onto net-next, use recently introduced nexthop walker - Make rt6_nh_dump_exceptions() a separate function (suggested by David Ahern) v5: - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH, update test results (flushing works with iproute2 < 5.0.0 now) v4: - Split NLM_F_MATCH and strict check handling in separate patches - Filter routes using RTM_F_CLONED: if it's not set, only return non-cached routes, and if it's set, only return cached routes: change requested by David Ahern and Martin Lau. This implies that iproute2 needs a separate patch to be able to flush IPv6 cached routes. This is not ideal because we can't fix the breakage caused by 2b760fcf5cfb entirely in kernel. However, two years have passed since then, and this makes it more tolerable v3: - More descriptive comment about expired exceptions in rt6_dump_route() - Swap return values of rt6_dump_route() (suggested by Martin Lau) - Don't zero skip_in_node in case we don't dump anything in a given pass (also suggested by Martin Lau) - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic, it's just a flag to indicate the route was cloned, not to filter on routes v2: Add tracking of number of entries to be skipped in current node after a partial dump. As we restart from the same node, if not all the exceptions for a given node fit in a single message, the dump will not terminate, as suggested by Martin Lau. This is a concrete possibility, setting up a big number of exceptions for the same route actually causes the issue, suggested by David Ahern. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6/route: Change return code of rt6_dump_route() for partial node dumpsStefano Brivio
In the next patch, we are going to add optional dump of exceptions to rt6_dump_route(). Change the return code of rt6_dump_route() to accomodate partial node dumps: we might dump multiple routes per node, and might be able to dump only a given number of them, so fib6_dump_node() will need to know how many routes have been dumped on partial dump, to restart the dump from the point where it was interrupted. Note that fib6_dump_node() is the only caller and already handles all non-negative return codes as success: those become -1 to signal that we're done with the node. If we fail, return 0, as we were unable to dump the single route in the node, but we're not done with it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()Stefano Brivio
If fc_nh_id isn't set, we shouldn't try to match against it. This actually matters just for the RTF_CACHE below (where this case is already handled): if iproute2 gets a route exception and tries to delete it, it won't reference it by fc_nh_id, even if a nexthop object might be associated to the originating route. Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24Revert "net/ipv6: Bail early if user only wants cloned entries"Stefano Brivio
This reverts commit 08e814c9e8eb5a982cbd1e8f6bd255d97c51026f: as we are preparing to fix listing and dumping of IPv6 cached routes, we need to allow RTM_F_CLONED as a flag to match routes against while dumping them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4: Dump route exceptions if requestedStefano Brivio
Since commit 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions."), cached exception routes are stored as a separate entity, so they are not dumped on a FIB dump, even if the RTM_F_CLONED flag is passed. This implies that the command 'ip route list cache' doesn't return any result anymore. If the RTM_F_CLONED is passed, and strict checking requested, retrieve nexthop exception routes and dump them. If no strict checking is requested, filtering can't be performed consistently: dump everything in that case. With this, we need to add an argument to the netlink callback in order to track how many entries were already dumped for the last leaf included in a partial netlink dump. A single additional argument is sufficient, even if we traverse logically nested structures (nexthop objects, hash table buckets, bucket chains): it doesn't matter if we stop in the middle of any of those, because they are always traversed the same way. As an example, s_i values in [], s_fa values in (): node (fa) #1 [1] nexthop #1 bucket #1 -> #0 in chain (1) bucket #2 -> #0 in chain (2) -> #1 in chain (3) -> #2 in chain (4) bucket #3 -> #0 in chain (5) -> #1 in chain (6) nexthop #2 bucket #1 -> #0 in chain (7) -> #1 in chain (8) bucket #2 -> #0 in chain (9) -- node (fa) #2 [2] nexthop #1 bucket #1 -> #0 in chain (1) -> #1 in chain (2) bucket #2 -> #0 in chain (3) it doesn't matter if we stop at (3), (4), (7) for "node #1", or at (2) for "node #2": walking flattens all that. It would even be possible to drop the distinction between the in-tree (s_i) and in-node (s_fa) counter, but a further improvement might advise against this. This is only as accurate as the existing tracking mechanism for leaves: if a partial dump is restarted after exceptions are removed or expired, we might skip some non-dumped entries. To improve this, we could attach a 'sernum' attribute (similar to the one used for IPv6) to nexthop entities, and bump this counter whenever exceptions change: having a distinction between the two counters would make this more convenient. Listing of exception routes (modified routes pre-3.5) was tested against these versions of kernel and iproute2: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 3.5-rc4 + + + + + 4.4 4.9 4.14 4.15 4.19 5.0 5.1 fixed + + + + + v7: - Move loop over nexthop objects to route.c, and pass struct fib_info and table ID to it, not a struct fib_alias (suggested by David Ahern) - While at it, note that the NULL check on fa->fa_info is redundant, and the check on RTNH_F_DEAD is also not consistent with what's done with regular route listing: just keep it for nhc_flags - Rename entry point function for dumping exceptions to fib_dump_info_fnhe(), and rearrange arguments for consistency with fib_dump_info() - Rename fnhe_dump_buckets() to fnhe_dump_bucket() and make it handle one bucket at a time - Expand commit message to describe why we can have a single "skip" counter for all exceptions stored in bucket chains in nexthop objects (suggested by David Ahern) v6: - Rebased onto net-next - Loop over nexthop paths too. Move loop over fnhe buckets to route.c, avoids need to export rt_fill_info() and to touch exceptions from fib_trie.c. Pass NULL as flow to rt_fill_info(), it now allows that (suggested by David Ahern) Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4/route: Allow NULL flowinfo in rt_fill_info()Stefano Brivio
In the next patch, we're going to use rt_fill_info() to dump exception routes upon RTM_GETROUTE with NLM_F_ROOT, meaning userspace is requesting a dump and not a specific route selection, which in turn implies the input interface is not relevant. Update rt_fill_info() to handle a NULL flowinfo. v7: If fl4 is NULL, explicitly set r->rtm_tos to 0: it's not initialised otherwise (spotted by David Ahern) v6: New patch Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4/fib_frontend: Allow RTM_F_CLONED flag to be used for filteringStefano Brivio
This functionally reverts the check introduced by commit e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking") as modified by commit e4e92fb160d7 ("net/ipv4: Bail early if user only wants prefix entries"). As we are preparing to fix listing of IPv4 cached routes, we need to give userspace a way to request them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONEDStefano Brivio
The following patches add back the ability to dump IPv4 and IPv6 exception routes, and we need to allow selection of regular routes or exceptions. Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions: iproute2 passes it in dump requests (except for IPv6 cache flush requests, this will be fixed in iproute2) and this used to work as long as exceptions were stored directly in the FIB, for both IPv4 and IPv6. Caveat: if strict checking is not requested (that is, if the dump request doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol, tables or route types. In this case, filtering on RTM_F_CLONED would be inconsistent: we would fix 'ip route list cache' by returning exception routes and at the same time introduce another bug in case another selector is present, e.g. on 'ip route list cache table main' we would return all exception routes, without filtering on tables. Keep this consistent by applying no filters at all, and dumping both routes and exceptions, if strict checking is not requested. iproute2 currently filters results anyway, and no unwanted results will be presented to the user. The kernel will just dump more data than needed. v7: No changes v6: Rebase onto net-next, no changes v5: New patch: add dump_routes and dump_exceptions flags in filter and simply clear the unwanted one if strict checking is enabled, don't ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set. Skip filtering altogether if no strict checking is requested: selecting routes or exceptions only would be inconsistent with the fact we can't filter on tables. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24qmi_wwan: Fix out-of-bounds readBjørn Mork
The syzbot reported Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xca/0x13e lib/dump_stack.c:113 print_address_description+0x67/0x231 mm/kasan/report.c:188 __kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317 kasan_report+0xe/0x20 mm/kasan/common.c:614 qmi_wwan_probe+0x342/0x360 drivers/net/usb/qmi_wwan.c:1417 usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361 really_probe+0x281/0x660 drivers/base/dd.c:509 driver_probe_device+0x104/0x210 drivers/base/dd.c:670 __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777 bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454 Caused by too many confusing indirections and casts. id->driver_info is a pointer stored in a long. We want the pointer here, not the address of it. Thanks-to: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+b68605d7fadd21510de1@syzkaller.appspotmail.com Cc: Kristian Evensen <kristian.evensen@gmail.com> Fixes: e4bf63482c30 ("qmi_wwan: Add quirk for Quectel dynamic config") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24tipc: check msg->req data len in tipc_nl_compat_bearer_disableXin Long
This patch is to fix an uninit-value issue, reported by syzbot: BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:981 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x130/0x2a0 mm/kmsan/kmsan.c:622 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:310 memchr+0xce/0x110 lib/string.c:981 string_is_valid net/tipc/netlink_compat.c:176 [inline] tipc_nl_compat_bearer_disable+0x2a1/0x480 net/tipc/netlink_compat.c:449 __tipc_nl_compat_doit net/tipc/netlink_compat.c:327 [inline] tipc_nl_compat_doit+0x3ac/0xb00 net/tipc/netlink_compat.c:360 tipc_nl_compat_handle net/tipc/netlink_compat.c:1178 [inline] tipc_nl_compat_recv+0x1b1b/0x27b0 net/tipc/netlink_compat.c:1281 TLV_GET_DATA_LEN() may return a negtive int value, which will be used as size_t (becoming a big unsigned long) passed into memchr, cause this issue. Similar to what it does in tipc_nl_compat_bearer_enable(), this fix is to return -EINVAL when TLV_GET_DATA_LEN() is negtive in tipc_nl_compat_bearer_disable(), as well as in tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats(). v1->v2: - add the missing Fixes tags per Eric's request. Fixes: 0762216c0ad2 ("tipc: fix uninit-value in tipc_nl_compat_bearer_enable") Fixes: 8b66fee7f8ee ("tipc: fix uninit-value in tipc_nl_compat_link_reset_stats") Reported-by: syzbot+30eaa8bf392f7fafffaf@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net: macb: use GROAntoine Tenart
This patch updates the macb driver to use NAPI GRO helpers when receiving SKBs. This improves performances. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net: macb: use NAPI_POLL_WEIGHTAntoine Tenart
Use NAPI_POLL_WEIGHT, the default NAPI poll() weight instead of redefining our own value (which turns out to be 64 as well). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24Merge branch 'ipv4-fix-bugs-when-enable-route_localnet'David S. Miller
Shijie Luo says: ==================== ipv4: fix bugs when enable route_localnet When enable route_localnet, route of the 127/8 address is enabled. But in some situations like arp_announce=2, ARP requests or reply work abnormally. This patchset fix some bugs when enable route_localnet. Change History: V2: - Change a single patch to a patchset. - Add bug fix for arp_ignore = 3. - Add a couple of test for enabling route_localnet in selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24selftests: add route_localnet test scriptShijie Luo
Add a simple scripts to exercise several situations when enable route_localnet. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Zhiqiang liu <liuzhiqiang26@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4: fix confirm_addr_indev() when enable route_localnetShijie Luo
When arp_ignore=3, the NIC won't reply for scope host addresses, but if enable route_locanet, we need to reply ip address with head 127 and scope RT_SCOPE_HOST. Fixes: d0daebc3d622 ("ipv4: Add interface option to enable routing of 127.0.0.0/8") Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4: fix inet_select_addr() when enable route_localnetShijie Luo
Suppose we have two interfaces eth0 and eth1 in two hosts, follow the same steps in the two hosts: # sysctl -w net.ipv4.conf.eth1.route_localnet=1 # sysctl -w net.ipv4.conf.eth1.arp_announce=2 # ip route del 127.0.0.0/8 dev lo table local and then set ip to eth1 in host1 like: # ifconfig eth1 127.25.3.4/24 set ip to eth2 in host2 and ping host1: # ifconfig eth1 127.25.3.14/24 # ping -I eth1 127.25.3.4 Well, host2 cannot connect to host1. When set a ip address with head 127, the scope of the address defaults to RT_SCOPE_HOST. In this situation, host2 will use arp_solicit() to send a arp request for the mac address of host1 with ip address 127.25.3.14. When arp_announce=2, inet_select_addr() cannot select a correct saddr with condition ifa->ifa_scope > scope, because ifa_scope is RT_SCOPE_HOST and scope is RT_SCOPE_LINK. Then, inet_select_addr() will go to no_in_dev to lookup all interfaces to find a primary ip and finally get the primary ip of eth0. Here I add a localnet_scope defaults to RT_SCOPE_HOST, and when route_localnet is enabled, this value changes to RT_SCOPE_LINK to make inet_select_addr() find a correct primary ip as saddr of arp request. Fixes: d0daebc3d622 ("ipv4: Add interface option to enable routing of 127.0.0.0/8") Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net: macb: do not copy the mac address if NULLAntoine Tenart
This patch fixes the MAC address setup in the probe. The MAC address retrieved using of_get_mac_address was checked for not containing an error, but it may also be NULL which wasn't tested. Fix it by replacing IS_ERR with IS_ERR_OR_NULL. Fixes: 541ddc66d665 ("net: macb: support of_get_mac_address new ERR_PTR error") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24tipc: remove the unnecessary msg->req check from tipc_nl_compat_bearer_setXin Long
tipc_nl_compat_bearer_set() is only called by tipc_nl_compat_link_set() which already does the check for msg->req check, so remove it from tipc_nl_compat_bearer_set(), and do the same in tipc_nl_compat_media_set(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24Merge branch 'mlxsw-Thermal-and-hwmon-extensions'David S. Miller
Ido Schimmel says: ==================== mlxsw: Thermal and hwmon extensions This patchset from Vadim includes various enhancements to thermal and hwmon code in mlxsw. Patch #1 adds a thermal zone for each inter-connect device (gearbox). These devices are present in SN3800 systems and code to expose their temperature via hwmon was added in commit 2e265a8b6c09 ("mlxsw: core: Extend hwmon interface with inter-connect temperature attributes"). Currently, there are multiple thermal zones in mlxsw and only a few cooling devices. Patch #2 detects the hottest thermal zone and the cooling devices are switched to follow its trends. RFC was sent last month [1]. Patch #3 allows to read and report negative temperature of the sensors mlxsw exposes via hwmon and thermal subsystems. v2 (Andrew Lunn): * In patch #3, replace '%u' with '%d' in mlxsw_hwmon_module_temp_show() [1] https://patchwork.ozlabs.org/patch/1107161/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24mlxsw: core: Add support for negative temperature readoutVadim Pasternak
Extend macros MLXSW_REG_MTMP_TEMP_TO_MC() to allow support of negative temperature readout, since chip and others thermal components are capable of operating within the negative temperature. With no such support negative temperature will be consider as very high temperature and it will cause wrong readout and thermal shutdown. For negative values 2`s complement is used. Tested in chamber. Example of chip ambient temperature readout with chamber temperature: -10 Celsius: temp1: -6.0C (highest = -5.0C) -5 Celsius: temp1: -1.0C (highest = -1.0C) v2 (Andrew Lunn): * Replace '%u' with '%d' in mlxsw_hwmon_module_temp_show() Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24mlxsw: core: Add the hottest thermal zone detectionVadim Pasternak
When multiple sensors are mapped to the same cooling device, the cooling device should be set according the worst sensor from the sensors associated with this cooling device. Provide the hottest thermal zone detection and enforce cooling device to follow the temperature trends of the hottest zone only. Prevent competition for the cooling device control from others zones, by "stable trend" indication. A cooling device will not perform any actions associated with a zone with a "stable trend". When other thermal zone is detected as a hottest, a cooling device is to be switched to following temperature trends of new hottest zone. Thermal zone score is represented by 32 bits unsigned integer and calculated according to the next formula: For T < TZ<t><i>, where t from {normal trip = 0, high trip = 1, hot trip = 2, critical = 3}: TZ<i> score = (T + (TZ<t><i> - T) / 2) / (TZ<t><i> - T) * 256 ** j; Highest thermal zone score s is set as MAX(TZ<i>score); Following this formula, if TZ<i> is in trip point higher than TZ<k>, the higher score is to be always assigned to TZ<i>. For two thermal zones located at the same kind of trip point, the higher score will be assigned to the zone which is closer to the next trip point. Thus, the highest score will always be assigned objectively to the hottest thermal zone. All the thermal zones initially are to be configured with mode "enabled" with the "step_wise" governor. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24mlxsw: core: Extend thermal core with per inter-connect device thermal zonesVadim Pasternak
Add a dedicated thermal zone for each inter-connect device. The current temperature is obtained from inter-connect temperature sensor and the default trip points are set to the same values as default ASIC trip points. These settings could be changed from the user space. A cooling device (fan) is bound to all inter-connect devices. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net/packet: fix memory leak in packet_set_ring()Eric Dumazet
syzbot found we can leak memory in packet_set_ring(), if user application provides buggy parameters. Fixes: 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24tipc: fix missing indentation in source codejohn.rutherford@dektech.com.au
Fix misalignment of policy statement in netlink.c due to automatic spatch code transformation. Fixes: 3b0f31f2b8c9 ("genetlink: make policy common to family") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: John Rutherford <john.rutherford@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net: ethernet: ti: cpsw: Fix suspend/resume breakKeerthy
Commit bfe59032bd6127ee190edb30be9381a01765b958 ("net: ethernet: ti: cpsw: use cpsw as drv data")changes the driver data to struct cpsw_common *cpsw. This is done only in probe/remove but the suspend/resume functions are still left with struct net_device *ndev. Hence fix both suspend & resume also to fetch the updated driver data. Fixes: bfe59032bd6127ee1 ("net: ethernet: ti: cpsw: use cpsw as drv data") Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net/tls: fix page double free on TX cleanupDirk van der Merwe
With commit 94850257cf0f ("tls: Fix tls_device handling of partial records") a new path was introduced to cleanup partial records during sk_proto_close. This path does not handle the SW KTLS tx_list cleanup. This is unnecessary though since the free_resources calls for both SW and offload paths will cleanup a partial record. The visible effect is the following warning, but this bug also causes a page double free. WARNING: CPU: 7 PID: 4000 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110 RIP: 0010:sk_stream_kill_queues+0x103/0x110 RSP: 0018:ffffb6df87e07bd0 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff8c21db4971c0 RCX: 0000000000000007 RDX: ffffffffffffffa0 RSI: 000000000000001d RDI: ffff8c21db497270 RBP: ffff8c21db497270 R08: ffff8c29f4748600 R09: 000000010020001a R10: ffffb6df87e07aa0 R11: ffffffff9a445600 R12: 0000000000000007 R13: 0000000000000000 R14: ffff8c21f03f2900 R15: ffff8c21f03b8df0 Call Trace: inet_csk_destroy_sock+0x55/0x100 tcp_close+0x25d/0x400 ? tcp_check_oom+0x120/0x120 tls_sk_proto_close+0x127/0x1c0 inet_release+0x3c/0x60 __sock_release+0x3d/0xb0 sock_close+0x11/0x20 __fput+0xd8/0x210 task_work_run+0x84/0xa0 do_exit+0x2dc/0xb90 ? release_sock+0x43/0x90 do_group_exit+0x3a/0xa0 get_signal+0x295/0x720 do_signal+0x36/0x610 ? SYSC_recvfrom+0x11d/0x130 exit_to_usermode_loop+0x69/0xb0 do_syscall_64+0x173/0x180 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fe9b9abc10d RSP: 002b:00007fe9b19a1d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 0000000000000006 RCX: 00007fe9b9abc10d RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fe948003430 RBP: 00007fe948003410 R08: 00007fe948003430 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00005603739d9080 R13: 00007fe9b9ab9f90 R14: 00007fe948003430 R15: 0000000000000000 Fixes: 94850257cf0f ("tls: Fix tls_device handling of partial records") Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24mfd: stmfx: Fix an endian bug in stmfx_irq_handler()Dan Carpenter
It's not okay to cast a "u32 *" to "unsigned long *" when you are doing a for_each_set_bit() loop because that will break on big endian systems. Fixes: 386145601b82 ("mfd: stmfx: Uninitialized variable in stmfx_irq_handler()") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Amelie Delaunay <amelie.delaunay@st.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2019-06-24hinic: implement the statistical interface of ethtoolXue Chaojing
This patch implement the statistical interface of ethtool, user can use ethtool -S to show hinic statistics. Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24samples/bpf: xdp_redirect, correctly get dummy program idPrashant Bhole
When we terminate xdp_redirect, it ends up with following message: "Program on iface OUT changed, not removing" This results in dummy prog still attached to OUT interface. It is because signal handler checks if the programs are the same that we had attached. But while fetching dummy_prog_id, current code uses prog_fd instead of dummy_prog_fd. This patch passes the correct fd. Fixes: 3b7a8ec2dec3 ("samples/bpf: Check the prog id before exiting") Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-24samples: make pidfd-metadata fail gracefully on older kernelsDmitry V. Levin
Initialize pidfd to an invalid descriptor, to fail gracefully on those kernels that do not implement CLONE_PIDFD and leave pidfd unchanged. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-24bpf: fix NULL deref in btf_type_is_resolve_source_onlyStanislav Fomichev
Commit 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") added invocations of btf_type_is_resolve_source_only before btf_type_nosize_or_null which checks for the NULL pointer. Swap the order of btf_type_nosize_or_null and btf_type_is_resolve_source_only to make sure the do the NULL pointer check first. Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-24fork: don't check parent_tidptr with CLONE_PIDFDDmitry V. Levin
Give userspace a cheap and reliable way to tell whether CLONE_PIDFD is supported by the kernel or not. The easiest way is to pass an invalid file descriptor value in parent_tidptr, perform the syscall and verify that parent_tidptr has been changed to a valid file descriptor value. CLONE_PIDFD uses parent_tidptr to return pidfds. CLONE_PARENT_SETTID will use parent_tidptr to return the tid of the parent. The two flags cannot be used together. Old kernels that only support CLONE_PARENT_SETTID will not verify the value pointed to by parent_tidptr. This behavior is unchanged even with the introduction of CLONE_PIDFD. However, if CLONE_PIDFD is specified the kernel will currently check the value pointed to by parent_tidptr before placing the pidfd in the memory pointed to. EINVAL will be returned if the value in parent_tidptr is not 0. If CLONE_PIDFD is supported and fd 0 is closed, then the returned pidfd can and likely will be 0 and parent_tidptr will be unchanged. This means userspace must either check CLONE_PIDFD support beforehand or check that fd 0 is not closed when invoking CLONE_PIDFD. The check for pidfd == 0 was introduced during the v5.2 merge window by commit b3e583825266 ("clone: add CLONE_PIDFD") to ensure that CLONE_PIDFD could be potentially extended by passing in flags through the return argument. However, that extension would look horrible, and with the upcoming introduction of the clone3 syscall in v5.3 there is no need to extend legacy clone syscall this way. (Even if it would need to be extended, CLONE_DETACHED can be reused with CLONE_PIDFD.) So remove the pidfd == 0 check. Userspace that needs to be portable to kernels without CLONE_PIDFD support can then be advised to initialize pidfd to -1 and check the pidfd value returned by CLONE_PIDFD. Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-24Merge tag 'mtd/fixes-for-5.2-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal: - Set the raw NAND number of targets to the right value - Fix a bug uncovered by a recent patch on Spansion SPI-NOR flashes * tag 'mtd/fixes-for-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: spi-nor: use 16-bit WRR command when QE is set on spansion flashes mtd: rawnand: initialize ntargets with maxchips
2019-06-24iwlwifi: add support for hr1 RF IDOren Givon
The 22000 series FW that was meant to be used with hr is also the FW that is used for hr1 and has a different RF ID. Add support to load the hr FW when hr1 RF ID is detected. Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Oren Givon <oren.givon@intel.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-06-24mwifiex: Don't abort on small, spec-compliant vendor IEsBrian Norris
Per the 802.11 specification, vendor IEs are (at minimum) only required to contain an OUI. A type field is also included in ieee80211.h (struct ieee80211_vendor_ie) but doesn't appear in the specification. The remaining fields (subtype, version) are a convention used in WMM headers. Thus, we should not reject vendor-specific IEs that have only the minimum length (3 bytes) -- we should skip over them (since we only want to match longer IEs, that match either WMM or WPA formats). We can reject elements that don't have the minimum-required 3 byte OUI. While we're at it, move the non-standard subtype and version fields into the WMM structs, to avoid this confusion in the future about generic "vendor header" attributes. Fixes: 685c9b7750bf ("mwifiex: Abort at too short BSS descriptor element") Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-06-24wl18xx: Fix Wunused-const-variableNathan Huckleberry
Clang produces the following warning drivers/net/wireless/ti/wl18xx/main.c:1850:43: warning: unused variable 'wl18xx_iface_ap_cl_limits' [-Wunused-const-variable] static const struct ieee80211_iface_limit wl18xx_iface_ap_cl_limits[] = { ^ drivers/net/wireless/ti/wl18xx/main.c:1869:43: warning: unused variable 'wl18xx_iface_ap_go_limits' [-Wunused-const-variable] static const struct ieee80211_iface_limit wl18xx_iface_ap_go_limits[] = { ^ The commit that added these variables never used them. Removing them. Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/530 Signed-off-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-06-24iwlwifi: change 0x02F0 fw from qu to quzIhab Zhaika
change the fw of 0x02F0 platform from qu to quz Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-06-24Merge tag 'powerpc-5.2-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for a bug in our context id handling on 64-bit hash CPUs, which can lead to unrelated processes being able to read/write to each other's virtual memory. See the commit for full details. That is the fix for CVE-2019-12817. This also adds a kernel selftest for the bug" * tag 'powerpc-5.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Add test of fork with mapping above 512TB powerpc/mm/64s/hash: Reallocate context ids on fork
2019-06-24iwlwifi: add new cards for 22000 and change wrong structsIhab Zhaika
add few PCI ID'S for 22000 and chainge few cards structs names Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-06-24iwlwifi: add new cards for 22000 and fix struct nameIhab Zhaika
add few PCI ID'S for 22000 and fix the wrong name for one of the structs Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-06-24ARM: dts: imx6ul: fix PWM[1-4] interruptsSébastien Szymanski
According to the i.MX6UL/L RM, table 3.1 "ARM Cortex A7 domain interrupt summary", the interrupts for the PWM[1-4] go from 83 to 86. Fixes: b9901fe84f02 ("ARM: dts: imx6ul: add pwm[1-4] nodes") Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-06-24Merge tag 'auxdisplay-for-linus-v5.2-rc7' of git://github.com/ojeda/linuxLinus Torvalds
Pull auxdisplay cleanup from Miguel Ojeda: "A cleanup for two drivers in auxdisplay: convert them to use vm_map_pages_zero() (Souptick Joarder)" * tag 'auxdisplay-for-linus-v5.2-rc7' of git://github.com/ojeda/linux: auxdisplay/ht16k33.c: Convert to use vm_map_pages_zero() auxdisplay/cfag12864bfb.c: Convert to use vm_map_pages_zero()
2019-06-23Merge branch 'ipv6-avoid-taking-refcnt-on-dst-during-route-lookup'David S. Miller
Wei Wang says: ==================== ipv6: avoid taking refcnt on dst during route lookup Ipv6 route lookup code always grabs refcnt on the dst for the caller. But for certain cases, grabbing refcnt is not always necessary if the call path is rcu protected and the caller does not cache the dst. Another issue in the route lookup logic is: When there are multiple custom rules, we have to do the lookup into each table associated to each rule individually. And when we can't find the route in one table, we grab and release refcnt on net->ipv6.ip6_null_entry before going to the next table. This operation is completely redundant, and causes false issue because net->ipv6.ip6_null_entry is a shared object. This patch set introduces a new flag RT6_LOOKUP_F_DST_NOREF for route lookup callers to set, to avoid any manipulation on the dst refcnt. And it converts the major input and output path to use it. The performance gain is noticable. I ran synflood tests between 2 hosts under the same switch. Both hosts have 20G mlx NIC, and 8 tx/rx queues. Sender sends pure SYN flood with random src IPs and ports using trafgen. Receiver has a simple TCP listener on the target port. Both hosts have multiple custom rules: - For incoming packets, only local table is traversed. - For outgoing packets, 3 tables are traversed to find the route. The packet processing rate on the receiver is as follows: - Before the fix: 3.78Mpps - After the fix: 5.50Mpps v2->v3: - Handled fib6_rule_lookup() when CONFIG_IPV6_MULTIPLE_TABLES is not configured in patch 03 (suggested by David Ahern) - Removed the renaming of l3mdev_link_scope_lookup() in patch 05 (suggested by David Ahern) - Moved definition of ip6_route_output_flags() from an inline function in /net/ipv6/route.c to net/ipv6/route.c in order to address kbuild error in patch 05 v1->v2: - Added a helper ip6_rt_put_flags() in patch 3 suggested by David Miller ==================== Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-23ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREFWei Wang
For tx path, in most cases, we still have to take refcnt on the dst cause the caller is caching the dst somewhere. But it still is beneficial to make use of RT6_LOOKUP_F_DST_NOREF flag while doing the route lookup. It is cause this flag prevents manipulating refcnt on net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each routing table. The null_entry is a shared object and constant updates on it cause false sharing. We converted the current major lookup function ip6_route_output_flags() to make use of RT6_LOOKUP_F_DST_NOREF. Together with the change in the rx path, we see noticable performance boost: I ran synflood tests between 2 hosts under the same switch. Both hosts have 20G mlx NIC, and 8 tx/rx queues. Sender sends pure SYN flood with random src IPs and ports using trafgen. Receiver has a simple TCP listener on the target port. Both hosts have multiple custom rules: - For incoming packets, only local table is traversed. - For outgoing packets, 3 tables are traversed to find the route. The packet processing rate on the receiver is as follows: - Before the fix: 3.78Mpps - After the fix: 5.50Mpps Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>