summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-01-20bpf: in __bpf_redirect_no_mac pull mac only if presentWillem de Bruijn
Syzkaller was able to construct a packet of negative length by redirecting from bpf_prog_test_run_skb with BPF_PROG_TYPE_LWT_XMIT: BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline] BUG: KASAN: slab-out-of-bounds in skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline] BUG: KASAN: slab-out-of-bounds in __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395 Read of size 4294967282 at addr ffff8801d798009c by task syz-executor2/12942 kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x23/0x50 mm/kasan/kasan.c:302 memcpy include/linux/string.h:345 [inline] skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline] __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395 __pskb_copy include/linux/skbuff.h:1053 [inline] pskb_copy include/linux/skbuff.h:2904 [inline] skb_realloc_headroom+0xe7/0x120 net/core/skbuff.c:1539 ipip6_tunnel_xmit net/ipv6/sit.c:965 [inline] sit_tunnel_xmit+0xe1b/0x30d0 net/ipv6/sit.c:1029 __netdev_start_xmit include/linux/netdevice.h:4325 [inline] netdev_start_xmit include/linux/netdevice.h:4334 [inline] xmit_one net/core/dev.c:3219 [inline] dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3235 __dev_queue_xmit+0x2f0d/0x3950 net/core/dev.c:3805 dev_queue_xmit+0x17/0x20 net/core/dev.c:3838 __bpf_tx_skb net/core/filter.c:2016 [inline] __bpf_redirect_common net/core/filter.c:2054 [inline] __bpf_redirect+0x5cf/0xb20 net/core/filter.c:2061 ____bpf_clone_redirect net/core/filter.c:2094 [inline] bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2066 bpf_prog_41f2bcae09cd4ac3+0xb25/0x1000 The generated test constructs a packet with mac header, network header, skb->data pointing to network header and skb->len 0. Redirecting to a sit0 through __bpf_redirect_no_mac pulls the mac length, even though skb->data already is at skb->network_header. bpf_prog_test_run_skb has already pulled it as LWT_XMIT !is_l2. Update the offset calculation to pull only if skb->data differs from skb->network_header, which is not true in this case. The test itself can be run only from commit 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command"), but the same type of packets with skb at network header could already be built from lwt xmit hooks, so this fix is more relevant to that commit. Also set the mac header on redirect from LWT_XMIT, as even after this change to __bpf_redirect_no_mac that field is expected to be set, but is not yet in ip_finish_output2. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-19net_sched: add performance counters for basic filterCong Wang
Similar to u32 filter, it is useful to know how many times we reach each basic filter and how many times we pass the ematch attached to it. Sample output: filter protocol arp pref 49152 basic chain 0 filter protocol arp pref 49152 basic chain 0 handle 0x1 (rule hit 3 success 3) action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 installed 81 sec used 4 sec Action statistics: Sent 126 bytes 3 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-20Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "These are mostly fixes for SUNRPC bugs, with a single v4.2 copy_file_range() fix mixed in. Stable bugfixes: - Fix TCP receive code on archs with flush_dcache_page() Other bugfixes: - Fix error code in rpcrdma_buffer_create() - Fix a double free in rpcrdma_send_ctxs_create() - Fix kernel BUG at kernel/cred.c:825 - Fix unnecessary retry in nfs42_proc_copy_file_range() - Ensure rq_bytes_sent is reset before request transmission - Ensure we respect the RPCSEC_GSS sequence number limit - Address Kerberos performance/behavior regression" * tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Address Kerberos performance/behavior regression SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit SUNRPC: Ensure rq_bytes_sent is reset before request transmission NFSv4.2 fix unnecessary retry in nfs4_copy_file_range sunrpc: kernel BUG at kernel/cred.c:825! SUNRPC: Fix TCP receive code on archs with flush_dcache_page() xprtrdma: Double free in rpcrdma_sendctxs_create() xprtrdma: Fix error code in rpcrdma_buffer_create()
2019-01-19net: sock: do not set sk_cookie in sk_clone_lock()Yafang Shao
The only call site of sk_clone_lock is in inet_csk_clone_lock, and sk_cookie will be set there. So we don't need to set sk_cookie in sk_clone_lock(). Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: mpls: netconf: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETNETCONF's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: mpls: route: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETROUTE's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv6: route: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETROUTE's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv6: addrlabel: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETADDRLABEL's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv6: netconf: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETNETCONF's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv6: addr: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETADDR's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv4: ipmr: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETROUTE's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. v2: - improve extack messages (DaveA). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv4: route: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETROUTE's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. v2: - new patch (DaveA). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv4: netconf: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETNETCONF's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: namespace: perform strict checks also for doit handlersJakub Kicinski
Make RTM_GETNSID's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. v2: - don't check size >= sizeof(struct rtgenmsg) (Nicolas). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19rtnetlink: ifinfo: perform strict checks also for doit handlerJakub Kicinski
Make RTM_GETLINK's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19rtnetlink: stats: reject requests for unknown statsJakub Kicinski
In the spirit of strict checks reject requests of stats the kernel does not support when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19rtnetlink: stats: validate attributes in get as well as dumpsJakub Kicinski
Make sure NETLINK_GET_STRICT_CHK influences both GETSTATS doit as well as the dump. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: netlink: add helper to retrieve NETLINK_F_STRICT_CHKJakub Kicinski
Dumps can read state of the NETLINK_F_STRICT_CHK flag from a field in the callback structure. For non-dump GET requests we need a way to access the state of that flag from a socket. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19sch_api: Change signature of qdisc_tree_reduce_backlog() to use intsToke Høiland-Jørgensen
There are now several places where qdisc_tree_reduce_backlog() is called with a negative number of packets (to signal an increase in number of packets in the queue). Rather than rely on overflow behaviour, change the function signature to use signed integers to communicate this usage to people reading the code. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19mac80211: minstrel_ht: add flag to indicate missing/inaccurate tx A-MPDU lengthFelix Fietkau
Some hardware (e.g. MediaTek MT7603) cannot report A-MPDU length in tx status information. Add support for a flag to indicate that, to allow minstrel_ht to use a fixed value in its internal calculation (which gives better results than just defaulting to 1). Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19mac80211: mesh: only switch path when new metric is at least 10% betterJulan Hsu
This helps to reduce frequent path switches when multiple path candidates have the same or very similar path metrics. Signed-off-by: Julan Hsu <julanhsu@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19mac80211: mesh: use average bitrate for link metric calculationJulan Hsu
Use bitrate moving average to smooth out link metric and stablize path selection. Signed-off-by: Julan Hsu <julanhsu@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19nl80211/mac80211: mesh: add mesh path change count to mpath infoJulan Hsu
Expose path change count to destination in mpath info Signed-off-by: Julan Hsu <julanhsu@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19nl80211/mac80211: mesh: add hop count to mpath infoJulan Hsu
Expose hop count to destination information in mpath info Signed-off-by: Julan Hsu <julanhsu@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19mac80211: allow overriding HT STBC capabilitiesSergey Matyukevich
Allow user to override STBC configuration for Rx and Tx spatial streams. In practice RX/TX STBC settings can be modified using appropriate options in wpa_supplicant configuration file: tx_stbc=-1..1 rx_stbc=-1..3 This functionality has been added to wpa_supplicant in commit cdeea70f59d0. In FullMAC case these STBC options are passed to drivers by cfg80211 connect callback in fields of cfg80211_connect_params structure. However for mac80211 drivers, e.g. for mac80211_hwsim, overrides for STBC settings are ignored. The reason why RX/TX STBC capabilities are not modified for mac80211 drivers is as follows. All drivers need to specify supported HT/VHT overrides explicitly: see ht_capa_mod_mask and vht_capa_mod_mask fields of wiphy structure. Only supported overrides will be passed to drivers by cfg80211_connect and cfg80211_mlme_assoc operations: see bitwise 'AND' performed by cfg80211_oper_and_ht_capa and cfg80211_oper_and_vht_capa. This commit adds RX/TX STBC HT capabilities to mac80211_ht_capa_mod_mask, allowing their modifications, as well as applies requested STBC modifications in function ieee80211_apply_htcap_overrides. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19mac80211: Add airtime accounting and scheduling to TXQsToke Høiland-Jørgensen
This adds airtime accounting and scheduling to the mac80211 TXQ scheduler. A new callback, ieee80211_sta_register_airtime(), is added that drivers can call to report airtime usage for stations. When airtime information is present, mac80211 will schedule TXQs (through ieee80211_next_txq()) in a way that enforces airtime fairness between active stations. This scheduling works the same way as the ath9k in-driver airtime fairness scheduling. If no airtime usage is reported by the driver, the scheduler will default to round-robin scheduling. For drivers that don't control TXQ scheduling in software, a new API function, ieee80211_txq_may_transmit(), is added which the driver can use to check if the TXQ is eligible for transmission, or should be throttled to enforce fairness. Calls to this function must also be enclosed in ieee80211_txq_schedule_{start,end}() calls to ensure proper locking. The API ieee80211_txq_may_transmit() also ensures that TXQ list will be aligned aginst driver's own round-robin scheduler list. i.e it rotates the TXQ list till it makes the requested node becomes the first entry in TXQ list. Thus both the TXQ list and driver's list are in sync. Co-developed-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Signed-off-by: Louie Lu <git@louie.lu> [added debugfs write op to reset airtime counter] Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19cfg80211: Add airtime statistics and settingsToke Høiland-Jørgensen
This adds TX airtime statistics to the cfg80211 station dump (to go along with the RX info already present), and adds a new parameter to set the airtime weight of each station. The latter allows userspace to implement policies for different stations by varying their weights. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> [rmanohar@codeaurora.org: fixed checkpatch warnings] Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> [move airtime weight != 0 check into policy] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19mac80211: Add TXQ scheduling APIToke Høiland-Jørgensen
This adds an API to mac80211 to handle scheduling of TXQs. The interface between driver and mac80211 for TXQ handling is changed by adding two new functions: ieee80211_next_txq(), which will return the next TXQ to schedule in the current round-robin rotation, and ieee80211_return_txq(), which the driver uses to indicate that it has finished scheduling a TXQ (which will then be put back in the scheduling rotation if it isn't empty). The driver must call ieee80211_txq_schedule_start() at the start of each scheduling session, and ieee80211_txq_schedule_end() at the end. The API then guarantees that the same TXQ is not returned twice in the same session (so a driver can loop on ieee80211_next_txq() without worrying about breaking the loop. Usage of the new API is optional, so drivers can be ported one at a time. In this patch, the actual scheduling performed by mac80211 is simple round-robin, but a subsequent commit adds airtime fairness awareness to the scheduler. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> [minor kernel-doc fix, propagate sparse locking checks out] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-19mac80211: fix miscounting of ttl-dropped framesBob Copeland
In ieee80211_rx_h_mesh_fwding, we increment the 'dropped_frames_ttl' counter when we decrement the ttl to zero. For unicast frames destined for other hosts, we stop processing the frame at that point. For multicast frames, we do not rebroadcast it in this case, but we do pass the frame up the stack to process it on this STA. That doesn't match the usual definition of "dropped," so don't count those as such. With this change, something like `ping6 -i0.2 ff02::1%mesh0` from a peer in a ttl=1 network no longer increments the counter rapidly. Signed-off-by: Bob Copeland <bobcopeland@fb.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-18net: bridge: Mark FDB entries that were added by user as suchIdo Schimmel
Externally learned entries can be added by a user or by a switch driver that is notifying the bridge driver about entries that were learned in hardware. In the first case, the entries are not marked with the 'added_by_user' flag, which causes switch drivers to ignore them and not offload them. The 'added_by_user' flag can be set on externally learned FDB entries based on the 'swdev_notify' parameter in br_fdb_external_learn_add(), which effectively means if the created / updated FDB entry was added by a user or not. Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health dump {get,clear} commandsEran Ben Elisha
Add devlink health dump commands, in order to run an dump operation over a specific reporter. The supported operations are dump_get in order to get last saved dump (if not exist, dump now) and dump_clear to clear last saved dump. It is expected from driver's callback for diagnose command to fill it via the buffer descriptors API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health diagnose commandEran Ben Elisha
Add devlink health diagnose command, in order to run a diagnose operation over a specific reporter. It is expected from driver's callback for diagnose command to fill it via the buffer descriptors API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health recover commandEran Ben Elisha
Add devlink health recover command to the uapi, in order to allow the user to execute a recover operation over a specific reporter. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health set commandEran Ben Elisha
Add devlink health set command, in order to set configuration parameters for a specific reporter. Supported parameters are: - graceful_period: Time interval between auto recoveries (in msec) - auto_recover: Determines if the devlink shall execute recover upon receiving error for the reporter Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health get commandEran Ben Elisha
Add devlink health get command to provide reporter/s data for user space. Add the ability to get data per reporter or dump data from all available reporters. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health report functionalityEran Ben Elisha
Upon error discover, every driver can report it to the devlink health mechanism via devlink_health_report function, using the appropriate reporter registered to it. Driver can pass error specific context which will be delivered to it as part of the dump / recovery callbacks. Once an error is reported, devlink health will do the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and stored at the reporter instance (as long as there is no other dump which is already stored) * Auto recovery attempt is being done. depends on: - Auto Recovery configuration - Grace period vs. time since last recover Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health reporter create/destroy functionalityEran Ben Elisha
Devlink health reporter is an instance for reporting, diagnosing and recovering from run time errors discovered by the reporters. Define it's data structure and supported operations. In addition, expose devlink API to create and destroy a reporter. Each devlink instance will hold it's own reporters list. As part of the allocation, driver shall provide a set of callbacks which will be used the devlink in order to handle health reports and user commands related to this reporter. In addition, driver is entitled to provide some priv pointer, which can be fetched from the reporter by devlink_health_reporter_priv function. For each reporter, devlink will hold a metadata of statistics, buffers and status. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18devlink: Add health buffer supportEran Ben Elisha
Devlink health buffer is a mechanism to pass descriptors between drivers and devlink. The API allows the driver to add objects, object pair, value array (nested attributes), value and name. Driver can use this API to fill the buffers in a format which can be translated by the devlink to the netlink message. In order to fulfill it, an internal buffer descriptor is defined. This will hold the data and metadata per each attribute and by used to pass actual commands to the netlink. This mechanism will be later used in devlink health for dump and diagnose data store by the drivers. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net_sched: add hit counter for matchallCong Wang
Although matchall always matches packets, however, it still relies on a protocol match first. So it is still useful to have such a counter for matchall. Of course, unlike u32, every time we hit a matchall filter, it is always a success, so we don't have to distinguish them. Sample output: filter protocol 802.1Q pref 100 matchall chain 0 filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1 not_in_hw (rule hit 10) action order 1: vlan pop continue index 1 ref 1 bind 1 installed 40 sec used 1 sec Action statistics: Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net: Fix usage of pskb_trim_rcsumRoss Lagerwall
In certain cases, pskb_trim_rcsum() may change skb pointers. Reinitialize header pointers afterwards to avoid potential use-after-frees. Add a note in the documentation of pskb_trim_rcsum(). Found by KASAN. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net: ip6_gre: remove gre_hdr_len from ip6erspan_rcvLorenzo Bianconi
Remove gre_hdr_len from ip6erspan_rcv routine signature since it is not longer used Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18Revert "netfilter: nft_hash: add map lookups for hashing operations"Laura Garcia Liebana
A better way to implement this from userspace has been found without specific code in the kernel side, revert this. Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations") Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nat: un-export nf_nat_used_tupleFlorian Westphal
Not used since 203f2e78200c27e ("netfilter: nat: remove l4proto->unique_tuple") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nft_meta: Add NFT_META_I/OIFKIND meta typewenxu
In the ip_rcv the skb goes through the PREROUTING hook first, then kicks in vrf device and go through the same hook again. When conntrack dnat works with vrf, there will be some conflict with rules because the packet goes through the hook twice with different nf status. ip link add user1 type vrf table 1 ip link add user2 type vrf table 2 ip l set dev tun1 master user1 ip l set dev tun2 master user2 nft add table firewall nft add chain firewall zones { type filter hook prerouting priority - 300 \; } nft add rule firewall zones counter ct zone set iif map { "tun1" : 1, "tun2" : 2 } nft add chain firewall rule-1000-ingress nft add rule firewall rule-1000-ingress ct zone 1 tcp dport 22 ct state new counter accept nft add rule firewall rule-1000-ingress counter drop nft add chain firewall rule-1000-egress nft add rule firewall rule-1000-egress tcp dport 22 ct state new counter drop nft add rule firewall rule-1000-egress counter accept nft add chain firewall rules-all { type filter hook prerouting priority - 150 \; } nft add rule firewall rules-all ip daddr vmap { "2.2.2.11" : jump rule-1000-ingress } nft add rule firewall rules-all ct zone vmap { 1 : jump rule-1000-egress } nft add rule firewall dnat-all ct zone vmap { 1 : jump dnat-1000 } nft add rule firewall dnat-1000 ip daddr 2.2.2.11 counter dnat to 10.0.0.7 For a package with ip daddr 2.2.2.11 and tcp dport 22, first time accept in the rule-1000-ingress and dnat to 10.0.0.7. Then second time the packet goto the wrong chain rule-1000-egress which leads the packet drop With this patch, userspace can add the 'don't re-do entire ruleset for vrf' policy itself via: nft add rule firewall rules-all meta iifkind "vrf" counter accept Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nf_conntrack: provide modparam to always register conntrack hooksPablo Neira Ayuso
The connection tracking hooks can be optionally registered per netns when conntrack is specifically invoked from the ruleset since 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset"). Then, since 4d3a57f23dec ("netfilter: conntrack: do not enable connection tracking unless needed"), the default behaviour is changed to always register them on demand. This patch provides a toggle that allows users to always register them. Without this toggle, in order to use conntrack for statistics collection, you need a dummy rule that refers to conntrack, eg. iptables -I INPUT -m state --state NEW This patch allows users to restore the original behaviour via modparam, ie. always register connection tracking, eg. modprobe nf_conntrack enable_hooks=1 Hence, no dummy rule is required. Reported-by: Laura Garcia <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove nf_ct_l4proto_find_getFlorian Westphal
Its now same as __nf_ct_l4proto_find(), so rename that to nf_ct_l4proto_find and use it everywhere. It never returns NULL and doesn't need locks or reference counts. Before this series: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko After: 294864 net/netfilter/nf_conntrack.ko text data bss dec hex filename 106979 19557 240 126776 1ef38 nf_conntrack.ko so, even with builtin gre, total size got reduced. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove l4proto destroy hookFlorian Westphal
Only one user (gre), add a direct call and remove this facility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove l4proto init and get_net callbacksFlorian Westphal
Those were needed we still had modular trackers. As we don't have those anymore, prefer direct calls and remove all the (un)register infrastructure associated with this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove sysctl registration helpersFlorian Westphal
After previous patch these are not used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: unify sysctl handlingFlorian Westphal
Due to historical reasons, all l4 trackers register their own sysctls. This leads to copy&pasted boilerplate code, that does exactly same thing, just with different data structure. Place all of this in a single file. This allows to remove the various ctl_table pointers from the ct_netns structure and reduces overall code size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>