summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-09-06Merge branch 'for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Several notable changes this cycle: - Thread mode was merged. This will be used for cgroup2 support for CPU and possibly other controllers. Unfortunately, CPU controller cgroup2 support didn't make this pull request but most contentions have been resolved and the support is likely to be merged before the next merge window. - cgroup.stat now shows the number of descendant cgroups. - cpuset now can enable the easier-to-configure v2 behavior on v1 hierarchy" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits) cpuset: Allow v2 behavior in v1 cgroup cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup cgroup: remove unneeded checks cgroup: misc changes cgroup: short-circuit cset_cgroup_from_root() on the default hierarchy cgroup: re-use the parent pointer in cgroup_destroy_locked() cgroup: add cgroup.stat interface with basic hierarchy stats cgroup: implement hierarchy limits cgroup: keep track of number of descent cgroups cgroup: add comment to cgroup_enable_threaded() cgroup: remove unnecessary empty check when enabling threaded mode cgroup: update debug controller to print out thread mode information cgroup: implement cgroup v2 thread support cgroup: implement CSS_TASK_ITER_THREADED cgroup: introduce cgroup->dom_cgrp and threaded css_set handling cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS cgroup: reorganize cgroup.procs / task write path cgroup: replace css_set walking populated test with testing cgrp->nr_populated_csets cgroup: distinguish local and children populated states cgroup: remove now unused list_head @pending in cgroup_apply_cftypes() ...
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-09-05rds: Fix non-atomic operation on shared flag variableHåkon Bugge
The bits in m_flags in struct rds_message are used for a plurality of reasons, and from different contexts. To avoid any missing updates to m_flags, use the atomic set_bit() instead of the non-atomic equivalent. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: sched: don't use GFP_KERNEL under spin lockJakub Kicinski
The new TC IDR code uses GFP_KERNEL under spin lock. Which leads to: [ 582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416 [ 582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc [ 582.636939] 2 locks held by tc/3379: [ 582.641049] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400 [ 582.650958] #1: (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.662217] Preemption disabled at: [ 582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0 [ 582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G W 4.13.0-rc7-debug-00648-g43503a79b9f0 #287 [ 582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016 [ 582.691937] Call Trace: ... [ 582.742460] kmem_cache_alloc+0x286/0x540 [ 582.747055] radix_tree_node_alloc.constprop.6+0x4a/0x450 [ 582.753209] idr_get_free_cmn+0x627/0xf80 ... [ 582.815525] idr_alloc_cmn+0x1a8/0x270 ... [ 582.833804] tcf_idr_create+0x31b/0x8e0 ... Try to preallocate the memory with idr_prealloc(GFP_KERNEL) (as suggested by Eric Dumazet), and change the allocation flags under spin lock. Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05rxrpc: Make service connection lookup always check for retryDavid Howells
When an RxRPC service packet comes in, the target connection is looked up by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry check is, however, currently skipped if we got a match, but probably shouldn't be in case the connection we found gets replaced whilst we're doing a search. Make the lookup procedure always go through need_seqretry(), even if the lookup was successful. This makes sure we always pick up on a write-lock event. On the other hand, since we don't take a ref on the object, but rely on RCU to prevent its destruction after dropping the seqlock, I'm not sure this is necessary. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID update from Jiri Kosina: - Wacom driver fixes/updates (device name generation improvements, touch ring status support) from Jason Gerecke - T100 touchpad support from Hans de Goede - support for batteries driven by HID input reports, from Dmitry Torokhov - Arnd pointed out that driver_lock semaphore is superfluous, as driver core already provides all the necessary concurency protection. Removal patch from Binoy Jayan - logical minimum numbering improvements in sensor-hub driver, from Srinivas Pandruvada - support for Microsoft Win8 Wireless Radio Controls extensions from João Paulo Rechi Vita - assorted small fixes and device ID additions * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (28 commits) HID: prodikeys: constify snd_rawmidi_ops structures HID: sensor: constify platform_device_id HID: input: throttle battery uevents HID: usbmouse: constify usb_device_id and fix space before '[' error HID: usbkbd: constify usb_device_id and fix space before '[' error. HID: hid-sensor-hub: Force logical minimum to 1 for power and report state HID: wacom: Do not completely map WACOM_HID_WD_TOUCHRINGSTATUS usage HID: asus: Add T100CHI bluetooth keyboard dock touchpad support HID: ntrig: constify attribute_group structures. HID: logitech-hidpp: constify attribute_group structures. HID: sensor: constify attribute_group structures. HID: multitouch: constify attribute_group structures. HID: multitouch: use proper symbolic constant for 0xff310076 application HID: multitouch: Support Asus T304UA media keys HID: multitouch: Support HID_GD_WIRELESS_RADIO_CTLS HID: input: optionally use device id in battery name HID: input: map digitizer battery usage HID: Remove the semaphore driver_lock HID: wacom: add USB_HID dependency HID: add ALWAYS_POLL quirk for Logitech 0xc077 ...
2017-09-05net: dsa: tag_brcm: Set output queue from skb queue mappingFlorian Fainelli
We originally used skb->priority but that was not quite correct as this bitfield needs to contain the egress switch queue we intend to send this SKB to. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net: dsa: Allow switch drivers to indicate number of TX queuesFlorian Fainelli
Let switch drivers indicate how many TX queues they support. Some switches, such as Broadcom Starfighter 2 are designed with 8 egress queues. Future changes will allow us to leverage the queue mapping and direct the transmission towards a particular queue. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05bridge: switchdev: Use an helper to clear forward markIdo Schimmel
Instead of using ifdef in the C file. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05flow_dissector: Add limit for number of headers to dissectTom Herbert
In flow dissector there are no limits to the number of nested encapsulations or headers that might be dissected which makes for a nice DOS attack. This patch sets a limit of the number of headers that flow dissector will parse. Headers includes network layer headers, transport layer headers, shim headers for encapsulation, IPv6 extension headers, etc. The limit for maximum number of headers to parse has be set to fifteen to account for a reasonable number of encapsulations, extension headers, VLAN, in a packet. Note that this limit does not supercede the STOP_AT_* flags which may stop processing before the headers limit is reached. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05flow_dissector: Cleanup control flowTom Herbert
__skb_flow_dissect is riddled with gotos that make discerning the flow, debugging, and extending the capability difficult. This patch reorganizes things so that we only perform goto's after the two main switch statements (no gotos within the cases now). It also eliminates several goto labels so that there are only two labels that can be target for goto. Reported-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid referencesArnd Bergmann
We get a new link error in allmodconfig kernels after ftgmac100 started using the ncsi helpers: ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined! ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined! Related to that, we get another error when CONFIG_NET_NCSI is disabled: drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'? drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'? This fixes both problems at once, using a 'static inline' stub helper for the disabled case, and exporting the functions when they are present. Fixes: 51564585d8c6 ("ftgmac100: Support NCSI VLAN filtering when available") Fixes: 21acf63013ed ("net/ncsi: Configure VLAN tag filter") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05Merge branch 'for-4.14/wacom' into for-linusJiri Kosina
- name generation improvement for Wacom devices from Jason Gerecke - Kconfig dependency fix for Wacom driver from Arnd Bergmann
2017-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for next-net (part 2) The following patchset contains Netfilter updates for net-next. This patchset includes updates for nf_tables, removal of CONFIG_NETFILTER_DEBUG and a new mode for xt_hashlimit. More specifically, they: 1) Add new rate match mode for hashlimit, this introduces a new revision for this match. The idea is to stop matching packets until ratelimit criteria stands true. Patch from Vishwanath Pai. 2) Add ->select_ops indirection to nf_tables named objects, so we can choose between different flavours of the same object type, patch from Pablo M. Bermudo. 3) Shorter function names in nft_limit, basically: s/nft_limit_pkt_bytes/nft_limit_bytes, also from Pablo M. Bermudo. 4) Add new stateful limit named object type, this allows us to create limit policies that you can identify via name, also from Pablo. 5) Remove unused hooknum parameter in conntrack ->packet indirection. From Florian Westphal. 6) Patches to remove CONFIG_NETFILTER_DEBUG and macros such as IP_NF_ASSERT and IP_NF_ASSERT. From Varsha Rao. 7) Add nf_tables_updchain() helper function and use it from nf_tables_newchain() to make it more maintainable. Similarly, add nf_tables_addchain() and use it too. 8) Add new netlink NLM_F_NONREC flag, this flag should only be used for deletion requests, specifically, to support non-recursive deletion. Based on what we discussed during NFWS'17 in Faro. 9) Use NLM_F_NONREC from table and sets in nf_tables. 10) Support for recursive chain deletion. Table and set deletion commands come with an implicit content flush on deletion, while chains do not. This patch addresses this inconsistency by adding the code to perform recursive chain deletions. This also comes with the bits to deal with the new NLM_F_NONREC netlink flag. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-04Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Add 'cross-release' support to lockdep, which allows APIs like completions, where it's not the 'owner' who releases the lock, to be tracked. It's all activated automatically under CONFIG_PROVE_LOCKING=y. - Clean up (restructure) the x86 atomics op implementation to be more readable, in preparation of KASAN annotations. (Dmitry Vyukov) - Fix static keys (Paolo Bonzini) - Add killable versions of down_read() et al (Kirill Tkhai) - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini) - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra) - Remove smp_mb__before_spinlock() and convert its usages, introduce smp_mb__after_spinlock() (Peter Zijlstra) * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits) locking/lockdep/selftests: Fix mixed read-write ABBA tests sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK() acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures smp: Avoid using two cache lines for struct call_single_data locking/lockdep: Untangle xhlock history save/restore from task independence locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being futex: Remove duplicated code and fix undefined behaviour Documentation/locking/atomic: Finish the document... locking/lockdep: Fix workqueue crossrelease annotation workqueue/lockdep: 'Fix' flush_work() annotation locking/lockdep/selftests: Add mixed read-write ABBA tests mm, locking/barriers: Clarify tlb_flush_pending() barriers locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive locking/lockdep: Explicitly initialize wq_barrier::done::map locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING locking/refcounts, x86/asm: Implement fast refcount overflow protection locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease ...
2017-09-04netfilter: nf_tables: support for recursive chain deletionPablo Neira Ayuso
This patch sorts out an asymmetry in deletions. Currently, table and set deletion commands come with an implicit content flush on deletion. However, chain deletion results in -EBUSY if there is content in this chain, so no implicit flush happens. So you have to send a flush command in first place to delete chains, this is inconsistent and it can be annoying in terms of user experience. This patch uses the new NLM_F_NONREC flag to request non-recursive chain deletion, ie. if the chain to be removed contains rules, then this returns EBUSY. This problem was discussed during the NFWS'17 in Faro, Portugal. In iptables, you hit -EBUSY if you try to delete a chain that contains rules, so you have to flush first before you can remove anything. Since iptables-compat uses the nf_tables netlink interface, it has to use the NLM_F_NONREC flag from userspace to retain the original iptables semantics, ie. bail out on removing chains that contain rules. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: use NLM_F_NONREC for deletion requestsPablo Neira Ayuso
Bail out if user requests non-recursive deletion for tables and sets. This new flags tells nf_tables netlink interface to reject deletions if tables and sets have content. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: add nf_tables_addchain()Pablo Neira Ayuso
Wrap the chain addition path in a function to make it more maintainable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: add nf_tables_updchain()Pablo Neira Ayuso
nf_tables_newchain() is too large, wrap the chain update path in a function to make it more maintainable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnad: "The main RCU related changes in this cycle were: - Removal of spin_unlock_wait() - SRCU updates - RCU torture-test updates - RCU Documentation updates - Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant - Miscellaneous RCU fixes - CPU-hotplug fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) arch: Remove spin_unlock_wait() arch-specific definitions locking: Remove spin_unlock_wait() generic definitions drivers/ata: Replace spin_unlock_wait() with lock/unlock pair ipc: Replace spin_unlock_wait() with lock/unlock pair exit: Replace spin_unlock_wait() with lock/unlock pair completion: Replace spin_unlock_wait() with lock/unlock pair doc: Set down RCU's scheduling-clock-interrupt needs doc: No longer allowed to use rcu_dereference on non-pointers doc: Add RCU files to docbook-generation files doc: Update memory-barriers.txt for read-to-write dependencies doc: Update RCU documentation membarrier: Provide expedited private command rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() rcu: Add warning to rcu_idle_enter() for irqs enabled rcu: Make rcu_idle_enter() rely on callers disabling irqs rcu: Add assertions verifying blocked-tasks list rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() rcu: Add TPS() protection for _rcu_barrier_trace strings rcu: Use idle versions of swait to make idle-hack clear swait: Add idle variants which don't contribute to load average ...
2017-09-04net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros.Varsha Rao
This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they are no longer required. Replace _ASSERT() macros with WARN_ON(). Signed-off-by: Varsha Rao <rvarsha016@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04net: Replace NF_CT_ASSERT() with WARN_ON().Varsha Rao
This patch removes NF_CT_ASSERT() and instead uses WARN_ON(). Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
2017-09-04netfilter: remove unused hooknum arg from packet functionsFlorian Westphal
tested with allmodconfig build. Signed-off-by: Florian Westphal <fw@strlen.de>
2017-09-04netfilter: nft_limit: add stateful object typePablo M. Bermudo Garay
Register a new limit stateful object type into the stateful object infrastructure. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nft_limit: replace pkt_bytes with bytesPablo M. Bermudo Garay
Just a small refactor patch in order to improve the code readability. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: nf_tables: add select_ops for stateful objectsPablo M. Bermudo Garay
This patch adds support for overloading stateful objects operations through the select_ops() callback, just as it is implemented for expressions. This change is needed for upcoming additions to the stateful objects infrastructure. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: xt_hashlimit: add rate match modeVishwanath Pai
This patch adds a new feature to hashlimit that allows matching on the current packet/byte rate without rate limiting. This can be enabled with a new flag --hashlimit-rate-match. The match returns true if the current rate of packets is above/below the user specified value. The main difference between the existing algorithm and the new one is that the existing algorithm rate-limits the flow whereas the new algorithm does not. Instead it *classifies* the flow based on whether it is above or below a certain rate. I will demonstrate this with an example below. Let us assume this rule: iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain If the packet rate is 15/s, the existing algorithm would ACCEPT 10 packets every second and send 5 packets to "new_chain". But with the new algorithm, as long as the rate of 15/s is sustained, all packets will continue to match and every packet is sent to new_chain. This new functionality will let us classify different flows based on their current rate, so that further decisions can be made on them based on what the current rate is. This is how the new algorithm works: We divide time into intervals of 1 (sec/min/hour) as specified by the user. We keep track of the number of packets/bytes processed in the current interval. After each interval we reset the counter to 0. When we receive a packet for match, we look at the packet rate during the current interval and the previous interval to make a decision: if [ prev_rate < user and cur_rate < user ] return Below else return Above Where cur_rate is the number of packets/bytes seen in the current interval, prev is the number of packets/bytes seen in the previous interval and 'user' is the rate specified by the user. We also provide flexibility to the user for choosing the time interval using the option --hashilmit-interval. For example the user can keep a low rate like x/hour but still keep the interval as small as 1 second. To preserve backwards compatibility we have to add this feature in a new revision, so I've created revision 3 for hashlimit. The two new options we add are: --hashlimit-rate-match --hashlimit-rate-interval I have updated the help text to add these new options. Also added a few tests for the new options. Suggested-by: Igor Lubashev <ilubashe@akamai.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04Merge branch 'linus' into locking/core, to fix up conflictsIngo Molnar
Conflicts: mm/page_alloc.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-03Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-09-03 Here's one last bluetooth-next pull request for the 4.14 kernel: - NULL pointer fix in ca8210 802.15.4 driver - A few "const" fixes - New Kconfig option for disabling legacy interfaces Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Basically, updates to the conntrack core, enhancements for nf_tables, conversion of netfilter hooks from linked list to array to improve memory locality and asorted improvements for the Netfilter codebase. More specifically, they are: 1) Add expection to hashes after timer initialization to prevent access from another CPU that walks on the hashes and calls del_timer(), from Florian Westphal. 2) Don't update nf_tables chain counters from hot path, this is only used by the x_tables compatibility layer. 3) Get rid of nested rcu_read_lock() calls from netfilter hook path. Hooks are always guaranteed to run from rcu read side, so remove nested rcu_read_lock() where possible. Patch from Taehee Yoo. 4) nf_tables new ruleset generation notifications include PID and name of the process that has updated the ruleset, from Phil Sutter. 5) Use skb_header_pointer() from nft_fib, so we can reuse this code from the nf_family netdev family. Patch from Pablo M. Bermudo. 6) Add support for nft_fib in nf_tables netdev family, also from Pablo. 7) Use deferrable workqueue for conntrack garbage collection, to reduce power consumption, from Patch from Subash Abhinov Kasiviswanathan. 8) Add nf_ct_expect_iterate_net() helper and use it. From Florian Westphal. 9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian. 10) Drop references on conntrack removal path when skbuffs has escaped via nfqueue, from Florian. 11) Don't queue packets to nfqueue with dying conntrack, from Florian. 12) Constify nf_hook_ops structure, from Florian. 13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter. 14) Add nla_strdup(), from Phil Sutter. 15) Rise nf_tables objects name size up to 255 chars, people want to use DNS names, so increase this according to what RFC 1035 specifies. Patch series from Phil Sutter. 16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook registration on demand, suggested by Eric Dumazet, patch from Florian. 17) Remove unused variables in compat_copy_entry_from_user both in ip_tables and arp_tables code. Patch from Taehee Yoo. 18) Constify struct nf_conntrack_l4proto, from Julia Lawall. 19) Constify nf_loginfo structure, also from Julia. 20) Use a single rb root in connlimit, from Taehee Yoo. 21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo. 22) Use audit_log() instead of open-coding it, from Geliang Tang. 23) Allow to mangle tcp options via nft_exthdr, from Florian. 24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes a fix for a miscalculation of the minimal length. 25) Simplify branch logic in h323 helper, from Nick Desaulniers. 26) Calculate netlink attribute size for conntrack tuple at compile time, from Florian. 27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure. From Florian. 28) Remove holes in nf_conntrack_l4proto structure, so it becomes smaller. From Florian. 29) Get rid of print_tuple() indirection for /proc conntrack listing. Place all the code in net/netfilter/nf_conntrack_standalone.c. Patch from Florian. 30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is off. From Florian. 31) Constify most nf_conntrack_{l3,l4}proto helper functions, from Florian. 32) Fix broken indentation in ebtables extensions, from Colin Ian King. 33) Fix several harmless sparse warning, from Florian. 34) Convert netfilter hook infrastructure to use array for better memory locality, joint work done by Florian and Aaron Conole. Moreover, add some instrumentation to debug this. 35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once per batch, from Florian. 36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian. 37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao. 38) Remove unused code in the generic protocol tracker, from Davide Caratti. I think I will have material for a second Netfilter batch in my queue if time allow to make it fit in this merge window. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03l2tp: pass tunnel pointer to ->session_create()Guillaume Nault
Using l2tp_tunnel_find() in pppol2tp_session_create() and l2tp_eth_create() is racy, because no reference is held on the returned session. These functions are only used to implement the ->session_create callback which is run by l2tp_nl_cmd_session_create(). Therefore searching for the parent tunnel isn't necessary because l2tp_nl_cmd_session_create() already has a pointer to it and holds a reference. This patch modifies ->session_create()'s prototype to directly pass the the parent tunnel as parameter, thus avoiding searching for it in pppol2tp_session_create() and l2tp_eth_create(). Since we have to touch the ->session_create() call in l2tp_nl_cmd_session_create(), let's also remove the useless conditional: we know that ->session_create isn't NULL at this point because it's already been checked earlier in this same function. Finally, one might be tempted to think that the removed l2tp_tunnel_find() calls were harmless because they would return the same tunnel as the one held by l2tp_nl_cmd_session_create() anyway. But that tunnel might be removed and a new one created with same tunnel Id before the l2tp_tunnel_find() call. In this case l2tp_tunnel_find() would return the new tunnel which wouldn't be protected by the reference held by l2tp_nl_cmd_session_create(). Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03l2tp: prevent creation of sessions on terminated tunnelsGuillaume Nault
l2tp_tunnel_destruct() sets tunnel->sock to NULL, then removes the tunnel from the pernet list and finally closes all its sessions. Therefore, it's possible to add a session to a tunnel that is still reachable, but for which tunnel->sock has already been reset. This can make l2tp_session_create() dereference a NULL pointer when calling sock_hold(tunnel->sock). This patch adds the .acpt_newsess field to struct l2tp_tunnel, which is used by l2tp_tunnel_closeall() to prevent addition of new sessions to tunnels. Resetting tunnel->sock is done after l2tp_tunnel_closeall() returned, so that l2tp_session_add_to_tunnel() can safely take a reference on it when .acpt_newsess is true. The .acpt_newsess field is modified in l2tp_tunnel_closeall(), rather than in l2tp_tunnel_destruct(), so that it benefits all tunnel removal mechanisms. E.g. on UDP tunnels, a session could be added to a tunnel after l2tp_udp_encap_destroy() proceeded. This would prevent the tunnel from being removed because of the references held by this new session on the tunnel and its socket. Even though the session could be removed manually later on, this defeats the purpose of commit 9980d001cec8 ("l2tp: add udp encap socket destroy handler"). Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03Revert "net: fix percpu memory leaks"Jesper Dangaard Brouer
This reverts commit 1d6119baf0610f813eb9d9580eb4fd16de5b4ceb. After reverting commit 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") then here is no need for this fix-up patch. As percpu_counter is no longer used, it cannot memory leak it any-longer. Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Fixes: 1d6119baf061 ("net: fix percpu memory leaks") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03Revert "net: use lib/percpu_counter API for fragmentation mem accounting"Jesper Dangaard Brouer
This reverts commit 6d7b857d541ecd1d9bd997c97242d4ef94b19de2. There is a bug in fragmentation codes use of the percpu_counter API, that can cause issues on systems with many CPUs. The frag_mem_limit() just reads the global counter (fbc->count), without considering other CPUs can have upto batch size (130K) that haven't been subtracted yet. Due to the 3MBytes lower thresh limit, this become dangerous at >=24 CPUs (3*1024*1024/130000=24). The correct API usage would be to use __percpu_counter_compare() which does the right thing, and takes into account the number of (online) CPUs and batch size, to account for this and call __percpu_counter_sum() when needed. We choose to revert the use of the lib/percpu_counter API for frag memory accounting for several reasons: 1) On systems with CPUs > 24, the heavier fully locked __percpu_counter_sum() is always invoked, which will be more expensive than the atomic_t that is reverted to. Given systems with more than 24 CPUs are becoming common this doesn't seem like a good option. To mitigate this, the batch size could be decreased and thresh be increased. 2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX CPU, before SKBs are pushed into sockets on remote CPUs. Given NICs can only hash on L2 part of the IP-header, the NIC-RXq's will likely be limited. Thus, a fair chance that atomic add+dec happen on the same CPU. Revert note that commit 1d6119baf061 ("net: fix percpu memory leaks") removed init_frag_mem_limit() and instead use inet_frags_init_net(). After this revert, inet_frags_uninit_net() becomes empty. Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Fixes: 1d6119baf061 ("net: fix percpu memory leaks") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: Add module reference to FIB notifiersIdo Schimmel
When a listener registers to the FIB notification chain it receives a dump of the FIB entries and rules from existing address families by invoking their dump operations. While we call into these modules we need to make sure they aren't removed. Do that by increasing their reference count before invoking their dump operations and decrease it afterwards. Fixes: 04b1d4e50e82 ("net: core: Make the FIB notification chain generic") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: convert (struct ubuf_info)->refcnt to refcount_tEric Dumazet
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. v2: added the change in drivers/vhost/net.c as spotted by Willem. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: prepare (struct ubuf_info)->refcnt conversionEric Dumazet
In order to convert this atomic_t refcnt to refcount_t, we need to init the refcount to one to not trigger a 0 -> 1 transition. This also removes one atomic operation in fast path. v2: removed dead code in sock_zerocopy_put_abort() as suggested by Willem. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01tcp_diag: report TCP MD5 signing keys and addressesIvan Delalande
Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is not possible to retrieve these from the kernel once they have been configured on sockets. Signed-off-by: Ivan Delalande <colona@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01inet_diag: allow protocols to provide additional dataIvan Delalande
Extend inet_diag_handler to allow individual protocols to report additional data on INET_DIAG_INFO through idiag_get_aux. The size can be dynamic and is computed by idiag_get_aux_size. Signed-off-by: Ivan Delalande <colona@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01ipv6: sr: Use ARRAY_SIZE macroThomas Meyer
Grepping for "sizeof\(.+\) / sizeof\(" found this as one of the first candidates. Maybe a coccinelle can catch all of those. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01inetpeer: fix RCU lookup()Eric Dumazet
Excess of seafood or something happened while I cooked the commit adding RB tree to inetpeer. Of course, RCU rules need to be respected or bad things can happen. In this particular loop, we need to read *pp once per iteration, not twice. Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree") Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Bluetooth: make baswap src constLoic Poulain
Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix handling of pinned BPF map nodes in hash of maps, from Daniel Borkmann. 2) IPSEC ESP error paths leak memory, from Steffen Klassert. 3) We need an RCU grace period before freeing fib6_node objects, from Wei Wang. 4) Must check skb_put_padto() return value in HSR driver, from FLorian Fainelli. 5) Fix oops on PHY probe failure in ftgmac100 driver, from Andrew Jeffery. 6) Fix infinite loop in UDP queue when using SO_PEEK_OFF, from Eric Dumazet. 7) Use after free when tcf_chain_destroy() called multiple times, from Jiri Pirko. 8) Fix KSZ DSA tag layer multiple free of SKBS, from Florian Fainelli. 9) Fix leak of uninitialized memory in sctp_get_sctp_info(), inet_diag_msg_sctpladdrs_fill() and inet_diag_msg_sctpaddrs_fill(). From Stefano Brivio. 10) L2TP tunnel refcount fixes from Guillaume Nault. 11) Don't leak UDP secpath in udp_set_dev_scratch(), from Yossi Kauperman. 12) Revert a PHY layer change wrt. handling of PHY_HALTED state in phy_stop_machine(), it causes regressions for multiple people. From Florian Fainelli. 13) When packets are sent out of br0 we have to clear the offload_fwdq_mark value. 14) Several NULL pointer deref fixes in packet schedulers when their ->init() routine fails. From Nikolay Aleksandrov. 15) Aquantium devices cannot checksum offload correctly when the packet is <= 60 bytes. From Pavel Belous. 16) Fix vnet header access past end of buffer in AF_PACKET, from Benjamin Poirier. 17) Double free in probe error paths of nfp driver, from Dan Carpenter. 18) QOS capability not checked properly in DCB init paths of mlx5 driver, from Huy Nguyen. 19) Fix conflicts between firmware load failure and health_care timer in mlx5, also from Huy Nguyen. 20) Fix dangling page pointer when DMA mapping errors occur in mlx5, from Eran Ben ELisha. 21) ->ndo_setup_tc() in bnxt_en driver doesn't count rings properly, from Michael Chan. 22) Missing MSIX vector free in bnxt_en, also from Michael Chan. 23) Refcount leak in xfrm layer when using sk_policy, from Lorenzo Colitti. 24) Fix copy of uninitialized data in qlge driver, from Arnd Bergmann. 25) bpf_setsockopts() erroneously always returns -EINVAL even on success. Fix from Yuchung Cheng. 26) tipc_rcv() needs to linearize the SKB before parsing the inner headers, from Parthasarathy Bhuvaragan. 27) Fix deadlock between link status updates and link removal in netvsc driver, from Stephen Hemminger. 28) Missed locking of page fragment handling in ESP output, from Steffen Klassert. 29) Fix refcnt leak in ebpf congestion control code, from Sabrina Dubroca. 30) sxgbe_probe_config_dt() doesn't check devm_kzalloc()'s return value, from Christophe Jaillet. 31) Fix missing ipv6 rx_dst_cookie update when rx_dst is updated during early demux, from Paolo Abeni. 32) Several info leaks in xfrm_user layer, from Mathias Krause. 33) Fix out of bounds read in cxgb4 driver, from Stefano Brivio. 34) Properly propagate obsolete state of route upwards in ipv6 so that upper holders like xfrm can see it. From Xin Long. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (118 commits) udp: fix secpath leak bridge: switchdev: Clear forward mark when transmitting packet mlxsw: spectrum: Forbid linking to devices that have uppers wl1251: add a missing spin_lock_init() Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()" net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 kcm: do not attach PF_KCM sockets to avoid deadlock sch_tbf: fix two null pointer dereferences on init failure sch_sfq: fix null pointer dereference on init failure sch_netem: avoid null pointer deref on init failure sch_fq_codel: avoid double free on init failure sch_cbq: fix null pointer dereferences on init failure sch_hfsc: fix null pointer deref and double free on init failure sch_hhf: fix null pointer dereference on init failure sch_multiq: fix double free on init failure sch_htb: fix crash on init failure net/mlx5e: Fix CQ moderation mode not set properly net/mlx5e: Fix inline header size for small packets net/mlx5: E-Switch, Unload the representors in the correct order net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address ...
2017-09-01bpf: Collapse offset checks in sock_filter_is_valid_accessDavid Ahern
Make sock_filter_is_valid_access consistent with other is_valid_access helpers. Requested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01udp: fix secpath leakYossi Kuperman
After commit dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC") we preserve the secpath for the whole skb lifecycle, but we also end up leaking a reference to it. We must clear the head state on skb reception, if secpath is present. Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC") Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bridge: switchdev: Clear forward mark when transmitting packetIdo Schimmel
Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") added the 'offload_fwd_mark' bit to the skb in order to allow drivers to indicate to the bridge driver that they already forwarded the packet in L2. In case the bit is set, before transmitting the packet from each port, the port's mark is compared with the mark stored in the skb's control block. If both marks are equal, we know the packet arrived from a switch device that already forwarded the packet and it's not re-transmitted. However, if the packet is transmitted from the bridge device itself (e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark stored in the skb's control block isn't valid. This scenario can happen in rare cases where a packet was trapped during L3 forwarding and forwarded by the kernel to a bridge device. Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Yotam Gigi <yotamg@mellanox.com> Tested-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mlxsw: spectrum: Forbid linking to devices that have uppersIdo Schimmel
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the device in case a port is enslaved to a master netdev such as bridge or bond. Since the driver ignores events unrelated to its ports and their uppers, it's possible to engineer situations in which the device's data path differs from the kernel's. One example to such a situation is when a port is enslaved to a bond that is already enslaved to a bridge. When the bond was enslaved the driver ignored the event - as the bond wasn't one of its uppers - and therefore a bridge port instance isn't created in the device. Until such configurations are supported forbid them by checking that the upper device doesn't have uppers of its own. Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Nogah Frankel <nogahf@mellanox.com> Tested-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-09-01 This should be the last ipsec-next pull request for this release cycle: 1) Support netdevice ESP trailer removal when decryption is offloaded. From Yossi Kuperman. 2) Fix overwritten return value of copy_sec_ctx(). Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>