Age | Commit message (Collapse) | Author |
|
The driver that offloads flower rules needs to know with which priority
user inserted the rules. So add this information into offload struct.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add ACL core infrastructure for Spectrum ASIC. This infra provides an
abstraction layer over specific HW implementations. There are two basic
objects used. One is "rule" and the second is "ruleset" which serves as a
container of multiple rules. In general, within one ruleset the rules are
allowed to have multiple priorities and masks. Each ruleset is bound to
either ingress or egress a of port netdevice.
The initial TCAM implementation is very simple and limited. It utilizes
parman lsort manager to take care of TCAM region layout.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This introduces a infrastructure for management of linear priority
areas. Priority order in an array matters, however order of items inside
a priority group does not matter.
As an initial implementation, L-sort algorithm is used. It is quite
trivial. More advanced algorithm called P-sort will be introduced as a
follow-up. The infrastructure is prepared for other algos.
Alongside this, a testing module is introduced as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to list_for_each_entry_continue and its reverse variant
list_for_each_entry_continue_reverse, introduce reverse helper for
list_for_each_entry_from.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add couple of resource limits related to ACL.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce basic set of Spectrum flexible key blocks. It contains blocks
needed to carry all elements defined so far.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each entry which is matched during ACL lookup points to an action set.
This action set contains up to three separate actions. If more actions
are needed to be chained, the extended set is created to hold them
in KVD linear area.
This patch implements handling of sets and encoding of actions.
Currectly, only two actions are supported. Drop and forward. Forward
action uses PBS pointer to KVD linear area, so the action code needs to
take care of this as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hardware supports matching on so called "flexible keys". The idea is to
assemble an optimal key to use for matching according to the fields in
packet (elements) requested by user. Certain sets of elements are
combined into pre-defined blocks. There is a picker to find needed blocks.
Keys consist of 1..n blocks.
Alongside with that, an initial portion of elements is introduced in order
to be able to offload basic cls_flower rules.
Picked keys are cached so multiple rules could share them.
There is an encode function provided that takes care of encoding key and
mask values according to given key.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PEFA register is used for accessing an extended flexible action entry
in the central KVD Linear Database.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PPBS register retrieves and sets Policy Based Switching Table entries.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PRCR register is used for accessing rules within a TCAM region.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PPBT is used for configuration of the Port Binding Table.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PTCE-V2 register is used for accessing rules within a TCAM region.
It is a new version of PTCE in order to support wider key, mask and
action within a TCAM region.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PTAR register is used for allocation of regions in the TCAM.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PAGT register is used for configuration of the ACL Group Table.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PACL register is used for configuration of the ACL.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sometimes it is handy to get a pointer to a char buffer item and use it
direcly to write/read data. So add these helpers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Item heplers for 8bit values are needed, let's add them.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function bond_info_query alwarys returns 0. As such, in the function
bond_do_ioctl, it is not necessary to check the returned value. So the
interface type of the function bond_info_query is changed to void. The
redundant check is removed.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Josef Bacik diagnosed following problem :
I was seeing random disconnects while testing NBD over loopback.
This turned out to be because NBD sets pfmemalloc on it's socket,
however the receiving side is a user space application so does not
have pfmemalloc set on its socket. This means that
sk_filter_trim_cap will simply drop this packet, under the
assumption that the other side will simply retransmit. Well we do
retransmit, and then the packet is just dropped again for the same
reason.
It seems the better way to address this problem is to clear pfmemalloc
in the TCP transmit path. pfmemalloc strict control really makes sense
on the receive path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for v4.10-rc7
One more device ID for pl2303.
Signed-off-by: Johan Hovold <johan@kernel.org>
|
|
|
|
In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.
Not only we remove lot of code, we also remove one spin_lock()
from driver fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Compared to custom busy_poll, the generic NAPI one is simpler and
removes a lot of code. It removes one atomic in the fast path (when
busy poll is not in action) since we do not have to use an extra
spinlock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dmitry reported a warning [1] showing that we were calling
net_disable_timestamp() -> static_key_slow_dec() from a non
process context.
Grabbing a mutex while holding a spinlock or rcu_read_lock()
is not allowed.
As Cong suggested, we now use a work queue.
It is possible netstamp_clear() exits while netstamp_needed_deferred
is not zero, but it is probably not worth trying to do better than that.
netstamp_needed_deferred atomic tracks the exact number of deferred
decrements.
[1]
[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
include/net/sock.h:1454 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
#1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
include/linux/netfilter.h:201 [inline]
#1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
___might_sleep+0x560/0x650 kernel/sched/core.c:7748
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Compared to custom busy_poll, the generic NAPI one is better, since
it allows to use GRO, and it removes a lot of code and extra locked
operations in fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv6 stack does not set the protocol for local routes, so those routes show
up with proto "none":
$ ip -6 ro ls table local
local ::1 dev lo proto none metric 0 pref medium
local 2100:3:: dev lo proto none metric 0 pref medium
local 2100:3::4 dev lo proto none metric 0 pref medium
local fe80:: dev lo proto none metric 0 pref medium
...
Set rt6i_protocol to RTPROT_KERNEL for consistency with IPv4. Now routes
show up with proto "kernel":
$ ip -6 ro ls table local
local ::1 dev lo proto kernel metric 0 pref medium
local 2100:3:: dev lo proto kernel metric 0 pref medium
local 2100:3::4 dev lo proto kernel metric 0 pref medium
local fe80:: dev lo proto kernel metric 0 pref medium
...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Steven suggested to improve trace_print_hex_seq() a bit after commit
2acae0d5b0f7 ("trace: add variant without spacing in trace_print_hex_seq")
in two ways: i) by adding a kdoc comment for the helper function
itself and ii) by renaming 'spacing' argument into 'concatenate'
to better denote that we don't add spaces between each hex bytes.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We should check that we're within bounds first before checking that
"chip->active_banks[i] != TPM2_ALG_ERROR" so I've re-ordered the two
checks.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
Ivan will be taking care of switchdev code from now on.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Roopa Prabhu says:
====================
bridge: per vlan dst_metadata support
High level summary:
lwt and dst_metadata have enabled vxlan l3 deployments
to use a single vxlan netdev for multiple vnis eliminating the scalability
problem with using a single vxlan netdev per vni. This series tries to
do the same for vxlan netdevs in pure l2 bridged networks.
Use-case/deployment and details are below.
Deployment scerario details:
As we know VXLAN is used to build layer 2 virtual networks across the
underlay layer3 infrastructure. A VXLAN tunnel endpoint (VTEP)
originates and terminates VXLAN tunnels. And a VTEP can be a TOR switch
or a vswitch in the hypervisor. This patch series mainly
focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
along with vlan id is used to identify layer 2 segments in a vxlan
overlay network. Vxlan bridging is the function provided by Vteps to terminate
vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
To provide vxlan bridging function, a vtep has to map vlan to a vni. The rfc
says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
the original Layer 2 packet if there is one before encapsulating the packet
into the VXLAN format to transmit it through the underlay network. The remote
VTEP devices have information about the VLAN in which the packet will be
placed based on their own VLAN-to-VXLAN VNI mapping configurations.
Existing solution:
Without this patch series one can deploy such a vtep configuration by
adding the local ports and vxlan netdevs into a vlan filtering bridge.
The local ports are configured as trunk ports carrying all vlans.
A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
The vxlan netdev only receives traffic corresponding to the vlan it is mapped
to. This configuration maps traffic belonging to a vlan to the corresponding
vxlan segment.
-----------------------------------
| bridge |
| |
-----------------------------------
|100,200 |100 (pvid) |200 (pvid)
| | |
swp1 vxlan1000 vxlan2000
This provides the required vxlan bridging function but poses a
scalability problem with using a separate vxlan netdev for each vni.
Solution in this patch series:
The Goal is to use a single vxlan device to carry all vnis similar
to the vxlan collect metadata mode but additionally allowing the bridge
and vxlan driver to carry all the forwarding information and also learn.
This implementation uses the existing dst_metadata infrastructure to map
vlan to a tunnel id.
- vxlan driver changes:
- enable collect metadata mode to be used with learning,
replication and fdb
- A single fdb table hashed by (mac, vni)
- rx path already has the vni
- tx path expects a vni in the packet with dst_metadata and relies
on learnt or static forwarding information table to forward the packet
- Bridge driver changes: per vlan dst_metadata support:
- Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
kept the api generic for any tunnel info
- Uapi to configure/unconfigure/dump per vlan tunnel data
- new bridge port flag to turn this feature on/off. off by default
- ingress hook:
- if port is a tunnel port, use tunnel info in
attached dst_metadata to map it to a local vlan
- egress hook:
- if port is a tunnel port, use tunnel info attached to vlan
to set dst_metadata on the skb
Other approaches tried and vetoed:
- tc vlan push/pop and tunnel metadata dst:
- though tc can be used to do part of this, these patches address a deployment
case where bridge driver vlan filtering and forwarding information
database along with vxlan driver forwarding information table and learning
are required.
- making vxlan driver understand vlan-vni mapping:
- I had a series almost ready with this one but soon realized
it duplicated a lot of vlan handling code in the vxlan driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- ingress hook:
- if port is a tunnel port, use tunnel info in
attached dst_metadata to map it to a local vlan
- egress hook:
- if port is a tunnel port, use tunnel info attached to
vlan to set dst_metadata on the skb
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support to attach per vlan tunnel info dst
metadata. This enables bridge driver to map vlan to tunnel_info
at ingress and egress. It uses the kernel dst_metadata infrastructure.
The initial use case is vlan to vni bridging, but the api is generic
to extend to any tunnel_info in the future:
- Uapi to configure/unconfigure/dump per vlan tunnel data
- netlink functions to configure vlan and tunnel_info mapping
- Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
dst_metadata to bridged packets on ports. off by default.
- changes to existing code is mainly refactor some existing vlan
handling netlink code + hooks for new vlan tunnel code
- I have kept the vlan tunnel code isolated in separate files.
- most of the netlink vlan tunnel code is handling of vlan-tunid
ranges (follows the vlan range handling code). To conserve space
vlan-tunid by default are always dumped in ranges if applicable.
Use case:
example use for this is a vxlan bridging gateway or vtep
which maps vlans to vn-segments (or vnis).
iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
example shows same host mac learnt on two vni's and
vlan 100 maps to vni 1000, vlan 101 maps to vni 1001
before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
after this patch with collect metdata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
New nested netlink attribute to associate tunnel info per vlan.
This is used by bridge driver to send tunnel metadata to
bridge ports in vlan tunnel mode. This patch also adds new per
port flag IFLA_BRPORT_VLAN_TUNNEL to enable vlan tunnel mode.
off by default.
One example use for this is a vxlan bridging gateway or vtep
which maps vlans to vn-segments (or vnis). User can configure
per-vlan tunnel information which the bridge driver can use
to bridge vlan into the corresponding vn-segment.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vxlan COLLECT_METADATA mode today solves the per-vni netdev
scalability problem in l3 networks. It expects all forwarding
information to be present in dst_metadata. This patch series
enhances collect metadata mode to include the case where only
vni is present in dst_metadata, and the vxlan driver can then use
the rest of the forwarding information datbase to make forwarding
decisions. There is no change to default COLLECT_METADATA
behaviour. These changes only apply to COLLECT_METADATA when
used with the bridging use-case with a special dst_metadata
tunnel info flag (eg: where vxlan device is part of a bridge).
For all this to work, the vxlan driver will need to now support a
single fdb table hashed by mac + vni. This series essentially makes
this happen.
use-case and workflow:
vxlan collect metadata device participates in bridging vlan
to vn-segments. Bridge driver above the vxlan device,
sends the vni corresponding to the vlan in the dst_metadata.
vxlan driver will lookup forwarding database with (mac + vni)
for the required remote destination information to forward the
packet.
Changes introduced by this patch:
- allow learning and forwarding database state in vxlan netdev in
COLLECT_METADATA mode. Current behaviour is not changed
by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
to support the new bridge friendly mode.
- A single fdb table hashed by (mac, vni) to allow fdb entries with
multiple vnis in the same fdb table
- rx path already has the vni
- tx path expects a vni in the packet with dst_metadata
- prior to this series, fdb remote_dsts carried remote vni and
the vxlan device carrying the fdb table represented the
source vni. With the vxlan device now representing multiple vnis,
this patch adds a src vni attribute to the fdb entry. The remote
vni already uses NDA_VNI attribute. This patch introduces
NDA_SRC_VNI netlink attribute to represent the src vni in a multi
vni fdb table.
iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
example shows same host mac learnt on two vni's.
before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
New ip_tunnel_info flag to represent bridged tunnel metadata.
Used by bridge driver later in the series to pass per vlan dst
metadata to bridge ports.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yotam Gigi says:
====================
Extract IFE logic to module
Extract ife logic from the tc_ife action into an independent module, and
make the tc_ife action use it. This way, the ife encapsulation can be used
by other modules other than tc_ife action.
v1->v2:
Fix duplicate symbol error by introducing a new patch that makes the
original symbol static.
The symbol ife_tlv_meta_extract is exported in act_ife, though not being
used by any other module. As the symbol is being moved to the new ife
module, introducing the new module creates duplicate symbol. To fix it,
add a new patch (1/3) that makes the ife_tlv_meta_extract symbol static in
act_ife, thus the symbol does not collide.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the encode/decode functionality from the ife module instead of using
implementation inside the act_ife.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This module is responsible for the ife encapsulation protocol
encode/decode logics. That module can:
- ife_encode: encode skb and reserve space for the ife meta header
- ife_decode: decode skb and extract the meta header size
- ife_tlv_meta_encode - encodes one tlv entry into the reserved ife
header space.
- ife_tlv_meta_decode - decodes one tlv entry from the packet
- ife_tlv_meta_next - advance to the next tlv
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the function ife_tlv_meta_encode is not used by any other module,
unexport it and make it static for the act_ife module.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In cap_t the size of the type bool is assumed to be one byte. This
commit sorts out the issue by changing the type to u8.
Fixes: c659af78eb7b ("tpm: Check size of response before accessing data")
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
Unlike the device driver support for TPM 1.2, the TPM 2.0 does
not support the securityfs pseudo files for displaying the
firmware event log.
This patch enables support for providing the TPM 2.0 event log in
binary form. TPM 2.0 event log supports a crypto agile format that
records multiple digests, which is different from TPM 1.2. This
patch enables the tpm_bios_log_setup for TPM 2.0 and adds the
event log parser which understand the TPM 2.0 crypto agile format.
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Kenneth Goldman <kgold@linux.vnet.ibm.com>
Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
Physical TPMs use Open Firmware Device Tree bindings that are similar
to the IBM Power virtual TPM to support event log. However, these
properties store the values in different endianness for Physical
and Virtual TPM.
This patch fixes the endianness issue by doing appropriate conversion
based on Physical or Virtual TPM.
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Kenneth Goldman <kgold@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
The current TPM 2.0 device driver extends only the SHA1 PCR bank
but the TCG Specification[1] recommends extending all active PCR
banks, to prevent malicious users from setting unused PCR banks with
fake measurements and quoting them.
The existing in-kernel interface(tpm_pcr_extend()) expects only a
SHA1 digest. To extend all active PCR banks with differing
digest sizes, the SHA1 digest is padded with trailing 0's as needed.
This patch reuses the defined digest sizes from the crypto subsystem,
adding a dependency on CRYPTO_HASH_INFO module.
[1] TPM 2.0 Specification referred here is "TCG PC Client Specific
Platform Firmware Profile for TPM 2.0"
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Kenneth Goldman <kgold@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
This patch implements the TPM 2.0 capability TPM_CAP_PCRS to
retrieve the active PCR banks from the TPM. This is needed
to enable extending all active banks as recommended by TPM 2.0
TCG Specification.
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Kenneth Goldman <kgold@linux.vnet.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
The error code handling is broken as any error code that has the same
bits set as TPM_RC_HASH passes. Implemented tpm2_rc_value() helper to
parse the error value from FMT0 and FMT1 error codes so that these types
of mistakes are prevented in the future.
Fixes: 5ca4c20cfd37 ("keys, trusted: select hash algorithm for TPM2 chips")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
|
|
probe_itpm() function is supposed to send command without an itpm flag set
and if this fails to repeat it, this time with the itpm flag set.
However, commit 41a5e1cf1fe15 ("tpm/tpm_tis: Split tpm_tis driver into a
core and TCG TIS compliant phy") moved the itpm flag from an "itpm"
variable to a TPM_TIS_ITPM_POSSIBLE chip flag, so setting the
(now function-local) itpm variable no longer had any effect.
Finally, this function-local itpm variable was removed by
commit 56af322156dbe9 ("tpm/tpm_tis: remove unused itpm variable")
Tested only on non-iTPM TIS TPM.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
For a long time the cdev read/write interface had this strange
idea that userspace had to read the result within 60 seconds otherwise
it is discarded. Perhaps this made sense under some older locking regime,
but in the modern kernel it is not required and is just dangerous.
Since something may be relying on this, double the timeout and print a
warning. We can remove the code in a few years, but this should be
enough to prevent new users.
Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
These are non-generic functions and do not belong to tpm.h.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"MMC host: sdhci: Avoid hang when receiving spurious CARD_INT
interrupts"
* tag 'mmc-v4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci: Ignore unexpected CARD_INT interrupts
|