summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-29mptcp: implement and use MPTCP-level retransmissionPaolo Abeni
On timeout event, schedule a work queue to do the retransmission. Retransmission code closely resembles the sendmsg() implementation and re-uses mptcp_sendmsg_frag, providing a dummy msghdr - for flags' sake - and peeking the relevant dfrag from the rtx head. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: rework mptcp_sendmsg_frag to accept optional dfragPaolo Abeni
This will simplify mptcp-level retransmission implementation in the next patch. If dfrag is provided by the caller, skip kernel space memory allocation and use data and metadata provided by the dfrag itself. Because a peer could ack data at TCP level but refrain from sending mptcp-level ACKs, we could grow the mptcp socket backlog indefinitely. We should thus block mptcp_sendmsg until the peer has acked some of the sent data. In order to be able to do so, increment the mptcp socket wmem_queued counter on memory allocation and decrement it when releasing the memory on mptcp-level ack reception. Because TCP performns sndbuf auto-tuning up to tcp_wmem_max[2], make this the mptcp sk_sndbuf limit. In the future we could add experiment with autotuning as TCP does in tcp_sndbuf_expand(). v2 -> v3: - remove 'inline' in foo.c files (David S. Miller) Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: allow partial cleaning of rtx head dfragFlorian Westphal
After adding wmem accounting for the mptcp socket we could get into a situation where the mptcp socket can't transmit more data, and mptcp_clean_una doesn't reduce wmem even if snd_una has advanced because it currently will only remove entire dfrags. Allow advancing the dfrag head sequence and reduce wmem, even though this isn't correct (as we can't release the page). Because we will soon block on mptcp sk in case wmem is too large, call sk_stream_write_space() in case we reduced the backlog so userspace task blocked in sendmsg or poll will be woken up. This isn't an issue if the send buffer is large, but it is when SO_SNDBUF is used to reduce it to a lower value. Note we can still get a deadlock for low SO_SNDBUF values in case both sides of the connection write to the socket: both could be blocked due to wmem being too small -- and current mptcp stack will only increment mptcp ack_seq on recv. This doesn't happen with the selftest as it uses poll() and will always call recv if there is data to read. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: implement memory accounting for mptcp rtx queuePaolo Abeni
Charge the data on the rtx queue to the master MPTCP socket, too. Such memory in uncharged when the data is acked/dequeued. Also account mptcp sockets inuse via a protocol specific pcpu counter. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: introduce MPTCP retransmission timerPaolo Abeni
The timer will be used to schedule retransmission. It's frequency is based on the current subflow RTO estimation and is reset on every una_seq update The timer is clearer for good by __mptcp_clear_xmit() Also clean MPTCP rtx queue before each transmission. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: queue data for mptcp level retransmissionPaolo Abeni
Keep the send page fragment on an MPTCP level retransmission queue. The queue entries are allocated inside the page frag allocator, acquiring an additional reference to the page for each list entry. Also switch to a custom page frag refill function, to ensure that the current page fragment can always host an MPTCP rtx queue entry. The MPTCP rtx queue is flushed at disconnect() and close() time Note that now we need to call __mptcp_init_sock() regardless of mptcp enable status, as the destructor will try to walk the rtx_queue. v2 -> v3: - remove 'inline' in foo.c files (David S. Miller) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: update per unacked sequence on pkt receptionPaolo Abeni
So that we keep per unacked sequence number consistent; since we update per msk data, use an atomic64 cmpxchg() to protect against concurrent updates from multiple subflows. Initialize the snd_una at connect()/accept() time. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: Implement path manager interface commandsPeter Krystad
Fill in more path manager functionality by adding a worker function and modifying the related stub functions to schedule the worker. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: Add handling of outgoing MP_JOIN requestsPeter Krystad
Subflow creation may be initiated by the path manager when the primary connection is fully established and a remote address has been received via ADD_ADDR. Create an in-kernel sock and use kernel_connect() to initiate connection. Passive sockets can't acquire the mptcp socket lock at subflow creation time, so an additional list protected by a new spinlock is used to track the MPJ subflows. Such list is spliced into conn_list tail every time the msk socket lock is acquired, so that it will not interfere with data flow on the original connection. Data flow and connection failover not addressed by this commit. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: Add handling of incoming MP_JOIN requestsPeter Krystad
Process the MP_JOIN option in a SYN packet with the same flow as MP_CAPABLE but when the third ACK is received add the subflow to the MPTCP socket subflow list instead of adding it to the TCP socket accept queue. The subflow is added at the end of the subflow list so it will not interfere with the existing subflows operation and no data is expected to be transmitted on it. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: Add path manager interfacePeter Krystad
Add enough of a path manager interface to allow sending of ADD_ADDR when an incoming MPTCP connection is created. Capable of sending only a single IPv4 ADD_ADDR option. The 'pm_data' element of the connection sock will need to be expanded to handle multiple interfaces and IPv6. Partial processing of the incoming ADD_ADDR is included so the path manager notification of that event happens at the proper time, which involves validating the incoming address information. This is a skeleton interface definition for events generated by MPTCP. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mptcp: Add ADD_ADDR handlingPeter Krystad
Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6, and RM_ADDR suboptions. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mlx4: fix "initializer element not constant" compiler errorJacob Keller
A recent commit e8937681797c ("devlink: prepare to support region operations") used the region_cr_space_str and region_fw_health_str variables as initializers for the devlink_region_ops structures. This can result in compiler errors: drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: error: initializer element is not constant .name = region_cr_space_str, ^ drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: note: (near initialization for ‘region_cr_space_ops.name’) drivers/net/ethernet/mellanox//mlx4/crdump.c:50:10: error: initializer element is not constant .name = region_fw_health_str, The variables were made to be "const char * const", indicating that both the pointer and data were constant. This was enough to resolve this on recent GCC (gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1) for this author). Unfortunately this is not enough for older compilers to realize that the variable can be treated as a constant expression. Fix this by introducing macros for the string and use those instead of the variable name in the region ops structures. Reported-by: tanhuazhong <tanhuazhong@huawei.com> Fixes: e8937681797c ("devlink: prepare to support region operations") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29devlink: don't wrap commands in rST shell blocksJacob Keller
The devlink-region.rst and ice-region.rst documentation files wrapped some lines within shell code blocks due to being longer than 80 lines. It was pointed out during review that wrapping these lines shouldn't be done. Fix these two rST files and remove the line wrapping on these shell command examples. Reported-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: dsa: mt7530: use resolved link config in mac_link_up()René van Dorst
Convert the mt7530 switch driver to use the finalised link parameters in mac_link_up() rather than the parameters in mac_config(). Signed-off-by: René van Dorst <opensource@vdorst.com> Tested-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: dsa: sja1105: show more ethtool statistics counters for P/Q/R/SVladimir Oltean
It looks like the P/Q/R/S series supports some more counters, generically named "Ethernet statistics counter", which we were not printing. Add them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29sctp: fix possibly using a bad saddr with a given dstMarcelo Ricardo Leitner
Under certain circumstances, depending on the order of addresses on the interfaces, it could be that sctp_v[46]_get_dst() would return a dst with a mismatched struct flowi. For example, if when walking through the bind addresses and the first one is not a match, it saves the dst as a fallback (added in 410f03831c07), but not the flowi. Then if the next one is also not a match, the previous dst will be returned but with the flowi information for the 2nd address, which is wrong. The fix is to use a locally stored flowi that can be used for such attempts, and copy it to the parameter only in case it is a possible match, together with the corresponding dst entry. The patch updates IPv6 code mostly just to be in sync. Even though the issue is also present there, it fallback is not expected to work with IPv6. Fixes: 410f03831c07 ("sctp: add routing output fallback") Reported-by: Jin Meng <meng.a.jin@nokia-sbell.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29s390/qeth: support net namespaces for L3 devicesJulian Wiedmann
Enable the L3 driver's IPv4 address notifier to watch for events on qeth devices that have been moved into a net namespace. We need to program those IPs into the HW just as usual, otherwise inbound traffic won't flow. Fixes: 6133fb1aa137 ("[NETNS]: Disable inetaddr notifiers in namespaces other than initial.") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29sctp: fix refcount bug in sctp_wfreeQiujun Huang
We should iterate over the datamsgs to move all chunks(skbs) to newsk. The following case cause the bug: for the trouble SKB, it was in outq->transmitted list sctp_outq_sack sctp_check_transmitted SKB was moved to outq->sacked list then throw away the sack queue SKB was deleted from outq->sacked (but it was held by datamsg at sctp_datamsg_to_asoc So, sctp_wfree was not called here) then migrate happened sctp_for_each_tx_datachunk( sctp_clear_owner_w); sctp_assoc_migrate(); sctp_for_each_tx_datachunk( sctp_set_owner_w); SKB was not in the outq, and was not changed to newsk finally __sctp_outq_teardown sctp_chunk_put (for another skb) sctp_datamsg_put __kfree_skb(msg->frag_list) sctp_wfree (for SKB) SKB->sk was still oldsk (skb->sk != asoc->base.sk). Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Acked-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: Fix typo of SKB_SGO_CB_OFFSETCambda Zhu
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the offset of the GSO in skb cb. This patch fixes the typo. Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation") Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ipv4: fix a RCU-list lock in fib_triestat_seq_showQian Cai
fib_triestat_seq_show() calls hlist_for_each_entry_rcu(tb, head, tb_hlist) without rcu_read_lock() will trigger a warning, net/ipv4/fib_trie.c:2579 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by proc01/115277: #0: c0000014507acf00 (&p->lock){+.+.}-{3:3}, at: seq_read+0x58/0x670 Call Trace: dump_stack+0xf4/0x164 (unreliable) lockdep_rcu_suspicious+0x140/0x164 fib_triestat_seq_show+0x750/0x880 seq_read+0x1a0/0x670 proc_reg_read+0x10c/0x1b0 __vfs_read+0x3c/0x70 vfs_read+0xac/0x170 ksys_read+0x7c/0x140 system_call+0x5c/0x68 Fix it by adding a pair of rcu_read_lock/unlock() and use cond_resched_rcu() to avoid the situation where walking of a large number of items may prevent scheduling for a long time. Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29qed: Fix race condition between scheduling and destroying the slowpath workqueueYuval Basson
Calling queue_delayed_work concurrently with destroy_workqueue might race to an unexpected outcome - scheduled task after wq is destroyed or other resources (like ptt_pool) are freed (yields NULL pointer dereference). cancel_delayed_work prevents the race by cancelling the timer triggered for scheduling a new task. Fixes: 59ccf86fe ("qed: Add driver infrastucture for handling mfw requests") Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Yuval Basson <ybason@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: page pool: allow to pass zero flags to page_pool_init()Denis Kirjanov
page pool API can be useful for non-DMA cases like xen-netfront driver so let's allow to pass zero flags to page pool flags. v2: check DMA direction only if PP_FLAG_DMA_MAP is set Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29selftests: move timestamping selftests to net folderJian Yang
For historical reasons, there are several timestamping selftest targets in selftests/networking/timestamping. Move them to the standard directory for networking tests: selftests/net. Signed-off-by: Jian Yang <jianyang@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ARM: dts: apalis-imx6qdl: use rgmii-id instead of rgmiiPhilippe Schenker
Until now a PHY-fixup in mach-imx set our rgmii timing correctly. For the PHY KSZ9131 there is no PHY-fixup in mach-imx. To support this PHY too, use rgmii-id. For the now used KSZ9031 nothing will change, as rgmii-id is only implemented and supported by the KSZ9131. Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: phy: micrel.c: add rgmii interface delay possibility to ksz9131Philippe Schenker
The KSZ9131 provides DLL controlled delays on RXC and TXC lines. This patch makes use of those delays. The information which delays should be enabled or disabled comes from the interface names, documented in ethernet-controller.yaml: rgmii: Disable RXC and TXC delays rgmii-id: Enable RXC and TXC delays rgmii-txid: Enable only TXC delay, disable RXC delay rgmii-rxid: Enable onlx RXC delay, disable TXC delay Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: macsec: add support for specifying offload upon link creationMark Starovoytov
This patch adds new netlink attribute to allow a user to (optionally) specify the desired offload mode immediately upon MACSec link creation. Separate iproute patch will be required to support this from user space. Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29mac80211: fix authentication with iwlwifi/mvmJohannes Berg
The original patch didn't copy the ieee80211_is_data() condition because on most drivers the management frames don't go through this path. However, they do on iwlwifi/mvm, so we do need to keep the condition here. Cc: stable@vger.kernel.org Fixes: ce2e1ca70307 ("mac80211: Check port authorization in the ieee80211_tx_dequeue() case") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor comment conflict in mac80211. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29seccomp: Add missing compat_ioctl for notifySven Schnelle
Executing the seccomp_bpf testsuite under a 64-bit kernel with 32-bit userland (both s390 and x86) doesn't work because there's no compat_ioctl handler defined. Add the handler. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200310123332.42255-1-svens@linux.ibm.com Signed-off-by: Kees Cook <keescook@chromium.org>
2020-03-30crypto: af_alg - bool type cosmeticsLothar Rubusch
When working with bool values the true and false definitions should be used instead of 1 and 0. Hopefully I fixed my mailer and apologize for that. Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: arm[64]/poly1305 - add artifact to .gitignore filesJason A. Donenfeld
The .S_shipped yields a .S, and the pattern in these directories is to add that to .gitignore so that git-status doesn't raise a fuss. Fixes: a6b803b3ddc7 ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation") Fixes: f569ca164751 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation") Reported-by: Emil Renner Berthing <kernel@esmil.dk> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - limit single JD RNG output to maximum of 16 bytesAndrey Smirnov
In order to follow recommendation in SP800-90C (section "9.4 The Oversampling-NRBG Construction") limit the output of "generate" JD submitted to CAAM. See https://lore.kernel.org/linux-crypto/VI1PR0402MB3485EF10976A4A69F90E5B0F98580@VI1PR0402MB3485.eurprd04.prod.outlook.com/ for more details. This change should make CAAM's hwrng driver good enough to have 1024 quality rating. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - enable prediction resistance in HRWNGAndrey Smirnov
Instantiate CAAM RNG with prediction resistance enabled to improve its quality (with PR on DRNG is forced to reseed from TRNG every time random data is generated). Management Complex firmware with version lower than 10.20.0 doesn't provide prediction resistance support. Consider this and only instantiate rng when mc f/w version is lower. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Andrei Botila <andrei.botila@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30bus: fsl-mc: add api to retrieve mc versionAndrei Botila
Add a new api that returns Management Complex firmware version and make the required structure public. The api's first user will be the caam driver for setting prediction resistance bits. Signed-off-by: Andrei Botila <andrei.botila@nxp.com> Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - invalidate entropy register during RNG initializationAndrey Smirnov
In order to make sure that we always use non-stale entropy data, change the code to invalidate entropy register during RNG initialization. Signed-off-by: Aymen Sghaier <aymen.sghaier@nxp.com> Signed-off-by: Vipul Kumar <vipul_kumar@mentor.com> [andrew.smirnov@gmail.com ported to upstream kernel, rewrote commit msg] Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - check if RNG job failedAndrey Smirnov
We shouldn't stay silent if RNG job fails. Add appropriate code to check for that case and propagate error code up appropriately. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - simplify RNG implementationAndrey Smirnov
Rework CAAM RNG implementation as follows: - Make use of the fact that HWRNG supports partial reads and will handle such cases gracefully by removing recursion in caam_read() - Convert blocking caam_read() codepath to do a single blocking job read directly into requested buffer, bypassing any intermediary buffers - Convert async caam_read() codepath into a simple single reader/single writer FIFO use-case, thus simplifying concurrency handling and delegating buffer read/write position management to KFIFO subsystem. - Leverage the same low level RNG data extraction code for both async and blocking caam_read() scenarios, get rid of the shared job descriptor and make non-shared one as a simple as possible (just HEADER + ALGORITHM OPERATION + FIFO STORE) - Split private context from DMA related memory, so that the former could be allocated without GFP_DMA. NOTE: On its face value this commit decreased throughput numbers reported by dd if=/dev/hwrng of=/dev/null bs=1 count=100K [iflag=nonblock] by about 15%, however commits that enable prediction resistance and limit JR total size impact the performance so much and move the bottleneck such as to make this regression irrelevant. NOTE: On the bright side, this commit reduces RNG in kernel DMA buffer memory usage from 2 x RN_BUF_SIZE (~256K) to 32K. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - drop global context pointer and init_doneAndrey Smirnov
Leverage devres to get rid of code storing global context as well as init_done flag. Original code also has a circular deallocation dependency where unregister_algs() -> caam_rng_exit() -> caam_jr_free() chain would only happen if all of JRs were freed. Fix this by moving caam_rng_exit() outside of unregister_algs() and doing it specifically for JR that instantiated HWRNG. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - use struct hwrng's .init for initializationAndrey Smirnov
Make caamrng code a bit more symmetric by moving initialization code to .init hook of struct hwrng. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-imx@nxp.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: caam - allocate RNG instantiation descriptor with GFP_DMAAndrey Smirnov
Be consistent with the rest of the codebase and use GFP_DMA when allocating memory for a CAAM JR descriptor. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Horia Geantă <horia.geanta@nxp.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Iuliana Prodan <iuliana.prodan@nxp.com> Cc: linux-imx@nxp.com Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30crypto: ccree - remove duplicated include from cc_aead.cYueHaibing
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-30kbuild: deb-pkg: fix warning when CONFIG_DEBUG_INFO is unsetReinhard Karcher
Creating a Debian package without CONFIG_DEBUG_INFO produces a warning that no debug package was created. This patch excludes the debug package from the control file, if no debug package is created by this configuration. Signed-off-by: Reinhard Karcher <reinhard.karcher@gmx.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-30netfilter: flowtable: add counter support in HW offloadwenxu
Store the conntrack counters to the conntrack entry in the HW flowtable offload. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: conntrack: add nf_ct_acct_add()wenxu
Add nf_ct_acct_add function to update the conntrack counter with packets and bytes. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nf_tables: skip set types that do not support for expressionsPablo Neira Ayuso
The bitmap set does not support for expressions, skip it from the estimation step. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nft_dynset: validate set expression definitionPablo Neira Ayuso
If the global set expression definition mismatches the dynset expression, then bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nft_set_bitmap: initialize set element extension in lookupsPablo Neira Ayuso
Otherwise, nft_lookup might dereference an uninitialized pointer to the element extension. Fixes: 665153ff5752 ("netfilter: nf_tables: add bitmap set type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: ctnetlink: be more strict when NF_CONNTRACK_MARK is not setRomain Bellan
When CONFIG_NF_CONNTRACK_MARK is not set, any CTA_MARK or CTA_MARK_MASK in netlink message are not supported. We should return an error when one of them is set, not both Fixes: 9306425b70bf ("netfilter: ctnetlink: must check mark attributes vs NULL") Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30Merge branch 'bpf-lsm'Daniel Borkmann
KP Singh says: ==================== ** Motivation Google does analysis of rich runtime security data to detect and thwart threats in real-time. Currently, this is done in custom kernel modules but we would like to replace this with something that's upstream and useful to others. The current kernel infrastructure for providing telemetry (Audit, Perf etc.) is disjoint from access enforcement (i.e. LSMs). Augmenting the information provided by audit requires kernel changes to audit, its policy language and user-space components. Furthermore, building a MAC policy based on the newly added telemetry data requires changes to various LSMs and their respective policy languages. This patchset allows BPF programs to be attached to LSM hooks This facilitates a unified and dynamic (not requiring re-compilation of the kernel) audit and MAC policy. ** Why an LSM? Linux Security Modules target security behaviours rather than the kernel's API. For example, it's easy to miss out a newly added system call for executing processes (eg. execve, execveat etc.) but the LSM framework ensures that all process executions trigger the relevant hooks irrespective of how the process was executed. Allowing users to implement LSM hooks at runtime also benefits the LSM eco-system by enabling a quick feedback loop from the security community about the kind of behaviours that the LSM Framework should be targeting. ** How does it work? The patchset introduces a new eBPF (https://docs.cilium.io/en/v1.6/bpf/) program type BPF_PROG_TYPE_LSM which can only be attached to LSM hooks. Loading and attachment of BPF programs requires CAP_SYS_ADMIN. The new LSM registers nop functions (bpf_lsm_<hook_name>) as LSM hook callbacks. Their purpose is to provide a definite point where BPF programs can be attached as BPF_TRAMP_MODIFY_RETURN trampoline programs for hooks that return an int, and BPF_TRAMP_FEXIT trampoline programs for void LSM hooks. Audit logs can be written using a format chosen by the eBPF program to the perf events buffer or to global eBPF variables or maps and can be further processed in user-space. ** BTF Based Design The current design uses BTF: * https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html * https://lwn.net/Articles/803258 which allows verifiable read-only structure accesses by field names rather than fixed offsets. This allows accessing the hook parameters using a dynamically created context which provides a certain degree of ABI stability: // Only declare the structure and fields intended to be used // in the program struct vm_area_struct { unsigned long vm_start; } __attribute__((preserve_access_index)); // Declare the eBPF program mprotect_audit which attaches to // to the file_mprotect LSM hook and accepts three arguments. SEC("lsm/file_mprotect") int BPF_PROG(mprotect_audit, struct vm_area_struct *vma, unsigned long reqprot, unsigned long prot, int ret) { unsigned long vm_start = vma->vm_start; return 0; } By relocating field offsets, BTF makes a large portion of kernel data structures readily accessible across kernel versions without requiring a large corpus of BPF helper functions and requiring recompilation with every kernel version. The BTF type information is also used by the BPF verifier to validate memory accesses within the BPF program and also prevents arbitrary writes to the kernel memory. The limitations of BTF compatibility are described in BPF Co-Re (http://vger.kernel.org/bpfconf2019_talks/bpf-core.pdf, i.e. field renames, #defines and changes to the signature of LSM hooks). This design imposes that the MAC policy (eBPF programs) be updated when the inspected kernel structures change outside of BTF compatibility guarantees. In practice, this is only required when a structure field used by a current policy is removed (or renamed) or when the used LSM hooks change. We expect the maintenance cost of these changes to be acceptable as compared to the design presented in the RFC. (https://lore.kernel.org/bpf/20190910115527.5235-1-kpsingh@chromium.org/). ** Usage Examples A simple example and some documentation is included in the patchset. In order to better illustrate the capabilities of the framework some more advanced prototype (not-ready for review) code has also been published separately: * Logging execution events (including environment variables and arguments) https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_audit_env.c * Detecting deletion of running executables: https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_detect_exec_unlink.c * Detection of writes to /proc/<pid>/mem: https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_audit_env.c We have updated Google's internal telemetry infrastructure and have started deploying this LSM on our Linux Workstations. This gives us more confidence in the real-world applications of such a system. ** Changelog: - v8 -> v9: https://lore.kernel.org/bpf/20200327192854.31150-1-kpsingh@chromium.org/ * Fixed a selftest crash when CONFIG_LSM doesn't have "bpf". * Added James' Ack. * Rebase. - v7 -> v8: https://lore.kernel.org/bpf/20200326142823.26277-1-kpsingh@chromium.org/ * Removed CAP_MAC_ADMIN check from bpf_lsm_verify_prog. LSMs can add it in their own bpf_prog hook. This can be revisited as a separate patch. * Added Andrii and James' Ack/Review tags. * Fixed an indentation issue and missing newlines in selftest error a cases. * Updated a comment as suggested by Alexei. * Updated the documentation to use the newer libbpf API and some other fixes. * Rebase - v6 -> v7: https://lore.kernel.org/bpf/20200325152629.6904-1-kpsingh@chromium.org/ * Removed __weak from the LSM attachment nops per Kees' suggestion. Will send a separate patch (if needed) to update the noinline definition in include/linux/compiler_attributes.h. * waitpid to wait specifically for the forked child in selftests. * Comment format fixes in security/... as suggested by Casey. * Added Acks from Kees and Andrii and Casey's Reviewed-by: tags to the respective patches. * Rebase - v5 -> v6: https://lore.kernel.org/bpf/20200323164415.12943-1-kpsingh@chromium.org/ * Updated LSM_HOOK macro to define a default value and cleaned up the BPF LSM hook declarations. * Added Yonghong's Acks and Kees' Reviewed-by tags. * Simplification of the selftest code. * Rebase and fixes suggested by Andrii and Yonghong and some other minor fixes noticed in internal review. - v4 -> v5: https://lore.kernel.org/bpf/20200220175250.10795-1-kpsingh@chromium.org/ * Removed static keys and special casing of BPF calls from the LSM framework. * Initialized the BPF callbacks (nops) as proper LSM hooks. * Updated to using the newly introduced BPF_TRAMP_MODIFY_RETURN trampolines in https://lkml.org/lkml/2020/3/4/877 * Addressed Andrii's feedback and rebased. - v3 -> v4: * Moved away from allocating a separate security_hook_heads and adding a new special case for arch_prepare_bpf_trampoline to using BPF fexit trampolines called from the right place in the LSM hook and toggled by static keys based on the discussion in: https://lore.kernel.org/bpf/CAG48ez25mW+_oCxgCtbiGMX07g_ph79UOJa07h=o_6B6+Q-u5g@mail.gmail.com/ * Since the code does not deal with security_hook_heads anymore, it goes from "being a BPF LSM" to "BPF program attachment to LSM hooks". * Added a new test case which ensures that the BPF programs' return value is reflected by the LSM hook. - v2 -> v3 does not change the overall design and has some minor fixes: * LSM_ORDER_LAST is introduced to represent the behaviour of the BPF LSM * Fixed the inadvertent clobbering of the LSM Hook error codes * Added GPL license requirement to the commit log * The lsm_hook_idx is now the more conventional 0-based index * Some changes were split into a separate patch ("Load btf_vmlinux only once per object") https://lore.kernel.org/bpf/20200117212825.11755-1-kpsingh@chromium.org/ * Addressed Andrii's feedback on the BTF implementation * Documentation update for using generated vmlinux.h to simplify programs * Rebase - Changes since v1: https://lore.kernel.org/bpf/20191220154208.15895-1-kpsingh@chromium.org * Eliminate the requirement to maintain LSM hooks separately in security/bpf/hooks.h Use BPF trampolines to dynamically allocate security hooks * Drop the use of securityfs as bpftool provides the required introspection capabilities. Update the tests to use the bpf_skeleton and global variables * Use O_CLOEXEC anonymous fds to represent BPF attachment in line with the other BPF programs with the possibility to use bpf program pinning in the future to provide "permanent attachment". * Drop the logic based on prog names for handling re-attachment. * Drop bpf_lsm_event_output from this series and send it as a separate patch. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>