summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-31octeontx2-af: Update get/set resource count functionsSubbaraya Sundeep
Since multiple blocks of same type are present in 98xx, modify functions which get resource count and which update resource count to work with individual block address instead of block type. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: axienet: Properly handle PCS/PMA PHY for 1000BaseX modeRobert Hancock
Update the axienet driver to properly support the Xilinx PCS/PMA PHY component which is used for 1000BaseX and SGMII modes, including properly configuring the auto-negotiation mode of the PHY and reading the negotiated state from the PHY. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Link: https://lore.kernel.org/r/20201028171429.1699922-1-robert.hancock@calian.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: ipa: avoid a bogus warningAlex Elder
The previous commit added support for IPA having up to six source and destination resources. But currently nothing uses more than four. (Five of each are used in a newer version of the hardware.) I find that in one of my build environments the compiler complains about newly-added code in two spots. Inspection shows that the warnings have no merit, but this compiler does not recognize that. ipa_main.c:457:39: warning: array index 5 is past the end of the array (which contains 4 elements) [-Warray-bounds] (and the same warning at line 483) We can make this warning go away by changing the number of elements in the source and destination resource limit arrays--now rather than waiting until we need it to support the newer hardware. This change was coming soon anyway; make it now to get rid of the warning. Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20201031151524.32132-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31Merge branch 'ipv6-reply-icmp-error-if-fragment-doesn-t-contain-all-headers'Jakub Kicinski
Hangbin Liu says: ==================== IPv6: reply ICMP error if fragment doesn't contain all headers When our Engineer run latest IPv6 Core Conformance test, test v6LC.1.3.6: First Fragment Doesn’t Contain All Headers[1] failed. The test purpose is to verify that the node (Linux for example) should properly process IPv6 packets that don’t include all the headers through the Upper-Layer header. Based on RFC 8200, Section 4.5 Fragment Header - If the first fragment does not include all headers through an Upper-Layer header, then that fragment should be discarded and an ICMP Parameter Problem, Code 3, message should be sent to the source of the fragment, with the Pointer field set to zero. The first patch add a definition for ICMPv6 Parameter Problem, code 3. The second patch add a check for the 1st fragment packet to make sure Upper-Layer header exist. [1] Page 68, v6LC.1.3.6: First Fragment Doesn’t Contain All Headers part A, B, C and D at https://ipv6ready.org/docs/Core_Conformance_5_0_0.pdf [2] My reproducer: import sys, os from scapy.all import * def send_frag_dst_opt(src_ip6, dst_ip6): ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44) frag_1 = IPv6ExtHdrFragment(nh = 60, m = 1) dst_opt = IPv6ExtHdrDestOpt(nh = 58) frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1) icmp_echo = ICMPv6EchoRequest(seq = 1) pkt_1 = ip6/frag_1/dst_opt pkt_2 = ip6/frag_2/icmp_echo send(pkt_1) send(pkt_2) def send_frag_route_opt(src_ip6, dst_ip6): ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44) frag_1 = IPv6ExtHdrFragment(nh = 43, m = 1) route_opt = IPv6ExtHdrRouting(nh = 58) frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1) icmp_echo = ICMPv6EchoRequest(seq = 2) pkt_1 = ip6/frag_1/route_opt pkt_2 = ip6/frag_2/icmp_echo send(pkt_1) send(pkt_2) if __name__ == '__main__': src = sys.argv[1] dst = sys.argv[2] conf.iface = sys.argv[3] send_frag_dst_opt(src, dst) send_frag_route_opt(src, dst) ==================== Link: https://lore.kernel.org/r/20201027123313.3717941-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31IPv6: reply ICMP error if the first fragment don't include all headersHangbin Liu
Based on RFC 8200, Section 4.5 Fragment Header: - If the first fragment does not include all headers through an Upper-Layer header, then that fragment should be discarded and an ICMP Parameter Problem, Code 3, message should be sent to the source of the fragment, with the Pointer field set to zero. Checking each packet header in IPv6 fast path will have performance impact, so I put the checking in ipv6_frag_rcv(). As the packet may be any kind of L4 protocol, I only checked some common protocols' header length and handle others by (offset + 1) > skb->len. Also use !(frag_off & htons(IP6_OFFSET)) to catch atomic fragments (fragmented packet with only one fragment). When send ICMP error message, if the 1st truncated fragment is ICMP message, icmp6_send() will break as is_ineligible() return true. So I added a check in is_ineligible() to let fragment packet with nexthdr ICMP but no ICMP header return false. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31ICMPv6: Add ICMPv6 Parameter Problem, code 3 definitionHangbin Liu
Based on RFC7112, Section 6: IANA has added the following "Type 4 - Parameter Problem" message to the "Internet Control Message Protocol version 6 (ICMPv6) Parameters" registry: CODE NAME/DESCRIPTION 3 IPv6 First Fragment has incomplete IPv6 Header Chain Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: atm: fix update of position index in lec_seq_nextColin Ian King
The position index in leq_seq_next is not updated when the next entry is fetched an no more entries are available. This causes seq_file to report the following error: "seq_file: buggy .next function lec_seq_next [lec] did not update position index" Fix this by always updating the position index. [ Note: this is an ancient 2002 bug, the sha is from the tglx/history repo ] Fixes 4aea2cbff417 ("[ATM]: Move lan seq_file ops to lec.c [1/3]") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20201027114925.21843-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31Merge tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fix from Christoph Hellwig: "Fix an integer overflow on 32-bit platforms in the new DMA range code (Geert Uytterhoeven)" * tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n
2020-10-31Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four driver fixes and one core fix. The core fix closes a race window where we could kick off a second asynchronous scan because the test and set of the variable preventing it isn't atomic" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: hisi_sas: Stop using queue #0 always for v2 hw scsi: ibmvscsi: Fix potential race after loss of transport scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove() scsi: qla2xxx: Return EBUSY on fcport deletion scsi: core: Don't start concurrent async scan on same host
2020-10-31Merge branch ↵Jakub Kicinski
'net-add-functionality-to-net-core-byte-packet-counters-and-use-it-in-r8169' Heiner Kallweit says: ==================== net: add functionality to net core byte/packet counters and use it in r8169 This series adds missing functionality to the net core handling of byte/packet counters and statistics. The extensions are then used to remove private rx/tx byte/packet counters in r8169 driver. ==================== Link: https://lore.kernel.org/r/1fdb8ecd-be0a-755d-1d92-c62ed8399e77@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31r8169: remove no longer needed private rx/tx packet/byte countersHeiner Kallweit
After switching to the net core rx/tx byte/packet counters we can remove the now unused private version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31r8169: use struct pcpu_sw_netstats for rx/tx packet/byte countersHeiner Kallweit
Switch to the net core rx/tx byte/packet counter infrastructure. This simplifies the code, only small drawback is some memory overhead because we use just one queue, but allocate the counters per cpu. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: core: add devm_netdev_alloc_pcpu_statsHeiner Kallweit
We have netdev_alloc_pcpu_stats(), and we have devm_alloc_percpu(). Add a managed version of netdev_alloc_pcpu_stats, e.g. for allocating the per-cpu stats in the probe() callback of a driver. It needs to be a macro for dealing properly with the type argument. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: core: add dev_sw_netstats_tx_addHeiner Kallweit
Add dev_sw_netstats_tx_add(), complementing already existing dev_sw_netstats_rx_add(). Other than dev_sw_netstats_rx_add allow to pass the number of packets as function argument. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31Merge branch 'in_interrupt-cleanup-part-2'Jakub Kicinski
Sebastian Andrzej Siewior says: ==================== in_interrupt() cleanup, part 2 in the discussion about preempt count consistency across kernel configurations: https://lore.kernel.org/r/20200914204209.256266093@linutronix.de/ Linus clearly requested that code in drivers and libraries which changes behaviour based on execution context should either be split up so that e.g. task context invocations and BH invocations have different interfaces or if that's not possible the context information has to be provided by the caller which knows in which context it is executing. This includes conditional locking, allocation mode (GFP_*) decisions and avoidance of code paths which might sleep. In the long run, usage of 'preemptible, in_*irq etc.' should be banned from driver code completely. This is part two addressing remaining drivers except for orinoco-usb. ==================== Cherry picking only Ethernet changes. Link: https://lore.kernel.org/r/20201027225454.3492351-1-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: tlan: Replace in_irq() usageSebastian Andrzej Siewior
The driver uses in_irq() to determine if the tlan_priv::lock has to be acquired in tlan_mii_read_reg() and tlan_mii_write_reg(). The interrupt handler acquires the lock outside of these functions so the in_irq() check is meant to prevent a lock recursion deadlock. But this check is incorrect when interrupt force threading is enabled because then the handler runs in thread context and in_irq() correctly returns false. The usage of in_*() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be seperated or the context be conveyed in an argument passed by the caller, which usually knows the context. tlan_set_timer() has this conditional as well, but this function is only invoked from task context or the timer callback itself. So it always has to lock and the check can be removed. tlan_mii_read_reg(), tlan_mii_write_reg() and tlan_phy_print() are invoked from interrupt and other contexts. Split out the actual function body into helper variants which are called from interrupt context and make the original functions wrappers which acquire tlan_priv::lock unconditionally. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Samuel Chessman <chessman@tux.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: forcedeth: Replace context and lock check with a lockdep_assert()Sebastian Andrzej Siewior
nv_update_stats() triggers a WARN_ON() when invoked from hard interrupt context because the locks in use are not hard interrupt safe. It also has an assert_spin_locked() which was the lock check before the lockdep era. Lockdep has way broader locking correctness checks and covers both issues, so replace the warning and the lock assert with lockdep_assert_held(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Rain River <rain.1986.08.12@gmail.com> Cc: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31net: neterion: s2io: Replace in_interrupt() for context detectionSebastian Andrzej Siewior
wait_for_cmd_complete() uses in_interrupt() to detect whether it is safe to sleep or not. The usage of in_interrupt() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be seperated or the context be conveyed in an argument passed by the caller, which usually knows the context. in_interrupt() also is only partially correct because it fails to chose the correct code path when just preemption or interrupts are disabled. Add an argument 'may_block' to both functions and adjust the callers to pass the context information. The following call chains which end up invoking wait_for_cmd_complete() were analyzed to be safe to sleep: s2io_card_up() s2io_set_multicast() init_nic() init_tti() s2io_close() do_s2io_delete_unicast_mc() do_s2io_add_mac() s2io_set_mac_addr() do_s2io_prog_unicast() do_s2io_add_mac() s2io_reset() do_s2io_restore_unicast_mc() do_s2io_add_mc() do_s2io_add_mac() s2io_open() do_s2io_prog_unicast() do_s2io_add_mac() The following call chains which end up invoking wait_for_cmd_complete() were analyzed to be safe to sleep: __dev_set_rx_mode() s2io_set_multicast() s2io_txpic_intr_handle() s2io_link() init_tti() Add a may_sleep argument to wait_for_cmd_complete(), s2io_set_multicast() and init_tti() and hand the context information in from the call sites. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Jon Mason <jdmason@kudzu.us> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-31KVM: vmx: remove unused variablePaolo Bonzini
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-31KVM: selftests: Don't require THP to run testsAndrew Jones
Unless we want to test with THP, then we shouldn't require it to be configured by the host kernel. Unfortunately, even advising with MADV_NOHUGEPAGE does require it, so check for THP first in order to avoid madvise failing with EINVAL. Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201029201703.102716-2-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-31KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work againVitaly Kuznetsov
It was noticed that evmcs_sanitize_exec_ctrls() is not being executed nowadays despite the code checking 'enable_evmcs' static key looking correct. Turns out, static key magic doesn't work in '__init' section (and it is unclear when things changed) but setup_vmcs_config() is called only once per CPU so we don't really need it to. Switch to checking 'enlightened_vmcs' instead, it is supposed to be in sync with 'enable_evmcs'. Opportunistically make evmcs_sanitize_exec_ctrls '__init' and drop unneeded extra newline from it. Reported-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201014143346.2430936-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-31KVM: selftests: test behavior of unmapped L2 APIC-access addressJim Mattson
Add a regression test for commit 671ddc700fd0 ("KVM: nVMX: Don't leak L1 MMIO regions to L2"). First, check to see that an L2 guest can be launched with a valid APIC-access address that is backed by a page of L1 physical memory. Next, set the APIC-access address to a (valid) L1 physical address that is not backed by memory. KVM can't handle this situation, so resuming L2 should result in a KVM exit for internal error (emulation). Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201026180922.3120555-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-31netfilter: ipset: Expose the initval hash parameter to userspaceJozsef Kadlecsik
It makes possible to reproduce exactly the same set after a save/restore. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31netfilter: ipset: Add bucketsize parameter to all hash typesJozsef Kadlecsik
The parameter defines the upper limit in any hash bucket at adding new entries from userspace - if the limit would be exceeded, ipset doubles the hash size and rehashes. It means the set may consume more memory but gives faster evaluation at matching in the set. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31netfilter: ipset: Support the -exist flag with the destroy commandJozsef Kadlecsik
The -exist flag was supported with the create, add and delete commands. In order to gracefully handle the destroy command with nonexistent sets, the -exist flag is added to destroy too. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31netfilter: ipset: Update byte and packet counters regardless of whether they ↵Stefano Brivio
match In ip_set_match_extensions(), for sets with counters, we take care of updating counters themselves by calling ip_set_update_counter(), and of checking if the given comparison and values match, by calling ip_set_match_counter() if needed. However, if a given comparison on counters doesn't match the configured values, that doesn't mean the set entry itself isn't matching. This fix restores the behaviour we had before commit 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching"), without reintroducing the issue fixed there: back then, mtype_data_match() first updated counters in any case, and then took care of matching on counters. Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set, ip_set_update_counter() will anyway skip counter updates if desired. The issue observed is illustrated by this reproducer: ipset create c hash:ip counters ipset add c 192.0.2.1 iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP if we now send packets from 192.0.2.1, bytes and packets counters for the entry as shown by 'ipset list' are always zero, and, no matter how many bytes we send, the rule will never match, because counters themselves are not updated. Reported-by: Mithil Mhatre <mmhatre@redhat.com> Fixes: 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31netfilter: nft_reject: add reject verdict support for netdevJose M. Guisado Gomez
Adds support for reject from ingress hook in netdev family. Both stacks ipv4 and ipv6. With reject packets supporting ICMP and TCP RST. This ability is required in devices that need to REJECT legitimate clients which traffic is forwarded from the ingress hook. Joint work with Laura Garcia. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31netfilter: nft_reject: unify reject init and dump into nft_rejectJose M. Guisado Gomez
Bridge family is using the same static init and dump function as inet. This patch removes duplicate code unifying these functions body into nft_reject.c so they can be reused in the rest of families supporting reject verdict. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31netfilter: nf_reject: add reject skbuff creation helpersJose M. Guisado Gomez
Adds reject skbuff creation helper functions to ipv4/6 nf_reject infrastructure. Use these functions for reject verdict in bridge family. Can be reused by all different families that support reject and will not inject the reject packet through ip local out. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-30Merge branch 'l2-multicast-forwarding-for-ocelot-switch'Jakub Kicinski
Vladimir Oltean says: ==================== L2 multicast forwarding for Ocelot switch This series enables the mscc_ocelot switch to forward raw L2 (non-IP) mdb entries as configured by the bridge driver after this patch: https://patchwork.ozlabs.org/project/netdev/patch/20201028233831.610076-1-vladimir.oltean@nxp.com/ ==================== Link: https://lore.kernel.org/r/20201029022738.722794-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: mscc: ocelot: support L2 multicast entriesVladimir Oltean
There is one main difference in mscc_ocelot between IP multicast and L2 multicast. With IP multicast, destination ports are encoded into the upper bytes of the multicast MAC address. Example: to deliver the address 01:00:5E:11:22:33 to ports 3, 8, and 9, one would need to program the address of 00:03:08:11:22:33 into hardware. Whereas for L2 multicast, the MAC table entry points to a Port Group ID (PGID), and that PGID contains the port mask that the packet will be forwarded to. As to why it is this way, no clue. My guess is that not all port combinations can be supported simultaneously with the limited number of PGIDs, and this was somehow an issue for IP multicast but not for L2 multicast. Anyway. Prior to this change, the raw L2 multicast code was bogus, due to the fact that there wasn't really any way to test it using the bridge code. There were 2 issues: - A multicast PGID was allocated for each MDB entry, but it wasn't in fact programmed to hardware. It was dummy. - In fact we don't want to reserve a multicast PGID for every single MDB entry. That would be odd because we can only have ~60 PGIDs, but thousands of MDB entries. So instead, we want to reserve a multicast PGID for every single port combination for multicast traffic. And since we can have 2 (or more) MDB entries delivered to the same port group (and therefore PGID), we need to reference-count the PGIDs. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: mscc: ocelot: make entry_type a member of struct ocelot_multicastVladimir Oltean
This saves a re-classification of the MDB address on deletion. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: mscc: ocelot: remove the "new" variable in ocelot_port_mdb_addVladimir Oltean
It is Not Needed, a comment will suffice. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: mscc: ocelot: use ether_addr_copyVladimir Oltean
Since a helper is available for copying Ethernet addresses, let's use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: mscc: ocelot: classify L2 mdb entries as LOCKEDVladimir Oltean
ocelot.h says: /* MAC table entry types. * ENTRYTYPE_NORMAL is subject to aging. * ENTRYTYPE_LOCKED is not subject to aging. * ENTRYTYPE_MACv4 is not subject to aging. For IPv4 multicast. * ENTRYTYPE_MACv6 is not subject to aging. For IPv6 multicast. */ We don't want the permanent entries added with 'bridge mdb' to be subject to aging. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: bridge: explicitly convert between mdb entry state and port group flagsVladimir Oltean
When creating a new multicast port group, there is implicit conversion between the __u8 state member of struct br_mdb_entry and the unsigned char flags member of struct net_bridge_port_group. This implicit conversion relies on the fact that MDB_PERMANENT is equal to MDB_PG_FLAGS_PERMANENT. Let's be more explicit and convert the state to flags manually. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028234815.613226-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: bridge: mcast: add support for raw L2 multicast groupsNikolay Aleksandrov
Extend the bridge multicast control and data path to configure routes for L2 (non-IP) multicast groups. The uapi struct br_mdb_entry union u is extended with another variant, mac_addr, which does not change the structure size, and which is valid when the proto field is zero. To be compatible with the forwarding code that is already in place, which acts as an IGMP/MLD snooping bridge with querier capabilities, we need to declare that for L2 MDB entries (for which there exists no such thing as IGMP/MLD snooping/querying), that there is always a querier. Otherwise, these entries would be flooded to all bridge ports and not just to those that are members of the L2 multicast group. Needless to say, only permanent L2 multicast groups can be installed on a bridge port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30Merge branch 'sfc-ef100-tso-enhancements'Jakub Kicinski
Edward Cree says: ==================== sfc: EF100 TSO enhancements Support TSO over encapsulation (with GSO_PARTIAL), and over VLANs (which the code already handled but we didn't advertise). Also correct our handling of IPID mangling. I couldn't find documentation of exactly what shaped SKBs we can get given, so patch #2 is slightly guesswork, but when I tested TSO over both underlay and (VxLAN) overlay, the checksums came out correctly, so at least in those cases the edits we're making must be the right ones. Similarly, I'm not 100% sure I've correctly understood how FIXEDID and MANGLEID are supposed to work in patch #3. ==================== Link: https://lore.kernel.org/r/6e1ea05f-faeb-18df-91ef-572445691d89@solarflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30sfc: advertise our vlan featuresEdward Cree
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30sfc: only use fixed-id if the skb asks for itEdward Cree
AIUI, the NETIF_F_TSO_MANGLEID flag is a signal to the stack that a driver may _need_ to mangle IDs in order to do TSO, and conversely a signal from the stack that the driver is permitted to do so. Since we support both fixed and incrementing IPIDs, we should rely on the SKB_GSO_FIXEDID flag on a per-skb basis, rather than using the MANGLEID feature to make all TSOs fixed-id. Includes other minor cleanups of ef100_make_tso_desc() coding style. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30sfc: implement encap TSO on EF100Edward Cree
The NIC only needs to know where the headers it has to edit (TCP and inner and outer IPv4) are, which fits GSO_PARTIAL nicely. It also supports non-PARTIAL offload of UDP tunnels, again just needing to be told the outer transport offset so that it can edit the UDP length field. (It's not clear to me whether the stack will ever use the non-PARTIAL version with the netdev feature flags we're setting here.) Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30sfc: extend bitfield macros to 17 fieldsEdward Cree
We need EFX_POPULATE_OWORD_17 for an encap TSO descriptor on EF100. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30Merge branch 'net-ipa-minor-bug-fixes'Jakub Kicinski
Alex Elder says: ==================== net: ipa: minor bug fixes This series fixes several bugs. They are minor, in that the code currently works on supported platforms even without these patches applied, but they're bugs nevertheless and should be fixed. Version 2 improves the commit message for the fourth patch. It also fixes a bug in two spots in the last patch. Both of these changes were suggested by Willem de Bruijn. ==================== Link: https://lore.kernel.org/r/20201028194148.6659-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: ipa: avoid going past end of resource group arrayAlex Elder
The minimum and maximum limits for resources assigned to a given resource group are programmed in pairs, with the limits for two groups set in a single register. If the number of supported resource groups is odd, only half of the register that defines these limits is valid for the last group; that group has no second group in the pair. Currently we ignore this constraint, and it turns out to be harmless, but it is not guaranteed to be. This patch addresses that, and adds support for programming the 5th resource group's limits. Rework how the resource group limit registers are programmed by having a single function program all group pairs rather than having one function program each pair. Add the programming of the 4-5 resource group pair limits to this function. If a resource group is not supported, pass a null pointer to ipa_resource_config_common() for that group and have that function write zeroes in that case. Tested-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: ipa: distinguish between resource group typesAlex Elder
The number of resource groups supported by the hardware can be different for source and destination resources. Determine the number supported for each using separate functions. Make the functions inline end move their definitions into "ipa_reg.h", because they determine whether certain register definitions are valid. Pass just the IPA hardware version as argument. IPA_RESOURCE_GROUP_COUNT represents the maximum number of resource groups the driver supports for any hardware version. Change that symbol to be two separate constants, one for source and the other for destination resource groups. Rename them to end with "_MAX" rather than "_COUNT", to reflect their true purpose. Tested-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: ipa: assign endpoint to a resource groupAlex Elder
The IPA hardware manages various resources (e.g. descriptors) internally to perform its functions. The resources are grouped, allowing different endpoints to use separate resource pools. This way one group of endpoints can be configured to operate unaffected by the resource use of endpoints in a different group. Endpoints should be assigned to a resource group, but we currently don't do that. Define a new resource_group field in the endpoint configuration data, and use it to assign the proper resource group to use for each AP endpoint. Tested-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: ipa: fix resource group field mask definitionAlex Elder
The mask for the RSRC_GRP field in the INIT_RSRC_GRP endpoint initialization register is incorrectly defined for IPA v4.2 (where it is only one bit wide). So we need to fix this. The fix is not straightforward, however. Field masks are passed to functions like u32_encode_bits(), and for that they must be constant. To address this, we define a new inline function that returns the *encoded* value to use for a given RSRC_GRP field, which depends on the IPA version. The caller can then use something like this, to assign a given endpoint resource id 1: u32 offset = IPA_REG_ENDP_INIT_RSRC_GRP_N_OFFSET(endpoint_id); u32 val = rsrc_grp_encoded(ipa->version, 1); iowrite32(val, ipa->reg_virt + offset); The next patch requires this fix. Tested-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: ipa: assign proper packet context baseAlex Elder
At the end of ipa_mem_setup() we write the local packet processing context base register to tell it where the processing context memory is. But we are writing the wrong value. The value written turns out to be the offset of the modem header memory region (assigned earlier in the function). Fix this bug. Tested-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: dec: tulip: de2104x: Add shutdown handler to stop NICMoritz Fischer
The driver does not implement a shutdown handler which leads to issues when using kexec in certain scenarios. The NIC keeps on fetching descriptors which gets flagged by the IOMMU with errors like this: DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000 DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000 DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000 DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000 DMAR: DMAR:[DMA read] Request device [5e:00.0]fault addr fffff000 Signed-off-by: Moritz Fischer <mdf@kernel.org> Link: https://lore.kernel.org/r/20201028172125.496942-1-mdf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30net: phy: marvell: add special handling of Finisar modules with 88E1111Robert Hancock
The Finisar FCLF8520P2BTL 1000BaseT SFP module uses a Marvel 88E1111 PHY with a modified PHY ID. Add support for this ID using the 88E1111 methods. By default these modules do not have 1000BaseX auto-negotiation enabled, which is not generally desirable with Linux networking drivers. Add handling to enable 1000BaseX auto-negotiation when these modules are used in 1000BaseX mode. Also, some special handling is required to ensure that 1000BaseT auto-negotiation is enabled properly when desired. Based on existing handling in the AMD xgbe driver and the information in the Finisar FAQ: https://www.finisar.com/sites/default/files/resources/an-2036_1000base-t_sfp_faqreve1.pdf Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20201028171540.1700032-1-robert.hancock@calian.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>