summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-15Merge tag 'spi-fix-v5.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small fixes. Mostly driver specific but there's one in the core which fixes a deadlock when adding devices on spi-mux that's triggered because spi-mux is a SPI device which is itself a SPI controller and so can instantiate devices when registered. We were using a global lock to protect against reusing chip selects but they're a per controller thing so moving the lock per controller resolves that" * tag 'spi-fix-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi-mux: Fix false-positive lockdep splats spi: Fix deadlock when adding SPI controllers on SPI buses spi: bcm-qspi: clear MSPI spifie interrupt during probe spi: spi-nxp-fspi: don't depend on a specific node name erratum workaround spi: mediatek: skip delays if they are 0 spi: atmel: Fix PDC transfer setup bug spi: spidev: Add SPI ID table spi: Use 'flash' node name instead of 'spi-flash' in example
2021-10-15Merge tag 'regulator-fix-v5.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "Just a trivial fix to the MAINTAINERS file for an update missed during conversion of the DT bindings to YAML format" * tag 'regulator-fix-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: MAINTAINERS: rectify entry for SY8106A REGULATOR DRIVER
2021-10-15ksmbd: validate credit charge after validating SMB2 PDU body sizeRalph Boehme
smb2_validate_credit_charge() accesses fields in the SMB2 PDU body, but until smb2_calc_size() is called the PDU has not yet been verified to be large enough to access the PDU dynamic part length field. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Ralph Boehme <slow@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-15ksmbd: add buffer validation for smb directHyunchul Lee
Add buffer validation for smb direct. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-15ksmbd: limit read/write/trans buffer size not to exceed 8MBNamjae Jeon
ksmbd limit read/write/trans buffer size not to exceed maximum 8MB. And set the minimum value of max response buffer size to 64KB. Windows client doesn't send session setup request if ksmbd set max trans/read/write size lower than 64KB in smb2 negotiate. It means windows allow at least 64 KB or more about this value. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-15Merge tag 'mtd/fixes-for-5.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fix from Miquel Raynal: "Raw NAND controller driver fix: - Qcom: Update code word value for raw reads (QPIC v2+)" * tag 'mtd/fixes-for-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: qcom: Update code word value for raw read
2021-10-15Merge tag 'drm-fixes-2021-10-15-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "It has a few scattered msm and i915 fixes, a few core fixes and a mediatek feature revert. I've had to pick a bunch of patches into this, as the drm-misc-fixes tree had a bunch of vc4 patches I wasn't comfortable with sending to you at least as part of this, they were delayed due to your reverts. If it's really useful as fixes I'll do a separate pull. Summary: Core: - clamp fbdev size - edid cap blocks read to avoid out of bounds panel: - fix missing crc32 dependency msm: - Fix a new crash on dev file close if the dev file was opened when GPU is not loaded (such as missing fw in initrd) - Switch to single drm_sched_entity per priority level per drm_file to unbreak multi-context userspace - Serialize GMU access to fix GMU OOB errors - Various error path fixes - A couple integer overflow fixes - Fix mdp5 cursor plane WARNs i915: - Fix ACPI object leak - Fix context leak in user proto-context creation - Fix missing i915_sw_fence_fini call hyperv: - hide hw pointer nouveau: - fix engine selection bit r128: - fix UML build rcar-du: - unconncted LVDS regression fix mediatek: - revert CMDQ refinement patches" * tag 'drm-fixes-2021-10-15-1' of git://anongit.freedesktop.org/drm/drm: (34 commits) drm/panel: olimex-lcd-olinuxino: select CRC32 drm/r128: fix build for UML drm/nouveau/fifo: Reinstate the correct engine bit programming drm/hyperv: Fix double mouse pointers drm/fbdev: Clamp fbdev surface size if too large drm/edid: In connector_bad_edid() cap num_of_ext by num_blocks read drm/i915: Free the returned object of acpi_evaluate_dsm() drm/i915: Fix bug in user proto-context creation that leaked contexts drm: rcar-du: Don't create encoder for unconnected LVDS outputs drm/msm/dsi: fix off by one in dsi_bus_clk_enable error handling drm/msm/dsi: Fix an error code in msm_dsi_modeset_init() drm/msm/dsi: dsi_phy_14nm: Take ready-bit into account in poll_for_ready drm/msm/dsi/phy: fix clock names in 28nm_8960 phy drm/msm/dpu: Fix address of SM8150 PINGPONG5 IRQ register drm/msm: Do not run snapshot on non-DPU devices drm/msm/a3xx: fix error handling in a3xx_gpu_init() drm/msm/a4xx: fix error handling in a4xx_gpu_init() drm/msm: Fix null pointer dereference on pointer edp drm/msm/mdp5: fix cursor-related warnings drm/msm: Avoid potential overflow in timeout_to_jiffies() ...
2021-10-15Merge tag 'ntfs3_for_5.15' of ↵Linus Torvalds
git://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 fixes from Konstantin Komarov: "Use the new api for mounting as requested by Christoph. Also fixed: - some memory leaks and panic - xfstests (tested on x86_64) generic/016 generic/021 generic/022 generic/041 generic/274 generic/423 - some typos, wrong returned error codes, dead code, etc" * tag 'ntfs3_for_5.15' of git://github.com/Paragon-Software-Group/linux-ntfs3: (70 commits) fs/ntfs3: Check for NULL pointers in ni_try_remove_attr_list fs/ntfs3: Refactor ntfs_read_mft fs/ntfs3: Refactor ni_parse_reparse fs/ntfs3: Refactor ntfs_create_inode fs/ntfs3: Refactor ntfs_readlink_hlp fs/ntfs3: Rework ntfs_utf16_to_nls fs/ntfs3: Fix memory leak if fill_super failed fs/ntfs3: Keep prealloc for all types of files fs/ntfs3: Remove unnecessary functions fs/ntfs3: Forbid FALLOC_FL_PUNCH_HOLE for normal files fs/ntfs3: Refactoring of ntfs_set_ea fs/ntfs3: Remove locked argument in ntfs_set_ea fs/ntfs3: Use available posix_acl_release instead of ntfs_posix_acl_release fs/ntfs3: Check for NULL if ATTR_EA_INFO is incorrect fs/ntfs3: Refactoring of ntfs_init_from_boot fs/ntfs3: Reject mount if boot's cluster size < media sector size fs/ntfs3: Refactoring lock in ntfs_init_acl fs/ntfs3: Change posix_acl_equiv_mode to posix_acl_update_mode fs/ntfs3: Pass flags to ntfs_set_ea in ntfs_set_acl_ex fs/ntfs3: Refactor ntfs_get_acl_ex for better readability ...
2021-10-16KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guestMichael Ellerman
We call idle_kvm_start_guest() from power7_offline() if the thread has been requested to enter KVM. We pass it the SRR1 value that was returned from power7_idle_insn() which tells us what sort of wakeup we're processing. Depending on the SRR1 value we pass in, the KVM code might enter the guest, or it might return to us to do some host action if the wakeup requires it. If idle_kvm_start_guest() is able to handle the wakeup, and enter the guest it is supposed to indicate that by returning a zero SRR1 value to us. That was the behaviour prior to commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C"), however in that commit the handling of SRR1 was reworked, and the zeroing behaviour was lost. Returning from idle_kvm_start_guest() without zeroing the SRR1 value can confuse the host offline code, causing the guest to crash and other weirdness. Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au
2021-10-16KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()Michael Ellerman
In commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") kvm_start_guest() became idle_kvm_start_guest(). The old code allocated a stack frame on the emergency stack, but didn't use the frame to store anything, and also didn't store anything in its caller's frame. idle_kvm_start_guest() on the other hand is written more like a normal C function, it creates a frame on entry, and also stores CR/LR into its callers frame (per the ABI). The problem is that there is no caller frame on the emergency stack. The emergency stack for a given CPU is allocated with: paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE; So emergency_sp actually points to the first address above the emergency stack allocation for a given CPU, we must not store above it without first decrementing it to create a frame. This is different to the regular kernel stack, paca->kstack, which is initialised to point at an initial frame that is ready to use. idle_kvm_start_guest() stores the backchain, CR and LR all of which write outside the allocation for the emergency stack. It then creates a stack frame and saves the non-volatile registers. Unfortunately the frame it creates is not large enough to fit the non-volatiles, and so the saving of the non-volatile registers also writes outside the emergency stack allocation. The end result is that we corrupt whatever is at 0-24 bytes, and 112-248 bytes above the emergency stack allocation. In practice this has gone unnoticed because the memory immediately above the emergency stack happens to be used for other stack allocations, either another CPUs mc_emergency_sp or an IRQ stack. See the order of calls to irqstack_early_init() and emergency_stack_init(). The low addresses of another stack are the top of that stack, and so are only used if that stack is under extreme pressue, which essentially never happens in practice - and if it did there's a high likelyhood we'd crash due to that stack overflowing. Still, we shouldn't be corrupting someone else's stack, and it is purely luck that we aren't corrupting something else. To fix it we save CR/LR into the caller's frame using the existing r1 on entry, we then create a SWITCH_FRAME_SIZE frame (which has space for pt_regs) on the emergency stack with the backchain pointing to the existing stack, and then finally we switch to the new frame on the emergency stack. Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au
2021-10-15Merge branch 'tcp-md5-vrf-fix'David S. Miller
Leonard Crestez says: ==================== tcp: md5: Fix overlap between vrf and non-vrf keys With net.ipv4.tcp_l3mdev_accept=1 it is possible for a listen socket to accept connection from the same client address in different VRFs. It is also possible to set different MD5 keys for these clients which differ only in the tcpm_l3index field. This appears to work when distinguishing between different VRFs but not between non-VRF and VRF connections. In particular: * tcp_md5_do_lookup_exact will match a non-vrf key against a vrf key. This means that adding a key with l3index != 0 after a key with l3index == 0 will cause the earlier key to be deleted. Both keys can be present if the non-vrf key is added later. * _tcp_md5_do_lookup can match a non-vrf key before a vrf key. This casues failures if the passwords differ. This can be fixed by making tcp_md5_do_lookup_exact perform an actual exact comparison on l3index and by making __tcp_md5_do_lookup perfer vrf-bound keys above other considerations like prefixlen. The fact that keys with l3index==0 affect VRF connections is usually not desirable, VRFs are meant to be completely independent. This behavior needs to preserved for backwards compatibility. Also, applications can just bind listen sockets to VRF and never specify TCP_MD5SIG_FLAG_IFINDEX at all. So far the combination of TCP_MD5SIG_FLAG_IFINDEX with tcpm_ifindex == 0 was an error, accept this to mean "key only applies to default VRF". This is what applications using VRFs for traffic separation want. This also contains tests for the second part. It does not contain tests for overlapping keys, that would require more changes in nettest to add multiple keys. These scenarios are also covered by my tests for TCP-AO, especially around this area: https://github.com/cdleonard/tcp-authopt-test/blob/main/tcp_authopt_test/test_vrf_bind.py Changes since V2: * Rename --do-bind-key-ifindex to --force-bind-key-ifindex * Fix referencing TCP_MD5SIG_FLAG_IFINDEX as TCP_MD5SIG_IFINDEX Link to v2: https://lore.kernel.org/netdev/cover.1634107317.git.cdleonard@gmail.com/ Changes since V1: * Accept (TCP_MD5SIG_IFINDEX with tcpm_ifindex == 0) * Add flags for explicitly including or excluding TCP_MD5SIG_FLAG_IFINDEX to nettest * Add few more tests in fcnal-test.sh. Link to v1: https://lore.kernel.org/netdev/3d8387d499f053dba5cd9184c0f7b8445c4470c6.1633542093.git.cdleonard@gmail.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15selftests: net/fcnal: Test --{force,no}-bind-key-ifindexLeonard Crestez
Test that applications binding listening sockets to VRFs without specifying TCP_MD5SIG_FLAG_IFINDEX will work as expected. This would be broken if __tcp_md5_do_lookup always made a strict comparison on l3index. See this email: https://lore.kernel.org/netdev/209548b5-27d2-2059-f2e9-2148f5a0291b@gmail.com/ Applications using tcp_l3mdev_accept=1 and a single global socket (not bound to any interface) also should have a way to specify keys that are only for the default VRF, this is done by --force-bind-key-ifindex without otherwise binding to a device. Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15selftests: nettest: Add --{force,no}-bind-key-ifindexLeonard Crestez
These options allow explicit control over the TCP_MD5SIG_FLAG_IFINDEX flag instead of always setting it based on binding to an interface. Do this by converting to getopt_long because nettest has too many single-character flags already and getopt_long is widely used in selftests. Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15tcp: md5: Allow MD5SIG_FLAG_IFINDEX with ifindex=0Leonard Crestez
Multiple VRFs are generally meant to be "separate" but right now md5 keys for the default VRF also affect connections inside VRFs if the IP addresses happen to overlap. So far the combination of TCP_MD5SIG_FLAG_IFINDEX with tcpm_ifindex == 0 was an error, accept this to mean "key only applies to default VRF". This is what applications using VRFs for traffic separation want. Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15tcp: md5: Fix overlap between vrf and non-vrf keysLeonard Crestez
With net.ipv4.tcp_l3mdev_accept=1 it is possible for a listen socket to accept connection from the same client address in different VRFs. It is also possible to set different MD5 keys for these clients which differ only in the tcpm_l3index field. This appears to work when distinguishing between different VRFs but not between non-VRF and VRF connections. In particular: * tcp_md5_do_lookup_exact will match a non-vrf key against a vrf key. This means that adding a key with l3index != 0 after a key with l3index == 0 will cause the earlier key to be deleted. Both keys can be present if the non-vrf key is added later. * _tcp_md5_do_lookup can match a non-vrf key before a vrf key. This casues failures if the passwords differ. Fix this by making tcp_md5_do_lookup_exact perform an actual exact comparison on l3index and by making __tcp_md5_do_lookup perfer vrf-bound keys above other considerations like prefixlen. Fixes: dea53bb80e07 ("tcp: Add l3index to tcp_md5sig_key and md5 functions") Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15lan78xx: select CRC32Vegard Nossum
Fix the following build/link error by adding a dependency on the CRC32 routines: ld: drivers/net/usb/lan78xx.o: in function `lan78xx_set_multicast': lan78xx.c:(.text+0x48cf): undefined reference to `crc32_le' The actual use of crc32_le() comes indirectly through ether_crc(). Fixes: 55d7de9de6c30 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15Merge branch 'dpaa2-irq-coalescing'David S. Miller
Ioana Ciornei says: ==================== dpaa2-eth: add support for IRQ coalescing This patch set adds support for interrupts coalescing in dpaa2-eth. The first patches add support for the hardware level configuration of the IRQ coalescing in the dpio driver, while the ones that touch the dpaa2-eth driver are responsible for the ethtool user interraction. With the adaptive IRQ coalescing in place and enabled we have observed the following changes in interrupt rates on one A72 core @2.2GHz (LX2160A) while running a Rx TCP flow. The TCP stream is sent on a 10Gbit link and the only cpu that does Rx is fully utilized. IRQ rate (irqs / sec) before: 4.59 Gbits/sec 24k after: 5.67 Gbits/sec 1.3k ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dpaa2: add adaptive interrupt coalescingIoana Ciornei
Add support for adaptive interrupt coalescing to the dpaa2-eth driver. First of all, ETHTOOL_COALESCE_USE_ADAPTIVE_RX is defined as a supported coalesce parameter and the requested state is configured through the dpio APIs added in the previous patch. Besides the ethtool API interaction, we keep track of how many bytes and frames are dequeued per CDAN (Channel Data Availability Notification) and update the Net DIM instance through the dpaa2_io_update_net_dim() API. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15soc: fsl: dpio: add Net DIM integrationIoana Ciornei
Use the generic dynamic interrupt moderation (dim) framework to implement adaptive interrupt coalescing on Rx. With the per-packet interrupt scheme, a high interrupt rate has been noted for moderate traffic flows leading to high CPU utilization. The dpio driver exports new functions to enable/disable adaptive IRQ coalescing on a DPIO object, to query the state or to update Net DIM with a new set of bytes and frames dequeued. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dpaa2: add support for manual setup of IRQ coalesingIoana Ciornei
Use the newly exported dpio driver API to manually configure the IRQ coalescing parameters requested by the user. The .get_coalesce() and .set_coalesce() net_device callbacks are implemented and directly export or setup the rx-usecs on all the channels configured. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15soc: fsl: dpio: add support for irq coalescing per software portalIoana Ciornei
In DPAA2 based SoCs, the IRQ coalesing support per software portal has 2 configurable parameters: - the IRQ timeout period (QBMAN_CINH_SWP_ITPR): how many 256 QBMAN cycles need to pass until a dequeue interrupt is asserted. - the IRQ threshold (QBMAN_CINH_SWP_DQRR_ITR): how many dequeue responses in the DQRR ring would generate an IRQ. Add support for setting up and querying these IRQ coalescing related parameters. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15soc: fsl: dpio: extract the QBMAN clock frequency from the attributesIoana Ciornei
Through the dpio_get_attributes() firmware call the dpio driver has access to the QBMAN clock frequency. Extend the structure which holds the firmware's response so that we can have access to this information. This will be needed in the next patches which also add support for interrupt coalescing which needs to be configured based on the frequency. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15Merge tag 'usb-serial-5.15-rc6' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial fixes for 5.15-rc6 Here are some new modem device ids. All have been in linux-next with no reported issues. * tag 'usb-serial-5.15-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: qcserial: add EM9191 QDL support USB: serial: option: add Quectel EC200S-CN module support USB: serial: option: add prod. id for Quectel EG91 USB: serial: option: add Telit LE910Cx composition 0x1204
2021-10-15Merge branch 'L4S-style-ce_threshold_ect1-marking'David S. Miller
Eric Dumazet says: ==================== net/sched: implement L4S style ce_threshold_ect1 marking As suggested by Ingemar Johansson, Neal Cardwell, and others, fq_codel can be used for Low Latency, Low Loss, Scalable Throughput (L4S) with a small change. In ce_threshold_ect1 mode, only ECT(1) packets can be marked to CE if their sojourn time is above the threshold. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15fq_codel: implement L4S style ce_threshold_ect1 markingEric Dumazet
Add TCA_FQ_CODEL_CE_THRESHOLD_ECT1 boolean option to select Low Latency, Low Loss, Scalable Throughput (L4S) style marking, along with ce_threshold. If enabled, only packets with ECT(1) can be transformed to CE if their sojourn time is above the ce_threshold. Note that this new option does not change rules for codel law. In particular, if TCA_FQ_CODEL_ECN is left enabled (this is the default when fq_codel qdisc is created), ECT(0) packets can still get CE if codel law (as governed by limit/target) decides so. Section 4.3.b of current draft [1] states: b. A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can be used for L4S. For instance within each queue of an FQ-CoDel system, as well as a CoDel AQM, there is typically also ECN marking at an immediate (unsmoothed) shallow threshold to support use in data centres (see Sec.5.2.7 of [RFC8290]). This can be modified so that the shallow threshold is solely applied to ECT(1) packets. Then if there is a flow of non-ECN or ECT(0) packets in the per-flow-queue, the Classic AQM (e.g. CoDel) is applied; while if there is a flow of ECT(1) packets in the queue, the shallower (typically sub-millisecond) threshold is applied. Tested: tc qd replace dev eth1 root fq_codel ce_threshold_ect1 50usec netperf ... -t TCP_STREAM -- K dctcp tc -s -d qd sh dev eth1 qdisc fq_codel 8022: root refcnt 32 limit 10240p flows 1024 quantum 9212 target 5ms ce_threshold_ect1 49us interval 100ms memory_limit 32Mb ecn drop_batch 64 Sent 14388596616 bytes 9543449 pkt (dropped 0, overlimits 0 requeues 152013) backlog 0b 0p requeues 152013 maxpacket 68130 drop_overlimit 0 new_flow_count 95678 ecn_mark 0 ce_mark 7639 new_flows_len 0 old_flows_len 0 [1] L4S current draft: https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Cc: Tom Henderson <tomh@tomh.org> Cc: Bob Briscoe <in@bobbriscoe.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: add skb_get_dsfield() helperEric Dumazet
skb_get_dsfield(skb) gets dsfield from skb, or -1 if an error was found. This is basically a wrapper around ipv4_get_dsfield() and ipv6_get_dsfield(). Used by following patch for fq_codel. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Cc: Tom Henderson <tomh@tomh.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15tcp: switch orphan_count to bare per-cpu countersEric Dumazet
Use of percpu_counter structure to track count of orphaned sockets is causing problems on modern hosts with 256 cpus or more. Stefan Bach reported a serious spinlock contention in real workloads, that I was able to reproduce with a netfilter rule dropping incoming FIN packets. 53.56% server [kernel.kallsyms] [k] queued_spin_lock_slowpath | ---queued_spin_lock_slowpath | --53.51%--_raw_spin_lock_irqsave | --53.51%--__percpu_counter_sum tcp_check_oom | |--39.03%--__tcp_close | tcp_close | inet_release | inet6_release | sock_close | __fput | ____fput | task_work_run | exit_to_usermode_loop | do_syscall_64 | entry_SYSCALL_64_after_hwframe | __GI___libc_close | --14.48%--tcp_out_of_resources tcp_write_timeout tcp_retransmit_timer tcp_write_timer_handler tcp_write_timer call_timer_fn expire_timers __run_timers run_timer_softirq __softirqentry_text_start As explained in commit cf86a086a180 ("net/dst: use a smaller percpu_counter batch for dst entries accounting"), default batch size is too big for the default value of tcp_max_orphans (262144). But even if we reduce batch sizes, there would still be cases where the estimated count of orphans is beyond the limit, and where tcp_too_many_orphans() has to call the expensive percpu_counter_sum_positive(). One solution is to use plain per-cpu counters, and have a timer to periodically refresh this cache. Updating this cache every 100ms seems about right, tcp pressure state is not radically changing over shorter periods. percpu_counter was nice 15 years ago while hosts had less than 16 cpus, not anymore by current standards. v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m, reported by kernel test robot <lkp@intel.com> Remove unused socket argument from tcp_too_many_orphans() Fixes: dd24c00191d5 ("net: Use a percpu_counter for orphan_count") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Stefan Bach <sfb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15mctp: Avoid leak of mctp_sk_keyMatt Johnston
mctp_key_alloc() returns a key already referenced. The mctp_route_input() path receives a packet for a bind socket and allocates a key. It passes the key to mctp_key_add() which takes a refcount and adds the key to lists. mctp_route_input() should then release its own refcount when setting the key pointer to NULL. In the mctp_alloc_local_tag() path (for mctp_local_output()) we similarly need to unref the key before returning (mctp_reserve_tag() takes a refcount and adds the key to lists). Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15sctp: fix transport encap_port update in sctp_vtag_verifyXin Long
transport encap_port update should be updated when sctp_vtag_verify() succeeds, namely, returns 1, not returns 0. Correct it in this patch. While at it, also fix the indentation. Fixes: a1dd2cf2f1ae ("sctp: allow changing transport encap_port by peer packets") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15ptp: fix error print of ptp_kvm on X86_64 platformKele Huang
Commit a86ed2cfa13c5 ("ptp: Don't print an error if ptp_kvm is not supported") fixes the error message print on ARM platform by only concerning about the case that the error returned from kvm_arch_ptp_init() is not -EOPNOTSUPP. Although the ARM platform returns -EOPNOTSUPP if ptp_kvm is not supported while X86_64 platform returns -KVM_EOPNOTSUPP, both error codes share the same value 95. Actually kvm_arch_ptp_init() on X86_64 platform can return three kinds of errors (-KVM_ENOSYS, -KVM_EOPNOTSUPP and -KVM_EFAULT). The problem is that -KVM_EOPNOTSUPP is masked out and -KVM_EFAULT is ignored among them. This patch fixes this by returning them to ptp_kvm_init() respectively. Signed-off-by: Kele Huang <huangkele@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15Merge branch 'qca8337-improvements'David S. Miller
Ansuel Smith says: ==================== Multiple improvement for qca8337 switch This series is the final step of a long process of porting 80+ devices to use the new qca8k driver instead of the hacky qca one based on never merged swconfig platform. Some background to justify all these additions. QCA used a special binding to declare raw initval to set the swich. I made a script to convert all these magic values and convert 80+ dts and scan all the needed "unsupported regs". We find a baseline where we manage to find the common and used regs so in theory hopefully we don't have to add anymore things. We discovered lots of things with this, especially about how differently qca8327 works compared to qca8337. In short, we found that qca8327 have some problem with suspend/resume for their internal phy. It instead sets some dedicated regs that suspend the phy without setting the standard bit. First 4 patch are to fix this. There is also a patch about preferring master. This is directly from the original driver and it seems to be needed to prevent some problem with the pause frame. Every ipq806x target sets the mac power sel and this specific reg regulates the output voltage of the regulator. Without this some instability can occur. Some configuration (for some reason) swap mac6 with mac0. We add support for this. Also, we discovered that some device doesn't work at all with pll enabled for sgmii line. In the original code this was based on the switch revision. In later revision the pll regs were decided based on the switch type (disabled for qca8327 and enabled for qca8337) but still some device had that disabled in the initval regs. Considering we found at least one qca8337 device that required pll disabled to work (no traffic problem) we decided to introduce a binding to enable pll and set it only with that. Lastly, we add support for led open drain that require the power-on-sel to set. Also, some device have only the power-on-sel set in the initval so we add also support for that. This is needed for the correct function of the switch leds. Qca8327 have a special reg in the pws regs that set it to a reduced 48pin layout. This is needed or the switch doesn't work. These are all the special configuration we find on all these devices that are from various targets. Mostly ath79, ipq806x and bcm53xx. Changes v7: - Fix missing newline in yaml - Handle error with wrong cpu port detected - Move yaml commit as last to fix bot error Changes v6: - Convert Documentation to yaml - Add extra check for cpu port and invalid phy mode - Add co developed by tag to give credits to Matthew Changes v5: - Swap patch. Document first then implement. - Fix some grammar error reported. - Rework function. Remove phylink mac_config DT scan and move everything to dedicated function in probe. - Introduce new logic for delay selection where is also supported with internal delay declared and rgmii set as phy mode - Start working on ymal conversion. Will later post this in v6 when we finally take final decision about mac swap. Changes v4: - Fix typo in SGMII falling edge about using PHY id instead of switch id Changes v3: - Drop phy patches (proposed separateley) - Drop special pwr binding. Rework to ipq806x specific - Better describe compatible and add serial print on switch chip - Drop mac exchange. Rework falling edge and move it to mac_config - Add support for port 6 cpu port. Drop hardcoded cpu port to port0 - Improve port stability with sgmii. QCA source have intenal delay also for sgmii - Add warning with pll enabled on wrong configuration Changes v2: - Reword Documentation patch to dt-bindings - Propose first 2 phy patch to net - Better describe and add hint on how to use all the new bindings - Rework delay scan function and move to phylink mac_config - Drop package48 wrong binding - Introduce support for qca8328 switch - Fix wrong binding name power-on-sel - Return error on wrong config with led open drain and ignore-power-on-sel not set ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: convert to YAML schemaMatthew Hagan
Convert the qca8k bindings to YAML format. Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: ipq8064-mdio: fix warning with new qca8k switchAnsuel Smith
Fix warning now that we have qca8k switch Documentation using yaml. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: move port config to dedicated structAnsuel Smith
Move ports related config to dedicated struct to keep things organized. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: set internal delay also for sgmiiAnsuel Smith
QCA original code report port instability and sa that SGMII also require to set internal delay. Generalize the rgmii delay function and apply the advised value if they are not defined in DT. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add support for QCA8328Ansuel Smith
QCA8328 switch is the bigger brother of the qca8327. Same regs different chip. Change the function to set the correct pin layout and introduce a new match_data to differentiate the 2 switch as they have the same ID and their internal PHY have the same ID. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: document support for qca8328Ansuel Smith
QCA8328 is the bigger brother of qca8327. Document the new compatible binding and add some information to understand the various switch compatible. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add support for pws config regAnsuel Smith
Some qca8327 switch require to force the ignore of power on sel strapping. Some switch require to set the led open drain mode in regs instead of using strapping. While most of the device implements this using the correct way using pin strapping, there are still some broken device that require to be set using sw regs. Introduce a new binding and support these special configuration. As led open drain require to ignore pin strapping to work, the probe fails with EINVAL error with incorrect configuration. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: Document qca,led-open-drain bindingAnsuel Smith
Document new binding qca,ignore-power-on-sel used to ignore power on strapping and use sw regs instead. Document qca,led-open.drain to set led to open drain mode, the qca,ignore-power-on-sel is mandatory with this enabled or an error will be reported. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add explicit SGMII PLL enableAnsuel Smith
Support enabling PLL on the SGMII CPU port. Some device require this special configuration or no traffic is transmitted and the switch doesn't work at all. A dedicated binding is added to the CPU node port to apply the correct reg on mac config. Fail to correctly configure sgmii with qca8327 switch and warn if pll is used on qca8337 with a revision greater than 1. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: Document qca,sgmii-enable-pllAnsuel Smith
Document qca,sgmii-enable-pll binding used in the CPU nodes to enable SGMII PLL on MAC config. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: rework rgmii delay logic and scan for cpu port 6Ansuel Smith
Future proof commit. This switch have 2 CPU ports and one valid configuration is first CPU port set to sgmii and second CPU port set to rgmii-id. The current implementation detects delay only for CPU port zero set to rgmii and doesn't count any delay set in a secondary CPU port. Drop the current delay scan function and move it to the sgmii parser function to generalize and implicitly add support for secondary CPU port set to rgmii-id. Introduce new logic where delay is enabled also with internal delay binding declared and rgmii set as PHY mode. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add support for cpu port 6Ansuel Smith
Currently CPU port is always hardcoded to port 0. This switch have 2 CPU ports. The original intention of this driver seems to be use the mac06_exchange bit to swap MAC0 with MAC6 in the strange configuration where device have connected only the CPU port 6. To skip the introduction of a new binding, rework the driver to address the secondary CPU port as primary and drop any reference of hardcoded port. With configuration of mac06 exchange, just skip the definition of port0 and define the CPU port as a secondary. The driver will autoconfigure the switch to use that as the primary CPU port. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: Document support for CPU port 6Ansuel Smith
The switch now support CPU port to be set 6 instead of be hardcoded to 0. Document support for it and describe logic selection. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15net: dsa: qca8k: add support for sgmii falling edgeAnsuel Smith
Add support for this in the qca8k driver. Also add support for SGMII rx/tx clock falling edge. This is only present for pad0, pad5 and pad6 have these bit reserved from Documentation. Add a comment that this is hardcoded to PAD0 as qca8327/28/34/37 have an unique sgmii line and setting falling in port0 applies to both configuration with sgmii used for port0 or port6. Co-developed-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dt-bindings: net: dsa: qca8k: Add SGMII clock phase propertiesAnsuel Smith
Add names and descriptions of additional PORT0_PAD_CTRL properties. qca,sgmii-(rx|tx)clk-falling-edge are for setting the respective clock phase to failling edge. Co-developed-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Matthew Hagan <mnhagan88@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15dsa: qca8k: add mac_power_sel supportAnsuel Smith
Add missing mac power sel support needed for ipq8064/5 SoC that require 1.8v for the internal regulator port instead of the default 1.5v. If other device needs this, consider adding a dedicated binding to support this. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15xen-netback: Remove redundant initialization of variable errColin Ian King
The variable err is being initialized with a value that is never read, it is being updated immediately afterwards. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15page_pool: disable dma mapping support for 32-bit arch with 64-bit DMAYunsheng Lin
As the 32-bit arch with 64-bit DMA seems to rare those days, and page pool might carry a lot of code and complexity for systems that possibly. So disable dma mapping support for such systems, if drivers really want to work on such systems, they have to implement their own DMA-mapping fallback tracking outside page_pool. Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15perf/x86/msr: Add Sapphire Rapids CPU supportKan Liang
SMI_COUNT MSR is supported on Sapphire Rapids CPU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1633551137-192083-1-git-send-email-kan.liang@linux.intel.com