linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-12-14	ibmvnic: Update driver return codes	Dany Madden
	Update return codes to be more informative. Signed-off-by: Jacob Root <otis@otisroot.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	Merge branch 'mlxsw-fixes'	David S. Miller
	Ido Schimmel says: ==================== mlxsw: MAC profiles occupancy fix Patch #1 fixes a router interface (RIF) MAC profiles occupancy bug that was merged in the last cycle. Patch #2 adds a selftest that fails without the fix. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	selftests: mlxsw: Add a test case for MAC profiles consolidation	Danielle Ratson
	Add a test case to cover the bug fixed by the previous patch. Edit the MAC address of one netdev so that it matches the MAC address of the second netdev. Verify that the two MAC profiles were consolidated by testing that the MAC profiles occupancy decreased by one. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	mlxsw: spectrum_router: Consolidate MAC profiles when possible	Danielle Ratson
	Currently, when setting a router interface (RIF) MAC address while the MAC profile is not shared with other RIFs, the profile is edited so that the new MAC address is assigned to it. This does not take into account a situation in which the new MAC address already matches an existing MAC profile. In that situation, two MAC profiles will be occupied even though they hold MAC addresses from the same profile. In order to prevent that, add a check to ensure that editing a MAC profile takes place only when the new MAC address does not match an existing profile. Fixes: 605d25cd782a6 ("mlxsw: spectrum_router: Add RIF MAC profiles support") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Tested-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	Revert "pktgen: use min() to make code cleaner"	David S. Miller
	This reverts commit 13510fef48a3803d9ee8f044b015dacfb06fe0f5. Causes build warnings. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	rds: memory leak in __rds_conn_create()	Hangyu Hua
	__rds_conn_create() did not release conn->c_path when loop_trans != 0 and trans->t_prefer_loopback != 0 and is_outgoing == 0. Fixes: aced3ce57cd3 ("RDS tcp loopback connection can hang") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Sharath Srinivasan <sharath.srinivasan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	pktgen: use min() to make code cleaner	Changcheng Deng
	Use min() in order to make code cleaner. Issue found by coccinelle. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	Merge tag 'mac80211-for-net-2021-12-14' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A fairly large number of fixes this time: * fix a station info memory leak on insert collisions * a rate control fix for retransmissions * two aggregation setup fixes * reload current regdomain when reloading database * a locking fix in regulatory work * a probe request allocation size fix in mac80211 * apply TCP vs. aggregation (sk pacing) on mesh * fix ordering of channel context update vs. station state * set up skb->dev for mesh forwarding properly * track QoS data frames only for admission control to avoid out-of-bounds read (found by syzbot) * validate extended element ID vs. existing data to avoid out-of-bounds read (found by syzbot) * fix locking in mac80211 aggregation TX setup * fix traffic stall after HW restart when TXQs are used * fix ordering of reconfig/restart after HW restart * fix interface type for extended aggregation capability lookup ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	Merge branch 'dsa-fixups'	David S. Miller
	Vladimir Oltean says: ==================== DSA tagger-owned storage fixups It seems that the DSA tagger-owned storage changes were insufficiently tested and do not work in all cases. Specifically, the NXP Bluebox 3 (arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dts) got broken by these changes, because (a) I forgot that DSA_TAG_PROTO_SJA1110 exists and differs from DSA_TAG_PROTO_SJA1105 (b) the Bluebox 3 uses a DSA switch tree with 2 switches, and the tagger-owned storage patches don't cover that use case well, it seems Therefore, I'm sorry to say that there needs to be an API fixup: tagging protocol drivers will from now on connect to individual switches from a tree, rather than to the tree as a whole. This is more robust against various ordering constraints in the DSA probe and teardown paths, and is also symmetrical with the connection API exposed to the switch drivers themselves, which is also per switch. With these changes, the Bluebox 3 also works fine. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	net: dsa: make tagging protocols connect to individual switches from a tree	Vladimir Oltean
	On the NXP Bluebox 3 board which uses a multi-switch setup with sja1105, the mechanism through which the tagger connects to the switch tree is broken, due to improper DSA code design. At the time when tag_ops->connect() is called in dsa_port_parse_cpu(), DSA hasn't finished "touching" all the ports, so it doesn't know how large the tree is and how many ports it has. It has just seen the first CPU port by this time. As a result, this function will call the tagger's ->connect method too early, and the tagger will connect only to the first switch from the tree. This could be perhaps addressed a bit more simply by just moving the tag_ops->connect(dst) call a bit later (for example in dsa_tree_setup), but there is already a design inconsistency at present: on the switch side, the notification is on a per-switch basis, but on the tagger side, it is on a per-tree basis. Furthermore, the persistent storage itself is per switch (ds->tagger_data). And the tagger connect and disconnect procedures (at least the ones that exist currently) could see a fair bit of simplification if they didn't have to iterate through the switches of a tree. To fix the issue, this change transforms tag_ops->connect(dst) into tag_ops->connect(ds) and moves it somewhere where we already iterate over all switches of a tree. That is in dsa_switch_setup_tag_protocol(), which is a good placement because we already have there the connection call to the switch side of things. As for the dsa_tree_bind_tag_proto() method (called from the code path that changes the tag protocol), things are a bit more complicated because we receive the tree as argument, yet when we unwind on errors, it would be nice to not call tag_ops->disconnect(ds) where we didn't previously call tag_ops->connect(ds). We didn't have this problem before because the tag_ops connection operations passed the entire dst before, and this is more fine grained now. To solve the error rewind case using the new API, we have to create yet one more cross-chip notifier for disconnection, and stay connected with the old tag protocol to all the switches in the tree until we've succeeded to connect with the new one as well. So if something fails half way, the whole tree is still connected to the old tagger. But there may still be leaks if the tagger fails to connect to the 2nd out of 3 switches in a tree: somebody needs to tell the tagger to disconnect from the first switch. Nothing comes for free, and this was previously handled privately by the tagging protocol driver before, but now we need to emit a disconnect cross-chip notifier for that, because DSA has to take care of the unwind path. We assume that the tagging protocol has connected to a switch if it has set ds->tagger_data to something, otherwise we avoid calling its disconnection method in the error rewind path. The rest of the changes are in the tagging protocol drivers, and have to do with the replacement of dst with ds. The iteration is removed and the error unwind path is simplified, as mentioned above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	net: dsa: sja1105: fix broken connection with the sja1110 tagger	Vladimir Oltean
	The driver was incorrectly converted assuming that "sja1105" is the only tagger supported by this driver. This results in SJA1110 switches failing to probe: sja1105 spi1.0: Unable to connect to tag protocol "sja1110": -EPROTONOSUPPORT sja1105: probe of spi1.2 failed with error -93 Add DSA_TAG_PROTO_SJA1110 to the list of supported taggers by the sja1105 driver. The sja1105_tagger_data structure format is common for the two tagging protocols. Fixes: c79e84866d2a ("net: dsa: tag_sja1105: convert to tagger-owned data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	net: dsa: tag_sja1105: fix zeroization of ds->priv on tag proto disconnect	Vladimir Oltean
	The method was meant to zeroize ds->tagger_data but got the wrong pointer. Fix this. Fixes: c79e84866d2a ("net: dsa: tag_sja1105: convert to tagger-owned data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	Merge branch '40GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-12-13 This series contains updates to iavf driver only. Dan Carpenter fixes some missing mutex unlocking. Stefan Assmann restores stopping watchdog from overriding to reset state. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	bareudp: Add extack support to bareudp_configure()	Guillaume Nault
	Add missing extacks for common configuration errors. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	ethtool: fix null-ptr-deref on ref tracker	Jakub Kicinski
	dev can be a NULL here, not all requests set require_dev. Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	net: dev: Change the order of the arguments for the contended condition.	Sebastian Andrzej Siewior
	Change the order of arguments and make qdisc_is_running() appear first. This is more readable for the general case. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	flow_offload: return EOPNOTSUPP for the unsupported mpls action type	Baowen Zheng
	We need to return EOPNOTSUPP for the unsupported mpls action type when setup the flow action. In the original implement, we will return 0 for the unsupported mpls action type, actually we do not setup it and the following actions to the flow action entry. Fixes: 9838b20a7fb2 ("net: sched: take rtnl lock in tc_setup_flow_action()") Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	mmc: sdhci-tegra: Fix switch to HS400ES mode	Prathamesh Shete
	When CMD13 is sent after switching to HS400ES mode, the bus is operating at either MMC_HIGH_26_MAX_DTR or MMC_HIGH_52_MAX_DTR. To meet Tegra SDHCI requirement at HS400ES mode, force SDHCI interface clock to MMC_HS200_MAX_DTR (200 MHz) so that host controller CAR clock and the interface clock are rate matched. Signed-off-by: Prathamesh Shete <pshete@nvidia.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: dfc9700cef77 ("mmc: tegra: Implement HS400 enhanced strobe") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211214113653.4631-1-pshete@nvidia.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-12-14	net: stmmac: fix tc flower deletion for VLAN priority Rx steering	Ong Boon Leong
	To replicate the issue:- 1) Add 1 flower filter for VLAN Priority based frame steering:- $ IFDEVNAME=eth0 $ tc qdisc add dev $IFDEVNAME ingress $ tc qdisc add dev $IFDEVNAME root mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0 $ tc filter add dev $IFDEVNAME parent ffff: protocol 802.1Q \ flower vlan_prio 0 hw_tc 0 2) Get the 'pref' id $ tc filter show dev $IFDEVNAME ingress 3) Delete a specific tc flower record (say pref 49151) $ tc filter del dev $IFDEVNAME parent ffff: pref 49151 From dmesg, we will observe kernel NULL pointer ooops [ 197.170464] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 197.171367] #PF: supervisor read access in kernel mode [ 197.171367] #PF: error_code(0x0000) - not-present page [ 197.171367] PGD 0 P4D 0 [ 197.171367] Oops: 0000 [#1] PREEMPT SMP NOPTI <snip> [ 197.171367] RIP: 0010:tc_setup_cls+0x20b/0x4a0 [stmmac] <snip> [ 197.171367] Call Trace: [ 197.171367] <TASK> [ 197.171367] ? __stmmac_disable_all_queues+0xa8/0xe0 [stmmac] [ 197.171367] stmmac_setup_tc_block_cb+0x70/0x110 [stmmac] [ 197.171367] tc_setup_cb_destroy+0xb3/0x180 [ 197.171367] fl_hw_destroy_filter+0x94/0xc0 [cls_flower] The above issue is due to previous incorrect implementation of tc_del_vlan_flow(), shown below, that uses flow_cls_offload_flow_rule() to get struct flow_rule rule which is no longer valid for tc filter delete operation. struct flow_rule rule = flow_cls_offload_flow_rule(cls); struct flow_dissector *dissector = rule->match.dissector; So, to ensure tc_del_vlan_flow() deletes the right VLAN cls record for earlier configured RX queue (configured by hw_tc) in tc_add_vlan_flow(), this patch introduces stmmac_rfs_entry as driver-side flow_cls_offload record for 'RX frame steering' tc flower, currently used for VLAN priority. The implementation has taken consideration for future extension to include other type RX frame steering such as EtherType based. v2: - Clean up overly extensive backtrace and rewrite git message to better explain the kernel NULL pointer issue. Fixes: 0e039f5cf86c ("net: stmmac: add RX frame steering based on VLAN priority in tc flower") Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	drivers/perf: hisi: Add driver for HiSilicon PCIe PMU	Qi Liu
	PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported to sample bandwidth, latency, buffer occupation etc. Each PMU RCiEP device monitors multiple Root Ports, and each RCiEP is registered as a PMU in /sys/bus/event_source/devices, so users can select target PMU, and use filter to do further sets. Filtering options contains: event - select the event. port - select target Root Ports. Information of Root Ports are shown under sysfs. bdf - select requester_id of target EP device. trig_len - set trigger condition for starting event statistics. trig_mode - set trigger mode. 0 means starting to statistic when bigger than trigger condition, and 1 means smaller. thr_len - set threshold for statistics. thr_mode - set threshold mode. 0 means count when bigger than threshold, and 1 means smaller. Acked-by: Krzysztof Wilczyński <kw@linux.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Qi Liu <liuqi115@huawei.com> Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/20211202080633.2919-3-liuqi115@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	docs: perf: Add description for HiSilicon PCIe PMU driver	Qi Liu
	PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported on HiSilicon HIP09 platform. Document it to provide guidance on how to use it. Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Qi Liu <liuqi115@huawei.com> Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/20211202080633.2919-2-liuqi115@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	Merge branch 'hwtstamp_bonding'	David S. Miller
	Hangbin Liu says: ==================== net: add new hwtstamp flag HWTSTAMP_FLAG_BONDED_PHC_INDEX This patchset add a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX. When user want to get bond active interface's PHC, they need to add this flag and aware the PHC index may changed. v3: Use bitwise test to check the flags validation v2: rename the flag to HWTSTAMP_FLAG_BONDED_PHC_INDEX ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP	Hangbin Liu
	When there is a failover, the PHC index of bond active interface will be changed. This may break the user space program if the author didn't aware. By setting this flag, the user should aware that the PHC index get/set by syscall is not stable. And the user space is able to deal with it. Without this flag, the kernel will reject the request forwarding to bonding. Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX	Hangbin Liu
	Since commit 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") the user could get bond active interface's PHC index directly. But when there is a failover, the bond active interface will change, thus the PHC index is also changed. This may break the user's program if they did not update the PHC timely. This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX. When the user wants to get the bond active interface's PHC, they need to add this flag and be aware the PHC index may be changed. With the new flag. All flag checks in current drivers are removed. Only the checking in net_hwtstamp_validate() is kept. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error	Thomas Gleixner
	PCI_MSIX_FLAGS_MASKALL is set in the MSI-X control register at MSI-X interrupt setup time. It's cleared on success, but the error handling path only clears the PCI_MSIX_FLAGS_ENABLE bit. That's incorrect as the reset state of the PCI_MSIX_FLAGS_MASKALL bit is zero. That can be observed via lspci: Capabilities: [b0] MSI-X: Enable- Count=67 Masked+ Clear the bit in the error path to restore the reset state. Fixes: 438553958ba1 ("PCI/MSI: Enable and mask MSI-X early") Reported-by: Stefan Roese <sr@denx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Stefan Roese <sr@denx.de> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Marek Vasut <marex@denx.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tufevoqx.ffs@tglx
2021-12-14	PCI/MSI: Mask MSI-X vectors only on success	Stefan Roese
	Masking all unused MSI-X entries is done to ensure that a crash kernel starts from a clean slate, which correponds to the reset state of the device as defined in the PCI-E specificion 3.0 and later: Vector Control for MSI-X Table Entries -------------------------------------- "00: Mask bit: When this bit is set, the function is prohibited from sending a message using this MSI-X Table entry. ... This bit’s state after reset is 1 (entry is masked)." A Marvell NVME device fails to deliver MSI interrupts after trying to enable MSI-X interrupts due to that masking. It seems to take the MSI-X mask bits into account even when MSI-X is disabled. While not specification compliant, this can be cured by moving the masking into the success path, so that the MSI-X table entries stay in device reset state when the MSI-X setup fails. [ tglx: Move it into the success path, add comment and amend changelog ] Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Marek Vasut <marex@denx.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
2021-12-14	dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings	Bhaskara Budiredla
	Add device tree bindings for Last-level-cache Tag-and-data (LLC-TAD) unit PMU for Marvell CN10K SoCs. Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211115043506.6679-3-bbudiredla@marvell.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	drivers: perf: Add LLC-TAD perf counter support	Bhaskara Budiredla
	This driver adds support for Last-level cache tag-and-data unit (LLC-TAD) PMU that is featured in some of the Marvell's CN10K infrastructure silicons. The LLC is divided into 2N slices distributed across N Mesh tiles in a single-socket configuration. The driver always configures the same counter for all of the TADs. The user would end up effectively reserving one of eight counters in every TAD to look across all TADs. The occurrences of events are aggregated and presented to the user at the end of an application run. The driver does not provide a way for the user to partition TADs so that different TADs are used for different applications. The event counters are zeroed to start event counting to avoid any rollover issues. TAD perf counters are 64-bit, so it's not currently possible to overflow event counters at current mesh and core frequencies. To measure tad pmu events use perf tool stat command. For instance: perf stat -e tad_dat_msh_in_dss,tad_req_msh_out_any <application> perf stat -e tad_alloc_any,tad_hit_any,tad_tag_rd <application> Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com> Link: https://lore.kernel.org/r/20211115043506.6679-2-bbudiredla@marvell.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	arm64/xor: use EOR3 instructions when available	Ard Biesheuvel
	Use the EOR3 instruction to implement xor_blocks() if the instruction is available, which is the case if the CPU implements the SHA-3 extension. This is about 20% faster on Apple M1 when using the 5-way version. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20211213140252.2856053-1-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14	powerpc/module_64: Fix livepatching for RO modules	Russell Currey
	Livepatching a loaded module involves applying relocations through apply_relocate_add(), which attempts to write to read-only memory when CONFIG_STRICT_MODULE_RWX=y. Work around this by performing these writes through the text poke area by using patch_instruction(). R_PPC_REL24 is the only relocation type generated by the kpatch-build userspace tool or klp-convert kernel tree that I observed applying a relocation to a post-init module. A more comprehensive solution is planned, but using patch_instruction() for R_PPC_REL24 on should serve as a sufficient fix. This does have a performance impact, I observed ~15% overhead in module_load() on POWER8 bare metal with checksum verification off. Fixes: c35717c71e98 ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX") Cc: stable@vger.kernel.org # v5.14+ Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> [mpe: Check return codes from patch_instruction()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au
2021-12-14	perf/smmuv3: Synthesize IIDR from CoreSight ID registers	Robin Murphy
	The SMMU_PMCG_IIDR register was not present in older revisions of the Arm SMMUv3 spec. On Arm Ltd. implementations, the IIDR value consists of fields from several PIDR registers, allowing us to present a standardized identifier to userspace. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20211117144844.241072-4-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/smmuv3: Add devicetree support	Jean-Philippe Brucker
	Add device-tree support to the SMMUv3 PMCG driver. Signed-off-by: Jay Chen <jkchen@linux.alibaba.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20211117144844.241072-3-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	dt-bindings: Add Arm SMMUv3 PMCG binding	Jean-Philippe Brucker
	Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is placed as a sibling node of the SMMU. Although the PMCGs registers may be within the SMMU MMIO region, they are separate devices, and there can be multiple PMCG devices for each SMMU (for example one for the TCU and one for each TBU). Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20211117144844.241072-2-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Add debugfs topology info	Robin Murphy
	In general, detailed performance analysis will require knoweldge of the the SoC beyond the CMN itself - e.g. which actual CPUs/peripherals/etc. are connected to each node. However for certain development and bringup tasks it can be useful to have a quick overview of the CMN internal topology to hand too. Add a debugfs file to map this out. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/159fd4d7e19fb3c8801a8cb64ee73ec50f55903c.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Add CI-700 Support	Robin Murphy
	Add the identifiers and events for the CI-700 coherent interconnect. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/28f566ab23a83733c6c9ef9414c010b760b4549c.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	dt-bindings: perf: arm-cmn: Add CI-700	Robin Murphy
	CI-700 is a new client-level coherent interconnect derived from the enterprise-level CMN family, and shares the same PMU design. CC: devicetree@vger.kernel.org Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/5f0b372f808f1468e6d9500cedafbecd10254674.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Support new IP features	Robin Murphy
	The second generation of CMN IPs add new node types and significantly expand the configuration space with options for extra device ports on edge XPs, either plumbed into the regular DTM or with extra dedicated DTMs to monitor them, plus larger (and smaller) mesh sizes. Add basic support for pulling this new information out of the hardware, piping it around as necessary, and handling (most of) the new choices. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/e58b495bcc7deec3882be4bac910ed0bf6979674.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Demarcate CMN-600 specifics	Robin Murphy
	In preparation for supporting newer CMN products, let's introduce a means to differentiate the features and events which are specific to a particular IP from those which remain common to the whole family. The newer designs have also smoothed off some of the rough edges in terms of discoverability, so separate out the parts of the flow which have effectively now become CMN-600 quirks. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/9f6368cdca4c821d801138939508a5bba54ccabb.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Move group validation data off-stack	Robin Murphy
	With the value of CMN_MAX_DTMS increasing significantly, our validation data structure is set to get quite big. Technically we could pack it at least twice as densely, since we only need around 19 bits of information per DTM, but that makes the code even more mind-bogglingly impenetrable, and even half of "quite big" may still be uncomfortably large for a stack frame (~1KB). Just move it to an off-stack allocation instead. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/0cabff2e5839ddc0979e757c55515966f65359e4.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Optimise DTC counter accesses	Robin Murphy
	In cases where we do know which DTC domain a node belongs to, we can skip initialising or reading the global count in DTCs where we know it won't change. The machinery to achieve that is mostly in place already, so finish hooking it up by converting the vestigial domain tracking to propagate suitable bitmaps all the way through to events. Note that this does not allow allocating such an unused counter to a different event on that DTC, because that is a flippin' nightmare. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/51d930fd945ef51c81f5889ccca055c302b0a1d0.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Optimise DTM counter reads	Robin Murphy
	When multiple nodes of the same type are connected to the same XP (particularly in CAL configurations), it seems that they are likely to be consecutive in logical ID. Therefore, we're likely to gain a small benefit from an easy tweak to optimise out consecutive reads of the same set of DTM counters for an aggregated event. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/7777d77c2df17693cd3dabb6e268906e15238d82.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Refactor DTM handling	Robin Murphy
	Untangle DTMs from XPs into a dedicated abstraction. This helps make things a little more obvious and robust, but primarily paves the way for further development where new IPs can grow extra DTMs per XP. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/9cca18b1b98f482df7f1aaf3d3213e7f39500423.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Streamline node iteration	Robin Murphy
	Refactor the places where we scan through the set of nodes to switch from explicit array indexing to pointer-based iteration. This leads to slightly simpler object code, but also makes the source less dense and more pleasant for further development. It also unearths an almost-bug in arm_cmn_event_init() where we've been depending on the "array index" of NULL relative to cmn->dns being a sufficiently large number, yuck. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/ee0c9eda9a643f46001ac43aadf3f0b1fd5660dd.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Refactor node ID handling	Robin Murphy
	Add a bit more abstraction for the places where we decompose node IDs. This will help keep things nice and manageable when we come to add yet more variables which affect the node ID format. Also use the opportunity to move the rest of the low-level node management helpers back up to the logical place they were meant to be - how they ended up buried right in the middle of the event-related definitions is somewhat of a mystery... Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/a2242a8c3c96056c13a04ae87bf2047e5e64d2d9.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Drop compile-test restriction	Robin Murphy
	Although CMN is currently (and overwhelmingly likely to remain) deployed in arm64-only (modulo userspace) systems, the 64-bit "dependency" for compile-testing was just laziness due to heavy reliance on readq/writeq accessors. Since we only need one extra include for robustness in that regard, let's pull that in, widen the compile-test coverage, and fix up the smattering of type laziness that that brings to light. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/baee9ee0d0bdad8aaeb70f5a4b98d8fd4b1f5786.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Account for NUMA affinity	Robin Murphy
	On a system with multiple CMN meshes, ideally we'd want to access each PMU from within its own mesh, rather than with a long CML round-trip, wherever feasible. Since such a system is likely to be presented as multiple NUMA nodes, let's also hope a proximity domain is specified for each CMN programming interface, and use that to guide our choice of IRQ affinity to favour a node-local CPU where possible. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/32438b0d016e0649d882d47d30ac2000484287b9.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	perf/arm-cmn: Fix CPU hotplug unregistration	Robin Murphy
	Attempting to migrate the PMU context after we've unregistered the PMU device, or especially if we never successfully registered it in the first place, is a woefully bad idea. It's also fundamentally pointless anyway. Make sure to unregister an instance from the hotplug handler without invoking the teardown callback. Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/2c221d745544774e4b07583b65b5d4d94f7e0fe4.1638530442.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	Documentation: arm64: Document PMU counters access from userspace	Raphael Gault
	Add documentation to describe the access to the pmu hardware counters from userspace. Signed-off-by: Raphael Gault <raphael.gault@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211208201124.310740-6-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	arm64: perf: Enable PMU counter userspace access for perf event	Rob Herring
	Arm PMUs can support direct userspace access of counters which allows for low overhead (i.e. no syscall) self-monitoring of tasks. The same feature exists on x86 called 'rdpmc'. Unlike x86, userspace access will only be enabled for thread bound events. This could be extended if needed, but simplifies the implementation and reduces the chances for any information leaks (which the x86 implementation suffers from). PMU EL0 access will be enabled when an event with userspace access is part of the thread's context. This includes when the event is not scheduled on the PMU. There's some additional overhead clearing dirty counters when access is enabled in order to prevent leaking disabled counter data from other tasks. Unlike x86, enabling of userspace access must be requested with a new attr bit: config1:1. If the user requests userspace access with 64-bit counters, then the event open will fail if the h/w doesn't support 64-bit counters. Chaining is not supported with userspace access. The modes for config1 are as follows: config1 = 0 : user access disabled and always 32-bit config1 = 1 : user access disabled and always 64-bit (using chaining if needed) config1 = 2 : user access enabled and always 32-bit config1 = 3 : user access enabled and always 64-bit Based on work by Raphael Gault <raphael.gault@arm.com>, but has been completely re-written. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211208201124.310740-5-robh@kernel.org [will: Made armv8pmu_proc_user_access_handler() static] Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14	arm64: perf: Add userspace counter access disable switch	Rob Herring
	Like x86, some users may want to disable userspace PMU counter altogether. Add a sysctl 'perf_user_access' file to control userspace counter access. The default is '0' which is disabled. Writing '1' enables access. Note that x86 supports globally enabling user access by writing '2' to /sys/bus/event_source/devices/cpu/rdpmc. As there's not existing userspace support to worry about, this shouldn't be necessary for Arm. It could be added later if the need arises. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-perf-users@vger.kernel.org Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211208201124.310740-4-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>