summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-14KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUsSuravee Suthikulpanit
Define AVIC_VCPU_ID_MASK based on AVIC_PHYSICAL_MAX_INDEX, i.e. the mask that effectively controls the largest guest physical APIC ID supported by x2AVIC, instead of hardcoding the number of bits to 8 (and the number of VM bits to 24). The AVIC GATag is programmed into the AMD IOMMU IRTE to provide a reference back to KVM in case the IOMMU cannot inject an interrupt into a non-running vCPU. In such a case, the IOMMU notifies software by creating a GALog entry with the corresponded GATag, and KVM then uses the GATag to find the correct VM+vCPU to kick. Dropping bit 8 from the GATag results in kicking the wrong vCPU when targeting vCPUs with x2APIC ID > 255. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: Fix a benign off-by-one bug in AVIC physical table maskSean Christopherson
Define the "physical table max index mask" as bits 8:0, not 9:0. x2AVIC currently supports a max of 512 entries, i.e. the max index is 511, and the inputs to GENMASK_ULL() are inclusive. The bug is benign as bit 9 is reserved and never set by KVM, i.e. KVM is just clearing bits that are guaranteed to be zero. Note, as of this writing, APM "Rev. 3.39-October 2022" incorrectly states that bits 11:8 are reserved in Table B-1. VMCB Layout, Control Area. I.e. that table wasn't updated when x2AVIC support was added. Opportunistically fix the comment for the max AVIC ID to align with the code, and clean up comment formatting too. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14selftests: KVM: skip hugetlb tests if huge pages are not availablePaolo Bonzini
Right now, if KVM memory stress tests are run with hugetlb sources but hugetlb is not available (either in the kernel or because /proc/sys/vm/nr_hugepages is 0) the test will fail with a memory allocation error. This makes it impossible to add tests that default to hugetlb-backed memory, because on a machine with a default configuration they will fail. Therefore, check HugePages_Total as well and, if zero, direct the user to enable hugepages in procfs. Furthermore, return KSFT_SKIP whenever hugetlb is not available. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Use tabs instead of spaces for indentationRong Tao
Code indentation should use tabs where possible and miss a '*'. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_A492CB3F9592578451154442830EA1B02C07@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Fix indentation coding style issueRong Tao
Code indentation should use tabs where possible. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_31E6ACADCB6915E157CF5113C41803212107@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: remove unnecessary #ifdefPaolo Bonzini
nested_vmx_check_controls() has already run by the time KVM checks host state, so the "host address space size" exit control can only be set on x86-64 hosts. Simplify the condition at the cost of adding some dead code to 32-bit kernels. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: add missing consistency checks for CR0 and CR4Paolo Bonzini
The effective values of the guest CR0 and CR4 registers may differ from those included in the VMCS12. In particular, disabling EPT forces CR4.PAE=1 and disabling unrestricted guest mode forces CR0.PG=CR0.PE=1. Therefore, checks on these bits cannot be delegated to the processor and must be performed by KVM. Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14Merge tag 'kvmarm-fixes-6.3-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.3, part #1 A single patch to address a rather annoying bug w.r.t. guest timer offsetting. Effectively the synchronization of timer offsets between vCPUs was broken, leading to inconsistent timer reads within the VM.
2023-03-14powerpc/pseries: RTAS work area requires GENERIC_ALLOCATORRandy Dunlap
The RTAS work area allocator uses code that is built by GENERIC_ALLOCATOR, so the PSERIES Kconfig should select the required Kconfig symbol to fix multiple build errors. powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o: in function `.rtas_work_area_allocator_init': rtas-work-area.c:(.init.text+0x288): undefined reference to `.gen_pool_create' powerpc64-linux-ld: rtas-work-area.c:(.init.text+0x2dc): undefined reference to `.gen_pool_set_algo' powerpc64-linux-ld: rtas-work-area.c:(.init.text+0x310): undefined reference to `.gen_pool_add_owner' powerpc64-linux-ld: rtas-work-area.c:(.init.text+0x43c): undefined reference to `.gen_pool_destroy' powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o:(.toc+0x0): undefined reference to `gen_pool_first_fit_order_align' powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o: in function `.__rtas_work_area_alloc': rtas-work-area.c:(.ref.text+0x14c): undefined reference to `.gen_pool_alloc_algo_owner' powerpc64-linux-ld: rtas-work-area.c:(.ref.text+0x238): undefined reference to `.gen_pool_alloc_algo_owner' powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o: in function `.rtas_work_area_free': rtas-work-area.c:(.ref.text+0x44c): undefined reference to `.gen_pool_free_owner' Fixes: 43033bc62d34 ("powerpc/pseries: add RTAS work area allocator") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230223070116.660-2-rdunlap@infradead.org
2023-03-13net: ipa: fix a surprising number of bad offsetsAlex Elder
A recent commit eliminated a hack that adjusted the offset used for many GSI registers. It became possible because we now specify all GSI register offsets explicitly for every version of IPA. Unfortunately, a large number of register offsets were *not* updated as they should have been in that commit. For IPA v4.5+, the offset for every GSI register *except* the two inter-EE interrupt masking registers were supposed to have been reduced by 0xd000. Tested-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # SM8350-HDK Fixes: 59b12b1d27f3 ("net: ipa: kill gsi->virt_raw") Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20230310193709.1477102-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: Use of_property_present() for testing DT property presenceRob Herring
It is preferred to use typed property access functions (i.e. of_property_read_<type> functions) rather than low-level of_get_property/of_find_property functions for reading properties. As part of this, convert of_get_property/of_find_property calls to the recently added of_property_present() helper when we just want to test for presence of a property and nothing more. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230310144716.1544083-1-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: dsa: mt7530: set PLL frequency and trgmii only when trgmii is usedArınç ÜNAL
As my testing on the MCM MT7530 switch on MT7621 SoC shows, setting the PLL frequency does not affect MII modes other than trgmii on port 5 and port 6. So the assumption is that the operation here called "setting the PLL frequency" actually sets the frequency of the TRGMII TX clock. Make it so that it and the rest of the trgmii setup run only when the trgmii mode is used. Tested rgmii and trgmii modes of port 6 on MCM MT7530 on MT7621AT Unielec U7621-06 and standalone MT7530 on MT7623NI Bananapi BPI-R2. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20230310073338.5836-2-arinc.unal@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: dsa: mt7530: remove now incorrect comment regarding port 5Arınç ÜNAL
Remove now incorrect comment regarding port 5 as GMAC5. This is supposed to be supported since commit 38f790a80560 ("net: dsa: mt7530: Add support for port 5") under mt7530_setup_port5(). Fixes: 38f790a80560 ("net: dsa: mt7530: Add support for port 5") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20230310073338.5836-1-arinc.unal@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== i40e: support XDP multi-buffer Tirthendu Sarkar says: This patchset adds multi-buffer support for XDP. Tx side already has support for multi-buffer. This patchset focuses on Rx side. The last patch contains actual multi-buffer changes while the previous ones are preparatory patches. On receiving the first buffer of a packet, xdp_buff is built and its subsequent buffers are added to it as frags. While 'next_to_clean' keeps pointing to the first descriptor, the newly introduced 'next_to_process' keeps track of every descriptor for the packet. On receiving EOP buffer the XDP program is called and appropriate action is taken (building skb for XDP_PASS, reusing page for XDP_DROP, adjusting page offsets for XDP_{REDIRECT,TX}). The patchset also streamlines page offset adjustments for buffer reuse to make it easier to post process the rx_buffers after running XDP prog. With this patchset there does not seem to be any performance degradation for XDP_PASS and some improvement (~1% for XDP_TX, ~5% for XDP_DROP) when measured using xdp_rxq_info program from samples/bpf/ for 64B packets. v1: https://lore.kernel.org/netdev/20230306210822.3381942-1-anthony.l.nguyen@intel.com/ * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: add support for XDP multi-buffer Rx i40e: add xdp_buff to i40e_ring struct i40e: introduce next_to_process to i40e_ring i40e: use frame_sz instead of recalculating truesize for building skb i40e: Change size to truesize when using i40e_rx_buffer_flip() i40e: add pre-xdp page_count in rx_buffer i40e: change Rx buffer size for legacy-rx to support XDP multi-buffer i40e: consolidate maximum frame size calculation for vsi ==================== Link: https://lore.kernel.org/r/20230309212819.1198218-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13qed/qed_dev: guard against a possible division by zeroDaniil Tatianin
Previously we would divide total_left_rate by zero if num_vports happened to be 1 because non_requested_count is calculated as num_vports - req_count. Guard against this by validating num_vports at the beginning and returning an error otherwise. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Fixes: bcd197c81f63 ("qed: Add vport WFQ configuration APIs") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230309201556.191392-1-d-tatianin@yandex-team.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13selftests: tc-testing: add tests for action bindingPedro Tammela
Add tests that check if filters can bind actions, that is create an action independently and then bind to a filter. tdc-tests under category 'infra': 1..18 ok 1 abdc - Reference pedit action object in filter ok 2 7a70 - Reference mpls action object in filter ok 3 d241 - Reference bpf action object in filter ok 4 383a - Reference connmark action object in filter ok 5 c619 - Reference csum action object in filter ok 6 a93d - Reference ct action object in filter ok 7 8bb5 - Reference ctinfo action object in filter ok 8 2241 - Reference gact action object in filter ok 9 35e9 - Reference gate action object in filter ok 10 b22e - Reference ife action object in filter ok 11 ef74 - Reference mirred action object in filter ok 12 2c81 - Reference nat action object in filter ok 13 ac9d - Reference police action object in filter ok 14 68be - Reference sample action object in filter ok 15 cf01 - Reference skbedit action object in filter ok 16 c109 - Reference skbmod action object in filter ok 17 4abc - Reference tunnel_key action object in filter ok 18 dadd - Reference vlan action object in filter Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20230309175554.304824-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13bnxt: avoid overflow in bnxt_get_nvram_directory()Maxim Korotkov
The value of an arithmetic expression is subject of possible overflow due to a failure to cast operands to a larger data type before performing arithmetic. Used macro for multiplication instead operator for avoiding overflow. Found by Security Code and Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Maxim Korotkov <korotkov.maxim.s@gmail.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230309174347.3515-1-korotkov.maxim.s@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: virtio_net: implement exact header length guest featureJiri Pirko
Virtio spec introduced a feature VIRTIO_NET_F_GUEST_HDRLEN which when set implicates that device benefits from knowing the exact size of the header. For compatibility, to signal to the device that the header is reliable driver also needs to set this feature. Without this feature set by driver, device has to figure out the header size itself. Quoting the original virtio spec: "hdr_len is a hint to the device as to how much of the header needs to be kept to copy into each packet" "a hint" might not be clear for the reader what does it mean, if it is "maybe like that" of "exactly like that". This feature just makes it crystal clear and let the device count on the hdr_len being filled up by the exact length of header. Also note the spec already has following note about hdr_len: "Due to various bugs in implementations, this field is not useful as a guarantee of the transport header size." Without this feature the device needs to parse the header in core data path handling. Accurate information helps the device to eliminate such header parsing and directly use the hardware accelerators for GSO operation. virtio_net_hdr_from_skb() fills up hdr_len to skb_headlen(skb). The driver already complies to fill the correct value. Introduce the feature and advertise it. Note that virtio spec also includes following note for device implementation: "Caution should be taken by the implementation so as to prevent a malicious driver from attacking the device by setting an incorrect hdr_len." There is a plan to support this feature in our emulated device. A device of SolidRun offers this feature bit. They claim this feature will save the device a few cycles for every GSO packet. Link: https://docs.oasis-open.org/virtio/virtio/v1.2/cs01/virtio-v1.2-cs01.html#x1-230006x3 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230309094559.917857-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: hsr: Don't log netdev_err message on unknown prp dst nodeKristian Overskeid
If no frames has been exchanged with a node for HSR_NODE_FORGET_TIME, the node will be deleted from the node_db list. If a frame is sent to the node after it is deleted, a netdev_err message for each slave interface is produced. This should not happen with dan nodes because of supervision frames, but can happen often with san nodes, which clutters the kernel log. Since the hsr protocol does not support sans, this is only relevant for the prp protocol. Signed-off-by: Kristian Overskeid <koverskeid@gmail.com> Link: https://lore.kernel.org/r/20230309092302.179586-1-koverskeid@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: phy: smsc: use device_property_present in smsc_phy_probeHeiner Kallweit
Use unified device property API. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/a969f012-1d3b-7a36-51cf-89a5f8f15a9b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()D. Wythe
When performing a stress test on SMC-R by rmmod mlx5_ib driver during the wrk/nginx test, we found that there is a probability of triggering a panic while terminating all link groups. This issue dues to the race between smc_smcr_terminate_all() and smc_buf_create(). smc_smcr_terminate_all smc_buf_create /* init */ conn->sndbuf_desc = NULL; ... __smc_lgr_terminate smc_conn_kill smc_close_abort smc_cdc_get_slot_and_msg_send __softirqentry_text_start smc_wr_tx_process_cqe smc_cdc_tx_handler READ(conn->sndbuf_desc->len); /* panic dues to NULL sndbuf_desc */ conn->sndbuf_desc = xxx; This patch tries to fix the issue by always to check the sndbuf_desc before send any cdc msg, to make sure that no null pointer is seen during cqe processing. Fixes: 0b29ec643613 ("net/smc: immediate termination for SMCR link groups") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/1678263432-17329-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: socket: suppress unused warningVincenzo Palazzo
suppress unused warnings and fix the error that there is with the W=1 enabled. Warning generated net/socket.c: In function ‘__sys_getsockopt’: net/socket.c:2300:13: error: variable ‘max_optlen’ set but not used [-Werror=unused-but-set-variable] 2300 | int max_optlen; Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230310221851.304657-1-vincenzopalazzodev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13bnxt_en: reset PHC frequency in free-running modeVadim Fedorenko
When using a PHC in shared between multiple hosts, the previous frequency value may not be reset and could lead to host being unable to compensate the offset with timecounter adjustments. To avoid such state reset the hardware frequency of PHC to zero on init. Some refactoring is needed to make code readable. Fixes: 85036aee1938 ("bnxt_en: Add a non-real time mode to access NIC clock") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230310151356.678059-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: phy: dp83867: Disable IRQs on suspendAlexander Stein
Before putting the PHY into IEEE power down mode, disable IRQs to prevent accessing the PHY once MDIO has already been shutdown. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230310074500.3472858-1-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13net: phy: micrel: Add support for PTP_PF_PEROUT for lan8841Horatiu Vultur
Lan8841 has 10 GPIOs and it has 2 events(EVENT_A and EVENT_B). It is possible to assigned the 2 events to any of the GPIOs, but a GPIO can have only 1 event at a time. These events are used to generate periodic signals. It is possible to configure the length, the start time and the period of the signal by configuring the event. Currently the SW uses only EVENT_A to generate the perout. These events are generated by comparing the target time with the PHC time. In case the PHC time is changed to a value bigger than the target time + reload time, then it would generate only 1 event and then it would stop because target time + reload time is small than PHC time. Therefore it is required to change also the target time every time when the PHC is changed. The same will apply also when the PHC time is changed to a smaller value. This was tested using: testptp -L 6,2 testptp -p 1000000000 -w 200000000 Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230307214402.793057-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13md: avoid signed overflow in slot_store()NeilBrown
slot_store() uses kstrtouint() to get a slot number, but stores the result in an "int" variable (by casting a pointer). This can result in a negative slot number if the unsigned int value is very large. A negative number means that the slot is empty, but setting a negative slot number this way will not remove the device from the array. I don't think this is a serious problem, but it could cause confusion and it is best to fix it. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Song Liu <song@kernel.org>
2023-03-13ice: call ice_is_malicious_vf() from ice_vc_process_vf_msg()Jacob Keller
The main loop in __ice_clean_ctrlq first checks if a VF might be malicious before calling ice_vc_process_vf_msg(). This results in duplicate code in both functions to obtain a reference to the VF, and exports the ice_is_malicious_vf() from ice_virtchnl.c unnecessarily. Refactor ice_is_malicious_vf() to be a static function that takes a pointer to the VF. Call this in ice_vc_process_vf_msg() just after we obtain a reference to the VF by calling ice_get_vf_by_id. Pass the mailbox data from the __ice_clean_ctrlq function into ice_vc_process_vf_msg() instead of calling ice_is_malicious_vf(). This reduces the number of exported functions and avoids the need to obtain the VF reference twice for every mailbox message. Note that the state check for ICE_VF_STATE_DIS is kept in ice_is_malicious_vf() and we call this before checking that state in ice_vc_process_vf_msg. This is intentional, as we stop responding to VF messages from a VF once we detect that it may be overflowing the mailbox. This ensures that we continue to silently ignore the message as before without responding via ice_vc_send_msg_to_vf(). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: move ice_is_malicious_vf() to ice_virtchnl.cJacob Keller
The ice_is_malicious_vf() function is currently implemented in ice_sriov.c This function is not Single Root specific, and a future change is going to refactor the ice_vc_process_vf_msg() function to call this instead of calling it before ice_vc_process_vf_msg() in the main loop of __ice_clean_ctrlq. To make that change easier to review, first move this function into ice_virtchnl.c but leave the call in __ice_clean_ctrlq() alone. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: print message if ice_mbx_vf_state_handler returns an errorJacob Keller
If ice_mbx_vf_state_handler() returns an error, the ice_is_malicious_vf() function just exits without printing anything. Instead, use dev_warn_ratelimited to print a warning that we were unable to check the status for this VF. The _ratelimited variant is used to avoid potentially spamming the log if this function is failing consistently for every single mailbox message. Also we can drop the "goto" as it simply skips over a report_malvf check. That variable should always be false if ice_mbx_vf_state_handler returns non-zero. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: pass mbxdata to ice_is_malicious_vf()Jacob Keller
The ice_is_malicious_vf() function takes information about the current state of the mailbox during a single interrupt. This information includes the number of messages processed so far, as well as the number of pending messages not yet processed. A future refactor is going to make ice_vc_process_vf_msg() call ice_is_malicious_vf() instead of having it called separately in ice_main.c This change will require passing all the necessary arguments into ice_vc_process_vf_msg(). To make this simpler, have the main loop fill in the struct ice_mbx_data and pass that rather than passing in the num_msg_proc and num_msg_pending. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: remove unnecessary &array[0] and just use arrayJacob Keller
In ice_is_malicious_vf we print the VF MAC address using %pM by passing the address of the first element of vf->dev_lan_addr. This is equivalent to just passing vf->dev_lan_addr, so do that. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: always report VF overflowing mailbox even without PF VSIJacob Keller
In ice_is_malicious_vf we report a message warning the system administrator when a VF is potentially spamming the PF with asynchronous messages that could overflow the PF mailbox. The specific message was requested by our customer support team to include the VF and PF MAC address. In some cases we may not be able to locate the PF VSI to obtain the MAC address for the PF. The current implementation discards the message entirely in this case. Fix this to instead print a zero address in that case so that we always print something here. Note that dev_warn will also include the PCI device information allowing another mechanism for determining on which PF the potentially malicious VF belongs. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: declare ice_vc_process_vf_msg in ice_virtchnl.hJacob Keller
The ice_vc_process_vf_msg function is the main entry point for handling virtchnl messages. This function is defined in ice_virtchnl.c but its declaration is still in ice_sriov.c The ice_sriov.c file used to contain all of the virtualization logic until commit bf93bf791cec ("ice: introduce ice_virtchnl.c and ice_virtchnl.h") moved the virtchnl logic to its own file. The ice_vc_process_vf_msg function should have had its declaration moved to ice_virtchnl.h then. Fix this. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: initialize mailbox snapshot earlier in PF initJacob Keller
Now that we no longer depend on the number of VFs being allocated, we can move the ice_mbx_init_snapshot function earlier. This will be required by Scalable IOV as we will not be calling ice_sriov_configure for Scalable VFs. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: merge ice_mbx_report_malvf with ice_mbx_vf_state_handlerJacob Keller
The ice_mbx_report_malvf function is used to update the ice_mbx_vf_info.malicious member after we detect a malicious VF. This is done by calling ice_mbx_report_malvf after ice_mbx_vf_state_handler sets its "is_malvf" return parameter true. Instead of requiring two steps, directly update the malicious bit in the state handler, and remove the need for separately calling ice_mbx_report_malvf. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13md: Free resources in __md_stopXiao Ni
If md_run() fails after ->active_io is initialized, then percpu_ref_exit is called in error path. However, later md_free_disk will call percpu_ref_exit again which leads to a panic because of null pointer dereference. It can also trigger this bug when resources are initialized but are freed in error path, then will be freed again in md_free_disk. BUG: kernel NULL pointer dereference, address: 0000000000000038 Oops: 0000 [#1] PREEMPT SMP Workqueue: md_misc mddev_delayed_delete RIP: 0010:free_percpu+0x110/0x630 Call Trace: <TASK> __percpu_ref_exit+0x44/0x70 percpu_ref_exit+0x16/0x90 md_free_disk+0x2f/0x80 disk_release+0x101/0x180 device_release+0x84/0x110 kobject_put+0x12a/0x380 kobject_put+0x160/0x380 mddev_delayed_delete+0x19/0x30 process_one_work+0x269/0x680 worker_thread+0x266/0x640 kthread+0x151/0x1b0 ret_from_fork+0x1f/0x30 For creating raid device, md raid calls do_md_run->md_run, dm raid calls md_run. We alloc those memory in md_run. For stopping raid device, md raid calls do_md_stop->__md_stop, dm raid calls md_stop->__md_stop. So we can free those memory resources in __md_stop. Fixes: 72adae23a72c ("md: Change active_io to percpu") Reported-and-tested-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org>
2023-03-13Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Some virtio / vhost / vdpa fixes accumulated so far" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: tools/virtio: Ignore virtio-trace/trace-agent vdpa_sim: set last_used_idx as last_avail_idx in vdpasim_queue_ready vhost-vdpa: free iommu domain after last use during cleanup vdpa/mlx5: should not activate virtq object when suspended vp_vdpa: fix the crash in hot unplug with vp_vdpa
2023-03-13ice: remove ice_mbx_deinit_snapshotJacob Keller
The ice_mbx_deinit_snapshot function's only remaining job is to clear the previous snapshot data. This snapshot data is initialized when SR-IOV adds VFs, so it is not necessary to clear this data when removing VFs. Since no allocation occurs we no longer need to free anything and we can safely remove this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: move VF overflow message count into struct ice_mbx_vf_infoJacob Keller
The ice driver has some logic in ice_vf_mbx.c used to detect potentially malicious VF behavior with regards to overflowing the PF mailbox. This logic currently stores message counts in struct ice_mbx_vf_counter.vf_cntr as an array. This array is allocated during initialization with ice_mbx_init_snapshot. This logic makes sense for SR-IOV where all VFs are allocated at once up front. However, in the future with Scalable IOV this logic will not work. VFs can be added and removed dynamically. We could try to keep the vf_cntr array for the maximum possible number of VFs, but this is a waste of memory. Use the recently introduced struct ice_mbx_vf_info structure to store the message count. Pass a pointer to the mbx_info for a VF instead of using its VF ID. Replace the array of VF message counts with a linked list that tracks all currently active mailbox tracking info structures. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: track malicious VFs in new ice_mbx_vf_info structureJacob Keller
Currently the PF tracks malicious VFs in a malvfs bitmap which is used by the ice_mbx_clear_malvf and ice_mbx_report_malvf functions. This bitmap is used to ensure that we only report a VF as malicious once rather than continuously spamming the event log. This mechanism of storage for the malicious indication works well enough for SR-IOV. However, it will not work with Scalable IOV. This is because Scalable IOV VFs can be allocated dynamically and might change VF ID when their underlying VSI changes. To support this, the mailbox overflow logic will need to be refactored. First, introduce a new ice_mbx_vf_info structure which will be used to store data about a VF. Embed this structure in the struct ice_vf, and ensure it gets initialized when a new VF is created. For now this only stores the malicious indicator bit. Pass a pointer to the VF's mbx_info structure instead of using a bitmap to keep track of these bits. A future change will extend this structure and the rest of the logic associated with the overflow detection. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: convert ice_mbx_clear_malvf to void and use WARNJacob Keller
The ice_mbx_clear_malvf function checks for a few error conditions before clearing the appropriate data. These error conditions are really warnings that should never occur in a properly initialized driver. Every caller of ice_mbx_clear_malvf just prints a dev_dbg message on failure which will generally be ignored. Convert this function to void and switch the error return values to WARN_ON. This will make any potentially misconfiguration more visible and makes future refactors that involve changing how we store the malicious VF data easier. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13ice: re-order ice_mbx_reset_snapshot functionJacob Keller
A future change is going to refactor the VF mailbox overflow detection logic, including modifying ice_mbx_reset_snapshot and its callers. To make this change easier to review, first move the ice_mbx_reset_snapshot function higher in the ice_vf_mbx.c file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-13accel: Build sub-directories based on config optionsStanislaw Gruszka
When accel drivers are disabled do not process into sub-directories and create built-in archives: AR drivers/accel/habanalabs/built-in.a AR drivers/accel/ivpu/built-in.a Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU") Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230301162508.3963484-1-stanislaw.gruszka@linux.intel.com (cherry picked from commit dd61bbd0d1fba48cd9464e047a7f90b70a463e39) Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2023-03-13drm/i915/active: Fix misuse of non-idle barriers as fence trackersJanusz Krzysztofik
Users reported oopses on list corruptions when using i915 perf with a number of concurrently running graphics applications. Root cause analysis pointed at an issue in barrier processing code -- a race among perf open / close replacing active barriers with perf requests on kernel context and concurrent barrier preallocate / acquire operations performed during user context first pin / last unpin. When adding a request to a composite tracker, we try to reuse an existing fence tracker, already allocated and registered with that composite. The tracker we obtain may already track another fence, may be an idle barrier, or an active barrier. If the tracker we get occurs a non-idle barrier then we try to delete that barrier from a list of barrier tasks it belongs to. However, while doing that we don't respect return value from a function that performs the barrier deletion. Should the deletion ever fail, we would end up reusing the tracker still registered as a barrier task. Since the same structure field is reused with both fence callback lists and barrier tasks list, list corruptions would likely occur. Barriers are now deleted from a barrier tasks list by temporarily removing the list content, traversing that content with skip over the node to be deleted, then populating the list back with the modified content. Should that intentionally racy concurrent deletion attempts be not serialized, one or more of those may fail because of the list being temporary empty. Related code that ignores the results of barrier deletion was initially introduced in v5.4 by commit d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests"). However, all users of the barrier deletion routine were apparently serialized at that time, then the issue didn't exhibit itself. Results of git bisect with help of a newly developed igt@gem_barrier_race@remote-request IGT test indicate that list corruptions might start to appear after commit 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles"), introduced in v5.5. Respect results of barrier deletion attempts -- mark the barrier as idle only if successfully deleted from the list. Then, before proceeding with setting our fence as the one currently tracked, make sure that the tracker we've got is not a non-idle barrier. If that check fails then don't use that tracker but go back and try to acquire a new, usable one. v3: use unlikely() to document what outcome we expect (Andi), - fix bad grammar in commit description. v2: no code changes, - blame commit 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles"), v5.5, not commit d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests"), v5.4, - reword commit description. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6333 Fixes: 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org # v5.5 Cc: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230302120820.48740-1-janusz.krzysztofik@linux.intel.com (cherry picked from commit 506006055769b10d1b2b4e22f636f3b45e0e9fc7) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-03-13drm/i915/sseu: fix max_subslices array-index-out-of-bounds accessAndrea Righi
It seems that commit bc3c5e0809ae ("drm/i915/sseu: Don't try to store EU mask internally in UAPI format") exposed a potential out-of-bounds access, reported by UBSAN as following on a laptop with a gen 11 i915 card: UBSAN: array-index-out-of-bounds in drivers/gpu/drm/i915/gt/intel_sseu.c:65:27 index 6 is out of range for type 'u16 [6]' CPU: 2 PID: 165 Comm: systemd-udevd Not tainted 6.2.0-9-generic #9-Ubuntu Hardware name: Dell Inc. XPS 13 9300/077Y9N, BIOS 1.11.0 03/22/2022 Call Trace: <TASK> show_stack+0x4e/0x61 dump_stack_lvl+0x4a/0x6f dump_stack+0x10/0x18 ubsan_epilogue+0x9/0x3a __ubsan_handle_out_of_bounds.cold+0x42/0x47 gen11_compute_sseu_info+0x121/0x130 [i915] intel_sseu_info_init+0x15d/0x2b0 [i915] intel_gt_init_mmio+0x23/0x40 [i915] i915_driver_mmio_probe+0x129/0x400 [i915] ? intel_gt_probe_all+0x91/0x2e0 [i915] i915_driver_probe+0xe1/0x3f0 [i915] ? drm_privacy_screen_get+0x16d/0x190 [drm] ? acpi_dev_found+0x64/0x80 i915_pci_probe+0xac/0x1b0 [i915] ... According to the definition of sseu_dev_info, eu_mask->hsw is limited to a maximum of GEN_MAX_SS_PER_HSW_SLICE (6) sub-slices, but gen11_sseu_info_init() can potentially set 8 sub-slices, in the !IS_JSL_EHL(gt->i915) case. Fix this by reserving up to 8 slots for max_subslices in the eu_mask struct. Reported-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Fixes: bc3c5e0809ae ("drm/i915/sseu: Don't try to store EU mask internally in UAPI format") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230220171858.131416-1-andrea.righi@canonical.com (cherry picked from commit 3cba09a6ac86ea1d456909626eb2685596c07822) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-03-13drm/i915/dg2: Add HDMI pixel clock frequencies 267.30 and 319.89 MHzAnkit Nautiyal
Add snps phy table values for HDMI pixel clocks 267.30 MHz and 319.89 MHz. Values are based on the Bspec algorithm for PLL programming for HDMI. Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8008 Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Uma Shankar <uma.shankar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230223043619.3941382-1-ankit.k.nautiyal@intel.com (cherry picked from commit d46746b8b13cbd377ffc733e465d25800459a31b) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-03-13drm/i915/psr: Use calculated io and fast wake linesJouni Högander
Currently we are using hardcoded 7 for io and fast wake lines. According to Bspec io and fast wake times are both 42us for DISPLAY_VER >= 12 and 50us and 32us for older platforms. Calculate line counts for these and configure them into PSR2_CTL accordingly Use 45 us for the fast wake calculation as 42 seems to be too tight based on testing. Bspec: 49274, 4289 Cc: Mika Kahola <mika.kahola@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Fixes: 64cf40a125ff ("drm/i915/psr: Program default IO buffer Wake and Fast Wake") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7725 Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230221085304.3382297-1-jouni.hogander@intel.com (cherry picked from commit cb42e8ede5b475c096e473b86c356b1158b4bc3b) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-03-13drm/i915: Fix audio ELD handling for DP MSTVille Syrjälä
I forgot to call intel_audio_compute_config() on DP MST, which means ELD doesn't get populated and passed to the audio driver. References: https://gitlab.freedesktop.org/drm/intel/-/issues/8097 Fixes: 5d986635e296 ("drm/i915/audio: Precompute the ELD") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230220151731.6852-1-ville.syrjala@linux.intel.com Reviewed-by: Uma Shankar <uma.shankar@intel.com> (cherry picked from commit 518b761a7b0e2bb2fac2518f041c71b461adf761) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-03-13drm/i915/hwmon: Enable PL1 power limitAshutosh Dixit
Previous documentation suggested that PL1 power limit is always enabled. However we now find this not to be the case on some platforms (such as ATSM). Therefore enable PL1 power limit during hwmon initialization. Bspec: 51864 v2: Add Bspec reference (Gwan-gyeong) v3: Add Fixes tag Fixes: 99f55efb79114 ("drm/i915/hwmon: Power PL1 limit and TDP setting") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230203155309.1042297-1-ashutosh.dixit@intel.com (cherry picked from commit 0349c41b05968befaffa5fbb7e73d0ee6004f610) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-03-13Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann
Backmerging to get latest upstream. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>