summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-24intel: Remove rcu_read_lock() around XDP program invocationToke Høiland-Jørgensen
The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com
2021-06-24freescale: Remove rcu_read_lock() around XDP program invocationToke Høiland-Jørgensen
The dpaa and dpaa2 drivers have rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Camelia Groza <camelia.groza@nxp.com> Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Cc: Madalin Bucur <madalin.bucur@nxp.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/bpf/20210624160609.292325-11-toke@redhat.com
2021-06-24thunderx: Remove rcu_read_lock() around XDP program invocationToke Høiland-Jørgensen
The thunderx driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Sunil Goutham <sgoutham@marvell.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/bpf/20210624160609.292325-10-toke@redhat.com
2021-06-24bnxt: Remove rcu_read_lock() around XDP program invocationToke Høiland-Jørgensen
The bnxt driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/bpf/20210624160609.292325-9-toke@redhat.com
2021-06-24ena: Remove rcu_read_lock() around XDP program invocationToke Høiland-Jørgensen
The ena driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Saeed Bishara <saeedb@amazon.com> Cc: Guy Tzalik <gtzalik@amazon.com> Link: https://lore.kernel.org/bpf/20210624160609.292325-8-toke@redhat.com
2021-06-24bpf, sched: Remove unneeded rcu_read_lock() around BPF program invocationToke Høiland-Jørgensen
The rcu_read_lock() call in cls_bpf and act_bpf are redundant: on the TX side, there's already a call to rcu_read_lock_bh() in __dev_queue_xmit(), and on RX there's a covering rcu_read_lock() in netif_receive_skb{,_list}_internal(). With the previous patches we also amended the lockdep checks in the map code to not require any particular RCU flavour, so we can just get rid of the rcu_read_lock()s. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-7-toke@redhat.com
2021-06-24xdp: Add proper __rcu annotations to redirect map entriesToke Høiland-Jørgensen
XDP_REDIRECT works by a three-step process: the bpf_redirect() and bpf_redirect_map() helpers will lookup the target of the redirect and store it (along with some other metadata) in a per-CPU struct bpf_redirect_info. Next, when the program returns the XDP_REDIRECT return code, the driver will call xdp_do_redirect() which will use the information thus stored to actually enqueue the frame into a bulk queue structure (that differs slightly by map type, but shares the same principle). Finally, before exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will flush all the different bulk queues, thus completing the redirect. Pointers to the map entries will be kept around for this whole sequence of steps, protected by RCU. However, there is no top-level rcu_read_lock() in the core code; instead drivers add their own rcu_read_lock() around the XDP portions of the code, but somewhat inconsistently as Martin discovered[0]. However, things still work because everything happens inside a single NAPI poll sequence, which means it's between a pair of calls to local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could document this intention by using rcu_dereference_check() with rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and lockdep to verify that everything is done correctly. This patch does just that: we add an __rcu annotation to the map entry pointers and remove the various comments explaining the NAPI poll assurance strewn through devmap.c in favour of a longer explanation in filter.c. The goal is to have one coherent documentation of the entire flow, and rely on the RCU annotations as a "standard" way of communicating the flow in the map code (which can additionally be understood by sparse and lockdep). The RCU annotation replacements result in a fairly straight-forward replacement where READ_ONCE() becomes rcu_dereference_check(), WRITE_ONCE() becomes rcu_assign_pointer() and xchg() and cmpxchg() gets wrapped in the proper constructs to cast the pointer back and forth between __rcu and __kernel address space (for the benefit of sparse). The one complication is that xskmap has a few constructions where double-pointers are passed back and forth; these simply all gain __rcu annotations, and only the final reference/dereference to the inner-most pointer gets changed. With this, everything can be run through sparse without eliciting complaints, and lockdep can verify correctness even without the use of rcu_read_lock() in the drivers. Subsequent patches will clean these up from the drivers. [0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/ [1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-6-toke@redhat.com
2021-06-24bpf: Allow RCU-protected lookups to happen from bh contextToke Høiland-Jørgensen
XDP programs are called from a NAPI poll context, which means the RCU reference liveness is ensured by local_bh_disable(). Add rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so lockdep understands that the dereferences are safe from inside *either* an rcu_read_lock() section *or* a local_bh_disable() section. While both bh_disabled and rcu_read_lock() provide RCU protection, they are semantically distinct, so we need both conditions to prevent lockdep complaints. This change is done in preparation for removing the redundant rcu_read_lock()s from drivers. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210624160609.292325-5-toke@redhat.com
2021-06-24doc: Give XDP as example of non-obvious RCU reader/updater pairingToke Høiland-Jørgensen
This commit gives an example of non-obvious RCU reader/updater pairing in the guise of the XDP feature in networking, which calls BPF programs from network-driver NAPI (softirq) context. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-4-toke@redhat.com
2021-06-24doc: Clarify and expand RCU updaters and corresponding readersPaul E. McKenney
This commit clarifies which primitives readers can use given that the corresponding updaters have made a specific choice. This commit also adds this information for the various RCU Tasks flavors. While in the area, it removes a paragraph that no longer applies in any straightforward manner. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-3-toke@redhat.com
2021-06-24rcu: Create an unrcu_pointer() to remove __rcu from a pointerPaul E. McKenney
The xchg() and cmpxchg() functions are sometimes used to carry out RCU updates. Unfortunately, this can result in sparse warnings for both the old-value and new-value arguments, as well as for the return value. The arguments can be dealt with using RCU_INITIALIZER(): old_p = xchg(&p, RCU_INITIALIZER(new_p)); But a sparse warning still remains due to assigning the __rcu pointer returned from xchg to the (most likely) non-__rcu pointer old_p. This commit therefore provides an unrcu_pointer() macro that strips the __rcu. This macro can be used as follows: old_p = unrcu_pointer(xchg(&p, RCU_INITIALIZER(new_p))); Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210624160609.292325-2-toke@redhat.com
2021-06-24i40e: Fix autoneg disabling for non-10GBaseT linksMateusz Palczewski
Disabling autonegotiation was allowed only for 10GBaseT PHY. The condition was changed to check if link media type is BaseT. Fixes: 3ce12ee9d8f9 ("i40e: Fix order of checks when enabling/disabling autoneg in ethtool") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Karen Sornek <karen.sornek@intel.com> Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-24i40e: Fix error handling in i40e_vsi_openDinghao Liu
When vsi->type == I40E_VSI_FDIR, we have caught the return value of i40e_vsi_request_irq() but without further handling. Check and execute memory clean on failure just like the other i40e_vsi_request_irq(). Fixes: 8a9eb7d3cbcab ("i40e: rework fdir setup and teardown") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-24iwlwifi: acpi: remove unused function iwl_acpi_eval_dsm_func()Kalle Valo
Stephen reported a warning: drivers/net/wireless/intel/iwlwifi/fw/acpi.c:720:12: warning: 'iwl_acpi_eval_dsm_func' defined but not used [-Wunused-function] The warning is correct and the function is not used anywhere, so let's just remove it. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 7119f02b5d34 ("iwlwifi: mvm: support BIOS enable/disable for 11ax in Russia") Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210624052918.4946-1-kvalo@codeaurora.org
2021-06-24rtw88: fix c2h memory leakPo-Hao Huang
Fix erroneous code that leads to unreferenced objects. During H2C operations, some functions returned without freeing the memory that only the function have access to. Release these objects when they're no longer needed to avoid potentially memory leaks. Signed-off-by: Po-Hao Huang <phhuang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210624023459.10294-1-pkshih@realtek.com
2021-06-24brcmfmac: support parse country code map from DTShawn Guo
With any regulatory domain requests coming from either user space or 802.11 IE (Information Element), the country is coded in ISO3166 standard. It needs to be translated to firmware country code and revision with the mapping info in settings->country_codes table. Support populate country_codes table by parsing the mapping from DT. The BRCMF_BUSTYPE_SDIO bus_type check gets separated from general DT validation, so that country code can be handled as general part rather than SDIO bus specific one. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210417075428.2671-1-shawn.guo@linaro.org
2021-06-24Merge tag 'core-urgent-2021-06-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sigqueue cache fix from Ingo Molnar: "Fix a memory leak in the recently introduced sigqueue cache" * tag 'core-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: signal: Prevent sigqueue caching after task got released
2021-06-24Merge tag 'sched-urgent-2021-06-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "A last minute cgroup bandwidth scheduling fix for a recently introduced logic fail which triggered a kernel warning by LTP's cfs_bandwidth01 test" * tag 'sched-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Ensure that the CFS parent is added after unthrottling
2021-06-24Merge tag 'perf-urgent-2021-06-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fix from Ingo Molnar: "An LBR buffer fix for code that probably only worked accidentally" * tag 'perf-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/lbr: Zero the xstate buffer on allocation
2021-06-24KVM: do not allow mapping valid but non-reference-counted pagesNicholas Piggin
It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher order allocations). These host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family of APIs, which take a reference to the page, which takes it from 0 to 1. When the reference is dropped, this will free the page incorrectly. Fix this by only taking a reference on valid pages if it was non-zero, which indicates it is participating in normal refcounting (and can be released with put_page). This addresses CVE-2021-22543. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24block: fix trace completion for chained bioEdward Hsieh
For chained bio, trace_block_bio_complete in bio_endio is currently called only by the parent bio once upon all chained bio completed. However, the sector and size for the parent bio are modified in bio_split. Therefore, the size and sector of the complete events might not match the queue events in blktrace. The original fix of bio completion trace <fbbaf700e7b1> ("block: trace completion of all bios.") wants multiple complete events to correspond to one queue event but missed this. The issue can be reproduced by md/raid5 read with bio cross chunks. To fix, move trace completion into the loop for every chained bio to call. Fixes: fbbaf700e7b1 ("block: trace completion of all bios.") Reviewed-by: Wade Liang <wadel@synology.com> Reviewed-by: BingJing Chang <bingjingc@synology.com> Signed-off-by: Edward Hsieh <edwardh@synology.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210624123030.27014-1-edwardh@synology.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-24KVM: stats: Add fd-based API to read binary stats dataJing Zhang
This commit defines the API for userspace and prepare the common functionalities to support per VM/VCPU binary stats data readings. The KVM stats now is only accessible by debugfs, which has some shortcomings this change series are supposed to fix: 1. The current debugfs stats solution in KVM could be disabled when kernel Lockdown mode is enabled, which is a potential rick for production. 2. The current debugfs stats solution in KVM is organized as "one stats per file", it is good for debugging, but not efficient for production. 3. The stats read/clear in current debugfs solution in KVM are protected by the global kvm_lock. Besides that, there are some other benefits with this change: 1. All KVM VM/VCPU stats can be read out in a bulk by one copy to userspace. 2. A schema is used to describe KVM statistics. From userspace's perspective, the KVM statistics are self-describing. 3. With the fd-based solution, a separate telemetry would be able to read KVM stats in a less privileged environment. 4. After the initial setup by reading in stats descriptors, a telemetry only needs to read the stats data itself, no more parsing or setup is needed. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-3-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: stats: Separate generic stats from architecture specific onesJing Zhang
Generic KVM stats are those collected in architecture independent code or those supported by all architectures; put all generic statistics in a separate structure. This ensures that they are defined the same way in the statistics API which is being added, removing duplication among different architectures in the declaration of the descriptors. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: x86/mmu: Don't WARN on a NULL shadow page in TDP MMU checkSean Christopherson
Treat a NULL shadow page in the "is a TDP MMU" check as valid, non-TDP root. KVM uses a "direct" PAE paging MMU when TDP is disabled and the guest is running with paging disabled. In that case, root_hpa points at the pae_root page (of which only 32 bytes are used), not a standard shadow page, and the WARN fires (a lot). Fixes: 0b873fd7fb53 ("KVM: x86/mmu: Remove redundant is_tdp_mmu_enabled check") Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622072454.3449146-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: sefltests: Add x86-64 test to verify MMU reacts to CPUID updatesSean Christopherson
Add an x86-only test to verify that x86's MMU reacts to CPUID updates that impact the MMU. KVM has had multiple bugs where it fails to reconfigure the MMU after the guest's vCPU model changes. Sadly, this test is effectively limited to shadow paging because the hardware page walk handler doesn't support software disabling of GBPAGES support, and KVM doesn't manually walk the GVA->GPA on faults for performance reasons (doing so would large defeat the benefits of TDP). Don't require !TDP for the tests as there is still value in running the tests with TDP, even though the tests will fail (barring KVM hacks). E.g. KVM should not completely explode if MAXPHYADDR results in KVM using 4-level vs. 5-level paging for the guest. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Add hugepage support for x86-64Sean Christopherson
Add x86-64 hugepage support in the form of a x86-only variant of virt_pg_map() that takes an explicit page size. To keep things simple, follow the existing logic for 4k pages and disallow creating a hugepage if the upper-level entry is present, even if the desired pfn matches. Opportunistically fix a double "beyond beyond" reported by checkpatch. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Genericize upper level page table entry structSean Christopherson
In preparation for adding hugepage support, replace "pageMapL4Entry", "pageDirectoryPointerEntry", and "pageDirectoryEntry" with a common "pageUpperEntry", and add a helper to create an upper level entry. All upper level entries have the same layout, using unique structs provides minimal value and requires a non-trivial amount of code duplication. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Add PTE helper for x86-64 in preparation for hugepagesSean Christopherson
Add a helper to retrieve a PTE pointer given a PFN, address, and level in preparation for adding hugepage support. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Rename x86's page table "address" to "pfn"Sean Christopherson
Rename the "address" field to "pfn" in x86's page table structs to match reality. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Add wrapper to allocate page table pageSean Christopherson
Add a helper to allocate a page for use in constructing the guest's page tables. All architectures have identical address and memslot requirements (which appear to be arbitrary anyways). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Unconditionally allocate EPT tables in memslot 0Sean Christopherson
Drop the EPTP memslot param from all EPT helpers and shove the hardcoded '0' down to the vm_phy_page_alloc() calls. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Unconditionally use memslot '0' for page table allocationsSean Christopherson
Drop the memslot param from virt_pg_map() and virt_map() and shove the hardcoded '0' down to the vm_phy_page_alloc() calls. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622200529.3650424-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: selftests: Unconditionally use memslot 0 for vaddr allocationsSean Christopherson
Drop the memslot param(s) from vm_vaddr_alloc() now that all callers directly specific '0' as the memslot. Drop the memslot param from virt_pgd_alloc() as well since vm_vaddr_alloc() is its only user. I.e. shove the hardcoded '0' down to the vm_phy_pages_alloc() calls. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24Merge tag 'objtool-urgent-2021-06-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "Address a number of objtool warnings that got reported. No change in behavior intended, but code generation might be impacted by commit 1f008d46f124 ("x86: Always inline task_size_max()")" * tag 'objtool-urgent-2021-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Improve noinstr vs errors x86: Always inline task_size_max() x86/xen: Fix noinstr fail in exc_xen_unknown_trap() x86/xen: Fix noinstr fail in xen_pv_evtchn_do_upcall() x86/entry: Fix noinstr fail in __do_fast_syscall_32() objtool/x86: Ignore __x86_indirect_alt_* symbols
2021-06-24Merge tag 'nvme-5.14-2021-06-22' of git://git.infradead.org/nvme into ↵Jens Axboe
for-5.14/drivers Pull NVMe updates from Christoph "nvme updates for Linux 5.14: - move the ACPI StorageD3 code to drivers/acpi/ and add quirks for certain AMD CPUs (Mario Limonciello) - zoned device support for nvmet (Chaitanya Kulkarni) - fix the rules for changing the serial number in nvmet (Noam Gottlieb) - various small fixes and cleanups (Dan Carpenter, JK Kim, Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert Uytterhoeven, Daniel Wagner)" * tag 'nvme-5.14-2021-06-22' of git://git.infradead.org/nvme: (38 commits) nvmet: use NVMET_MAX_NAMESPACES to set nn value nvme.h: add missing nvme_lba_range_type endianness annotations nvme: remove zeroout memset call for struct nvme-pci: remove zeroout memset call for struct nvmet: remove zeroout memset call for struct nvmet: add ZBD over ZNS backend support nvmet: add Command Set Identifier support nvmet: add nvmet_req_bio put helper for backends nvmet: add req cns error complete helper block: export blk_next_bio() nvmet: remove local variable nvmet: use nvme status value directly nvmet: use u32 type for the local variable nsid nvmet: use u32 for nvmet_subsys max_nsid nvmet: use req->cmd directly in file-ns fast path nvmet: use req->cmd directly in bdev-ns fast path nvmet: make ver stable once connection established nvmet: allow mn change if subsys not discovered nvmet: make sn stable once connection was established nvmet: change sn size and check validity ...
2021-06-24x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() againThomas Gleixner
The change which made copy_xstate_to_uabi_buf() usable for [x]fpregs_get() removed the zeroing of the header which means the header, which is copied to user space later, contains except for the xfeatures member, random stack content. Add the memset() back to zero it before usage. Fixes: eb6f51723f03 ("x86/fpu: Make copy_xstate_to_kernel() usable for [x]fpregs_get()") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/875yy3wb8h.ffs@nanos.tec.linutronix.de
2021-06-24fs: remove bdev_try_to_free_page callbackZhang Yi
After remove the unique user of sop->bdev_try_to_free_page() callback, we could remove the callback and the corresponding blkdev_releasepage() at all. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-9-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: remove bdev_try_to_free_page() callbackZhang Yi
After we introduce a jbd2 shrinker to release checkpointed buffer's journal head, we could free buffer without bdev_try_to_free_page() under memory pressure. So this patch remove the whole bdev_try_to_free_page() callback directly. It also remove many use-after-free issues relate to it together. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-8-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: simplify journal_clean_one_cp_list()Zhang Yi
Now that __try_to_free_cp_buf() remove checkpointed buffer or transaction when the buffer is not 'busy', which is only called by journal_clean_one_cp_list(). This patch simplify this function by remove __try_to_free_cp_buf() and invoke __cp_buffer_busy() directly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-7-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2,ext4: add a shrinker to release checkpointed buffersZhang Yi
Current metadata buffer release logic in bdev_try_to_free_page() have a lot of use-after-free issues when umount filesystem concurrently, and it is difficult to fix directly because ext4 is the only user of s_op->bdev_try_to_free_page callback and we may have to add more special refcount or lock that is only used by ext4 into the common vfs layer, which is unacceptable. One better solution is remove the bdev_try_to_free_page callback, but the real problem is we cannot easily release journal_head on the checkpointed buffer, so try_to_free_buffers() cannot release buffers and page under memory pressure, which is more likely to trigger out-of-memory. So we cannot remove the callback directly before we find another way to release journal_head. This patch introduce a shrinker to free journal_head on the checkpointed transaction. After the journal_head got freed, try_to_free_buffers() could free buffer properly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: remove redundant buffer io error checksZhang Yi
Now that __jbd2_journal_remove_checkpoint() can detect buffer io error and mark journal checkpoint error, then we abort the journal later before updating log tail to ensure the filesystem works consistently. So we could remove other redundant buffer io error checkes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: don't abort the journal when freeing buffersZhang Yi
Now that we can be sure the journal is aborted once a buffer has failed to be written back to disk, we can remove the journal abort logic in jbd2_journal_try_to_free_buffers() which was introduced in commit c044f3d8360d ("jbd2: abort journal if free a async write error metadata buffer"), because it may cost and propably is not safe. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: ensure abort the journal if detect IO error when writing original ↵Zhang Yi
buffer back Although we merged c044f3d8360 ("jbd2: abort journal if free a async write error metadata buffer"), there is a race between jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still fail to detect the buffer write io error flag which may lead to filesystem inconsistency. jbd2_journal_try_to_free_buffers() ext4_put_super() jbd2_journal_destroy() __jbd2_journal_remove_checkpoint() detect buffer write error jbd2_log_do_checkpoint() jbd2_cleanup_journal_tail() <--- lead to inconsistency jbd2_journal_abort() Fix this issue by introducing a new atomic flag which only have one JBD2_CHECKPOINT_IO_ERROR bit now, and set it in __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer which has write_io_error flag. Then jbd2_journal_destroy() will detect this mark and abort the journal to prevent updating log tail. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: remove the out label in __jbd2_journal_remove_checkpoint()Zhang Yi
The 'out' lable just return the 'ret' value and seems not required, so remove this label and switch to return appropriate value immediately. This patch also do some minor cleanup, no logical change. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24ext4: no need to verify new add extent blockyangerkun
ext4_ext_grow_indepth will add a new extent block which has init the expected content. We can mark this buffer as verified so to stop a useless check in __read_extent_tree_block. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210609075545.1442160-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24jbd2: clean up misleading comments for jbd2_fc_release_bufsyangerkun
This comments was for jbd2_fc_wait_bufs, not for jbd2_fc_release_bufs. Remove this misleading comments. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210608141236.459441-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24staging: hisi-spmi-controller: rename spmi-channel propertyMauro Carvalho Chehab
The spmi-channel is not used on other drivers. So, rename it, in order to document that this is specific to those devices. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/ed45fc5d84d7b531343ee5d3466ebfac26217da0.1624542940.git.mchehab+huawei@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24staging: phy-hi3670-usb3: do a some minor cleanupsMauro Carvalho Chehab
Before moving this driver out of staging: 1. group some integers altogether; 2. Use: return some_function() instead of: ret = some_function(); return ret; This is just a cleanup. No functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/55db419e42fd3af72494acbe0ea0f0d1de8906ac.1624542940.git.mchehab+huawei@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24docs: ABI: testing: sysfs-firmware-memmap: add some memmap types.Jianpeng Ma
Compare with arch/x86/kernel/e820.c, types of memmap omitted some. So add. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Link: https://lore.kernel.org/r/20210623023126.104380-1-jianpeng.ma@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-24ext4: add check to prevent attempting to resize an fs with sparse_super2Josh Triplett
The in-kernel ext4 resize code doesn't support filesystem with the sparse_super2 feature. It fails with errors like this and doesn't finish the resize: EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770) EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize EXT4-fs (loop0): resized filesystem to 2097152 To reproduce: mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M truncate -s 30G ext4.img mount ext4.img /mnt python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)' dmesg | tail e2fsck ext4.img The userspace resize2fs tool has a check for this case: it checks if the filesystem has sparse_super2 set and if the kernel provides /sys/fs/ext4/features/sparse_super2. However, the former check requires manually reading and parsing the filesystem superblock. Detect this case in ext4_resize_begin and error out early with a clear error message. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.1623093259.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>