Age  Commit message  Author
2023-02-13xfs: use xfs_bmap_longest_free_extent() in filestreamsDave Chinner
The code in xfs_bmap_longest_free_extent() is open coded in xfs_filestream_pick_ag(). Export xfs_bmap_longest_free_extent and call it from the filestreams code instead. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: get rid of notinit from xfs_bmap_longest_free_extentDave Chinner
It is only set if reading the AGF gets an EAGAIN error. Just return the EAGAIN error and handle that error in the callers. This means we can remove the not_init parameter from xfs_bmap_select_minlen(), too, because the use of not_init there is pessimistic. If we can't read the agf, it won't increase blen. The only time we actually care whether we checked all the AGFs for contiguous free space is when the best length is less than the minimum allocation length. If not_init is set, then we ignore blen and set the minimum alloc length to the absolute minimum, not the best length we know already is present. However, if blen is less than the minimum we're going to ignore it anyway, regardless of whether we scanned all the AGFs or not. Hence not_init can go away, because we only use blen if it is good from the scanned AGs; otherwise we ignore it altogether and use minlen. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
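The selection rule this message argues for can be sketched in plain C. This is a minimal userspace illustration, not the kernel's xfs_bmap_select_minlen(): the function name, the `extlen_t` type and the three-argument signature are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for an extent length type. */
typedef unsigned int extlen_t;

/*
 * Pick the minimum allocation length from the longest free extent seen
 * so far (blen). When blen is shorter than the caller's minimum it is
 * ignored, so it no longer matters whether every AGF was scanned.
 */
static extlen_t select_minlen_sketch(extlen_t blen, extlen_t minlen,
				     extlen_t maxlen)
{
	assert(minlen <= maxlen);
	if (blen < minlen)
		return minlen;	/* blen is unusable: fall back to the minimum */
	if (blen < maxlen)
		return blen;	/* the best known extent bounds the request */
	return maxlen;		/* enough space: ask for the full length */
}
```

With these assumptions, a `blen` of 2 against a minimum of 4 yields 4 whether or not some AGFs were skipped, which is why a separate "not initialised" flag adds nothing.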
2023-02-13xfs: factor out filestreams from xfs_bmap_btalloc_nullfbDave Chinner
There are many if (filestreams) {} else {} branches in this function. Split it out into a filestreams specific function so that we can then work directly on cleaning up the filestreams code without impacting the rest of the allocation algorithms. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: convert trim to use for_each_perag_rangeDave Chinner
To convert it to using active perag references and hence make it shrink safe. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walkerDave Chinner
Now that the AG iteration code in the core allocation code has been cleaned up, we can easily convert it to use a for_each_perag..() variant to use active references and skip AGs that it can't get active references on. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: move the minimum agno checks into xfs_alloc_vextent_check_argsDave Chinner
All of the allocation functions now extract the minimum allowed AG from the transaction and then use it in some way. The allocation functions that are restricted to a single AG all check if the AG requested can be allocated from and return an error if not. These all set args->agno appropriately. All the allocation functions that iterate AGs use it to calculate the scan start AG. args->agno is not set until the iterator starts walking AGs. Hence we can easily set up a conditional check against the minimum AG allowed in xfs_alloc_vextent_check_args() based on whether args->agno contains NULLAGNUMBER or not and move all the repeated setup code to xfs_alloc_vextent_check_args(), further simplifying the allocation functions. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: fold xfs_alloc_ag_vextent() into callersDave Chinner
We don't need the multiplexing xfs_alloc_ag_vextent() provided anymore - we can just call the exact/near/size variants directly. This allows us to remove args->type completely and stop using args->fsbno as an input to the allocator algorithms. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno()Dave Chinner
Move it from xfs_alloc_ag_vextent() so we can get rid of that layer. Rename xfs_alloc_vextent_set_fsbno() to xfs_alloc_vextent_finish() to indicate that its function is finishing off the allocation that we've run now that it contains much more functionality. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: introduce xfs_alloc_vextent_prepare()Dave Chinner
Now that we have wrapper functions for each type of allocation we can ask for, we can start unravelling xfs_alloc_ag_vextent(). That is essentially just a prepare stage, the allocation multiplexer and a post-allocation accounting step if the allocation proceeded. The current xfs_alloc_vextent*() wrappers all have a prepare stage, the allocation operation and a post-allocation accounting step. We can consolidate this by moving the AG alloc prep code into the wrapper functions, the accounting code in the wrapper accounting functions, and cut out the multiplexer layer entirely. This patch consolidates the AG preparation stage. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: introduce xfs_alloc_vextent_exact_bno()Dave Chinner
Two of the callers to xfs_alloc_vextent_this_ag() actually want exact block number allocation, not anywhere-in-ag allocation. Split this out from _this_ag() as a first class citizen so no external extent allocation code needs to care about args->type anymore. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: introduce xfs_alloc_vextent_near_bno()Dave Chinner
The remaining callers of xfs_alloc_vextent() are all doing NEAR_BNO allocations. We can replace that function with a new xfs_alloc_vextent_near_bno() function that does this explicitly. We also multiplex NEAR_BNO allocations through xfs_alloc_vextent_this_ag via args->type. Replace all of these with direct calls to xfs_alloc_vextent_near_bno(), too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: use xfs_alloc_vextent_start_bno() where appropriateDave Chinner
Change obvious callers of single AG allocation to use xfs_alloc_vextent_start_bno(). Callers no longer need to specify XFS_ALLOCTYPE_START_BNO, and so the type can be driven inward and removed. While doing this, also pass the allocation target fsb as a parameter rather than encoding it in args->fsbno. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: use xfs_alloc_vextent_first_ag() where appropriateDave Chinner
Change obvious callers of single AG allocation to use xfs_alloc_vextent_first_ag(). This gets rid of XFS_ALLOCTYPE_FIRST_AG as the type used within xfs_alloc_vextent_first_ag() during iteration is _THIS_AG. Hence we can remove the setting of args->type from all the callers of _first_ag() and remove the alloctype. While doing this, pass the allocation target fsb as a parameter rather than encoding it in args->fsbno. This starts the process of making args->fsbno an output only variable rather than input/output. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: factor xfs_bmap_btalloc()Dave Chinner
There are several different contexts xfs_bmap_btalloc() handles, and large chunks of the code execute independent allocation contexts. Try to untangle this mess a bit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: use xfs_alloc_vextent_this_ag() where appropriateDave Chinner
Change obvious callers of single AG allocation to use xfs_alloc_vextent_this_ag(). Drive the per-ag grabbing out to the callers, too, so that callers with active references don't need to do new lookups just for an allocation in a context that already has a perag reference. The only remaining caller that does single AG allocation through xfs_alloc_vextent() is xfs_bmap_btalloc() with XFS_ALLOCTYPE_NEAR_BNO. That is going to need more untangling before it can be converted cleanly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: combine __xfs_alloc_vextent_this_ag and xfs_alloc_ag_vextentDave Chinner
There's a bit of a recursive conundrum around xfs_alloc_ag_vextent(). We can't first call xfs_alloc_ag_vextent() without preparing the AGFL for the allocation, and preparing the AGFL calls xfs_alloc_ag_vextent() to prepare the AGFL for the allocation. This "double allocation" requirement is not really clear from the current xfs_alloc_fix_freelist() calls that are sprinkled through the allocation code. It's not helped that xfs_alloc_ag_vextent() can actually allocate from the AGFL itself, but there's special code to prevent AGFL prep allocations from allocating from the free list it's trying to prep. The naming is also not consistent: args->wasfromfl is true when we allocated _from_ the free list, but the indication that we are allocating _for_ the free list is via checking that (args->resv == XFS_AG_RESV_AGFL). So, let's make this "allocation required for allocation" situation clear by moving it all inside xfs_alloc_ag_vextent(). The freelist allocation is a specific XFS_ALLOCTYPE_THIS_AG allocation, which translates directly to xfs_alloc_ag_vextent_size() allocation. This enables us to replace __xfs_alloc_vextent_this_ag() with a call to xfs_alloc_ag_vextent(), and we drive the freelist fixing further into the per-ag allocation algorithm. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()Dave Chinner
The core of the per-ag iteration is effectively doing a "this ag" allocation on one AG at a time. Use the same code to implement the core "this ag" allocation in both xfs_alloc_vextent_this_ag() and xfs_alloc_vextent_iterate_ags(). This means we only call xfs_alloc_ag_vextent() from one place so we can easily collapse the call stack in future patches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: rework xfs_alloc_vextent()Dave Chinner
It's a multiplexing mess that can be greatly simplified, and really needs to be simplified to allow active per-ag references to propagate from initial AG selection code to the bmapi code. This splits the code out into a separate parameter checking function, an iterator function, and allocation completion functions and then implements the individual policies using these functions. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: introduce xfs_for_each_perag_wrap()Dave Chinner
In several places we iterate every AG from a specific start agno and wrap back to the first AG when we reach the end of the filesystem to continue searching. We don't have a primitive for this iteration yet, so add one for conversion of these algorithms to per-ag based iteration. The filestream AG select code is a mess, and this initially makes it worse. The per-ag selection needs to be driven completely into the filestream code to clean this up and it will be done in a future patch that makes the filestream allocator use active per-ag references correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
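The iteration order described above (start at a given agno, run to the last AG, then wrap to AG 0 and continue up to the start) can be illustrated with a small userspace sketch. It only models the visiting order; the helper name and its array-based interface are assumptions for the example, and it does not take or drop perag references the way the kernel iterator does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Record the order in which AG numbers are visited when starting at
 * start_agno and wrapping around at agcount. Writes the order into
 * out[] (which must hold agcount entries) and returns the entry count.
 */
static size_t ag_wrap_order_sketch(unsigned start_agno, unsigned agcount,
				   unsigned *out)
{
	size_t n = 0;

	assert(agcount > 0 && start_agno < agcount);
	for (unsigned i = 0; i < agcount; i++)
		out[n++] = (start_agno + i) % agcount;
	return n;
}
```

For a four-AG filesystem and a start of 2, the sketch visits 2, 3, 0, 1, so every AG is seen exactly once.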
2023-02-13xfs: perags need atomic operational stateDave Chinner
We currently don't have any flags or operational state in the xfs_perag except for the pagf_init and pagi_init flags. And the agflreset flag. Oh, there's also the pagf_metadata and pagi_inodeok flags, too. For controlling per-ag operations, we are going to need some atomic state flags. Hence add an opstate field similar to what we already have in the mount and log, and convert all these state flags across to atomic bit operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
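As an illustration of what "atomic state flags" in a single opstate word look like, here is a minimal userspace sketch using C11 atomics. The structure, flag names and helper names are hypothetical; the kernel uses its own bit operations rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the per-AG state flags. */
enum {
	AG_SKETCH_AGF_INIT   = 1u << 0,
	AG_SKETCH_AGI_INIT   = 1u << 1,
	AG_SKETCH_AGFL_RESET = 1u << 2,
};

struct ag_state_sketch {
	atomic_uint opstate;	/* all operational state bits in one word */
};

static void ag_set_sketch(struct ag_state_sketch *ag, unsigned bit)
{
	atomic_fetch_or(&ag->opstate, bit);
}

static bool ag_test_sketch(struct ag_state_sketch *ag, unsigned bit)
{
	return (atomic_load(&ag->opstate) & bit) != 0;
}

/* Clear the bit and report whether it was set beforehand. */
static bool ag_test_and_clear_sketch(struct ag_state_sketch *ag, unsigned bit)
{
	return (atomic_fetch_and(&ag->opstate, ~bit) & bit) != 0;
}
```

Because each helper is a single atomic read-modify-write, concurrent updates to different bits of the same word cannot lose each other, which separate plain boolean fields updated without a lock do not guarantee.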
2023-02-13xfs: convert xfs_ialloc_next_ag() to an atomicDave Chinner
This is currently a spinlock protected rotor which can be implemented with a single atomic operation. Change it to be more efficient and get rid of the m_agirotor_lock. Noticed while converting the inode allocation AG selection loop to active perag references. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
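A rotor of this kind can be sketched with one atomic fetch-and-add. This is a userspace approximation under stated assumptions: the structure and function names are invented for the example, and C11 atomics stand in for the kernel's atomic API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the mount fields involved. */
struct mount_sketch {
	atomic_uint agirotor;	/* monotonically increasing counter */
	unsigned agcount;	/* number of allocation groups */
};

/*
 * Return the next AG to try. A single atomic increment replaces the
 * lock/read/increment/wrap/unlock sequence of a lock protected rotor.
 */
static unsigned next_ag_sketch(struct mount_sketch *mp)
{
	assert(mp->agcount > 0);
	return atomic_fetch_add(&mp->agirotor, 1) % mp->agcount;
}
```

The counter is allowed to run freely and is reduced modulo the AG count on each use; when the unsigned counter eventually overflows, the sequence may skip or repeat an AG once, which is harmless for a round-robin hint.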
2023-02-13xfs: inobt can use perags in many more places than it doesDave Chinner
Lots of code in the inobt infrastructure is passed both xfs_mount and perags. We only need perags for the per-ag inode allocation code, so reduce the duplication by passing only the perags as the primary object. This ends up reducing the code size by a bit:

            text    data  bss      dec     hex  filename
orig     1138878  323979  548  1463405  16546d  (TOTALS)
patched  1138709  323979  548  1463236  1653c4  (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: use active perag references for inode allocationDave Chinner
Convert the inode allocation routines to use active perag references or references held by callers rather than grab their own. Also drive the perag further inwards to replace xfs_mounts when doing operations on a specific AG. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: convert xfs_imap() to take a peragDave Chinner
Callers have referenced perags but they don't pass them into xfs_imap(), so it takes its own reference. Fix that so we can change inode allocation over to using active references. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: rework the perag trace points to be perag centricDave Chinner
So that they all output the same information in the traces to make debugging refcount issues easier. This means that all the lookup/drop functions no longer need to use the full memory barrier atomic operations (atomic*_return()) so will have less overhead when tracing is off. The set/clear tag tracepoints no longer abuse the reference count to pass the tag - the tag being cleared is obvious from the _RET_IP_ that is recorded in the trace point. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: active perag reference countingDave Chinner
We need to be able to dynamically remove instantiated AGs from memory safely, either for shrinking the filesystem or paging AG state in and out of memory (e.g. supporting millions of AGs). This means we need to be able to safely exclude operations from accessing perags while dynamic removal is in progress.

To do this, introduce the concept of active and passive references. Active references are required for high level operations that make use of an AG for a given operation (e.g. allocation) and pin the perag in memory for the duration of the operation that is operating on the perag (e.g. transaction scope). This means we can fail to get an active reference to an AG, hence callers of the new active reference API must be able to handle lookup failure gracefully.

Passive references are used in low level code, where we might need to access the perag structure for the purposes of completing high level operations. For example, buffers need to use passive references because:
- we need to be able to do metadata IO during operations like grow and shrink transactions where high level active references to the AG have already been blocked
- buffers need to pin the perag until they are reclaimed from memory, something that high level code has no direct control over.
- unused cached buffers should not prevent a shrink from being started.

Hence we have active references that will form exclusion barriers for operations to be performed on an AG, and passive references that will prevent reclaim of the perag until all objects with passive references have been reclaimed themselves.

This patch introduces xfs_perag_grab()/xfs_perag_rele() as the API for active AG reference functionality. We also need to convert the for_each_perag*() iterators to use active references, which will start the process of converting high level code over to using active references. Conversion of non-iterator based code to active references will be done in followup patches.
Note that the implementation using reference counting is really just a development vehicle for the API to ensure we don't have any leaks in the callers. Once we need to remove perag structures from memory dynamically, we will need a much more robust per-ag state transition mechanism for preventing new references from being taken while we wait for existing references to drain before removal from memory can occur.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
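The difference between the two reference types can be sketched in userspace C. The sketch assumes an "increment only if non-zero" rule for active references, which is one common way to make a grab fail once teardown has begun; the structure and all `_sketch` names are hypothetical and only loosely modelled on the xfs_perag_grab()/xfs_perag_rele() pair named in the message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct perag_sketch {
	atomic_int active_ref;	/* high level users; 0 means no new grabs */
	atomic_int passive_ref;	/* low level users such as cached buffers */
};

/* Active grab: succeeds only while the active count is non-zero. */
static bool perag_grab_sketch(struct perag_sketch *pag)
{
	int old = atomic_load(&pag->active_ref);

	while (old > 0) {
		/* On failure, old is reloaded and the check is repeated. */
		if (atomic_compare_exchange_weak(&pag->active_ref, &old, old + 1))
			return true;
	}
	return false;	/* callers must handle this case gracefully */
}

static void perag_rele_sketch(struct perag_sketch *pag)
{
	atomic_fetch_sub(&pag->active_ref, 1);
}

/* Passive get never fails; it only delays freeing of the structure. */
static void perag_get_sketch(struct perag_sketch *pag)
{
	atomic_fetch_add(&pag->passive_ref, 1);
}

static void perag_put_sketch(struct perag_sketch *pag)
{
	atomic_fetch_sub(&pag->passive_ref, 1);
}
```

In this model the owner holds one initial active reference; dropping it to zero blocks new high level operations, while passive holders can still finish their work and release the structure later.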
2023-02-12Linux 6.2-rc8v6.2-rc8Linus Torvalds
2023-02-12MAINTAINERS: Add myself as maintainer for arch/sh (SUPERH)John Paul Adrian Glaubitz
Both Rich Felker and Yoshinori Sato haven't done any work on arch/sh for a while. As I have been maintaining Debian's sh4 port since 2014, I am interested to keep the architecture alive. Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12Merge tag 'trace-v6.2-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix showing of TASK_COMM_LEN instead of its value

  The TASK_COMM_LEN was converted from a macro into an enum so that BTF would have access to it. But this unfortunately caused TASK_COMM_LEN to display in the format fields of trace events, as they are created by the TRACE_EVENT() macro and such, macros convert to their values, whereas enums do not.

  To handle this, instead of using the field itself to be displayed, save the value of the array size as another field in the trace_event_fields structure, and use that instead.

  Not only does this fix the issue, but also converts the other trace events that have this same problem (but were not breaking tooling).

  With this change, the original work around b3bc8547d3be6 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well") could be reverted (but that should be done in the merge window)"

* tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix TASK_COMM_LEN in trace event format file
2023-02-12i2c: i801: add helper i801_set_hstadd()Heiner Kallweit
Factor out setting SMBHSTADD to a helper. The current code makes the assumption that constant I2C_SMBUS_READ has bit 0 set, that's not ideal. Therefore let the new helper explicitly check for I2C_SMBUS_READ. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@kernel.org>
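The point about not relying on the numeric value of the read constant can be shown with a small sketch. It computes the byte a helper of this kind would write to the address register; the function name and the two `SKETCH_` constants are stand-ins chosen for the example, and no hardware access is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the read/write direction constants. */
#define SKETCH_SMBUS_READ	1
#define SKETCH_SMBUS_WRITE	0

/*
 * Build the address byte: the 7-bit address in bits 7..1 and the
 * direction in bit 0. The direction is compared explicitly instead of
 * being OR-ed in, so the result does not depend on the constant's value.
 */
static uint8_t hstadd_value_sketch(uint8_t addr, char read_write)
{
	uint8_t dir = (read_write == SKETCH_SMBUS_READ) ? 1 : 0;

	return (uint8_t)(((addr & 0x7f) << 1) | dir);
}
```

For example, address 0x50 becomes 0xA1 for a read and 0xA0 for a write.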
2023-02-12i2c: i801: make FEATURE_BLOCK_PROC dependent on FEATURE_BLOCK_BUFFERHeiner Kallweit
According to the datasheet the block process call requires block buffer mode. The user may disable block buffer mode by module parameter disable_features, in such a case we have to clear FEATURE_BLOCK_PROC. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-02-12i2c: i801: make FEATURE_HOST_NOTIFY dependent on FEATURE_IRQHeiner Kallweit
Host notification uses an interrupt, therefore it makes sense only if interrupts are enabled. Make this dependency explicit by removing FEATURE_HOST_NOTIFY if FEATURE_IRQ isn't set. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@kernel.org>
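The two feature dependencies described in the i801 commits above can be expressed as a small mask fix-up. This is a sketch under assumptions: the FEATURE_* names come from the commit messages, but the bit values and the function are invented for illustration and do not reflect the driver's actual definitions.

```c
#include <assert.h>

/* Bit values are assumptions made for this example only. */
#define FEATURE_BLOCK_BUFFER	(1u << 0)
#define FEATURE_BLOCK_PROC	(1u << 1)
#define FEATURE_IRQ		(1u << 2)
#define FEATURE_HOST_NOTIFY	(1u << 3)

/*
 * Apply the user's disable mask, then drop every feature whose
 * prerequisite is no longer present.
 */
static unsigned fixup_features_sketch(unsigned features, unsigned disable)
{
	features &= ~disable;
	if (!(features & FEATURE_BLOCK_BUFFER))
		features &= ~FEATURE_BLOCK_PROC;	/* needs block buffer mode */
	if (!(features & FEATURE_IRQ))
		features &= ~FEATURE_HOST_NOTIFY;	/* needs interrupts */
	return features;
}
```

Resolving the dependencies in one place means later code can test a single feature bit without re-checking its prerequisite.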
2023-02-12i2c: i801: improve interrupt handlerHeiner Kallweit
Not sure if it can happen, but better play safe: If SMBHSTSTS_BYTE_DONE and an error flag are both set, then don't trust the result and skip calling i801_isr_byte_done(). In addition, clear status bit SMBHSTSTS_BYTE_DONE in the main interrupt handler, which allows the code to be simplified a little. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-02-12i2c: st: use pm_sleep_ptr to avoid ifdef CONFIG_PM_SLEEPAlain Volmat
Rely on pm_sleep_ptr when setting the pm ops and get rid of the ifdef CONFIG_PM_SLEEP around suspend/resume functions. Signed-off-by: Alain Volmat <avolmat@me.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-02-12dt-bindings: i2c: uniphier: Add resets propertyKunihiko Hayashi
UniPhier I2C controller allows reset control support. Add resets property to the controller as optional. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-02-12remoteproc: mediatek: Check the SCP image formatTinghan Shen
Do a sanity check on the SCP image before loading it to avoid driver crashes. Signed-off-by: Tinghan Shen <tinghan.shen@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230210031354.1335-1-tinghan.shen@mediatek.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2023-02-12Merge tag 'for-6.2-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - one more fix for a tree-log 'write time corruption' report, update the last dir index directly and don't keep in the log context

 - do VFS-level inode lock around FIEMAP to prevent a deadlock with concurrent fsync, the extent-level lock is not sufficient

 - don't cache a single-device filesystem device to avoid cases when a loop device is reformatted and the entry gets stale

* tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: free device in btrfs_close_devices for a single device filesystem
  btrfs: lock the inode in shared mode before starting fiemap
  btrfs: simplify update of last_dir_index_offset when logging a directory
2023-02-12Merge tag 'usb-6.2-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are 2 small USB driver fixes that resolve some reported regressions and one new device quirk. Specifically these are:

   - new quirk for Alcor Link AK9563 smartcard reader

   - revert of u_ether gadget change in 6.2-rc1 that caused problems

   - typec pin probe fix

  All of these have been in linux-next with no reported problems"

* tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: core: add quirk for Alcor Link AK9563 smartcard reader
  usb: typec: altmodes/displayport: Fix probe pin assign check
  Revert "usb: gadget: u_ether: Do not make UDC parent of the net device"
2023-02-12Merge tag 'efi-fixes-for-v6.2-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "A fix from Darren to widen the SMBIOS match for detecting Ampere Altra machines with problematic firmware.

  In the meantime, we are working on a more precise check, but this is still work in progress"

* tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  arm64: efi: Force the use of SetVirtualAddressMap() on eMAG and Altra Max machines
2023-02-12Merge tag 'powerpc-6.2-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix interrupt exit race with security mitigation switching.

 - Don't select ARCH_WANTS_NO_INSTR until warnings are fixed.

 - Build fix for CONFIG_NUMA=n.

Thanks to Nicholas Piggin, Randy Dunlap, and Sachin Sant.

* tag 'powerpc-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
  powerpc/kexec_file: fix implicit decl error
  powerpc: Don't select ARCH_WANTS_NO_INSTR
2023-02-12Fix page corruption caused by racy check in __free_pagesDavid Chen
When we upgraded our kernel, we started seeing some page corruption like the following consistently:

  BUG: Bad page state in process ganesha.nfsd pfn:1304ca
  page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
  flags: 0x17ffffc0000000()
  raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
  raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
  page dumped because: nonzero mapcount
  CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
  Call Trace:
   dump_stack+0x74/0x96
   bad_page.cold+0x63/0x94
   check_new_page_bad+0x6d/0x80
   rmqueue+0x46e/0x970
   get_page_from_freelist+0xcb/0x3f0
   ? _cond_resched+0x19/0x40
   __alloc_pages_nodemask+0x164/0x300
   alloc_pages_current+0x87/0xf0
   skb_page_frag_refill+0x84/0x110
   ...

Sometimes, it would also show up as corruption in the free list pointer and cause crashes.

After bisecting the issue, we found the issue started from commit e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages"):

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!PageHead(page))
		while (order-- > 0)
			free_the_page(page + (1 << order), order);

So the problem is that the PageHead check is racy, because at this point we have already dropped our reference to the page. So even if we came in with a compound page, the page can already be freed and PageHead can return false, and we will end up freeing all the tail pages, causing a double free.
Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages") Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
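One way to close a race of this shape is to sample the flag while the caller still holds its reference and only then drop the reference. The sketch below models that ordering in userspace; it is an assumption about the general technique, not a copy of the patch, and the page structure, enum and function names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct page_sketch {
	atomic_int refcount;
	bool head;		/* models the compound-head flag */
};

enum free_action_sketch { FREE_NONE, FREE_ALL, FREE_TAILS };

/* Drop one reference; true if this was the last one. */
static bool put_testzero_sketch(struct page_sketch *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}

/*
 * Decide what to free. The head flag is read BEFORE the reference is
 * dropped, because afterwards another holder may free the page and
 * change the flag underneath us.
 */
static enum free_action_sketch free_decision_sketch(struct page_sketch *p)
{
	bool head = p->head;	/* safe: we still hold our reference */

	if (put_testzero_sketch(p))
		return FREE_ALL;
	if (!head)
		return FREE_TAILS;
	return FREE_NONE;
}
```

With the flag captured up front, a compound page that is concurrently freed by its last holder can no longer be misread as a non-compound page.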
2023-02-12RDMA/umem: Remove unused 'work' member from struct ib_umemJason Gunthorpe
It is not used now. Fixes: b95df5e3e459 ("drivers/IB,core: reduce scope of mmap_sem") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-22a2667fa089+a3-umem_work_jgg@nvidia.com Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-12tracing: Fix TASK_COMM_LEN in trace event format fileYafang Shao
After commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"), the content of the format file under /sys/kernel/tracing/events/task/task_newtask was changed from

  field:char comm[16]; offset:12; size:16; signed:0;

to

  field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0;

John reported that this change breaks older versions of perfetto. Then Mathieu pointed out that this behavioral change was caused by the use of __stringify(_len), which happens to work on macros, but not on enum labels. And he also gave the suggestion on how to fix it:

  :One possible solution to make this more robust would be to extend
  :struct trace_event_fields with one more field that indicates the length
  :of an array as an actual integer, without storing it in its stringified
  :form in the type, and do the formatting in f_show where it belongs.

The result is as follows after this change:

  $ cat /sys/kernel/tracing/events/task/task_newtask/format
  field:char comm[16]; offset:12; size:16; signed:0;

Link: https://lore.kernel.org/lkml/Y+QaZtz55LIirsUO@google.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230210155921.4610-1-laoar.shao@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230212151303.12353-1-laoar.shao@gmail.com Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Kajetan Puchalski <kajetan.puchalski@arm.com> CC: Qais Yousef <qyousef@layalina.io> Fixes: 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") Reported-by: John Stultz <jstultz@google.com> Debugged-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
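The macro-versus-enum behaviour at the heart of this report is easy to reproduce in plain C. The sketch below shows why stringifying works for a macro but not for an enum label, and how carrying the length as an integer and formatting it at display time avoids the problem. The `STRINGIFY` macros, the constants and the `field_sketch` structure are illustrative stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STRINGIFY_1(x)	#x
#define STRINGIFY(x)	STRINGIFY_1(x)	/* expands macros before stringifying */

#define COMM_LEN_MACRO	16
enum { COMM_LEN_ENUM = 16 };

/* A field description that stores the array length as a number. */
struct field_sketch {
	const char *type;
	const char *name;
	int len;		/* 0 means "not an array" */
};

/* Format the field at display time from the integer length. */
static void show_field_sketch(const struct field_sketch *f, char *buf,
			      size_t size)
{
	if (f->len)
		snprintf(buf, size, "field:%s %s[%d];", f->type, f->name, f->len);
	else
		snprintf(buf, size, "field:%s %s;", f->type, f->name);
}
```

The preprocessor replaces `COMM_LEN_MACRO` with `16` before stringifying, but it knows nothing about enum labels, so `STRINGIFY(COMM_LEN_ENUM)` yields the label's name. Storing the value in `len` sidesteps the preprocessor entirely.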
2023-02-12hwmon: (mlxreg-fan) Return zero speed for broken fanVadim Pasternak
Currently, for a broken fan, the driver returns a value calculated from the error code (0xFF) in the related fan speed register. Thus, for such a fan the user gets fan{n}_fault set to 1 and fan{n}_input with a misleading value. Add a check for fan fault prior to returning the speed value and return zero if a fault is detected. Fixes: 65afb4c8e7e4 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20230212145730.24247-1-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-02-12dm table: check that a dm device doesn't reference itselfBenjamin Marzinski
If a DM device's table references itself, it will crash the kernel with an infinite recursion. Check for a self-reference in dm_get_device(). This is a quick check, but it won't catch more complicated circular references. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
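The scope of such a check, and its stated limitation, can be shown with a toy model. The device-number type, the function name and the -1 error value are assumptions for the example; the real check lives in dm_get_device() and works on kernel device numbers.

```c
#include <assert.h>

typedef unsigned int devnum_sketch_t;	/* stand-in for a device number */

/*
 * Reject a table entry that points at the device owning the table.
 * Returns 0 when the reference is acceptable and -1 (standing in for
 * an error code) when it is a direct self-reference.
 */
static int check_not_self_sketch(devnum_sketch_t table_owner,
				 devnum_sketch_t target)
{
	return table_owner == target ? -1 : 0;
}
```

A direct loop (device 1 referencing device 1) is rejected, but a two-device cycle passes, because each individual reference (1 to 2, 2 to 1) looks valid on its own. That is the "more complicated circular references" case the message says the quick check does not catch.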
2023-02-12dm raid: fix some spelling mistakes in commentsYu Zhe
Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-12watchdog: wdat_wdt: Avoid unimplemented get_timeleftThomas Weißschuh
As per the specification the action QUERY_COUNTDOWN_PERIOD is optional. If the action is not implemented by the physical device the driver would always report "0" from get_timeleft(). Avoid confusing userspace by only providing get_timeleft() when implemented by the hardware. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20221221-wdat_wdt-timeleft-v1-1-8e8a314c36cc@weissschuh.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
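The idea of only providing an optional callback when the hardware supports it can be sketched with a function pointer that is left NULL otherwise. All names here are hypothetical, and the placeholder callback returns a fixed number purely so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct wdt_ops_sketch {
	unsigned (*get_timeleft)(void);	/* NULL when not supported */
};

/* Placeholder for a hardware query; the value is arbitrary. */
static unsigned hw_timeleft_sketch(void)
{
	return 42;
}

/* Install the callback only if the optional action is implemented. */
static void setup_ops_sketch(struct wdt_ops_sketch *ops, bool has_query_countdown)
{
	ops->get_timeleft = has_query_countdown ? hw_timeleft_sketch : NULL;
}

static bool timeleft_supported_sketch(const struct wdt_ops_sketch *ops)
{
	return ops->get_timeleft != NULL;
}
```

A consumer that sees a NULL callback can report "not supported" instead of a misleading constant 0.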
2023-02-12watchdog: apple: Use devm_clk_get_enabled() helperChristophe JAILLET
The devm_clk_get_enabled() helper: - calls devm_clk_get() - calls clk_prepare_enable() and registers what is needed in order to call clk_disable_unprepare() when needed, as a managed resource. This simplifies the code and avoids the need of a dedicated function used with devm_add_action_or_reset(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/6f312af6160d1e10b616c9adbd1fd8f822db964d.1672473415.git.christophe.jaillet@wanadoo.fr Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2023-02-12watchdog: visconti: Use devm_clk_get_enabled() helperChristophe JAILLET
The devm_clk_get_enabled() helper: - calls devm_clk_get() - calls clk_prepare_enable() and registers what is needed in order to call clk_disable_unprepare() when needed, as a managed resource. This simplifies the code and avoids the need of a dedicated function used with devm_add_action_or_reset(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/13e8cdf17556da111d1d98a8fe0b1dc1c78007e2.1672417940.git.christophe.jaillet@wanadoo.fr Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2023-02-12watchdog: rzn1: Use devm_clk_get_enabled() helperChristophe JAILLET
The devm_clk_get_enabled() helper: - calls devm_clk_get() - calls clk_prepare_enable() and registers what is needed in order to call clk_disable_unprepare() when needed, as a managed resource. This simplifies the code and avoids the need of a dedicated function used with devm_add_action_or_reset(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/b1f8b5453791035ad534bd5ed36b49798ff4d9b2.1672418166.git.christophe.jaillet@wanadoo.fr Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>