summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-22selftests/landlock: Add tests to check unknown rule's access rightsMickaël Salaün
Add two tests to make sure that we cannot add a rule with access rights that are unknown: * fs: layout0.rule_with_unknown_access * net: mini.rule_with_unknown_access Rename unknown_access_rights tests to ruleset_with_unknown_access . Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Reviewed-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20231130093616.67340-2-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-12-22octeontx2-af: Fix a double free issueSuman Ghosh
There was a memory leak during error handling in function npc_mcam_rsrcs_init(). Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs") Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22eventfs: Fix file and directory uid and gid ownershipSteven Rostedt (Google)
It was reported that when mounting the tracefs file system with a gid other than root, the ownership did not carry down to the eventfs directory due to the dynamic nature of it. A fix was done to solve this, but it had two issues. (a) if the attr passed into update_inode_attr() was NULL, it didn't do anything. This is true for files that have not had a chown or chgrp done to itself or any of its sibling files, as the attr is allocated for all children when any one needs it. # umount /sys/kernel/tracing # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt # ls -ld /mnt/events/sched drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/ # ls -ld /mnt/events/sched/sched_switch drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/ But when checking the files: # ls -l /mnt/events/sched/sched_switch total 0 -rw-r----- 1 root root 0 Dec 21 13:12 enable -rw-r----- 1 root root 0 Dec 21 13:12 filter -r--r----- 1 root root 0 Dec 21 13:12 format -r--r----- 1 root root 0 Dec 21 13:12 hist -r--r----- 1 root root 0 Dec 21 13:12 id -rw-r----- 1 root root 0 Dec 21 13:12 trigger (b) When the attr does not denote the UID or GID, it defaulted to using the parent uid or gid. This is incorrect as changing the parent uid or gid will automatically change all its children. # chgrp tracing /mnt/events/timer # ls -ld /mnt/events/timer drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer # ls -l /mnt/events/timer total 0 -rw-r----- 1 root root 0 Dec 21 14:35 enable -rw-r----- 1 root root 0 Dec 21 14:35 filter drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start At first it was thought that this could be easily fixed by just making the default ownership of the superblock when it was mounted. But this does not handle the case of: # chgrp tracing instances # mkdir instances/foo If the superblock was used, then the group ownership would be that of what it was when it was mounted, when it should instead be "tracing". Instead, set a flag for the top level eventfs directory ("events") to flag which eventfs_inode belongs to it. Since the "events" directory's dentry and inode are never freed, it does not need to use its attr field to restore its mode and ownership. Use the this eventfs_inode's attr as the default ownership for all the files and directories underneath it. When the events eventfs_inode is created, it sets its ownership to its parent uid and gid. As the events directory is created at boot up before it gets mounted, this will always be uid=0 and gid=0. If it's created via an instance, then it will take the ownership of the instance directory. When the file system is mounted, it will update all the gids if one is specified. This will have a callback to update the events evenfs_inode's default entries. When a file or directory is created under the events directory, it will walk the ei->dentry parents until it finds the evenfs_inode that belongs to the events directory to retrieve the default uid and gid values. Link: https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=Gtw+AnwScyLhpFswrQ84PjhoGsg@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20231221190757.7eddbca9@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Dongliang Cui <cuidongliang390@gmail.com> Cc: Hongyu Jin <hongyu.jin@unisoc.com> Fixes: 0dfc852b6fe3 ("eventfs: Have event files and directories default to parent uid and gid") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-22Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== intel: use bitfield operations Jesse Brandeburg says: After repeatedly getting review comments on new patches, and sporadic patches to fix parts of our drivers, we should just convert the Intel code to use FIELD_PREP() and FIELD_GET(). It's then "common" in the code and hopefully future change-sets will see the context and do-the-right-thing. This conversion was done with a coccinelle script which is mentioned in the commit messages. Generally there were only a couple conversions that were "undone" after the automatic changes because they tried to convert a non-contiguous mask. Patch 1 is required at the beginning of this series to fix a "forever" issue in the e1000e driver that fails the compilation test after conversion because the shift / mask was out of range. The second patch just adds all the new #includes in one go. The patch titled: "ice: fix pre-shifted bit usage" is needed to allow the use of the FIELD_* macros and fix up the unexpected "shifts included" defines found while creating this series. The rest are the conversion to use FIELD_PREP()/FIELD_GET(), and the occasional leXX_{get,set,encode}_bits() call, as suggested by Alex. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22Merge tag 'nand/for-6.8' into mtd/nextMiquel Raynal
* Raw NAND The most meaningful change being the conversion of the brcmnand driver to the ->exec_op() API, this series brought additional changes to the core in order to help controller drivers to handle themselves the WP pin during destructive operations when relevant. As always, there is as well a whole bunch of miscellaneous W=1 fixes, together with a few runtime fixes (double free, timeout value, OOB layout, missing register initialization) and the usual load of remove callbacks turned into void (which led to switch the txx9ndfmc driver to use module_platform_driver()).
2023-12-22Merge tag 'spi-nor/for-6.8' into mtd/nextMiquel Raynal
SPI NOR comes with die erase support for multi die flashes, with new octal protocols (1-1-8 and 1-8-8) parsed from SFDP and with an updated documentation about what the contributors shall consider when proposing flash additions or updates. Michael Walle stepped out from the reviewer role to maintainer.
2023-12-22mtd: rawnand: Clarify conditions to enable continuous readsMiquel Raynal
The current logic is probably fine but is a bit convoluted. Plus, we don't want partial pages to be part of the sequential operation just in case the core would optimize the page read with a subpage read (which would break the sequence). This may happen on the first and last page only, so if the start offset or the end offset is not aligned with a page boundary, better avoid them to prevent any risk. Cc: stable@vger.kernel.org Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Martin Hundebøll <martin@geanix.com> Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-5-miquel.raynal@bootlin.com
2023-12-22mtd: rawnand: Prevent sequential reads with on-die ECC enginesMiquel Raynal
Some devices support sequential reads when using the on-die ECC engines, some others do not. It is a bit hard to know which ones will break other than experimentally, so in order to avoid such a difficult and painful task, let's just pretend all devices should avoid using this optimization when configured like this. Cc: stable@vger.kernel.org Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Martin Hundebøll <martin@geanix.com> Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-4-miquel.raynal@bootlin.com
2023-12-22mtd: rawnand: Fix core interference with sequential readsMiquel Raynal
A couple of reports pointed at some strange failures happening a bit randomly since the introduction of sequential page reads support. After investigation it turned out the most likely reason for these issues was the fact that sometimes a (longer) read might happen, starting at the same page that was read previously. This is optimized by the raw NAND core, by not sending the READ_PAGE command to the NAND device and just reading out the data in a local cache. When this page is also flagged as being the starting point for a sequential read, it means the page right next will be accessed without the right instructions. The NAND chip will be confused and will not output correct data. In order to avoid such situation from happening anymore, we can however handle this case with a bit of additional logic, to postpone the initialization of the read sequence by one page. Reported-by: Alexander Shiyan <eagle.alexander923@gmail.com> Closes: https://lore.kernel.org/linux-mtd/CAP1tNvS=NVAm-vfvYWbc3k9Cx9YxMc2uZZkmXk8h1NhGX877Zg@mail.gmail.com/ Reported-by: Måns Rullgård <mans@mansr.com> Closes: https://lore.kernel.org/linux-mtd/yw1xfs6j4k6q.fsf@mansr.com/ Reported-by: Martin Hundebøll <martin@geanix.com> Closes: https://lore.kernel.org/linux-mtd/9d0c42fcde79bfedfe5b05d6a4e9fdef71d3dd52.camel@geanix.com/ Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Martin Hundebøll <martin@geanix.com> Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-3-miquel.raynal@bootlin.com
2023-12-22mtd: rawnand: Prevent crossing LUN boundaries during sequential readsMiquel Raynal
The ONFI specification states that devices do not need to support sequential reads across LUN boundaries. In order to prevent such event from happening and possibly failing, let's introduce the concept of "pause" in the sequential read to handle these cases. The first/last pages remain the same but any time we cross a LUN boundary we will end and restart (if relevant) the sequential read operation. Cc: stable@vger.kernel.org Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Martin Hundebøll <martin@geanix.com> Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-2-miquel.raynal@bootlin.com
2023-12-22mtd: Fix gluebi NULL pointer dereference caused by ftl notifierZhaoLong Wang
If both ftl.ko and gluebi.ko are loaded, the notifier of ftl triggers NULL pointer dereference when trying to access ‘gluebi->desc’ in gluebi_read(). ubi_gluebi_init ubi_register_volume_notifier ubi_enumerate_volumes ubi_notify_all gluebi_notify nb->notifier_call() gluebi_create mtd_device_register mtd_device_parse_register add_mtd_device blktrans_notify_add not->add() ftl_add_mtd tr->add_mtd() scan_header mtd_read mtd_read_oob mtd_read_oob_std gluebi_read mtd->read() gluebi->desc - NULL Detailed reproduction information available at the Link [1], In the normal case, obtain gluebi->desc in the gluebi_get_device(), and access gluebi->desc in the gluebi_read(). However, gluebi_get_device() is not executed in advance in the ftl_add_mtd() process, which leads to NULL pointer dereference. The solution for the gluebi module is to run jffs2 on the UBI volume without considering working with ftl or mtdblock [2]. Therefore, this problem can be avoided by preventing gluebi from creating the mtdblock device after creating mtd partition of the type MTD_UBIVOLUME. Fixes: 2ba3d76a1e29 ("UBI: make gluebi a separate module") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217992 [1] Link: https://lore.kernel.org/lkml/441107100.23734.1697904580252.JavaMail.zimbra@nod.at/ [2] Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20231220024619.2138625-1-wangzhaolong1@huawei.com
2023-12-22dt-bindings: mtd: partitions: u-boot: Fix typoStefan Wahren
The initial description contained a typo. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20231218130656.9020-1-wahrenst@gmx.net
2023-12-22netfilter: nf_tables: validate chain type update if availablePablo Neira Ayuso
Parse netlink attribute containing the chain type in this update, to bail out if this is different from the existing type. Otherwise, it is possible to define a chain with the same name, hook and priority but different type, which is silently ignored. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22netfilter: ctnetlink: support filtering by zoneFelix Huettner
conntrack zones are heavily used by tools like openvswitch to run multiple virtual "routers" on a single machine. In this context each conntrack zone matches to a single router, thereby preventing overlapping IPs from becoming issues. In these systems it is common to operate on all conntrack entries of a given zone, e.g. to delete them when a router is deleted. Previously this required these tools to dump the full conntrack table and filter out the relevant entries in userspace potentially causing performance issues. To do this we reuse the existing CTA_ZONE attribute. This was previous parsed but not used during dump and flush requests. Now if CTA_ZONE is set we filter these operations based on the provided zone. However this means that users that previously passed CTA_ZONE will experience a difference in functionality. Alternatively CTA_FILTER could have been used for the same functionality. However it is not yet supported during flush requests and is only available when using AF_INET or AF_INET6. Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> Co-developed-by: Max Lamprecht <max.lamprecht@mail.schwarz> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22netfilter: nf_tables: mark newset as dead on transaction abortFlorian Westphal
If a transaction is aborted, we should mark the to-be-released NEWSET dead, just like commit path does for DEL and DESTROYSET commands. In both cases all remaining elements will be released via set->ops->destroy(). The existing abort code does NOT post the actual release to the work queue. Also the entire __nf_tables_abort() function is wrapped in gc_seq begin/end pair. Therefore, async gc worker will never try to release the pending set elements, as gc sequence is always stale. It might be possible to speed up transaction aborts via work queue too, this would result in a race and a possible use-after-free. So fix this before it becomes an issue. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22netfilter: flowtable: reorder nf_flowtable struct membersFlorian Westphal
Place the read-mostly parts accessed by the datapath first. In particular, we do access ->flags member (to see if HW offload is enabled) for every single packet, but this is placed in the 5th cacheline. priority could stay where it is, but move it too to cover a hole. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22netfilter: nft_set_pipapo: prefer gfp_kernel allocationFlorian Westphal
No need to use GFP_ATOMIC here. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requestsPhil Sutter
Set expressions' dump callbacks are not concurrency-safe per-se with reset bit set. If two CPUs reset the same element at the same time, values may underrun at least with element-attached counters and quotas. Prevent this by introducing dedicated callbacks for nfnetlink and the asynchronous dump handling to serialize access. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22netfilter: nf_tables: Introduce nft_set_dump_ctx_init()Phil Sutter
This is a wrapper around nft_ctx_init() for use in nf_tables_getsetelem() and a resetting equivalent introduced later. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22netfilter: nf_tables: Pass const set to nft_get_set_elemPhil Sutter
The function is not supposed to alter the set, passing the pointer as const is fine and merely requires to adjust signatures of two called functions as well. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22efi: memmap: fix kernel-doc warningsRandy Dunlap
Correct all kernel-doc notation to repair warnings that are reported by scripts/kernel-doc: memmap.c:38: warning: No description found for return value of '__efi_memmap_init' memmap.c:82: warning: No description found for return value of 'efi_memmap_init_early' memmap.c:132: warning: Function parameter or member 'addr' not described in 'efi_memmap_init_late' memmap.c:132: warning: Excess function parameter 'phys_addr' description in 'efi_memmap_init_late' memmap.c:132: warning: No description found for return value of 'efi_memmap_init_late' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-efi@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-12-22Merge tag 'usb-serial-6.7-rc6' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial device ids for 6.7-rc6 Here are some new modem device ids and a rename of a few ftdi product id defines. All have been in linux-next with no reported issues. * tag 'usb-serial-6.7-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: option: add Quectel EG912Y module support USB: serial: ftdi_sio: update Actisense PIDs constant names USB: serial: option: add Quectel RM500Q R13 firmware support USB: serial: option: add Foxconn T99W265 with new baseline
2023-12-22debugfs: initialize cancellations earlierJohannes Berg
Tetsuo Handa pointed out that in the (now reverted) lockdep commit I initialized the data too late. The same is true for the cancellation data, it must be initialized before the cmpxchg(), otherwise it may be done twice and possibly even overwriting data in there already when there's a race. Fix that, which also requires destroying the mutex in case we lost the race. Fixes: 8c88a474357e ("debugfs: add API to allow debugfs operations cancellation") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20231221150444.1e47a0377f80.If7e8ba721ba2956f12c6e8405e7d61e154aa7ae7@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-22xfs: fold xfs_rtallocate_extent into xfs_bmap_rtallocChristoph Hellwig
There isn't really much left in xfs_rtallocate_extent now, fold it into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: simplify and optimize the RT allocation fallback cascadeChristoph Hellwig
There are currently multiple levels of fall back if an RT allocation can not be satisfied: 1) xfs_rtallocate_extent extends the minlen and reduces the maxlen due to the extent size hint. If that can't be done, it return -ENOSPC and let's xfs_bmap_rtalloc retry, which then not only drops the extent size hint based alignment, but also the minlen adjustment 2) if xfs_rtallocate_extent gets -ENOSPC from the underlying functions, it only drops the extent size hint based alignment and retries 3) if that still does not succeed, xfs_rtallocate_extent drops the extent size hint (which is a complex no-op at this point) and the minlen using the same code as (1) above 4) if that still doesn't success and the caller wanted an allocation near a blkno, drop that blkno hint. The handling in 1 is rather inefficient as we could just drop the alignment and continue, and 2/3 interact in really weird ways due to the duplicate policy. Move aligning the min and maxlen out of xfs_rtallocate_extent and into a helper called directly by xfs_bmap_rtalloc. This allows just continuing with the allocation if we have to drop the alignment instead of going through the retry loop and also dropping the perfectly usable minlen adjustment that didn't cause the problem, and then just use a single retry that drops both the minlen and alignment requirement when we really are out of space, thus consolidating cases (2) and (3) above. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: reorder the minlen and prod calculations in xfs_bmap_rtallocChristoph Hellwig
xfs_bmap_rtalloc is a bit of a mess in terms of calculating the locally need variables. Reorder them a bit so that related code is located next to each other - the raminlen calculation moves up next to where the maximum len is calculated, and all the prod calculation is move into a single place and rearranged so that the real prod calculation only happens when it actually is needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: remove XFS_RTMIN/XFS_RTMAXChristoph Hellwig
Use the kernel min/max helpers instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: remove rt-wrappers from xfs_format.hChristoph Hellwig
xfs_format.h has a bunch odd wrappers for helper functions and mount structure access using RT* prefixes. Replace them with their open coded versions (for those that weren't entirely unused) and remove the wrappers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: factor out a xfs_rtalloc_sumlevel helperChristoph Hellwig
xfs_rtallocate_extent_size has two loops with nearly identical logic in them. Split that logic into a separate xfs_rtalloc_sumlevel helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: tidy up xfs_rtallocate_extent_exactChristoph Hellwig
Use common code for both xfs_rtallocate_range calls by moving the !isfree logic into the non-default branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_blockChristoph Hellwig
Use a goto to use a common tail for the case of being able to allocate an extent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: reflow the tail end of xfs_rtallocate_extent_blockChristoph Hellwig
Change polarity of a check so that the successful case of being able to allocate an extent is in the main path of the function and error handling is on a branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: invert a check in xfs_rtallocate_extent_blockChristoph Hellwig
Doing a break in the else side of a conditional is rather silly. Invert the check, break ASAP and unindent the other leg. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: split xfs_rtmodify_summary_intChristoph Hellwig
Inline the logic of xfs_rtmodify_summary_int into xfs_rtmodify_summary and xfs_rtget_summary instead of having a somewhat awkward helper to share a little bit of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: move xfs_rtget_summary to xfs_rtbitmap.cChristoph Hellwig
xfs_rtmodify_summary_int is only used inside xfs_rtbitmap.c and to implement xfs_rtget_summary. Move xfs_rtget_summary to xfs_rtbitmap.c as the exported API and mark xfs_rtmodify_summary_int static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: cleanup picking the start extent hint in xfs_bmap_rtallocChristoph Hellwig
Clean up the logical in xfs_bmap_rtalloc that tries to find a rtextent to start the search from by using a separate variable for the hint, not calling xfs_bmap_adjacent when we want to ignore the locality and avoid an extra roundtrip converting between block numbers and RT extent numbers. As a side-effect this doesn't pointlessly call xfs_rtpick_extent and increment the start rtextent hint if we are going to ignore the result anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: indicate if xfs_bmap_adjacent changed ap->blknoChristoph Hellwig
Add a return value to xfs_bmap_adjacent to indicate if it did change ap->blkno or not. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: reflow the tail end of xfs_bmap_rtallocChristoph Hellwig
Reorder the tail end of xfs_bmap_rtalloc so that the successfully allocation is in the main path, and the error handling is on a branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: return -ENOSPC from xfs_rtallocate_*Christoph Hellwig
Just return -ENOSPC instead of returning 0 and setting the return rt extent number to NULLRTEXTNO. This is turn removes all users of NULLRTEXTNO, so remove that as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: move xfs_bmap_rtalloc to xfs_rtalloc.cChristoph Hellwig
xfs_bmap_rtalloc is currently in xfs_bmap_util.c, which is a somewhat odd spot for it, given that is only called from xfs_bmap.c and calls into xfs_rtalloc.c to do the actual work. Move xfs_bmap_rtalloc to xfs_rtalloc.c and mark xfs_rtpick_extent xfs_rtallocate_extent and xfs_rtallocate_extent static now that they aren't called from outside of xfs_rtalloc.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: also use xfs_bmap_btalloc_accounting for RT allocationsChristoph Hellwig
Make xfs_bmap_btalloc_accounting more generic by handling the RT quota reservations and then also use it from xfs_bmap_rtalloc instead of open coding the accounting logic there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: remove the xfs_alloc_arg argument to xfs_bmap_btalloc_accountingChristoph Hellwig
xfs_bmap_btalloc_accounting only uses the len field from args, but that has just been propagated to ap->length field by the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: turn the xfs_trans_mod_dquot_byino stub into an inline functionChristoph Hellwig
Without this upcoming change can cause an unused variable warning, when adding a local variable for the fields field passed to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: consider minlen sized extents in xfs_rtallocate_extent_blockChristoph Hellwig
minlen is the lower bound on the extent length that the caller can accept, and maxlen is at this point the maximal available length. This means a minlen extent is perfectly fine to use, so do it. This matches the equivalent logic in xfs_rtallocate_extent_exact that also accepts a minlen sized extent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs/health: cleanup, remove duplicated includingWang Jinchao
remove the second ones: \#include "xfs_trans_resv.h" \#include "xfs_mount.h" Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: fix perag leak when growfs failsLong Li
During growfs, if new ag in memory has been initialized, however sb_agcount has not been updated, if an error occurs at this time it will cause perag leaks as follows, these new AGs will not been freed during umount , because of these new AGs are not visible(that is included in mp->m_sb.sb_agcount). unreferenced object 0xffff88810be40200 (size 512): comm "xfs_growfs", pid 857, jiffies 4294909093 hex dump (first 32 bytes): 00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00 ................ 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 381741e2): [<ffffffff8191aef6>] __kmalloc+0x386/0x4f0 [<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0 [<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810 [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0 [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0 [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0 [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0 [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a unreferenced object 0xffff88810be40800 (size 512): comm "xfs_growfs", pid 857, jiffies 4294909093 hex dump (first 32 bytes): 20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00 .......W....... 10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff ................ backtrace (crc bde50e2d): [<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540 [<ffffffff81814489>] kvmalloc_node+0x99/0x160 [<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400 [<ffffffff8286bdc5>] rhashtable_init+0x405/0x760 [<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810 [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0 [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0 [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0 [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0 [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a Factor out xfs_free_unused_perag_range() from xfs_initialize_perag(), used for freeing unused perag within a specified range in error handling, included in the error path of the growfs failure. Fixes: 1c1c6ebcf528 ("xfs: Replace per-ag array with a radix tree") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: add lock protection when remove perag from radix treeLong Li
Take mp->m_perag_lock for deletions from the perag radix tree in xfs_initialize_perag to prevent racing with tagging operations. Lookups are fine - they are RCU protected so already deal with the tree changing shape underneath the lookup - but tagging operations require the tree to be stable while the tags are propagated back up to the root. Right now there's nothing stopping radix tree tagging from operating while a growfs operation is progress and adding/removing new entries into the radix tree. Hence we can have traversals that require a stable tree occurring at the same time we are removing unused entries from the radix tree which causes the shape of the tree to change. Likely this hasn't caused a problem in the past because we are only doing append addition and removal so the active AG part of the tree is not changing shape, but that doesn't mean it is safe. Just making the radix tree modifications serialise against each other is obviously correct. Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-21bcachefs: Fix leakage of internal error codeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-21bcachefs: Fix insufficient disk reservation with compression + snapshotsKent Overstreet
When overwriting and splitting existing extents, we weren't correctly accounting for a 3 way split of a compressed extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-22crypto: skcipher - Pass statesize for simple lskcipher instancesHerbert Xu
When ecb is used to wrap an lskcipher, the statesize isn't set correctly. Fix this by making the simple instance creator set the statesize. Reported-by: syzbot+8ffb0839a24e9c6bfa76@syzkaller.appspotmail.com Reported-by: Edward Adam Davis <eadavis@qq.com> Fixes: 662ea18d089b ("crypto: skcipher - Make use of internal state") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>