linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-09-05	Merge tag 'usb-5.15-rc1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull more USB updates from Greg KH: "Here are some straggler USB-serial changes for 5.15-rc1. These were not included in the first pull request as they came in "late" from Johan and I had missed them in my pull request earlier this week. Nothing big in here, just some USB to serial driver updates and fixes. All of these were in linux-next before I pulled them into my tree, and have been in linux-next all this week from my tree with no reported problems" * tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: pl2303: fix GL type detection USB: serial: replace symbolic permissions by octal permissions USB: serial: cp210x: determine fw version for CP2105 and CP2108 USB: serial: cp210x: clean up type detection USB: serial: cp210x: clean up set-chars request USB: serial: cp210x: clean up control-request timeout USB: serial: cp210x: fix flow-control error handling USB: serial: cp210x: fix control-characters error handling USB: serial: io_edgeport: drop unused descriptor helper
2021-09-05	net: dsa: b53: Fix IMP port setup on BCM5301x	Rafał Miłecki
	Broadcom's b53 switches have one IMP (Inband Management Port) that needs to be programmed using its own designed register. IMP port may be different than CPU port - especially on devices with multiple CPU ports. For that reason it's required to explicitly note IMP port index and check for it when choosing a register to use. This commit fixes BCM5301x support. Those switches use CPU port 5 while their IMP port is 8. Before this patch b53 was trying to program port 5 with B53_PORT_OVERRIDE_CTRL instead of B53_GMII_PORT_OVERRIDE_CTRL(5). It may be possible to also replace "cpu_port" usages with dsa_is_cpu_port() but that is out of the scope of thix BCM5301x fix. Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05	ip_gre: validate csum_start only on pull	Willem de Bruijn
	The GRE tunnel device can pull existing outer headers in ipge_xmit. This is a rare path, apparently unique to this device. The below commit ensured that pulling does not move skb->data beyond csum_start. But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and thus csum_start is irrelevant. Refine to exclude this. At the same time simplify and strengthen the test. Simplify, by moving the check next to the offending pull, making it more self documenting and removing an unnecessary branch from other code paths. Strengthen, by also ensuring that the transport header is correct and therefore the inner headers will be after skb_reset_inner_headers. The transport header is set to csum_start in skb_partial_csum_set. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") Reported-by: Ido Schimmel <idosch@idosch.org> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05	Merge tag 'mtd/for-5.15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD updates from Miquel Raynal: "MTD changes: - blkdevs: - Simplify the refcounting in blktrans_{open, release} - Simplify blktrans_getgeo - Remove blktrans_ref_mutex - Simplify blktrans_dev_get - Use lockdep_assert_held - Don't hold del_mtd_blktrans_dev in blktrans_{open, release} - ftl: - Don't cast away the type when calling add_mtd_blktrans_dev - Don't cast away the type when calling add_mtd_blktrans_dev - Use container_of() rather than cast - Fix use-after-free - Add discard support - Allow use of MTD_RAM for testing purposes - concat: - Check _read, _write callbacks existence before assignment - Judge callback existence based on the master - maps: - Maps: remove dead MTD map driver for PMC-Sierra MSP boards - mtdblock: - Warn if added for a NAND device - Add comment about UBI block devices - Update old JFFS2 mention in Kconfig - partitions: - Redboot: convert to YAML NAND core changes: - Repair Miquel Raynal's email address in MAINTAINERS - Fix a couple of spelling mistakes in Kconfig - bbt: Skip bad blocks when searching for the BBT in NAND - Remove never changed ret variable Raw NAND changes: - cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()' - intel: Fix error handling in probe - omap: Fix kernel doc warning on 'calcuate' typo - gpmc: Fix the ECC bytes vs. OOB bytes equation SPI-NAND core changes: - Properly fill the OOB area. - Fix comment SPI-NAND drivers changes: - macronix: Add Quad support for serial NAND flash" * tag 'mtd/for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (30 commits) mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()' mtd_blkdevs: simplify the refcounting in blktrans_{open, release} mtd_blkdevs: simplify blktrans_getgeo mtd_blkdevs: remove blktrans_ref_mutex mtd_blkdevs: simplify blktrans_dev_get mtd/rfd_ftl: don't cast away the type when calling add_mtd_blktrans_dev mtd/ftl: don't cast away the type when calling add_mtd_blktrans_dev mtd_blkdevs: use lockdep_assert_held mtd_blkdevs: don't hold del_mtd_blktrans_dev in blktrans_{open, release} mtd: rawnand: intel: Fix error handling in probe mtd: mtdconcat: Check _read, _write callbacks existence before assignment mtd: mtdconcat: Judge callback existence based on the master mtd: maps: remove dead MTD map driver for PMC-Sierra MSP boards mtd: rfd_ftl: use container_of() rather than cast mtd: rfd_ftl: fix use-after-free mtd: rfd_ftl: add discard support mtd: rfd_ftl: allow use of MTD_RAM for testing purposes mtdblock: Warn if added for a NAND device mtd: spinand: macronix: Add Quad support for serial NAND flash mtdblock: Add comment about UBI block devices ...
2021-09-05	binfmt: a.out: Fix bogus semicolon	Geert Uytterhoeven
	fs/binfmt_aout.c: In function ‘load_aout_library’: fs/binfmt_aout.c:311:27: error: expected ‘)’ before ‘;’ token 311 \| MAP_FIXED \| MAP_PRIVATE; \| ^ fs/binfmt_aout.c:309:10: error: too few arguments to function ‘vm_mmap’ 309 \| error = vm_mmap(file, start_addr, ex.a_text + ex.a_data, \| ^~~~~~~ In file included from fs/binfmt_aout.c:12: include/linux/mm.h:2626:35: note: declared here 2626 \| extern unsigned long __must_check vm_mmap(struct file *, unsigned long, \| ^~~~~~~ Fix this by reverting the accidental replacement of a comma by a semicolon. Fixes: 42be8b42535183f8 ("binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()") Reported-by: noreply@ellerman.id.au Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-05	bonding: complain about missing route only once for A/B ARP probes	David Decotigny
	On configs where there is no confirgured direct route to the target of the ARP probes, these probes are still sent and may be replied to properly, so no need to repeatedly complain about the missing route. Signed-off-by: David Decotigny <ddecotig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05	ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address	Antonio Quartulli
	GRE interfaces are not Ether-like and therefore it is not possible to generate the v6LL address the same way as (for example) GRETAP devices. With default settings, a GRE interface will attempt generating its v6LL address using the EUI64 approach, but this will fail when the local endpoint of the GRE tunnel is set to "any". In this case the GRE interface will end up with no v6LL address, thus violating RFC4291. SIT interfaces already implement a different logic to ensure that a v6LL address is always computed. Change the GRE v6LL generation logic to follow the same approach as SIT. This way GRE interfaces will always have a v6LL address as well. Behaviour of GRETAP interfaces has not been changed as they behave like classic Ether-like interfaces. To avoid code duplication sit_add_v4_addrs() has been renamed to add_v4_addrs() and adapted to handle also the IP6GRE/GRE cases. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05	net: stmmac: Fix overall budget calculation for rxtx_napi	Song Yoong Siang
	tx_done is not used for napi_complete_done(). Thus, NAPI busy polling mechanism by gro_flush_timeout and napi_defer_hard_irqs will not able be triggered after a packet is transmitted when there is no receive packet. Fix this by taking the maximum value between tx_done and rx_done as overall budget completed by the rxtx NAPI poll to ensure XDP Tx ZC operation is continuously polling for next Tx frame. This gives benefit of lower packet submission processing latency and jitter under XDP Tx ZC mode. Performance of tx-only using xdp-sock on Intel ADL-S platform is the same with and without this patch. root@intel-corei7-64:~# ./xdpsock -i enp0s30f4 -t -z -q 1 -n 10 sock0@enp0s30f4:1 txonly xdp-drv pps pkts 10.00 rx 0 0 tx 511630 8659520 sock0@enp0s30f4:1 txonly xdp-drv pps pkts 10.00 rx 0 0 tx 511625 13775808 sock0@enp0s30f4:1 txonly xdp-drv pps pkts 10.00 rx 0 0 tx 511619 18892032 Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket") Cc: <stable@vger.kernel.org> # 5.13.x Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05	iwlwifi: pnvm: Fix a memory leak in 'iwl_pnvm_get_from_fs()'	Christophe JAILLET
	A firmware is requested but never released in this function. This leads to a memory leak in the normal execution path. Add the missing 'release_firmware()' call. Also introduce a temp variable (new_len) in order to keep the value of 'pnvm->size' after the firmware has been released. Fixes: cdda18fbbefa ("iwlwifi: pnvm: move file loading code to a separate function") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Luca Coelho <luca@coelho.fi> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1b5d80f54c1dbf85710fd285243932943b498fe7.1630614969.git.christophe.jaillet@wanadoo.fr
2021-09-05	net/9p: increase default msize to 128k	Christian Schoenebeck
	Let's raise the default msize value to 128k. The 'msize' option defines the maximum message size allowed for any message being transmitted (in both directions) between 9p server and 9p client during a 9p session. Currently the default 'msize' is just 8k, which is way too conservative. Such a small 'msize' value has quite a negative performance impact, because individual 9p messages have to be split up far too often into numerous smaller messages to fit into this message size limitation. A default value of just 8k also has a much higher probablity of hitting short-read issues like: https://gitlab.com/qemu-project/qemu/-/issues/409 Unfortunately user feedback showed that many 9p users are not aware that this option even exists, nor the negative impact it might have if it is too low. Link: http://lkml.kernel.org/r/61ea0f0faaaaf26dd3c762eabe4420306ced21b9.1630770829.git.linux_oss@crudebyte.com Link: https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-09-05	net/9p: use macro to define default msize	Christian Schoenebeck
	Use a macro to define the default value for the 'msize' option at one place instead of using two separate integer literals. Link: http://lkml.kernel.org/r/28bb651ae0349a7d57e8ddc92c1bd5e62924a912.1630770829.git.linux_oss@crudebyte.com Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-09-05	net/9p: increase tcp max msize to 1MB	Dominique Martinet
	Historically TCP has been limited to 64K buffers, but increasing msize provides huge performance benefits especially as latency increase so allow for bigger buffers. Ideally further improvements could change the allocation from the current contiguous chunk in slab (kmem_cache) to some scatter-gather compatible API... Note this only increases the max possible setting, not the default value. Link: http://lkml.kernel.org/r/YTQB5jCbvhmCWzNd@codewreck.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-09-04	NTB: perf: Fix an error code in perf_setup_inbuf()	Yang Li
	When the function IS_ALIGNED() returns false, the value of ret is 0. So, we set ret to -EINVAL to indicate this error. Clean up smatch warning: drivers/ntb/test/ntb_perf.c:602 perf_setup_inbuf() warn: missing error code 'ret'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2021-09-04	NTB: Fix an error code in ntb_msit_probe()	Yang Li
	When the value of nm->isr_ctx is false, the value of ret is 0. So, we set ret to -ENOMEM to indicate this error. Clean up smatch warning: drivers/ntb/test/ntb_msi_test.c:373 ntb_msit_probe() warn: missing error code 'ret'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2021-09-04	ntb: intel: remove invalid email address in header comment	Dave Jiang
	Remove Jon's old email address. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2021-09-04	Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux	Linus Torvalds
	Pull MAP_DENYWRITE removal from David Hildenbrand: "Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove VM_DENYWRITE. There are some (minor) user-visible changes: - We no longer deny write access to shared libaries loaded via legacy uselib(); this behavior matches modern user space e.g. dlopen(). - We no longer deny write access to the elf interpreter after exec completed, treating it just like shared libraries (which it often is). - We always deny write access to the file linked via /proc/pid/exe: sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the file cannot be denied, and write access to the file will remain denied until the link is effectivel gone (exec, termination, sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file. Cross-compiled for a bunch of architectures (alpha, microblaze, i386, s390x, ...) and verified via ltp that especially the relevant tests (i.e., creat07 and execve04) continue working as expected" * tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux: fs: update documentation of get_write_access() and friends mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() mm: remove VM_DENYWRITE binfmt: remove in-tree usage of MAP_DENYWRITE kernel/fork: always deny write access to current MM exe_file kernel/fork: factor out replacing the current MM exe_file binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
2021-09-04	Merge git://github.com/Paragon-Software-Group/linux-ntfs3	Linus Torvalds
	Merge NTFSv3 filesystem from Konstantin Komarov: "This patch adds NTFS Read-Write driver to fs/ntfs3. Having decades of expertise in commercial file systems development and huge test coverage, we at Paragon Software GmbH want to make our contribution to the Open Source Community by providing implementation of NTFS Read-Write driver for the Linux Kernel. This is fully functional NTFS Read-Write driver. Current version works with NTFS (including v3.1) and normal/compressed/sparse files and supports journal replaying. We plan to support this version after the codebase once merged, and add new features and fix bugs. For example, full journaling support over JBD will be added in later updates" Link: https://lore.kernel.org/lkml/20210729134943.778917-1-almaz.alexandrovich@paragon-software.com/ Link: https://lore.kernel.org/lkml/aa4aa155-b9b2-9099-b7a2-349d8d9d8fbd@paragon-software.com/ * git://github.com/Paragon-Software-Group/linux-ntfs3: (35 commits) fs/ntfs3: Change how module init/info messages are displayed fs/ntfs3: Remove GPL boilerplates from decompress lib files fs/ntfs3: Remove unnecessary condition checking from ntfs_file_read_iter fs/ntfs3: Fix integer overflow in ni_fiemap with fiemap_prep() fs/ntfs3: Restyle comments to better align with kernel-doc fs/ntfs3: Rework file operations fs/ntfs3: Remove fat ioctl's from ntfs3 driver for now fs/ntfs3: Restyle comments to better align with kernel-doc fs/ntfs3: Fix error handling in indx_insert_into_root() fs/ntfs3: Potential NULL dereference in hdr_find_split() fs/ntfs3: Fix error code in indx_add_allocate() fs/ntfs3: fix an error code in ntfs_get_acl_ex() fs/ntfs3: add checks for allocation failure fs/ntfs3: Use kcalloc/kmalloc_array over kzalloc/kmalloc fs/ntfs3: Do not use driver own alloc wrappers fs/ntfs3: Use kernel ALIGN macros over driver specific fs/ntfs3: Restyle comment block in ni_parse_reparse() fs/ntfs3: Remove unused including <linux/version.h> fs/ntfs3: Fix fall-through warnings for Clang fs/ntfs3: Fix one none utf8 char in source file ...
2021-09-04	Merge tag 'f2fs-for-5.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we've addressed some performance issues such as lock contention, misbehaving compress_cache, allowing extent_cache for compressed files, and new sysfs to adjust ra_size for fadvise. In order to diagnose the performance issues quickly, we also added an iostat which shows the IO latencies periodically. On the stability side, we've found two memory leakage cases in the error path in compression flow. And, we've also fixed various corner cases in fiemap, quota, checkpoint=disable, zstd, and so on. Enhancements: - avoid long checkpoint latency by releasing nat_tree_lock - collect and show iostats periodically - support extent_cache for compressed files - add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL) - report f2fs GC status via sysfs - add discard_unit=%s in mount option to handle zoned device Bug fixes: - fix two memory leakages when an error happens in the compressed IO flow - fix commpress_cache to get the right LBA - fix fiemap to deal with compressed case correctly - fix wrong EIO returns due to SBI_NEED_FSCK - fix missing writes when enabling checkpoint back - fix quota deadlock - fix zstd level mount option In addition to the above major updates, we've cleaned up several code paths such as dio, unnecessary operations, debugfs/f2fs/status, sanity check, and typos" * tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits) f2fs: should put a page beyond EOF when preparing a write f2fs: deallocate compressed pages when error happens f2fs: enable realtime discard iff device supports discard f2fs: guarantee to write dirty data when enabling checkpoint back f2fs: fix to unmap pages from userspace process in punch_hole() f2fs: fix unexpected ENOENT comes from f2fs_map_blocks() f2fs: fix to account missing .skipped_gc_rwsem f2fs: adjust unlock order for cleanup f2fs: Don't create discard thread when device doesn't support realtime discard f2fs: rebuild nat_bits during umount f2fs: introduce periodic iostat io latency traces f2fs: separate out iostat feature f2fs: compress: do sanity check on cluster f2fs: fix description about main_blkaddr node f2fs: convert S_IRUGO to 0444 f2fs: fix to keep compatibility of fault injection interface f2fs: support fault injection for f2fs_kmem_cache_alloc() f2fs: compress: allow write compress released file after truncate to zero f2fs: correct comment in segment.h f2fs: improve sbi status info in debugfs/f2fs/status ...
2021-09-04	cdrom: update uniform CD-ROM maintainership in MAINTAINERS file	Phillip Potter
	Update maintainership for the uniform CD-ROM driver from Jens Axboe to Phillip Potter in MAINTAINERS file, to reflect the attempt to pass on maintainership of this driver to a different individual. Also remove URL to site which is no longer active. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20210904174030.1103-1-phil@philpotter.co.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-04	Merge tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs	Linus Torvalds
	Pull NFS client updates from Anna Schumaker: "New Features: - Better client responsiveness when server isn't replying - Use refcount_t in sunrpc rpc_client refcount tracking - Add srcaddr and dst_port to the sunrpc sysfs info files - Add basic support for connection sharing between servers with multiple NICs` Bugfixes and Cleanups: - Sunrpc tracepoint cleanups - Disconnect after ib_post_send() errors to avoid deadlocks - Fix for tearing down rpcrdma_reps - Fix a potential pNFS layoutget livelock loop - pNFS layout barrier fixes - Fix a potential memory corruption in rpc_wake_up_queued_task_set_status() - Fix reconnection locking - Fix return value of get_srcport() - Remove rpcrdma_post_sends() - Remove pNFS dead code - Remove copy size restriction for inter-server copies - Overhaul the NFS callback service - Clean up sunrpc TCP socket shutdowns - Always provide aligned buffers to RPC read layers" * tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits) NFS: Always provide aligned buffers to the RPC read layers NFSv4.1 add network transport when session trunking is detected SUNRPC enforce creation of no more than max_connect xprts NFSv4 introduce max_connect mount options SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs SUNRPC keep track of number of transports to unique addresses NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox SUNRPC: Tweak TCP socket shutdown in the RPC client SUNRPC: Simplify socket shutdown when not reusing TCP ports NFSv4.2: remove restriction of copy size for inter-server copy. NFS: Clean up the synopsis of callback process_op() NFS: Extract the xdr_init_encode/decode() calls from decode_compound NFS: Remove unused callback void decoder NFS: Add a private local dispatcher for NFSv4 callback operations SUNRPC: Eliminate the RQ_AUTHERR flag SUNRPC: Set rq_auth_stat in the pg_authenticate() callout SUNRPC: Add svc_rqst::rq_auth_stat SUNRPC: Add dst_port to the sysfs xprt info file SUNRPC: Add srcaddr as a file in sysfs sunrpc: Fix return value of get_srcport() ...
2021-09-04	octeontx2-af: Fix some memory leaks in the error handling path of ↵	Christophe JAILLET
	'cgx_lmac_init()' Memory allocated before 'lmac' is stored in 'cgx->lmac_idmap[]' must be freed explicitly. Otherwise, in case of error, it will leak. Rename the 'err_irq' label to better describe what is done at this place in the error handling path. Fixes: 6f14078e3ee5 ("octeontx2-af: DMAC filter support in MAC block") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04	octeontx2-af: Add a 'rvu_free_bitmap()' function	Christophe JAILLET
	In order to match 'rvu_alloc_bitmap()', add a 'rvu_free_bitmap()' function Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04	ethtool: Fix an error code in cxgb2.c	Yang Li
	When adapter->registered_device_map is NULL, the value of err is uncertain, we set err to -EINVAL to avoid ambiguity. Clean up smatch warning: drivers/net/ethernet/chelsio/cxgb/cxgb2.c:1114 init_one() warn: missing error code 'err' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04	qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom	Dinghao Liu
	Previous commit 68233c583ab4 removes the qlcnic_rom_lock() in qlcnic_pinit_from_rom(), but remains its corresponding unlock function, which is odd. I'm not very sure whether the lock is missing, or the unlock is redundant. This bug is suggested by a static analysis tool, please advise. Fixes: 68233c583ab4 ("qlcnic: updated reset sequence") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04	fq_codel: reject silly quantum parameters	Eric Dumazet
	syzbot found that forcing a big quantum attribute would crash hosts fast, essentially using this: tc qd replace dev eth0 root fq_codel quantum 4294967295 This is because fq_codel_dequeue() would have to loop ~2^31 times in : if (flow->deficit <= 0) { flow->deficit += q->quantum; list_move_tail(&flow->flowchain, &q->old_flows); goto begin; } SFQ max quantum is 2^19 (half a megabyte) Lets adopt a max quantum of one megabyte for FQ_CODEL. Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04	mm, slub: convert kmem_cpu_slab protection to local_lock	Vlastimil Babka
	Embed local_lock into struct kmem_cpu_slab and use the irq-safe versions of local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's equivalent, with better lockdep visibility. On PREEMPT_RT that means better preemption. However, the cost on PREEMPT_RT is the loss of lockless fast paths which only work with cpu freelist. Those are designed to detect and recover from being preempted by other conflicting operations (both fast or slow path), but the slow path operations assume they cannot be preempted by a fast path operation, which is guaranteed naturally with disabled irqs. With local locks on PREEMPT_RT, the fast paths now also need to take the local lock to avoid races. In the allocation fastpath slab_alloc_node() we can just defer to the slowpath __slab_alloc() which also works with cpu freelist, but under the local lock. In the free fastpath do_slab_free() we have to add a new local lock protected version of freeing to the cpu freelist, as the existing slowpath only works with the page freelist. Also update the comment about locking scheme in SLUB to reflect changes done by this series. [ Mike Galbraith <efault@gmx.de>: use local_lock() without irq in PREEMPT_RT scope; debugging of RT crashes resulting in put_cpu_partial() locking changes ] Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04	mm, slub: use migrate_disable() on PREEMPT_RT	Vlastimil Babka
	We currently use preempt_disable() (directly or via get_cpu_ptr()) to stabilize the pointer to kmem_cache_cpu. On PREEMPT_RT this would be incompatible with the list_lock spinlock. We can use migrate_disable() instead, but that increases overhead on !PREEMPT_RT as it's an unconditional function call. In order to get the best available mechanism on both PREEMPT_RT and !PREEMPT_RT, introduce private slub_get_cpu_ptr() and slub_put_cpu_ptr() wrappers and use them. Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04	mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg	Vlastimil Babka
	Jann Horn reported [1] the following theoretically possible race: task A: put_cpu_partial() calls preempt_disable() task A: oldpage = this_cpu_read(s->cpu_slab->partial) interrupt: kfree() reaches unfreeze_partials() and discards the page task B (on another CPU): reallocates page as page cache task A: reads page->pages and page->pobjects, which are actually halves of the pointer page->lru.prev task B (on another CPU): frees page interrupt: allocates page as SLUB page and places it on the percpu partial list task A: this_cpu_cmpxchg() succeeds which would cause page->pages and page->pobjects to end up containing halves of pointers that would then influence when put_cpu_partial() happens and show up in root-only sysfs files. Maybe that's acceptable, I don't know. But there should probably at least be a comment for now to point out that we're reading union fields of a page that might be in a completely different state. Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only safe against s->cpu_slab->partial manipulation in ___slab_alloc() if the latter disables irqs, otherwise a __slab_free() in an irq handler could call put_cpu_partial() in the middle of ___slab_alloc() manipulating ->partial and corrupt it. This becomes an issue on RT after a local_lock is introduced in later patch. The fix means taking the local_lock also in put_cpu_partial() on RT. After debugging this issue, Mike Galbraith suggested [2] that to avoid different locking schemes on RT and !RT, we can just protect put_cpu_partial() with disabled irqs (to be converted to local_lock_irqsave() later) everywhere. This should be acceptable as it's not a fast path, and moving the actual partial unfreezing outside of the irq disabled section makes it short, and with the retry loop gone the code can be also simplified. In addition, the race reported by Jann should no longer be possible. [1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/ [2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/ Reported-by: Jann Horn <jannh@google.com> Suggested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04	mm, slub: make slab_lock() disable irqs with PREEMPT_RT	Vlastimil Babka
	We need to disable irqs around slab_lock() (a bit spinlock) to make it irq-safe. Most calls to slab_lock() are nested under spin_lock_irqsave() which doesn't disable irqs on PREEMPT_RT, so add explicit disabling with PREEMPT_RT. The exception is cmpxchg_double_slab() which already disables irqs, so use a __slab_[un]lock() variant without irq disable there. slab_[un]lock() thus needs a flags pointer parameter, which is unused on !RT. free_debug_processing() now has two flags variables, which looks odd, but only one is actually used - the one used in spin_lock_irqsave() on !RT and the one used in slab_lock() on RT. As a result, __cmpxchg_double_slab() and cmpxchg_double_slab() become effectively identical on RT, as both will disable irqs, which is necessary on RT as most callers of this function also rely on irqsaving lock operations. Thus, assert that irqs are already disabled in __cmpxchg_double_slab() only on !RT and also change the VM_BUG_ON assertion to the more standard lockdep_assert one. Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04	mm: slub: make object_map_lock a raw_spinlock_t	Sebastian Andrzej Siewior
	The variable object_map is protected by object_map_lock. The lock is always acquired in debug code and within already atomic context Make object_map_lock a raw_spinlock_t. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-03	Input: adc-keys - drop bogus __refdata annotation	Geert Uytterhoeven
	As the ADC ladder input driver does not have any code or data located in initmem, there is no need to annotate the adc_keys_driver structure with __refdata. Drop the annotation, to avoid suppressing future section warnings. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/7091e8213602be64826fd689a7337246d218f3b1.1626255421.git.geert+renesas@glider.be Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-09-03	Input: Fix spelling mistake in Kconfig "useable" -> "usable"	Colin Ian King
	There is a spelling mistake in the Kconfig text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210705100230.7583-1-colin.king@canonical.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-09-03	Input: Fix spelling mistake in Kconfig "Modul" -> "Module"	Colin Ian King
	There is a spelling mistake in the Kconfig text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210704095702.37567-1-colin.king@canonical.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-09-03	ksmbd: add validation for ndr read/write functions	Namjae Jeon
	If ndr->length is smaller than expected size, ksmbd can access invalid access in ndr->data. This patch add validation to check ndr->offset is over ndr->length. and added exception handling to check return value of ndr read/write function. Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: remove unused ksmbd_file_table_flush function	Namjae Jeon
	ksmbd_file_table_flush is a leftover from SMB1. This function is no longer needed as SMB1 has been removed from ksmbd. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: smbd: fix dma mapping error in smb_direct_post_send_data	Hyunchul Lee
	Becase smb direct header is mapped and msg->num_sge already is incremented, the decrement should be removed from the condition. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: Reduce error log 'speed is unknown' to debug	Per Forlin
	This log happens on servers with a network bridge since the bridge does not have a specified link speed. This is not a real error so change the error log to debug instead. Signed-off-by: Per Forlin <perfn@axis.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: defer notify_change() call	Christian Brauner
	When ownership is changed we might in certain scenarios loose the ability to alter the inode after we changed ownership. This can e.g. happen when we are on an idmapped mount where uid 0 is mapped to uid 1000 and uid 1000 is mapped to uid 0. A caller with fsid 1000 will be able to create files as id 1000 on disk. They will also be able to change ownership of files owned by id 0 to id 1000 but they won't be able to change ownership in the other direction. This means acl operations following notify_change() would fail. Move the notify_change() call after the acls have been updated. This guarantees that we don't end up with spurious "hash value diff" warnings later on because we managed to change ownership but didn't manage to alter acls. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: remove setattr preparations in set_file_basic_info()	Christian Brauner
	Permission checking and copying over ownership information is the task of the underlying filesystem not ksmbd. The order is also wrong here. This modifies the inode before notify_change(). If notify_change() fails this will have changed ownership nonetheless. All of this is unnecessary though since the underlying filesystem's ->setattr handler will do all this (if required) by itself. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: ensure error is surfaced in set_file_basic_info()	Christian Brauner
	It seems the error was accidently ignored until now. Make sure it is surfaced. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ndr: fix translation in ndr_encode_posix_acl()	Christian Brauner
	The sid_to_id() helper encodes raw ownership information suitable for sid handling. This is conceptually equivalent to reporting ownership information via stat to userspace. In this case the consumer is ksmbd instead of a regular user. So when encoding raw ownership information suitable for sid handling later we need to map the id up according to the user namespace of ksmbd itself taking any idmapped mounts into account. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: fix translation in sid_to_id()	Christian Brauner
	The sid_to_id() functions is relevant when changing ownership of filesystem objects based on acl information. In this case we need to first translate the relevant sids into kids in ksmbd's user namespace and account for any idmapped mounts. Requesting a change in ownership requires the inverse translation to be applied when we would report ownership to userspace. So k*id_from_mnt() must be used here. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: fix subauth 0 handling in sid_to_id()	Christian Brauner
	It's not obvious why subauth 0 would be excluded from translation. This would lead to wrong results whenever a non-identity idmapping is used. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: fix translation in acl entries	Christian Brauner
	The ksmbd server performs translation of posix acls to smb acls. Currently the translation is wrong since the idmapping of the mount is used to map the ids into raw userspace ids but what is relevant is the user namespace of ksmbd itself. The user namespace of ksmbd itself which is the initial user namespace. The operation is similar to asking "What ids would a userspace process see given that kid in the relevant user namespace?". Before the final translation we need to apply the idmapping of the mount in case any is used. Add two simple helpers for ksmbd. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: fix translation in ksmbd_acls_fattr()	Christian Brauner
	When creating new filesystem objects ksmbd translates between kids and sids. For this it often uses struct smb_fattr and stashes the kids in cf_uid and cf_gid. Let cf_uid and cf_gid always contain the final information taking any potential idmapped mounts into account. When finally translation cf_id into s*ids translate them into the user namespace of ksmbd since that is the relevant user namespace here. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: fix translation in create_posix_rsp_buf()	Christian Brauner
	When transferring ownership information to the client the kids are translated into raw ids before they are sent over the wire. The function currently erroneously translates the kids according to the mount's idmapping. Instead, reporting the owning ids to userspace the underlying kids need to be mapped up in the caller's user namespace. This is how stat() works. The caller in this instance is ksmbd itself and ksmbd always runs in the initial user namespace. Translate according to that taking any potential idmapped mounts into account. Switch to from_kid_munged() which ensures that the overflowid is returned instead of the (id_t)-1 when the k*id can't be translated. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: fix translation in smb2_populate_readdir_entry()	Christian Brauner
	When transferring ownership information to the client the kids are translated into raw ids before they are sent over the wire. The function currently erroneously translates the kids according to the mount's idmapping. Instead, reporting the owning ids to userspace the underlying kids need to be mapped up in the caller's user namespace. This is how stat() works. The caller in this instance is ksmbd itself and ksmbd always runs in the initial user namespace. Translate according to that. The idmapping of the mount is already taken into account by the lower filesystem and so kstat->id will contain the mapped kids. Switch to from_kid_munged() which ensures that the overflowid is returned instead of the (id_t)-1 when the k*id can't be translated. Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	ksmbd: fix lookup on idmapped mounts	Christian Brauner
	It's great that the new in-kernel ksmbd server will support idmapped mounts out of the box! However, lookup is currently broken. Lookup helpers such as lookup_one_len() call inode_permission() internally to ensure that the caller is privileged over the inode of the base dentry they are trying to lookup under. So the permission checking here is currently wrong. Linux v5.15 will gain a new lookup helper lookup_one() that does take idmappings into account. I've added it as part of my patch series to make btrfs support idmapped mounts. The new helper is in linux-next as part of David's (Sterba) btrfs for-next branch as commit c972214c133b ("namei: add mapping aware lookup helper"). I've said it before during one of my first reviews: I would very much recommend adding fstests to [1]. It already seems to have very rudimentary cifs support. There is a completely generic idmapped mount testsuite that supports idmapped mounts. [1]: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/ Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve French <stfrench@microsoft.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: David Sterba <dsterba@suse.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03	loop: reduce the loop_ctl_mutex scope	Tetsuo Handa
	syzbot is reporting circular locking problem at __loop_clr_fd() [1], for commit a160c6159d4a0cf8 ("block: add an optional probe callback to major_names") is calling the module's probe function with major_names_lock held. Fortunately, since commit 990e78116d38059c ("block: loop: fix deadlock between open and remove") stopped holding loop_ctl_mutex in lo_open(), current role of loop_ctl_mutex is to serialize access to loop_index_idr and loop_add()/loop_remove(); in other words, management of id for IDR. To avoid holding loop_ctl_mutex during whole add/remove operation, use a bool flag to indicate whether the loop device is ready for use. loop_unregister_transfer() which is called from cleanup_cryptoloop() currently has possibility of use-after-free problem due to lack of serialization between kfree() from loop_remove() from loop_control_remove() and mutex_lock() from unregister_transfer_cb(). But since lo->lo_encryption should be already NULL when this function is called due to module unload, and commit 222013f9ac30b9ce ("cryptoloop: add a deprecation warning") indicates that we will remove this function shortly, this patch updates this function to emit warning instead of checking lo->lo_encryption. Holding loop_ctl_mutex in loop_exit() is pointless, for all users must close /dev/loop-control and /dev/loop$num (in order to drop module's refcount to 0) before loop_exit() starts, and nobody can open /dev/loop-control or /dev/loop$num afterwards. Link: https://syzkaller.appspot.com/bug?id=7bb10e8b62f83e4d445cdf4c13d69e407e629558 [1] Reported-by: syzbot <syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/adb1e792-fc0e-ee81-7ea0-0906fc36419d@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-03	tracing: Add migrate-disabled counter to tracing output.	Thomas Gleixner
	migrate_disable() forbids task migration to another CPU. It is available since v5.11 and has already users such as highmem or BPF. It is useful to observe this task state in tracing which already has other states like the preemption counter. Instead of adding the migrate disable counter as a new entry to struct trace_entry, which would extend the whole struct by four bytes, it is squashed into the preempt-disable counter. The lower four bits represent the preemption counter, the upper four bits represent the migrate disable counter. Both counter shouldn't exceed 15 but if they do, there is a safety net which caps the value at 15. Add the migrate-disable counter to the trace entry so it shows up in the trace. Due to the users mentioned above, it is already possible to observe it: \| bash-1108 [000] ...21 73.950578: rss_stat: mm_id=2213312838 curr=0 type=MM_ANONPAGES size=8192B \| bash-1108 [000] d..31 73.951222: irq_disable: caller=flush_tlb_mm_range+0x115/0x130 parent=ptep_clear_flush+0x42/0x50 \| bash-1108 [000] d..31 73.951222: tlb_flush: pages:1 reason:local mm shootdown (3) The last value is the migrate-disable counter. Things that popped up: - trace_print_lat_context() does not print the migrate counter. Not sure if it should. It is used in "verbose" mode and uses 8 digits and I'm not sure ther is something processing the value. - trace_define_common_fields() now defines a different variable. This probably breaks things. No ide what to do in order to preserve the old behaviour. Since this is used as a filter it should be split somehow to be able to match both nibbles here. Link: https://lkml.kernel.org/r/20210810132625.ylssabmsrkygokuv@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: patch description.] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ SDR: Removed change to common_preempt_count field name ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>