linux.git - Linus' kernel tree

Age	Commit message	Author
2021-08-26	vfio/pci: Rename vfio_pci_device to vfio_pci_core_device	Max Gurtovoy
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. The new vfio_pci_core_device structure will be the main structure of the core driver and later on vfio_pci_device structure will be the main structure of the generic vfio_pci driver. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.h	Max Gurtovoy
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename vfio_pci.c to vfio_pci_core.c	Max Gurtovoy
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	Merge tag 'timers-v5.15' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core	Thomas Gleixner
	Pull timer driver updates from Daniel Lezcano: - Prioritize the ARM architected timer on Exynos platform when the architecture is ARM64 (Will Deacon) - Mark the Exynos timer as a per CPU timer (Will Deacon) - DT conversion to yaml for the rockchip platform (Ezequiel Garcia) - Fix IRQ setup if there are two channels on the sh_cmt timer (Phong Hoang) - Use bitfield helper macros in the Ingenic timer (Zhou Yanjie) - Clear any pending interrupt to prevent an abort of the suspend on the Mediatek platform (Fengquan Chen) - Add DT bindings for new Ingenic SoCs (Zhou Yanjie) Link: https://lore.kernel.org/r/c14ad27a-b1c6-6043-0f5e-71dd984bb4ba@linaro.org
2021-08-26	iomap: standardize tracepoint formatting and storage	Darrick J. Wong
	Print all the offset, pos, and length quantities in hexadecimal. While we're at it, update the types of the tracepoint structure fields to match the types of the values being recorded in them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-08-26	md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard	Xiao Ni
	We are seeing the following warning in raid10_handle_discard. [ 695.110751] ============================= [ 695.131439] WARNING: suspicious RCU usage [ 695.151389] 4.18.0-319.el8.x86_64+debug #1 Not tainted [ 695.174413] ----------------------------- [ 695.192603] drivers/md/raid10.c:1776 suspicious rcu_dereference_check() usage! [ 695.225107] other info that might help us debug this: [ 695.260940] rcu_scheduler_active = 2, debug_locks = 1 [ 695.290157] no locks held by mkfs.xfs/10186. The first loop of raid10_handle_discard already determines which disks need to handle the discard request and increments the rdev reference count rdev->nr_pending. So conf->mirrors will not change until all bios come back from the underlying disks, and rcu_dereference is not needed to get rdev. Cc: stable@vger.kernel.org Fixes: d30588b2731f ('md/raid10: improve raid10 discard request') Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-08-26	ALSA: hda: Disable runtime resume at shutdown	Takashi Iwai
	Although we modified the codec shutdown callback to perform runtime-suspend, it's still not fully effective, as the device may be resumed again at any time later. To fix such an unwanted resume, this patch replaces pm_runtime_suspend() with pm_runtime_force_suspend() and calls pm_runtime_disable() afterward. This ensures that the device stays suspended. Also, for code simplification, we apply the code unconditionally; when the device is already suspended, the calls to snd_pcm_suspend_all() and pm_runtime_force_suspend() do nothing and we just proceed to pm_runtime_disable(). Fixes: b98444ed597d ("ALSA: hda: Suspend codec at shutdown") Reported-and-tested-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com> Link: https://lore.kernel.org/r/20210826154752.25674-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-08-26	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Jakub Kicinski
	Alexei Starovoitov says: ==================== bpf 2021-08-26 We've added 1 non-merge commit during the last 1 day(s): 1) Fix ringbuf helper function compatibility, from Daniel. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix ringbuf helper function compatibility ==================== Link: https://lore.kernel.org/r/20210826153720.19083-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	signal/seccomp: Refactor seccomp signal and coredump generation	Eric W. Biederman
	Factor out force_sig_seccomp from the seccomp signal generation and place it in kernel/signal.c. The function force_sig_seccomp takes a parameter force_coredump to indicate that the sigaction field should be reset to SIG_DFL so that a coredump will be generated when the signal is delivered. force_sig_seccomp is then used to replace both seccomp_send_sigsys and seccomp_init_siginfo. force_sig_info_to_task gains an extra parameter to force using the default signal action. With this change seccomp is no longer a special case and there is exactly one place do_coredump is called from. Further, it is no longer necessary for __seccomp_filter to call do_group_exit. Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-08-26	RDMA/hns: Delete unnecessary blank lines.	Xinhao Liu
	Just delete unnecessary blank lines. Link: https://lore.kernel.org/r/1629985056-57004-8-git-send-email-liangwenpeng@huawei.com Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Encapsulate the qp db as a function	Yixing Liu
	Encapsulate qp db into two functions: user and kernel. Link: https://lore.kernel.org/r/1629985056-57004-7-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Adjust the order in which irq are requested and enabled	Wenpeng Liang
	The driver should first allocate the workqueue and request the irq, and only then enable the irq. Link: https://lore.kernel.org/r/1629985056-57004-6-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Remove RST2RST error prints for hw v1	Weihang Li
	There is no need to print errors for hw_v1. Link: https://lore.kernel.org/r/1629985056-57004-5-git-send-email-liangwenpeng@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Remove dqpn filling when modify qp from Init to Init	Wenpeng Liang
	According to the IB specification, the destination qpn is allowed to be filled into the qpc only when the qp transitions from Init to RTR, so this code is unused. Link: https://lore.kernel.org/r/1629985056-57004-4-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Fix QP's resp incomplete assignment	Wenpeng Liang
	The resp passed to user space represents the enable flags of the qp; an incomplete assignment will cause some user space features to be disabled. Fixes: 90ae0b57e4a5 ("RDMA/hns: Combine enable flags of qp") Fixes: aba457ca890c ("RDMA/hns: Support owner mode doorbell") Link: https://lore.kernel.org/r/1629985056-57004-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Fix query destination qpn	Wenpeng Liang
	The dqpn field is 24 bits wide, so storing it in a u8 truncates the value. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1629985056-57004-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
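The truncation described in this commit can be reproduced in userspace. The following is a minimal sketch, not the hns driver code: the function names and the `DQPN_MASK` constant are hypothetical and exist only to show what a narrowing store does to a 24-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: a destination QPN occupies 24 bits. */
#define DQPN_MASK 0xFFFFFFu

/* Buggy variant: a u8 keeps only the low 8 bits of the 24-bit field. */
uint8_t dqpn_as_u8(uint32_t raw)
{
	return (uint8_t)(raw & DQPN_MASK);
}

/* Fixed variant: a 32-bit type holds all 24 bits. */
uint32_t dqpn_as_u32(uint32_t raw)
{
	uint32_t dqpn = raw & DQPN_MASK;

	assert(dqpn <= DQPN_MASK); /* the result always fits in 24 bits */
	return dqpn;
}
```

For a raw value of 0x123456, the first function yields 0x56 while the second preserves 0x123456.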
2021-08-26	signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die	Eric W. Biederman
	In the fpsp040 code when copyin or copyout fails call force_sigsegv(SIGSEGV) instead of do_exit(SIGSEGV). This solves a couple of problems. Because do_exit embeds the ptrace stop PTRACE_EVENT_EXIT a complete stack frame needs to be present for that to work correctly. There is always the information needed for a ptrace stop where get_signal is called. So exiting with a signal solves the ptrace issue. Further exiting with a signal ensures that all of the threads in a process are killed not just the thread that malfunctioned. Which avoids confusing userspace. To make force_sigsegv(SIGSEGV) work in fpsp040_die modify the code to save all of the registers and jump to ret_from_exception (which ultimately calls get_signal) after fpsp040_die returns. v2: Updated the branches to use gas's pseudo ops that automatically calculate the best branch instruction to use for the purpose. v1: https://lkml.kernel.org/r/87a6m8kgtx.fsf_-_@disp2133 Link: https://lkml.kernel.org/r/87tukghjfs.fsf_-_@disp2133 Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-08-27	powerpc/pseries/iommu: Rename "direct window" to "dma window"	Leonardo Bras
	A previous change introduced the usage of DDW as a bigger indirect DMA mapping when the DDW available size does not map the whole partition. As most of the code that manipulates direct mappings was reused for indirect mappings, it's necessary to rename all names and debug/info messages to reflect that it can be used for both kinds of mapping. This should cause no behavioural change, just adjust naming. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-12-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Make use of DDW for indirect mapping	Leonardo Bras
	So far it's assumed possible to map the guest RAM 1:1 to the bus, which works with a small number of devices. SRIOV changes it as the user can configure hundreds of VFs and since phyp preallocates TCEs and does not allow IOMMU pages bigger than 64K, it has to limit the number of TCEs per PE to limit waste of physical pages. As of today, if the assumed direct mapping is not possible, DDW creation is skipped and the default DMA window "ibm,dma-window" is used instead. By using DDW, indirect mapping can get more TCEs than available for the default DMA window, and also get access to much larger page sizes (16MB as implemented in qemu vs 4k from default DMA window), causing a significant increase in the maximum amount of memory that can be IOMMU mapped at the same time. Indirect mapping will only be used if direct mapping is not a possibility. For indirect mapping, it's necessary to re-create the iommu_table with the new DMA window parameters, so iommu_alloc() can use it. Removing the default DMA window for using DDW with indirect mapping is only allowed if there is no current IOMMU memory allocated in the iommu_table. enable_ddw() is aborted otherwise. Even though there won't be both direct and indirect mappings at the same time, we can't reuse the DIRECT64_PROPNAME property name, or else an older kexec()ed kernel can assume direct mapping, and skip iommu_alloc(), causing undesirable behavior. So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info" was created to represent a DDW that does not allow direct mapping. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-11-leobras.c@gmail.com
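The effect of the page size on how much memory a fixed number of TCEs can map is simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes one TCE per IOMMU page, which is how the commit message describes the trade-off between 4k and 16MB pages.

```c
#include <assert.h>
#include <stdint.h>

/* Number of TCEs (one per IOMMU page) needed to cover `bytes` of memory. */
uint64_t tces_needed(uint64_t bytes, unsigned int page_shift)
{
	assert(page_shift < 64);
	uint64_t page_size = UINT64_C(1) << page_shift;

	return (bytes + page_size - 1) >> page_shift;
}

/* Bytes that can be mapped at the same time with a fixed budget of TCEs. */
uint64_t max_mappable(uint64_t n_tces, unsigned int page_shift)
{
	assert(page_shift < 64);
	return n_tces << page_shift;
}
```

Under these assumptions, covering 64 GiB takes 16777216 TCEs with 4k pages (shift 12) but only 4096 TCEs with 16MB pages (shift 24).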
2021-08-27	powerpc/pseries/iommu: Find existing DDW with given property name	Leonardo Bras
	At the moment pseries stores information about a created directly mapped DDW window in DIRECT64_PROPNAME. With the objective of implementing indirect DMA mapping with DDW, it's necessary to have another property name to make sure kexec'ing into older kernels does not break, as it would if we reuse DIRECT64_PROPNAME. In order to have this, find_existing_ddw_windows() needs to be able to look for different property names. Extract find_existing_ddw_windows() into find_existing_ddw_windows_named() and call it with the current property name. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-10-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Update remove_dma_window() to accept property name	Leonardo Bras
	Update remove_dma_window() so it can be used to remove DDW with a given property name. This enables the creation of new property names for DDW, so we can have different usage for it, like indirect mapping. Also, add return values to it so we can check if the property was found while removing the active DDW. This allows skipping the remaining property names while reducing the impact of multiple property names. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-9-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper	Leonardo Bras
	Add a new helper _iommu_table_setparms(), and use it in iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated code. Also, setting tbl->it_ops was happening outside iommu_table_setparms(), so move it to the new helper. Since the iommu_table_ops need to be declared before they are used, move the declarations of iommu_table_lpar_multi_ops and iommu_table_pseries_ops before their respective iommu_table_setparms(). Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-8-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()	Leonardo Bras
	Code used to create a ddw property that was previously scattered in enable_ddw() is now gathered in ddw_property_create(), which deals with allocation and filling the property, leaving it ready for of_property_add(), which now occurs in sequence. This created an opportunity to reorganize the second part of enable_ddw(): Without this patch enable_ddw() does, in order: kzalloc() property & members, create_ddw(), fill ddwprop inside property, ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory, of_add_property(), and list_add(). With this patch enable_ddw() does, in order: create_ddw(), ddw_property_create(), of_add_property(), ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory, and list_add(). This change requires of_remove_property() in case anything fails after of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk in all memory, which looks like the most expensive operation, only if everything else succeeds. Also, the error path got remove_ddw() replaced by a new helper __remove_dma_window(), which only removes the new DDW with an rtas-call. For this, a new helper clean_dma_window() was needed to clean anything that could be left if walk_system_ram_range() fails. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-7-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Allow DDW windows starting at 0x00	Leonardo Bras
	enable_ddw() currently returns the address of the DMA window, which is considered invalid if it has the value 0x00. Also, it only considers valid an address returned from find_existing_ddw if it's not 0x00. Changing this behavior makes sense, given the users of enable_ddw() only need to know if direct mapping is possible. It can also allow a DMA window starting at 0x00 to be used. This will be helpful for using a DDW with indirect mapping, as the window address will be different than 0x00, but it will not map the whole partition. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-6-leobras.c@gmail.com
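The problem with using the returned address as a validity flag can be shown with a small userspace sketch. Everything here is hypothetical (the `struct window` type and both functions are stand-ins, not kernel code); it only contrasts a "0 means failure" return with a separate success indicator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA window; not the kernel's structure. */
struct window {
	bool present;
	uint64_t start; /* bus address where the window begins */
};

/* Old style: returns the start address and treats 0 as "no window",
 * so a valid window starting at 0x00 looks the same as a failure. */
uint64_t window_start_or_zero(const struct window *w)
{
	assert(w != NULL);
	return w->present ? w->start : 0;
}

/* New style: success is reported separately from the address. */
bool window_lookup(const struct window *w, uint64_t *start)
{
	assert(w != NULL);
	if (!w->present)
		return false;
	if (start != NULL)
		*start = w->start;
	return true;
}
```

With the second form, a window that starts at 0x00 is reported as found, while the first form cannot tell it apart from a missing window.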
2021-08-27	powerpc/pseries/iommu: Add ddw_list_new_entry() helper	Leonardo Bras
	There are two functions creating direct_window_list entries in a similar way, so create a ddw_list_new_entry() to avoid duplication and simplify those functions. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-5-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper	Leonardo Bras
	Create a helper to allow allocating a new iommu_table without the need to reallocate the iommu_group. This will be helpful for replacing the iommu_table for the new DMA window, after we remove the old one with iommu_tce_table_put(). Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-4-leobras.c@gmail.com
2021-08-27	powerpc/kernel/iommu: Add new iommu_table_in_use() helper	Leonardo Bras
	Having a function to check if the iommu table has any allocation helps decide whether a tbl can be reset for using a new DMA window. It should be enough to replace all instances of !bitmap_empty(tbl...). iommu_table_in_use() skips reserved memory, so we don't need to worry about releasing it before testing. This causes iommu_table_release_pages() to become unnecessary, given it is only used to remove reserved memory for testing. Also, only allow storing reserved memory values in tbl if they are valid in the table, so there is no need to check it in the new helper. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-3-leobras.c@gmail.com
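The idea of an "in use" check that ignores reserved entries can be sketched as follows. This is a simplified model, not the kernel helper: `struct toy_table` is hypothetical and uses one bool per entry instead of a bitmap, and the reserved range is assumed to be a single half-open interval that is valid for the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified table: one bool per entry instead of a bitmap. */
struct toy_table {
	const bool *used;      /* used[i] is true if entry i is allocated */
	size_t size;           /* number of entries */
	size_t reserved_start; /* first reserved entry */
	size_t reserved_end;   /* one past the last reserved entry */
};

/* True if any entry outside [reserved_start, reserved_end) is allocated. */
bool toy_table_in_use(const struct toy_table *t)
{
	/* The reserved range is only accepted if it is valid for the table. */
	assert(t->reserved_start <= t->reserved_end && t->reserved_end <= t->size);

	for (size_t i = 0; i < t->size; i++) {
		if (i >= t->reserved_start && i < t->reserved_end)
			continue; /* reserved entries do not count as allocations */
		if (t->used[i])
			return true;
	}
	return false;
}
```

A table whose only set entries lie in the reserved range reports "not in use", so the reserved entries never need to be released just to run the test.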
2021-08-27	powerpc/pseries/iommu: Replace hard-coded page shift	Leonardo Bras
	Some functions assume IOMMU page size can only be 4K (pageshift == 12). Update them to accept any page size passed, so we can use 64K pages. In the process, some defines like TCE_SHIFT were made obsolete, and then removed. IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show a RPN of 52-bit, and consider a 12-bit pageshift, so there should be no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn. Its usage is removed from tce_build_pSeries(), tce_build_pSeriesLP(), and tce_buildmulti_pSeriesLP(). Most places had a tbl struct, so using tbl->it_page_shift was simple. tce_free_pSeriesLP() was a special case, since callers do not always have a tbl struct, so adding a tceshift parameter seems the right thing to do. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-2-leobras.c@gmail.com
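A short userspace sketch of the two points made in this commit: the page number depends on the page shift that is passed in, and a 40-bit mask drops the upper bits of a wider RPN. The function names are hypothetical and the mask is written out locally rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Real page number for a given IOMMU page shift (12 for 4K, 16 for 64K). */
uint64_t rpn_from_addr(uint64_t addr, unsigned int page_shift)
{
	assert(page_shift < 64);
	return addr >> page_shift;
}

/* What a 40-bit mask does to an RPN: any bit above bit 39 is lost. */
uint64_t rpn_masked_40(uint64_t rpn)
{
	return rpn & ((UINT64_C(1) << 40) - 1);
}
```

For address 0x12345000 the page number is 0x12345 with a shift of 12 and 0x1234 with a shift of 16, which is why a hard-coded shift gives wrong results for 64K pages.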
2021-08-27	powerpc/numa: Update cpu_cpu_map on CPU online/offline	Srikar Dronamraju
	cpu_cpu_map holds all the CPUs in the DIE. However in PowerPC, when onlining/offlining of CPUs, this mask doesn't get updated. This mask is however updated when CPUs are added/removed. So when both operations like online/offline of CPUs and adding/removing of CPUs are done simultaneously, then cpumaps end up broken. WARNING: CPU: 13 PID: 1142 at kernel/sched/topology.c:898 build_sched_domains+0xd48/0x1720 Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 13 PID: 1142 Comm: kworker/13:2 Not tainted 5.13.0-rc6+ #28 Workqueue: events cpuset_hotplug_workfn NIP: c0000000001caac8 LR: c0000000001caac4 CTR: 00000000007088ec REGS: c00000005596f220 TRAP: 0700 Not tainted (5.13.0-rc6+) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48828222 XER: 00000009 CFAR: c0000000001ea698 IRQMASK: 0 GPR00: c0000000001caac4 c00000005596f4c0 c000000001c4a400 0000000000000036 GPR04: 00000000fffdffff c00000005596f1d0 0000000000000027 c0000018cfd07f90 GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000018fe68ffe8 GPR12: 0000000000008000 c00000001e9d1880 c00000013a047200 0000000000000800 GPR16: c000000001d3c7d0 0000000000000240 0000000000000048 c000000010aacd18 GPR20: 0000000000000001 c000000010aacc18 c00000013a047c00 c000000139ec2400 GPR24: 0000000000000280 c000000139ec2520 c000000136c1b400 c000000001c93060 GPR28: c00000013a047c20 c000000001d3c6c0 c000000001c978a0 000000000000000d NIP [c0000000001caac8] build_sched_domains+0xd48/0x1720 LR [c0000000001caac4] 
build_sched_domains+0xd44/0x1720 Call Trace: [c00000005596f4c0] [c0000000001caac4] build_sched_domains+0xd44/0x1720 (unreliable) [c00000005596f670] [c0000000001cc5ec] partition_sched_domains_locked+0x3ac/0x4b0 [c00000005596f710] [c0000000002804e4] rebuild_sched_domains_locked+0x404/0x9e0 [c00000005596f810] [c000000000283e60] rebuild_sched_domains+0x40/0x70 [c00000005596f840] [c000000000284124] cpuset_hotplug_workfn+0x294/0xf10 [c00000005596fc60] [c000000000175040] process_one_work+0x290/0x590 [c00000005596fd00] [c0000000001753c8] worker_thread+0x88/0x620 [c00000005596fda0] [c000000000181704] kthread+0x194/0x1a0 [c00000005596fe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70 Instruction dump: 485af049 60000000 2fa30800 409e0028 80fe0000 e89a00f8 e86100e8 38da0120 7f88e378 7ce53b78 4801fb91 60000000 <0fe00000> 39000000 38e00000 38c00000 Fix this by updating cpu_cpu_map aka cpumask_of_node() on every CPU online/offline. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-5-srikar@linux.vnet.ibm.com
2021-08-27	powerpc/numa: Print debug statements only when required	Srikar Dronamraju
	Currently, a debug message gets printed every time an attempt is made to add (remove) a CPU. However this is redundant if the CPU is already added to (removed from) the node. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-4-srikar@linux.vnet.ibm.com
2021-08-27	powerpc/numa: convert printk to pr_xxx	Srikar Dronamraju
	Convert the remaining printk to pr_xxx. One advantage is that all prints will now have the prefix "numa:" from pr_fmt(). [ convert printk(KERN_ERR) to pr_warn : Suggested by Laurent Dufour ] Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> [mpe: Rebase onto powerpc/next, s/WARNING/Warning/] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-3-srikar@linux.vnet.ibm.com
2021-08-27	powerpc/numa: Drop dbg in favour of pr_debug	Srikar Dronamraju
	powerpc supported numa=debug which is not documented. This option was used to print early debug output. However something more flexible can be achieved by using CONFIG_DYNAMIC_DEBUG. Hence drop dbg (and numa=debug) in favour of pr_debug Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> [mpe: Rebase on to powerpc/next form2 affinity changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-2-srikar@linux.vnet.ibm.com
2021-08-27	powerpc/smp: Enable CACHE domain for shared processor	Srikar Dronamraju
	Currently CACHE domain is not enabled on shared processor mode PowerVM LPARS. On PowerVM systems, 'ibm,thread-group' device-tree property 2 under cpu-device-node indicates which all CPUs share L2-cache. However 'ibm,thread-group' device-tree property 2 is a relatively new property. In absence of 'ibm,thread-group' property 2, 'l2-cache' device property under cpu-device-node could help system to identify CPUs sharing L2-cache. However this property is not exposed by PhyP in shared processor mode configurations. In absence of properties that inform OS about which CPUs share L2-cache, fallback on core boundary. Here are some stats from Power9 shared LPAR with the changes. $ lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Thread(s) per core: 8 Core(s) per socket: 1 Socket(s): 3 NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K NUMA node0 CPU(s): 16-23 NUMA node1 CPU(s): 0-15,24-31 Physical sockets: 2 Physical chips: 1 Physical cores/chip: 10 Before patch $ grep -r . 
/sys/kernel/debug/sched/domains/cpu0/domain*/name Before /sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT /sys/kernel/debug/sched/domains/cpu0/domain1/name:DIE /sys/kernel/debug/sched/domains/cpu0/domain2/name:NUMA After /sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT /sys/kernel/debug/sched/domains/cpu0/domain1/name:CACHE /sys/kernel/debug/sched/domains/cpu0/domain2/name:DIE /sys/kernel/debug/sched/domains/cpu0/domain3/name:NUMA $ awk '/domain/{print $1, $2}' /proc/schedstat \| sort -u \| sed -e 's/00000000,//g' Before domain0 00000055 domain0 000000aa domain0 00005500 domain0 0000aa00 domain0 00550000 domain0 00aa0000 domain0 55000000 domain0 aa000000 domain1 00ff0000 domain1 ff00ffff domain2 ffffffff After domain0 00000055 domain0 000000aa domain0 00005500 domain0 0000aa00 domain0 00550000 domain0 00aa0000 domain0 55000000 domain0 aa000000 domain1 000000ff domain1 0000ff00 domain1 00ff0000 domain1 ff000000 domain2 ff00ffff domain2 ffffffff domain3 ffffffff (Lower is better) perf stat -a -r 5 -n perf bench sched pipe \| tail -n 2 Before 153.798 +- 0.142 seconds time elapsed ( +- 0.09% ) After 111.545 +- 0.652 seconds time elapsed ( +- 0.58% ) which is an improvement of 27.47% Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100401.412519-4-srikar@linux.vnet.ibm.com
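The 27.47% figure quoted for the sched pipe benchmark follows from the two elapsed times. A minimal check, with a hypothetical helper name:

```c
#include <assert.h>

/* Relative reduction in elapsed time, in percent (lower time is better). */
double improvement_percent(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

With before = 153.798 s and after = 111.545 s, the difference is 42.253 s, and 42.253 / 153.798 is roughly 0.2747, which matches the reported improvement.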
2021-08-27	powerpc/smp: Update cpu_core_map on all PowerPc systems	Srikar Dronamraju
	lscpu() uses core_siblings to list the number of sockets in the system. core_siblings is set using topology_core_cpumask. While optimizing the powerpc bootup path, Commit 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask"). it was found that updating cpu_core_mask() ended up taking a lot of time. It was thought that on Powerpc, cpu_core_mask() would always be same as cpu_cpu_mask() i.e number of sockets will always be equal to number of nodes. As an optimization, cpu_core_mask() was made a snapshot of cpu_cpu_mask(). However that was found to be false with PowerPc KVM guests, where each node could have more than one socket. So with Commit c47f892d7aa6 ("powerpc/smp: Reintroduce cpu_core_mask"), cpu_core_mask was updated based on chip_id but in an optimized way using some mask manipulations and chip_id caching. However on non-PowerNV and non-pseries KVM guests (i.e not implementing cpu_to_chip_id(), continued to use a copy of cpu_cpu_mask(). There are two issues that were noticed on such systems 1. lscpu would report one extra socket. On a IBM,9009-42A (aka zz system) which has only 2 chips/ sockets/ nodes, lscpu would report Architecture: ppc64le Byte Order: Little Endian CPU(s): 160 On-line CPU(s) list: 0-159 Thread(s) per core: 8 Core(s) per socket: 6 Socket(s): 3 <-------------- NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 10240K NUMA node0 CPU(s): 0-79 NUMA node1 CPU(s): 80-159 2. Currently cpu_cpu_mask is updated when a core is added/removed. However its not updated when smt mode switching or on CPUs are explicitly offlined. However all other percpu masks are updated to ensure only active/online CPUs are in the masks. This results in build_sched_domain traces since there will be CPUs in cpu_cpu_mask() but those CPUs are not present in SMT / CACHE / MC / NUMA domains. 
A loop of threads running smt mode switching and core add/remove will soon show this trace. Hence cpu_cpu_mask has to be updated at smt mode switch. This will have an impact on cpu_core_mask(). cpu_core_mask() is a snapshot of cpu_cpu_mask. Different CPUs within the same socket will end up having different cpu_core_masks since they are snapshots at different points of time. This means lscpu will start reporting many more sockets than the actual number of sockets / nodes / chips. Different ways to handle this problem: A. Update the snapshot aka cpu_core_mask for all CPUs whenever cpu_cpu_mask is updated. This would be a non-optimal solution. B. Instead of a cpumask_var_t, make cpu_core_map a cpumask pointer pointing to cpu_cpu_mask. However percpu cpumask pointer is frowned upon and we need a clean way to handle PowerPc KVM guest which is not a snapshot. C. Update cpu_core_masks on all PowerPc systems like in PowerPc KVM guests using mask manipulations. This approach is relatively simple and unifies with the existing code. D. On top of C, we could also resurrect get_physical_package_id which could return a nid for the said CPU. However this is not needed at this time. Option C is the preferred approach for now. While this is somewhat a revert of Commit 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask"), 1. a plain revert has some conflicts, and 2. for chip_id == -1, the cpu_core_mask is made identical to cpu_cpu_mask, unlike previously where cpu_core_mask was set to a core if chip_id doesn't exist. This goes by the principle that if chip_id is not exposed, then sockets / chip / node share the same set of CPUs. 
With the fix, lscpu o/p would be Architecture: ppc64le Byte Order: Little Endian CPU(s): 160 On-line CPU(s) list: 0-159 Thread(s) per core: 8 Core(s) per socket: 6 Socket(s): 2 <-------------- NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 10240K NUMA node0 CPU(s): 0-79 NUMA node1 CPU(s): 80-159 Fixes: 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask") Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100401.412519-3-srikar@linux.vnet.ibm.com
2021-08-27	powerpc/smp: Fix a crash while booting kvm guest with nr_cpus=2	Srikar Dronamraju
	Aneesh reported a crash with a fairly recent upstream kernel when booting kernel whose commandline was appended with nr_cpus=2 1:mon> e cpu 0x1: Vector: 300 (Data Access) at [c000000008a67bd0] pc: c00000000002557c: cpu_to_chip_id+0x3c/0x100 lr: c000000000058380: start_secondary+0x460/0xb00 sp: c000000008a67e70 msr: 8000000000001033 dar: 10 dsisr: 80000 current = 0xc00000000891bb00 paca = 0xc0000018ff981f80 irqmask: 0x03 irq_happened: 0x01 pid = 0, comm = swapper/1 Linux version 5.13.0-rc3-15704-ga050a6d2b7e8 (kvaneesh@ltc-boston8) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #433 SMP Tue May 25 02:38:49 CDT 2021 1:mon> t [link register ] c000000000058380 start_secondary+0x460/0xb00 [c000000008a67e70] c000000008a67eb0 (unreliable) [c000000008a67eb0] c0000000000589d4 start_secondary+0xab4/0xb00 [c000000008a67f90] c00000000000c654 start_secondary_prolog+0x10/0x14 Current code assumes that num_possible_cpus() is always greater than threads_per_core. However this may not be true when using nr_cpus=2 or similar options. Handle the case where num_possible_cpus() is not an exact multiple of threads_per_core. Fixes: c1e53367dab1 ("powerpc/smp: Cache CPU to chip lookup") Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Debugged-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100401.412519-2-srikar@linux.vnet.ibm.com
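The failure mode described above can be illustrated with a small user-space sketch. It assumes, purely for illustration, that a per-core lookup table is sized by dividing the number of possible CPUs by the threads per core; the function names are hypothetical and this is not the kernel code itself. With nr_cpus=2 and 8 threads per core, a truncating division yields zero entries, while rounding up still reserves one entry for the partial core.

```c
#include <stddef.h>

/* Hypothetical helper: number of table entries when the division truncates.
 * For possible_cpus < threads_per_core this returns 0. */
size_t table_entries_truncating(size_t possible_cpus, size_t threads_per_core)
{
	return possible_cpus / threads_per_core;
}

/* Hypothetical helper: round up so that a partially populated core
 * still gets its own table entry. */
size_t table_entries_round_up(size_t possible_cpus, size_t threads_per_core)
{
	return (possible_cpus + threads_per_core - 1) / threads_per_core;
}
```

Calling both helpers with 2 possible CPUs and 8 threads per core shows the difference: the truncating form gives 0 and the rounded-up form gives 1.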
2021-08-27	powerpc/configs/microwatt: Enable options for systemd	Joel Stanley
	When booting with systemd these options are required. This increases the image by about 50KB, or 2%. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826122653.3236867-4-joel@jms.id.au
2021-08-27	powerpc/configs/microwattt: Enable Liteeth	Joel Stanley
	Liteeth is the network device used by Microwatt. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826122653.3236867-3-joel@jms.id.au
2021-08-27	powerpc/microwatt: Add Ethernet to device tree	Joel Stanley
	The liteeth network device is used in the Microwatt soc. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826122653.3236867-2-joel@jms.id.au
2021-08-27	powerpc: Redefine HMT_xxx macros as empty on PPC32	Christophe Leroy
	HMT_xxx macros adjust thread priority (hardware multi-threading) and were inherited from PPC64 via commit 5f7c690728ac ("[PATCH] powerpc: Merged ppc_asm.h"). Those instructions are pointless on PPC32, but some common functions like arch_cpu_idle() use them. So make them empty on PPC32 to avoid those instructions. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c5a07fadea33d640ad10cecf0ac8faaec1c524e0.1629898474.git.christophe.leroy@csgroup.eu
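A minimal sketch of the idea, assuming the compiler-provided `__powerpc64__` macro as the 64-bit test (the kernel uses its own configuration symbols, and the macro bodies and the `busy_wait_iterations()` helper here are illustrative only): the priority hints expand to `or Rx,Rx,Rx` no-ops on 64-bit PowerPC and to nothing everywhere else, so shared code can call them unconditionally.

```c
/* Illustrative only: on 64-bit PowerPC the hints are special no-op forms
 * of "or"; on any other target they expand to nothing. */
#if defined(__powerpc64__)
#define HMT_low()	__asm__ volatile("or 1,1,1")
#define HMT_medium()	__asm__ volatile("or 2,2,2")
#else
#define HMT_low()
#define HMT_medium()
#endif

/* Hypothetical caller: lower the thread priority while spinning,
 * then restore it. Returns the number of iterations performed. */
int busy_wait_iterations(int n)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		HMT_low();
		count++;
	}
	HMT_medium();
	return count;
}
```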
2021-08-27	powerpc/doc: Fix htmldocs errors	Aneesh Kumar K.V
	Fix make htmldocs related errors with the newly added associativity.rst doc file. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build test Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210825042447.106219-1-aneesh.kumar@linux.ibm.com
2021-08-27	Merge changes from Paul Gortmaker	Michael Ellerman
	Merge the changes to retire the legacy WR sbc8548 and sbc8641 platforms from Paul. These were sent as a pull request, but I rebased them onto rc2 so as not to pull too many unrelated changes in to my next. Description from Paul's pull request follows: In v2.6.27 (2008, 917f0af9e5a9) the sbc8260 support was implicitly retired by not being carried forward through the ppc --> powerpc device tree transition. Then, in v3.6 (2012, b048b4e17cbb) we retired the support for the sbc8560 boards. Next, in v4.18 (2017, 3bc6cf5a86e5) we retired the support for the 2006 vintage sbc834x boards. The sbc8548 and sbc8641d boards were maybe 1-2 years newer than the sbc834x boards, but it is also 3+ years later, so it makes sense to now retire them as well - which is what is done here. These two remaining WR boards were based on the Freescale MPC8548-CDS and the MPC8641D-HPCN reference board implementations. Having had the chance to use these and many other Fsl ref boards, I know this: The Freescale reference boards were typically produced in limited quantity and primarily available to BSP developers and hardware designers, and not likely to have found a 2nd life with hobbyists and/or collectors. It was good to have that BSP code subjected to mainline review and hence also widely available back in the day. But given the above, we should probably also be giving serious consideration to retiring additional similar age/type reference board platforms as well. I've always felt it is important for us to be proactive in retiring old code, since it has a genuine non-zero carrying cost, as described in the 930d52c012b8 merge log. But for the here and now, we just clean up the remaining BSP code that I had added for SBC platforms. Link: https://lore.kernel.org/r/20210824174209.GB160508@windriver.com
2021-08-27	MAINTAINERS: update for Paul Gortmaker	Paul Gortmaker
	Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-08-27	powerpc: retire sbc8641d board support	Paul Gortmaker
	The support for this was added to mainline over 12 years ago, in v2.6.26 [4e8aae89a35d] just around the ppc --> powerpc migration. I believe the board was introduced shortly after the sbc8548 board, making it roughly a 14 year old platform - with the CPU speed and memory size typical for that era. I haven't had one of these boards for several years, and availability was discontinued several years before that. Given that, there is no point in adding a burden to testing coverage that builds all possible defconfigs, so it makes sense to remove it. Of course it will remain in the git history forever, for anyone who happens to find a functional board and wants to tinker with it. Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-08-27	powerpc: retire sbc8548 board support	Paul Gortmaker
	The support for this was mainlined 13 years ago, in v2.6.25 [0e0fffe88767] just around the ppc --> powerpc migration. I believe the board was introduced a year or two before that, so it is roughly a 15 year old platform - with the CPU speed and memory size that was typical for that era. I haven't had one of these boards for several years, and availability was discontinued several years before that. Given that, there is no point in adding a burden to testing coverage that builds all possible defconfigs, so it makes sense to remove it. Of course it will remain in the git history forever, for anyone who happens to find a functional board and wants to tinker with it. Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-08-26	Merge branch 'net-hns3-add-some-fixes-for-net'	Jakub Kicinski
	Guangbin Huang says: ==================== net: hns3: add some fixes for -net This series adds some fixes for the HNS3 ethernet driver. ==================== Link: https://lore.kernel.org/r/1629976921-43438-1-git-send-email-huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: fix get wrong pfc_en when query PFC configuration	Guangbin Huang
	Currently, when the PFC configuration is queried with dcbtool, the driver returns the PFC enable status based on TC. As all priorities are mapped to TC0 by default, if TC0 is enabled, then all priorities mapped to TC0 are shown as enabled when querying the PFC setting, even though some priorities have never been set. For example: $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off $ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on To fix this problem, just return the user's PFC config parameter saved in the driver. Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
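The difference between the two reporting strategies can be modelled in a few lines. This is a simplified user-space sketch, not the hns3 driver code: the function names, the 8-priority bitmap layout, and the `prio_tc` mapping array are assumptions made for illustration. Deriving the per-priority bitmap from the TC enable bits marks every priority mapped to an enabled TC, whereas returning the saved user bitmap reports only what was actually configured.

```c
#include <stdint.h>

#define MAX_PRIO 8

/* Hypothetical model of the old behaviour: a priority is reported as
 * enabled whenever the TC it maps to has PFC enabled. */
uint8_t pfc_en_from_tc(const uint8_t prio_tc[MAX_PRIO], uint8_t tc_pfc_en)
{
	uint8_t pfc_en = 0;

	for (int i = 0; i < MAX_PRIO; i++)
		if (tc_pfc_en & (1u << prio_tc[i]))
			pfc_en |= (uint8_t)(1u << i);
	return pfc_en;
}

/* Hypothetical model of the fixed behaviour: report exactly the
 * per-priority bitmap the user configured. */
uint8_t pfc_en_from_user(uint8_t user_pfc_en)
{
	return user_pfc_en;
}
```

With every priority mapped to TC0 and the user enabling priorities 0 to 3 (bitmap 0x0f), the TC-based query returns 0xff, matching the all-on output shown in the commit message, while the user-based query returns 0x0f.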
2021-08-26	net: hns3: fix GRO configuration error after reset	Yufeng Mo
	The GRO configuration is enabled by default after reset. This is incorrect and should be restored to the user-configured value. So this restoration is added during reset initialization. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: change the method of getting cmd index in debugfs	Yufeng Mo
	Currently, the cmd index is obtained in debugfs by comparing file names. However, this method may cause errors when processing more complex file names. So, change this method by saving cmd in private data and comparing it when getting cmd index in debugfs for optimization. Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: fix duplicate node in VLAN list	Guojia Liao
	A duplicate VLAN node should not be added to the VLAN list; otherwise it causes "add failed" when restoring VLANs from the VLAN list. So this patch adds a VLAN ID check before adding a node to the VLAN list. Fixes: c6075b193462 ("net: hns3: Record VF vlan tables") Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
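The check-before-add pattern can be sketched as follows. This is a stand-alone illustration using a plain singly linked list; the structure and function names are hypothetical and do not correspond to the driver's own list handling.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node holding one VLAN ID. */
struct vlan_node {
	uint16_t vlan_id;
	struct vlan_node *next;
};

/* Return true if vlan_id is already present in the list. */
bool vlan_list_contains(const struct vlan_node *head, uint16_t vlan_id)
{
	for (; head; head = head->next)
		if (head->vlan_id == vlan_id)
			return true;
	return false;
}

/* Add vlan_id unless it is already present. Returns 0 on success or
 * when the ID already exists, -1 if allocation fails. */
int vlan_list_add(struct vlan_node **head, uint16_t vlan_id)
{
	struct vlan_node *node;

	if (vlan_list_contains(*head, vlan_id))
		return 0;

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->vlan_id = vlan_id;
	node->next = *head;
	*head = node;
	return 0;
}

/* Count the nodes in the list. */
size_t vlan_list_len(const struct vlan_node *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Adding the same VLAN ID twice leaves a single node in the list, so a later restore pass that replays the list never attempts to add an ID that is already programmed.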
2021-08-26	net: hns3: fix speed unknown issue in bond 4	Yonglong Liu
	In bond 4, when the link goes down and up repeatedly, the bond may get an unknown speed, and then this port cannot work. The driver calls netif_carrier_on() before updating the link state. When the bond receives carrier on, it queries the speed of the port; if the query happens before the link state is updated, it gets an unknown speed. So call netif_carrier_on() after updating the link state. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>