summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-05-03srcu: Drop needless initialization of sdp in srcu_gp_start()Lukas Bulwahn
Commit 9c7ef4c30f12 ("srcu: Make Tree SRCU able to operate without snp_node array") initializes the local variable sdp differently depending on the srcu's state in srcu_gp_start(). Either way, this initialization overwrites the value used when sdp is defined. This commit therefore drops this pointless definition-time initialization. Although there is no functional change, compiler code generation may be affected. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03srcu: Prevent expedited GPs and blocking readers from consuming CPUPaul E. McKenney
If an SRCU reader blocks while a synchronize_srcu_expedited() waits for that same reader, then that grace period will spawn an endless series of workqueue handlers, consuming a full CPU. This quickly gets pointless because consuming more CPU isn't going to make that reader get done faster, especially if it is blocked waiting for an external event. This commit therefore spawns at most one pair of back-to-back workqueue handlers per expedited grace period phase, instead inserting increasing delays as that grace period phase grows older, but capped at 10 jiffies. In any case, if there have been at least 100 back-to-back workqueue handlers within a single jiffy, regardless of grace period or grace-period phase, then a one-jiffy delay is inserted. [ paulmck: Apply feedback from kernel test robot. ] Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reported-by: Song Liu <song@kernel.org> Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03srcu: Add contention check to call_srcu() srcu_data ->lock acquisitionPaul E. McKenney
This commit increases the sensitivity of contention detection by adding checks to the acquisition of the srcu_data structure's lock on the call_srcu() code path. Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03srcu: Automatically determine size-transition strategy at bootPaul E. McKenney
This commit adds a srcutree.convert_to_big option of zero that causes SRCU to decide at boot whether to wait for contention (small systems) or immediately expand to large (large systems). A new srcutree.big_cpu_lim (defaulting to 128) defines how many CPUs constitute a large system. Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03tools/memory-model/README: Update klitmus7 compat tableAkira Yokosawa
EXPORT_SYMBOL of do_exec() was removed in v5.17. Unfortunately, kernel modules from klitmus7 7.56 have do_exec() at the end of each kthread. herdtools7 7.56.1 has addressed the issue. Update the compatibility table accordingly. Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Cc: Luc Maranget <luc.maranget@inria.fr> Cc: Jade Alglave <j.alglave@ucl.ac.uk> Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03Merge tag 'hwmon-for-v5.18-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Work around a hardware problem in the delta-ahe50dc-fan driver - Explicitly disable PEC in PMBus core if not enabled - Fix negative temperature values in f71882fg driver - Fix warning on removal of adt7470 driver - Fix CROSSHAIR VI HERO name in asus_wmi_sensors driver - Fix build warning seen in xdpe12284 driver if CONFIG_SENSORS_XDPE122_REGULATOR is disabled - Fix type of 'ti,n-factor' in ti,tmp421 driver bindings * tag 'hwmon-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (pmbus) delta-ahe50dc-fan: work around hardware quirk hwmon: (pmbus) disable PEC if not enabled hwmon: (f71882fg) Fix negative temperature dt-bindings: hwmon: ti,tmp421: Fix type for 'ti,n-factor' hwmon: (adt7470) Fix warning on module removal hwmon: (asus_wmi_sensors) Fix CROSSHAIR VI HERO name hwmon: (xdpe12284) Fix build warning seen if CONFIG_SENSORS_XDPE122_REGULATOR is disabled
2022-05-03rnbd-srv: use bdev_discard_alignmentChristoph Hellwig
Use bdev_discard_alignment to calculate the correct discard alignment offset even for partitions instead of just looking at the queue limit. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03nvme: remove a spurious clear of discard_alignmentChristoph Hellwig
The nvme driver never sets a discard_alignment, so it also doens't need to clear it to zero. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03loop: remove a spurious clear of discard_alignmentChristoph Hellwig
The loop driver never sets a discard_alignment, so it also doens't need to clear it to zero. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03dasd: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to PAGE_SIZE while the discard granularity is the block size that is smaller or the same as PAGE_SIZE as done by dasd is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03raid5: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by raid5 is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03dm-zoned: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by dm-zoned is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03virtio_blk: fix the discard_granularity and discard_alignment queue limitsChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. On the other hand the discard_sector_alignment from the virtio 1.1 looks similar to what Linux uses as discard granularity (even if not very well described): "discard_sector_alignment can be used by OS when splitting a request based on alignment. " And at least qemu does set it to the discard granularity. So stop setting the discard_alignment and use the virtio discard_sector_alignment to set the discard granularity. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03null_blk: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by null_blk is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03nbd: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by nbd is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03ubd: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by ubd is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03fbdev: Make fb_release() return -ENODEV if fbdev was unregisteredJavier Martinez Canillas
A reference to the framebuffer device struct fb_info is stored in the file private data, but this reference could no longer be valid and must not be accessed directly. Instead, the file_fb_info() accessor function must be used since it does sanity checking to make sure that the fb_info is valid. This can happen for example if the registered framebuffer device is for a driver that just uses a framebuffer provided by the system firmware. In that case, the fbdev core would unregister the framebuffer device when a real video driver is probed and ask to remove conflicting framebuffers. The bug has been present for a long time but commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") unmasked it since the fbdev core started unregistering the framebuffers' devices associated. Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Reported-by: Maxime Ripard <maxime@cerno.tech> Reported-by: Junxiao Chang <junxiao.chang@intel.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220502135014.377945-1-javierm@redhat.com
2022-05-03Merge tag 'aspeed-v5.18-fixes' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/joel/bmc into arm/fixes ASPEED device tree fixes for v5.18 - Quad SPI device tree corrections - Reinstate GFX node that was removed - romed8hm3 machine fixes * tag 'aspeed-v5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/bmc: ARM: dts: aspeed: Add video engine to g6 ARM: dts: aspeed: romed8hm3: Fix GPIOB0 name ARM: dts: aspeed: romed8hm3: Add lm25066 sense resistor values ARM: dts: aspeed-g6: fix SPI1/SPI2 quad pin group ARM: dts: aspeed-g6: add FWQSPI group in pinctrl dtsi dt-bindings: pinctrl: aspeed-g6: add FWQSPI function/group pinctrl: pinctrl-aspeed-g6: add FWQSPI function-group dt-bindings: pinctrl: aspeed-g6: remove FWQSPID group pinctrl: pinctrl-aspeed-g6: remove FWQSPID group in pinctrl ARM: dts: aspeed-g6: remove FWQSPID group in pinctrl dtsi Link: https://lore.kernel.org/r/CACPK8XdhLfafOfqvR0r7p6V6AhtNXD4uZGaz7Y+Y4P-rc9p0tQ@mail.gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-03hwmon: (tmp401) Add OF device ID tableCamel Guo
This driver doesn't have of_match_table. This makes the kernel module tmp401.ko lack alias patterns (e.g: of:N*T*Cti,tmp411) to match DT node of the supported devices hence this kernel module will not be automatically loaded. After adding of_match_table to this driver, the folllowing alias will be added into tmp401.ko. $ modinfo drivers/hwmon/tmp401.ko filename: drivers/hwmon/tmp401.ko ...... author: Hans de Goede <hdegoede@redhat.com> alias: of:N*T*Cti,tmp435C* alias: of:N*T*Cti,tmp435 alias: of:N*T*Cti,tmp432C* alias: of:N*T*Cti,tmp432 alias: of:N*T*Cti,tmp431C* alias: of:N*T*Cti,tmp431 alias: of:N*T*Cti,tmp411C* alias: of:N*T*Cti,tmp411 alias: of:N*T*Cti,tmp401C* alias: of:N*T*Cti,tmp401 ...... Fixes: af503716ac14 ("i2c: core: report OF style module alias for devices registered via OF") Signed-off-by: Camel Guo <camel.guo@axis.com> Link: https://lore.kernel.org/r/20220503114333.456476-1-camel.guo@axis.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-05-03Merge tag 'maintainers-signed-take2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes A maintainer update for omaps A patch from Rajendra to remove his contact information for omap PM framework. * tag 'maintainers-signed-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: MAINTAINERS: omap: remove me as a maintainer Link: https://lore.kernel.org/r/pull-1651061256-836848@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-03efi/arm64: libstub: run image in place if randomized by the loaderArd Biesheuvel
If the loader has already placed the EFI kernel image randomly in physical memory, and indicates having done so by installing the 'fixed placement' protocol onto the image handle, don't bother randomizing the placement again in the EFI stub. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-03efi: libstub: pass image handle to handle_kernel_image()Ard Biesheuvel
In a future patch, arm64's implementation of handle_kernel_image() will omit randomizing the placement of the kernel if the load address was chosen randomly by the loader. In order to do this, it needs to locate a protocol on the image handle, so pass it to handle_kernel_image(). Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-03efi: x86: Set the NX-compatibility flag in the PE headerPeter Jones
Following Baskov Evgeniy's "Handle UEFI NX-restricted page tables" patches, it's safe to set this compatibility flag to let loaders know they don't need to make special accommodations for kernel to load if pre-boot NX is enabled. Signed-off-by: Peter Jones <pjones@redhat.com> Link: https://lore.kernel.org/all/20220329184743.798513-1-pjones@redhat.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-03efi: libstub: ensure allocated memory to be executableBaskov Evgeniy
There are UEFI versions that restrict execution of memory regions, preventing the kernel from booting. Parts that needs to be executable are: * Area used for trampoline placement. * All memory regions that the kernel may be relocated before and during extraction. Use DXE services to ensure aforementioned address ranges to be executable. Only modify attributes that does not have appropriate attributes. Signed-off-by: Baskov Evgeniy <baskov@ispras.ru> Link: https://lore.kernel.org/r/20220303142120.1975-3-baskov@ispras.ru Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-03efi: libstub: declare DXE services tableBaskov Evgeniy
UEFI DXE services are not yet used in kernel code but are required to manipulate page table memory protection flags. Add required declarations to use DXE services functions. Signed-off-by: Baskov Evgeniy <baskov@ispras.ru> Link: https://lore.kernel.org/r/20220303142120.1975-2-baskov@ispras.ru [ardb: ignore absent DXE table but warn if the signature check fails] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-03KVM: s390: vsie/gmap: reduce gmap_rmap overheadChristian Borntraeger
there are cases that trigger a 2nd shadow event for the same vmaddr/raddr combination. (prefix changes, reboots, some known races) This will increase memory usages and it will result in long latencies when cleaning up, e.g. on shutdown. To avoid cases with a list that has hundreds of identical raddrs we check existing entries at insert time. As this measurably reduces the list length this will be faster than traversing the list at shutdown time. In the long run several places will be optimized to create less entries and a shrinker might be necessary. Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20220429151526.1560-1-borntraeger@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-03regulator: pca9450: Enable DVS control via PMIC_STBY_REQRickard x Andersson
When DVS is enabled via the devicetree properties "nxp,dvs-run-voltage" and "nxp,dvs-standby-voltage" then also the bit that enables DVS control via PMIC_STBY_REQ pin should be set. Signed-off-by: Rickard x Andersson <rickaran@axis.com> Link: https://lore.kernel.org/r/20220429072211.24957-5-rickaran@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-03regulator: pca9450: Make warm reset on WDOG_B assertionRickard x Andersson
The default configuration of the PMIC behavior makes the PMIC power cycle most regulators on WDOG_B assertion. This power cycling causes the memory contents of OCRAM to be lost. Some systems neeeds some memory that survives reset and reboot, therefore this patch is created. Signed-off-by: Rickard x Andersson <rickaran@axis.com> Link: https://lore.kernel.org/r/20220429072211.24957-4-rickaran@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-03regulator: Add property for WDOG_B warm resetRickard x Andersson
Make it possible to do warm reset on WDOG_B assertion. Signed-off-by: Rickard x Andersson <rickaran@axis.com> Link: https://lore.kernel.org/r/20220429072211.24957-3-rickaran@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-03regulator: pca9450: Make I2C Level Translator configurablePer-Daniel Olsson
Make the I2C Level Translator included in PCA9450 configurable from devicetree. The reset state is off. By setting nxp,i2c-lt-enable, the I2C Level Translator will be enabled while in STANDBY or RUN state. Signed-off-by: Per-Daniel Olsson <perdo@axis.com> Signed-off-by: Rickard x Andersson <rickaran@axis.com> Link: https://lore.kernel.org/r/20220429072211.24957-2-rickaran@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-03regulator: Add property for I2C level shifterPer-Daniel Olsson
By setting nxp,i2c-lt-enable the I2C level translator is enabled. Signed-off-by: Per-Daniel Olsson <perdo@axis.com> Signed-off-by: Rickard x Andersson <rickaran@axis.com> Link: https://lore.kernel.org/r/20220429072211.24957-1-rickaran@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-03Merge branch 'kvm-amd-pmu-fixes' into HEADPaolo Bonzini
2022-05-03kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMUSandipan Das
On some x86 processors, CPUID leaf 0xA provides information on Architectural Performance Monitoring features. It advertises a PMU version which Qemu uses to determine the availability of additional MSRs to manage the PMCs. Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for the same, the kernel constructs return values based on the x86_pmu_capability irrespective of the vendor. This leaf and the additional MSRs are not supported on AMD and Hygon processors. If AMD PerfMonV2 is detected, the PMU version is set to 2 and guest startup breaks because of an attempt to access a non-existent MSR. Return zeros to avoid this. Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf") Reported-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_idKyle Huey
Zen renumbered some of the performance counters that correspond to the well known events in perf_hw_id. This code in KVM was never updated for that, so guest that attempt to use counters on Zen that correspond to the pre-Zen perf_hw_id values will silently receive the wrong values. This has been observed in the wild with rr[0] when running in Zen 3 guests. rr uses the retired conditional branch counter 00d1 which is incorrectly recognized by KVM as PERF_COUNT_HW_STALLED_CYCLES_BACKEND. [0] https://rr-project.org/ Signed-off-by: Kyle Huey <me@kylehuey.com> Message-Id: <20220503050136.86298-1-khuey@kylehuey.com> Cc: stable@vger.kernel.org [Check guest family, not host. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03Merge branch 'kvm-tdp-mmu-atomicity-fix' into HEADPaolo Bonzini
We are dropping A/D bits (and W bits) in the TDP MMU. Even if mmu_lock is held for write, as volatile SPTEs can be written by other tasks/vCPUs outside of mmu_lock. Attempting to prove that bug exposed another notable goof, which has been lurking for a decade, give or take: KVM treats _all_ MMU-writable SPTEs as volatile, even though KVM never clears WRITABLE outside of MMU lock. As a result, the legacy MMU (and the TDP MMU if not fixed) uses XCHG to update writable SPTEs. The fix does not seem to have an easily-measurable affect on performance; page faults are so slow that wasting even a few hundred cycles is dwarfed by the base cost.
2022-05-03net: rds: acquire refcount on TCP socketsTetsuo Handa
syzbot is reporting use-after-free read in tcp_retransmit_timer() [1], for TCP socket used by RDS is accessing sock_net() without acquiring a refcount on net namespace. Since TCP's retransmission can happen after a process which created net namespace terminated, we need to explicitly acquire a refcount. Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1] Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com> Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com> Link: https://lore.kernel.org/r/a5fb1fc4-2284-3359-f6a0-e4e390239d7b@I-love.SAKURA.ne.jp Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bitsSean Christopherson
Use an atomic XCHG to write TDP MMU SPTEs that have volatile bits, even if mmu_lock is held for write, as volatile SPTEs can be written by other tasks/vCPUs outside of mmu_lock. If a vCPU uses the to-be-modified SPTE to write a page, the CPU can cache the translation as WRITABLE in the TLB despite it being seen by KVM as !WRITABLE, and/or KVM can clobber the Accessed/Dirty bits and not properly tag the backing page. Exempt non-leaf SPTEs from atomic updates as KVM itself doesn't modify non-leaf SPTEs without holding mmu_lock, they do not have Dirty bits, and KVM doesn't consume the Accessed bit of non-leaf SPTEs. Dropping the Dirty and/or Writable bits is most problematic for dirty logging, as doing so can result in a missed TLB flush and eventually a missed dirty page. In the unlikely event that the only dirty page(s) is a clobbered SPTE, clear_dirty_gfn_range() will see the SPTE as not dirty (based on the Dirty or Writable bit depending on the method) and so not update the SPTE and ultimately not flush. If the SPTE is cached in the TLB as writable before it is clobbered, the guest can continue writing the associated page without ever taking a write-protect fault. For most (all?) file back memory, dropping the Dirty bit is a non-issue. The primary MMU write-protects its PTEs on writeback, i.e. KVM's dirty bit is effectively ignored because the primary MMU will mark that page dirty when the write-protection is lifted, e.g. when KVM faults the page back in for write. The Accessed bit is a complete non-issue. Aside from being unused for non-leaf SPTEs, KVM doesn't do a TLB flush when aging SPTEs, i.e. the Accessed bit may be dropped anyways. Lastly, the Writable bit is also problematic as an extension of the Dirty bit, as KVM (correctly) treats the Dirty bit as volatile iff the SPTE is !DIRTY && WRITABLE. If KVM fixes an MMU-writable, but !WRITABLE, SPTE out of mmu_lock, then it can allow the CPU to set the Dirty bit despite the SPTE being !WRITABLE when it is checked by KVM. But that all depends on the Dirty bit being problematic in the first place. Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03KVM: x86/mmu: Move shadow-present check out of spte_has_volatile_bits()Sean Christopherson
Move the is_shadow_present_pte() check out of spte_has_volatile_bits() and into its callers. Well, caller, since only one of its two callers doesn't already do the shadow-present check. Opportunistically move the helper to spte.c/h so that it can be used by the TDP MMU, which is also the primary motivation for the shadow-present change. Unlike the legacy MMU, the TDP MMU uses a single path for clear leaf and non-leaf SPTEs, and to avoid unnecessary atomic updates, the TDP MMU will need to check is_last_spte() prior to calling spte_has_volatile_bits(), and calling is_last_spte() without first calling is_shadow_present_spte() is at best odd, and at worst a violation of KVM's loosely defines SPTE rules. Note, mmu_spte_clear_track_bits() could likely skip the write entirely for SPTEs that are not shadow-present. Leave that cleanup for a future patch to avoid introducing a functional change, and because the shadow-present check can likely be moved further up the stack, e.g. drop_large_spte() appears to be the only path that doesn't already explicitly check for a shadow-present SPTE. No functional change intended. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03KVM: x86/mmu: Don't treat fully writable SPTEs as volatile (modulo A/D)Sean Christopherson
Don't treat SPTEs that are truly writable, i.e. writable in hardware, as being volatile (unless they're volatile for other reasons, e.g. A/D bits). KVM _sets_ the WRITABLE bit out of mmu_lock, but never _clears_ the bit out of mmu_lock, so if the WRITABLE bit is set, it cannot magically get cleared just because the SPTE is MMU-writable. Rename the wrapper of MMU-writable to be more literal, the previous name of spte_can_locklessly_be_made_writable() is wrong and misleading. Fixes: c7ba5b48cc8d ("KVM: MMU: fast path of handling guest page fault") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03selftests/net: so_txtime: usage(): fix documentation of default clockMarc Kleine-Budde
The program uses CLOCK_TAI as default clock since it was added to the Linux repo. In commit: | 040806343bb4 ("selftests/net: so_txtime multi-host support") a help text stating the wrong default clock was added. This patch fixes the help text. Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support") Cc: Carlos Llamas <cmllamas@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20220502094638.1921702-3-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systemsMarc Kleine-Budde
This patch fixes the parsing of the cmd line supplied start time on 32 bit systems. A "long" on 32 bit systems is only 32 bit wide and cannot hold a timestamp in nano second resolution. Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support") Cc: Carlos Llamas <cmllamas@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20220502094638.1921702-2-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGSLai Jiangshan
XENPV doesn't use swapgs_restore_regs_and_return_to_usermode(), error_entry() and the code between entry_SYSENTER_compat() and entry_SYSENTER_compat_after_hwframe. Change the PV-compatible SWAPGS to the ASM instruction swapgs in these places. Also remove the definition of SWAPGS since no more users. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220503032107.680190-7-jiangshanlai@gmail.com
2022-05-03x86/entry: Don't call error_entry() for XENPVLai Jiangshan
XENPV guests enter already on the task stack and they can't fault for native_iret() nor native_load_gs_index() since they use their own pvop for IRET and load_gs_index(). A CR3 switch is not needed either. So there is no reason to call error_entry() in XENPV. [ bp: Massage commit message. ] Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220503032107.680190-6-jiangshanlai@gmail.com
2022-05-03x86/entry: Move CLD to the start of the idtentry macroLai Jiangshan
Move it after CLAC. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220503032107.680190-5-jiangshanlai@gmail.com
2022-05-03x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry()Lai Jiangshan
The macro idtentry() (through idtentry_body()) calls error_entry() unconditionally even on XENPV. But XENPV needs to only push and clear regs. PUSH_AND_CLEAR_REGS in error_entry() makes the stack not return to its original place when the function returns, which means it is not possible to convert it to a C function. Carve out PUSH_AND_CLEAR_REGS out of error_entry() and into a separate function and call it before error_entry() in order to avoid calling error_entry() on XENPV. It will also allow for error_entry() to be converted to C code that can use inlined sync_regs() and save a function call. [ bp: Massage commit message. ] Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220503032107.680190-4-jiangshanlai@gmail.com
2022-05-03x86/entry: Switch the stack after error_entry() returnsLai Jiangshan
error_entry() calls fixup_bad_iret() before sync_regs() if it is a fault from a bad IRET, to copy pt_regs to the kernel stack. It switches to the kernel stack directly after sync_regs(). But error_entry() itself is also a function call, so it has to stash the address it is going to return to, in %r12 which is unnecessarily complicated. Move the stack switching after error_entry() and get rid of the need to handle the return address. [ bp: Massage commit message. ] Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220503032107.680190-3-jiangshanlai@gmail.com
2022-05-03selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is ↵Ido Schimmel
operational In emulated environments, the bridge ports enslaved to br1 get a carrier before changing br1's PVID. This means that by the time the PVID is changed, br1 is already operational and configured with an IPv6 link-local address. When the test is run with netdevs registered by mlxsw, changing the PVID is vetoed, as changing the VID associated with an existing L3 interface is forbidden. This restriction is similar to the 8021q driver's restriction of changing the VID of an existing interface. Fix this by taking br1 down and bringing it back up when it is fully configured. With this fix, the test reliably passes on top of both the SW and HW data paths (emulated or not). Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20220502084507.364774-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03x86/traps: Use pt_regs directly in fixup_bad_iret()Lai Jiangshan
Always stash the address error_entry() is going to return to, in %r12 and get rid of the void *error_entry_ret; slot in struct bad_iret_stack which was supposed to account for it and pt_regs pushed on the stack. After this, both fixup_bad_iret() and sync_regs() can work on a struct pt_regs pointer directly. [ bp: Rewrite commit message, touch ups. ] Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220503032107.680190-2-jiangshanlai@gmail.com
2022-05-03Merge branch 'emaclite-improve-error-handling-and-minor-cleanup'Paolo Abeni
Radhey Shyam Pandey says: ==================== emaclite: improve error handling and minor cleanup This patchset does error handling for of_address_to_resource() and also removes "Don't advertise 1000BASE-T" and auto negotiation. Changes for v3: - Resolve git apply conflicts for 2/2 patch. Changes for v2: - Added Andrew's reviewed by tag in 1/2 patch. - Move ret to down to align with reverse xmas tree style in 2/2 patch. - Also add fixes tag in 2/2 patch. - Specify tree name in subject prefix. ==================== Link: https://lore.kernel.org/r/1651476470-23904-1-git-send-email-radhey.shyam.pandey@xilinx.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03net: emaclite: Add error handling for of_address_to_resource()Shravya Kumbham
check the return value of of_address_to_resource() and also add missing of_node_put() for np and npp nodes. Fixes: e0a3bc65448c ("net: emaclite: Support multiple phys connected to one MDIO bus") Addresses-Coverity: Event check_return value. Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>