Age | Commit message (Collapse) | Author |
|
Commit 9c7ef4c30f12 ("srcu: Make Tree SRCU able to operate without
snp_node array") initializes the local variable sdp differently depending
on the srcu's state in srcu_gp_start(). Either way, this initialization
overwrites the value used when sdp is defined.
This commit therefore drops this pointless definition-time initialization.
Although there is no functional change, compiler code generation may
be affected.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
If an SRCU reader blocks while a synchronize_srcu_expedited() waits for
that same reader, then that grace period will spawn an endless series of
workqueue handlers, consuming a full CPU. This quickly gets pointless
because consuming more CPU isn't going to make that reader get done
faster, especially if it is blocked waiting for an external event.
This commit therefore spawns at most one pair of back-to-back workqueue
handlers per expedited grace period phase, instead inserting increasing
delays as that grace period phase grows older, but capped at 10 jiffies.
In any case, if there have been at least 100 back-to-back workqueue
handlers within a single jiffy, regardless of grace period or grace-period
phase, then a one-jiffy delay is inserted.
[ paulmck: Apply feedback from kernel test robot. ]
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Song Liu <song@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit increases the sensitivity of contention detection by adding
checks to the acquisition of the srcu_data structure's lock on the
call_srcu() code path.
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit adds a srcutree.convert_to_big option of zero that causes
SRCU to decide at boot whether to wait for contention (small systems) or
immediately expand to large (large systems). A new srcutree.big_cpu_lim
(defaulting to 128) defines how many CPUs constitute a large system.
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
EXPORT_SYMBOL of do_exec() was removed in v5.17. Unfortunately,
kernel modules from klitmus7 7.56 have do_exec() at the end of
each kthread.
herdtools7 7.56.1 has addressed the issue.
Update the compatibility table accordingly.
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Work around a hardware problem in the delta-ahe50dc-fan driver
- Explicitly disable PEC in PMBus core if not enabled
- Fix negative temperature values in f71882fg driver
- Fix warning on removal of adt7470 driver
- Fix CROSSHAIR VI HERO name in asus_wmi_sensors driver
- Fix build warning seen in xdpe12284 driver if
CONFIG_SENSORS_XDPE122_REGULATOR is disabled
- Fix type of 'ti,n-factor' in ti,tmp421 driver bindings
* tag 'hwmon-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus) delta-ahe50dc-fan: work around hardware quirk
hwmon: (pmbus) disable PEC if not enabled
hwmon: (f71882fg) Fix negative temperature
dt-bindings: hwmon: ti,tmp421: Fix type for 'ti,n-factor'
hwmon: (adt7470) Fix warning on module removal
hwmon: (asus_wmi_sensors) Fix CROSSHAIR VI HERO name
hwmon: (xdpe12284) Fix build warning seen if CONFIG_SENSORS_XDPE122_REGULATOR is disabled
|
|
Use bdev_discard_alignment to calculate the correct discard alignment
offset even for partitions instead of just looking at the queue limit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The nvme driver never sets a discard_alignment, so it also doens't need
to clear it to zero.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The loop driver never sets a discard_alignment, so it also doens't need
to clear it to zero.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to PAGE_SIZE while the discard granularity is the block size
that is smaller or the same as PAGE_SIZE as done by dasd is mostly
harmless but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by raid5 is mostly
harmless but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by dm-zoned is mostly
harmless but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
On the other hand the discard_sector_alignment from the virtio 1.1 looks
similar to what Linux uses as discard granularity (even if not very well
described):
"discard_sector_alignment can be used by OS when splitting a request
based on alignment. "
And at least qemu does set it to the discard granularity.
So stop setting the discard_alignment and use the virtio
discard_sector_alignment to set the discard granularity.
Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by null_blk is mostly
harmless but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by nbd is mostly harmless
but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by ubd is mostly harmless
but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A reference to the framebuffer device struct fb_info is stored in the file
private data, but this reference could no longer be valid and must not be
accessed directly. Instead, the file_fb_info() accessor function must be
used since it does sanity checking to make sure that the fb_info is valid.
This can happen for example if the registered framebuffer device is for a
driver that just uses a framebuffer provided by the system firmware. In
that case, the fbdev core would unregister the framebuffer device when a
real video driver is probed and ask to remove conflicting framebuffers.
The bug has been present for a long time but commit 27599aacbaef ("fbdev:
Hot-unplug firmware fb devices on forced removal") unmasked it since the
fbdev core started unregistering the framebuffers' devices associated.
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
Reported-by: Maxime Ripard <maxime@cerno.tech>
Reported-by: Junxiao Chang <junxiao.chang@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220502135014.377945-1-javierm@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joel/bmc into arm/fixes
ASPEED device tree fixes for v5.18
- Quad SPI device tree corrections
- Reinstate GFX node that was removed
- romed8hm3 machine fixes
* tag 'aspeed-v5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/bmc:
ARM: dts: aspeed: Add video engine to g6
ARM: dts: aspeed: romed8hm3: Fix GPIOB0 name
ARM: dts: aspeed: romed8hm3: Add lm25066 sense resistor values
ARM: dts: aspeed-g6: fix SPI1/SPI2 quad pin group
ARM: dts: aspeed-g6: add FWQSPI group in pinctrl dtsi
dt-bindings: pinctrl: aspeed-g6: add FWQSPI function/group
pinctrl: pinctrl-aspeed-g6: add FWQSPI function-group
dt-bindings: pinctrl: aspeed-g6: remove FWQSPID group
pinctrl: pinctrl-aspeed-g6: remove FWQSPID group in pinctrl
ARM: dts: aspeed-g6: remove FWQSPID group in pinctrl dtsi
Link: https://lore.kernel.org/r/CACPK8XdhLfafOfqvR0r7p6V6AhtNXD4uZGaz7Y+Y4P-rc9p0tQ@mail.gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
This driver doesn't have of_match_table. This makes the kernel module
tmp401.ko lack alias patterns (e.g: of:N*T*Cti,tmp411) to match DT node
of the supported devices hence this kernel module will not be
automatically loaded.
After adding of_match_table to this driver, the folllowing alias will be
added into tmp401.ko.
$ modinfo drivers/hwmon/tmp401.ko
filename: drivers/hwmon/tmp401.ko
......
author: Hans de Goede <hdegoede@redhat.com>
alias: of:N*T*Cti,tmp435C*
alias: of:N*T*Cti,tmp435
alias: of:N*T*Cti,tmp432C*
alias: of:N*T*Cti,tmp432
alias: of:N*T*Cti,tmp431C*
alias: of:N*T*Cti,tmp431
alias: of:N*T*Cti,tmp411C*
alias: of:N*T*Cti,tmp411
alias: of:N*T*Cti,tmp401C*
alias: of:N*T*Cti,tmp401
......
Fixes: af503716ac14 ("i2c: core: report OF style module alias for devices registered via OF")
Signed-off-by: Camel Guo <camel.guo@axis.com>
Link: https://lore.kernel.org/r/20220503114333.456476-1-camel.guo@axis.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
A maintainer update for omaps
A patch from Rajendra to remove his contact information for omap
PM framework.
* tag 'maintainers-signed-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
MAINTAINERS: omap: remove me as a maintainer
Link: https://lore.kernel.org/r/pull-1651061256-836848@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
If the loader has already placed the EFI kernel image randomly in
physical memory, and indicates having done so by installing the 'fixed
placement' protocol onto the image handle, don't bother randomizing the
placement again in the EFI stub.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In a future patch, arm64's implementation of handle_kernel_image() will
omit randomizing the placement of the kernel if the load address was
chosen randomly by the loader. In order to do this, it needs to locate a
protocol on the image handle, so pass it to handle_kernel_image().
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Following Baskov Evgeniy's "Handle UEFI NX-restricted page tables"
patches, it's safe to set this compatibility flag to let loaders know
they don't need to make special accommodations for kernel to load if
pre-boot NX is enabled.
Signed-off-by: Peter Jones <pjones@redhat.com>
Link: https://lore.kernel.org/all/20220329184743.798513-1-pjones@redhat.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
There are UEFI versions that restrict execution of memory regions,
preventing the kernel from booting. Parts that needs to be executable
are:
* Area used for trampoline placement.
* All memory regions that the kernel may be relocated before
and during extraction.
Use DXE services to ensure aforementioned address ranges
to be executable. Only modify attributes that does not
have appropriate attributes.
Signed-off-by: Baskov Evgeniy <baskov@ispras.ru>
Link: https://lore.kernel.org/r/20220303142120.1975-3-baskov@ispras.ru
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
UEFI DXE services are not yet used in kernel code
but are required to manipulate page table memory
protection flags.
Add required declarations to use DXE services functions.
Signed-off-by: Baskov Evgeniy <baskov@ispras.ru>
Link: https://lore.kernel.org/r/20220303142120.1975-2-baskov@ispras.ru
[ardb: ignore absent DXE table but warn if the signature check fails]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
there are cases that trigger a 2nd shadow event for the same
vmaddr/raddr combination. (prefix changes, reboots, some known races)
This will increase memory usages and it will result in long latencies
when cleaning up, e.g. on shutdown. To avoid cases with a list that has
hundreds of identical raddrs we check existing entries at insert time.
As this measurably reduces the list length this will be faster than
traversing the list at shutdown time.
In the long run several places will be optimized to create less entries
and a shrinker might be necessary.
Fixes: 4be130a08420 ("s390/mm: add shadow gmap support")
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20220429151526.1560-1-borntraeger@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
When DVS is enabled via the devicetree properties
"nxp,dvs-run-voltage" and "nxp,dvs-standby-voltage" then
also the bit that enables DVS control via PMIC_STBY_REQ pin
should be set.
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Link: https://lore.kernel.org/r/20220429072211.24957-5-rickaran@axis.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The default configuration of the PMIC behavior makes the PMIC
power cycle most regulators on WDOG_B assertion. This power
cycling causes the memory contents of OCRAM to be lost.
Some systems neeeds some memory that survives reset and
reboot, therefore this patch is created.
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Link: https://lore.kernel.org/r/20220429072211.24957-4-rickaran@axis.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Make it possible to do warm reset on WDOG_B assertion.
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Link: https://lore.kernel.org/r/20220429072211.24957-3-rickaran@axis.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Make the I2C Level Translator included in PCA9450 configurable from
devicetree. The reset state is off. By setting nxp,i2c-lt-enable, the
I2C Level Translator will be enabled while in STANDBY or RUN state.
Signed-off-by: Per-Daniel Olsson <perdo@axis.com>
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Link: https://lore.kernel.org/r/20220429072211.24957-2-rickaran@axis.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
By setting nxp,i2c-lt-enable the I2C level translator is
enabled.
Signed-off-by: Per-Daniel Olsson <perdo@axis.com>
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Link: https://lore.kernel.org/r/20220429072211.24957-1-rickaran@axis.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
|
|
On some x86 processors, CPUID leaf 0xA provides information
on Architectural Performance Monitoring features. It
advertises a PMU version which Qemu uses to determine the
availability of additional MSRs to manage the PMCs.
Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
the same, the kernel constructs return values based on the
x86_pmu_capability irrespective of the vendor.
This leaf and the additional MSRs are not supported on AMD
and Hygon processors. If AMD PerfMonV2 is detected, the PMU
version is set to 2 and guest startup breaks because of an
attempt to access a non-existent MSR. Return zeros to avoid
this.
Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
Reported-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Zen renumbered some of the performance counters that correspond to the
well known events in perf_hw_id. This code in KVM was never updated for
that, so guest that attempt to use counters on Zen that correspond to the
pre-Zen perf_hw_id values will silently receive the wrong values.
This has been observed in the wild with rr[0] when running in Zen 3
guests. rr uses the retired conditional branch counter 00d1 which is
incorrectly recognized by KVM as PERF_COUNT_HW_STALLED_CYCLES_BACKEND.
[0] https://rr-project.org/
Signed-off-by: Kyle Huey <me@kylehuey.com>
Message-Id: <20220503050136.86298-1-khuey@kylehuey.com>
Cc: stable@vger.kernel.org
[Check guest family, not host. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We are dropping A/D bits (and W bits) in the TDP MMU. Even if mmu_lock
is held for write, as volatile SPTEs can be written by other tasks/vCPUs
outside of mmu_lock.
Attempting to prove that bug exposed another notable goof, which has been
lurking for a decade, give or take: KVM treats _all_ MMU-writable SPTEs
as volatile, even though KVM never clears WRITABLE outside of MMU lock.
As a result, the legacy MMU (and the TDP MMU if not fixed) uses XCHG to
update writable SPTEs.
The fix does not seem to have an easily-measurable affect on performance;
page faults are so slow that wasting even a few hundred cycles is dwarfed
by the base cost.
|
|
syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
for TCP socket used by RDS is accessing sock_net() without acquiring a
refcount on net namespace. Since TCP's retransmission can happen after
a process which created net namespace terminated, we need to explicitly
acquire a refcount.
Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Link: https://lore.kernel.org/r/a5fb1fc4-2284-3359-f6a0-e4e390239d7b@I-love.SAKURA.ne.jp
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use an atomic XCHG to write TDP MMU SPTEs that have volatile bits, even
if mmu_lock is held for write, as volatile SPTEs can be written by other
tasks/vCPUs outside of mmu_lock. If a vCPU uses the to-be-modified SPTE
to write a page, the CPU can cache the translation as WRITABLE in the TLB
despite it being seen by KVM as !WRITABLE, and/or KVM can clobber the
Accessed/Dirty bits and not properly tag the backing page.
Exempt non-leaf SPTEs from atomic updates as KVM itself doesn't modify
non-leaf SPTEs without holding mmu_lock, they do not have Dirty bits, and
KVM doesn't consume the Accessed bit of non-leaf SPTEs.
Dropping the Dirty and/or Writable bits is most problematic for dirty
logging, as doing so can result in a missed TLB flush and eventually a
missed dirty page. In the unlikely event that the only dirty page(s) is
a clobbered SPTE, clear_dirty_gfn_range() will see the SPTE as not dirty
(based on the Dirty or Writable bit depending on the method) and so not
update the SPTE and ultimately not flush. If the SPTE is cached in the
TLB as writable before it is clobbered, the guest can continue writing
the associated page without ever taking a write-protect fault.
For most (all?) file back memory, dropping the Dirty bit is a non-issue.
The primary MMU write-protects its PTEs on writeback, i.e. KVM's dirty
bit is effectively ignored because the primary MMU will mark that page
dirty when the write-protection is lifted, e.g. when KVM faults the page
back in for write.
The Accessed bit is a complete non-issue. Aside from being unused for
non-leaf SPTEs, KVM doesn't do a TLB flush when aging SPTEs, i.e. the
Accessed bit may be dropped anyways.
Lastly, the Writable bit is also problematic as an extension of the Dirty
bit, as KVM (correctly) treats the Dirty bit as volatile iff the SPTE is
!DIRTY && WRITABLE. If KVM fixes an MMU-writable, but !WRITABLE, SPTE
out of mmu_lock, then it can allow the CPU to set the Dirty bit despite
the SPTE being !WRITABLE when it is checked by KVM. But that all depends
on the Dirty bit being problematic in the first place.
Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the is_shadow_present_pte() check out of spte_has_volatile_bits()
and into its callers. Well, caller, since only one of its two callers
doesn't already do the shadow-present check.
Opportunistically move the helper to spte.c/h so that it can be used by
the TDP MMU, which is also the primary motivation for the shadow-present
change. Unlike the legacy MMU, the TDP MMU uses a single path for clear
leaf and non-leaf SPTEs, and to avoid unnecessary atomic updates, the TDP
MMU will need to check is_last_spte() prior to calling
spte_has_volatile_bits(), and calling is_last_spte() without first
calling is_shadow_present_spte() is at best odd, and at worst a violation
of KVM's loosely defines SPTE rules.
Note, mmu_spte_clear_track_bits() could likely skip the write entirely
for SPTEs that are not shadow-present. Leave that cleanup for a future
patch to avoid introducing a functional change, and because the
shadow-present check can likely be moved further up the stack, e.g.
drop_large_spte() appears to be the only path that doesn't already
explicitly check for a shadow-present SPTE.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Don't treat SPTEs that are truly writable, i.e. writable in hardware, as
being volatile (unless they're volatile for other reasons, e.g. A/D bits).
KVM _sets_ the WRITABLE bit out of mmu_lock, but never _clears_ the bit
out of mmu_lock, so if the WRITABLE bit is set, it cannot magically get
cleared just because the SPTE is MMU-writable.
Rename the wrapper of MMU-writable to be more literal, the previous name
of spte_can_locklessly_be_made_writable() is wrong and misleading.
Fixes: c7ba5b48cc8d ("KVM: MMU: fast path of handling guest page fault")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The program uses CLOCK_TAI as default clock since it was added to the
Linux repo. In commit:
| 040806343bb4 ("selftests/net: so_txtime multi-host support")
a help text stating the wrong default clock was added.
This patch fixes the help text.
Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220502094638.1921702-3-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch fixes the parsing of the cmd line supplied start time on 32
bit systems. A "long" on 32 bit systems is only 32 bit wide and cannot
hold a timestamp in nano second resolution.
Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220502094638.1921702-2-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
XENPV doesn't use swapgs_restore_regs_and_return_to_usermode(),
error_entry() and the code between entry_SYSENTER_compat() and
entry_SYSENTER_compat_after_hwframe.
Change the PV-compatible SWAPGS to the ASM instruction swapgs in these
places.
Also remove the definition of SWAPGS since no more users.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220503032107.680190-7-jiangshanlai@gmail.com
|
|
XENPV guests enter already on the task stack and they can't fault for
native_iret() nor native_load_gs_index() since they use their own pvop
for IRET and load_gs_index(). A CR3 switch is not needed either.
So there is no reason to call error_entry() in XENPV.
[ bp: Massage commit message. ]
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220503032107.680190-6-jiangshanlai@gmail.com
|
|
Move it after CLAC.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220503032107.680190-5-jiangshanlai@gmail.com
|
|
The macro idtentry() (through idtentry_body()) calls error_entry()
unconditionally even on XENPV. But XENPV needs to only push and clear
regs.
PUSH_AND_CLEAR_REGS in error_entry() makes the stack not return to its
original place when the function returns, which means it is not possible
to convert it to a C function.
Carve out PUSH_AND_CLEAR_REGS out of error_entry() and into a separate
function and call it before error_entry() in order to avoid calling
error_entry() on XENPV.
It will also allow for error_entry() to be converted to C code that can
use inlined sync_regs() and save a function call.
[ bp: Massage commit message. ]
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220503032107.680190-4-jiangshanlai@gmail.com
|
|
error_entry() calls fixup_bad_iret() before sync_regs() if it is a fault
from a bad IRET, to copy pt_regs to the kernel stack. It switches to the
kernel stack directly after sync_regs().
But error_entry() itself is also a function call, so it has to stash
the address it is going to return to, in %r12 which is unnecessarily
complicated.
Move the stack switching after error_entry() and get rid of the need to
handle the return address.
[ bp: Massage commit message. ]
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220503032107.680190-3-jiangshanlai@gmail.com
|
|
operational
In emulated environments, the bridge ports enslaved to br1 get a carrier
before changing br1's PVID. This means that by the time the PVID is
changed, br1 is already operational and configured with an IPv6
link-local address.
When the test is run with netdevs registered by mlxsw, changing the PVID
is vetoed, as changing the VID associated with an existing L3 interface
is forbidden. This restriction is similar to the 8021q driver's
restriction of changing the VID of an existing interface.
Fix this by taking br1 down and bringing it back up when it is fully
configured.
With this fix, the test reliably passes on top of both the SW and HW
data paths (emulated or not).
Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20220502084507.364774-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Always stash the address error_entry() is going to return to, in %r12
and get rid of the void *error_entry_ret; slot in struct bad_iret_stack
which was supposed to account for it and pt_regs pushed on the stack.
After this, both fixup_bad_iret() and sync_regs() can work on a struct
pt_regs pointer directly.
[ bp: Rewrite commit message, touch ups. ]
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220503032107.680190-2-jiangshanlai@gmail.com
|
|
Radhey Shyam Pandey says:
====================
emaclite: improve error handling and minor cleanup
This patchset does error handling for of_address_to_resource() and also
removes "Don't advertise 1000BASE-T" and auto negotiation.
Changes for v3:
- Resolve git apply conflicts for 2/2 patch.
Changes for v2:
- Added Andrew's reviewed by tag in 1/2 patch.
- Move ret to down to align with reverse xmas tree style in 2/2 patch.
- Also add fixes tag in 2/2 patch.
- Specify tree name in subject prefix.
====================
Link: https://lore.kernel.org/r/1651476470-23904-1-git-send-email-radhey.shyam.pandey@xilinx.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
check the return value of of_address_to_resource() and also add
missing of_node_put() for np and npp nodes.
Fixes: e0a3bc65448c ("net: emaclite: Support multiple phys connected to one MDIO bus")
Addresses-Coverity: Event check_return value.
Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|