Age | Commit message (Collapse) | Author |
|
When a Root Port or Root Complex Event Collector receives an error Message
e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status
register and logs the Requester ID in the Error Source Identification
register. If it receives a second ERR_COR Message before software clears
PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the
Requester ID is lost.
In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared:
- hardware receives ERR_COR message
- hardware sets PCI_ERR_ROOT_COR_RCV
- aer_irq() entered
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_COR_RCV
- hardware receives second ERR_COR message
- hardware sets PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status)
- PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set
- aer_irq() entered again
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set
- PCI_ERR_ROOT_MULTI_COR_RCV is still set
The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and
PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV.
Fix the problem by queueing an AER event and clearing the Root Error Status
bits when any of these bits are set:
PCI_ERR_ROOT_COR_RCV
PCI_ERR_ROOT_UNCOR_RCV
PCI_ERR_ROOT_MULTI_COR_RCV
PCI_ERR_ROOT_MULTI_UNCOR_RCV
See the bugzilla link for details from Eric about how to reproduce this
problem.
[bhelgaas: commit log, move repro details to bugzilla]
Fixes: e167bfcaa4cd ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992
Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com
Reported-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
|
|
After contributing it, I'll volunteer to maintain it.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220516082504.33913-6-miquel.raynal@bootlin.com
|
|
The RZN1 RTC can compensate the imprecision of the oscillator up to
approximately 190ppm.
Seconds can last slightly shorter or longer depending on the
configuration.
Below ~65ppm of correction, we can change the time spent in a second
every minute, which is the most accurate compensation that the RTC can
offer. Above, the compensation will be active every 20s.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220516082504.33913-5-miquel.raynal@bootlin.com
|
|
The RZN1 RTC can trigger an interrupt when reaching a particular date up
to 7 days ahead. Bring support for this alarm.
One drawback though, the granularity is about a minute.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220516082504.33913-4-miquel.raynal@bootlin.com
|
|
Add a basic RTC driver for the RZ/N1.
Signed-off-by: Michel Pollet <michel.pollet@bp.renesas.com>
Co-developed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220516082504.33913-3-miquel.raynal@bootlin.com
|
|
Add new binding file for this RTC.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220516082504.33913-2-miquel.raynal@bootlin.com
|
|
The sun6i RTC provides 32 bytes of general-purpose data registers.
They can be used to save data in the always-on RTC power domain.
The registers are writable via 32-bit MMIO accesses only.
Expose them with a NVMEM provider so they can be used by other drivers.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220413231731.56709-1-samuel@sholland.org
|
|
Fix the following coccicheck warning:
drivers/i3c/master/svc-i3c-master.c:1600:5-8:
Unneeded variable: "ret". Return "0" on line 1605.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220504164901.9622-1-guozhengkui@vivo.com
|
|
Simplify the return expression.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220505021954.54524-1-chi.minghao@zte.com.cn
|
|
drm_dp_mst_get_edid call kmemdup to create mst_edid. So mst_edid need to be
freed after use.
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20220516032042.13166-1-hbh25y@gmail.com
|
|
This change fixes the following:
1) The flags variable is not initialized. Always use raw_spin_lock_irqsave
and raw_spin_unlock_irqrestore to serialize patching.
2) flush_kernel_vmap_range is primarily intended for DMA flushes.
The whole cache flush in flush_kernel_vmap_range is only possible
when interrupts are enabled on SMP machines. Since __patch_text_multiple
calls flush_kernel_vmap_range with interrupts disabled, it is better
to directly call flush_kernel_dcache_range_asm and
flush_kernel_icache_range_asm.
3) The final call to flush_icache_range is unnecessary.
Tested with `[PATCH, V3] parisc: Rewrite cache flush code for
PA8800/PA8900' change on rp3440, c8000 and c3750 (32 and 64-bit).
Note by Helge:
This patch had been temporarily reverted shortly before v5.18-rc6 in order
to fix boot issues. Now it can be re-applied.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Originally, I was convinced that we needed to use tmpalias flushes
everwhere, for both user and kernel flushes. However, when I modified
flush_kernel_dcache_page_addr, to use a tmpalias flush, my c8000
would crash quite early when booting.
The PDC returns alias values of 0 for the icache and dcache. This
indicates that either the alias boundary is greater than 16MB or
equivalent aliasing doesn't work. I modified the tmpalias code to
make it easy to try alternate boundaries. I tried boundaries up to
128MB but still kernel tmpalias flushes didn't work on c8000.
This led me to conclude that tmpalias flushes don't work on PA8800
and PA8900 machines, and that we needed to flush directly using the
virtual address of user and kernel pages. This is likely the major
cause of instability on the c8000 and rp34xx machines.
Flushing user pages requires doing a temporary context switch as we
have to flush pages that don't belong to the current context. Further,
we have to deal with pages that aren't present. If a page isn't
present, the flush instructions fault on every line.
Other code has been rearranged and simplified based on testing. For
example, I introduced a flush_cache_dup_mm routine. flush_cache_mm
and flush_cache_dup_mm differ in that flush_cache_mm calls
purge_cache_pages and flush_cache_dup_mm calls flush_cache_pages.
In some implementations, pdc is more efficient than fdc. Based on
my testing, I don't believe there's any performance benefit on the
c8000.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Change the "BUG" to "WARNING" and disable the message because it triggers
occasionally in spite of the check in flush_cache_page_if_present.
The pte value extracted for the "from" page in copy_user_highpage is racy and
occasionally the pte is cleared before the flush is complete. I assume that
the page is simultaneously flushed by flush_cache_mm before the pte is cleared
as nullifying the fdc doesn't seem to cause problems.
I investigated various locking scenarios but I wasn't able to find a way to
sequence the flushes. This code is called for every COW break and locks impact
performance.
This patch is related to the bigger cache flush patch because we need the pte
on PA8800/PA8900 to flush using the vma context.
I have also seen this from copy_to_user_page and copy_from_user_page.
The messages appear infrequently when enabled.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux into clk-imx
Pull i.MX clk driver updates from Abel Vesa:
- Add 27 MHz phy PLL ref clock
- Add mcore_booted module parameter to tell kernel M core has already booted
- Remove snvs clock
- Add bindings for i.MX8MN GPT
- Add check for kcalloc
- Fix for a potential memory leak in __imx_clk_gpr_sync
- Add DISP2 pixel clock for i.MX8MP
- Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
- Add clkout1/2 for i.MX8MP
- Fix parent clock of ubs_root_clk for i.MX8MP
* tag 'clk-imx-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux:
clk: imx8mp: fix usb_root_clk parent
clk: imx8mp: add clkout1/2 support
clk: imx: scu: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
clk: imx8mp: Add DISP2 pixel clock
clk: imx: scu: fix a potential memory leak in __imx_clk_gpr_scu()
clk: imx: Add check for kcalloc
clk: imx8mn: add GPT support
dt-bindings: imx: add clock bindings for i.MX8MN GPT
clk: imx: Remove the snvs clock
clk: imx8m: check mcore_booted before register clk
clk: imx: add mcore_booted module paratemter
clk: imx8mq: add 27m phy pll ref clock
|
|
It turns out that networking.pdf has exceeded 100 chapters and
titles of chapters >= 100 collide with their counts in its table
of contents (TOC).
Increase relevant params by 0.6em in the preamble to avoid such
ugly collisions.
While at it, fix a typo in comment (subsection).
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Link: https://lore.kernel.org/r/bdb60ba3-7813-47d0-74f9-7c31dd912d95@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
clk_generated_best_diff() helps in finding the parent and the divisor to
compute a rate closest to the required one. However, it doesn't take into
account the request's range for the new rate. Make sure the new rate
is within the required range.
Fixes: 8a8f4bf0c480 ("clk: at91: clk-generated: create function to find best_diff")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Link: https://lore.kernel.org/r/20220413071318.244912-1-codrin.ciubotariu@microchip.com
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Pass updated i_size in fscache_unuse_cookie() when called
from nfs_fscache_release_file(), which ensures the size of
an fscache object gets written to the cache storage. Failing
to do so results in unnessary reads from the NFS server, even
when the data is cached, due to a cachefiles object coherency
check failing with a trace similar to the following:
cachefiles_coherency: o=0000000e BAD osiz B=afbb3 c=0
This problem can be reproduced as follows:
#!/bin/bash
v=4.2; NFS_SERVER=127.0.0.1
set -e; trap cleanup EXIT; rc=1
function cleanup {
umount /mnt/nfs > /dev/null 2>&1
RC_STR="TEST PASS"
[ $rc -eq 1 ] && RC_STR="TEST FAIL"
echo "$RC_STR on $(uname -r) with NFSv$v and server $NFS_SERVER"
}
mount -o vers=$v,fsc $NFS_SERVER:/export /mnt/nfs
rm -f /mnt/nfs/file1.bin > /dev/null 2>&1
dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1 > /dev/null 2>&1
echo 3 > /proc/sys/vm/drop_caches
echo Read file 1st time from NFS server into fscache
dd if=/mnt/nfs/file1.bin of=/dev/null > /dev/null 2>&1
umount /mnt/nfs && mount -o vers=$v,fsc $NFS_SERVER:/export /mnt/nfs
echo 3 > /proc/sys/vm/drop_caches
echo Read file 2nd time from fscache
dd if=/mnt/nfs/file1.bin of=/dev/null > /dev/null 2>&1
echo Check mountstats for NFS read
grep -q "READ: 0" /proc/self/mountstats # (1st number) == 0
[ $? -eq 0 ] && rc=0
Fixes: a6b5a28eb56c "nfs: Convert to new fscache volume/cookie API"
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
cpufreq_offline() calls offline() and exit() under the policy rwsem
But they are called outside the rwsem in cpufreq_online().
Make cpufreq_online() call offline() and exit() as well as online() and
init() under the policy rwsem to achieve a clear lock relationship.
All of the init() and online() implementations in the tree only
initialize the policy object without attempting to acquire the policy
rwsem and they won't call cpufreq APIs attempting to acquire it.
Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If policy initialization fails after the sysfs files are created,
there is a possibility to end up running show()/store() callbacks
for half-initialized policies, which may have unpredictable
outcomes.
Abort show()/store() in such a case by making sure the policy is active.
Also dectivate the policy on such failures.
Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Translate dev-tools/gdb-kernel-debugging.rst into Chinese.
Signed-off-by: gaochao <gaochao49@huawei.com>
Reviewed-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Wu XiangCheng <bobwxc@email.cn>
Link: https://lore.kernel.org/r/20220514100046.1683-1-gaochao49@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
To enable NFSv4 to work correctly, NFSv4 client identifiers have
to be globally unique and persistent over client reboots. We
believe that in many cases, a good default identifier can be
chosen and set when a client system is imaged.
Because there are many different ways a system can be imaged,
provide an explanation of how NFSv4 client identifiers and
principals can be set by install scripts and imaging tools.
Additional cases, such as NFSv4 clients running in containers, also
need unique and persistent identifiers. The Linux NFS community
sets forth this explanation to aid those who create and manage
container environments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The documentation for nfs4_unique_id is out-of-date. In particular it
claim that when nfs4_unique_id is set, the host name is not used. since
Commit 55b592933b7d ("NFSv4: Fix nfs4_init_uniform_client_string for net
namespaces") both the unique_id AND the host name are used.
Update the documentation to match the code.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Fix a typo in ntrig.rst (found with 'codespell').
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: linux-input@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20220516002047.11395-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Fix 2 "MOSE" typos in atarikbd.rst (found with 'codespell').
a. s/MOSE/MODE/
b. s/MOSE/MOUSE/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: linux-input@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20220516002055.12000-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
NFSv4 can lose locks if, for example there is a network partition for
longer than the lease period. When this happens a warning message
NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
is generated, possibly once for each lock (though rate limited).
This is potentially misleading as is can be read as suggesting that lock
reclaim was attempted. However the default behaviour is to not attempt
to recover locks (except due to server report).
This patch changes the reporting to produce at most one message for each
attempt to recover all state from a given server. The message reports
the server name and the number of locks lost if that number is non-zero.
It reports that locks were lost and give no suggestion as to whether
there was an attempt or not.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
It's time to become a maintainer of Chinese documentation, and Yanteng's plan
is to help everyone with the utmost enthusiasm and patience.
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Alex Shi <alexs@kernel.org>
Link: https://lore.kernel.org/r/b0c1324d1d63846d700ab354446a6deaf30754c0.1652712771.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Now that everything is fully locked there is no need for container_users
to remain as an atomic, change it to an unsigned int.
Use 'if (group->container)' as the test to determine if the container is
present or not instead of using container_users.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/6-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Once userspace opens a group FD it is prevented from opening another
instance of that same group FD until all the prior group FDs and users of
the container are done.
The first is done trivially by checking the group->opened during group FD
open.
However, things get a little weird if userspace creates a device FD and
then closes the group FD. The group FD still cannot be re-opened, but this
time it is because the group->container is still set and container_users
is elevated by the device FD.
Due to this mismatched lifecycle we have the
vfio_group_try_dissolve_container() which tries to auto-free a container
after the group FD is closed but the device FD remains open.
Instead have the device FD hold onto a reference to the single group
FD. This directly prevents vfio_group_fops_release() from being called
when any device FD exists and makes the lifecycle model more
understandable.
vfio_group_try_dissolve_container() is removed as the only place a
container is auto-deleted is during vfio_group_fops_release(). At this
point the container_users is either 1 or 0 since all device FDs must be
closed.
Change group->opened to group->opened_file which points to the single
struct file * that is open for the group. If the group->open_file is
NULL then group->container == NULL.
If all device FDs have closed then the group's notifier list must be
empty.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/5-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
This is necessary to avoid various user triggerable races, for instance
racing SET_CONTAINER/UNSET_CONTAINER:
ioctl(VFIO_GROUP_SET_CONTAINER)
ioctl(VFIO_GROUP_UNSET_CONTAINER)
vfio_group_unset_container
int users = atomic_cmpxchg(&group->container_users, 1, 0);
// users == 1 container_users == 0
__vfio_group_unset_container(group);
container = group->container;
vfio_group_set_container()
if (!atomic_read(&group->container_users))
down_write(&container->group_lock);
group->container = container;
up_write(&container->group_lock);
down_write(&container->group_lock);
group->container = NULL;
up_write(&container->group_lock);
vfio_container_put(container);
/* woops we lost/leaked the new container */
This can then go on to NULL pointer deref since container == 0 and
container_users == 1.
Wrap all touches of container, except those on a performance path with a
known open device, with the group_rwsem.
The only user of vfio_group_add_container_user() holds the user count for
a simple operation, change it to just hold the group_lock over the
operation and delete vfio_group_add_container_user(). Containers now only
gain a user when a device FD is opened.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/4-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
The split follows the pairing with the destroy functions:
- vfio_group_get_device_fd() destroyed by close()
- vfio_device_open() destroyed by vfio_device_fops_release()
- vfio_device_assign_container() destroyed by
vfio_group_try_dissolve_container()
The next patch will put a lock around vfio_device_assign_container().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/3-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
This is not a performance path, just use the group_rwsem to protect the
value.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/2-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Without locking userspace can trigger a UAF by racing
KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD:
CPU1 CPU2
ioctl(KVM_DEV_VFIO_GROUP_DEL)
ioctl(VFIO_GROUP_GET_DEVICE_FD)
vfio_group_get_device_fd
open_device()
intel_vgpu_open_device()
vfio_register_notifier()
vfio_register_group_notifier()
blocking_notifier_call_chain(&group->notifier,
VFIO_GROUP_NOTIFY_SET_KVM, group->kvm);
set_kvm()
group->kvm = NULL
close()
kfree(kvm)
intel_vgpu_group_notifier()
vdev->kvm = data
[..]
kvm_get_kvm(vgpu->kvm);
// UAF!
Add a simple rwsem in the group to protect the kvm while the notifier is
using it.
Note this doesn't fix the race internal to i915 where userspace can
trigger two VFIO_GROUP_NOTIFY_SET_KVM's before we reach a consumer of
vgpu->kvm and trigger this same UAF, it just makes the notifier
self-consistent.
Fixes: ccd46dbae77d ("vfio: support notifier chain in vfio_group")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/1-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Fix following coccicheck warning:
./virt/kvm/vfio.c:258:1-7: preceding lock on line 236
If kvm_vfio_file_iommu_group() failed, code would goto err_fdput with
mutex_lock acquired and then return ret. It might cause potential
deadlock. Move mutex_unlock bellow err_fdput tag to fix it.
Fixes: d55d9e7a45721 ("kvm/vfio: Store the struct file in the kvm_vfio_group")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220517023441.4258-1-wanjiabing@vivo.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Convert rockchip,rk3368-cru.txt to YAML.
Changes against original bindings:
- Add clocks and clock-names because the device has to have
at least one input clock.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220329180550.31043-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Convert rockchip,rk3228-cru.txt to YAML.
Changes against original bindings:
Add clocks and clock-names because the device has to have
at least one input clock.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220330121923.24240-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Convert rockchip,rk3036-cru.txt to YAML.
Changes against original bindings:
Add clocks and clock-names because the device has to have
at least one input clock.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220330114847.18633-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Convert rockchip,rk3308-cru.txt to YAML.
Changes against original bindings:
- Add clocks and clock-names because the device has to have
at least one input clock.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220329184339.1134-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
WARN_ON(context->context != AUDIT_CTX_UNUSED);
WARN_ON(context->name_count);
if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
audit_panic("unrecoverable error in audit_syscall_entry()");
return;
}
These problematic dummy contexts are created via the following call
chain:
exit_to_user_mode_prepare
-> arch_do_signal_or_restart
-> get_signal
-> task_work_run
-> tctx_task_work
-> io_req_task_submit
-> io_issue_sqe
-> audit_uring_entry
Cc: stable@vger.kernel.org
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
If the device is a detachable (and therefore lacks full keyboard), we may
still want to load this driver because the device might have some other
buttons or switches (e.g. volume and power buttons or a tablet mode
switch). In such case we do not want to register the "main" keyboard device
to allow userspace detect when the detachable keyboard is disconnected and
adjust the system behavior for the tablet mode.
Originally it was suggested to simply skip keyboard registration if row and
columns properties didn't exist, but that approach did not convey the
intent strongly enough and also had a slight problem for migrating existing
DTBs without updating the kernel first, so it was decided to introduce new
google,cros-ec-keyb-switches to explicitly mark devices that only have
axillary buttons and switches.
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20220516183452.942008-3-swboyd@chromium.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
If the ChromeOS board is a detachable, this cros-ec-keyb device won't
have a matrix keyboard but it may have some button switches, e.g. volume
buttons and power buttons. The driver still registers a keyboard though
and that leads to userspace confusion around where the keyboard is.
We tried to work around this by only registering the keyboard device when
rows/columns properties were specified for the device, but that led to
another problem where removing the rows/columns properties breaks the
existing binding. Technically before that commit the rows/columns
properties were required, otherwise the driver would fail to probe.
Removing the properties from devicetrees makes the driver fail to probe
unless the corresponding driver patch is present. Furthermore, this makes
requiring matrix keyboard properties for devices that really have a
keyboard impossible because the compatible drives the schema and now the
properties are optional.
Add a more specific compatible for this type of device that indicates to
the OS that there are only switches and no matrix keyboard present.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20220516183452.942008-2-swboyd@chromium.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Convert rockchip,px30-cru.txt to YAML.
Changes against original bindings:
Use compatible string: "rockchip,px30-pmucru"
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220330103923.11063-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Current dts files with RK3188/RK3066 'cru' nodes are manually verified.
In order to automate this process rockchip,rk3188-cru.txt has to be
converted to YAML.
Changed:
Add properties to fix notifications by clocks.yaml for example:
clocks
clock-names
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220329111323.3569-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
Current dts files with RK3288 'cru' nodes are manually verified.
In order to automate this process rockchip,rk3288-cru.txt has to be
converted to YAML.
Changed:
Add properties to fix notifications by clocks.yaml for example:
clocks
clock-names
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220329113657.4567-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
smb2_compound_op
There is a race condition in smb2_compound_op:
after_close:
num_rqst++;
if (cfile) {
cifsFileInfo_put(cfile); // sends SMB2_CLOSE to the server
cfile = NULL;
This is triggered by smb2_query_path_info operation that happens during
revalidate_dentry. In smb2_query_path_info, get_readable_path is called to
load the cfile, increasing the reference counter. If in the meantime, this
reference becomes the very last, this call to cifsFileInfo_put(cfile) will
trigger a SMB2_CLOSE request sent to the server just before sending this compound
request – and so then the compound request fails either with EBADF/EIO depending
on the timing at the server, because the handle is already closed.
In the first scenario, the race seems to be happening between smb2_query_path_info
triggered by the rename operation, and between “cleanup” of asynchronous writes – while
fsync(fd) likely waits for the asynchronous writes to complete, releasing the writeback
structures can happen after the close(fd) call. So the EBADF/EIO errors will pop up if
the timing is such that:
1) There are still outstanding references after close(fd) in the writeback structures
2) smb2_query_path_info successfully fetches the cfile, increasing the refcounter by 1
3) All writeback structures release the same cfile, reducing refcounter to 1
4) smb2_compound_op is called with that cfile
In the second scenario, the race seems to be similar – here open triggers the
smb2_query_path_info operation, and if all other threads in the meantime decrease the
refcounter to 1 similarly to the first scenario, again SMB2_CLOSE will be sent to the
server just before issuing the compound request. This case is harder to reproduce.
See https://bugzilla.samba.org/show_bug.cgi?id=15051
Cc: stable@vger.kernel.org
Fixes: 8de9e86c67ba ("cifs: create a helper to find a writeable handle by path name")
Signed-off-by: Ondrej Hubsch <ohubsch@purestorage.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We gate whether to IOPOLL for a request on whether the opcode is allowed
on a ring setup for IOPOLL and if it's got a file assigned. MSG_RING
is the only one that allows a file yet isn't pollable, it's merely
supported to allow communication on an IOPOLL ring, not because we can
poll for completion of it.
Put the assigned file early and clear it, so we don't attempt to poll
for it.
Reported-by: syzbot+1a0a53300ce782f8b3ad@syzkaller.appspotmail.com
Fixes: 3f1d52abf098 ("io_uring: defer msg-ring file validity check until command issue")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Tryng to rename a directory that has all following properties fails with
EINVAL and triggers the 'WARN_ON_ONCE(!fscrypt_has_encryption_key(dir))'
in f2fs_match_ci_name():
- The directory is casefolded
- The directory is encrypted
- The directory's encryption key is not yet set up
- The parent directory is *not* encrypted
The problem is incorrect handling of the lookup of ".." to get the
parent reference to update. fscrypt_setup_filename() treats ".." (and
".") specially, as it's never encrypted. It's passed through as-is, and
setting up the directory's key is not attempted. As the name isn't a
no-key name, f2fs treats it as a "normal" name and attempts a casefolded
comparison. That breaks the assumption of the WARN_ON_ONCE() in
f2fs_match_ci_name() which assumes that for encrypted directories,
casefolded comparisons only happen when the directory's key is set up.
We could just remove this WARN_ON_ONCE(). However, since casefolding is
always a no-op on "." and ".." anyway, let's instead just not casefold
these names. This results in the standard bytewise comparison.
Fixes: 7ad08a58bf67 ("f2fs: Handle casefolding with Encryption")
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The f2fs_gc uses a bitmap to indicate pinned sections, but when disabling
chckpoint, we call f2fs_gc() with NULL_SEGNO which selects the same dirty
segment as a victim all the time, resulting in checkpoint=disable failure,
for example. Let's pick another one, if we fail to collect it.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Hulk Robot reported a BUG_ON:
==================================================================
EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
kernel BUG at fs/ext4/ext4_jbd2.c:53!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
[...]
Call Trace:
ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
do_iter_write+0x107/0x430 fs/read_write.c:861
vfs_writev fs/read_write.c:934 [inline]
do_pwritev+0x1e5/0x380 fs/read_write.c:1031
[...]
==================================================================
Above issue may happen as follows:
cpu1 cpu2
__________________________|__________________________
do_pwritev
vfs_writev
do_iter_write
ext4_file_write_iter
ext4_buffered_write_iter
generic_perform_write
ext4_da_write_begin
vfs_fallocate
ext4_fallocate
ext4_convert_inline_data
ext4_convert_inline_data_nolock
ext4_destroy_inline_data_nolock
clear EXT4_STATE_MAY_INLINE_DATA
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_regular_allocator
ext4_mb_good_group_nolock
ext4_mb_init_group
ext4_mb_init_cache
ext4_mb_generate_buddy --> error
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_restore_inline_data
set EXT4_STATE_MAY_INLINE_DATA
ext4_block_write_begin
ext4_da_write_end
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_write_inline_data_end
handle=NULL
ext4_journal_stop(handle)
__ext4_journal_stop
ext4_put_nojournal(handle)
ref_cnt = (unsigned long)handle
BUG_ON(ref_cnt == 0) ---> BUG_ON
The lock held by ext4_convert_inline_data is xattr_sem, but the lock
held by generic_perform_write is i_rwsem. Therefore, the two locks can
be concurrent.
To solve above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole(), remove similar handling from ext4_punch_hole().
Fixes: 0c8d414f163f ("ext4: let fallocate handle inline data correctly")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Symlink's external data block is one kind of metadata block, and now
that almost all ext4 metadata block's page cache (e.g. directory blocks,
quota blocks...) belongs to bdev backing inode except the symlink. It
is essentially worked in data=journal mode like other regular file's
data block because probably in order to make it simple for generic VFS
code handling symlinks or some other historical reasons, but the logic
of creating external data block in ext4_symlink() is complicated. and it
also make things confused if user do not want to let the filesystem
worked in data=journal mode. This patch convert the final exceptional
case and make things clean, move the mapping of the symlink's external
data block to bdev like any other metadata block does.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20220424140936.1898920-3-yi.zhang@huawei.com
|
|
Current ext4_getblk() might sleep if some resources are not valid or
could be race with a concurrent extents modifing procedure. So we
cannot call ext4_getblk() and ext4_map_blocks() to get map blocks in
the atomic context in some fast path (e.g. the upcoming procedure of
getting symlink external block in the RCU context), even if the map
extents have already been check and cached.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20220424140936.1898920-2-yi.zhang@huawei.com
|