|
of_find_node_by_path() calls of_find_node_opts_by_path(),
which returns a node pointer with its refcount
incremented; we should call of_node_put() on it when done.
Add the missing of_node_put() to avoid a refcount leak.
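A minimal sketch of the pattern (lookup path and error handling simplified, not the exact driver code):
  struct device_node *dp = of_find_node_by_path("/");  /* takes a reference */

  if (!dp)
          return -ENODEV;
  /* ... use dp during probing ... */
  of_node_put(dp);  /* drop the reference when done */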
Fixes: 9c1a5077fdca ("input: Rewrite sparcspkr device probing.")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220516081018.42728-1-linmq006@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
drm-next
Fixes address space collisions in some edge cases when userspace is
using softpin, and cleans up the MMU reference handling a bit.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/ffae9f7d03ca7a9e00da16d5910ae810befd3c5a.camel@pengutronix.de
|
|
Use kobj_to_dev() instead of open-coding it.
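For reference, kobj_to_dev() is just the container_of() idiom the code used to spell out by hand:
  /* open-coded form being replaced */
  struct device *dev = container_of(kobj, struct device, kobj);

  /* equivalent, using the helper */
  struct device *dev = kobj_to_dev(kobj);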
Link: https://lore.kernel.org/r/20220510105113.1351891-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The bsg_setup_queue() function does not return NULL. It returns error
pointers. Fix the check accordingly.
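A sketch of the corrected check (arguments elided):
  q = bsg_setup_queue(...);
  if (IS_ERR(q))          /* was: if (!q) */
          return PTR_ERR(q);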
Link: https://lore.kernel.org/r/YnUf7RQl+A3tigWh@kili
Fixes: 4268fa751365 ("scsi: mpi3mr: Add bsg device support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Using get_cpu() leads to disabling preemption, and in this context it is
not possible to acquire the spinlock_t that follows on PREEMPT_RT because
it becomes a sleeping lock.
Commit 0ea5c27583e1 ("[SCSI] bnx2fc: common free list for cleanup
commands") says that it is using get_cpu() as a fix in case the CPU is
preempted. While this might be true, the important part is that it is now
using the same CPU for locking and unlocking while previously it always
relied on smp_processor_id(). The data structure itself is protected with
a lock, so it does not rely on CPU-local access.
Replace get_cpu() with raw_smp_processor_id() to obtain the current CPU
number which is used as an index for the per-CPU resource.
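A sketch of the change (surrounding code elided):
  /* before: disables preemption, so the spinlock_t below sleeps on PREEMPT_RT */
  cpu = get_cpu();
  ...
  put_cpu();

  /* after: just an index into the per-CPU resource; the lock provides safety */
  cpu = raw_smp_processor_id();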
Link: https://lore.kernel.org/r/20220506105758.283887-5-bigeasy@linutronix.de
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The get_cpu() in fc_exch_em_alloc() was introduced in commit f018b73af6db
("[SCSI] libfc, libfcoe, fcoe: use smp_processor_id() only when preempt
disabled") for no other reason than to simply use smp_processor_id()
without getting a warning, because everything is done with the pool->lock
held anyway. However, get_cpu(), by disabling preemption, does not play
well with PREEMPT_RT, particularly when acquiring a regular (and thus
sleepable) spinlock.
Therefore remove the get_cpu() and just use the unstable value, as we will
have CPU locality guarantees next by taking the lock. The window of
migration, as noted by Sebastian, is small, and even if migration happens
the result is correct.
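A sketch of the resulting allocation path, assuming the libfc pool layout (field names recalled, not verified against the patch):
  cpu = raw_smp_processor_id();   /* may migrate before the lock is taken */
  pool = per_cpu_ptr(mp->pool, cpu);
  spin_lock_bh(&pool->lock);      /* from here on we have CPU locality */
  /* ... allocate the exchange from pool ... */
  spin_unlock_bh(&pool->lock);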
Link: https://lore.kernel.org/r/20211117025956.79616-2-dave@stgolabs.net
Link: https://lore.kernel.org/r/20220506105758.283887-4-bigeasy@linutronix.de
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The per-CPU statistics (struct fc_stats) are updated by getting a stable
per-CPU pointer via get_cpu() + per_cpu_ptr() and then performing the
increment. This can be optimized by using this_cpu_*(), which will do
whatever is needed on the architecture to perform the update safely and
efficiently. The read-out of the individual values (fc_get_host_stats())
should be done by using READ_ONCE() instead of a plain-C access. The
difference is that READ_ONCE() will always perform a single access while
the plain-C access can be split by the compiler into two loads if that
appears beneficial. The usage of u64 has the side effect that it is also
64bit wide on 32bit architectures, where the read is always split into two
loads. This can lead to strange values if the read happens during an update
which alters both 32bit parts of the 64bit value. This can be circumvented
by either using 32bit variables on 32bit architectures or extending the
statistics with a sequence counter.
Use the this_cpu_*() API to update the statistics and READ_ONCE() to read
them.
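Sketched with an illustrative counter (field name assumed from struct fc_stats):
  /* update: safe and efficient on every architecture */
  this_cpu_inc(lport->stats->ErrorFrames);

  /* read-out in fc_get_host_stats(): a single load, never split by the compiler */
  u64 frames = READ_ONCE(per_cpu_ptr(lport->stats, cpu)->ErrorFrames);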
Link: https://lore.kernel.org/r/20220506105758.283887-3-bigeasy@linutronix.de
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
fcoe_get_paged_crc_eof() relies on the caller having preemption disabled to
ensure the per-CPU fcoe_percpu context remains valid throughout the
call. This is done by either holding spinlocks (such as bnx2fc_global_lock
or qedf_global_lock) or the get_cpu() from fcoe_alloc_paged_crc_eof(). The
latter breaks PREEMPT_RT semantics, as memory may be allocated within the
region, ending up sleeping in atomic context.
Introduce a local_lock_t to struct fcoe_percpu that will keep the non-RT
case the same, mapping to preempt_disable/enable, while RT will use a
per-CPU spinlock allowing the region to be preemptible but still maintain
CPU locality. The other users of fcoe_percpu are already safe in this
regard and do not require local_lock()ing.
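A sketch of the locking scheme, assuming the per-CPU variable and lock field name:
  struct fcoe_percpu_s {
          local_lock_t lock;              /* assumed field name */
          /* ... crc_eof page, etc. ... */
  };
  static DEFINE_PER_CPU(struct fcoe_percpu_s, fcoe_percpu);

  local_lock(&fcoe_percpu.lock);          /* preempt_disable() on !RT, per-CPU lock on RT */
  fps = this_cpu_ptr(&fcoe_percpu);
  /* ... allocate and attach the crc_eof page ... */
  local_unlock(&fcoe_percpu.lock);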
Link: https://lore.kernel.org/r/20211117025956.79616-3-dave@stgolabs.net
Link: https://lore.kernel.org/r/20220506105758.283887-2-bigeasy@linutronix.de
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Removed unnecessary:
select COMMON_CLK_SP7021
select RESET_SUNPLUS
select NVMEM_SUNPLUS_OCOTP
from Kconfig.
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Link: https://lore.kernel.org/r/1652443036-24731-1-git-send-email-wellslutw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cast pointers to unsigned long instead of to uint64_t to avoid this
problem on 32-bit arches:
31 6.89 debian:experimental-x-mips : FAIL gcc version 11.2.0 (Debian 11.2.0-18)
bench/breakpoint.c: In function 'breakpoint_setup':
bench/breakpoint.c:56:24: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
56 | attr.bp_addr = (uint64_t)addr;
| ^
cc1: all warnings being treated as errors
make[3]: *** [/git/perf-5.18.0-rc7/tools/build/Makefile.build:139: bench] Error 2
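The fix itself is one cast; pointers always fit in unsigned long, which then widens safely to the u64 field:
  attr.bp_addr = (unsigned long)addr;   /* was: (uint64_t)addr */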
Fixes: 68a6772f11dbb1ed ("perf bench: Add breakpoint benchmarks")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/YoLq1nHx1doi+VWl@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2022-05-16
the first 2 patches are by me and target the CAN raw protocol. The 1st
removes an unneeded assignment, the other one adds support for
SO_TXTIME/SCM_TXTIME.
Oliver Hartkopp contributes 2 patches for the ISOTP protocol. The 1st
adds support for transmission without flow control, the other lets
bind() return an error on incorrect CAN ID formatting.
Geert Uytterhoeven contributes a patch to clean up ctucanfd's Kconfig
file.
Vincent Mailhol's patch for the slcan driver uses the proper function
to check for invalid CAN frames in the xmit callback.
The next patch is by Geert Uytterhoeven and makes the interrupt-names
of the renesas,rcar-canfd dt bindings mandatory.
A patch by me updates the ctucanfd dt bindings to include the common
CAN controller bindings.
The last patch is by Akira Yokosawa and fixes a breakage in the
ctucanfd documentation.
* tag 'linux-can-next-for-5.19-20220516' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
docs: ctucanfd: Use 'kernel-figure' directive instead of 'figure'
dt-bindings: can: ctucanfd: include common CAN controller bindings
dt-bindings: can: renesas,rcar-canfd: Make interrupt-names required
can: slcan: slc_xmit(): use can_dropped_invalid_skb() instead of manual check
can: ctucanfd: Let users select instead of depend on CAN_CTUCANFD
can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting
can: isotp: add support for transmission without flow control
can: raw: add support for SO_TXTIME/SCM_TXTIME
can: raw: raw_sendmsg(): remove not needed setting of skb->sk
====================
Link: https://lore.kernel.org/r/20220516202625.1129281-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the %pg format specifier to save on stack consumption and code size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220512062014.1826835-1-hch@lst.de
|
|
The is_kmap_addr() and is_vmalloc_addr() checks in check_heap_object()
will never trigger, because the preceding virt_addr_valid() check already
rejects kmap and vmalloc addresses. So let's move the virt_addr_valid()
check below the is_vmalloc_addr() check.
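A sketch of the reordered checks (check bodies elided):
  /* kmap and vmalloc addresses fail virt_addr_valid(), so test them first */
  if (is_kmap_addr(ptr)) {
          /* ... bounds-check against the kmap'd page ... */
          return;
  }
  if (is_vmalloc_addr(ptr)) {
          /* ... bounds-check against the vmalloc area ... */
          return;
  }
  if (!virt_addr_valid(ptr))
          return;
  /* ... folio/slab checks ... */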
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Fixes: 4e140f59d285 ("mm/usercopy: Check kmap addresses properly")
Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220505071037.4121100-1-songyuanzheng@huawei.com
|
|
With all randstruct exceptions removed, remove all the exception
handling code. Any future warnings are likely to be shared between
this plugin and Clang randstruct, and will need to be addressed in a
more holistic fashion.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
While preparing for Clang randstruct support (which duplicated many of
the warnings the randstruct GCC plugin warned about), one strange one
remained only for the randstruct GCC plugin. Eliminating this rids
the plugin of the last exception.
It seems the plugin is happy to dereference individual members of
a cross-struct cast, but it is upset about casting to a whole object
pointer. This only manifests in one place in the kernel, so just replace
the variable with individual member accesses. There is no change in
executable instruction output.
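A hypothetical illustration of the two forms (struct and field names invented for the example):
  struct msg {
          u32 len;
  } __randomize_layout;

  /* whole-object cast: the plugin warns */
  struct msg *m = (struct msg *)ptr;
  len = m->len;

  /* member access through the cast expression: accepted */
  len = ((struct msg *)ptr)->len;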
Drop the last exception from the randstruct GCC plugin.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: netdev@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/lkml/20220511022217.58586-1-kuniyu@amazon.co.jp
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/lkml/20220511151542.4cb3ff17@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Clang randstruct gets upset when it sees struct address_space (which is
randomized) being assigned to a struct page (which is not randomized):
drivers/net/ethernet/sun/niu.c:3385:12: error: casting from randomized structure pointer type 'struct address_space *' to 'struct page *'
*link = (struct page *) page->mapping;
^
It looks like niu.c is looking for an in-line place to chain its allocated
pages together and is overloading the "mapping" member, as it is unused.
This is very non-standard, and is expected to be cleaned up in the
future[1], but there is no "correct" way to handle it today.
No meaningful machine code changes result after this change, and source
readability is improved.
Drop the randstruct exception now that there is no "confusing" cross-type
assignment.
[1] https://lore.kernel.org/lkml/YnqgjVoMDu5v9PNG@casper.infradead.org/
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Du Cheng <ducheng2@gmail.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-hardening@vger.kernel.org
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/lkml/20220511151647.7290adbe@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
The randstruct GCC plugin gets upset when it sees struct path (which is
randomized) being assigned from a "void *" (which it cannot type-check).
There's no need for these casts, as the entire internal payload use is
following a normal struct layout. Convert the enum-based void * offset
dereferencing to the new big_key_payload struct. No meaningful machine
code changes result after this change, and source readability is improved.
Drop the randstruct exception now that there is no "confusing" cross-type
assignment.
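A sketch of the struct-based payload replacing the enum-indexed void * slots (field names assumed, not taken verbatim from the patch):
  struct big_key_payload {
          u8 *data;               /* was: payload.data[big_key_data] */
          struct path path;       /* was: payload.data[big_key_path] */
          size_t length;          /* was: payload.data[big_key_len]  */
  };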
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-hardening@vger.kernel.org
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
bpf selftests can no longer be built with CFLAGS=-static with
liburandom_read.so and its dependent target.
Filter out -static for liburandom_read.so and its dependent target.
When building statically, this leaves urandom_read relying on
system-wide shared libraries.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220514002115.1376033-1-yosryahmed@google.com
|
|
Now we use huge_ptep_get() to get the pte value of a hugetlb page;
however, it will only return one specific pte value for the CONT-PTE
or CONT-PMD size hugetlb on an ARM64 system, which can contain several
continuous pte or pmd entries with the same page table attributes. And it
will not take into account the subpages' dirty or young bits of a
CONT-PTE/PMD size hugetlb page.
So huge_ptep_get() is inconsistent with huge_ptep_get_and_clear(),
which already takes into account the dirty or young bits for any subpages
in this CONT-PTE/PMD size hugetlb [1]. Meanwhile we can miss dirty or
young flag statistics for hugetlb pages with the current huge_ptep_get(),
such as in the gather_hugetlb_stats() function, and for CONT-PTE/PMD
hugetlb monitoring with DAMON.
Thus define an ARM64-specific huge_ptep_get() implementation, as well as
enabling __HAVE_ARCH_HUGE_PTEP_GET, that will take into account any
subpages' dirty or young bits for a CONT-PTE/PMD size hugetlb page, for
those functions that want to check the dirty and young flags of a hugetlb
page.
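A sketch of such an implementation, close to but not verbatim the patch (helper names recalled from the arm64 hugetlb code):
  #define __HAVE_ARCH_HUGE_PTEP_GET
  pte_t huge_ptep_get(pte_t *ptep)
  {
          int ncontig, i;
          size_t pgsize;
          pte_t orig_pte = ptep_get(ptep);

          if (!pte_present(orig_pte) || !pte_cont(orig_pte))
                  return orig_pte;

          /* fold every subpage's dirty/young bit into the returned pte */
          ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
          for (i = 0; i < ncontig; i++, ptep++) {
                  pte_t pte = ptep_get(ptep);

                  if (pte_dirty(pte))
                          orig_pte = pte_mkdirty(orig_pte);
                  if (pte_young(pte))
                          orig_pte = pte_mkyoung(orig_pte);
          }
          return orig_pte;
  }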
[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
Suggested-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/624109a80ac4bbdf1e462dfa0b49e9f7c31a7c0d.1652496622.git.baolin.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The original huge_ptep_get() on ARM64 is just a wrapper of ptep_get(),
which will not take into account any contig-PTEs dirty and access bits.
Meanwhile we will implement a new ARM64-specific huge_ptep_get()
interface in a following patch, which will take into account any contig-PTE
dirty and access bits. To keep the same efficient logic for getting the pte
value, change to use ptep_get() as a preparation.
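The preparation is a mechanical substitution:
  pte_t pte = READ_ONCE(*ptep);   /* before */
  pte_t pte = ptep_get(ptep);     /* after: same access, via the helper */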
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/5113ed6e103f995e1d0f0c9fda0373b761bbcad2.1652496622.git.baolin.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
XFS has the unique behavior (as compared to the other Linux filesystems)
that on writeback errors it will completely invalidate the affected
folio and force the page cache to reread the contents from disk. All
other filesystems leave the page mapped and up to date.
This is a rude awakening for user programs, since (in the case where
write fails but reread doesn't) file contents will appear to revert to
old disk contents with no notification other than an EIO on fsync. This
might have been annoying back in the days when iomap dealt with one page
at a time, but with multipage folios, we can now throw away *megabytes*
worth of data for a single write error.
On *most* Linux filesystems, a program can respond to an EIO on write by
redirtying the entire file and scheduling it for writeback. This isn't
foolproof, since the page that failed writeback is no longer dirty and
could be evicted, but programs that want to recover properly *also*
have to detect XFS and regenerate every write they've made to the file.
When running xfs/314 on arm64, I noticed a UAF when xfs_discard_folio
invalidates multipage folios that could be undergoing writeback. If,
say, we have a 256K folio caching a mix of written and unwritten
extents, it's possible that we could start writeback of the first (say)
64K of the folio and then hit a writeback error on the next 64K. We
then free the iop attached to the folio, which is really bad because
writeback completion on the first 64k will trip over the "blocks per
folio > 1 && !iop" assertion.
This can't be fixed by only invalidating the folio if writeback fails at
the start of the folio, since the folio is marked !uptodate, which trips
other assertions elsewhere. Get rid of the whole behavior entirely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
A PCMD (Paging Crypto MetaData) page contains the PCMD
structures of enclave pages that have been encrypted and
moved to the shmem backing store. When all enclave pages
sharing a PCMD page are loaded in the enclave, there is no
need for the PCMD page and it can be truncated from the
backing store.
A few issues appeared around the truncation of PCMD pages. The
known issues have been addressed but the PCMD handling code could
be made more robust by loudly complaining if any new issue appears
in this area.
Add a check that will complain with a warning if the PCMD page is not
actually empty after it has been truncated. There should never be data in
the PCMD page at this point, since it was just checked to be empty and
truncated with the enclave mutex held, and it is only updated with the
enclave mutex held.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Link: https://lkml.kernel.org/r/6495120fed43fafc1496d09dd23df922b9a32709.1652389823.git.reinette.chatre@intel.com
|
|
Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.
The WARN is encountered when the reclaimer evicts a range of
pages from the enclave when the same pages are faulted back right away.
Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
just one entry, that of ENCLAVE_B.
The scenario proceeds as follows: ENCLAVE_A is being evicted from the
enclave while ENCLAVE_B is faulted in.
sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_AB.
   * Release mutex before writing
   * any data to the shmem pages.
   */
  sgx_encl_get_backing(...);
  encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
  mutex_unlock(&encl->lock);

              /*
               * Fault ENCLAVE_B
               */
              sgx_vma_fault() {
                mutex_lock(&encl->lock);
                /*
                 * Get reference to
                 * ENCLAVE_B's shmem page
                 * as well as PCMD_AB.
                 */
                sgx_encl_get_backing(...)
                /*
                 * Load page back into
                 * enclave via ELDU.
                 */
                /*
                 * Release reference to
                 * ENCLAVE_B's shmem page and
                 * PCMD_AB.
                 */
                sgx_encl_put_backing(...);
                /*
                 * PCMD_AB is found empty so
                 * it and ENCLAVE_B's shmem page
                 * are truncated.
                 */
                /* Truncate ENCLAVE_B backing page */
                sgx_encl_truncate_backing_page();
                /* Truncate PCMD_AB */
                sgx_encl_truncate_backing_page();

                mutex_unlock(&encl->lock);
                ...
              }

  mutex_lock(&encl->lock);
  encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
  /*
   * Write encrypted contents of
   * ENCLAVE_A to ENCLAVE_A shmem
   * page and its PCMD data to
   * PCMD_AB.
   */
  sgx_encl_put_backing(...)
  /*
   * Reference to PCMD_AB is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}
What happens next depends on whether it is ENCLAVE_A being faulted
in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
with a #GP.
If ENCLAVE_A is faulted, then at the time sgx_encl_get_backing() is called
a new PCMD page is allocated, and providing the empty PCMD data for
ENCLAVE_A would cause ENCLS[ELDU] to #GP.
If ENCLAVE_B is evicted first, then a new PCMD_AB would be allocated by the
reclaimer, but later, when ENCLAVE_A is faulted, the ENCLS[ELDU]
instruction would #GP during its checks of the PCMD value and the WARN
would be encountered.
Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time
it obtains a reference to the backing store pages of an enclave page it
is in the process of reclaiming, fix the race by only truncating the PCMD
page after ensuring that no page sharing the PCMD page is in the process
of being reclaimed.
Cc: stable@vger.kernel.org
Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Link: https://lkml.kernel.org/r/ed20a5db516aa813873268e125680041ae11dfcf.1652389823.git.reinette.chatre@intel.com
|
|
Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.
The WARN is encountered when the reclaimer evicts a range of
pages from the enclave when the same pages are faulted back
right away.
The SGX backing storage is accessed on two paths: when there
are insufficient free pages in the EPC, the reclaimer works
to move enclave pages to the backing storage, and as enclaves
access pages that have been moved to the backing storage,
they are retrieved from there as part of page fault handling.
An oversubscribed SGX system will often run the reclaimer and
page fault handler concurrently and needs to ensure that the
backing store is accessed safely between the reclaimer and
the page fault handler. This is not the case because the
reclaimer accesses the backing store without the enclave mutex
while the page fault handler accesses the backing store with
the enclave mutex.
Consider the scenario where a page is faulted while a page sharing
a PCMD page with the faulted page is being reclaimed. The
consequence is a race between the reclaimer and the page fault
handler, with the reclaimer attempting to access a PCMD page at the
same time it is truncated by the page fault handler. This
could result in lost PCMD data. Data may still be
lost if the reclaimer wins the race; this is addressed in
the following patch.
The reclaimer accesses pages from the backing storage without
holding the enclave mutex and runs the risk of concurrently
accessing the backing storage with the page fault handler that
does access the backing storage with the enclave mutex held.
In the scenario below, a PCMD page is truncated from the backing
store after all of its pages have been loaded into the enclave,
at the same time that the PCMD page is loaded from the backing store
because one of its pages is being reclaimed:
sgx_reclaim_pages() {                    sgx_vma_fault() {
                                           ...
                                           mutex_lock(&encl->lock);
                                           ...
                                           __sgx_encl_eldu() {
                                             ...
                                             if (pcmd_page_empty) {
/*                                             /*
 * EPC page being reclaimed                     * PCMD page truncated
 * shares a PCMD page with an                   * while requested from
 * enclave page that is being                   * reclaimer.
 * faulted in.                                  */
 */
sgx_encl_get_backing()  <---------->           sgx_encl_truncate_backing_page()
                                             }
                                           mutex_unlock(&encl->lock);
}                                        }
In this scenario there is a race between the reclaimer and the page fault
handler when the reclaimer attempts to get access to the same PCMD page
that is being truncated. This could result in the reclaimer writing to
the PCMD page that is then truncated, causing the PCMD data to be lost,
or in a new PCMD page being allocated. PCMD data may still be lost
even after protecting the backing store access with the mutex; this is
fixed in the next patch. By ensuring the backing store is accessed with
the mutex held, the enclave page state can be made accurate, with the
SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page
is in the process of being reclaimed.
Consistently protect the reclaimer's backing store access with the
enclave's mutex to ensure that it can safely run concurrently with the
page fault handler.
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Link: https://lkml.kernel.org/r/fa2e04c561a8555bfe1f4e7adc37d60efc77387b.1652389823.git.reinette.chatre@intel.com
|
|
Recent commit 08999b2489b4 ("x86/sgx: Free backing memory
after faulting the enclave page") expanded __sgx_encl_eldu()
to clear an enclave page's PCMD (Paging Crypto MetaData)
from the PCMD page in the backing store after the enclave
page is restored to the enclave.
Since the PCMD page in the backing store is modified, the page
should be marked as dirty to ensure the modified data is retained.
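A sketch of the fix in __sgx_encl_eldu(), assuming the backing-struct naming:
  /* PCMD entry cleared after the enclave page is restored ... */
  memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
  /* ... so the shmem page must be marked dirty to keep the change */
  set_page_dirty(b.pcmd);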
Cc: stable@vger.kernel.org
Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Link: https://lkml.kernel.org/r/00cd2ac480db01058d112e347b32599c1a806bc4.1652389823.git.reinette.chatre@intel.com
|
|
"regs" seems to generic when there are multiple register spaces, so
rename that one to "vop". Also change "gamma_lut" to better looking
"gamma-lut".
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220511082109.1110043-3-s.hauer@pengutronix.de
|
|
The VOP2 driver relies on reg-names properties, but these are not
documented. Add the missing documentation and make reg-names mandatory.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220511082109.1110043-2-s.hauer@pengutronix.de
|
|
SGX uses shmem backing storage to store encrypted enclave pages
and their crypto metadata when enclave pages are moved out of
enclave memory. Two shmem backing storage pages are associated with
each enclave page - one backing page to contain the encrypted
enclave page data and one backing page (shared by a few
enclave pages) to contain the crypto metadata used by the
processor to verify the enclave page when it is loaded back into
the enclave.
sgx_encl_put_backing() is used to release references to the
backing storage and, optionally, mark both backing store pages
as dirty.
Managing references and dirty status together in this way results
in both backing store pages being marked as dirty, even if only one of
the backing store pages is changed.
Additionally, waiting until the page reference is dropped to set
the page dirty risks a race with the page fault handler that
may load outdated data into the enclave when a page is faulted
right after it is reclaimed.
Consider what happens if the reclaimer writes a page to the backing
store and the page is immediately faulted back, before the reclaimer
is able to set the dirty bit of the page:
sgx_reclaim_pages() {                    sgx_vma_fault() {
  ...
  sgx_encl_get_backing();
  ...                                      ...
  sgx_reclaimer_write() {
    mutex_lock(&encl->lock);
    /* Write data to backing store */
    mutex_unlock(&encl->lock);
  }
                                           mutex_lock(&encl->lock);
                                           __sgx_encl_eldu() {
                                             ...
                                             /*
                                              * Enclave backing store
                                              * page not released
                                              * nor marked dirty -
                                              * contents may not be
                                              * up to date.
                                              */
                                             sgx_encl_get_backing();
                                             ...
                                             /*
                                              * Enclave data restored
                                              * from backing store
                                              * and PCMD pages that
                                              * are not up to date.
                                              * ENCLS[ELDU] faults
                                              * because of MAC or PCMD
                                              * checking failure.
                                              */
                                             sgx_encl_put_backing();
                                           }
  ...
  /* set page dirty */
  sgx_encl_put_backing();
  ...
                                           mutex_unlock(&encl->lock);
}                                        }
Remove the option for sgx_encl_put_backing() to set the backing
pages as dirty, and instead set the needed pages as dirty right after
receiving important data, while the enclave mutex is held. This ensures
that the page fault handler can get up to date data from a page, and
prepares the code for a following change where only one of the backing
pages needs to be marked as dirty.
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/
Link: https://lkml.kernel.org/r/fa9f98986923f43e72ef4c6702a50b2a0b3c42e3.1652389823.git.reinette.chatre@intel.com
|
|
Fix the following sparse warnings:
CHECK security/integrity/platform_certs/keyring_handler.c
security/integrity/platform_certs/keyring_handler.c:76:16: warning: Using plain integer as NULL pointer
security/integrity/platform_certs/keyring_handler.c:91:16: warning: Using plain integer as NULL pointer
security/integrity/platform_certs/keyring_handler.c:106:16: warning: Using plain integer as NULL pointer
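Each warning points at a handler-lookup function returning a plain 0 for a function-pointer type; the fix is the same in all three places:
  return NULL;    /* was: return 0; */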
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
This reverts commit 1571d67dc190e50c6c56e8f88cdc39f7cc53166e.
This commit broke support for setting interrupt affinity. It looks like
it is related to the chained IRQ handler. Revert this commit until the
issue with setting interrupt affinity is fixed.
Fixes: 1571d67dc190 ("PCI: aardvark: Rewrite IRQ code to chained IRQ handler")
Link: https://lore.kernel.org/r/20220515125815.30157-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Ricardo Martinez says:
====================
net: skb: Remove skb_data_area_size()
This patch series removes the skb_data_area_size() helper,
replacing it in t7xx driver with the size used during skb allocation.
https://lore.kernel.org/netdev/CAHNKnsTmH-rGgWi3jtyC=ktM1DW2W1VJkYoTMJV2Z_Bt498bsg@mail.gmail.com/
====================
Link: https://lore.kernel.org/r/20220513173400.3848271-1-ricardo.martinez@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
skb_data_area_size() is not needed. As Jakub pointed out [1]:
For Rx, drivers can use the size passed during skb allocation or
use skb_tailroom().
For Tx, drivers should use skb_headlen().
[1] https://lore.kernel.org/netdev/CAHNKnsTmH-rGgWi3jtyC=ktM1DW2W1VJkYoTMJV2Z_Bt498bsg@mail.gmail.com/
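Both replacements are existing skb helpers:
  /* Rx: buffer size for the HW = the size requested at allocation time,
   * or the remaining tail space */
  rx_size = skb_tailroom(skb);

  /* Tx: only the bytes actually queued in the linear area */
  tx_len = skb_headlen(skb);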
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The skb_data_area_size() helper was used to calculate the size of the
DMA-mapped buffer passed to the HW. Instead of doing this, use the
size passed to allocate the skbs.
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a sec_name memory leak when the user defines a target-less SEC("tp").
Fixes: 9af8efc45eb1 ("libbpf: Allow "incomplete" basic tracing SEC() definitions")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20220516184547.3204674-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The recovery write thread starts out as a normal pwrite thread; when
the filesystem is told about a potential media error in the range, the
filesystem turns the normal pwrite into a dax_recovery_write.
The recovery write consists of clearing the media poison, clearing the
page HWPoison bit, re-enabling page-wide read-write permission, flushing
the caches, and finally writing. A competing pread thread is held
off during the recovery process since data read back might not be
valid; this is achieved by clearing the badblock records after the
recovery write is complete. Competing recovery write threads
are already serialized by the writer lock held by dax_iomap_rw().
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/165247997655.53156.8381418704988035976.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Refactor the pmem_clear_poison() function such that the common
shared code between the typical write path and the recovery write
path is factored out.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20220422224508.440670-7-jane.chu@oracle.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Introduce the dax_recovery_write() operation. The function is used to
recover a dax range that contains poison. A typical use case is when
a user process receives a SIGBUS with si_code BUS_MCEERR_AR
indicating poison(s) in a dax range; in response, the user process
issues a pwrite() to the page-aligned dax range, which clears the
poison and puts valid data in the range.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Up till now, dax_direct_access() has been used implicitly for normal
access, but for the purpose of recovery write, a dax range with
poison is requested. To make the interface clear, introduce
enum dax_access_mode {
DAX_ACCESS,
DAX_RECOVERY_WRITE,
}
where DAX_ACCESS is used for normal dax access, and
DAX_RECOVERY_WRITE is used for dax recovery write.
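Callers then pass the mode explicitly; a sketch of the resulting calls (argument handling abbreviated):
  /* normal access path */
  rc = dax_direct_access(dax_dev, pgoff, nr_pages, DAX_ACCESS, &kaddr, NULL);

  /* recovery write path: poisoned ranges may be returned */
  rc = dax_direct_access(dax_dev, pgoff, nr_pages, DAX_RECOVERY_WRITE, &kaddr, NULL);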
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
clk_disable_unprepare() already checks for errors using IS_ERR_OR_NULL.
Remove the unneeded error check for g->clk.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
Mat Martineau says:
====================
mptcp: Updates for net-next
Three independent fixes/features from the MPTCP tree:
Patch 1 is a selftest workaround for older iproute2 packages.
Patch 2 removes superfluous locks that were added with recent MP_FAIL
patches.
Patch 3 adds support for the TCP_DEFER_ACCEPT sockopt.
====================
Link: https://lore.kernel.org/r/20220514002115.725976-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support this via passthrough to the underlying tcp listener socket.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/271
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 4293248c6704b854bf816aa1967e433402bee11c.
Additional locks are not needed, all the touched sections
are already under mptcp socket lock protection.
Fixes: 4293248c6704 ("mptcp: add data lock for sk timers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Old tc versions (iproute2 5.3) show actions in multiple lines, not a
single line. Then the following unexpected MP_FAIL selftest output
occurs:
file received by server has inverted byte at 169
./mptcp_join.sh: line 1277: [: [{"total acts":1},{"actions":[{"order":0 pedit ,"control_action":{"type":"pipe"}keys 1
index 1 ref 1 bind 1,"installed":0,"last_used":0
key #0 at 148: val ff000000 mask ffffffff
5: integer expression expected
001 Infinite map syn[ ok ] - synack[ ok ] - ack[ ok ]
sum[ ok ] - csum [ ok ]
ftx[ ok ] - failrx[ ok ]
rtx[ ok ] - rstrx [ ok ]
itx[ ok ] - infirx[ ok ]
ftx[ ok ] - failrx[ ok ] invert
This patch adds a 'grep' before 'sed' to fix this.
Fixes: b6e074e171bc ("selftests: mptcp: add infinite map testcase")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Two issues were observed in the ReST doc added by commit c3a0addefbde
("docs: ctucanfd: CTU CAN FD open-source IP core documentation.")
with Sphinx versions 2.4.4 and 4.5.0.
The plain "figure" directive broke "make pdfdocs" due to a missing
PDF figure. For conversion of SVG -> PDF to work, the "kernel-figure"
directive, which is an extension for kernel documentation, should
be used instead.
The directive of "code:: raw" causes a warning from both
"make htmldocs" and "make pdfdocs", which reads:
[...]/can/ctu/ctucanfd-driver.rst:75: WARNING: Pygments lexer name
'raw' is not known
A plain literal-block marker should suffice where no syntax
highlighting is intended.
Fix the issues by using a suitable directive and marker.
Fixes: c3a0addefbde ("docs: ctucanfd: CTU CAN FD open-source IP core documentation.")
Link: https://lore.kernel.org/all/5986752a-1c2a-5d64-f91d-58b1e6decd17@gmail.com
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: Martin Jerabek <martin.jerabek01@gmail.com>
Cc: Ondrej Ille <ondrej.ille@gmail.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Since commit
| 1f9234401ce0 ("dt-bindings: can: add can-controller.yaml")
there is a common CAN controller binding. Add this to the ctucanfd
binding.
Cc: Ondrej Ille <ondrej.ille@gmail.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Lock the regmap during the whole PHY register access routines in
rtl8366rb.
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220513213618.2742895-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
delta_ns is an s64, but it was being passed to ptp_ocp_adjtime_coarse()
as a u64. Also, it turns out that timespec64_add_ns() only handles
positive values, so perform the math with set_normalized_timespec().
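A sketch of the corrected math, assuming the timespec64-based helper variant:
  static int ptp_ocp_adjtime_coarse(struct ptp_ocp *bp, s64 delta_ns)
  {
          struct timespec64 ts;
          ...
          /* handles negative delta_ns, unlike timespec64_add_ns() */
          set_normalized_timespec64(&ts, ts.tv_sec, (s64)ts.tv_nsec + delta_ns);
          ...
  }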
Fixes: 90f8f4c0e3ce ("ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments")
Suggested-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/20220513225231.1412-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Renesas R-Car CAN FD Controller always uses two or more interrupts.
Make the interrupt-names properties a required property, to make it
easier to identify the individual interrupts.
Update the example accordingly.
Link: https://lore.kernel.org/all/a68e65955e0df4db60233d468f348203c2e7b940.1651512451.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
slcan does a manual check in slc_xmit() to verify if the skb is valid.
This check is incomplete; use can_dropped_invalid_skb() instead.
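The helper covers the whole check and drops the skb for us; a sketch of the xmit entry:
  static netdev_tx_t slc_xmit(struct sk_buff *skb, struct net_device *dev)
  {
          if (can_dropped_invalid_skb(dev, skb))
                  return NETDEV_TX_OK;
          /* ... queue the frame ... */
  }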
Link: https://lore.kernel.org/all/20220514141650.1109542-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The CTU CAN-FD IP core is only useful when used with one of the
corresponding PCI/PCIe or platform (FPGA, SoC) drivers, which depend on
PCI and OF, respectively.
Hence make the users select the core driver code, instead of letting
them depend on it. Keep the core code config option visible when
compile-testing, to maintain compile coverage.
Link: https://lore.kernel.org/all/887b7440446b6244a20a503cc6e8dc9258846706.1652104941.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|