Age | Commit message (Collapse) | Author |
|
Use virt_to_dma64() and friends to properly convert virtual to physical and
physical to virtual addresses so that "make C=1" does not generate any
warnings anymore.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use virt_to_dma32() and friends to properly convert virtual to physical and
physical to virtual addresses so that "make C=1" does not generate any
warnings anymore.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix virtual vs physical address confusion. This does not fix a bug since
virtual and physical address spaces are currently the same.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Only the last 12 bits of virtual / physical addresses are used when masking
with IDA_BLOCK_SIZE - 1. Given that the bits are the same regardless of
virtual or physical address, remove the virtual to physical address
conversion.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Adjust coding style, partially refactor code, and use kcalloc()
instead of kmalloc() to allocate an idaw array.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use virt_to_dma32() and friends to properly convert virtual to physical and
physical to virtual addresses so that "make C=1" does not generate any
warnings anymore.
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Only the last 12 bits of virtual / physical addresses are used when masking
with IDA_BLOCK_SIZE - 1. Given that the bits are the same regardless of
virtual or physical address, remove the virtual to physical address
conversion.
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Instead of converting virtual to physical addresses with the virt_to_dma*()
functions, use dma addresses as provided by DMA API and only add offsets to
these addresses. This makes sure that address conversion is only done by
the DMA API.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Change and use ccw_device_dma_zalloc() so it returns a virtual address like
before, which can be used to access data. However also pass a new dma32_t
pointer type handle, which correlates to the returned virtual address.
This pointer is used to directly pass/set the DMA handle as returned by the
DMA API.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix virtual vs physical address confusion and use new dma types and helper
functions to allow for type checking. This does not fix a bug since virtual
and physical address spaces are currently the same.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Change types of I/O structure members which contain physical addresses to
dma32_t and dma64_t bitwise types.
This allows to make use of sparse (aka "make C=1") to find incorrect usage
of physical addresses.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Introduce dma32_t and dma64_t bitwise types, which are supposed to be used
for 31 and 64 bit DMA capable addresses. This allows to use sparse (make
C=1) for type checking, so that incorrect usages can be easily found.
Also add a couple of helper functions which
- convert virtual to DMA addresses and vice versa
- allow for simple logical and arithmetic operations on DMA addresses
- convert DMA addresses to plain u32 and u64 values
All helper functions exist to avoid excessive casting in C code.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.
dax_direct_access() should receive a virtual kernel address in kaddr.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix IUCV_IPBUFLST-type buffers virtual vs physical address confusion.
This does not fix a bug since virtual and physical address spaces are
currently the same.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Tests with hot-plugging crytpo cards on KVM guests with debug
kernel build revealed an use after free for the load field of
the struct zcrypt_card. The reason was an incorrect reference
handling of the zcrypt card object which could lead to a free
of the zcrypt card object while it was still in use.
This is an example of the slab message:
kernel: 0x00000000885a7512-0x00000000885a7513 @offset=1298. First byte 0x68 instead of 0x6b
kernel: Allocated in zcrypt_card_alloc+0x36/0x70 [zcrypt] age=18046 cpu=3 pid=43
kernel: kmalloc_trace+0x3f2/0x470
kernel: zcrypt_card_alloc+0x36/0x70 [zcrypt]
kernel: zcrypt_cex4_card_probe+0x26/0x380 [zcrypt_cex4]
kernel: ap_device_probe+0x15c/0x290
kernel: really_probe+0xd2/0x468
kernel: driver_probe_device+0x40/0xf0
kernel: __device_attach_driver+0xc0/0x140
kernel: bus_for_each_drv+0x8c/0xd0
kernel: __device_attach+0x114/0x198
kernel: bus_probe_device+0xb4/0xc8
kernel: device_add+0x4d2/0x6e0
kernel: ap_scan_adapter+0x3d0/0x7c0
kernel: ap_scan_bus+0x5a/0x3b0
kernel: ap_scan_bus_wq_callback+0x40/0x60
kernel: process_one_work+0x26e/0x620
kernel: worker_thread+0x21c/0x440
kernel: Freed in zcrypt_card_put+0x54/0x80 [zcrypt] age=9024 cpu=3 pid=43
kernel: kfree+0x37e/0x418
kernel: zcrypt_card_put+0x54/0x80 [zcrypt]
kernel: ap_device_remove+0x4c/0xe0
kernel: device_release_driver_internal+0x1c4/0x270
kernel: bus_remove_device+0x100/0x188
kernel: device_del+0x164/0x3c0
kernel: device_unregister+0x30/0x90
kernel: ap_scan_adapter+0xc8/0x7c0
kernel: ap_scan_bus+0x5a/0x3b0
kernel: ap_scan_bus_wq_callback+0x40/0x60
kernel: process_one_work+0x26e/0x620
kernel: worker_thread+0x21c/0x440
kernel: kthread+0x150/0x168
kernel: __ret_from_fork+0x3c/0x58
kernel: ret_from_fork+0xa/0x30
kernel: Slab 0x00000372022169c0 objects=20 used=18 fp=0x00000000885a7c88 flags=0x3ffff00000000a00(workingset|slab|node=0|zone=1|lastcpupid=0x1ffff)
kernel: Object 0x00000000885a74b8 @offset=1208 fp=0x00000000885a7c88
kernel: Redzone 00000000885a74b0: bb bb bb bb bb bb bb bb ........
kernel: Object 00000000885a74b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
kernel: Object 00000000885a74c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
kernel: Object 00000000885a74d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
kernel: Object 00000000885a74e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
kernel: Object 00000000885a74f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
kernel: Object 00000000885a7508: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 68 4b 6b 6b 6b a5 kkkkkkkkkkhKkkk.
kernel: Redzone 00000000885a7518: bb bb bb bb bb bb bb bb ........
kernel: Padding 00000000885a756c: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZ
kernel: CPU: 0 PID: 387 Comm: systemd-udevd Not tainted 6.8.0-HF #2
kernel: Hardware name: IBM 3931 A01 704 (KVM/Linux)
kernel: Call Trace:
kernel: [<00000000ca5ab5b8>] dump_stack_lvl+0x90/0x120
kernel: [<00000000c99d78bc>] check_bytes_and_report+0x114/0x140
kernel: [<00000000c99d53cc>] check_object+0x334/0x3f8
kernel: [<00000000c99d820c>] alloc_debug_processing+0xc4/0x1f8
kernel: [<00000000c99d852e>] get_partial_node.part.0+0x1ee/0x3e0
kernel: [<00000000c99d94ec>] ___slab_alloc+0xaf4/0x13c8
kernel: [<00000000c99d9e38>] __slab_alloc.constprop.0+0x78/0xb8
kernel: [<00000000c99dc8dc>] __kmalloc+0x434/0x590
kernel: [<00000000c9b4c0ce>] ext4_htree_store_dirent+0x4e/0x1c0
kernel: [<00000000c9b908a2>] htree_dirblock_to_tree+0x17a/0x3f0
kernel: [<00000000c9b919dc>] ext4_htree_fill_tree+0x134/0x400
kernel: [<00000000c9b4b3d0>] ext4_dx_readdir+0x160/0x2f0
kernel: [<00000000c9b4bedc>] ext4_readdir+0x5f4/0x760
kernel: [<00000000c9a7efc4>] iterate_dir+0xb4/0x280
kernel: [<00000000c9a7f1ea>] __do_sys_getdents64+0x5a/0x120
kernel: [<00000000ca5d6946>] __do_syscall+0x256/0x310
kernel: [<00000000ca5eea10>] system_call+0x70/0x98
kernel: INFO: lockdep is turned off.
kernel: FIX kmalloc-96: Restoring Poison 0x00000000885a7512-0x00000000885a7513=0x6b
kernel: FIX kmalloc-96: Marking all objects used
The fix is simple: Before use of the queue not only the queue object
but also the card object needs to increase it's reference count
with a call to zcrypt_card_get(). Similar after use of the queue
not only the queue but also the card object's reference count is
decreased with zcrypt_card_put().
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Current average steal timer calculation produces volatile and inflated
values. The only user of this value is KVM so far and it uses that to
decide whether or not to yield the vCPU which is seeing steal time.
KVM compares average steal timer to a threshold and if the threshold
is past then it does not allow CPU polling and yields it to host, else
it keeps the CPU by polling.
Since KVM's steal time threshold is very low by default (%10) it most
likely is not effected much by the bloated average steal timer values
because the operating region is pretty small. However there might be
new users in the future who might rely on this number. Fix average
steal timer calculation by changing the formula from:
avg_steal_timer = avg_steal_timer / 2 + steal_timer;
to the following:
avg_steal_timer = (avg_steal_timer + steal_timer) / 2;
This ensures that avg_steal_timer is actually a naive average of steal
timer values. It now closely follows steal timer values but of course
in a smoother manner.
Fixes: 152e9b8676c6 ("s390/vtime: steal time exponential moving average")
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
As provided with commit cd4386a931b63 ("s390/cpcmd,vmcp: avoid GFP_DMA
allocations") the Diagnose Code 8 response buffer does not have to be
below 2GB.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use wake_up API instead of wake_up_interruptible, since
wait_event_timeout API is used for waiting on command completion.
Fixes: 1463f382f58d ("octeontx2-af: Add support for CGX link management")
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
smp_call_function always runs its callback in hard IRQ context, even on
PREEMPT_RT, where spinlocks can sleep. So we need to use a raw spinlock
for cgr_lock to ensure we aren't waiting on a sleeping task.
Although this bug has existed for a while, it was not apparent until
commit ef2a8d5478b9 ("net: dpaa: Adjust queue depth on rate change")
which invokes smp_call_function_single via qman_update_cgr_safe every
time a link goes up or down.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
CC: stable@vger.kernel.org
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Closes: https://lore.kernel.org/all/20230323153935.nofnjucqjqnz34ej@skbuf/
Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Closes: https://lore.kernel.org/linux-arm-kernel/87wmsyvclu.fsf@pengutronix.de/
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
smp_call_function_single disables IRQs when executing the callback. To
prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere.
This is already done by qman_update_cgr and qman_delete_cgr; fix the
other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
CC: stable@vger.kernel.org
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The runtime_pm handling seems to have been loosely inspired by the
cs32l41 driver, but in this case the get_noresume/put sequence is not
required.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Message-ID: <20240312161217.79510-1-pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Some HP laptops have received revisions that altered their board IDs
and therefore the current patches/quirks do not apply to them.
Specifically, for my Probook 440 G8, I have a board ID of 8a74.
It is necessary to add a line for that specific model.
Signed-off-by: Valentine Altair <faetalize@proton.me>
Cc: <stable@vger.kernel.org>
Message-ID: <kOqXRBcxkKt6m5kciSDCkGqMORZi_HB3ZVPTX5sD3W1pKxt83Pf-WiQ1V1pgKKI8pYr4oGvsujt3vk2zsCE-DDtnUADFG6NGBlS5N3U4xgA=@proton.me>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Pick up a parsing fix for the CDAT SSLBIS structure for v6.9.
|
|
Pick up support for injecting errors via ACPI EINJ into the CXL protocol
for v6.9.
|
|
Pick up support for CXL "HMEM reporting" for v6.9, i.e. build an HMAT
from CXL CDAT and PCIe switch information.
|
|
There exist card implementations with a CDAT table using a fixed size
buffer, but with entries filled in that do not fill the whole table
length size. Then, the last entry in the CDAT table may not mark the
end of the CDAT table buffer specified by the length field in the CDAT
header. It can be shorter with trailing unused (zero'ed) data. The
actual table length is determined while reading all CDAT entries of
the table with DOE.
If the table is greater than expected (containing zero'ed trailing
data), the CDAT parser fails with:
[ 48.691717] Malformed DSMAS table length: (24:0)
[ 48.702084] [CDAT:0x00] Invalid zero length
[ 48.711460] cxl_port endpoint1: Failed to parse CDAT: -22
In addition, a check of the table buffer length is missing to prevent
an out-of-bound access then parsing the CDAT table.
Hardening code against device returning borked table. Fix that by
providing an optional buffer length argument to
acpi_parse_entries_array() that can be used by cdat_table_parse() to
propagate the buffer size down to its users to check the buffer
length. This also prevents a possible out-of-bound access mentioned.
Add a check to warn about a malformed CDAT table length.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/ZdEnopFO0Tl3t2O1@rric.localdomain
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Reading the CDAT table using DOE requires a Table Access Response
Header in addition to the CDAT entry. In current implementation this
has caused offsets with sizeof(__le32) to the actual buffers. This led
to hardly readable code and even bugs. E.g., see fix of devm_kfree()
in read_cdat_data():
commit c65efe3685f5 ("cxl/cdat: Free correct buffer on checksum error")
Rework code to avoid calculations with sizeof(__le32). Introduce
struct cdat_doe_rsp for this which contains the Table Access Response
Header and a variable payload size for various data structures
afterwards to access the CDAT table and its CDAT Data Structures
without recalculating buffer offsets.
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Fan Ni <nifan.cxl@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240216155844.406996-3-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Trivial variable rename for the DOE mailbox handle from cdat_doe to
doe_mb. The variable name cdat_doe is too ambiguous, use doe_mb that
is commonly used for the mailbox.
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240216155844.406996-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The 'entry' pointer in cdat_sslbis_handler() is set to header +
sizeof(common header). However, the math missed the addition of the SSLBIS
main header. It should be header + sizeof(common header) + sizeof(*sslbis).
Use a defined struct for all the SSLBIS parts in order to avoid pointer
math errors.
The bug causes incorrect parsing of the SSLBIS table and introduces incorrect
performance values to the access_coordinates during the CXL access_coordinate
calculation path if there are CXL switches present in the topology.
The issue was found during testing of new code being added to add additional
checks for invalid CDAT values during CXL access_coordinate calculation. The
testing was done on qemu with a CXL topology including a CXL switch.
Fixes: 80aa780dda20 ("cxl: Add callback to parse the SSLBIS subtable from CDAT")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20240301210948.1298075-1-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Export CXL helper functions in einj-cxl.c for getting/injecting
available CXL protocol error types to sysfs under kernel/debug/cxl.
The kernel/debug/cxl/einj_types file will print the available CXL
protocol errors in the same format as the available_error_types
file provided by the einj module. The
kernel/debug/cxl/$dport_dev/einj_inject file is functionally the same
as the error_type and error_inject files provided by the EINJ module,
i.e.: writing an error type into $dport_dev/einj_inject will inject
said error type into the CXL dport represented by $dport_dev.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Link: https://lore.kernel.org/r/20240311142508.31717-4-Benjamin.Cheatham@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Update EINJ kernel document to include how to inject CXL protocol error
types, build the kernel to include CXL error types, and give an example
injection.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Link: https://lore.kernel.org/r/20240311142508.31717-5-Benjamin.Cheatham@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Move CXL protocol error types from einj.c (now einj-core.c) to einj-cxl.c.
einj-cxl.c implements the necessary handling for CXL protocol error
injection and exposes an API for the CXL core to use said functionality,
while also allowing the EINJ module to be built without CXL support.
Because CXL error types targeting CXL 1.0/1.1 ports require special
handling, only allow them to be injected through the new cxl debugfs
interface (next commit) and return an error when attempting to inject
through the legacy interface.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Link: https://lore.kernel.org/r/20240311142508.31717-3-Benjamin.Cheatham@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Change the EINJ module to install a platform device/driver on module
init and move the module init() and exit() functions to driver probe and
remove. This change allows the EINJ module to load regardless of whether
setting up EINJ succeeds, which allows dependent modules to still load
(i.e. the CXL core).
Since EINJ may no longer be initialized when the module loads, any
functions that are called from dependent/external modules should
safegaurd against the case EINJ didn't load.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Link: https://lore.kernel.org/r/20240311142508.31717-2-Benjamin.Cheatham@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
"Improve the behavior during panic. The issues were found when testing
the ongoing changes introducing atomic consoles and printk kthreads:
- pr_flush() has to wait for the last reserved record instead of the
last finalized one. Note that records are finalized in random order
when generated by more CPUs in parallel.
- Ignore non-finalized records during panic(). Messages printed on
panic-CPU are always finalized. Messages printed by other CPUs
might never be finalized when the CPUs get stopped.
- Block new printk() calls on non-panic CPUs completely. Backtraces
are printed before entering the panic mode. Later messages would
just mess information printed by the panic CPU.
- Do not take console_lock in console_flush_on_panic() at all. The
original code did try_lock()/console_unlock(). The unlock part
might cause a deadlock when panic() happened in a scheduler code.
- Fix conversion of 64-bit sequence number for 32-bit atomic
operations"
* tag 'printk-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
dump_stack: Do not get cpu_sync for panic CPU
panic: Flush kernel log buffer at the end
printk: Avoid non-panic CPUs writing to ringbuffer
printk: Disable passing console lock owner completely during panic()
printk: ringbuffer: Skip non-finalized records in panic
printk: Wait for all reserved records with pr_flush()
printk: ringbuffer: Cleanup reader terminology
printk: Add this_cpu_in_panic()
printk: For @suppress_panic_printk check for other CPU in panic
printk: ringbuffer: Clarify special lpos values
printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()
printk: Use prb_first_seq() as base for 32bit seq macros
printk: Adjust mapping for 32bit seq macros
printk: nbcon: Relocate 32bit seq macros
|
|
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own. But let's just rip off
the band-aid and get it over and done with. I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.
This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.
Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Freelist loading optimization (Chengming Zhou)
When the per-cpu slab is depleted and a new one loaded from the cpu
partial list, optimize the loading to avoid an irq enable/disable
cycle. This results in a 3.5% performance improvement on the "perf
bench sched messaging" test.
- Kernel boot parameters cleanup after SLAB removal (Xiongwei Song)
Due to two different main slab implementations we've had boot
parameters prefixed either slab_ and slub_ with some later becoming
an alias as both implementations gained the same functionality (i.e.
slab_nomerge vs slub_nomerge). In order to eventually get rid of the
implementation-specific names, the canonical and documented
parameters are now all prefixed slab_ and the slub_ variants become
deprecated but still working aliases.
- SLAB_ kmem_cache creation flags cleanup (Vlastimil Babka)
The flags had hardcoded #define values which became tedious and
error-prone when adding new ones. Assign the values via an enum that
takes care of providing unique bit numbers. Also deprecate
SLAB_MEM_SPREAD which was only used by SLAB, so it's a no-op since
SLAB removal. Assign it an explicit zero value. The removals of the
flag usage are handled independently in the respective subsystems,
with a final removal of any leftover usage planned for the next
release.
- Misc cleanups and fixes (Chengming Zhou, Xiaolei Wang, Zheng Yejian)
Includes removal of unused code or function parameters and a fix of a
memleak.
* tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: remove PARTIAL_NODE slab_state
mm, slab: remove memcg_from_slab_obj()
mm, slab: remove the corner case of inc_slabs_node()
mm/slab: Fix a kmemleak in kmem_cache_destroy()
mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE
mm, slab: use an enum to define SLAB_ cache creation flags
mm, slab: deprecate SLAB_MEM_SPREAD flag
mm, slab: fix the comment of cpu partial list
mm, slab: remove unused object_size parameter in kmem_cache_flags()
mm/slub: remove parameter 'flags' in create_kmalloc_caches()
mm/slub: remove unused parameter in next_freelist_entry()
mm/slub: remove full list manipulation for non-debug slab
mm/slub: directly load freelist from cpu partial slab in the likely case
mm/slub: make the description of slab_min_objects helpful in doc
mm/slub: replace slub_$params with slab_$params in slub.rst
mm/slub: unify all sl[au]b parameters with "slab_$param"
Documentation: kernel-parameters: remove noaliencache
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Promote IMA/EVM to a proper LSM
This is the bulk of the diffstat, and the source of all the changes
in the VFS code. Prior to the start of the LSM stacking work it was
important that IMA/EVM were separate from the rest of the LSMs,
complete with their own hooks, infrastructure, etc. as it was the
only way to enable IMA/EVM at the same time as a LSM.
However, now that the bulk of the LSM infrastructure supports
multiple simultaneous LSMs, we can simplify things greatly by
bringing IMA/EVM into the LSM infrastructure as proper LSMs. This is
something I've wanted to see happen for quite some time and Roberto
was kind enough to put in the work to make it happen.
- Use the LSM hook default values to simplify the call_int_hook() macro
Previously the call_int_hook() macro required callers to supply a
default return value, despite a default value being specified when
the LSM hook was defined.
This simplifies the macro by using the defined default return value
which makes life easier for callers and should also reduce the number
of return value bugs in the future (we've had a few pop up recently,
hence this work).
- Use the KMEM_CACHE() macro instead of kmem_cache_create()
The guidance appears to be to use the KMEM_CACHE() macro when
possible and there is no reason why we can't use the macro, so let's
use it.
- Fix a number of comment typos in the LSM hook comment blocks
Not much to say here, we fixed some questionable grammar decisions in
the LSM hook comment blocks.
* tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (28 commits)
cred: Use KMEM_CACHE() instead of kmem_cache_create()
lsm: use default hook return value in call_int_hook()
lsm: fix typos in security/security.c comment headers
integrity: Remove LSM
ima: Make it independent from 'integrity' LSM
evm: Make it independent from 'integrity' LSM
evm: Move to LSM infrastructure
ima: Move IMA-Appraisal to LSM infrastructure
ima: Move to LSM infrastructure
integrity: Move integrity_kernel_module_request() to IMA
security: Introduce key_post_create_or_update hook
security: Introduce inode_post_remove_acl hook
security: Introduce inode_post_set_acl hook
security: Introduce inode_post_create_tmpfile hook
security: Introduce path_post_mknod hook
security: Introduce file_release hook
security: Introduce file_post_open hook
security: Introduce inode_post_removexattr hook
security: Introduce inode_post_setattr hook
security: Align inode_setattr hook definition with EVM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Really only a few notable changes:
- Continue the coding style/formatting fixup work
This is the bulk of the diffstat in this pull request, with the
focus this time around being the security/selinux/ss directory.
We've only got a couple of files left to cleanup and once we're
done with that we can start enabling some automatic style
verfication and introduce tooling to help new folks format their
code correctly.
- Don't restrict xattr copy-up when SELinux policy is not loaded
This helps systems that use overlayfs, or similar filesystems,
preserve their SELinux labels during early boot when the SELinux
policy has yet to be loaded.
- Reduce the work we do during inode initialization time
This isn't likely to show up in any benchmark results, but we
removed an unnecessary SELinux object class lookup/calculation
during inode initialization.
- Correct the return values in selinux_socket_getpeersec_dgram()
We had some inconsistencies with respect to our return values
across selinux_socket_getpeersec_dgram() and
selinux_socket_getpeersec_stream().
This provides a more uniform set of error codes across the two
functions and should help make it easier for users to identify
the source of a failure"
* tag 'selinux-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (24 commits)
selinux: fix style issues in security/selinux/ss/symtab.c
selinux: fix style issues in security/selinux/ss/symtab.h
selinux: fix style issues in security/selinux/ss/sidtab.c
selinux: fix style issues in security/selinux/ss/sidtab.h
selinux: fix style issues in security/selinux/ss/services.h
selinux: fix style issues in security/selinux/ss/policydb.c
selinux: fix style issues in security/selinux/ss/policydb.h
selinux: fix style issues in security/selinux/ss/mls_types.h
selinux: fix style issues in security/selinux/ss/mls.c
selinux: fix style issues in security/selinux/ss/mls.h
selinux: fix style issues in security/selinux/ss/hashtab.c
selinux: fix style issues in security/selinux/ss/hashtab.h
selinux: fix style issues in security/selinux/ss/ebitmap.c
selinux: fix style issues in security/selinux/ss/ebitmap.h
selinux: fix style issues in security/selinux/ss/context.h
selinux: fix style issues in security/selinux/ss/context.h
selinux: fix style issues in security/selinux/ss/constraint.h
selinux: fix style issues in security/selinux/ss/conditional.c
selinux: fix style issues in security/selinux/ss/conditional.h
selinux: fix style issues in security/selinux/ss/avtab.c
...
|
|
Kuniyuki Iwashima says:
====================
tcp/rds: Fix use-after-free around kernel TCP reqsk.
syzkaller reported an warning of netns ref tracker for RDS TCP listener,
which commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
inet_twsk_purge()") fixed for per-netns ehash.
This series fixes the bug in the partial fix and fixes the reported bug
in the global ehash.
v4: https://lore.kernel.org/netdev/20240307232151.55963-1-kuniyu@amazon.com/
v3: https://lore.kernel.org/netdev/20240307224423.53315-1-kuniyu@amazon.com/
v2: https://lore.kernel.org/netdev/20240227011041.97375-1-kuniyu@amazon.com/
v1: https://lore.kernel.org/netdev/20240223172448.94084-1-kuniyu@amazon.com/
====================
Link: https://lore.kernel.org/r/20240308200122.64357-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller reported a warning of netns tracker [0] followed by KASAN
splat [1] and another ref tracker warning [1].
syzkaller could not find a repro, but in the log, the only suspicious
sequence was as follows:
18:26:22 executing program 1:
r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
...
connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async)
The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT.
So, the scenario would be:
1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
rds_tcp_listen_init().
2. syz-executor connect()s to it and creates a reqsk.
3. syz-executor exit()s immediately.
4. netns is dismantled. [0]
5. reqsk timer is fired, and UAF happens while freeing reqsk. [1]
6. listener is freed after RCU grace period. [2]
Basically, reqsk assumes that the listener guarantees netns safety
until all reqsk timers are expired by holding the listener's refcount.
However, this was not the case for kernel sockets.
Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
inet_twsk_purge()") fixed this issue only for per-netns ehash.
Let's apply the same fix for the global ehash.
[0]:
ref_tracker: net notrefcnt@0000000065449cc3 has 1/1 users at
sk_alloc (./include/net/net_namespace.h:337 net/core/sock.c:2146)
inet6_create (net/ipv6/af_inet6.c:192 net/ipv6/af_inet6.c:119)
__sock_create (net/socket.c:1572)
rds_tcp_listen_init (net/rds/tcp_listen.c:279)
rds_tcp_init_net (net/rds/tcp.c:577)
ops_init (net/core/net_namespace.c:137)
setup_net (net/core/net_namespace.c:340)
copy_net_ns (net/core/net_namespace.c:497)
create_new_namespaces (kernel/nsproxy.c:110)
unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
ksys_unshare (kernel/fork.c:3429)
__x64_sys_unshare (kernel/fork.c:3496)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
...
WARNING: CPU: 0 PID: 27 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
[1]:
BUG: KASAN: slab-use-after-free in inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966)
Read of size 8 at addr ffff88801b370400 by task swapper/0/0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
print_report (mm/kasan/report.c:378 mm/kasan/report.c:488)
kasan_report (mm/kasan/report.c:603)
inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966)
reqsk_timer_handler (net/ipv4/inet_connection_sock.c:979 net/ipv4/inet_connection_sock.c:1092)
call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701)
__run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2038)
run_timer_softirq (kernel/time/timer.c:2053)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
</IRQ>
Allocated by task 258 on cpu 0 at 83.612050s:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (mm/kasan/common.c:68)
__kasan_slab_alloc (mm/kasan/common.c:343)
kmem_cache_alloc (mm/slub.c:3813 mm/slub.c:3860 mm/slub.c:3867)
copy_net_ns (./include/linux/slab.h:701 net/core/net_namespace.c:421 net/core/net_namespace.c:480)
create_new_namespaces (kernel/nsproxy.c:110)
unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
ksys_unshare (kernel/fork.c:3429)
__x64_sys_unshare (kernel/fork.c:3496)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
Freed by task 27 on cpu 0 at 329.158864s:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (mm/kasan/common.c:68)
kasan_save_free_info (mm/kasan/generic.c:643)
__kasan_slab_free (mm/kasan/common.c:265)
kmem_cache_free (mm/slub.c:4299 mm/slub.c:4363)
cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:639)
process_one_work (kernel/workqueue.c:2638)
worker_thread (kernel/workqueue.c:2700 kernel/workqueue.c:2787)
kthread (kernel/kthread.c:388)
ret_from_fork (arch/x86/kernel/process.c:153)
ret_from_fork_asm (arch/x86/entry/entry_64.S:250)
The buggy address belongs to the object at ffff88801b370000
which belongs to the cache net_namespace of size 4352
The buggy address is located 1024 bytes inside of
freed 4352-byte region [ffff88801b370000, ffff88801b371100)
[2]:
WARNING: CPU: 0 PID: 95 at lib/ref_tracker.c:228 ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1))
Modules linked in:
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1))
...
Call Trace:
<IRQ>
__sk_destruct (./include/net/net_namespace.h:353 net/core/sock.c:2204)
rcu_core (./arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2165 kernel/rcu/tree.c:2433)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
</IRQ>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240308200122.64357-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet_twsk_purge() uses rcu to find TIME_WAIT and NEW_SYN_RECV
objects to purge.
These objects use SLAB_TYPESAFE_BY_RCU semantic and need special
care. We need to use refcount_inc_not_zero(&sk->sk_refcnt).
Reuse the existing correct logic I wrote for TIME_WAIT,
because both structures have common locations for
sk_state, sk_family, and netns pointer.
If after the refcount_inc_not_zero() the object fields longer match
the keys, use sock_gen_put(sk) to release the refcount.
Then we can call inet_twsk_deschedule_put() for TIME_WAIT,
inet_csk_reqsk_queue_drop_and_put() for NEW_SYN_RECV sockets,
with BH disabled.
Then we need to restart the loop because we had drop rcu_read_lock().
Fixes: 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()")
Link: https://lore.kernel.org/netdev/CANn89iLvFuuihCtt9PME2uS1WJATnf5fKjDToa1WzVnRzHnPfg@mail.gmail.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240308200122.64357-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 2beb81fbf0c01a62515a1bcef326168494ee2bd0.
While removing CONFIG_CRYPTO_STATS is a worthy goal, this also
removed unrelated infrastructure such as crypto_comp_alg_common.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Since atomic write way was changed to out-place-update, we should
prevent it on pinned files.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
{new,change}_curseg() may return error in some special cases,
error handling should be did in their callers, and this will also
facilitate subsequent error path expansion in {new,change}_curseg().
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There are some cases of f2fs_is_valid_blkaddr not handled as
ERROR_INVALID_BLKADDR,so unify the error handling about all of
f2fs_is_valid_blkaddr.
Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned
device") missed to remove pow2 check condition in init_blkz_info(),
fix it.
Fixes: 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned device")
Signed-off-by: Feng Song <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Below race case can cause data corruption:
Thread A GC thread
- gc_data_segment
- ra_data_block
- locked meta_inode page
- f2fs_inplace_write_data
- invalidate_mapping_pages
: fail to invalidate meta_inode page
due to lock failure or dirty|writeback
status
- f2fs_submit_page_bio
: write last dirty data to old blkaddr
- move_data_block
- load old data from meta_inode page
- f2fs_submit_page_write
: write old data to new blkaddr
Because invalidate_mapping_pages() will skip invalidating page which
has unclear status including locked, dirty, writeback and so on, so
we need to use truncate_inode_pages_range() instead of
invalidate_mapping_pages() to make sure meta_inode page will be dropped.
Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC")
Fixes: e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|