Age | Commit message (Collapse) | Author |
|
both syscall.c and helpers.c have the declaration of
__bpf_obj_drop_impl(), so just move it to a common header file.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231020133202.4043247-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For bpf_global_percpu_ma, the pointer passed to bpf_mem_free_rcu() is
allocated by kmalloc() and its size is fixed (16-bytes on x86-64). So
no matter which cache allocates the dynamic per-cpu area, on x86-64
cache[2] will always be used to free the per-cpu area.
Fix the unbalance by checking whether the bpf memory allocator is
per-cpu or not and use pcpu_alloc_size() instead of ksize() to
find the correct cache for per-cpu free.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231020133202.4043247-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce pcpu_alloc_size() to get the size of the dynamic per-cpu
area. It will be used by bpf memory allocator in the following patches.
BPF memory allocator maintains per-cpu area caches for multiple area
sizes and its free API only has the to-be-freed per-cpu pointer, so it
needs the size of dynamic per-cpu area to select the corresponding cache
when bpf program frees the dynamic per-cpu pointer.
Acked-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231020133202.4043247-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The comment for seq_buf_has_overflowed() says that an overflow condition is
marked by len == size, but that's not what the code is testing. Make the
comment match reality.
Link: https://lkml.kernel.org/r/87pm19kp0m.fsf@meer.lwn.net
Fixes: 8cd709ae7658a ("tracing: Have seq_buf use full buffer")
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
"In the raw NAND subsystem, the major fix prevents using cached reads
with devices not supporting it. There was two bug reports about this.
Apart from that, three drivers (pl353, arasan and marvell) could
sometimes hide page program failures due to their their own program
page helper not being fully compliant with the specification (many
drivers use the default helpers shared by the core). Adding a missing
check prevents these situation.
Finally, the Qualcomm driver had a broken error path.
In the SPI-NAND subsystem one Micron device used a wrong bitmak
reporting possibly corrupted ECC status.
Finally, the physmap-core got stripped from its map_rom fallback by
mistake, this feature is added back"
* tag 'mtd/fixes-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: Ensure the nand chip supports cached reads
mtd: rawnand: qcom: Unmap the right resource upon probe failure
mtd: rawnand: pl353: Ensure program page operations are successful
mtd: rawnand: arasan: Ensure program page operations are successful
mtd: spinand: micron: correct bitmask for ecc status
mtd: physmap-core: Restore map_rom fallback
mtd: rawnand: marvell: Ensure program page operations are successful
|
|
Modify the governor .throttle() callback definition so that it takes a
trip pointer instead of a trip index as its second argument, adjust the
governors accordingly and update the core code invoking .throttle().
This causes the governors to become independent of the representation
of the list of trips in the thermal zone structure.
This change is not expected to alter the general functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
To make seq_buf more lightweight as a string buf, move the readpos member
from seq_buf to its container, trace_seq. That puts the responsibility
of maintaining the readpos entirely in the tracing code. If some future
users want to package up the readpos with a seq_buf, we can define a
new struct then.
Link: https://lore.kernel.org/linux-trace-kernel/20231020033545.2587554-2-willy@infradead.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Commit 26c5334d344d ("ethtool: Add forced speed to supported link
modes maps") added a dependency between ethtool.h and linkmode.h.
The dependency in the opposite direction already exists so the
new code was inserted in an awkward place.
The reason for ethtool.h to include linkmode.h, is that
ethtool_forced_speed_maps_init() is a static inline helper.
That's not really necessary.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Provide access to MIIM PHY Control register (Reg. 31) through
ksz8_r_phy_ctrl() and ksz8_w_phy_ctrl() functions. Necessary for
upcoming micrel.c patch to address forced link mode configuration.
Closes: https://lore.kernel.org/oe-kbuild-all/202310112224.iYgvjBUy-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CONFIG_CPU_SRSO isn't dependent on CONFIG_CPU_UNRET_ENTRY (AMD
Retbleed), so the two features are independently configurable. Fix
several issues for the (presumably rare) case where CONFIG_CPU_SRSO is
enabled but CONFIG_CPU_UNRET_ENTRY isn't.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/299fb7740174d0f2335e91c58af0e9c242b4bac1.1693889988.git.jpoimboe@kernel.org
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
IIO: 1st set of new device support, features and cleanup for 6.7
Particularly great to see a resolver driver move out of staging via a
massive set of changes. Only took 13 years :)
One small patch added then reverted due to a report of test breakage
(ashai-kasei,ak8975: Drop deprecated enums.)
An immutable branch was used for some hid-senors changes in case
there was a need to take them into the HID tree as well.
New device support
-----------------
adi,hmc425a
- Add support for HMC540SLP3E broadband 4-bit digital attenuator.
kionix,kx022a
- Add support for the kx132-1211 accelerometer. Require significant
driver rework to enable this including add a chip type specific
structure to deal with the chip differences.
- Add support for the kx132acr-lbz accelerometer (subset of the kx022a
feature set).
lltc,ltc2309
- New driver for this 8 channel ADC.
microchip,mcp3911
- Add support for rest of mcp391x family of ADCs (there are various
differences beyond simple channel count variation.
Series includes some general driver cleanup.
microchip,mcp3564
- New driver for MCP3461, MCP3462, MCP3464, MCP3541, MCP3562, MCP3564
and their R variants of 16/24bit ADCs. A few minor fixed followed.
rohm,bu1390
- New driver for this pressure sensor.
Staging graduation
------------------
adi,ad1210 (after 13 or so years :)
- More or less a complete (step-wise) rewrite of this resolver driver
to bring it up to date with modern IIO standards. The fault signal
handling mapping to event channels was particularly complex and
significant part of the changes.
Features
--------
iio-core
- Add chromacity and color temperature channel types.
adi,ad7192
- Oversampling ratio control (called fast settling in datasheet).
adi,adis16475
- Add core support and then driver support for delta angle and delta
velocity channels. These are intended for summation to establish
angle and velocity changes over larger timescales. Fix was
needed for alignment after the temperature channel. Further fix
reduced set of devices for which the buffer support was applicable
as seems burst reads don't cover these on all devices.
hid-sensors-als
- Chromacity and color temperatures support including in amd sfh.
stx104
- Add support for counter subsystem to this multipurpose device.
ti,twl6030
- Add missing device tree binding description.
Clean up and minor fixes.
------------------------
treewide
- Drop some unused declarations across IIO.
- Make more use of device_get_match_data() instead of OF specific
approaches.
Similar cleanup to sets of drivers.
- Stop platform remove callbacks returning anything by using the
temporary remove_new() callback.
- Use i2c_get_match_data() to cope nicely with all types of ID table
entry.
- Use device_get_match_data() for various platform device to cope
with more types of firmware.
- Convert from enum to pointer in ID tables allowing use of
i2c_get_match_data().
- Fix sorting on some ID tables.
- Include specific string helper headers rather than simply string_helpers.h
docs
- Better description of the ordering requirements etc for
available_scan_masks.
tools
- Handle alignment of mixed sizes where the last element isn't the biggest
correctly. Seems that doesn't happen often!
adi,ad2s1210
- Lots of work from David Lechner on this driver including a few fixes
that are going with the rework to avoid slowing that down.
adi,ad4310
- Replace deprecated devm_clk_register()
adi,ad74413r
- Bring the channel function setting inline with the datasheet.
adi,ad7192
- Change to FIELD_PREP(), FIELD_GET().
- Calculate f_order from the sinc filter and chop filter states.
- Move more per chip config into data in struct ad7192_chip_info
- Cleanup unused parameter in channel macros.
adi,adf4350
- Make use of devm_* to simplify error handling for many of the setup
calls in probe() / tear down in remove() and error paths. Some more
work to be done on this one.
- Use dev_err_probe() for errors in probe() callback.
adi,adf4413
- Typo in function name prefix.
adi,adxl345
- Add channel scale to the chip type specific structure and drop
using a type field previously used for indirection.
asahi,ak8985
- Fix a mismatch introduced when switching from enum->pointers
in the match tables.
amlogic,meson
- Expand error logging during probe.
invensense,mpu6050
- Support level-shifter control. Whilst no one is sure exactly what this
is doing it is needed for some old boards.
- Document mount-matrix dt-binding.
mediatek,mt6577
- Use devm_clk_get_enabled() to replace open coded version and move
everything over to being device managed. Drop now empty remove()
callback. Fix follows to put the drvdata back.
- Use dev_err_probe() for error reporting in probe() callback.
memsic,mxc4005
- Add of_match_table.
microchip,mcp4725
- Move various chip specific data from being looked up by chip ID to
data in the chip type specific structure.
silicon-labs,si7005
- Add of_match_table and entry in trivial-devices.yaml
st,lsm6dsx
- Add missing mount-matrix dt binding documentation.
st,spear
- Use devm_clk_get_enabled() and some other devm calls to move everything
over to being device managed. Drop now empty remove() callback.
- Use dev_err_probe() to better handled deferred probing and tidy up
error reporting in probe() callback.
st,stm32-adc
- Add a bit of additional checking in probe() to protect against a NULL
pointer (no known path to trigger it today).
- Replace deprecated strncpy()
ti,ads1015
- Allow for edge triggers.
- Document interrupt in dt-bindings.
* tag 'iio-for-6.7a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (201 commits)
iio: Use device_get_match_data()
iio: adc: MCP3564: fix warn: unsigned '__x' is never less than zero.
dt-bindings: trivial-devices: add silabs,si7005
iio: si7005: Add device tree support
drivers: imu: adis16475.c: Remove scan index from delta channels
dt-bindings: iio: imu: st,lsm6dsx: add mount-matrix property
iio: resolver: ad2s1210: remove of_match_ptr()
iio: resolver: ad2s1210: remove DRV_NAME macro
iio: resolver: ad2s1210: move out of staging
staging: iio: resolver: ad2s1210: simplify code with guard(mutex)
staging: iio: resolver: ad2s1210: clear faults after soft reset
staging: iio: resolver: ad2s1210: refactor sample toggle
staging: iio: resolver: ad2s1210: remove fault attribute
staging: iio: resolver: ad2s1210: add label attribute support
staging: iio: resolver: ad2s1210: add register/fault support summary
staging: iio: resolver: ad2s1210: implement fault events
iio: event: add optional event label support
staging: iio: resolver: ad2s1210: rename DOS reset min/max attrs
staging: iio: resolver: ad2s1210: convert DOS mismatch threshold to event attr
staging: iio: resolver: ad2s1210: convert DOS overrange threshold to event attr
...
|
|
During hisilicon accelerator live migration operation. In order to
prevent the problem of EQ/AEQ interrupt loss. Migration driver will
trigger an EQ/AEQ doorbell at the end of the migration.
This operation may cause double interruption of EQ/AEQ events.
To ensure that the EQ/AEQ interrupt processing function is normal.
The interrupt handling functionality of EQ/AEQ needs to be updated.
Used to handle repeated interrupts event.
Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Removes support for sha1 signed kernel modules, importing sha1 signed
x.509 certificates.
rsa-pkcs1pad keeps sha1 padding support, which seems to be used by
virtio driver.
sha1 remains available as there are many drivers and subsystems using
it. Note only hmac(sha1) with secret keys remains cryptographically
secure.
In the kernel there are filesystems, IMA, tpm/pcr that appear to be
using sha1. Maybe they can all start to be slowly upgraded to
something else i.e. blake3, ParallelHash, SHAKE256 as needed.
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Allow for the declaration of variables that trigger kvfree() when they
go out of scope. The check for NULL and call to kvfree() can be elided
by the compiler in most cases, otherwise without the NULL check an
unnecessary call to kvfree() may be emitted. Peter proposed a comment
for this detail [1].
Link: http://lore.kernel.org/r/20230816103102.GF980931@hirez.programming.kicks-ass.net [1]
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Pankaj Gupta <pankaj.gupta@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
One of the common operations of a TSM (Trusted Security Module) is to
provide a way for a TVM (confidential computing guest execution
environment) to take a measurement of its launch state, sign it and
submit it to a verifying party. Upon successful attestation that
verifies the integrity of the TVM additional secrets may be deployed.
The concept is common across TSMs, but the implementations are
unfortunately vendor specific. While the industry grapples with a common
definition of this attestation format [1], Linux need not make this
problem worse by defining a new ABI per TSM that wants to perform a
similar operation. The current momentum has been to invent new ioctl-ABI
per TSM per function which at best is an abdication of the kernel's
responsibility to make common infrastructure concepts share common ABI.
The proposal, targeted to conceptually work with TDX, SEV-SNP, COVE if
not more, is to define a configfs interface to retrieve the TSM-specific
blob.
report=/sys/kernel/config/tsm/report/report0
mkdir $report
dd if=binary_userdata_plus_nonce > $report/inblob
hexdump $report/outblob
This approach later allows for the standardization of the attestation
blob format without needing to invent a new ABI. Once standardization
happens the standard format can be emitted by $report/outblob and
indicated by $report/provider, or a new attribute like
"$report/tcg_coco_report" can emit the standard format alongside the
vendor format.
Review of previous iterations of this interface identified that there is
a need to scale report generation for multiple container environments
[2]. Configfs enables a model where each container can bind mount one or
more report generation item instances. Still, within a container only a
single thread can be manipulating a given configuration instance at a
time. A 'generation' count is provided to detect conflicts between
multiple threads racing to configure a report instance.
The SEV-SNP concepts of "extended reports" and "privilege levels" are
optionally enabled by selecting 'tsm_report_ext_type' at register_tsm()
time. The expectation is that those concepts are generic enough that
they may be adopted by other TSM implementations. In other words,
configfs-tsm aims to address a superset of TSM specific functionality
with a common ABI where attributes may appear, or not appear, based on
the set of concepts the implementation supports.
Link: http://lore.kernel.org/r/64961c3baf8ce_142af829436@dwillia2-xfh.jf.intel.com.notmuch [1]
Link: http://lore.kernel.org/r/57f3a05e-8fcd-4656-beea-56bb8365ae64@linux.microsoft.com [2]
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Samuel Ortiz <sameo@rivosinc.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
css_iter and task_iter should be used in rcu section. Specifically, in
sleepable progs explicit bpf_rcu_read_lock() is needed before use these
iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
use them directly.
This patch adds a new a KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
bpf_iter_css_new. It means the kfunc should be used in RCU CS. We check
whether we are in rcu cs before we want to invoke this kfunc. If the rcu
protection is guaranteed, we would let st->type = PTR_TO_STACK | MEM_RCU.
Once user do rcu_unlock during the iteration, state MEM_RCU of regs would
be cleared. is_iter_reg_valid_init() will reject if reg->type is UNTRUSTED.
It is worth noting that currently, bpf_rcu_read_unlock does not
clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg
only considers STACK_SPILL. This patch also let bpf_for_each_spilled_reg
search STACK_ITER.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-6-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch makes some preparations for using css_task_iter_*() in BPF
Program.
1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to
use them. Convert them to enum so bpf prog can take them from vmlinux.h.
2. In the next patch we will add css_task_iter_*() in common kfuncs which
is not safe. Since css_task_iter_*() does spin_unlock_irq() which might
screw up irq flags depending on the context where bpf prog is running.
So we should use irqsave/irqrestore here and the switching is harmless.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-2-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Create a new flag to track if the operation is running compat mode.
This basically check the context->compat and pass it to the issue_flags,
so, it could be queried later in the callbacks.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231016134750.1381153-6-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Split __sys_getsockopt() into two functions by removing the core
logic into a sub-function (do_sock_getsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_getsockopt() will be called by io_uring getsockopt() command
operation in the following patch.
The same was done for the setsockopt pair.
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-5-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Cross-merge networking fixes after downstream PR.
net/mac80211/key.c
02e0e426a2fb ("wifi: mac80211: fix error path key leak")
2a8b665e6bcc ("wifi: mac80211: remove key_mtx")
7d6904bf26b9 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/
Adjacent changes:
drivers/net/ethernet/ti/Kconfig
a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object")
98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in setsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-3-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in getsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, netfilter, WiFi.
Feels like an up-tick in regression fixes, mostly for older releases.
The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
fairly clear-cut user reported regressions. The mlx5 DMA bug was
causing strife for 390x folks. The fixes themselves are not
particularly scary, tho. No open investigations / outstanding reports
at the time of writing.
Current release - regressions:
- eth: mlx5: perform DMA operations in the right locations, make
devices usable on s390x, again
- sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
curve, previous fix of rejecting invalid config broke some scripts
- rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
- revert "ethtool: Fix mod state of verbose no_mask bitset", needs
more work
Current release - new code bugs:
- tcp: fix listen() warning with v4-mapped-v6 address
Previous releases - regressions:
- tcp: allow tcp_disconnect() again when threads are waiting, it was
denied to plug a constant source of bugs but turns out .NET depends
on it
- eth: mlx5: fix double-free if buffer refill fails under OOM
- revert "net: wwan: iosm: enable runtime pm support for 7560", it's
causing regressions and the WWAN team at Intel disappeared
- tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
single skb, fix single-stream perf regression on some devices
Previous releases - always broken:
- Bluetooth:
- fix issues in legacy BR/EDR PIN code pairing
- correctly bounds check and pad HCI_MON_NEW_INDEX name
- netfilter:
- more fixes / follow ups for the large "commit protocol" rework,
which went in as a fix to 6.5
- fix null-derefs on netlink attrs which user may not pass in
- tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
Debian for keeping HZ=250 alive)
- net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
letting frankenstein UDP super-frames from getting into the stack
- net: fix interface altnames when ifc moves to a new namespace
- eth: qed: fix the size of the RX buffers
- mptcp: avoid sending RST when closing the initial subflow"
* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
Revert "ethtool: Fix mod state of verbose no_mask bitset"
selftests: mptcp: join: no RST when rm subflow/addr
mptcp: avoid sending RST when closing the initial subflow
mptcp: more conservative check for zero probes
tcp: check mptcp-level constraints for backlog coalescing
selftests: mptcp: join: correctly check for no RST
net: ti: icssg-prueth: Fix r30 CMDs bitmasks
selftests: net: add very basic test for netdev names and namespaces
net: move altnames together with the netdevice
net: avoid UAF on deleted altname
net: check for altname conflicts when changing netdev's netns
net: fix ifname in netlink ntf during netns move
net: ethernet: ti: Fix mixed module-builtin object
net: phy: bcm7xxx: Add missing 16nm EPHY statistics
ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
tcp_bpf: properly release resources on error paths
net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
net: mdio-mux: fix C45 access returning -EIO after API change
tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
octeon_ep: update BQL sent bytes before ringing doorbell
...
|
|
This patch adds genradix_peek_prev(), genradix_iter_rewind(), and
genradix_for_each_reverse(), for iterating backwards over a generic
radix tree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we started spreading new inode numbers throughout most of the 64
bit inode space, that triggered some corner case bugs, in particular
some integer overflows related to the radix tree code. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
|
Factor out a new helper, which returns the number of events outstanding.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Like wait_event() - except, because it uses closures and closure
waitlists it doesn't have the restriction on modifying task state inside
the condition check, like wait_event() does.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Coly Li <colyli@suse.de>
|
|
Prep work for bcachefs - being a fork of bcache it also uses closures
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fix from Christian Brauner:
"An openat() call from io_uring triggering an audit call can apparently
cause the refcount of struct filename to be incremented from multiple
threads concurrently during async execution, triggering a refcount
underflow and hitting a BUG_ON(). That bug has been lurking around
since at least v5.16 apparently.
Switch to an atomic counter to fix that. The underflow check is
downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
remove that check altogether tbh"
* tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
audit,io_uring: io_uring openat triggers audit reference count underflow
|
|
We currently have napi_if_scheduled_mark_missed that can be used to
check if napi is scheduled but that does more thing than simply checking
it and return a bool. Some driver already implement custom function to
check if napi is scheduled.
Drop these custom function and introduce napi_is_scheduled that simply
check if napi is scheduled atomically.
Update any driver and code that implement a similar check and instead
use this new helper.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
plat_stmmacenet_data::ext_snapshot_num
Do not store bitmask for enabling AUX_SNAPSHOT0. The previous commit
("net: stmmac: fix PPS capture input index") takes care of calculating
the proper bit mask from the request data's extts.index field, which is
0 if not explicitly specified otherwise.
Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make IS_POSIXACL() return false if POSIX ACL support is disabled.
Never skip applying the umask in namei.c and never bother to do any
ACL specific checks if the filesystem falsely indicates it has ACLs
enabled when the feature is completely disabled in the kernel.
This fixes a problem where the umask is always ignored in the NFS
client when compiled without CONFIG_FS_POSIX_ACL. This is a 4 year
old regression caused by commit 013cdf1088d723 which itself was not
completely wrong, but failed to consider all the side effects by
misdesigned VFS code.
Prior to that commit, there were two places where the umask could be
applied, for example when creating a directory:
1. in the VFS layer in SYSCALL_DEFINE3(mkdirat), but only if
!IS_POSIXACL()
2. again (unconditionally) in nfs3_proc_mkdir()
The first one does not apply, because even without
CONFIG_FS_POSIX_ACL, the NFS client sets SB_POSIXACL in
nfs_fill_super().
After that commit, (2.) was replaced by:
2b. in posix_acl_create(), called by nfs3_proc_mkdir()
There's one branch in posix_acl_create() which applies the umask;
however, without CONFIG_FS_POSIX_ACL, posix_acl_create() is an empty
dummy function which does not apply the umask.
The approach chosen by this patch is to make IS_POSIXACL() always
return false when POSIX ACL support is disabled, so the umask always
gets applied by the VFS layer. This is consistent with the (regular)
behavior of posix_acl_create(): that function returns early if
IS_POSIXACL() is false, before applying the umask.
Therefore, posix_acl_create() is responsible for applying the umask if
there is ACL support enabled in the file system (SB_POSIXACL), and the
VFS layer is responsible for all other cases (no SB_POSIXACL or no
CONFIG_FS_POSIX_ACL).
Signed-off-by: Max Kellermann <mk@cm4all.com>
Link: https://lore.kernel.org/r/151603744662.29035.4910161264124875658.stgit@rabbit.intern.cm-ag
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
A backing file struct stores two path's, one "real" path that is referring
to f_inode and one "fake" path, which should be displayed to users in
/proc/<pid>/maps.
There is a lot more potential code that needs to know the "real" path, then
code that needs to know the "fake" path.
Instead of code having to request the "real" path with file_real_path(),
store the "real" path in f_path and require code that needs to know the
"fake" path request it with file_user_path().
Replace the file_real_path() helper with a simple const accessor f_path().
After this change, file_dentry() is not expected to observe any files
with overlayfs f_path and real f_inode, so the call to ->d_real() should
not be needed. Leave the ->d_real() call for now and add an assertion
in ovl_d_real() to catch if we made wrong assumptions.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegtt48eXhhjDFA1ojcHPNKj3Go6joryCPtEFAKpocyBsnw@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231009153712.1566422-4-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Overlayfs uses backing files with "fake" overlayfs f_path and "real"
underlying f_inode, in order to use underlying inode aops for mapped
files and to display the overlayfs path in /proc/<pid>/maps.
In preparation for storing the overlayfs "fake" path instead of the
underlying "real" path in struct backing_file, define a noop helper
file_user_path() that returns f_path for now.
Use the new helper in procfs and kernel logs whenever a path of a
mapped file is displayed to users.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231009153712.1566422-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20231004111916.728135-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.
The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.
With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.
In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().
In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.
Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.
Cc: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct watch_filter.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: David Howells <dhowells@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Siddh Raman Pant <code@siddh.me>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qian Cai <cai@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Message-Id: <20230922175407.work.754-kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This declutters the code by reducing the number of #ifdefs and makes
the watch_queue checks simpler. This has no runtime effect; the
machine code is identical.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-2-max.kellermann@ionos.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This has no effect on 64 bit because there are 10 32-bit integers
surrounding the two bools, but on 32 bit architectures, this reduces
the struct size by 4 bytes by merging the two bools into one word.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-1-max.kellermann@ionos.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
SB_POSIXACL must be set when a filesystem supports POSIX ACLs, but NFSv4
also sets this flag to prevent the VFS from applying the umask on
newly-created files. NFSv4 doesn't support POSIX ACLs however, which
causes confusion when other subsystems try to test for them.
Add a new SB_I_NOUMASK flag that allows filesystems to opt-in to umask
stripping without advertising support for POSIX ACLs. Set the new flag
on NFSv4 instead of SB_POSIXACL.
Also, move mode_strip_umask to namei.h and convert init_mknod and
init_mkdir to use it.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230911-acl-fix-v3-1-b25315333f6c@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There are no users with a vlynq_driver in the Kernel tree. Also, only
the AR7 platform ever initialized a VLYNQ bus, but AR7 is going to be
removed from the Kernel. OpenWRT had some out-of-tree drivers which they
probably intended to upport, but AR7 devices are even there not
supported anymore because they are "stuck with Kernel
3.18" [1]. This code can go.
[1] https://openwrt.org/docs/techref/targets/ar7
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
Because group consistency is non-atomic between parent (filedesc) and children
(inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum
non-matching counter groups -- with non-sensical results.
Add group_generation to distinguish the case where a parent group removes and
adds an event and thus has the same number, but a different configuration of
events as inherited groups.
This became a problem when commit fa8c269353d5 ("perf/core: Invert
perf_read_group() loops") flipped the order of child_list and sibling_list.
Previously it would iterate the group (sibling_list) first, and for each
sibling traverse the child_list. In this order, only the group composition of
the parent is relevant. By flipping the order the group composition of the
child (inherited) events becomes an issue and the mis-match in group
composition becomes evident.
That said; even prior to this commit, while reading of a group that is not
equally inherited was not broken, it still made no sense.
(Ab)use ECHILD as error return to indicate issues with child process group
composition.
Fixes: fa8c269353d5 ("perf/core: Invert perf_read_group() loops")
Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231018115654.GK33217@noisy.programming.kicks-ass.net
|
|
The vmlinux decompressor code intentionally has only a limited set of
included header files, but this started running into a build failure
because of the bitmap logic needing linux/errno.h:
In file included from include/linux/cpumask.h:12,
from include/linux/mm_types_task.h:14,
from include/linux/mm_types.h:5,
from include/linux/buildid.h:5,
from include/linux/module.h:14,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:39,
from arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:10,
from arch/arm/boot/compressed/decompress.c:60:
include/linux/bitmap.h: In function 'bitmap_allocate_region':
include/linux/bitmap.h:527:25: error: 'EBUSY' undeclared (first use in this function)
527 | return -EBUSY;
| ^~~~~
include/linux/bitmap.h:527:25: note: each undeclared identifier is reported only once for each function it appears in
include/linux/bitmap.h: In function 'bitmap_find_free_region':
include/linux/bitmap.h:554:17: error: 'ENOMEM' undeclared (first use in this function)
554 | return -ENOMEM;
| ^~~~~~
This is easily avoided by changing linux/buildid.h to no longer depend on
linux/mm_types.h, a header that pulls in a huge number of indirect dependencies.
Fixes: b9c957f554442 ("bitmap: move bitmap_*_region() functions to bitmap.h")
Fixes: bd7525dacd7e2 ("bpf: Move stack_map_get_build_id into lib")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
|
Arnd noticed we have a case where a shorter source string is being copied
into a destination byte array, but this results in a strnlen() call that
exceeds the size of the source. This is seen with -Wstringop-overread:
In file included from ../include/linux/uuid.h:11,
from ../include/linux/mod_devicetable.h:14,
from ../include/linux/cpufeature.h:12,
from ../arch/x86/coco/tdx/tdx.c:7:
../arch/x86/coco/tdx/tdx.c: In function 'tdx_panic.constprop':
../include/linux/string.h:284:9: error: 'strnlen' specified bound 64 exceeds source size 60 [-Werror=stringop-overread]
284 | memcpy_and_pad(dest, _dest_len, src, strnlen(src, _dest_len), pad); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../arch/x86/coco/tdx/tdx.c:124:9: note: in expansion of macro 'strtomem_pad'
124 | strtomem_pad(message.str, msg, '\0');
| ^~~~~~~~~~~~
Use the smaller of the two buffer sizes when calling strnlen(). When
src length is unknown (SIZE_MAX), it is adjusted to use dest length,
which is what the original code did.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: dfbafa70bde2 ("string: Introduce strtomem() and strtomem_pad()")
Tested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Prior to f747e6667ebb2 __is_constexpr() was in its only user minmax.h.
That commit moved it to const.h - but that file just defines ULL(x) and
UL(x) so that constants can be defined for .S and .c files.
So apart from the word 'const' it wasn't really a good location. Instead
move the definition to compiler.h just before the similar
is_signed_type() and is_unsigned_type().
This may not be a good long-term home, but the three definitions belong
together.
Link: https://lkml.kernel.org/r/2a6680bbe2e84459816a113730426782@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
signed constants
Allow (for example) min(unsigned_var, 20).
The opposite min(signed_var, 20u) is still errored.
Since a comparison between signed and unsigned never makes the unsigned
value negative it is only necessary to adjust the __types_ok() test.
Link: https://lkml.kernel.org/r/633b64e2f39e46bb8234809c5595b8c7@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since 'unsigned char/short' get promoted to 'signed int' it is safe to
compare them against an 'int' value.
Link: https://lkml.kernel.org/r/8732ef5f809c47c28a7be47c938b28d4@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove the extra indentation and align continuation markers.
Link: https://lkml.kernel.org/r/bed41317a05c498ea0209eafbcab45a5@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The type-check in min()/max() is there to stop unexpected results if a
negative value gets converted to a large unsigned value. However it also
rejects 'unsigned int' v 'unsigned long' compares which are common and
never problematc.
Replace the 'same type' check with a 'same signedness' check.
The new test isn't itself a compile time error, so use static_assert() to
report the error and give a meaningful error message.
Due to the way builtin_choose_expr() works detecting the error in the
'non-constant' side (where static_assert() can be used) also detects
errors when the arguments are constant.
Link: https://lkml.kernel.org/r/fe7e6c542e094bfca655abcd323c1c98@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "minmax: Relax type checks in min() and max()", v4.
The min() (etc) functions in minmax.h require that the arguments have
exactly the same types.
However when the type check fails, rather than look at the types and fix
the type of a variable/constant, everyone seems to jump on min_t(). In
reality min_t() ought to be rare - when something unusual is being done,
not normality.
The orginal min() (added in 2.4.9) replaced several inline functions and
included the type - so matched the implicit casting of the function call.
This was renamed min_t() in 2.4.10 and the current min() added. There is
no actual indication that the conversion of negatve values to large
unsigned values has ever been an actual problem.
A quick grep shows 5734 min() and 4597 min_t(). Having the casts on
almost half of the calls shows that something is clearly wrong.
If the wrong type is picked (and it is far too easy to pick the type of
the result instead of the larger input) then significant bits can get
discarded.
Pretty much the worst example is in the derived clamp_val(), consider:
unsigned char x = 200u;
y = clamp_val(x, 10u, 300u);
I also suspect that many of the min_t(u16, ...) are actually wrong. For
example copy_data() in printk_ringbuffer.c contains:
data_size = min_t(u16, buf_size, len);
Here buf_size is 'unsigned int' and len 'u16', pass a 64k buffer (can you
prove that doesn't happen?) and no data is returned. Apparantly it did -
and has since been fixed.
The only reason that most of the min_t() are 'fine' is that pretty much
all the values in the kernel are between 0 and INT_MAX.
Patch 1 adds umin(), this uses integer promotions to convert both
arguments to 'unsigned long long'. It can be used to compare a signed
type that is known to contain a non-negative value with an unsigned type.
The compiler typically optimises it all away. Added first so that it can
be referred to in patch 2.
Patch 2 replaces the 'same type' check with a 'same signedness' one. This
makes min(unsigned_int_var, sizeof()) be ok. The error message is also
improved and will contain the expanded form of both arguments (useful for
seeing how constants are defined).
Patch 3 just fixes some whitespace.
Patch 4 allows comparisons of 'unsigned char' and 'unsigned short' to
signed types. The integer promotion rules convert them both to 'signed
int' prior to the comparison so they can never cause a negative value be
converted to a large positive one.
Patch 5 (rewritted for v4) allows comparisons of unsigned values against
non-negative constant integer expressions. This makes
min(unsigned_int_var, 4) be ok.
The only common case that is still errored is the comparison of signed
values against unsigned constant integer expressions below __INT_MAX__.
Typcally min(int_val, sizeof (foo)), the real fix for this is casting the
constant: min(int_var, (int)sizeof (foo)).
With all the patches applied pretty much all the min_t() could be replaced
by min(), and most of the rest by umin(). However they all need careful
inspection due to code like:
sz = min_t(unsigned char, sz - 1, LIM - 1) + 1;
which converts 0 to LIM.
This patch (of 6):
umin() and umax() can be used when min()/max() errors a signed v unsigned
compare when the signed value is known to be non-negative.
Unlike min_t(some_unsigned_type, a, b) umin() will never mask off high
bits if an inappropriate type is selected.
The '+ 0u + 0ul + 0ull' may look strange.
The '+ 0u' is needed for 'signed int' on 64bit systems.
The '+ 0ul' is needed for 'signed long' on 32bit systems.
The '+ 0ull' is needed for 'signed long long'.
Link: https://lkml.kernel.org/r/b97faef60ad24922b530241c5d7c933c@AcuMS.aculab.com
Link: https://lkml.kernel.org/r/41d93ca827a248698ec64bf57e0c05a5@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|