Age | Commit message (Collapse) | Author |
|
RT_TOS() from include/uapi/linux/in_route.h is defined using
IPTOS_TOS_MASK from include/uapi/linux/ip.h. This is problematic for
files such as include/net/ip_fib.h that want to use RT_TOS() as without
including both header files kernel compilation fails:
In file included from ./include/net/ip_fib.h:25,
from ./include/net/route.h:27,
from ./include/net/lwtunnel.h:9,
from net/core/dst.c:24:
./include/net/ip_fib.h: In function ‘fib_dscp_masked_match’:
./include/uapi/linux/in_route.h:31:32: error: ‘IPTOS_TOS_MASK’ undeclared (first use in this function)
31 | #define RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~~~~~~
./include/net/ip_fib.h:440:45: note: in expansion of macro ‘RT_TOS’
440 | return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));
Therefore, cited commit changed linux/in_route.h to include linux/ip.h.
However, as reported by David, this breaks iproute2 compilation due
overlapping definitions between linux/ip.h and
/usr/include/netinet/ip.h:
In file included from ../include/uapi/linux/in_route.h:5,
from iproute.c:19:
../include/uapi/linux/ip.h:25:9: warning: "IPTOS_TOS" redefined
25 | #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~
In file included from iproute.c:17:
/usr/include/netinet/ip.h:222:9: note: this is the location of the previous definition
222 | #define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)
Fix by changing include/net/ip_fib.h to include linux/ip.h. Note that
usage of RT_TOS() should not spread further in the kernel due to recent
work in this area.
Fixes: 1fa3314c14c6 ("ipv4: Centralize TOS matching")
Reported-by: David Ahern <dsahern@kernel.org>
Closes: https://lore.kernel.org/netdev/2f5146ff-507d-4cab-a195-b28c0c9e654e@kernel.org/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20240903133554.2807343-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows managing
LEDs in storage enclosures. NPEM is indication oriented and it does not
give direct access to LEDs. Although each indication *could* represent an
individual LED, multiple indications could also be represented as a single,
multi-color LED or a single LED blinking in a specific interval. The
specification leaves that open.
Each enabled indication (capability register bit on) is represented as a
ledclass_dev which can be controlled through sysfs. For every ledclass
device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0).
This corresponds to the NPEM control register (Indication bit on/off).
Ledclass devices appear in sysfs as child devices (subdirectory) of PCI
device which has an NPEM Extended Capability and indication is enabled in
NPEM capability register. For example, these are LEDs created for pcieport
"10000:02:05.0" on my setup:
leds/
├── 10000:02:05.0:enclosure:fail
├── 10000:02:05.0:enclosure:locate
├── 10000:02:05.0:enclosure:ok
└── 10000:02:05.0:enclosure:rebuild
They can be also found in "/sys/class/leds" directory. The parent PCIe
device domain/bus/device/function address is used to guarantee uniqueness
across leds subsystem.
To enable/disable a "fail" indication, the "brightness" file can be edited:
echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness
echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness
PCIe r6.1, sec 7.9.19.2 defines the possible indications.
Multiple indications for same parent PCIe device can conflict and hardware
may update them when processing new request. To avoid issues, driver
refresh all indications by reading back control register.
This driver expects to be the exclusive NPEM extended capability manager.
It waits up to 1 second after imposing new request, it doesn't verify if
controller is busy before write, and it assumes the mutex lock gives
protection from concurrent updates.
If _DSM LED management is available, we assume the platform may be using
NPEM for its own purposes (see PCI Firmware Spec r3.3 sec 4.7), so the
driver does not use NPEM. A future patch will add _DSM support; an info
message notes whether NPEM or _DSM is being used.
NPEM is a PCIe extended capability so it should be registered in
pcie_init_capabilities() but it is not possible due to LED dependency. The
parent pci_device must be added earlier for led_classdev_register() to be
successful. NPEM does not require configuration on kernel side, so it is
safe to register LED devices later.
Link: https://lore.kernel.org/r/20240904104848.23480-3-mariusz.tkaczyk@linux.intel.com
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Pull bpf/master to receive baebe9aaba1e ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation for
the DSQ iterator patchset.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Now we have everything in place and we can allow idmapped mounts
by setting the FS_ALLOW_IDMAP flag. Notice that real availability
of idmapped mounts will depend on the fuse daemon. Fuse daemon
have to set FUSE_ALLOW_IDMAP flag in the FUSE_INIT reply.
To discuss:
- we enable idmapped mounts support only if "default_permissions" mode is
enabled, because otherwise we would need to deal with UID/GID mappings in
the userspace side OR provide the userspace with idmapped
req->in.h.uid/req->in.h.gid values which is not something that we probably
want to. Idmapped mounts philosophy is not about faking caller uid/gid.
Some extra links and examples:
- libfuse support
https://github.com/mihalicyn/libfuse/commits/idmap_support
- fuse-overlayfs support:
https://github.com/mihalicyn/fuse-overlayfs/commits/idmap_support
- cephfs-fuse conversion example
https://github.com/mihalicyn/ceph/commits/fuse_idmap
- glusterfs conversion example
https://github.com/mihalicyn/glusterfs/commits/fuse_idmap
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add some preparational changes in fuse_get_req/fuse_force_creds
to handle idmappings.
Miklos suggested [1], [2] to change the meaning of in.h.uid/in.h.gid
fields when daemon declares support for idmapped mounts. In a new semantic,
we fill uid/gid values in fuse header with a id-mapped caller uid/gid (for
requests which create new inodes), for all the rest cases we just send -1
to userspace.
No functional changes intended.
Link: https://lore.kernel.org/all/CAJfpegsVY97_5mHSc06mSw79FehFWtoXT=hhTUK_E-Yhr7OAuQ@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAJfpegtHQsEUuFq1k4ZbTD3E1h-GsrN3PWyv7X8cg6sfU_W2Yw@mail.gmail.com/ [2]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add a regset for POE containing POR_EL0.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-21-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch uses zero as timeout marker for those elements that never expire
when the element is created.
If userspace provides no timeout for an element, then the default set
timeout applies. However, if no default set timeout is specified and
timeout flag is set on, then timeout extension is allocated and timeout
is set to zero to allow for future updates.
Use of zero a never timeout marker has been suggested by Phil Sutter.
Note that, in older kernels, it is already possible to define elements
that never expire by declaring a set with the set timeout flag set on
and no global set timeout, in this case, new element with no explicit
timeout never expire do not allocate the timeout extension, hence, they
never expire. This approach makes it complicated to accomodate element
timeout update, because element extensions do not support reallocations.
Therefore, allocate the timeout extension and use the new marker for
this case, but do not expose it to userspace to retain backward
compatibility in the set listing.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-testing
Jonathan writes:
IIO: 1st set of new device support, features and cleanup for 6.12
Includes a merge of spi-mos-config branch from spi.git that brings
support needed for the AD4000 driver.
Lots of new device support this time including 9 new drivers and substantial
changes to add new support to several more.
New device support
------------------
Given we have a lot of new support, I've subcategorized them:
Substantial changes, or new driver
**********************************
adi,ad4000
- New driver for this high speed ADC.
adi,ad4695
- New driver supporting AD4690, AD4696, AD4697 and AD4698 ADCs.
- Follow up series added triggered buffer support.
adi,ad7380
- Add support for single ended parts, AD7386, ADC7387, AD7388 and -4 variants.
(driver previously only support differential parts).
These variants have an additional front end MUX so only half the channels
can be sampled efficiently.
adi,ad9467
- Refactor and extend driver to support ad9643, ad9449 and ad9652 high speed
ADCs.
adi,adxl380
- New driver for this low power accelerometer.
adi,ltc2664
- New driver supporting LTC2664 and LTC2672 DACs.
microchip,pac1921
- New driver for this power/current monitor chip.
rohm,bh1745
- New driver for this RGBC colour sensor.
rohm,bu27034anuc
- The original bu27034 was canceled before mass production, so the
driver is modified to support the BU27034ANUC which had some significant
differences. DT compatible changed to avoid chance of old driver ever
binding to real hardware.
sciosense,ens210
- New driver for ens210, ens210a, ens211, ens212, ens213a, and ens215
temperature and humidity sensors (all register compatible up to some
conversion time differences)
sensiron,sdp500
- New driver for this differential pressure sensor.
tyhx,hx9023s
- New driver to support this capacitive proximity sensor.
Minor changes to support new devices
************************************
adi,adf4377
- Add support for the single output adf4378.
kionix,kxcjk-1013
- Add support for KX022-1020 accelerometer (binding and ID table only)
liteon,ltrf216a
- Add support for ltr-308. A few minor differences in features set
rockchip,saradc
- Add ID for rk3576-saradc
sensortek,stk3310
- Add ID for stk3013 proximity sensor which (despite documentation) has
an ambient light sensor and is compatible with existing parts.
Documentation updates
---------------------
Generalize ABI docs for shunt resistor attribute
Improve calibscale and calibbias related documentation. A couple of follow
up patches to resolve duplicate documentation that resulted.
New core features
-----------------
backend
- Add option for debugfs - useful for test pattern control
- Use this for both adi-axi-adc and adi-axi-dac
trigger suspend
- Add functions to allow triggers to be suspended. This avoids problems
when a device enters suspend to idle with a sysfs trigger. Use it for now
in the bmi323 only.
New driver features
-------------------
adi,ad7192
- Add option to be a clock provider (+ additional clock config options)
adi,ad7380
- Add documentation for this fairly new driver.
adi,ad9461
- Provide control of test modes and backend validation blocks used
to identify problems (via debugfs)
adi,ad9739
- Add backend debugfs and docs for what is provided via adi-axi-dac
avago,apds9960
- Add proximity and gesture calibration offset control
bosch,bmp280
- Triggered buffer support including adding raw+scale output for sysfs.
liteon,ltr390
- Add configuration of integration time and scale.
stm,dfsdm
- Convert this SD modulator driver to backend framework and add support
for channel scaling + modern channel bindings.
Treewide cleanup
----------------
iio_dev->masklength: Making it private.
- Provide access function to read the core compute channel mask length
and a macro to iterate over elements in the active_scan_mask.
- Enables marking masklength __private preventing drivers from
writing it without triggering a build warning whilst minimizing overhead
in what are typically hot paths.
- Convert all drivers and finally mark it private.
Merge conflicts resolved in drivers applied after this point.
Constify regmap_bus
- These are never modified, so mark them const.
Core cleanup
------------
backend
- A few late breaking bits of feedback (unused variable, error messages)
dma-buffer
- Namespace exports.
core
- Drop unused assignment.
Driver cleanup
--------------
adi,ad4695
- Fixing binding to reflect that common-mode-channel is a scalar.
adi,ad7280a
- Use __free(kfree) to simplify freeing of receive buffer.
adi,ad7606
- Various dt-binding cleanup and improvements.
- Fix oversampling related gpio handling.
- Make polarity of standby gpio match documentation.
- use guard() to simplify lock handling.
adi,ad7768
- Use device_for_each_child_node_scoped() instead of fwnode equivalent.
adi,ad7124
- Reduce SPI transfers by avoiding separate writes to different fields
in the same register.
- Start the ADC in idle mode.
adi,adis
- Drop ifdefs in favor of IS_ENABLED.
adi,admv8818
- Fix wrong ABI docs.
asahi-kasei,ak8975
- Drop a prefix free compatible accidentally added recently.
aspeed,adc
- Use of_property_present() instead of of_find_property() to see if the
property is there or not.
atmel,at91,
- Use __free(kfree) to simplify freeing of channel related array.
bosch,bma400
- Use __free(kfree) to simplify freeing a locally allocated string.
bosch,bmc150
- Add missing mount-matrix binding docs.
bosch,bme680
- Fix read/write to ensure multiple necessary sequential reads without
device configuration change.
- Drop unnecessary type casts and use more appropriate data types.
- Drop some left over ACPI code as ACPI support was removed due to invalid
IDs (and no known users).
- Sort headers consistently.
- Avoid unnecessary duplicate read and redundant read of gas config.
- Use bulk reads to get calibration data.
- Reorder allocation of IIO device to be prior to device init.
- Add remaining read/write buffers to the union used already for all others.
- Tidy up error checks for consistency of style, including dev_err_probe()
- Bring the device startup procedure inline with the vendor code.
- Reorder code so mode forcing is more obvious occurring where needed.
- Tidy up data locality in reading functions so no magic data is stored
in state structures just to get it across function calls.
- Make a local lookup table static to avoid placing it on the stack.
bosch,bmp280
- Fix BME280 regmap to not include registers it doesn't have.
- Wait a little longer after config to allow for maximum possible necessary
wait.
- Reorganize headers.
- Make conversion_time_max array static to avoid placing it on the stack.
maxim,max1363
- Use __free(kfree) to simplify freeing transmission buffer.
microchip,mcp3964
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp3911
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp4728
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp4922
- Use devm_regulator_get_enable_read_voltage() and devm_* to allow
dropping of explicit remove() callback.
onnn,noa1305
- Various tidy up.
- Provide available scale values.
- Make integration time configurable.
- Fix up integration time look up (/2 error)
ti,dac7311
- Check if spi_setup() succeeded.
ti,tsc2046
- Use __free(kfree) to simplify freeing rx and tx buffers.
- Use devm_regulator_get_enable_read_voltage()
Various minor fixes not called out explicitly.
* tag 'iio-for-6.12a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (250 commits)
drivers:iio:Fix the NULL vs IS_ERR() bug for debugfs_create_dir()
iio: sgp40: retain documentation in driver
iio: ABI: remove duplicate in_resistance_calibbias
dt-bindings: iio: st,stm32-adc: add top-level constraints
iio: ABI: add missing calibbias attributes
iio: ABI: add missing calibscale attributes
iio: ABI: sort calibscale attributes
iio: ABI: document calibscale_available attributes
iio: light: ltr390: Calculate 'counts_per_uvi' dynamically
iio: light: ltr390: Add ALS channel and support for gain and resolution
doc: iio: ad4695: document buffered read
iio: adc: ad4695: implement triggered buffer
iio: proximity: hx9023s: Fix error code in hx9023s_property_get()
iio: light: noa1305: Fix up integration time look up
iio: humidity: Add support for ENS210
dt-bindings: iio: humidity: add ENS210 sensor family
iio: imu: adis16460: drop ifdef around CONFIG_DEBUG_FS
iio: imu: adis16400: drop ifdef around CONFIG_DEBUG_FS
iio: imu: adis16480: drop ifdef around CONFIG_DEBUG_FS
iio: imu: adis16475: drop ifdef around CONFIG_DEBUG_FS
...
|
|
The PG_error bit is now unused; delete it and free up a bit in
page->flags.
Link: https://lkml.kernel.org/r/20240807193528.1865100-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add ability to set per-dentry mount expire timeout to autofs.
There are two fairly well known automounter map formats, the autofs
format and the amd format (more or less System V and Berkley).
Some time ago Linux autofs added an amd map format parser that
implemented a fair amount of the amd functionality. This was done
within the autofs infrastructure and some functionality wasn't
implemented because it either didn't make sense or required extra
kernel changes. The idea was to restrict changes to be within the
existing autofs functionality as much as possible and leave changes
with a wider scope to be considered later.
One of these changes is implementing the amd options:
1) "unmount", expire this mount according to a timeout (same as the
current autofs default).
2) "nounmount", don't expire this mount (same as setting the autofs
timeout to 0 except only for this specific mount) .
3) "utimeout=<seconds>", expire this mount using the specified
timeout (again same as setting the autofs timeout but only for
this mount).
To implement these options per-dentry expire timeouts need to be
implemented for autofs indirect mounts. This is because all map keys
(mounts) for autofs indirect mounts use an expire timeout stored in
the autofs mount super block info. structure and all indirect mounts
use the same expire timeout.
Now I have a request to add the "nounmount" option so I need to add
the per-dentry expire handling to the kernel implementation to do this.
The implementation uses the trailing path component to identify the
mount (and is also used as the autofs map key) which is passed in the
autofs_dev_ioctl structure path field. The expire timeout is passed
in autofs_dev_ioctl timeout field (well, of the timeout union).
If the passed in timeout is equal to -1 the per-dentry timeout and
flag are cleared providing for the "unmount" option. If the timeout
is greater than or equal to 0 the timeout is set to the value and the
flag is also set. If the dentry timeout is 0 the dentry will not expire
by timeout which enables the implementation of the "nounmount" option
for the specific mount. When the dentry timeout is greater than zero it
allows for the implementation of the "utimeout=<seconds>" option.
Signed-off-by: Ian Kent <raven@themaw.net>
Link: https://lore.kernel.org/r/20240814090231.963520-1-raven@themaw.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
By default, any recv/read operation that uses provided buffers will
consume at least 1 buffer fully (and maybe more, in case of bundles).
This adds support for incremental consumption, meaning that an
application may add large buffers, and each read/recv will just consume
the part of the buffer that it needs.
For example, let's say an application registers 1MB buffers in a
provided buffer ring, for streaming receives. If it gets a short recv,
then the full 1MB buffer will be consumed and passed back to the
application. With incremental consumption, only the part that was
actually used is consumed, and the buffer remains the current one.
This means that both the application and the kernel needs to keep track
of what the current receive point is. Each recv will still pass back a
buffer ID and the size consumed, the only difference is that before the
next receive would always be the next buffer in the ring. Now the same
buffer ID may return multiple receives, each at an offset into that
buffer from where the previous receive left off. Example:
Application registers a provided buffer ring, and adds two 32K buffers
to the ring.
Buffer1 address: 0x1000000 (buffer ID 0)
Buffer2 address: 0x2000000 (buffer ID 1)
A recv completion is received with the following values:
cqe->res 0x1000 (4k bytes received)
cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)
and the application now knows that 4096b of data is available at
0x1000000, the start of that buffer, and that more data from this buffer
will be coming. Now the next receive comes in:
cqe->res 0x2010 (8k bytes received)
cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)
which tells the application that 8k is available where the last
completion left off, at 0x1001000. Next completion is:
cqe->res 0x5000 (20k bytes received)
cqe->flags 0x1 (CQE_F_BUFFER set, buffer ID 0)
and the application now knows that 20k of data is available at
0x1003000, which is where the previous receive ended. CQE_F_BUF_MORE
isn't set, as no more data is available in this buffer ID. The next
completion is then:
cqe->res 0x1000 (4k bytes received)
cqe->flags 0x10001 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 1)
which tells the application that buffer ID 1 is now the current one,
hence there's 4k of valid data at 0x2000000. 0x2001000 will be the next
receive point for this buffer ID.
When a buffer will be reused by future CQE completions,
IORING_CQE_BUF_MORE will be set in cqe->flags. This tells the application
that the kernel isn't done with the buffer yet, and that it should expect
more completions for this buffer ID. Will only be set by provided buffer
rings setup with IOU_PBUF_RING INC, as that's the only type of buffer
that will see multiple consecutive completions for the same buffer ID.
For any other provided buffer type, any completion that passes back
a buffer to the application is final.
Once a buffer has been fully consumed, the buffer ring head is
incremented and the next receive will indicate the next buffer ID in the
CQE cflags.
On the send side, the application can manage how much data is sent from
an existing buffer by setting sqe->len to the desired send length.
An application can request incremental consumption by setting
IOU_PBUF_RING_INC in the provided buffer ring registration. Outside of
that, any provided buffer ring setup and buffer additions is done like
before, no changes there. The only change is in how an application may
see multiple completions for the same buffer ID, hence needing to know
where the next receive will happen.
Note that like existing provided buffer rings, this should not be used
with IOSQE_ASYNC, as both really require the ring to remain locked over
the duration of the buffer selection and the operation completion. It
will consume a buffer otherwise regardless of the size of the IO done.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is a need for userspace applications to open HID devices directly.
Use-cases include configuration of gaming mice or direct access to
joystick devices. The latter is currently handled by the uaccess tag in
systemd, other devices include more custom/local configurations or just
sudo.
A better approach is what we already have for evdev devices: give the
application a file descriptor and revoke it when it may no longer access
that device.
This patch is the hidraw equivalent to the EVIOCREVOKE ioctl, see
commit c7dc65737c9a ("Input: evdev - add EVIOCREVOKE ioctl") for full
details.
An MR for systemd-logind has been filed here:
https://github.com/systemd/systemd/pull/33970
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Link: https://patch.msgid.link/20240827-hidraw-revoke-v5-1-d004a7451aea@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
The fallocate system call takes a mode argument, but that argument
contains a wild mix of exclusive modes and an optional flags.
Replace FALLOC_FL_SUPPORTED_MASK with FALLOC_FL_MODE_MASK, which excludes
the optional flag bit, so that we can use switch statement on the value
to easily enumerate the cases while getting the check for duplicate modes
for free.
To make this (and in the future the file system implementations) more
readable also add a symbolic name for the 0 mode used to allocate blocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240827065123.1762168-4-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This adds GENMASK_U128() and __GENMASK_U128() macros using __BITS_PER_U128
and __int128 data types. These macros will be used in providing support for
generating 128 bit masks.
The macros wouldn't work in all assembler flavors for reasons described
in the comments on top of declarations. Enforce it for more by adding
!__ASSEMBLY__ guard.
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
|
Nicolin Chen says:
=========
IOMMU_RESV_SW_MSI is a unique region defined by an IOMMU driver. Though it
is eventually used by a device for address translation to an MSI location
(including nested cases), practically it is a universal region across all
domains allocated for the IOMMU that defines it.
Currently IOMMUFD core fetches and reserves the region during an attach to
an hwpt_paging. It works with a hwpt_paging-only case, but might not work
with a nested case where a device could directly attach to a hwpt_nested,
bypassing the hwpt_paging attachment.
Move the enforcement forward, to the hwpt_paging allocation function. Then
clean up all the SW_MSI related things in the attach/replace routine.
=========
Based on v6.11-rc5 for dependencies.
* nesting_reserved_regions: (562 commits)
iommufd/device: Enforce reserved IOVA also when attached to hwpt_nested
Linux 6.11-rc5
...
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.12-2024-08-26:
amdgpu:
- SDMA devcoredump support
- DCN 4.0.1 updates
- DC SUBVP fixes
- Refactor OPP in DC
- Refactor MMHUBBUB in DC
- DC DML 2.1 updates
- DC FAMS2 updates
- RAS updates
- GFX12 updates
- VCN 4.0.3 updates
- JPEG 4.0.3 updates
- Enable wave kill (soft recovery) for compute queues
- Clean up CP error interrupt handling
- Enable CP bad opcode interrupts
- VCN 4.x fixes
- VCN 5.x fixes
- GPU reset fixes
- Fix vbios embedded EDID size handling
- SMU 14.x updates
- Misc code cleanups and spelling fixes
- VCN devcoredump support
- ISP MFD i2c support
- DC vblank fixes
- GFX 12 fixes
- PSR fixes
- Convert vbios embedded EDID to drm_edid
- DCN 3.5 updates
- DMCUB updates
- Cursor fixes
- Overdrive support for SMU 14.x
- GFX CP padding optimizations
- DCC fixes
- DSC fixes
- Preliminary per queue reset infrastructure
- Initial per queue reset support for GFX 9
- Initial per queue reset support for GFX 7, 8
- DCN 3.2 fixes
- DP MST fixes
- SR-IOV fixes
- GFX 9.4.3/4 devcoredump support
- Add process isolation framework
- Enable process isolation support for GFX 9.4.3/4
- Take IOMMU remapping into account for P2P DMA checks
amdkfd:
- CRIU fixes
- Improved input validation for user queues
- HMM fix
- Enable process isolation support for GFX 9.4.3/4
- Initial per queue reset support for GFX 9
- Allow users to target recommended SDMA engines
radeon:
- remove .load and drm_dev_alloc
- Fix vbios embedded EDID size handling
- Convert vbios embedded EDID to drm_edid
- Use GEM references instead of TTM
- r100 cp init cleanup
- Fix potential overflows in evergreen CS offset tracking
UAPI:
- KFD support for targetting queues on recommended SDMA engines
Proposed userspace:
https://github.com/ROCm/ROCR-Runtime/commit/2f588a24065f41c208c3701945e20be746d8faf7
https://github.com/ROCm/ROCR-Runtime/commit/eb30a5bbc7719c6ffcf2d2dd2878bc53a47b3f30
drm/buddy:
- Add start address support for trim function
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com
|
|
amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might
as well catch up with a backmerge and handle them all. Plus both misc
and intel maintainers asked for a backmerge anyway.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Implement and document new pin attributes for providing Embedded SYNC
capabilities to the DPLL subsystem users through a netlink pin-get
do/dump messages. Allow the user to set Embedded SYNC frequency with
pin-set do netlink message.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20240822222513.255179-2-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Correct spelling in Networking headers.
As reported by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822-net-spell-v1-12-3a98971ce2d2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Correct spelling in if_packet.h
As reported by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240822-net-spell-v1-1-3a98971ce2d2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend the ethtool netlink cable testing interface by adding support for
specifying the source of cable testing results. This allows users to
differentiate between results obtained through different diagnostic
methods.
For example, some TI 10BaseT1L PHYs provide two variants of cable
diagnostics: Time Domain Reflectometry (TDR) and Active Link Cable
Diagnostic (ALCD). By introducing `ETHTOOL_A_CABLE_RESULT_SRC` and
`ETHTOOL_A_CABLE_FAULT_LENGTH_SRC` attributes, this update enables
drivers to indicate whether the result was derived from TDR or ALCD,
improving the clarity and utility of diagnostic information.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240822120703.1393130-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-08-23
We've added 10 non-merge commits during the last 15 day(s) which contain
a total of 10 files changed, 222 insertions(+), 190 deletions(-).
The main changes are:
1) Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_*sockopt() to address the case
when long-lived sockets miss a chance to set additional callbacks
if a sockops program was not attached early in their lifetime,
from Alan Maguire.
2) Add a batch of BPF selftest improvements which fix a few bugs and add
missing features to improve the test coverage of sockmap/sockhash,
from Michal Luczaj.
3) Fix a false-positive Smatch-reported off-by-one in tcp_validate_cookie()
which is part of the test_tcp_custom_syncookie BPF selftest,
from Kuniyuki Iwashima.
4) Fix the flow_dissector BPF selftest which had a bug in IP header's
tot_len calculation doing subtraction after htons() instead of inside
htons(), from Asbjørn Sloth Tønnesen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftest: bpf: Remove mssind boundary check in test_tcp_custom_syncookie.c.
selftests/bpf: Introduce __attribute__((cleanup)) in create_pair()
selftests/bpf: Exercise SOCK_STREAM unix_inet_redir_to_connected()
selftests/bpf: Honour the sotype of af_unix redir tests
selftests/bpf: Simplify inet_socketpair() and vsock_socketpair_connectible()
selftests/bpf: Socket pair creation, cleanups
selftests/bpf: Support more socket types in create_pair()
selftests/bpf: Avoid subtraction after htons() in ipip tests
selftests/bpf: add sockopt tests for TCP_BPF_SOCK_OPS_CB_FLAGS
bpf/bpf_get,set_sockopt: add option to set TCP-BPF sock ops flags
====================
Link: https://patch.msgid.link/20240823134959.1091-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reorder include files to alphabetic order to simplify maintenance, and
separate local headers and global headers with a blank line.
No functional change intended.
Link: https://patch.msgid.link/r/7524b037cc05afe19db3c18f863253e1d1554fa2.1722644866.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Expose min_wait_usec in io_uring_getevents_arg, replacing the pad member
that is currently in there. The value is in usecs, which is explained in
the name as well.
Note that if min_wait_usec and a normal timeout is used in conjunction,
the normal timeout is still relative to the base time. For example, if
min_wait_usec is set to 100 and the normal timeout is 1000, the max
total time waited is still 1000. This also means that if the normal
timeout is shorter than min_wait_usec, then only the min_wait_usec will
take effect.
See previous commit for an explanation of how this works.
IORING_FEAT_MIN_TIMEOUT is added as a feature flag for this, as
applications doing submit_and_wait_timeout() style operations will
generally not see the -EINVAL from the wait side as they return the
number of IOs submitted. Only if no IOs are submitted will the -EINVAL
bubble back up to the application.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new registration opcode IORING_REGISTER_CLOCK, which allows the
user to select which clock id it wants to use with CQ waiting timeouts.
It only allows a subset of all posix clocks and currently supports
CLOCK_MONOTONIC and CLOCK_BOOTTIME.
Suggested-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/98f2bc8a3c36cdf8f0e6a275245e81e903459703.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In addition to current relative timeouts for the waiting loop, where the
timespec argument specifies the maximum time it can wait for, add
support for the absolute mode, with the value carrying a CLOCK_MONOTONIC
absolute time until which we should return control back to the user.
Suggested-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4d5b74d67ada882590b2e42aa3aa7117bbf6b55f.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This adds a kfunc wrapper around strncpy_from_user,
which can be called from sleepable BPF programs.
This matches the non-sleepable 'bpf_probe_read_user_str'
helper except it includes an additional 'flags'
param, which allows consumers to clear the entire
destination buffer on success or failure.
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Link: https://lore.kernel.org/r/20240823195101.3621028-1-linux@jordanrome.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, users can only stash kptr into map values with bpf_kptr_xchg().
This patch further supports stashing kptr into local kptr by adding local
kptr as a valid destination type.
When stashing into local kptr, btf_record in program BTF is used instead
of btf_record in map to search for the btf_field of the local kptr.
The local kptr specific checks in check_reg_type() only apply when the
source argument of bpf_kptr_xchg() is local kptr. Therefore, we make the
scope of the check explicit as the destination now can also be local kptr.
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Link: https://lore.kernel.org/r/20240813212424.2871455-5-amery.hung@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
As we have the ability to track the PHYs connected to a net_device
through the link_topology, we can expose this list to userspace. This
allows userspace to use these identifiers for phy-specific commands and
take the decision of which PHY to target by knowing the link topology.
Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list
devices on only one interface.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the passed phy_index.
Add a helper that netlink command handlers need to use to grab the
targeted PHY from the req_info. This helper needs to hold rtnl_lock()
while interacting with the PHY, as it may be removed at any point.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Link topologies containing multiple network PHYs attached to the same
net_device can be found when using a PHY as a media converter for use
with an SFP connector, on which an SFP transceiver containing a PHY can
be used.
With the current model, the transceiver's PHY can't be used for
operations such as cable testing, timestamping, macsec offload, etc.
The reason being that most of the logic for these configuration, coming
from either ethtool netlink or ioctls tend to use netdev->phydev, which
in multi-phy systems will reference the PHY closest to the MAC.
Introduce a numbering scheme allowing to enumerate PHY devices that
belong to any netdev, which can in turn allow userspace to take more
precise decisions with regard to each PHY's configuration.
The numbering is maintained per-netdev, in a phy_device_list.
The numbering works similarly to a netdevice's ifindex, with
identifiers that are only recycled once INT_MAX has been reached.
This prevents races that could occur between PHY listing and SFP
transceiver removal/insertion.
The identifiers are assigned at phy_attach time, as the numbering
depends on the netdevice the phy is attached to. The PHY index can be
re-used for PHYs that are persistent.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
c948c0973df5 ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops")
f2878cdeb754 ("bnxt_en: Add support to call FW to update a VNIC")
Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
flexible array
Replace the deprecated[1] use of a 1-element array in
struct vmmdev_hgcm_pagelist with a modern flexible array. As this is
UAPI, we cannot trivially change the size of the struct, so use a union
to retain the old first element's size, but switch "pages" to a flexible
array.
No binary differences are present after this conversion.
Link: https://github.com/KSPP/linux/issues/79 [1]
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240710231555.work.406-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
While supporting GET_REPORT is a mandatory request per the HID
specification the current implementation of the GET_REPORT request responds
to the USB Host with an empty reply of the request length. However, some
USB Hosts will request the contents of feature reports via the GET_REPORT
request. In addition, some proprietary HID 'protocols' will expect
different data, for the same report ID, to be to become available in the
feature report by sending a preceding SET_REPORT to the USB Device that
defines what data is to be presented when that feature report is
subsequently retrieved via GET_REPORT (with a very fast < 5ms turn around
between the SET_REPORT and the GET_REPORT).
There are two other patch sets already submitted for adding GET_REPORT
support. The first [1] allows for pre-priming a list of reports via IOCTLs
which then allows the USB Host to perform the request, with no further
userspace interaction possible during the GET_REPORT request. And another
[2] which allows for a single report to be setup by userspace via IOCTL,
which will be fetched and returned by the kernel for subsequent GET_REPORT
requests by the USB Host, also with no further userspace interaction
possible.
This patch, while loosely based on both the patch sets, differs by allowing
the option for userspace to respond to each GET_REPORT request by setting
up a poll to notify userspace that a new GET_REPORT request has arrived. To
support this, two extra IOCTLs are supplied. The first of which is used to
retrieve the report ID of the GET_REPORT request (in the case of having
non-zero report IDs in the HID descriptor). The second IOCTL allows for
storing report responses in a list for responding to requests.
The report responses are stored in a list (it will be either added if it
does not exist or updated if it exists already). A flag (userspace_req) can
be set to whether subsequent requests notify userspace or not.
Basic operation when a GET_REPORT request arrives from USB Host:
- If the report ID exists in the list and it is set for immediate return
(i.e. userspace_req == false) then response is sent immediately,
userspace is not notified
- The report ID does not exist, or exists but is set to notify userspace
(i.e. userspace_req == true) then notify userspace via poll:
- If userspace responds, and either adds or update the response in
the list and respond to the host with the contents
- If userspace does not respond within the fixed timeout (2500ms)
but the report has been set prevously, then send 'old' report
contents
- If userspace does not respond within the fixed timeout (2500ms)
and the report does not exist in the list then send an empty
report
Note that userspace could 'prime' the report list at any other time.
While this patch allows for flexibility in how the system responds to
requests, and therefore the HID 'protocols' that could be supported, a
drawback is the time it takes to service the requests and therefore the
maximum throughput that would be achievable. The USB HID Specification
v1.11 itself states that GET_REPORT is not intended for periodic data
polling, so this limitation is not severe.
Testing on an iMX8M Nano Ultra Lite with a heavy multi-core CPU loading
showed that userspace can typically respond to the GET_REPORT request
within 1200ms - which is well within the 5000ms most operating systems seem
to allow, and within the 2500ms set by this patch.
[1] https://lore.kernel.org/all/20220805070507.123151-2-sunil@amarulasolutions.com/
[2] https://lore.kernel.org/all/20220726005824.2817646-1-vi@endrift.com/
Signed-off-by: David Sands <david.sands@biamp.com>
Signed-off-by: Chris Wulff <chris.wulff@biamp.com>
Link: https://lore.kernel.org/r/20240817142850.1311460-2-crwulff@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch provides a new feature (i.e., "tunsrc") for the tunnel (i.e.,
"encap") mode of ioam6. Just like seg6 already does, except it is
attached to a route. The "tunsrc" is optional: when not provided (by
default), the automatic resolution is applied. Using "tunsrc" when
possible has a benefit: performance. See the comparison:
- before (= "encap" mode): https://ibb.co/bNCzvf7
- after (= "encap" mode with "tunsrc"): https://ibb.co/PT8L6yq
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Users of IPE require a way to identify when and why an operation fails,
allowing them to both respond to violations of policy and be notified
of potentially malicious actions on their systems with respect to IPE
itself.
This patch introduces 3 new audit events.
AUDIT_IPE_ACCESS(1420) indicates the result of an IPE policy evaluation
of a resource.
AUDIT_IPE_CONFIG_CHANGE(1421) indicates the current active IPE policy
has been changed to another loaded policy.
AUDIT_IPE_POLICY_LOAD(1422) indicates a new IPE policy has been loaded
into the kernel.
This patch also adds support for success auditing, allowing users to
identify why an allow decision was made for a resource. However, it is
recommended to use this option with caution, as it is quite noisy.
Here are some examples of the new audit record types:
AUDIT_IPE_ACCESS(1420):
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=297 comm="sh" path="/root/vol/bin/hello" dev="tmpfs"
ino=3897 rule="op=EXECUTE boot_verified=TRUE action=ALLOW"
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=299 comm="sh" path="/mnt/ipe/bin/hello" dev="dm-0"
ino=2 rule="DEFAULT action=DENY"
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=300 path="/tmp/tmpdp2h1lub/deny/bin/hello" dev="tmpfs"
ino=131 rule="DEFAULT action=DENY"
The above three records were generated when the active IPE policy only
allows binaries from the initramfs to run. The three identical `hello`
binary were placed at different locations, only the first hello from
the rootfs(initramfs) was allowed.
Field ipe_op followed by the IPE operation name associated with the log.
Field ipe_hook followed by the name of the LSM hook that triggered the IPE
event.
Field enforcing followed by the enforcement state of IPE. (it will be
introduced in the next commit)
Field pid followed by the pid of the process that triggered the IPE
event.
Field comm followed by the command line program name of the process that
triggered the IPE event.
Field path followed by the file's path name.
Field dev followed by the device name as found in /dev where the file is
from.
Note that for device mappers it will use the name `dm-X` instead of
the name in /dev/mapper.
For a file in a temp file system, which is not from a device, it will use
`tmpfs` for the field.
The implementation of this part is following another existing use case
LSM_AUDIT_DATA_INODE in security/lsm_audit.c
Field ino followed by the file's inode number.
Field rule followed by the IPE rule made the access decision. The whole
rule must be audited because the decision is based on the combination of
all property conditions in the rule.
Along with the syscall audit event, user can know why a blocked
happened. For example:
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=2138 comm="bash" path="/mnt/ipe/bin/hello" dev="dm-0"
ino=2 rule="DEFAULT action=DENY"
audit[1956]: SYSCALL arch=c000003e syscall=59
success=no exit=-13 a0=556790138df0 a1=556790135390 a2=5567901338b0
a3=ab2a41a67f4f1f4e items=1 ppid=147 pid=1956 auid=4294967295 uid=0
gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0
ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null)
The above two records showed bash used execve to run "hello" and got
blocked by IPE. Note that the IPE records are always prior to a SYSCALL
record.
AUDIT_IPE_CONFIG_CHANGE(1421):
audit: AUDIT1421
old_active_pol_name="Allow_All" old_active_pol_version=0.0.0
old_policy_digest=sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649
new_active_pol_name="boot_verified" new_active_pol_version=0.0.0
new_policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F
auid=4294967295 ses=4294967295 lsm=ipe res=1
The above record showed the current IPE active policy switch from
`Allow_All` to `boot_verified` along with the version and the hash
digest of the two policies. Note IPE can only have one policy active
at a time, all access decision evaluation is based on the current active
policy.
The normal procedure to deploy a policy is loading the policy to deploy
into the kernel first, then switch the active policy to it.
AUDIT_IPE_POLICY_LOAD(1422):
audit: AUDIT1422 policy_name="boot_verified" policy_version=0.0.0
policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F2676
auid=4294967295 ses=4294967295 lsm=ipe res=1
The above record showed a new policy has been loaded into the kernel
with the policy name, policy version and policy hash.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.
What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).
Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:
# ip -4 rule add tos 0x1c table 100
# ip -4 rule add tos 0x3c table 100
Error: Invalid tos.
While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.
Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.
A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.
However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.
This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.
Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.
Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.
Include <linux/ip.h> in <linux/in_route.h> since the former defines
IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
<linux/in_route.h>.
No regressions in FIB tests:
# ./fib_tests.sh
[...]
Tests passed: 218
Tests failed: 0
And FIB rule tests:
# ./fib_rule_tests.sh
[...]
Tests passed: 116
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The buffer size histograms in smc_stats, namely rx/tx_rmbsize, record
the sizes of ringbufs for all connections that have ever appeared in
the net namespace. They are incremental and we cannot know the actual
ringbufs usage from these. So here introduces statistics for current
ringbufs usage of existing smc connections in the net namespace into
smc_stats, it will be incremented when new connection uses a ringbuf
and decremented when the ringbuf is unused.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently we have the statistics on sndbuf/RMB sizes of all connections
that have ever been on the link group, namely smc_stats_memsize. However
these statistics are incremental and since the ringbufs of link group
are allowed to be reused, we cannot know the actual allocated buffers
through these. So here introduces the statistic on actual allocated
ringbufs of the link group, it will be incremented when a new ringbuf is
added into buf_list and decremented when it is deleted from buf_list.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Integrity Policy Enforcement (IPE) is an LSM that provides an
complimentary approach to Mandatory Access Control than existing LSMs
today.
Existing LSMs have centered around the concept of access to a resource
should be controlled by the current user's credentials. IPE's approach,
is that access to a resource should be controlled by the system's trust
of a current resource.
The basis of this approach is defining a global policy to specify which
resource can be trusted.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Systemd has a helper called openat_report_new() that returns whether a
file was created anew or it already existed before for cases where
O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't
something that's specific to systemd but it's where I noticed it.
The current logic is that it first attempts to open the file without
O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both
flags. If that succeeds all is well. If it now reports EEXIST it
retries.
That works fairly well but some corner cases make this more involved. If
this operates on a dangling symlink the first openat() without O_CREAT |
O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL
will fail with EEXIST. The reason is that openat() without O_CREAT |
O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security
reasons. So it's not something we can really change unless we add an
explicit opt-in via O_FOLLOW which seems really ugly.
The caller could try and use fanotify() to register to listen for
creation events in the directory before calling openat(). The caller
could then compare the returned tid to its own tid to ensure that even
in threaded environments it actually created the file. That might work
but is a lot of work for something that should be fairly simple and I'm
uncertain about it's reliability.
The caller could use a bpf lsm hook to hook into security_file_open() to
figure out whether they created the file. That also seems a bit wild.
So let's add F_CREATED_QUERY which allows the caller to check whether
they actually did create the file. That has caveats of course but I
don't think they are problematic:
* In multi-threaded environments a thread can only be sure that it did
create the file if it calls openat() with O_CREAT. In other words,
it's obviously not enough to just go through it's fdtable and check
these fds because another thread could've created the file.
* If there's any codepaths where an openat() with O_CREAT would yield
the same struct file as that of another thread it would obviously
cause wrong results. I'm not aware of any such codepaths from openat()
itself. Imho, that would be a bug.
* Related to the previous point, calling the new fcntl() on files created
and opened via special-purpose system calls or ioctl()s would cause
wrong results only if the affected subsystem a) raises FMODE_CREATED
and b) may return the same struct file for two different calls. I'm
not seeing anything outside of regular VFS code that raises
FMODE_CREATED.
There is code for b) in e.g., the drm layer where the same struct file
is resurfaced but again FMODE_CREATED isn't used and it would be very
misleading if it did.
Link: https://github.com/systemd/systemd/blob/11d5e2b5fbf9f6bfa5763fd45b56829ad4f0777f/src/basic/fs-util.c#L1078 [1]
Link: https://lore.kernel.org/r/20240724-work-fcntl-v1-1-e8153a2f1991@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We need the usb / thunderbolt fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need the char/misc fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull io_uring fixes from Jens Axboe:
- Fix a comment in the uapi header using the wrong member name (Caleb)
- Fix KCSAN warning for a debug check in sqpoll (me)
- Two more NAPI tweaks (Olivier)
* tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux:
io_uring: fix user_data field name in comment
io_uring/sqpoll: annotate debug task == current with data_race()
io_uring/napi: remove duplicate io_napi_entry timeout assignation
io_uring/napi: check napi_enabled in io_napi_add() before proceeding
|
|
io_uring_cqe's user_data field refers to `sqe->data`, but io_uring_sqe
does not have a data field. Fix the comment to say `sqe->user_data`.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://github.com/axboe/liburing/pull/1206
Link: https://lore.kernel.org/r/20240816181526.3642732-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add new result codes to support TDR diagnostics in preparation for
Open Alliance 1000BaseT1 TDR support:
- ETHTOOL_A_CABLE_RESULT_CODE_NOISE: TDR not possible due to high noise
level.
- ETHTOOL_A_CABLE_RESULT_CODE_RESOLUTION_NOT_POSSIBLE: TDR resolution not
possible / out of distance.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240812073046.1728288-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml
c25504a0ba36 ("dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property phys")
be034ee6c33d ("dt-bindings: net: fsl,qoriq-mc-dpmac: using unevaluatedProperties")
https://lore.kernel.org/20240815110934.56ae623a@canb.auug.org.au
drivers/net/dsa/vitesse-vsc73xx-core.c
5b9eebc2c7a5 ("net: dsa: vsc73xx: pass value in phy_write operation")
fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations")
2524d6c28bdc ("net: dsa: vsc73xx: use defined values in phy operations")
https://lore.kernel.org/20240813104039.429b9fe6@canb.auug.org.au
Resolve by using FIELD_PREP(), Stephen's resolution is simpler.
Adjacent changes:
net/vmw_vsock/af_vsock.c
69139d2919dd ("vsock: fix recursive ->recvmsg calls")
744500d81f81 ("vsock: add support for SIOCOUTQ ioctl")
Link: https://patch.msgid.link/20240815141149.33862-1-pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the `__struct_group()` helper to create a new tagged
`struct tc_u32_sel_hdr`. This structure groups together all the
members of the flexible `struct tc_u32_sel` except the flexible
array. As a result, the array is effectively separated from the
rest of the members without modifying the memory layout of the
flexible structure.
This new tagged struct will be used to fix problematic declarations
of middle-flex-arrays in composite structs[1].
[1] https://git.kernel.org/linus/d88cabfd9abc
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/e59fe833564ddc5b2cc83056a4c504be887d6193.1723586870.git.gustavoars@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"s390:
- Fix failure to start guests with kvm.use_gisa=0
- Panic if (un)share fails to maintain security.
ARM:
- Use kvfree() for the kvmalloc'd nested MMUs array
- Set of fixes to address warnings in W=1 builds
- Make KVM depend on assembler support for ARMv8.4
- Fix for vgic-debug interface for VMs without LPIs
- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
- Minor code / comment cleanups for configuring PAuth traps
- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
x86:
- Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
- Fix smatch issues
- Small cleanups
- Make x2APIC ID 100% readonly
- Fix typo in uapi constant
Generic:
- Use synchronize_srcu_expedited() on irqfd shutdown"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
KVM: selftests: Add a testcase to verify x2APIC is fully readonly
KVM: x86: Make x2APIC ID 100% readonly
KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
KVM: SVM: Fix an error code in sev_gmem_post_populate()
KVM: SVM: Fix uninitialized variable bug
KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
KVM: arm64: Tidying up PAuth code in KVM
KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
s390/uv: Panic for set and remove shared access UVC errors
KVM: s390: fix validity interception issue when gisa is switched off
docs: KVM: Fix register ID of SPSR_FIQ
KVM: arm64: vgic: fix unexpected unlock sparse warnings
KVM: arm64: fix kdoc warnings in W=1 builds
KVM: arm64: fix override-init warnings in W=1 builds
...
|
|
"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of
the UAPI, keep the current definition and add a new one with the fix.
Fix-suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Amit Shah <amit.shah@amd.com>
Message-ID: <20240814083113.21622-1-amit@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|