Age | Commit message (Collapse) | Author |
|
Rename s_fsnotify_inode_refs to s_fsnotify_connectors and count all
objects with attached connectors, not only inodes with attached
connectors.
This will be used to optimize fsnotify() calls on sb without any
type of marks.
Link: https://lore.kernel.org/r/20210810151220.285179-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The names of resources are used for the schema name presented to
user-space. The name used is rooted in a structure provided by the
architecture code because the names are different when CDP is enabled.
x86 implements this by swapping between two sets of resource structures
based on their alloc_enabled flag. The type of configuration in-use is
encoded in the name (and cbm_idx_offset).
Once the CDP behaviour is moved into the parts of resctrl that will
move to /fs/, there will be two struct resctrl_schema for one struct
rdt_resource. The schema describes the type of configuration being
applied to the resource. The name of the schema should be generated
by resctrl, base on the type of configuration. To do this struct
resctrl_schema needs to store the type of configuration in use for a
schema.
Create an enum resctrl_conf_type describing the options, and add it to
struct resctrl_schema. The underlying resources are still separate, as
cbm_idx_offset is still in use.
Temporarily label all the entries in rdt_resources_all[] and copy that
value to struct resctrl_schema. Copying the value ensures there is no
mismatch while the filesystem parts of resctrl are modified to use the
schema. Once the resources are merged, the filesystem code can assign
this value based on the schema being created.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lkml.kernel.org/r/20210728170637.25610-6-james.morse@arm.com
|
|
get_queue_type() comments that splict virtqueue is preferred, however,
the actual logic preferred packed virtqueues. Since firmware has not
supported packed virtqueues we ended up using split virtqueues as was
desired.
Since we do not advertise support for packed virtqueues, we add a check
to verify split virtqueues are indeed supported.
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210811053759.66752-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
we use a spinlock now pull in the correct header to
make vring.h self sufficient.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The return value of vdpa_alloc_device() macro is not very
clear, so that most of callers did the wrong check. Let's
add some comments to better document it.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210715080026.242-4-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
|
|
Resctrl exposes schemata to user-space, which allow the control values
to be specified for a group of tasks.
User-visible properties of the interface, (such as the schemata names
and how the values are parsed) are rooted in a struct provided by the
architecture code. (struct rdt_hw_resource). Once a second architecture
uses resctrl, this would allow user-visible properties to diverge
between architectures.
These properties should come from the resctrl code that will be common
to all architectures. Resctrl has no per-schema structure, only struct
rdt_{hw_,}resource. Create a struct resctrl_schema to hold the
rdt_resource. Before a second architecture can be supported, this
structure will also need to hold the schema name visible to user-space
and the type of configuration values for resctrl.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lkml.kernel.org/r/20210728170637.25610-4-james.morse@arm.com
|
|
resctrl is the defacto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
struct rdt_domain contains a mix of architecture private details and
properties of the filesystem interface user-space uses.
Continue by splitting struct rdt_domain, into an architecture private
'hw' struct, which contains the common resctrl structure that would be
used by any architecture. The hardware values in ctrl_val and mbps_val
need to be accessed via helpers to allow another architecture to convert
these into a different format if necessary. After this split, filesystem
code paths touching a 'hw' struct indicates where an abstraction is
needed.
Splitting this structure only moves types around, and should not lead
to any change in behaviour.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lkml.kernel.org/r/20210728170637.25610-3-james.morse@arm.com
|
|
Both PMC and cpuidle drivers are probed at the same init level and
cpuidle depends on the PMC suspend mode. Add new default suspend mode
that indicates whether PMC driver has been probed and reset the mode in
a case of deferred probe of the PMC driver.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
resctrl is the defacto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
struct rdt_resource contains a mix of architecture private details
and properties of the filesystem interface user-space uses.
Start by splitting struct rdt_resource, into an architecture private
'hw' struct, which contains the common resctrl structure that would be
used by any architecture. The foreach helpers are most commonly used by
the filesystem code, and should return the common resctrl structure.
for_each_rdt_resource() is changed to walk the common structure in its
parent arch private structure.
Move as much of the structure as possible into the common structure
in the core code's header file. The x86 hardware accessors remain
part of the architecture private code, as do num_closid, mon_scale
and mbm_width.
mon_scale and mbm_width are used to detect overflow of the hardware
counters, and convert them from their native size to bytes. Any
cross-architecture abstraction should be in terms of bytes, making
these properties private.
The hardware's num_closid is kept in the private structure to force the
filesystem code to use a helper to access it. MPAM would return a single
value for the system, regardless of the resource. Using the helper
prevents this field from being confused with the version of num_closid
that is being exposed to user-space (added in a later patch).
After this split, filesystem code touching a 'hw' struct indicates
where an abstraction is needed.
Splitting this structure only moves types around, and should not lead
to any change in behaviour.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lkml.kernel.org/r/20210728170637.25610-2-james.morse@arm.com
|
|
The generic definition of ffz() is not defined in the same header files
as the generic definitions of ffs().
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Use nfnetlink_unicast() instead of netlink_unicast() in nft_compat.
2) Remove call to nf_ct_l4proto_find() in flowtable offload timeout
fixup.
3) CLUSTERIP registers ARP hook on demand, from Florian.
4) Use clusterip_net to store pernet warning, also from Florian.
5) Remove struct netns_xt, from Florian Westphal.
6) Enable ebtables hooks in initns on demand, from Florian.
7) Allow to filter conntrack netlink dump per status bits,
from Florian Westphal.
8) Register x_tables hooks in initns on demand, from Florian.
9) Remove queue_handler from per-netns structure, again from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Move mt8183-pinfunc.h into include/dt-bindings/pinctrl so that we can
include it in yaml examples.
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Link: https://lore.kernel.org/r/20210804044033.3047296-2-hsinyi@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Move mt8135-pinfunc.h into include/dt-bindings/pinctrl so that we can
include it in yaml examples.
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Link: https://lore.kernel.org/r/20210804044033.3047296-1-hsinyi@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.15-2021-08-06:
amdgpu:
- Aldebaran fixes
- Powergating fix for Renoir
- Switch virtual DCE over to vkms based atomic modesetting
- Misc typo fixes
- PSP handling cleanups
- DC FP cleanups
- RAS fixes
- Wave debug improvements
- Freesync fix
- BACO/BOCO fixes
- Misc fixes
amdkfd:
- Expose gfx version in sysfs
- Aldebaran fixes
radeon:
- Coding style fix
- Typo fixes
- Pageflip fix
UAPI:
- amdkfd: SVM address range query
Proposed userspace: https://github.com/RadeonOpenCompute/ROCR-Runtime/tree/memory_model_queries
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210806205248.3864-1-alexander.deucher@amd.com
|
|
For the smartreflex device, we need to disable smartreflex on SoC idle,
and have been using pm_runtime_irq_safe() to do that. But we want to
remove the irq_safe usage as PM runtime takes a permanent usage count
on the parent device with it.
In order to remove the need for pm_runtime_irq_safe(), let's gate
the clock directly in the driver. This removes the need to call PM runtime
during idle, and allows us to switch to using CPU_PM in the following
patch.
Note that the smartreflex interconnect target module is configured for smart
idle, but the clock does not have autoidle capability, and needs to be gated
manually. If the clock supported autoidle, we would not need to even gate
the clock.
With this change, we can now remove the related quirk flags for ti-sysc
also.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into drm-next
Bus: Make remove callback return void tag
Tag for other trees/branches to pull from in order to have a stable
place to build off of if they want to add new busses for 5.15.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: fixed up merge conflict in drm]
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patchwork.freedesktop.org/patch/msgid/YPkwQwf0dUKnGA7L@kroah.com
|
|
The Intel 82426EX ISA Bridge (IB), a part of the Intel 82420EX PCIset,
implements PCI interrupt steering with a PIRQ router in the form of two
PIRQ Route Control registers, available in the PCI configuration space
at locations 0x66 and 0x67 for the PIRQ0# and PIRQ1# lines respectively.
The semantics is the same as with the PIIX router, however it is not
clear if BIOSes use register indices or line numbers as the cookie to
identify PCI interrupts in their routing tables and therefore support
either scheme.
The IB is directly attached to the Intel 82425EX PCI System Controller
(PSC) component of the chipset via a dedicated PSC/IB Link interface
rather than the host bus or PCI. Therefore it does not itself appear in
the PCI configuration space even though it responds to configuration
cycles addressing registers it implements. Use 82425EX's identification
then for determining the presence of the IB.
References:
[1] "82420EX PCIset Data Sheet, 82425EX PCI System Controller (PSC) and
82426EX ISA Bridge (IB)", Intel Corporation, Order Number:
290488-004, December 1995, Section 3.3.18 "PIRQ1RC/PIRQ0RC--PIRQ
Route Control Registers", p. 61
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2107200213490.9461@angie.orcam.me.uk
|
|
The ALi M1487 ISA Bus Controller (IBC), a part of the ALi FinALi 486
chipset, implements PCI interrupt steering with a PIRQ router[1] in the
form of four 4-bit mappings, spread across two PCI INTx Routing Table
Mapping Registers, available in the port I/O space accessible indirectly
via the index/data register pair at 0x22/0x23, located at indices 0x42
and 0x43 for the INT1/INT2 and INT3/INT4 lines respectively.
Additionally there is a separate PCI INTx Sensitivity Register at index
0x44 in the same port I/O space, whose bits 3:0 select the trigger mode
for INT[4:1] lines respectively[2]. Manufacturer's documentation says
that this register has to be set consistently with the relevant ELCR
register[3]. Add a router-specific hook then and use it to handle this
register.
Accesses to the port I/O space concerned here need to be unlocked by
writing the value of 0xc5 to the Lock Register at index 0x03
beforehand[4]. Do so then and then lock access after use for safety.
The IBC is implemented as a peer bridge on the host bus rather than a
southbridge on PCI and therefore it does not itself appear in the PCI
configuration space. It is complemented by the M1489 Cache-Memory PCI
Controller (CMP) host-to-PCI bridge, so use that device's identification
for determining the presence of the IBC.
References:
[1] "M1489/M1487: 486 PCI Chip Set", Version 1.2, Acer Laboratories
Inc., July 1997, Section 4: "Configuration Registers", pp. 76-77
[2] same, p. 77
[3] same, Section 5: "M1489/M1487 Software Programming Guide", pp.
99-100
[4] same, Section 4: "Configuration Registers", p. 37
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2107191702020.9461@angie.orcam.me.uk
|
|
With CONFIG_IRQ_FORCED_THREADING=y, testing the boolean force_irqthreads
could incur a cache line miss in invoke_softirq() and other places.
Replace the test with a static key to avoid the potential cache miss.
[ tglx: Dropped the IDE part, removed the export and updated blk-mq ]
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tanner Love <tannerlove@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210602180338.3324213-1-tannerlove.kernel@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
pull-request: mlx5-next 2020-08-9
This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.
1) mlx5 single E-Switch FDB for lag
2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
3) Add DCS caps & fields support
[1] https://patchwork.kernel.org/project/netdevbpf/cover/20210803231959.26513-1-saeed@kernel.org/
[2] https://patchwork.kernel.org/project/netdevbpf/patch/0e3364dab7e0e4eea5423878b01aa42470be8d36.1626609184.git.leonro@nvidia.com/
[3] https://patchwork.kernel.org/project/netdevbpf/patch/55e1d69bef1fbfa5cf195c0bfcbe35c8019de35e.1624258894.git.leonro@nvidia.com/
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Lag, Create shared FDB when in switchdev mode
net/mlx5: E-Switch, add logic to enable shared FDB
net/mlx5: Lag, move lag destruction to a workqueue
net/mlx5: Lag, properly lock eswitch if needed
net/mlx5: Add send to vport rules on paired device
net/mlx5: E-Switch, Add event callback for representors
net/mlx5e: Use shared mappings for restoring from metadata
net/mlx5e: Add an option to create a shared mapping
net/mlx5: E-Switch, set flow source for send to uplink rule
RDMA/mlx5: Add shared FDB support
{net, RDMA}/mlx5: Extend send to vport rules
RDMA/mlx5: Fill port info based on the relevant eswitch
net/mlx5: Lag, add initial logic for shared FDB
net/mlx5: Return mdev from eswitch
IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
net/mlx5: Add DCS caps & fields support
====================
Link: https://lore.kernel.org/r/20210809202522.316930-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ignore fdb flags when adding port extern learn entries and always set
BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
closest to the behaviour we had before and avoids breaking any use cases
which were allowed.
This patch fixes iproute2 calls which assume NUD_PERMANENT and were
allowed before, example:
$ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
Extern learn entries are allowed to roam, but do not expire, so static
or dynamic flags make no sense for them.
Also add a comment for future reference.
Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space")
Fixes: 0541a6293298 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that there is an alternate method for returning an auth_stat
value, replace the RQ_AUTHERR flag with use of that new method.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
I'd like to take commit 4532608d71c8 ("SUNRPC: Clean up generic
dispatcher code") even further by using only private local SVC
dispatchers for all kernel RPC services. This change would enable
the removal of the logic that switches between
svc_generic_dispatch() and a service's private dispatcher, and
simplify the invocation of the service's pc_release method
so that humans can visually verify that it is always invoked
properly.
All that will come later.
First, let's provide a better way to return authentication errors
from SVC dispatcher functions. Instead of overloading the dispatch
method's *statp argument, add a field to struct svc_rqst that can
hold an error value.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Hide the DRM midlayer behind CONFIG_DRM_LEGACY, make functions use
the prefix drm_legacy_, and move declarations to drm_legacy.h.
In struct drm_device, move the fields irq and irq_enabled behind
CONFIG_DRM_LEGACY.
All callers have been updated.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210803090704.32152-15-tzimmermann@suse.de
|
|
DRM IRQ helpers will become legacy. The function devm_drm_irq_install()
is unused and won't be required later.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210803090704.32152-14-tzimmermann@suse.de
|
|
'nolibc.2021.07.20c', 'tasks.2021.07.20c', 'torture.2021.07.27a' and 'torturescript.2021.07.27a' into HEAD
doc.2021.07.20c: Documentation updates.
fixes.2021.08.06a: Miscellaneous fixes.
nocb.2021.07.20c: Callback-offloading (NOCB CPU) updates.
nolibc.2021.07.20c: Tiny userspace library updates.
tasks.2021.07.20c: Tasks RCU updates.
torture.2021.07.27a: In-kernel torture-test updates.
torturescript.2021.07.27a: Torture-test scripting updates.
|
|
For device mapper targets to take advantage of IMA's measurement
capabilities, the status functions for the individual targets need to be
updated to handle the status_type_t case for value STATUSTYPE_IMA.
Update status functions for the following target types, to log their
respective attributes to be measured using IMA.
01. cache
02. crypt
03. integrity
04. linear
05. mirror
06. multipath
07. raid
08. snapshot
09. striped
10. verity
For rest of the targets, handle the STATUSTYPE_IMA case by setting the
measurement buffer to NULL.
For IMA to measure the data on a given system, the IMA policy on the
system needs to be updated to have the following line, and the system
needs to be restarted for the measurements to take effect.
/etc/ima/ima-policy
measure func=CRITICAL_DATA label=device-mapper template=ima-buf
The measurements will be reflected in the IMA logs, which are located at:
/sys/kernel/security/integrity/ima/ascii_runtime_measurements
/sys/kernel/security/integrity/ima/binary_runtime_measurements
These IMA logs can later be consumed by various attestation clients
running on the system, and send them to external services for attesting
the system.
The DM target data measured by IMA subsystem can alternatively
be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
DM_TABLE_STATUS_CMD.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
DM configures a block device with various target specific attributes
passed to it as a table. DM loads the table, and calls each target’s
respective constructors with the attributes as input parameters.
Some of these attributes are critical to ensure the device meets
certain security bar. Thus, IMA should measure these attributes, to
ensure they are not tampered with, during the lifetime of the device.
So that the external services can have high confidence in the
configuration of the block-devices on a given system.
Some devices may have large tables. And a given device may change its
state (table-load, suspend, resume, rename, remove, table-clear etc.)
many times. Measuring these attributes each time when the device
changes its state will significantly increase the size of the IMA logs.
Further, once configured, these attributes are not expected to change
unless a new table is loaded, or a device is removed and recreated.
Therefore the clear-text of the attributes should only be measured
during table load, and the hash of the active/inactive table should be
measured for the remaining device state changes.
Export IMA function ima_measure_critical_data() to allow measurement
of DM device parameters, as well as target specific attributes, during
table load. Compute the hash of the inactive table and store it for
measurements during future state change. If a load is called multiple
times, update the inactive table hash with the hash of the latest
populated table. So that the correct inactive table hash is measured
when the device transitions to different states like resume, remove,
rename, etc.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
/s/reatdown/teardown/
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1621585689-177398-1-git-send-email-john.garry@huawei.com
|
|
Resuming timekeeping is a clock-was-set event and uses the clock-was-set
notification mechanism. This is in the way of making the clock-was-set
update for hrtimers selective so unnecessary IPIs are avoided when a CPU
base does not have timers queued which are affected by the clock setting.
Distangle it by invoking hrtimer_resume() on each unfreezing CPU and invoke
the new timerfd_resume() function from timekeeping_resume() which is the
only place where this is needed.
Rename hrtimer_resume() to hrtimer_resume_local() to reflect the change.
With this the clock_was_set*() functions are not longer required to IPI all
CPUs unconditionally and can get some smarts to avoid them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210713135158.488853478@linutronix.de
|
|
Resuming timekeeping is a clock-was-set event and uses the clock-was-set
notification mechanism. This is in the way of making the clock-was-set
update for hrtimers selective so unnecessary IPIs are avoided when a CPU
base does not have timers queued which are affected by the clock setting.
Provide a seperate timerfd_resume() interface so the resume logic and the
clock-was-set mechanism can be distangled in the core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210713135158.395287410@linutronix.de
|
|
If high resolution timers are disabled the timerfd notification about a
clock was set event is not happening for all cases which use
clock_was_set_delayed() because that's a NOP for HIGHRES=n, which is wrong.
Make clock_was_set_delayed() unconditially available to fix that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210713135158.196661266@linutronix.de
|
|
VQs may be accessed to mark the device broken while they are
created/destroyed. Hence protect the access to the vqs list.
Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20210721142648.1525924-4-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This was done to detect when the pernet->init() function was not called
yet, by checking if net->nf.queue_handler is NULL.
Once the nfnetlink_queue module is active, all struct net pointers
contain the same address. So place this back in nf_queue.c.
Handle the 'netns error unwind' test by checking nfnl_queue_net for a
NULL pointer and add a comment for this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
queueing
There are several scenarios that can result in posix_cpu_timer_set()
not queueing the timer but still leaving the threadgroup cputime counter
running or keeping the tick dependency around for a random amount of time.
1) If timer_settime() is called with a 0 expiration on a timer that is
already disabled, the process wide cputime counter will be started
and won't ever get a chance to be stopped by stop_process_timer()
since no timer is actually armed to be processed.
The following snippet is enough to trigger the issue.
void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
timer_settime(id, TIMER_ABSTIME, &val, NULL);
timer_delete(id);
}
2) If timer_settime() is called with a 0 expiration on a timer that is
already armed, the timer is dequeued but not really disarmed. So the
process wide cputime counter and the tick dependency may still remain
a while around.
The following code snippet keeps this overhead around for one week after
the timer deletion:
void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };
val.it_value.tv_sec = 604800;
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
timer_settime(id, 0, &val, NULL);
timer_delete(id);
}
3) If the timer was initially deactivated, this call to timer_settime()
with an early expiration may have started the process wide cputime
counter even though the timer hasn't been queued and armed because it
has fired early and inline within posix_cpu_timer_set() itself. As a
result the process wide cputime counter may never stop until a new
timer is ever armed in the future.
The following code snippet can reproduce this:
void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };
signal(SIGALRM, SIG_IGN);
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
val.it_value.tv_nsec = 1;
timer_settime(id, TIMER_ABSTIME, &val, NULL);
}
4) If the timer was initially armed with a former expiration value
before this call to timer_settime() and the current call sets an
early deadline that has already expired, the timer fires inline
within posix_cpu_timer_set(). In this case it must have been dequeued
before firing inline with its new expiration value, yet it hasn't
been disarmed in this case. So the process wide cputime counter and
the tick dependency may still be around for a while even after the
timer fired.
The following code snippet can reproduce this:
void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };
signal(SIGALRM, SIG_IGN);
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
val.it_value.tv_sec = 100;
timer_settime(id, TIMER_ABSTIME, &val, NULL);
val.it_value.tv_sec = 0;
val.it_value.tv_nsec = 1;
timer_settime(id, TIMER_ABSTIME, &val, NULL);
}
Fix all these issues with triggering the related base next expiration
recalculation on the next tick. This also implies to re-evaluate the need
to keep around the process wide cputime counter and the tick dependency, in
a similar fashion to disarm_timer().
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210726125513.271824-7-frederic@kernel.org
|
|
A timer deletion only dequeues the timer but it doesn't shutdown
the related costly process wide cputimer counter and the tick dependency.
The following code snippet keeps this overhead around for one week after
the timer deletion:
void trigger_process_counter(void)
{
timer_t id;
struct itimerspec val = { };
val.it_value.tv_sec = 604800;
timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
timer_settime(id, 0, &val, NULL);
timer_delete(id);
}
Make sure the next target's tick recalculates the nearest expiration and
clears the process wide counter and tick dependency if necessary.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210726125513.271824-3-frederic@kernel.org
|
|
Starting the process wide cputime counter needs to be done in the same
sighand locking sequence than actually arming the related timer otherwise
this races against concurrent timers setting/expiring in the same
threadgroup.
Detecting that the cputime counter is started without holding the sighand
lock is a first step toward debugging such situations.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210726125513.271824-2-frederic@kernel.org
|
|
Daniel Borkmann says:
====================
bpf-next 2021-08-10
We've added 31 non-merge commits during the last 8 day(s) which contain
a total of 28 files changed, 3644 insertions(+), 519 deletions(-).
1) Native XDP support for bonding driver & related BPF selftests, from Jussi Maki.
2) Large batch of new BPF JIT tests for test_bpf.ko that came out as a result from
32-bit MIPS JIT development, from Johan Almbladh.
3) Rewrite of netcnt BPF selftest and merge into test_progs, from Stanislav Fomichev.
4) Fix XDP bpf_prog_test_run infra after net to net-next merge, from Andrii Nakryiko.
5) Follow-up fix in unix_bpf_update_proto() to enforce socket type, from Cong Wang.
6) Fix bpf-iter-tcp4 selftest to print the correct dest IP, from Jose Blanquicet.
7) Various misc BPF XDP sample improvements, from Niklas Söderlund, Matthew Cover,
and Muhammad Falak R Wani.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits)
bpf, tests: Add tail call test suite
bpf, tests: Add tests for BPF_CMPXCHG
bpf, tests: Add tests for atomic operations
bpf, tests: Add test for 32-bit context pointer argument passing
bpf, tests: Add branch conversion JIT test
bpf, tests: Add word-order tests for load/store of double words
bpf, tests: Add tests for ALU operations implemented with function calls
bpf, tests: Add more ALU64 BPF_MUL tests
bpf, tests: Add more BPF_LSH/RSH/ARSH tests for ALU64
bpf, tests: Add more ALU32 tests for BPF_LSH/RSH/ARSH
bpf, tests: Add more tests of ALU32 and ALU64 bitwise operations
bpf, tests: Fix typos in test case descriptions
bpf, tests: Add BPF_MOV tests for zero and sign extension
bpf, tests: Add BPF_JMP32 test cases
samples, bpf: Add an explict comment to handle nested vlan tagging.
selftests/bpf: Add tests for XDP bonding
selftests/bpf: Fix xdp_tx.c prog section name
net, core: Allow netdev_lower_get_next_private_rcu in bh context
bpf, devmap: Exclude XDP broadcast to master device
net, bonding: Add XDP support to the bonding driver
...
====================
Link: https://lore.kernel.org/r/20210810130038.16927-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Daniel Borkmann says:
====================
bpf 2021-08-10
We've added 5 non-merge commits during the last 2 day(s) which contain
a total of 7 files changed, 27 insertions(+), 15 deletions(-).
1) Fix missing bpf_read_lock_trace() context for BPF loader progs, from Yonghong Song.
2) Fix corner case where BPF prog retrieves wrong local storage, also from Yonghong Song.
3) Restrict availability of BPF write_user helper behind lockdown, from Daniel Borkmann.
4) Fix multiple kernel-doc warnings in BPF core, from Randy Dunlap.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, core: Fix kernel-doc notation
bpf: Fix potentially incorrect results with bpf_get_local_storage()
bpf: Add missing bpf_read_[un]lock_trace() for syscall program
bpf: Add lockdown check for probe_write_user helper
bpf: Add _kernel suffix to internal lockdown_bpf_read
====================
Link: https://lore.kernel.org/r/20210810144025.22814-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the !CONFIG_BLOCK build after the recent cleanup.
Fixes: 5ed964f8e54e ("mm: hide laptop_mode_wb_timer entirely behind the BDI API")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Update the tegra_drm.h UAPI header, adding the new proposed UAPI.
The old staging UAPI is left in for now, with minor modification
to avoid name collisions.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
The new UAPI will have its own firewall, and we don't want to run
the firewall in the Host1x driver for those jobs. As such, add a
parameter to host1x_job_alloc to specify if we want to skip the
firewall in the Host1x driver.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in HOST1X class, while gather submitted
by the application execute in engine class.
Support is added by converting the gather list of job into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
on the CDMA pushbuffer.
Also supported are waits relative to the start of the job,
which are useful for jobs doing multiple things with an engine
that doesn't natively support pipelining.
While at it, use 32-bit waits on chips that support them.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Add a callback field to the job structure, to be called just before
the job is to be freed. This allows the job's submitter to clean
up any of its own state, like decrement runtime PM refcounts.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Add a new property for jobs to enable or disable recovery i.e.
CPU increments of syncpoints to max value on job timeout. This
allows for a more solid model for hanged jobs, where userspace
doesn't need to guess if a syncpoint increment happened because
the job completed, or because job timeout was triggered.
On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.
The future jobs are NOPed, since because we don't do the CPU
increments, the value of the syncpoint is no longer synchronized,
and any waiters would become confused if a future job incremented
the syncpoint. The syncpoint is marked locked to ensure that any
future jobs cannot increment the syncpoint either, until the
application has recognized the situation and reallocated the
syncpoint.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Add an implementation of dma_fences based on syncpoints. Syncpoint
interrupts are used to signal fences. Additionally, after
software signaling has been enabled, a 30 second timeout is started.
If the syncpoint threshold is not reached within this period,
the fence is signalled with an -ETIMEDOUT error code. This is to
allow fences that would never reach their syncpoint threshold to
be cleaned up. The timeout can potentially be removed in the future
after job tracking code has been refactored.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Add device tree binding documentation and header file for Renesas
RZ/G2L pinctrl.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210727112328.18809-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
allows userspace applications to control whether a pidfd information
record containing a pidfd is to be returned alongside the generic
event metadata for each event.
If FAN_REPORT_PIDFD is enabled for a notification group, an additional
struct fanotify_event_info_pidfd object type will be supplied
alongside the generic struct fanotify_event_metadata for a single
event. This functionality is analogous to that of FAN_REPORT_FID in
terms of how the event structure is supplied to a userspace
application. Usage of FAN_REPORT_PIDFD with
FAN_REPORT_FID/FAN_REPORT_DFID_NAME is permitted, and in this case a
struct fanotify_event_info_pidfd object will likely follow any struct
fanotify_event_info_fid object.
Currently, the usage of the FAN_REPORT_TID flag is not permitted along
with FAN_REPORT_PIDFD as the pidfd API currently only supports the
creation of pidfds for thread-group leaders. Additionally, usage of
the FAN_REPORT_PIDFD flag is limited to privileged processes only
i.e. event listeners that are running with the CAP_SYS_ADMIN
capability. Attempting to supply the FAN_REPORT_TID initialization
flags with FAN_REPORT_PIDFD or creating a notification group without
CAP_SYS_ADMIN will result with -EINVAL being returned to the caller.
In the event of a pidfd creation error, there are two types of error
values that can be reported back to the listener. There is
FAN_NOPIDFD, which will be reported in cases where the process
responsible for generating the event has terminated prior to the event
listener being able to read the event. Then there is FAN_EPIDFD, which
will be reported when a more generic pidfd creation error has occurred
when fanotify calls pidfd_create().
Link: https://lore.kernel.org/r/5f9e09cff7ed62bfaa51c1369e0f7ea5f16a91aa.1628398044.git.repnop@google.com
Signed-off-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The copy_info_records_to_user() helper allows for the separation of
info record copying routines/conditionals from copy_event_to_user(),
which reduces the overall clutter within this function. This becomes
especially true as we start introducing additional info records in the
future i.e. struct fanotify_event_info_pidfd. On success, this helper
returns the total amount of bytes that have been copied into the user
supplied buffer and on error, a negative value is returned to the
caller.
The newly defined macro FANOTIFY_INFO_MODES can be used to obtain info
record types that have been enabled for a specific notification
group. This macro becomes useful in the subsequent patch when the
FAN_REPORT_PIDFD initialization flag is introduced.
Link: https://lore.kernel.org/r/8872947dfe12ce8ae6e9a7f2d49ea29bc8006af0.1628398044.git.repnop@google.com
Signed-off-by: Matthew Bobrowski <repnop@google.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|