Age | Commit message (Collapse) | Author |
|
Make the function available for outside use and fortify it against NULL
kobject.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When block device is closed, we call inode_detach_wb() in __blkdev_put()
which sets inode->i_wb to NULL. That is contrary to expectations that
inode->i_wb stays valid once set during the whole inode's lifetime and
leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
inode_to_wb() returned NULL.
The reason why we called inode_detach_wb() is not valid anymore though.
BDI is guaranteed to stay along until we call bdi_put() from
bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
moment.
Also add a warning to catch if someone uses inode_detach_wb() in a
dangerous way.
Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently we wait for all cgwbs to get released in cgwb_bdi_destroy()
(called from bdi_unregister()). That is however unnecessary now when
cgwb->bdi is a proper refcounted reference (thus bdi cannot get
released before all cgwbs are released) and when cgwb_bdi_destroy()
shuts down writeback directly.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently we waited for all cgwbs to get freed in cgwb_bdi_destroy()
which also means that writeback has been shutdown on them. Since this
wait is going away, directly shutdown writeback on cgwbs from
cgwb_bdi_destroy() to avoid live writeback structures after
bdi_unregister() has finished. To make that safe with concurrent
shutdown from cgwb_release_workfn(), we also have to make sure
wb_shutdown() returns only after the bdi_writeback structure is really
shutdown.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
congested->bdi pointer is used only to be able to remove congested
structure from bdi->cgwb_congested_tree on structure release. Moreover
the pointer can become NULL when we unregister the bdi. Rename the field
to __bdi and add a comment to make it more explicit this is internal
stuff of memcg writeback code and people should not use the field as
such use will be likely race prone.
We do not bother with converting congested->bdi to a proper refcounted
reference. It will be slightly ugly to special-case bdi->wb.congested to
avoid effectively a cyclic reference of bdi to itself and the reference
gets cleared from bdi_unregister() making it impossible to reference
a freed bdi.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task. If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away. That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.
This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway. On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.
On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.
To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes <joelaf@google.com>
|
|
This patch adds hash of maps support (hashmap->bpf_map).
BPF_MAP_TYPE_HASH_OF_MAPS is added.
A map-in-map contains a pointer to another map and lets call
this pointer 'inner_map_ptr'.
Notes on deleting inner_map_ptr from a hash map:
1. For BPF_F_NO_PREALLOC map-in-map, when deleting
an inner_map_ptr, the htab_elem itself will go through
a rcu grace period and the inner_map_ptr resides
in the htab_elem.
2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC),
when deleting an inner_map_ptr, the htab_elem may
get reused immediately. This situation is similar
to the existing prealloc-ated use cases.
However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls
inner_map->ops->map_free(inner_map) which will go
through a rcu grace period (i.e. all bpf_map's map_free
currently goes through a rcu grace period). Hence,
the inner_map_ptr is still safe for the rcu reader side.
This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the
check_map_prealloc() in the verifier. preallocation is a
must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect
heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map
is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using
map-in-map first.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a few helper funcs to enable map-in-map
support (i.e. outer_map->inner_map). The first outer_map type
BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch.
The next patch will introduce a hash of maps type.
Any bpf map type can be acted as an inner_map. The exception
is BPF_MAP_TYPE_PROG_ARRAY because the extra level of
indirection makes it harder to verify the owner_prog_type
and owner_jited.
Multi-level map-in-map is not supported (i.e. map->map is ok
but not map->map->map).
When adding an inner_map to an outer_map, it currently checks the
map_type, key_size, value_size, map_flags, max_entries and ops.
The verifier also uses those map's properties to do static analysis.
map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT
is using a preallocated hashtab for the inner_hash also. ops and
max_entries are needed to generate inlined map-lookup instructions.
For simplicity reason, a simple '==' test is used for both map_flags
and max_entries. The equality of ops is implied by the equality of
map_type.
During outer_map creation time, an inner_map_fd is needed to create an
outer_map. However, the inner_map_fd's life time does not depend on the
outer_map. The inner_map_fd is merely used to initialize
the inner_map_meta of the outer_map.
Also, for the outer_map:
* It allows element update and delete from syscall
* It allows element lookup from bpf_prog
The above is similar to the current fd_array pattern.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit adds a new sysctl accept_ra_rt_info_min_plen that
defines the minimum acceptable prefix length of Route Information
Options. The new sysctl is intended to be used together with
accept_ra_rt_info_max_plen to configure a range of acceptable
prefix lengths. It is useful to prevent misconfigurations from
unintentionally blackholing too much of the IPv6 address space
(e.g., home routers announcing RIOs for fc00::/7, which is
incorrect).
Signed-off-by: Joel Scherpelz <jscherpelz@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the indirect MMD read/write methods which are now no longer
necessary.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make phy_(read|write)_mmd() generic 802.3 clause 45 register accessors
for both Clause 22 and Clause 45 PHYs, using either the direct register
reading for Clause 45, or the indirect method for Clause 22 PHYs.
Allow this behaviour to be overriden by PHY drivers where necessary.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the phy_(read|write)__mmd() helpers out of line, they will become
our main MMD accessor functions, and so will be a little more complex.
This complexity doesn't belong in an inline function. Also move the
_indirect variants as well to keep like functionality together.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prior to the recent multi-queue changes the driver would configure the
queues to use the AVB mode, but the mode then got switched to DCB. The
hardware still works fine in DCB mode, but my testing capabilities are
limited, so it's safer to revert to the prior setting anyway.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The introduction of reserved regions has left a couple of rough edges
which we could do with sorting out sooner rather than later. Since we
are not yet addressing the potential dynamic aspect of software-managed
reservations and presenting them at arbitrary fixed addresses, it is
incongruous that we end up displaying hardware vs. software-managed MSI
regions to userspace differently, especially since ARM-based systems may
actually require one or the other, or even potentially both at once,
(which iommu-dma currently has no hope of dealing with at all). Let's
resolve the former user-visible inconsistency ASAP before the ABI has
been baked into a kernel release, in a way that also lays the groundwork
for the latter shortcoming to be addressed by follow-up patches.
For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
IOMMU_RESV_MSI to describe the hardware type, and document everything a
little bit. Since the x86 MSI remapping hardware falls squarely under
this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
so that we tell the same story to userspace across all platforms.
Secondly, as the various region types require quite different handling,
and it really makes little sense to ever try combining them, convert the
bitfield-esque #defines to a plain enum in the process before anyone
gets the wrong impression.
Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: David Woodhouse <dwmw2@infradead.org>
CC: kvm@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Thirds set of IIO fixes for the 4.11 cycle.
* core
- iio sw-device - ensure configfs is enabled both when building as module
and built in.
* ak8974
- drop incorrect __exit markup on remove.
* hid-sensor-trigger
- code reorganise to avoid losing settings if a power cycle occurs during S3.
* lsm6dsx
- fix incorrect overwrite of parts of FIFO_CTRL2 register during watermark
configuration.
* ti-am335x
- fix a hard to hit bug when reenabling from a fifo overrun by waiting for
current cycle to finish.
|
|
Unfortunately the HWMON_T_ALARM define was missing,
although the associated entry was present in hwmon_temp_attributes.
This is needed to convert drivers to the new interface which use channel
based alarms.
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
while packets are collected on the error queue.
So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
is not enough to safely assume that the skb contains
OPT_STATS data.
Add a bit in sock_exterr_skb to indicate whether the
skb contains opt_stats data.
Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add rhashtable_lookup_get_insert_fast for fixed keys, similar to
rhashtable_lookup_get_insert_key for explicit keys.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
into fixes
Reset controller fixes for v4.11
Fix optional reset_control_get_stubs to return NULL and remove warnings
from reset_control_* stubs.
This fixes commit bb475230b8e5 ("reset: make optional functions really
optional"), which was merged in reset-for-4.11, and would cause consumer
drivers depending on the new behaviour of optional resets to fail probing
if RESET_CONTROLLER Kconfig option is disabled.
* tag 'reset-fixes-for-4.11' of git://git.pengutronix.de/git/pza/linux:
reset: fix optional reset_control_get stubs to return NULL
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
This patch adds the configuration of RX queues' routing.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the configuration of RX and TX queues' priority.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function usbnet_{get|set}_settings is no longer used,
so we remove it.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ethtool api {get|set}_settings is deprecated.
We add the new api {get|set}_link_ksettings to this driver.
As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
So that we can cancel a queued pkt later if necessary.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here is the first set of GPIO fixes for 4.11. It was delayed a bit
beacuse I was chicken when linux-next was not rotating last week.
This hits the ST serial driver in drivers/tty/serial and that has an
ACK from Greg, he suggested to keep the old GPIO fwnode API around to
smoothen things in the merge Windod and those have now served their
purpose so we take them out and convert the last driver to the new
API.
Apart from that it's fixes as usual.
Summary:
- set the parent on the Altera A10SR driver, also fix high level
IRQs.
- fix error path on the mockup driver.
- compilation noise about unused functions fixed.
- fix missed interrupts on the MCP23S08 expander, this is also tagged
for stable.
- retire the interrim helpers devm_get_gpiod_from_child() used to
smoothen merging in the merge window"
* tag 'gpio-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio:mcp23s08 Fixed missing interrupts
serial: st-asc: Use new GPIOD API to obtain RTS pin
gpio: altera: Use handle_level_irq when configured as a level_high
gpio: xgene: mark PM functions as __maybe_unused
gpio: mockup: return -EFAULT if copy_from_user() fails
gpio: altera-a10sr: Set gpio_chip parent property
|
|
When RESET_CONTROLLER is not enabled, the optional reset_control_get
stubs should now also return NULL.
Since it is now valid for reset_control_assert/deassert/reset/status/put
to be called unconditionally, with NULL as an argument for optional
resets, the stubs are not allowed to warn anymore.
Fixes: bb475230b8e5 ("reset: make optional functions really optional")
Reported-by: Andrzej Hajda <a.hajda@samsung.com>
Tested-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Ramiro Oliveira <Ramiro.Oliveira@synopsys.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
|
|
Currently, statistics are gathered in ~0.13s windows, and users grab the
statistics whenever they need them. This is not ideal for both in-tree
users:
1. Writeback throttling wants its own dynamically sized window of
statistics. Since the blk-stats statistics are reset after every
window and the wbt windows don't line up with the blk-stats windows,
wbt doesn't see every I/O.
2. Polling currently grabs the statistics on every I/O. Again, depending
on how the window lines up, we may miss some I/Os. It's also
unnecessary overhead to get the statistics on every I/O; the hybrid
polling heuristic would be just as happy with the statistics from the
previous full window.
This reworks the blk-stats infrastructure to be callback-based: users
register a callback that they want called at a given time with all of
the statistics from the window during which the callback was active.
Users can dynamically bucketize the statistics. wbt and polling both
currently use read vs. write, but polling can be extended to further
subdivide based on request size.
The callbacks are kept on an RCU list, and each callback has percpu
stats buffers. There will only be a few users, so the overhead on the
I/O completion side is low. The stats flushing is also simplified
considerably: since the timer function is responsible for clearing the
statistics, we don't have to worry about stale statistics.
wbt is a trivial conversion. After the conversion, the windowing problem
mentioned above is fixed.
For polling, we register an extra callback that caches the previous
window's statistics in the struct request_queue for the hybrid polling
heuristic to use.
Since we no longer have a single stats buffer for the request queue,
this also removes the sysfs and debugfs stats entries. To replace those,
we add a debugfs entry for the poll statistics.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This is an implementation detail that no-one outside of blk-stat.c uses.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We need to store the minor number each drivers. In case of hidraw, the
minor number is stored stores in struct hidraw. But hiddev's minor is
located in struct hid_device.
The hid-core driver announces a kernel message which driver is loaded when
HID device connected, but hiddev's minor number is always zero. To proper
display hiddev's minor number, we need to store the minor number asked from
usb core and do some refactoring work (move from hiddev.c to hiddev.h) to
access hiddev in hid-core.
[jkosina@suse.cz: rebase on top of newer codebase]
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jaejoong Kim <climbbb.kim@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
It looks like a bunch of devices do not like to be polled
for their reports at init time. When you look into the details,
it seems that for those that are requiring the quirk
HID_QUIRK_NO_INIT_REPORTS, the driver fails to retrieve part
of the features/inputs while others (more generic) work.
IMO, it should be acceptable to remove the need for the quirk
in the general case. On the small amount of cases where
we actually need to read the current values, the driver
in charge (hid-mt or wacom) already retrieves the features
manually.
There are 2 cases where we might need to retrieve the reports at
init:
1. hiddev devices with specific use-space tool
2. a device that would require the driver to fetch a specific
feature/input at plug
For case 2, I have seen this a few time on hid-multitouch. It
is solved in hid-multitouch directly by fetching the feature.
I hope it won't be too common and this can be solved on a per-case
basis (crossing fingers).
For case 1, we moved the implementation of HID_QUIRK_NO_INIT_REPORTS
in hiddev. When somebody starts calling ioctls that needs an initial
update, the hiddev device will fetch the initial state of the reports
to mimic the current behavior. This adds a small amount of time during
the first HIDIOCGUSAGE(S), but it should be acceptable in
most cases. To keep the currently known broken devices, we have to
keep around HID_QUIRK_NO_INIT_REPORTS, but the scope will only be
for hiddev.
Note that I don't think hidraw would be affected and I checked that
the FF drivers that need to interact with the report fields are all
using output reports, which are not initialized by
usbhid_init_reports().
NO_INIT_INPUT_REPORTS is then replaced by HID_QUIRK_NO_INIT_REPORTS:
there is no point keeping it for just one device.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
XHCI debug capability (DbC) is an optional but standalone
functionality provided by an xHCI host controller. Software
learns this capability by walking through the extended
capability list of the host. XHCI specification describes
DbC in section 7.6.
This patch introduces the code to probe and initialize the
debug capability hardware during early boot. With hardware
initialized, the debug target (system on which this code is
running) will present a debug device through the debug port
(normally the first USB3 port). The debug device is fully
compliant with the USB framework and provides the equivalent
of a very high performance (USB3) full-duplex serial link
between the debug host and target. The DbC functionality is
independent of the xHCI host. There isn't any precondition
from the xHCI host side for the DbC to work.
One use for this feature is kernel debugging, for example
when your machine crashes very early before the regular
console code is initialized. Other uses include simpler,
lockless logging instead of a full-blown printk console
driver and klogd.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-usb@vger.kernel.org
Link: http://lkml.kernel.org/r/1490083293-3792-3-git-send-email-baolu.lu@linux.intel.com
[ Small fix to the Kconfig help text. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.
When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991
Implement a new pair of arch_prctls, available on both x86-32 and x86-64.
ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
is enabled (and thus the CPUID instruction is not available) or 1 if
CPUID faulting is not enabled.
ARCH_SET_CPUID: Set the CPUID state to the second argument. If
cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
be deactivated. Returns ENODEV if CPUID faulting is not supported on
this system.
The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Hook up arch_prctl to call do_arch_prctl() on x86-32, and in 32 bit compat
mode on x86-64. This allows to have arch_prctls that are not specific to 64
bits.
On UML, simply stub out this syscall.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-7-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
get_user_pages_fast() function
This is a preparation patch for the transition of x86 to the generic GUP_fast()
implementation.
Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's
only implemented on x86. On non-x86, the new code will be compiled out.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 acpi fixes from Thomas Gleixner:
"This update deals with the fallout of the recent work to make
cpuid/node mappings persistent.
It turned out that the boot time ACPI based mapping tripped over ACPI
inconsistencies and caused regressions. It's partially reverted and
the fragile part replaced by an implementation which makes the mapping
persistent when a CPU goes online for the first time"
* 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
acpi/processor: Check for duplicate processor ids at hotplug time
acpi/processor: Implement DEVICE operator for processor enumeration
x86/acpi: Restore the order of CPU IDs
Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
|
|
This function was removed in commit c6eb3f70d448 (hrtimer: Get rid of
hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted.
Delete it now.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/20170317010814.2591-1-sboyd@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
non-root cgroups
Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator. Once the new kthread starts
running, it initializes itself and wakes up the creator. The creator
then can further configure the kthread and then let it start doing its
job by waking it up.
In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland. When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity. This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.
Unfortunately, the cgroup side of protection is racy. While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.
This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one. Per-cpu
workqueue workers got caught while being created and ended up with
incorrected CPU affinity breaking concurrency management and sometimes
stalling workqueue execution.
This patch adds task->no_cgroup_migration which disallows the task to
be migrated by userland. kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed. The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.
It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side. Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.
v2: Switch to a simpler implementation using a new task_struct bit
field suggested by Oleg.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-and-debugged-by: Chris Mason <clm@fb.com>
Cc: stable@vger.kernel.org # v4.3+ (we can't close the race on < v4.3)
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem()
into a sequence of bpf instructions.
When JIT is on the sequence of bpf instructions is the sequence
of native cpu instructions with significantly faster performance
than indirect call and two function's prologue/epilogue.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some Hypervisors detach VFs from VMs by instantly causing an FLR event
to be generated for a VF.
In the mlx4 case, this will cause that VF's comm channel to be disabled
before the VM has an opportunity to invoke the VF device's "shutdown"
method.
For such Hypervisors, there is a race condition between the VF's
shutdown method and its internal-error detection/reset thread.
The internal-error detection/reset thread (which runs every 5 seconds) also
detects a disabled comm channel. If the internal-error detection/reset
flow wins the race, we still get delays (while that flow tries repeatedly
to detect comm-channel recovery).
The cited commit fixed the command timeout problem when the
internal-error detection/reset flow loses the race.
This commit avoids the unneeded delays when the internal-error
detection/reset flow wins.
Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The last caller of assert_held_device_hotplug() is gone, so remove it again.
Link: http://lkml.kernel.org/r/20170314125226.16779-3-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a prototype of task_struct to fix below warning on arm64.
In file included from arch/arm64/kernel/probes/kprobes.c:19:0:
include/linux/kasan.h:81:132: error: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
As same as other types (kmem_cache, page, and vm_struct) this adds a
prototype of task_struct data structure on top of kasan.h.
[arnd] A related warning was fixed before, but now appears in a
different line in the same file in v4.11-rc2. The patch from Masami
Hiramatsu still seems appropriate, so let's take his version.
Fixes: 71af2ed5eeea ("kasan, sched/headers: Remove <linux/sched.h> from <linux/kasan.h>")
Link: https://patchwork.kernel.org/patch/9569839/
Link: http://lkml.kernel.org/r/20170313141517.3397802-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Add 'brstackinsn' field in 'perf script' to reuse the x86 instruction
decoder used in the Intel PT code to study hot paths to samples (Andi Kleen)
Kernel changes:
- Default UPROBES_EVENTS to Y (Alexei Starovoitov)
- Fix check for kretprobe offset within function entry (Naveen N. Rao)
Infrastructure changes:
- Introduce util func is_sdt_event() (Ravi Bangoria)
- Make perf_event__synthesize_mmap_events() scale on older kernels where
reading /proc/pid/maps is way slower than reading /proc/pid/task/pid/maps (Stephane Eranian)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The CCP driver generally uses a round-robin approach when
assigning operations to available CCPs. For the DMA engine,
however, the DMA mappings of the SGs are associated with a
specific CCP. When an IOMMU is enabled, the IOMMU is
programmed based on this specific device.
If the DMA operations are not performed by that specific
CCP then addressing errors and I/O page faults will occur.
Update the CCP driver to allow a specific CCP device to be
requested for an operation and use this in the DMA engine
support.
Cc: <stable@vger.kernel.org> # 4.9.x-
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
If a PER_CPU struct which contains a spin_lock is statically initialized
via:
DEFINE_PER_CPU(struct foo, bla) = {
.lock = __SPIN_LOCK_UNLOCKED(bla.lock)
};
then lockdep assigns a seperate key to each lock because the logic for
assigning a key to statically initialized locks is to use the address as
the key. With per CPU locks the address is obvioulsy different on each CPU.
That's wrong, because all locks should have the same key.
To solve this the following modifications are required:
1) Extend the is_kernel/module_percpu_addr() functions to hand back the
canonical address of the per CPU address, i.e. the per CPU address
minus the per CPU offset.
2) Check the lock address with these functions and if the per CPU check
matches use the returned canonical address as the lock key, so all per
CPU locks have the same key.
3) Move the static_obj(key) check into look_up_lock_class() so this check
can be avoided for statically initialized per CPU locks. That's
required because the canonical address fails the static_obj(key) check
for obvious reasons.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Merged Dan's fixups for !MODULES and !SMP into this patch. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit:
f8319483f57f ("locking/lockdep: Provide a type check for lock_is_held")
didn't fully cover rwsems as downgrade_write() was left out.
Introduce lock_downgrade() and use it to add new checks.
See-also: http://marc.info/?l=linux-kernel&m=148581164003149&w=2
Originally-written-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1486053497-9948-3-git-send-email-hooanon05g@gmail.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In preparation for adding more flags to perf AUX records, introduce a
separate API for setting the flags for a session, rather than appending
more bool arguments to perf_aux_output_end. This allows to set each
flag at the time a corresponding condition is detected, instead of
tracking it in each driver's private state.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170220133352.17995-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With the recent introduction of per-channel tasklet, we need to update
the way we handle the 3 concurrency issues:
1. hv_process_channel_removal -> percpu_channel_deq vs.
vmbus_chan_sched -> list_for_each_entry(..., percpu_list);
2. vmbus_process_offer -> percpu_channel_enq/deq vs. vmbus_chan_sched.
3. vmbus_close_internal vs. the per-channel tasklet vmbus_on_event;
The first 2 issues can be handled by Stephen's recent patch
"vmbus: use rcu for per-cpu channel list", and the third issue
can be handled by calling tasklet_disable in vmbus_close_internal here.
We don't need the original hv_event_tasklet_disable/enable since we
now use per-channel tasklet instead of the previous per-CPU tasklet,
and actually we must remove them due to the side effect now:
vmbus_process_offer -> hv_event_tasklet_enable -> tasklet_schedule will
start the per-channel callback prematurely, cauing NULL dereferencing
(the channel may haven't been properly configured to run the callback yet).
Fixes: 631e63a9f346 ("vmbus: change to per channel tasklet")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|