Age | Commit message (Collapse) | Author |
|
|
|
mmc_start_areq() is an internal mmc core API. Move the declaration
accordingly.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Add CQE host operations, capabilities, and host members.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
NVMe 1.3 specification defines the Optional Admin Command Support feature
flags, bit 8 set to '1' then the controller supports the Doorbell Buffer
Config command. Bit 7 is used for Virtualization Mangement command.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
Cc: stable@vger.kernel.org
|
|
There are new types and helpers that are supposed to be used in new code.
As a preparation to get rid of legacy types and API functions do
the conversion here.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Most of the information needed to issue requests to a CQE is already in
struct mmc_request and struct mmc_data. Add data block address, some flags,
and the task id (tag), and allow for cmd being NULL which it is for CQE
tasks.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Packed commands support was removed but some bits got left behind. Remove
them.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
There is one SDHI instance on Gen2 which does not have the CBSY bit.
So, turn CBSY usage into an extra flag and set it accordingly. This has
the additional advantage that we can also set it for other incarnations
later.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Chris Brandt <Chris.Brandt@renesas.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Starting with the A83T SoC, Allwinner introduced a new timing mode for
its MMC clocks. The new mode changes how the MMC controller sample and
output clocks are delayed to match chip and board specifics. There are
two controls for this, one on the CCU side controlling how the clocks
behave, and one in the MMC controller controlling what inputs to take
and how to route them.
In the old mode, the MMC clock had 2 child clocks providing the output
and sample clocks, which could be delayed by a number of clock cycles
measured from the MMC clock's parent.
With the new mode, the 2 delay clocks are no longer active. Instead,
the delays and associated controls are moved into the MMC controller.
The output of the MMC clock is also halved.
The difference in how things are wired between the modes means that the
clock controls and the MMC controls must match. To achieve this in a
clear, explicit way, we introduce two functions for the MMC driver to
use: one queries the hardware for the current mode set, and the other
allows the MMC driver to request a mode.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Per the spec of JESD84-B51, section 7.3, replace tacc with taac to
fix the obvious typo.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The intention of this check was to prevent the conflict between
hotplug and removing driver for whatever reason. Currently it
doesn't improve anything and the following rescan process could
still saftly perform the scan flow. So these code seems pointless
now and let's remove them.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Allow TMIO and SDHI driver implementations to provide values for
max_segs and max_blk_count.
A follow-up patch will set these values for Renesas Gen3 SoCs
the using an SDHI driver.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Ai Kyuse <ai.kyuse.uw@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The glink protocol supports different types of transports (shared
memory). With the core protocol remaining the same, the way the
transport's memory is probed and accessed is different. So add support
for glink's smem based transports.
Adding a new smem transport register function and the fifo accessors for
the same.
Acked-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Simplify the SMP passthrough code by switching it to the generic bsg-lib
helpers that abstract away the details of the request code, and gets
drivers out of seeing struct scsi_request.
For the libsas host SMP code there is a small behavior difference in
that we now always clear the residual len for successful commands,
similar to the three other SMP handler implementations. Given that
there is no partial command handling in the host SMP handler this should
not matter in practice.
[mkp: typos and checkpatch fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The SAS code will need it. Also mark the name argument const to match
bsg_register_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Introduce struct scsi_vpd for the VPD page length, data and the RCU head
that will be used to free the VPD data. Use kfree_rcu() instead of
kfree() to free VPD data. Move the VPD buffer pointer check inside the
RCU read lock in the sysfs code. Only annotate pointers that are shared
across threads with __rcu. Use rcu_dereference() when dereferencing an
RCU pointer. This patch suppresses about twenty sparse complaints about
the vpd_pg8[03] pointers. This patch also fixes a race condition, namely
that updating of the VPD pointers and length variables in struct
scsi_device was not atomic with reference to the code reading these
variables. See also "Does the update code tolerate concurrent accesses?"
in Documentation/RCU/checklist.txt.
Fixes: commit 09e2b0b14690 ("scsi: rescan VPD attributes")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Shane Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A common pattern in RCU code is to assign a new value to an RCU pointer
after having read and stored the old value. Introduce a macro for this
pattern.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Shane M Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
According to the ACPI specification, firmware is not required to provide
the Hardware Error Source Table (HEST). When HEST is not present, the
following superfluous message is printed to the kernel boot log -
[ 3.460067] GHES: HEST is not enabled!
Extend hest_disable variable to track whether the firmware provides this
table and if it is not present skip any log output. The existing
behaviour is preserved in all other cases.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Make the drivers that want to include the polling state into their
states table initialize it explicitly and drop the initialization of
it (which in fact is conditional, but that is not obvious from the
code) from the core.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Move the polling state initialization code to a separate file built
conditionally on CONFIG_ARCH_HAS_CPU_RELAX to get rid of the #ifdef
in driver.c.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
On some architectures the first (index 0) idle state is a polling
one and it doesn't really save energy, so there is the
CPUIDLE_DRIVER_STATE_START symbol allowing some pieces of
cpuidle code to avoid using that state.
However, this makes the code rather hard to follow. It is better
to explicitly avoid the polling state, so add a new cpuidle state
flag CPUIDLE_FLAG_POLLING to mark it and make the relevant code
check that flag for the first state instead of using the
CPUIDLE_DRIVER_STATE_START symbol.
In the ACPI processor driver that cannot always rely on the state
flags (like before the states table has been set up) define
a new internal symbol ACPI_IDLE_STATE_START equivalent to the
CPUIDLE_DRIVER_STATE_START one and drop the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Florian reported UDP xmit drops that could be root caused to the
too small neigh limit.
Current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64bit arches).
Once ARP/ND resolution is in progress, we should allow a little more
packets to be queued, at least for one producer.
Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the removal of NET_DMA, dmaengine.h header file shouldn't be needed
by netdevice.h anymore.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tariq repored local pings to linklocal address is failing:
$ ifconfig ens8
ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255
inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20<link>
ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet)
RX packets 12 bytes 1164 (1.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 30 bytes 2484 (2.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NSH (Network Service Header)[1] is a new protocol for service
function chaining, it can be handled as a L3 protocol like
IPv4 and IPv6, Eth + NSH + Inner packet or VxLAN-gpe + NSH +
Inner packet are two typical use cases.
This patch adds NSH header structures and helpers for NSH GSO
support and Open vSwitch NSH support.
[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
[Jiri: added nsh_hdr() helper and renamed the header struct to "struct
nshhdr" to match the usual pattern. Removed packet type defines, these are
now shared with VXLAN-GPE.]
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The values are shared between VXLAN-GPE and NSH. Originally probably by
coincidence but I notified both working groups about this last year and they
seem to keep the values in sync since then.
Hopefully they'll get a single IANA registry for the values, too. (I asked
them for that.)
Factor out the code to be shared by the NSH implementation.
NSH and MPLS values are added in this patch, too. For MPLS, the drafts
incorrectly assign only a single value, while we have two MPLS ethertypes.
I raised the problem with both groups. For now, I assume the value is for
unicast.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The NSH draft says:
An IEEE EtherType, 0x894F, has been allocated for NSH.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the forces IFE lfb type according to IEEE registered
ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt for more
information. Since there exists the IFE subsystem it can be used there.
This patch also use the correct word "ForCES" instead of "FoRCES" which
is a spelling error inside the IEEE ethertype specification.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding support for updating the FW on new port mac, when port mac change
is requested by the user. This info is required by the FW as OEM
management tools require this info directly from the NIC FW.
Check device capability bit to verify the FW supports user mac.
If the FW does support it, use set_port command to notify the FW on the
new mac.
The feature is relevant only to PF port mac.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to avoid temporary large structs on the stack,
allocate them dynamically.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A few useful tracepoints to trace bridge forwarding
database updates.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We had one call to kmalloc that actually allocates an array. Switch that
one to the kmalloc_array() function.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some backend devices claim to support write-same,
but would fail actual write-same requests.
Allow to set (or toggle) whether or not DRBD tries to support write-same.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
pci-epc-mem uses a page size equal to *PAGE_SIZE* (usually 4KB) to manage
the address space. However certain platforms like TI's K2G have a
restriction that this address space should be either divided into
1MB/2MB/4MB or 8MB sizes (Ref: 11.14.4.9.1 Outbound Address Translation in
K2G TRM SPRUHY8F January 2016 – Revised May 2017). Add support to handle
different page sizes here.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
In some systems, such as Baseboard Management Controllers (BMCs), we
want to retain the state of LEDs across a reboot of the BMC (whilst the
host remains up). Implement support for the retain-state-shutdown
devicetree property in leds-gpio.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Brandon Wyman <bjwyman@gmail.com>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Late fixes for libata. There's a minor platform driver fix but the
important one is READ LOG PAGE.
This is a new ATA command which is used to test some optional features
but it broke probing of some devices - they locked up instead of
failing the unknown command.
Christoph tried blacklisting, but, after finding out there are
multiple devices which fail this way, backed off to testing feature
bit in IDENTIFY data first, which is a bit lossy (we can miss features
on some devices) but should be a lot safer"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
Revert "libata: quirk read log on no-name M.2 SSD"
libata: check for trusted computing in IDENTIFY DEVICE data
libata: quirk read log on no-name M.2 SSD
sata: ahci-da850: Fix some error handling paths in 'ahci_da850_probe()'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.14
rsi driver is getting a lot of new features lately, but as usual
active development happening on iwlwifi as well as other drivers.
I pulled wireless-drivers to fix multiple conflicts in iwlwifi and to
make it easier further development.
Major changes:
ath10k
* initial UBS bus support (no full support yet)
* add tdls support for 10.4 firmware
ath9k
* add Dell Wireless 1802
wil6210
* support FW RSSI reporting
rsi
* support legacy power save, U-APSD, rf-kill and AP mode
* RTS threshold configuration
brcmfmac
* support CYW4373 SDIO/USB chipset
iwlwifi
* some more code moved to a new directory
* add new PCI ID for 7265D
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Creating as specific xdp_redirect_map variant of the xdp tracepoints
allow users to write simpler/faster BPF progs that get attached to
these tracepoints.
Goal is to still keep the tracepoints in xdp_redirect and xdp_redirect_map
similar enough, that a tool can read the top part of the TP_STRUCT and
produce similar monitor statistics.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a need to separate the xdp_redirect tracepoint into two
tracepoints, for separating the error case from the normal forward
case.
Due to the extreme speeds XDP is operating at, loading a tracepoint
have a measurable impact. Single core XDP REDIRECT (ethtool tuned
rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple
bpf_prog at the tracepoint (with a return 0) reduce perf to 10.2 Mpps
(CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe)
The overhead of loading a bpf-based tracepoint can be calculated to
cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns).
Using perf record on the tracepoint event, with a non-matching --filter
expression, the overhead is much larger. Performance drops to 8.3 Mpps,
cost 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74))
Having a separate tracepoint for err cases, which should be less
frequent, allow running a continuous monitor for errors while not
affecting the redirect forward performance (this have also been
verified by measurements).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Given previous patch expose the map_id, it seems natural to also
report the bpf prog id.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To make sense of the map index, the tracepoint user also need to know
that map we are talking about. Supply the map pointer but only expose
the map->id.
The 'to_index' is renamed 'to_ifindex'. In the xdp_redirect_map case,
this is the result of the devmap lookup. The map lookup key is exposed
as map_index, which is needed to troubleshoot in case the lookup failed.
The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as
possible.
This also keeps the TP_STRUCT similar enough, that userspace can write
a monitor program, that doesn't need to care about whether
bpf_redirect or bpf_redirect_map were used.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect
is redundant as it is only called in-case this action was specified.
Remove the argument, but keep "act" member of the tracepoint struct and
populate it with XDP_REDIRECT. This makes it easier to write a common bpf_prog
processing events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
chip->bits_per_cell which is used to determine the NAND cell type
(SLC/MLC) should always have a value != 0.
Complain loudly if the value is 0 in nand_is_slc() to catch use before
correct initialization.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
|
|
This reverts commit 35f0b6a779b8b7a98faefd7c1c660b4dac9a5c26.
We now conditionalize issuing of READ LOG PAGE on the TRUSTED
COMPUTING SUPPORTED bit in the identity data and this shouldn't be
necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
ATA-8 and later mirrors the TRUSTED COMPUTING SUPPORTED bit in word 48 of
the IDENTIFY DEVICE data. Check this before issuing a READ LOG PAGE
command to avoid issues with buggy devices. The only downside is that
we can't support Security Send / Receive for a device with an older
revision due to the conflicting use of this field in earlier
specifications.
tj: The reason we need this is because some devices which don't
support READ LOG PAGE lock up after getting issued that command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
COMPLETION_INITIALIZER_ONSTACK()
In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
stack allocation of the caller. However, on some compilers, a temporary
structure was allocated for the return value of
COMPLETION_INITIALIZER_ONSTACK().
For example in write_journal() with LOCKDEP_COMPLETIONS=y (GCC is 7.1.1):
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2462: e8 00 00 00 00 callq 2467 <write_journal+0x47>
2467: 48 8d 85 80 fd ff ff lea -0x280(%rbp),%rax
246e: 48 c7 c6 00 00 00 00 mov $0x0,%rsi
2475: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
x->done = 0;
247c: c7 85 90 fd ff ff 00 movl $0x0,-0x270(%rbp)
2483: 00 00 00
init_waitqueue_head(&x->wait);
2486: 48 8d 78 18 lea 0x18(%rax),%rdi
248a: e8 00 00 00 00 callq 248f <write_journal+0x6f>
if (commit_start + commit_sections <= ic->journal_sections) {
248f: 41 8b 87 a8 00 00 00 mov 0xa8(%r15),%eax
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2496: 48 8d bd e8 f9 ff ff lea -0x618(%rbp),%rdi
249d: 48 8d b5 90 fd ff ff lea -0x270(%rbp),%rsi
24a4: b9 17 00 00 00 mov $0x17,%ecx
24a9: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
if (commit_start + commit_sections <= ic->journal_sections) {
24ac: 41 39 c6 cmp %eax,%r14d
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
24af: 48 8d bd 90 fd ff ff lea -0x270(%rbp),%rdi
24b6: 48 8d b5 e8 f9 ff ff lea -0x618(%rbp),%rsi
24bd: b9 17 00 00 00 mov $0x17,%ecx
24c2: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
We can obviously see the temporary structure allocated, and the compiler
also does two meaningless memcpy with "rep movsq".
And according to:
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs
The return value of a statement expression is returned by value, so the
temporary variable is created in COMPLETION_INITIALIZER_ONSTACK(), and
that's why the temporary structures are allocted.
To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK()
return a pointer and dereference it outside the block rather than return
the whole structure, in this way, we are able to teach the compiler not
to do the unnecessary stack allocation.
This could also reduce the stack size even if !LOCKDEP, for example in
write_journal(), compiled with gcc 7.1.1, the result of command:
objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86
before:
0x0000246a write_journal [dm-integrity.o]: 696
after:
0x00002b7a write_journal [dm-integrity.o]: 296
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/20170823152542.5150-3-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
struct call_single_data is used in IPIs to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and less than
cache line size. Currently it is not allocated with any explicit alignment
requirements. This makes it possible for allocated call_single_data to
cross two cache lines, which results in double the number of the cache lines
that need to be transferred among CPUs.
This can be fixed by requiring call_single_data to be aligned with the
size of call_single_data. Currently the size of call_single_data is the
power of 2. If we add new fields to call_single_data, we may need to
add padding to make sure the size of new definition is the power of 2
as well.
Fortunately, this is enforced by GCC, which will report bad sizes.
To set alignment requirements of call_single_data to the size of
call_single_data, a struct definition and a typedef is used.
To test the effect of the patch, I used the vm-scalability multiple
thread swap test case (swap-w-seq-mt). The test will create multiple
threads and each thread will eat memory until all RAM and part of swap
is used, so that huge number of IPIs are triggered when unmapping
memory. In the test, the throughput of memory writing improves ~5%
compared with misaligned call_single_data, because of faster IPIs.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang, Ying <ying.huang@intel.com>
[ Add call_single_data_t and align with size of call_single_data. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to
ensure the temporal IRQ events don't interact with task state, the
XHLOCK_PROC is a fundament different beast that just happens to share
the interface.
The purpose of XHLOCK_PROC is to annotate independent execution inside
one task. For example workqueues, each work should appear to run in its
own 'pristine' 'task'.
Remove XHLOCK_PROC in favour of its own interface to avoid confusion.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: kernel-team@lge.com
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
For understanding how the workload maps to memory channels and hardware
behavior, it's very important to collect address maps with physical
addresses. For example, 3D XPoint access can only be found by filtering
the physical address.
Add a new sample type for physical address.
perf already has a facility to collect data virtual address. This patch
introduces a function to convert the virtual address to physical address.
The function is quite generic and can be extended to any architecture as
long as a virtual address is provided.
- For kernel direct mapping addresses, virt_to_phys is used to convert
the virtual addresses to physical address.
- For user virtual addresses, __get_user_pages_fast is used to walk the
pages tables for user physical address.
- This does not work for vmalloc addresses right now. These are not
resolved, but code to do that could be added.
The new sample type requires collecting the virtual address. The
virtual address will not be output unless SAMPLE_ADDR is applied.
For security, the physical address can only be exposed to root or
privileged user.
Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
I just noticed that hw.itrace_started and hw.config are aliased to the
same location. Now, the PT driver happens to use both, which works out
fine by sheer luck:
- STORE(hw.itrace_start) is ordered before STORE(hw.config), in the
program order, although there are no compiler barriers to ensure that,
- to the perf_log_itrace_start() hw.itrace_start looks set at the same
time as when it is intended to be set because both stores happen in the
same path,
- hw.config is never reset to zero in the PT driver.
Now, the use of hw.config by the PT driver makes more sense (it being a
HW PMU) than messing around with itrace_started, which is an awkward API
to begin with.
This patch replaces hw.itrace_started with an attach_state bit and an
API call for the PMU drivers to use to communicate the condition.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|