summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-08-30Merge remote-tracking branch 'lee/ib-mfd-hwmon-4.14' into hwmon-nextGuenter Roeck
2017-08-30mmc: core: Move mmc_start_areq() declarationAdrian Hunter
mmc_start_areq() is an internal mmc core API. Move the declaration accordingly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30mmc: host: Add CQE interfaceAdrian Hunter
Add CQE host operations, capabilities, and host members. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30nvme: fix the definition of the doorbell buffer config support bitChangpeng Liu
NVMe 1.3 specification defines the Optional Admin Command Support feature flags, bit 8 set to '1' then the controller supports the Doorbell Buffer Config command. Bit 7 is used for Virtualization Mangement command. Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices") Cc: stable@vger.kernel.org
2017-08-30efi: switch to use new generic UUID APIAndy Shevchenko
There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-08-30mmc: core: Add members to mmc_request and mmc_data for CQE'sAdrian Hunter
Most of the information needed to issue requests to a CQE is already in struct mmc_request and struct mmc_data. Add data block address, some flags, and the task id (tag), and allow for cmd being NULL which it is for CQE tasks. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30mmc: core: Remove unused MMC_CAP2_PACKED_CMDAdrian Hunter
Packed commands support was removed but some bits got left behind. Remove them. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30mmc: renesas_sdhi: use extra flag for CBSY usageWolfram Sang
There is one SDHI instance on Gen2 which does not have the CBSY bit. So, turn CBSY usage into an extra flag and set it accordingly. This has the additional advantage that we can also set it for other incarnations later. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Chris Brandt <Chris.Brandt@renesas.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30clk: sunxi-ng: Add interface to query or configure MMC timing modes.Chen-Yu Tsai
Starting with the A83T SoC, Allwinner introduced a new timing mode for its MMC clocks. The new mode changes how the MMC controller sample and output clocks are delayed to match chip and board specifics. There are two controls for this, one on the CCU side controlling how the clocks behave, and one in the MMC controller controlling what inputs to take and how to route them. In the old mode, the MMC clock had 2 child clocks providing the output and sample clocks, which could be delayed by a number of clock cycles measured from the MMC clock's parent. With the new mode, the 2 delay clocks are no longer active. Instead, the delays and associated controls are moved into the MMC controller. The output of the MMC clock is also halved. The difference in how things are wired between the modes means that the clock controls and the MMC controls must match. To achieve this in a clear, explicit way, we introduce two functions for the MMC driver to use: one queries the hardware for the current mode set, and the other allows the MMC driver to request a mode. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30mmc: core: correct taac parameter according to the specificationShawn Lin
Per the spec of JESD84-B51, section 7.3, replace tacc with taac to fix the obvious typo. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30mmc: core: remove check of host->removed for rescan routineShawn Lin
The intention of this check was to prevent the conflict between hotplug and removing driver for whatever reason. Currently it doesn't improve anything and the following rescan process could still saftly perform the scan flow. So these code seems pointless now and let's remove them. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-30mmc: tmio, renesas-sdhi: add max_{segs, blk_count} to tmio_mmc_dataYoshihiro Shimoda
Allow TMIO and SDHI driver implementations to provide values for max_segs and max_blk_count. A follow-up patch will set these values for Renesas Gen3 SoCs the using an SDHI driver. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Ai Kyuse <ai.kyuse.uw@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-29rpmsg: glink: Introduce glink smem based transportBjorn Andersson
The glink protocol supports different types of transports (shared memory). With the core protocol remaining the same, the way the transport's memory is probed and accessed is different. So add support for glink's smem based transports. Adding a new smem transport register function and the fifo accessors for the same. Acked-by: Arun Kumar Neelakantam <aneela@codeaurora.org> Signed-off-by: Sricharan R <sricharan@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2017-08-29scsi: scsi_transport_sas: switch to bsg-lib for SMP passthroughChristoph Hellwig
Simplify the SMP passthrough code by switching it to the generic bsg-lib helpers that abstract away the details of the request code, and gets drivers out of seeing struct scsi_request. For the libsas host SMP code there is a small behavior difference in that we now always clear the residual len for successful commands, similar to the three other SMP handler implementations. Given that there is no partial command handling in the host SMP handler this should not matter in practice. [mkp: typos and checkpatch fixes] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-29scsi: bsg-lib: pass the release callback through bsg_setup_queueChristoph Hellwig
The SAS code will need it. Also mark the name argument const to match bsg_register_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-29scsi: Rework handling of scsi_device.vpd_pg8[03]Bart Van Assche
Introduce struct scsi_vpd for the VPD page length, data and the RCU head that will be used to free the VPD data. Use kfree_rcu() instead of kfree() to free VPD data. Move the VPD buffer pointer check inside the RCU read lock in the sysfs code. Only annotate pointers that are shared across threads with __rcu. Use rcu_dereference() when dereferencing an RCU pointer. This patch suppresses about twenty sparse complaints about the vpd_pg8[03] pointers. This patch also fixes a race condition, namely that updating of the VPD pointers and length variables in struct scsi_device was not atomic with reference to the code reading these variables. See also "Does the update code tolerate concurrent accesses?" in Documentation/RCU/checklist.txt. Fixes: commit 09e2b0b14690 ("scsi: rescan VPD attributes") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Shane Seymour <shane.seymour@hpe.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane Seymour <shane.seymour@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-29scsi: rcu: Introduce rcu_swap_protected()Bart Van Assche
A common pattern in RCU code is to assign a new value to an RCU pointer after having read and stored the old value. Introduce a macro for this pattern. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane M Seymour <shane.seymour@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-30ACPI / APEI: Suppress message if HEST not presentPunit Agrawal
According to the ACPI specification, firmware is not required to provide the Hardware Error Source Table (HEST). When HEST is not present, the following superfluous message is printed to the kernel boot log - [ 3.460067] GHES: HEST is not enabled! Extend hest_disable variable to track whether the firmware provides this table and if it is not present skip any log output. The existing behaviour is preserved in all other cases. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-30cpuidle: Make drivers initialize polling stateRafael J. Wysocki
Make the drivers that want to include the polling state into their states table initialize it explicitly and drop the initialization of it (which in fact is conditional, but that is not obvious from the code) from the core. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-30cpuidle: Move polling state initialization code to separate fileRafael J. Wysocki
Move the polling state initialization code to a separate file built conditionally on CONFIG_ARCH_HAS_CPU_RELAX to get rid of the #ifdef in driver.c. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-30cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbolRafael J. Wysocki
On some architectures the first (index 0) idle state is a polling one and it doesn't really save energy, so there is the CPUIDLE_DRIVER_STATE_START symbol allowing some pieces of cpuidle code to avoid using that state. However, this makes the code rather hard to follow. It is better to explicitly avoid the polling state, so add a new cpuidle state flag CPUIDLE_FLAG_POLLING to mark it and make the relevant code check that flag for the first state instead of using the CPUIDLE_DRIVER_STATE_START symbol. In the ACPI processor driver that cannot always rely on the state flags (like before the states table has been set up) define a new internal symbol ACPI_IDLE_STATE_START equivalent to the CPUIDLE_DRIVER_STATE_START one and drop the latter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29neigh: increase queue_len_bytes to match wmem_defaultEric Dumazet
Florian reported UDP xmit drops that could be root caused to the too small neigh limit. Current limit is 64 KB, meaning that even a single UDP socket would hit it, since its default sk_sndbuf comes from net.core.wmem_default (~212992 bytes on 64bit arches). Once ARP/ND resolution is in progress, we should allow a little more packets to be queued, at least for one producer. Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf limit and either block in sendmsg() or return -EAGAIN. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29net: remove dmaengine.h inclusion from netdevice.hDave Jiang
Since the removal of NET_DMA, dmaengine.h header file shouldn't be needed by netdevice.h anymore. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29ipv6: Use rt6i_idev index for echo replies to a local addressDavid Ahern
Tariq repored local pings to linklocal address is failing: $ ifconfig ens8 ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255 inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20<link> ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet) RX packets 12 bytes 1164 (1.1 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 30 bytes 2484 (2.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 $ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8 PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29net: add NSH header structures and helpersYi Yang
NSH (Network Service Header)[1] is a new protocol for service function chaining, it can be handled as a L3 protocol like IPv4 and IPv6, Eth + NSH + Inner packet or VxLAN-gpe + NSH + Inner packet are two typical use cases. This patch adds NSH header structures and helpers for NSH GSO support and Open vSwitch NSH support. [1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/ [Jiri: added nsh_hdr() helper and renamed the header struct to "struct nshhdr" to match the usual pattern. Removed packet type defines, these are now shared with VXLAN-GPE.] Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29vxlan: factor out VXLAN-GPE next protocolJiri Benc
The values are shared between VXLAN-GPE and NSH. Originally probably by coincidence but I notified both working groups about this last year and they seem to keep the values in sync since then. Hopefully they'll get a single IANA registry for the values, too. (I asked them for that.) Factor out the code to be shared by the NSH implementation. NSH and MPLS values are added in this patch, too. For MPLS, the drafts incorrectly assign only a single value, while we have two MPLS ethertypes. I raised the problem with both groups. For now, I assume the value is for unicast. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29ether: add NSH ethertypeJiri Benc
The NSH draft says: An IEEE EtherType, 0x894F, has been allocated for NSH. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29if_ether: add forces ife lfb typeAlexander Aring
This patch adds the forces IFE lfb type according to IEEE registered ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt for more information. Since there exists the IFE subsystem it can be used there. This patch also use the correct word "ForCES" instead of "FoRCES" which is a spelling error inside the IEEE ethertype specification. Signed-off-by: Alexander Aring <aring@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29net/mlx4: Add user mac FW update supportMoshe Shemesh
Adding support for updating the FW on new port mac, when port mac change is requested by the user. This info is required by the FW as OEM management tools require this info directly from the NIC FW. Check device capability bit to verify the FW supports user mac. If the FW does support it, use set_port command to notify the FW on the new mac. The feature is relevant only to PF port mac. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29net/mlx4_core: Dynamically allocate structs at mlx4_slave_capEran Ben Elisha
In order to avoid temporary large structs on the stack, allocate them dynamically. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tal Alon <talal@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29bridge: fdb add and delete tracepointsRoopa Prabhu
A few useful tracepoints to trace bridge forwarding database updates. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29drbd: switch from kmalloc() to kmalloc_array()Roland Kammerer
We had one call to kmalloc that actually allocates an array. Switch that one to the kmalloc_array() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-29drbd: new disk-option disable-write-sameLars Ellenberg
Some backend devices claim to support write-same, but would fail actual write-same requests. Allow to set (or toggle) whether or not DRBD tries to support write-same. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-29PCI: endpoint: Add support for configurable page sizeKishon Vijay Abraham I
pci-epc-mem uses a page size equal to *PAGE_SIZE* (usually 4KB) to manage the address space. However certain platforms like TI's K2G have a restriction that this address space should be either divided into 1MB/2MB/4MB or 8MB sizes (Ref: 11.14.4.9.1 Outbound Address Translation in K2G TRM SPRUHY8F January 2016 – Revised May 2017). Add support to handle different page sizes here. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-29leds: gpio: Allow LED to retain state at shutdownAndrew Jeffery
In some systems, such as Baseboard Management Controllers (BMCs), we want to retain the state of LEDs across a reboot of the BMC (whilst the host remains up). Implement support for the retain-state-shutdown devicetree property in leds-gpio. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Acked-by: Pavel Machek <pavel@ucw.cz> Tested-by: Brandon Wyman <bjwyman@gmail.com> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
2017-08-29Merge branch 'for-4.13-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "Late fixes for libata. There's a minor platform driver fix but the important one is READ LOG PAGE. This is a new ATA command which is used to test some optional features but it broke probing of some devices - they locked up instead of failing the unknown command. Christoph tried blacklisting, but, after finding out there are multiple devices which fail this way, backed off to testing feature bit in IDENTIFY data first, which is a bit lossy (we can miss features on some devices) but should be a lot safer" * 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: Revert "libata: quirk read log on no-name M.2 SSD" libata: check for trusted computing in IDENTIFY DEVICE data libata: quirk read log on no-name M.2 SSD sata: ahci-da850: Fix some error handling paths in 'ahci_da850_probe()'
2017-08-29Merge tag 'wireless-drivers-next-for-davem-2017-08-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.14 rsi driver is getting a lot of new features lately, but as usual active development happening on iwlwifi as well as other drivers. I pulled wireless-drivers to fix multiple conflicts in iwlwifi and to make it easier further development. Major changes: ath10k * initial UBS bus support (no full support yet) * add tdls support for 10.4 firmware ath9k * add Dell Wireless 1802 wil6210 * support FW RSSI reporting rsi * support legacy power save, U-APSD, rf-kill and AP mode * RTS threshold configuration brcmfmac * support CYW4373 SDIO/USB chipset iwlwifi * some more code moved to a new directory * add new PCI ID for 7265D ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29xdp: separate xdp_redirect tracepoint in map caseJesper Dangaard Brouer
Creating as specific xdp_redirect_map variant of the xdp tracepoints allow users to write simpler/faster BPF progs that get attached to these tracepoints. Goal is to still keep the tracepoints in xdp_redirect and xdp_redirect_map similar enough, that a tool can read the top part of the TP_STRUCT and produce similar monitor statistics. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29xdp: separate xdp_redirect tracepoint in error caseJesper Dangaard Brouer
There is a need to separate the xdp_redirect tracepoint into two tracepoints, for separating the error case from the normal forward case. Due to the extreme speeds XDP is operating at, loading a tracepoint have a measurable impact. Single core XDP REDIRECT (ethtool tuned rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple bpf_prog at the tracepoint (with a return 0) reduce perf to 10.2 Mpps (CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe) The overhead of loading a bpf-based tracepoint can be calculated to cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns). Using perf record on the tracepoint event, with a non-matching --filter expression, the overhead is much larger. Performance drops to 8.3 Mpps, cost 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74)) Having a separate tracepoint for err cases, which should be less frequent, allow running a continuous monitor for errors while not affecting the redirect forward performance (this have also been verified by measurements). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29xdp: make xdp tracepoints report bpf prog id instead of prog_tagJesper Dangaard Brouer
Given previous patch expose the map_id, it seems natural to also report the bpf prog id. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29xdp: tracepoint xdp_redirect also need a map argumentJesper Dangaard Brouer
To make sense of the map index, the tracepoint user also need to know that map we are talking about. Supply the map pointer but only expose the map->id. The 'to_index' is renamed 'to_ifindex'. In the xdp_redirect_map case, this is the result of the devmap lookup. The map lookup key is exposed as map_index, which is needed to troubleshoot in case the lookup failed. The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as possible. This also keeps the TP_STRUCT similar enough, that userspace can write a monitor program, that doesn't need to care about whether bpf_redirect or bpf_redirect_map were used. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29xdp: remove redundant argument to trace_xdp_redirectJesper Dangaard Brouer
Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect is redundant as it is only called in-case this action was specified. Remove the argument, but keep "act" member of the tracepoint struct and populate it with XDP_REDIRECT. This makes it easier to write a common bpf_prog processing events. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29mtd: nand: complain loudly when chip->bits_per_cell is not correctly initializedLothar Waßmann
chip->bits_per_cell which is used to determine the NAND cell type (SLC/MLC) should always have a value != 0. Complain loudly if the value is 0 in nand_is_slc() to catch use before correct initialization. Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-08-29Revert "libata: quirk read log on no-name M.2 SSD"Tejun Heo
This reverts commit 35f0b6a779b8b7a98faefd7c1c660b4dac9a5c26. We now conditionalize issuing of READ LOG PAGE on the TRUSTED COMPUTING SUPPORTED bit in the identity data and this shouldn't be necessary. Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-29libata: check for trusted computing in IDENTIFY DEVICE dataChristoph Hellwig
ATA-8 and later mirrors the TRUSTED COMPUTING SUPPORTED bit in word 48 of the IDENTIFY DEVICE data. Check this before issuing a READ LOG PAGE command to avoid issues with buggy devices. The only downside is that we can't support Security Send / Receive for a device with an older revision due to the conflicting use of this field in earlier specifications. tj: The reason we need this is because some devices which don't support READ LOG PAGE lock up after getting issued that command. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-29sched/completion: Avoid unnecessary stack allocation for ↵Boqun Feng
COMPLETION_INITIALIZER_ONSTACK() In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the stack allocation of the caller. However, on some compilers, a temporary structure was allocated for the return value of COMPLETION_INITIALIZER_ONSTACK(). For example in write_journal() with LOCKDEP_COMPLETIONS=y (GCC is 7.1.1): io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2462: e8 00 00 00 00 callq 2467 <write_journal+0x47> 2467: 48 8d 85 80 fd ff ff lea -0x280(%rbp),%rax 246e: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 2475: 48 c7 c2 00 00 00 00 mov $0x0,%rdx x->done = 0; 247c: c7 85 90 fd ff ff 00 movl $0x0,-0x270(%rbp) 2483: 00 00 00 init_waitqueue_head(&x->wait); 2486: 48 8d 78 18 lea 0x18(%rax),%rdi 248a: e8 00 00 00 00 callq 248f <write_journal+0x6f> if (commit_start + commit_sections <= ic->journal_sections) { 248f: 41 8b 87 a8 00 00 00 mov 0xa8(%r15),%eax io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2496: 48 8d bd e8 f9 ff ff lea -0x618(%rbp),%rdi 249d: 48 8d b5 90 fd ff ff lea -0x270(%rbp),%rsi 24a4: b9 17 00 00 00 mov $0x17,%ecx 24a9: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi) if (commit_start + commit_sections <= ic->journal_sections) { 24ac: 41 39 c6 cmp %eax,%r14d io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 24af: 48 8d bd 90 fd ff ff lea -0x270(%rbp),%rdi 24b6: 48 8d b5 e8 f9 ff ff lea -0x618(%rbp),%rsi 24bd: b9 17 00 00 00 mov $0x17,%ecx 24c2: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi) We can obviously see the temporary structure allocated, and the compiler also does two meaningless memcpy with "rep movsq". And according to: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs The return value of a statement expression is returned by value, so the temporary variable is created in COMPLETION_INITIALIZER_ONSTACK(), and that's why the temporary structures are allocted. To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK() return a pointer and dereference it outside the block rather than return the whole structure, in this way, we are able to teach the compiler not to do the unnecessary stack allocation. This could also reduce the stack size even if !LOCKDEP, for example in write_journal(), compiled with gcc 7.1.1, the result of command: objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86 before: 0x0000246a write_journal [dm-integrity.o]: 696 after: 0x00002b7a write_journal [dm-integrity.o]: 296 Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/20170823152542.5150-3-boqun.feng@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29smp: Avoid using two cache lines for struct call_single_dataYing Huang
struct call_single_data is used in IPIs to transfer information between CPUs. Its size is bigger than sizeof(unsigned long) and less than cache line size. Currently it is not allocated with any explicit alignment requirements. This makes it possible for allocated call_single_data to cross two cache lines, which results in double the number of the cache lines that need to be transferred among CPUs. This can be fixed by requiring call_single_data to be aligned with the size of call_single_data. Currently the size of call_single_data is the power of 2. If we add new fields to call_single_data, we may need to add padding to make sure the size of new definition is the power of 2 as well. Fortunately, this is enforced by GCC, which will report bad sizes. To set alignment requirements of call_single_data to the size of call_single_data, a struct definition and a typedef is used. To test the effect of the patch, I used the vm-scalability multiple thread swap test case (swap-w-seq-mt). The test will create multiple threads and each thread will eat memory until all RAM and part of swap is used, so that huge number of IPIs are triggered when unmapping memory. In the test, the throughput of memory writing improves ~5% compared with misaligned call_single_data, because of faster IPIs. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Huang, Ying <ying.huang@intel.com> [ Add call_single_data_t and align with size of call_single_data. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29locking/lockdep: Untangle xhlock history save/restore from task independencePeter Zijlstra
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to ensure the temporal IRQ events don't interact with task state, the XHLOCK_PROC is a fundament different beast that just happens to share the interface. The purpose of XHLOCK_PROC is to annotate independent execution inside one task. For example workqueues, each work should appear to run in its own 'pristine' 'task'. Remove XHLOCK_PROC in favour of its own interface to avoid confusion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: kernel-team@lge.com Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core, x86: Add PERF_SAMPLE_PHYS_ADDRKan Liang
For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. Add a new sample type for physical address. perf already has a facility to collect data virtual address. This patch introduces a function to convert the virtual address to physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided. - For kernel direct mapping addresses, virt_to_phys is used to convert the virtual addresses to physical address. - For user virtual addresses, __get_user_pages_fast is used to walk the pages tables for user physical address. - This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added. The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or privileged user. Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mpe@ellerman.id.au Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core, pt, bts: Get rid of itrace_startedAlexander Shishkin
I just noticed that hw.itrace_started and hw.config are aliased to the same location. Now, the PT driver happens to use both, which works out fine by sheer luck: - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the program order, although there are no compiler barriers to ensure that, - to the perf_log_itrace_start() hw.itrace_start looks set at the same time as when it is intended to be set because both stores happen in the same path, - hw.config is never reset to zero in the PT driver. Now, the use of hw.config by the PT driver makes more sense (it being a HW PMU) than messing around with itrace_started, which is an awkward API to begin with. This patch replaces hw.itrace_started with an attach_state bit and an API call for the PMU drivers to use to communicate the condition. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>