summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2016-07-20Merge remote-tracking branch 'jens/for-4.8/core' into dm-4.8Mike Snitzer
DM's DAX support depends on block core's newly added QUEUE_FLAG_DAX.
2016-07-20block: Fix front merge checkDamien Le Moal
For a front merge, the maximum number of sectors of the request must be checked against the front merge BIO sector, not the current sector of the request. Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: do not merge requests without consulting with io schedulerTahsin Erdogan
Before merging a bio into an existing request, io scheduler is called to get its approval first. However, the requests that come from a plug flush may get merged by block layer without consulting with io scheduler. In case of CFQ, this can cause fairness problems. For instance, if a request gets merged into a low weight cgroup's request, high weight cgroup now will depend on low weight cgroup to get scheduled. If high weigt cgroup needs that io request to complete before submitting more requests, then it will also lose its timeslice. Following script demonstrates the problem. Group g1 has a low weight, g2 and g3 have equal high weights but g2's requests are adjacent to g1's requests so they are subject to merging. Due to these merges, g2 gets poor disk time allocation. cat > cfq-merge-repro.sh << "EOF" #!/bin/bash set -e IO_ROOT=/mnt-cgroup/io mkdir -p $IO_ROOT if ! mount | grep -qw $IO_ROOT; then mount -t cgroup none -oblkio $IO_ROOT fi cd $IO_ROOT for i in g1 g2 g3; do if [ -d $i ]; then rmdir $i fi done mkdir g1 && echo 10 > g1/blkio.weight mkdir g2 && echo 495 > g2/blkio.weight mkdir g3 && echo 495 > g3/blkio.weight RUNTIME=10 (echo $BASHPID > g1/cgroup.procs && fio --readonly --name name1 --filename /dev/sdb \ --rw read --size 64k --bs 64k --time_based \ --runtime=$RUNTIME --offset=0k &> /dev/null)& (echo $BASHPID > g2/cgroup.procs && fio --readonly --name name1 --filename /dev/sdb \ --rw read --size 64k --bs 64k --time_based \ --runtime=$RUNTIME --offset=64k &> /dev/null)& (echo $BASHPID > g3/cgroup.procs && fio --readonly --name name1 --filename /dev/sdb \ --rw read --size 64k --bs 64k --time_based \ --runtime=$RUNTIME --offset=256k &> /dev/null)& sleep $((RUNTIME+1)) for i in g1 g2 g3; do echo ---- $i ---- cat $i/blkio.time done EOF # ./cfq-merge-repro.sh ---- g1 ---- 8:16 162 ---- g2 ---- 8:16 165 ---- g3 ---- 8:16 686 After applying the patch: # ./cfq-merge-repro.sh ---- g1 ---- 8:16 90 ---- g2 ---- 8:16 445 ---- g3 ---- 8:16 471 Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: add QUEUE_FLAG_DAX for devices to advertise their DAX supportToshi Kani
Currently, presence of direct_access() in block_device_operations indicates support of DAX on its block device. Because block_device_operations is instantiated with 'const', this DAX capablity may not be enabled conditinally. In preparation for supporting DAX to device-mapper devices, add QUEUE_FLAG_DAX to request_queue flags to advertise their DAX support. This will allow to set the DAX capability based on how mapped device is composed. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <linux-s390@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20scsi/osd: open code blk_make_requestChristoph Hellwig
I wish the OSD code could simply use blk_rq_map_* helpers like everyone else, but the complex nature of deciding if we have DATA IN and/or DATA OUT buffers might make this impossible (at least for a mere human like me). But using blk_rq_append_bio at least allows sharing the setup code between request with or without dat a buffers, and given that this is the last user of blk_make_request it allows getting rid of that somewhat awkward interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Boaz Harrosh <ooo@electrozaur.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: simplify and export blk_rq_append_bioChristoph Hellwig
The target SCSI passthrough backend is much better served with the low-level blk_rq_append_bio construct then the helpers built on top of it, so export it. Also use the opportunity to remove the pointless request_queue argument and make the code flow a little more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: shrink bio size againChristoph Hellwig
The recent ops split grew the bio by adding the new ioprio field. Shrink it again by using a 16-bit field for the bi_flags value and filling the holes near the beginning of the structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: simplify and cleanup bvec pool handlingChristoph Hellwig
Instead of a flag and an index just make sure an index of 0 means no need to free the bvec array. Also move the constants related to the bvec pools together and use a consistent naming scheme for them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: get rid of bio_rw and READAChristoph Hellwig
These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: introduce BLKDEV_DISCARD_ZERO to fix zerooutChristoph Hellwig
Currently blkdev_issue_zeroout cascades down from discards (if the driver guarantees that discards zero data), to WRITE SAME and then to a loop writing zeroes. Unfortunately we ignore run-time EOPNOTSUPP errors in the block layer blkdev_issue_discard helper to work around DM volumes that may have mixed discard support underneath. This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to blkdev_issue_discard that indicates we are called for zeroing operation. This allows both to ignore the EOPNOTSUPP hack and actually consolidating the discard_zeroes_data check into the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-07-19 Here's likely the last bluetooth-next pull request for the 4.8 kernel: - Fix for L2CAP setsockopt - Fix for is_suspending flag handling in btmrvl driver - Addition of Bluetooth HW & FW info fields to debugfs - Fix to use int instead of char for callback status. The last one (from Geert Uytterhoeven) is actually not purely a Bluetooth (or 802.15.4) patch, but it was agreed with other maintainers that we take it through the bluetooth-next tree. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20bpf: fix implicit declaration of bpf_prog_addBrenden Blanco
For the ifndef case of CONFIG_BPF_SYSCALL, an inline version of bpf_prog_add needs to exist otherwise the build breaks on some configs. drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2544:10: error: implicit declaration of function 'bpf_prog_add' prog = bpf_prog_add(prog, priv->rx_ring_num - 1); The function is introduced in 59d3656d5bf50 ("bpf: add bpf_prog_add api for bulk prog refcnt") and first used in 47f1afdba2b87 ("net/mlx4_en: add support for fast rx drop bpf program"). Fixes: 47f1afdba2b87 ("net/mlx4_en: add support for fast rx drop bpf program") Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Tariq Toukan <ttoukan.linux@gmail.com> Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20Merge remote-tracking branches 'regulator/topic/qcom-spmi', ↵Mark Brown
'regulator/topic/rn5t618', 'regulator/topic/tps65218' and 'regulator/topic/twl' into regulator-next
2016-07-20Merge remote-tracking branches 'regulator/topic/fixed', ↵Mark Brown
'regulator/topic/headers', 'regulator/topic/lp837x', 'regulator/topic/max8973' and 'regulator/topic/mt6323' into regulator-next
2016-07-20Merge remote-tracking branches 'regulator/topic/act8865', ↵Mark Brown
'regulator/topic/can-change-voltage', 'regulator/topic/da9210' and 'regulator/topic/da9211' into regulator-next
2016-07-19net/mlx4_en: break out tx_desc write into separate functionBrenden Blanco
In preparation for writing the tx descriptor from multiple functions, create a helper for both normal and blueflame access. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net: add ndo to setup/query xdp prog in adapter rxBrenden Blanco
Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP filter. The single op is used for both setup/query of the xdp program, modelled after ndo_setup_tc. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19bpf: add XDP prog type for early driver filterBrenden Blanco
Add a new bpf prog type that is intended to run in early stages of the packet rx path. Only minimal packet metadata will be available, hence a new context type, struct xdp_md, is exposed to userspace. So far only expose the packet start and end pointers, and only in read mode. An XDP program must return one of the well known enum values, all other return codes are reserved for future use. Unfortunately, this restriction is hard to enforce at verification time, so take the approach of warning at runtime when such programs are encountered. Out of bounds return codes should alias to XDP_ABORTED. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19bpf: add bpf_prog_add api for bulk prog refcntBrenden Blanco
A subsystem may need to store many copies of a bpf program, each deserving its own reference. Rather than requiring the caller to loop one by one (with possible mid-loop failure), add a bulk bpf_prog_add api. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19bcma: define ChipCommon B MII registersRafał Miłecki
We don't have access to datasheets to document all the bits but we can name these registers at least. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-07-18ata: define ATA_PROT_* in terms of ATA_PROT_FLAG_*Christoph Hellwig
This avoid the need to always translate between the two in ata_prot_flags and generally cleans up the taskfile protocol usage. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-18libata: remove ATA_PROT_FLAG_DATAChristoph Hellwig
Instead we can simply check for PIO or DMA in ata_is_data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-18libata: remove ata_is_nodataChristoph Hellwig
The only caller can just check for !ata_is_data instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-18ata: make lba_{28,48}_ok() use ATA_MAX_SECTORS{,_LBA48}Tom Yan
Since we set ATA_MAX_SECTORS_LBA48 to 65535 to avoid the corner case in some drives that commands with "count" set to 0000h (which reprsents 65536) does not work as expected, lba_48_ok(), which is used for number-of-blocks checking when libata pack commands, should use the same limit as well. In fact, there is no reason for the two functions not to use the macros anyway. Signed-off-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-18Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.gitKalle Valo
ath.git patches for 4.8. Major changes: ath10k * enable support for QCA9888
2016-07-18netfilter: x_tables: speed up jump target validationFlorian Westphal
The dummy ruleset I used to test the original validation change was broken, most rules were unreachable and were not tested by mark_source_chains(). In some cases rulesets that used to load in a few seconds now require several minutes. sample ruleset that shows the behaviour: echo "*filter" for i in $(seq 0 100000);do printf ":chain_%06x - [0:0]\n" $i done for i in $(seq 0 100000);do printf -- "-A INPUT -j chain_%06x\n" $i printf -- "-A INPUT -j chain_%06x\n" $i printf -- "-A INPUT -j chain_%06x\n" $i done echo COMMIT [ pipe result into iptables-restore ] This ruleset will be about 74mbyte in size, with ~500k searches though all 500k[1] rule entries. iptables-restore will take forever (gave up after 10 minutes) Instead of always searching the entire blob for a match, fill an array with the start offsets of every single ipt_entry struct, then do a binary search to check if the jump target is present or not. After this change ruleset restore times get again close to what one gets when reverting 36472341017529e (~3 seconds on my workstation). [1] every user-defined rule gets an implicit RETURN, so we get 300k jumps + 100k userchains + 100k returns -> 500k rule entries Fixes: 36472341017529e ("netfilter: x_tables: validate targets of jumps") Reported-by: Jeff Wu <wujiafu@gmail.com> Tested-by: Jeff Wu <wujiafu@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-18regulator: mt6323: Add support for MT6323 regulatorChen Zhong
The MT6323 is a regulator found on boards based on MediaTek MT7623 and probably other SoCs. It is a so called pmic and connects as a slave to SoC using SPI, wrapped inside the pmic-wrapper. Signed-off-by: Chen Zhong <chen.zhong@mediatek.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-07-18crypto: skcipher - Remove top-level givcipher interfaceHerbert Xu
This patch removes the old crypto_grab_skcipher helper and replaces it with crypto_grab_skcipher2. As this is the final entry point into givcipher this patch also removes all traces of the top-level givcipher interface, including all implicit IV generators such as chainiv. The bottom-level givcipher interface remains until the drivers using it are converted. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-18crypto: skcipher - Add low-level skcipher interfaceHerbert Xu
This patch allows skcipher algorithms and instances to be created and registered with the crypto API. They are accessible through the top-level skcipher interface, along with ablkcipher/blkcipher algorithms and instances. This patch also introduces a new parameter called chunk size which is meant for ciphers such as CTR and CTS which ostensibly can handle arbitrary lengths, but still behave like block ciphers in that you can only process a partial block at the very end. For these ciphers the block size will continue to be set to 1 as it is now while the chunk size will be set to the underlying block size. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-17drivers: misc: ti-st: Use int instead of fuzzy char for callback statusGeert Uytterhoeven
On mips and parisc: drivers/bluetooth/btwilink.c: In function 'ti_st_open': drivers/bluetooth/btwilink.c:174:21: warning: overflow in implicit constant conversion [-Woverflow] hst->reg_status = -EINPROGRESS; drivers/nfc/nfcwilink.c: In function 'nfcwilink_open': drivers/nfc/nfcwilink.c:396:31: warning: overflow in implicit constant conversion [-Woverflow] drv->st_register_cb_status = -EINPROGRESS; There are actually two issues: 1. Whether "char" is signed or unsigned depends on the architecture. As the completion callback data is used to pass a (negative) error code, it should always be signed. 2. EINPROGRESS is 150 on mips, 245 on parisc. Hence -EINPROGRESS doesn't fit in a signed 8-bit number. Change the callback status from "char" to "int" to fix these. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-07-16of_mdio: Abstract a general interface for phy connectDongpo Li
Abstract a general interface "of_phy_get_and_connect" for PHY connect. User will have no bother with getting "phy-mode" and "phy-handle" any more. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dongpo Li <lidongpo@hisilicon.com> Reviewed-by: Jiancheng Xue <xuejiancheng@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16net: ipmr/ip6mr: add support for keeping an entry ageNikolay Aleksandrov
In preparation for hardware offloading of ipmr/ip6mr we need an interface that allows to check (and later update) the age of entries. Relying on stats alone can show activity but not actual age of the entry, furthermore when there're tens of thousands of entries a lot of the hardware implementations only support "hit" bits which are cleared on read to denote that the entry was active and shouldn't be aged out, these can then be naturally translated into age timestamp and will be compatible with the software forwarding age. Using a lastuse entry doesn't affect performance because the members in that cache line are written to along with the age. Since all new users are encouraged to use ipmr via netlink, this is exported via the RTA_EXPIRES attribute. Also do a minor local variable declaration style adjustment - arrange them longest to shortest. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Shrijeet Mukherjee <shm@cumulusnetworks.com> CC: Satish Ashok <sashok@cumulusnetworks.com> CC: Donald Sharp <sharpd@cumulusnetworks.com> CC: David S. Miller <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16vlan: use a valid default mtu value for vlan over macsecPaolo Abeni
macsec can't cope with mtu frames which need vlan tag insertion, and vlan device set the default mtu equal to the underlying dev's one. By default vlan over macsec devices use invalid mtu, dropping all the large packets. This patch adds a netif helper to check if an upper vlan device needs mtu reduction. The helper is used during vlan devices initialization to set a valid default and during mtu updating to forbid invalid, too bit, mtu values. The helper currently only check if the lower dev is a macsec device, if we get more users, we need to update only the helper (possibly reserving an additional IFF bit). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15bpf: avoid stack copy and use skb ctx for event outputDaniel Borkmann
This work addresses a couple of issues bpf_skb_event_output() helper currently has: i) We need two copies instead of just a single one for the skb data when it should be part of a sample. The data can be non-linear and thus needs to be extracted via bpf_skb_load_bytes() helper first, and then copied once again into the ring buffer slot. ii) Since bpf_skb_load_bytes() currently needs to be used first, the helper needs to see a constant size on the passed stack buffer to make sure BPF verifier can do sanity checks on it during verification time. Thus, just passing skb->len (or any other non-constant value) wouldn't work, but changing bpf_skb_load_bytes() is also not the proper solution, since the two copies are generally still needed. iii) bpf_skb_load_bytes() is just for rather small buffers like headers, since they need to sit on the limited BPF stack anyway. Instead of working around in bpf_skb_load_bytes(), this work improves the bpf_skb_event_output() helper to address all 3 at once. We can make use of the passed in skb context that we have in the helper anyway, and use some of the reserved flag bits as a length argument. The helper will use the new __output_custom() facility from perf side with bpf_skb_copy() as callback helper to walk and extract the data. It will pass the data for setup to bpf_event_output(), which generates and pushes the raw record with an additional frag part. The linear data used in the first frag of the record serves as programmatically defined meta data passed along with the appended sample. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15perf, events: add non-linear data support for raw recordsDaniel Borkmann
This patch adds support for non-linear data on raw records. It extends raw records to have one or multiple fragments that will be written linearly into the ring slot, where each fragment can optionally have a custom callback handler to walk and extract complex, possibly non-linear data. If a callback handler is provided for a fragment, then the new __output_custom() will be used instead of __output_copy() for the perf_output_sample() part. perf_prepare_sample() does all the size calculation only once, so perf_output_sample() doesn't need to redo the same work anymore, meaning real_size and padding will be cached in the raw record. The raw record becomes 32 bytes in size without holes; to not increase it further and to avoid doing unnecessary recalculations in fast-path, we can reuse next pointer of the last fragment, idea here is borrowed from ZERO_OR_NULL_PTR(), which should keep the perf_output_sample() path for PERF_SAMPLE_RAW minimal. This facility is needed for BPF's event output helper as a first user that will, in a follow-up, add an additional perf_raw_frag to its perf_raw_record in order to be able to more efficiently dump skb context after a linear head meta data related to it. skbs can be non-linear and thus need a custom output function to dump buffers. Currently, the skb data needs to be copied twice; with the help of __output_custom() this work only needs to be done once. Future users could be things like XDP/BPF programs that work on different context though and would thus also have a different callback function. The few users of raw records are adapted to initialize their frag data from the raw record itself, no change in behavior for them. The code is based upon a PoC diff provided by Peter Zijlstra [1]. [1] http://thread.gmane.org/gmane.linux.network/421294 Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15net: phy: micrel: Add KSZ8041FTL fiber mode supportPhilipp Zabel
We can't detect the FXEN (fiber mode) bootstrap pin, so configure it via a boolean device tree property "micrel,fiber-mode". If it is enabled, auto-negotiation is not supported. The only available modes are 100base-fx (full duplex and half duplex). Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15Merge remote-tracking branches 'regmap/topic/bulk', 'regmap/topic/i2c', ↵Mark Brown
'regmap/topic/iopoll', 'regmap/topic/irq' and 'regmap/topic/maintainers' into regmap-next
2016-07-15regmap: add iopoll-like polling macroPhilipp Zabel
This patch adds a macro regmap_read_poll_timeout that works similar to the readx_poll_timeout defined in linux/iopoll.h, except that this can also return the error value returned by a failed regmap_read. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-07-15ata: Handle ATA NCQ NO-DATA commands correctlyHannes Reinecke
Add a new taskfile protocol ATA_PROT_NCQ_NODATA to handle ATA NCQ NO-DATA commands correctly. And fixup ata_scsi_zbc_out_xlat() to use it. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-15Merge branch 'x86/asm' into x86/mm, to resolve conflictsIngo Molnar
Conflicts: tools/testing/selftests/x86/Makefile Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15mm: thp: refix false positive BUG in page_move_anon_rmap()Hugh Dickins
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's worth: the syzkaller fuzzer hit it again. It's still wrong for some THP cases, because linear_page_index() was never intended to apply to addresses before the start of a vma. That's easily fixed with a signed long cast inside linear_page_index(); and Dmitry has tested such a patch, to verify the false positive. But why extend linear_page_index() just for this case? when the avoidance in page_move_anon_rmap() has already grown ugly, and there's no reason for the check at all (nothing else there is using address or index). Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE, remove CONFIG_DEBUG_VM PageTransHuge adjustment. And one more thing: should the compound_head(page) be done inside or outside page_move_anon_rmap()? It's usually pushed down to the lowest level nowadays (and mm/memory.c shows no other explicit use of it), so I think it's better done in page_move_anon_rmap() than by caller. Fixes: 0798d3c022dc ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15mm: thp: move pmd check inside ptl for freeze_page()Naoya Horiguchi
I found a race condition triggering VM_BUG_ON() in freeze_page(), when running a testcase with 3 processes: - process 1: keep writing thp, - process 2: keep clearing soft-dirty bits from virtual address of process 1 - process 3: call migratepages for process 1, The kernel message is like this: kernel BUG at /src/linux-dev/mm/huge_memory.c:3096! invalid opcode: 0000 [#1] SMP Modules linked in: cfg80211 rfkill crc32c_intel ppdev serio_raw pcspkr virtio_balloon virtio_console parport_pc parport pvpanic acpi_cpufreq tpm_tis tpm i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy virtio_pci virtio_ring virtio CPU: 0 PID: 28863 Comm: migratepages Not tainted 4.6.0-v4.6-160602-0827-+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff880037320000 ti: ffff88007cdd0000 task.ti: ffff88007cdd0000 RIP: 0010:[<ffffffff811f8e06>] [<ffffffff811f8e06>] split_huge_page_to_list+0x496/0x590 RSP: 0018:ffff88007cdd3b70 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88007c7b88c0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000700000200 RDI: ffffea0003188000 RBP: ffff88007cdd3bb8 R08: 0000000000000001 R09: 00003ffffffff000 R10: ffff880000000000 R11: ffffc000001fffff R12: ffffea0003188000 R13: ffffea0003188000 R14: 0000000000000000 R15: 0400000000000080 FS: 00007f8ec241d740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8ec1f3ed20 CR3: 000000003707b000 CR4: 00000000000006f0 Call Trace: ? list_del+0xd/0x30 queue_pages_pte_range+0x4d1/0x590 __walk_page_range+0x204/0x4e0 walk_page_range+0x71/0xf0 queue_pages_range+0x75/0x90 ? queue_pages_hugetlb+0x190/0x190 ? new_node_page+0xc0/0xc0 ? change_prot_numa+0x40/0x40 migrate_to_node+0x71/0xd0 do_migrate_pages+0x1c3/0x210 SyS_migrate_pages+0x261/0x290 entry_SYSCALL_64_fastpath+0x1a/0xa4 Code: e8 b0 87 fb ff 0f 0b 48 c7 c6 30 32 9f 81 e8 a2 87 fb ff 0f 0b 48 c7 c6 b8 46 9f 81 e8 94 87 fb ff 0f 0b 85 c0 0f 84 3e fd ff ff <0f> 0b 85 c0 0f 85 a6 00 00 00 48 8b 75 c0 4c 89 f7 41 be f0 ff RIP split_huge_page_to_list+0x496/0x590 I'm not sure of the full scenario of the reproduction, but my debug showed that split_huge_pmd_address(freeze=true) returned without running main code of pmd splitting because pmd_present(*pmd) in precheck somehow returned 0. If this happens, the subsequent try_to_unmap() fails and returns non-zero (because page_mapcount() still > 0), and finally VM_BUG_ON() fires. This patch tries to fix it by prechecking pmd state inside ptl. Link: http://lkml.kernel.org/r/1466990929-7452-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-14net/mlx5: Introduce bulk reading of flow countersAmir Vadai
This commit utilize the ability of ConnectX-4 to bulk read flow counters. Few bulk counter queries could be done instead of issuing thousands of firmware commands per second to get statistics of all flows set to HW, such as those programmed when we offload tc filters. Counters are stored sorted by hardware id, and queried in blocks (id + number of counters). Due to hardware requirement, start of block and number of counters in a block must be four aligned. Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14net/mlx5: Store counters in rbtree instead of listAmir Vadai
In order to use bulk counters, we need to have counters sorted by id. Signed-off-by: Amir Vadai <amir@vadai.me> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14libata: return boolean values from ata_is_*Christoph Hellwig
This way we don't have to worry about the exact bit postition of the test to leak out and any crazy propagation effects in the callers. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-14sched/cputime: Reorganize vtime native irqtime accounting headersFrederic Weisbecker
The vtime irqtime accounting headers are very scattered and convoluted right now. Reorganize them such that it is obvious that only CONFIG_VIRT_CPU_ACCOUNTING_NATIVE does use it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14sched/cputime: Clean up the old vtime gen irqtime accounting completelyFrederic Weisbecker
Vtime generic irqtime accounting has been removed but there are a few remnants to clean up: * The vtime_accounting_cpu_enabled() check in irq entry was only used by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it. * Without the vtime_accounting_cpu_enabled(), we no longer need to have a vtime_common_account_irq_enter() indirect function. * Move vtime_account_irq_enter() implementation under CONFIG_VIRT_CPU_ACCOUNTING_NATIVE which is the last user. * The vtime_account_user() call was only used on irq entry for CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING codeRik van Riel
The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not appear to currently work right. On CPUs without nohz_full=, only tick based irq time sampling is done, which breaks down when dealing with a nohz_idle CPU. On firewalls and similar systems, no ticks may happen on a CPU for a while, and the irq time spent may never get accounted properly. This can cause issues with capacity planning and power saving, which use the CPU statistics as inputs in decision making. Remove the VTIME_GEN vtime irq time code, and replace it with the IRQ_TIME_ACCOUNTING code, when selected as a config option by the user. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14Merge branch 'sched/core' into timers/nohz, to avoid conflicts in upcoming ↵Ingo Molnar
patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14Merge tag 'iio-for-4.8c' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: Third set of IIO new device support, features and cleanups for the 4.8 cycle. New core features - Selection of the clock source for IIO timestamps. This is done per device as it makes little sense to have events in one timebase and data timestamped on another. Biggest reason for this is that we currently use a clock source which is non monotonic which can result in 'interesting' data sets. (Includes export for get_monotonic_corse64 which Thomas Gleixner didn't mind in an earlier version.) - MAINTAINERS add the git tree to the list for IIO. New device support + a kind of indirect staging graduation. * Broadcom iproc-static-adc - new driver * mcp4531 - support for MCP454x, MCP456x, MCP464x and MCP466x potentiometers * mpu6050 - support the IC20608 6 axis motion tracking device * st-sensors - support the lis3l02dq + drop the lis3l02dq driver from staging. The general purpose driver is missing event support, but good to get rid of this driver which was rather long in the tooth. New driver features * ak8975 - Add vid regulator support and refactor handling in general. - Allow a delay after enabling regulators. - Runtime and system PM. * bmg160 - filter frequency control support. * bmp280 - SPI device support. - EOC interrupt support for the BMP085 - power management support. - supply regulator support. - reset gpio support - dt bindings for reset gpio and regulators. - of table to support device tree registration * max1363 - Device tree bindings. * mcp4531 - Device tree bindings. * st-pressure - temperature channels as part of triggered buffer (previously not due probably to alignment issues - see below). - lps22hb open drain interrupt support. - lps22hb temperature channel support Cleanups and reworkings. * numerous ADC drivers - ensure the iio_dev->dev.of_node is set to the parent dev.of_node so as to allow client bindings to find the device. * ak8975 - Fix incorrect handling of missing regulator - make sure power is down and remove. * bmp280 - read the calibration data only once as it doesn't change. * isl29125 - Use a few macros to make code a touch more readable. * mma8452 - fix a memory leak on error. - drop an unecessary bit of return value handling. * potentiometer kconfig - typo fix. * st-pressure - drop some uninformative default assignments of elements of the channel array structure (aids readability). * st-sensors - Harden interrupt handling considerably. These are actually all using level interrupts, but at least two known boards have them wired to edge only interrupt chips. Hence a slightly interesting bit of handling is needed in which we first allow for the easy option (level triggered) and secondly check the status registers before reenabling edge interrupts and fall back to a tight loop in the thread until we successfully clear the interrupt. No harm is done if we never succeed in doing so. It's an odd patch that has been through a lot of revisions to reach a consensus on how to handle what is basically broken hardware (which the previous defaults allowed to kind of work). - Fix alignment to defined storagebytes boundaries. - Ensure alignment of power of 2 byte boundaries. This has always in theory been part of the ABI of IIO, but we missed a few that snuck in that need fixing. The effect was minor as they were only followed by timestamp channels which were correctly aligned, - Add some docs to explain the gain calculations.