summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2015-12-08Merge branch 'for-4.4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "Nothing too interesting. All are device specific additions and workarounds" * 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata/sata_fsl.c: add ATA_FLAG_NO_LOG_PAGE to blacklist the controller for log page reads libata-eh.c: Introduce new ata port flag for controller which lockup on read log page sata_sil: disable trim AHCI: Fix softreset failed issue of Port Multiplier sata/mvebu: use #ifdef around suspend/resume code ahci: Order SATA device IDs for codename Lewisburg ahci: Add Device ID for Intel Sunrise Point PCH
2015-12-08Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree includes four core perf fixes for misc bugs, three fixes to x86 PMU drivers, and two updates to old email addresses" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Do not send exit event twice perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell perf: Fix PERF_EVENT_IOC_PERIOD deadlock treewide: Remove old email address perf/x86: Fix LBR call stack save/restore perf: Update email address in MAINTAINERS perf/core: Robustify the perf_cgroup_from_task() RCU checks perf/core: Fix RCU problem with cgroup context switching code
2015-12-08mlx4: Expose correct max_sge_rd limitSagi Grimberg
mlx4 devices (ConnectX-2, ConnectX-3) has a limitation where rdma read work queue entries cannot exceed 512 bytes. A rdma_read wqe needs to fit in 512 bytes: - wqe control segment (16 bytes) - rdma segment (16 bytes) - scatter elements (16 bytes each) So max_sge_rd should be: (512 - 16 - 16) / 16 = 30. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', ↵Paul E. McKenney
'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD doc.2015.12.05a: Documentation updates exp.2015.12.07a: Expedited grace-period updates fixes.2015.12.07a: Miscellaneous fixes list.2015.12.04b: Linked-list updates torture.2015.12.05a: Torture-test updates
2015-12-07list: Add lockless list traversal primitivesAlexey Kardashevskiy
Although list_for_each_entry_rcu() can in theory be used anywhere preemption is disabled, it can result in calls to lockdep, which cannot be used in certain constrained execution environments, such as exception handlers that do not map the entire kernel into their address spaces. This commit therefore adds list_entry_lockless() and list_for_each_entry_lockless(), which never invoke lockdep and can therefore safely be used from these constrained environments, but only as long as those environments are non-preemptible (or items are never deleted from the list). Use synchronize_sched(), call_rcu_sched(), or synchronize_sched_expedited() in updates for the needed grace periods. Of course, if items are never deleted from the list, there is no need to wait for grace periods. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07rcu: Fix comment for rcu_dereference_raw_notraceAlexey Kardashevskiy
rcu_dereference_raw() calls indirectly rcu_read_lock_held() while rcu_dereference_raw_notrace() does not so fix the comment about the latter. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}()Paul E. McKenney
This commit replaces a local_irq_save()/local_irq_restore() pair with a lockdep assertion that interrupts are already disabled. This should remove the corresponding overhead from the interrupt entry/exit fastpaths. This change was inspired by the fact that Iftekhar Ahmed's mutation testing showed that removing rcu_irq_enter()'s call to local_ird_restore() had no effect, which might indicate that interrupts were always enabled anyway. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07rcu: Remove TINY_RCU bloat from pointless boot parametersPaul E. McKenney
The rcu_expedited, rcu_normal, and rcu_normal_after_boot kernel boot parameters are pointless in the case of TINY_RCU because in that case synchronous grace periods, both expedited and normal, are no-ops. However, these three symbols contribute several hundred bytes of bloat. This commit therefore uses CPP directives to avoid compiling this code in TINY_RCU kernels. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-07qed: fix handling of concurrent ramrods.Tomer Tayar
Concurrent non-blocking slowpath ramrods can be completed out-of-order on the completion chain. Recycling completed elements, while previously sent elements are still completion pending, can lead to overriding of active elements on the chain. Furthermore, sending pending slowpath ramrods currently lacks the update of the chain element physical pointer. This patch: * Ensures that ramrods are sent to the FW with consecutive echo values. * Handles out-of-order completions by freeing only first successive completed entries. * Updates the chain element physical pointer when copying a pending element into a free element for sending. Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com> Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07qed: Fix corner case for chain in-between pagesTomer Tayar
The amount of chain next pointer elements between the producer and the consumer indices depends on which pages they currently point to. The current calculation is based only on their difference, and it can lead to a number of free elements which is higher by 1 than the actual value. Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com> Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07lightnvm: replace req queue with nvmdev for lldMatias Bjørling
In the case where a request queue is passed to the low lever lightnvm device drive integration, the device driver might pass its admin commands through another queue. Instead pass nvm_dev, and let the low level drive the appropriate queue. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-07lightnvm: comments on constantsMatias Bjørling
It is not obvious what NVM_IO_* and NVM_BLK_T_* are used for. Make sure to comment them appropriately as the other constants. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-07libata-eh.c: Introduce new ata port flag for controller which lockup on read ↵Andreas Werner
log page Some controller lockup on a ata_read_log_page. Add new ata port flag ATA_FLAG_NO_LOG_PAGE which can used to blacklist a controller. If this flag is set, any attempt to read a log page returns an error without actually issuing the command. Signed-off-by: Andreas Werner <andreas.werner@men.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2015-12-07Merge branch 'master' into for-4.4-fixesTejun Heo
The following commit which went into mainline through networking tree 3b13758f51de ("cgroups: Allow dynamically changing net_classid") conflicts in net/core/netclassid_cgroup.c with the following pending fix in cgroup/for-4.4-fixes. 1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from subtree_control enabling") The former separates out update_classid() from cgrp_attach() and updates it to walk all fds of all tasks in the target css so that it can be used from both migration and config change paths. The latter drops @css from cgrp_attach(). Resolve the conflict by making cgrp_attach() call update_classid() with the css from the first task. We can revive @tset walking in cgrp_attach() but given that net_cls is v1 only where there always is only one target css during migration, this is fine. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Nina Schiff <ninasc@fb.com>
2015-12-06net: remove unnecessary semicolon in netdev_alloc_pcpu_stats()Felix Fietkau
This semicolon causes a build error if the function call is wrapped in parentheses. Fixes: aabc92bbe3cf ("net: add __netdev_alloc_pcpu_stats() to indicate gfp flags") Reported-by: Imre Kaloz <kaloz@openwrt.org> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06tmpfs: listxattr should include POSIX ACL xattrsAndreas Gruenbacher
When a file on tmpfs has an ACL or a Default ACL, listxattr should include the corresponding xattr name. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06tmpfs: Use xattr handler infrastructureAndreas Gruenbacher
Use the VFS xattr handler infrastructure and get rid of similar code in the filesystem. For implementing shmem_xattr_handler_set, we need a version of simple_xattr_set which removes the attribute when value is NULL. Use this to implement kernfs_iop_removexattr as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06vfs: Distinguish between full xattr names and proper prefixesAndreas Gruenbacher
Add an additional "name" field to struct xattr_handler. When the name is set, the handler matches attributes with exactly that name. When the prefix is set instead, the handler matches attributes with the given prefix and with a non-empty suffix. This patch should avoid bugs like the one fixed in commit c361016a in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06posix acls: Remove duplicate xattr name definitionsAndreas Gruenbacher
Remove POSIX_ACL_XATTR_{ACCESS,DEFAULT} and GFS2_POSIX_ACL_{ACCESS,DEFAULT} and replace them with the definitions in <include/uapi/linux/xattr.h>. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06vfs: Remove vfs_xattr_cmpAndreas Gruenbacher
This function was only briefly used in security/integrity/evm, between commits 66dbc325 and 15647eb3. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is quite a bumper crop of fixes: three from Arnd correcting various build issues in some configurations, a lock recursion in qla2xxx. Two potentially exploitable issues in hpsa and mvsas, a potential null deref in st, a revert of a bdi registration fix that turned out to cause even more problems, a set of fixes to allow people who only defined MPT2SAS to still work after the mpt2/mpt3sas merger and a couple of fixes for issues turned up by the hyper-v storvsc driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: mpt3sas: fix Kconfig dependency problem for mpt2sas back compatibility Revert "scsi: Fix a bdi reregistration race" mpt3sas: Add dummy Kconfig option for backwards compatibility Fix a memory leak in scsi_host_dev_release() block/sd: Fix device-imposed transfer length limits scsi_debug: fix prevent_allow+verify regressions MAINTAINERS: Add myself as co-maintainer of the SCSI subsystem. sd: Make discard granularity match logical block size when LBPRZ=1 scsi: hpsa: select CONFIG_SCSI_SAS_ATTR scsi: advansys needs ISA dma api for ISA support scsi_sysfs: protect against double execution of __scsi_remove_device() st: fix potential null pointer dereference. scsi: report 'INQUIRY result too short' once per host advansys: fix big-endian builds qla2xxx: Fix rwlock recursion hpsa: logical vs bitwise AND typo mvsas: don't allow negative timeouts mpt3sas: Fix use sas_is_tlr_enabled API before enabling MPI2_SCSIIO_CONTROL_TLR_ON flag
2015-12-04list: Use WRITE_ONCE() when initializing list_head structuresPaul E. McKenney
Code that does lockless emptiness testing of non-RCU lists is relying on INIT_LIST_HEAD() to write the list head's ->next pointer atomically, particularly when INIT_LIST_HEAD() is invoked from list_del_init(). This commit therefore adds WRITE_ONCE() to this function's pointer stores that could affect the head's ->next pointer. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04list: Introduces generic list_splice_tail_init_rcu()Petko Manolov
The list_splice_init_rcu() can be used as a stack onto which full lists are pushed, but queue-like behavior is now needed by some security policies. This requires a list_splice_tail_init_rcu(). This commit therefore supplies a list_splice_tail_init_rcu() by pulling code common it and to list_splice_init_rcu() into a new __list_splice_init_rcu() function. This new function is based on the existing list_splice_init_rcu() implementation. Signed-off-by: Petko Manolov <petkan@mip-labs.com> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04rcu: Stop disabling interrupts in scheduler fastpathsPaul E. McKenney
We need the scheduler's fastpaths to be, well, fast, and unnecessarily disabling and re-enabling interrupts is not necessarily consistent with this goal. Especially given that there are regions of the scheduler that already have interrupts disabled. This commit therefore moves the call to rcu_note_context_switch() to one of the interrupts-disabled regions of the scheduler, and removes the now-redundant disabling and re-enabling of interrupts from rcu_note_context_switch() and the functions it calls. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested by Peter Zijlstra. ]
2015-12-04rcu: Add rcu_normal kernel parameter to suppress expeditingPaul E. McKenney
Although expedited grace periods can be quite useful, and although their OS jitter has been greatly reduced, they can still pose problems for extreme real-time workloads. This commit therefore adds a rcu_normal kernel boot parameter (which can also be manipulated via sysfs) to suppress expedited grace periods, that is, to treat requests for expedited grace periods as if they were requests for normal grace periods. If both rcu_expedited and rcu_normal are specified, rcu_normal wins. This means that if you are relying on expedited grace periods to speed up boot, you will want to specify rcu_expedited on the kernel command line, and then specify rcu_normal via sysfs once boot completes. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04rhashtable: Prevent spurious EBUSY errors on insertionHerbert Xu
Thomas and Phil observed that under stress rhashtable insertion sometimes failed with EBUSY, even though this error should only ever been seen when we're under attack and our hash chain length has grown to an unacceptable level, even after a rehash. It turns out that the logic for detecting whether there is an existing rehash is faulty. In particular, when two threads both try to grow the same table at the same time, one of them may see the newly grown table and thus erroneously conclude that it had been rehashed. This is what leads to the EBUSY error. This patch fixes this by remembering the current last table we used during insertion so that rhashtable_insert_rehash can detect when another thread has also done a resize/rehash. When this is detected we will give up our resize/rehash and simply retry the insertion with the new table. Reported-by: Thomas Graf <tgraf@suug.ch> Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-04Merge tag 'pm+acpi-4.4-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These fix a recent regression in the ACPI PCI host bridge initialization code, clean up some recent changes (generic power domains framework, ACPI AML debugger support), fix three older but annoying bugs (PCI power management. generic power domains framework, cpufreq) and a build problem (device properties framework), and update a stale MAINTAINERS entry (ACPI backlight driver). Specifics: - Fix a regression in the ACPI PCI host bridge initialization code introduced by the recent consolidation of the host bridge handling on x86 and ia64 that forgot to take one special piece of code related to NUMA on x86 into account (Liu Jiang). - Improve the Kconfig help description of the new ACPI AML debugger support option to avoid possible confusion (Peter Zijlstra). - Remove a piece of code in the generic power domains framework that should have been removed by one of the recent commits modifying that code (Ulf Hansson). - Reduce the log level of a PCI PM message that generates a lot of false-positive log noise for some drivers and improve the message itself while at it (Imre Deak). - Fix the OF-based domain lookup code in the generic power domains framework to make it drop references to DT nodes correctly (Eric Anholt). - Prevent the cpufreq core from setting the policy back to the default after a CPU offline/online cycle for cpufreq drivers providing the ->setpolicy callback (Srinivas Pandruvada). - Fix a build problem for CONFIG_ACPI unset in the device properties framework (Hanjun Guo). - Fix a stale file path in the ACPI backlight driver entry in MAINTAINERS (Dan Carpenter)" * tag 'pm+acpi-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach() cpufreq: use last policy after online for drivers with ->setpolicy PCI / PM: Tune down retryable runtime suspend error messages PM / Domains: Validate cases of a non-bound driver in genpd governor MAINTAINERS: ACPI / video: update a file name in drivers/acpi/ ACPI / property: fix compile error for acpi_node_get_property_reference() when CONFIG_ACPI=n x86/PCI/ACPI: Fix regression caused by commit 4d6b4e69a245 ACPI: Better describe ACPI_DEBUGGER
2015-12-04Revert: "vfio: Include No-IOMMU mode"Alex Williamson
Revert commit 033291eccbdb ("vfio: Include No-IOMMU mode") due to lack of a user. This was originally intended to fill a need for the DPDK driver, but uptake has been slow so rather than support an unproven kernel interface revert it and revisit when userspace catches up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-12-04Merge branches 'pm-domains' and 'pm-cpufreq'Rafael J. Wysocki
* pm-domains: PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach() PM / Domains: Validate cases of a non-bound driver in genpd governor * pm-cpufreq: cpufreq: use last policy after online for drivers with ->setpolicy
2015-12-04Merge branches 'acpica', 'acpi-video' and 'device-properties'Rafael J. Wysocki
* acpica: ACPI: Better describe ACPI_DEBUGGER * acpi-video: MAINTAINERS: ACPI / video: update a file name in drivers/acpi/ * device-properties: ACPI / property: fix compile error for acpi_node_get_property_reference() when CONFIG_ACPI=n
2015-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A lot of Thanksgiving turkey leftovers accumulated, here goes: 1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg. 2) IDs for some new iwlwifi chips, from Oren Givon. 3) Fix rtlwifi lockups on boot, from Larry Finger. 4) Fix memory leak in fm10k, from Stephen Hemminger. 5) We have a route leak in the ipv6 tunnel infrastructure, fix from Paolo Abeni. 6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim. 7) Wrong lockdep annotations in tcp md5 support, fix from Eric Dumazet. 8) Work around some middle boxes which prevent proper handling of TCP Fast Open, from Yuchung Cheng. 9) TCP repair can do huge kmalloc() requests, build paged SKBs instead. From Eric Dumazet. 10) Fix msg_controllen overflow in scm_detach_fds, from Daniel Borkmann. 11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from Nikolay Aleksandrov. 12) Fix use after free in epoll with AF_UNIX sockets, from Rainer Weikusat. 13) Fix double free in VRF code, from Nikolay Aleksandrov. 14) Fix skb leaks on socket receive queue in tipc, from Ying Xue. 15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian. 16) Fix clearing of persistent array maps in bpf, from Daniel Borkmann. 17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq early enough. From Eric Dumazet. 18) Fix out of bounds accesses in bpf array implementation when updating elements, from Daniel Borkmann. 19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric Dumazet. 20) When dumping proxy neigh entries, we have to accomodate NULL device pointers properly, from Konstantin Khlebnikov. 21) SCTP doesn't release all ipv6 socket resources properly, fix from Eric Dumazet. 22) Prevent underflows of sch->q.qlen for multiqueue packet schedulers, also from Eric Dumazet. 23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey Huang and Michael Chan. 24) Don't actively scan radar channels, from Antonio Quartulli" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits) net: phy: reset only targeted phy bnxt_en: Setup uc_list mac filters after resetting the chip. bnxt_en: enforce proper storing of MAC address bnxt_en: Fixed incorrect implementation of ndo_set_mac_address net: lpc_eth: remove irq > NR_IRQS check from probe() net_sched: fix qdisc_tree_decrease_qlen() races openvswitch: fix hangup on vxlan/gre/geneve device deletion ipv4: igmp: Allow removing groups from a removed interface ipv6: sctp: implement sctp_v6_destroy_sock() arm64: bpf: add 'store immediate' instruction ipv6: kill sk_dst_lock ipv6: sctp: add rcu protection around np->opt net/neighbour: fix crash at dumping device-agnostic proxy entries sctp: use GFP_USER for user-controlled kmalloc sctp: convert sack_needed and sack_generation to bits ipv6: add complete rcu protection around np->opt bpf: fix allocation warnings in bpf maps and integer overflow mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0 net: mvneta: enable setting custom TX IP checksum limit net: mvneta: fix error path for building skb ...
2015-12-03Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A collection of fixes from this series. The most important here is a regression fix for an issue that some folks would hit in blk-merge.c, and the NVMe queue depth limit for the screwed up Apple "nvme" controller. In more detail, this pull request contains: - a set of fixes for null_blk, including a fix for a few corner cases where we could hang the device. From Arianna and Paolo. - lightnvm: - A build improvement from Keith. - Update the qemu pci id detection from Matias. - Error handling fixes for leaks and other little fixes from Sudip and Wenwei. - fix from Eric where BLKRRPART would not return EBUSY for whole device mounts, only when partitions were mounted. - fix from Jan Kara, where EOF O_DIRECT reads would return negatively. - remove check for rq_mergeable() when checking limits for cloned requests. The check doesn't make any sense. It's assuming that since NOMERGE is set on the request that we don't have to recalculate limits since the request didn't change, but that's not true if the request has been redirected. From Hannes. - correctly get the bio front segment value set for single segment bio's, fixing a BUG() in blk-merge. From Ming" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: temporary fix for Apple controller reset null_blk: change type of completion_nsec to unsigned long null_blk: guarantee device restart in all irq modes null_blk: set a separate timer for each command blk-merge: fix computing bio->bi_seg_front_size in case of single segment direct-io: Fix negative return from dio read beyond eof block: Always check queue limits for cloned requests lightnvm: missing nvm_lock acquire lightnvm: unconverted ppa returned in get_bb_tbl lightnvm: refactor and change vendor id for qemu lightnvm: do device max sectors boundary check first lightnvm: fix ioctl memory leaks lightnvm: free memory when gennvm register fails lightnvm: Simplify config when disabled Return EBUSY from BLKRRPART for mounted whole-dev fs
2015-12-03Merge branch 'mkp-fixes' into fixesJames Bottomley
2015-12-03cgroup: fix handling of multi-destination migration from subtree_control ↵Tejun Heo
enabling Consider the following v2 hierarchy. P0 (+memory) --- P1 (-memory) --- A \- B P0 has memory enabled in its subtree_control while P1 doesn't. If both A and B contain processes, they would belong to the memory css of P1. Now if memory is enabled on P1's subtree_control, memory csses should be created on both A and B and A's processes should be moved to the former and B's processes the latter. IOW, enabling controllers can cause atomic migrations into different csses. The core cgroup migration logic has been updated accordingly but the controller migration methods haven't and still assume that all tasks migrate to a single target css; furthermore, the methods were fed the css in which subtree_control was updated which is the parent of the target csses. pids controller depends on the migration methods to move charges and this made the controller attribute charges to the wrong csses often triggering the following warning by driving a counter negative. WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40() Modules linked in: CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29 ... ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000 ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00 ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8 Call Trace: [<ffffffff81551ffc>] dump_stack+0x4e/0x82 [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0 [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40 [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0 [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330 [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190 [<ffffffff81189016>] cgroup_attach_task+0x176/0x200 [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460 [<ffffffff81189684>] cgroup_procs_write+0x14/0x20 [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0 [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190 [<ffffffff81265f88>] __vfs_write+0x28/0xe0 [<ffffffff812666fc>] vfs_write+0xac/0x1a0 [<ffffffff81267019>] SyS_write+0x49/0xb0 [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76 This patch fixes the bug by removing @css parameter from the three migration methods, ->can_attach, ->cancel_attach() and ->attach() and updating cgroup_taskset iteration helpers also return the destination css in addition to the task being migrated. All controllers are updated accordingly. * Controllers which don't care whether there are one or multiple target csses can be converted trivially. cpu, io, freezer, perf, netclassid and netprio fall in this category. * cpuset's current implementation assumes that there's single source and destination and thus doesn't support v2 hierarchy already. The only change made by this patchset is how that single destination css is obtained. * memory migration path already doesn't do anything on v2. How the single destination css is obtained is updated and the prep stage of mem_cgroup_can_attach() is reordered to accomodate the change. * pids is the only controller which was affected by this bug. It now correctly handles multi-destination migrations and no longer causes counter underflow from incorrect accounting. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-12-02ipv6: add complete rcu protection around np->optEric Dumazet
This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02cpufreq: use last policy after online for drivers with ->setpolicySrinivas Pandruvada
For cpufreq drivers which use setpolicy interface, after offline->online the policy is set to default. This can be reproduced by setting the default policy of intel_pstate or longrun to ondemand and then change to "performance". After offline and online, the setpolicy will be called with the policy=ondemand. For drivers using governors this condition is handled by storing last_governor, during offline and restoring during online. The same should be done for drivers using setpolicy interface. Storing last_policy during offline and restoring during online. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-02ACPI / property: fix compile error for acpi_node_get_property_reference() ↵Hanjun Guo
when CONFIG_ACPI=n In commit 60ba032ed76e ("ACPI / property: Drop size_prop from acpi_dev_get_property_reference()"), the argument "const char *cells_name" was dropped, but forgot to update the stub function in no-ACPI case, it will lead to compile error when CONFIG_ACPI=n, easliy remove "const char *cells_name" to fix it. Fixes: 60ba032ed76e "ACPI / property: Drop size_prop from acpi_dev_get_property_reference()" Reported-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-01net: fix sock_wake_async() rcu protectionEric Dumazet
Dmitry provided a syzkaller (http://github.com/google/syzkaller) triggering a fault in sock_wake_async() when async IO is requested. Said program stressed af_unix sockets, but the issue is generic and should be addressed in core networking stack. The problem is that by the time sock_wake_async() is called, we should not access the @flags field of 'struct socket', as the inode containing this socket might be freed without further notice, and without RCU grace period. We already maintain an RCU protected structure, "struct socket_wq" so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it is the safe route. It also reduces number of cache lines needing dirtying, so might provide a performance improvement anyway. In followup patches, we might move remaining flags (SOCK_NOSPACE, SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket' being mostly read and let it being shared between cpus. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATAEric Dumazet
This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-29packet: Allow packets with only a header (but no payload)Martin Blumenstingl
Commit 9c7077622dd91 ("packet: make packet_snd fail on len smaller than l2 header") added validation for the packet size in packet_snd. This change enforces that every packet needs a header (with at least hard_header_len bytes) plus a payload with at least one byte. Before this change the payload was optional. This fixes PPPoE connections which do not have a "Service" or "Host-Uniq" configured (which is violating the spec, but is still widely used in real-world setups). Those are currently failing with the following message: "pppd: packet size is too short (24 <= 24)" Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-29block: Always check queue limits for cloned requestsHannes Reinecke
When a cloned request is retried on other queues it always needs to be checked against the queue limits of that queue. Otherwise the calculations for nr_phys_segments might be wrong, leading to a crash in scsi_init_sgtable(). To clarify this the patch renames blk_rq_check_limits() to blk_cloned_rq_check_limits() and removes the symbol export, as the new function should only be used for cloned requests and never exported. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Ewan Milne <emilne@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Fixes: e2a60da74 ("block: Clean up special command handling logic") Cc: stable@vger.kernel.org # 3.7+ Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-29lightnvm: unconverted ppa returned in get_bb_tblMatias Bjørling
The get_bb_tbl function takes ppa as a generic address, which is converted to the ppa device address within the device driver. When the update_bbtbl callback is called from get_bb_tbl, the device specific ppa is used, instead of the generic ppa. Make sure to pass the generic ppa. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds
Pull SCSI target fixes from Nicholas Bellinger: - fix tcm-user backend driver expired cmd time processing (agrover) - eliminate kref_put_spinlock_irqsave() for I/O completion (bart) - fix iscsi login kthread failure case hung task regression (nab) - fix COMPARE_AND_WRITE completion use-after-free race (nab) - fix COMPARE_AND_WRITE with SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC non zero SGL offset data corruption. (Jan + Doug) - fix >= v4.4-rc1 regression for tcm_qla2xxx enable configfs attribute (Himanshu + HCH) * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target/stat: print full t10_wwn.model buffer target: fix COMPARE_AND_WRITE non zero SGL offset data corruption qla2xxx: Fix regression introduced by target configFS changes kref: Remove kref_put_spinlock_irqsave() target: Invoke release_cmd() callback without holding a spinlock target: Fix race for SCF_COMPARE_AND_WRITE_POST checking iscsi-target: Fix rx_login_comp hang after login failure iscsi-target: return -ENOMEM instead of -1 in case of failed kmalloc() target/user: Do not set unused fields in tcmu_ops target/user: Fix time calc in expired cmd processing
2015-11-29Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management fixes from Zhang Rui: "Specifics: - several fixes and cleanups on Rockchip thermal drivers. - add the missing support of RK3368 SoCs in Rockchip driver. - small fixes on of-thermal, power_allocator, rcar driver, IMX, and QCOM drivers, and also compilation fixes, on thermal.h, when thermal is not selected" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: imx: thermal: use CPU temperature grade info for thresholds thermal: fix thermal_zone_bind_cooling_device prototype Revert "thermal: qcom_spmi: allow compile test" thermal: rcar_thermal: remove redundant operation thermal: of-thermal: Reduce log level for message when can't fine thermal zone thermal: power_allocator: Use temperature reading from tz thermal: rockchip: Support the RK3368 SoCs in thermal driver thermal: rockchip: consistently use int for temperatures thermal: rockchip: Add the sort mode for adc value increment or decrement thermal: rockchip: improve the conversion function thermal: rockchip: trivial: fix typo in commit thermal: rockchip: better to compatible the driver for different SoCs dt-bindings: rockchip-thermal: Support the RK3368 SoCs compatible
2015-11-28kref: Remove kref_put_spinlock_irqsave()Bart Van Assche
The last user is gone. Hence remove this function. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-11-28Merge tag 'pci-v4.4-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Here are a few fixes I'd like to have in v4.4: a generic one for sysfs and three for HiSilicon and DesignWare host controllers. Summary: NUMA: - Prevent out of bounds access in numa_node override (Mathias Krause) HiSilicon host bridge driver: - Fix deferred probing (Arnd Bergmann) Synopsys DesignWare host bridge driver: - Remove incorrect io_base assignment (Stanimir Varbanov) - Move align_resource function pointer to pci_host_bridge structure (Gabriele Paoloni)" * tag 'pci-v4.4-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: ARM/PCI: Move align_resource function pointer to pci_host_bridge structure PCI: hisi: Fix deferred probing PCI: designware: Remove incorrect io_base assignment PCI: Prevent out of bounds access in numa_node override
2015-11-27Merge tag 'nfs-for-4.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - Fix a NFSv4 callback identifier leak that was also causing client crashes - Fix NFSv4 callback decoding issues when incoming requests are truncated - Don't declare the attribute cache valid when we call nfs_update_inode with an empty attribute structure. - Resend LAYOUTGET when there is a race that changes the seqid Bugfixes: - Fix a number of issues with the NFSv4.2 CLONE ioctl() - Properly set NFS v4.2 NFSDBG_FACILITY - NFSv4 referrals are broken; Cleanup FATTR4_WORD0_FS_LOCATIONS after decoding success - Use sliding delay when LAYOUTGET gets NFS4ERR_DELAY - Ensure that attrcache is revalidated after a SETATTR" * tag 'nfs-for-4.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs4: resend LAYOUTGET when there is a race that changes the seqid nfs: if we have no valid attrs, then don't declare the attribute cache valid nfs: ensure that attrcache is revalidated after a SETATTR nfs4: limit callback decoding to received bytes nfs4: start callback_ident at idr 1 nfs: use sliding delay when LAYOUTGET gets NFS4ERR_DELAY NFS4: Cleanup FATTR4_WORD0_FS_LOCATIONS after decoding success NFS: Properly set NFS v4.2 NFSDBG_FACILITY nfs: reduce the amount of ifdefs for v4.2 in nfs4file.c nfs: use btrfs ioctl defintions for clone nfs: allow intra-file CLONE nfs: offer native ioctls even if CONFIG_COMPAT is set nfs: pass on count for CLONE operations
2015-11-27Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "There is a small backlog of at91 patches here, the most significant is the addition of some sama5d2 Xplained nodes that were waiting on an MFD include file to get merged through another tree. We normally try to sort those out before the merge window opens, but the maintainer wasn't aware of that here and I decided to merge the changes this time as an exception. On OMAP a series of audio changes for dra7 missed the merge window but turned out to be necessary to fix a boot time imprecise external abort error and to get audio working. The other changes are the usual simple changes, here is a list sorted by platform: at91: removal of a useless defconfig option removal of some legacy DT pieces use of the proper watchdog compatible string update of the MAINTAINERS entries for some Atmel drivers drivers/scpi: hide get_scpi_ops in module from built-in code imx: add missing .irq_set_type for i.MX GPC irq_chip. fix the wrong spi-num-chipselects settings for Vybrid DSPI devices. fix a merge error in Vybrid dts regarding to ADC device property keystone: fix the optional PDSP firmware loading fix linking RAM setup for QMs fix crash with clk_ignore_unused mediatek: Enable SCPSYS power domain driver by default mvebu: fix QNAP TS219 power-off in dts fix legacy get_irqnr_and_base for dove and orion5x omap: fix l4 related boot time errors for dm81xx use lockless cldm/pwrdm api in omap4_boot_secondary remove t410 abort handler to avoid hiding other critical errors mark cpuidle tracepoints as _rcuidle fix module alias for omap-ocp2scp pxa: palm: Fix typos in PWM lookup table code renesas: missing __initconst annotation for r8a7793_boards_compat_dt rockchip: disable mmc-tuning on the veyron-minnie board adding the init state for the over-temperature-protection zx: only build power domain code when CONFIG_PM=y" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits) ARM: OMAP4+: SMP: use lockless clkdm/pwrdm api in omap4_boot_secondary arm: omap2+: add missing HWMOD_NO_IDLEST in 81xx hwmod data ARM: orion5x: Fix legacy get_irqnr_and_base ARM: dove: Fix legacy get_irqnr_and_base soc: Mediatek: Enable SCPSYS power domain driver by default ARM: dts: vfxxx: Fix dspi[01] spi-num-chipselects. ARM: dts: keystone: k2l: fix kernel crash when clk_ignore_unused is not in bootargs soc: ti: knav_qmss_queue: Fix linking RAM setup for queue managers soc: ti: use request_firmware_direct() as acc firmware is optional ARM: imx: add platform irq type setting in gpc ARM: dts: vfxxx: Fix erroneous property in esdhc0 node ARM: shmobile: r8a7793: proper constness with __initconst scpi: hide get_scpi_ops in module from built-in code ARM: zx: only build power domain code when CONFIG_PM=y ARM: pxa: palm: Fix typos in PWM lookup table code ARM: dts: Kirkwood: Fix QNAP TS219 power-off ARM: dts: rockchip: Add OTP gpio pinctrl to rk3288 tsadc node ARM: dts: rockchip: temporarily remove emmc hs200 speed from rk3288 minnie MAINTAINERS: Atmel drivers: change NAND and ISI entries ARM: at91/dt: sama5d2 Xplained: add several devices ...
2015-11-27Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Build fix when !CONFIG_UID16 (the patch is touching generic files but it only affects arm64 builds; submitted by Arnd Bergmann) - EFI fixes to deal with early_memremap() returning NULL and correctly mapping run-time regions - Fix CPUID register extraction of unsigned fields (not to be sign-extended) - ASID allocator fix to deal with long-running tasks over multiple generation roll-overs - Revert support for marking page ranges as contiguous PTEs (it leads to TLB conflicts and requires additional non-trivial kernel changes) - Proper early_alloc() failure check - Disable KASan for 48-bit VA and 16KB page configuration (the pgd is larger than the KASan shadow memory) - Update the fault_info table (original descriptions based on early engineering spec) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: efi: fix initcall return values arm64: efi: deal with NULL return value of early_memremap() arm64: debug: Treat the BRPs/WRPs as unsigned arm64: cpufeature: Track unsigned fields arm64: cpufeature: Add helpers for extracting unsigned values Revert "arm64: Mark kernel page ranges contiguous" arm64: mm: keep reserved ASIDs in sync with mm after multiple rollovers arm64: KASAN depends on !(ARM64_16K_PAGES && ARM64_VA_BITS_48) arm64: efi: correctly map runtime regions arm64: mm: fix fault_info table xFSC decoding arm64: fix building without CONFIG_UID16 arm64: early_alloc: Fix check for allocation failure
2015-11-25block/sd: Fix device-imposed transfer length limitsMartin K. Petersen
Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests") had the unfortunate side-effect of removing an implicit clamp to BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer code. This caused problems for some SMR drives. Debugging this issue revealed a few problems with the existing infrastructure since the block layer didn't know how to deal with device-imposed limits, only limits set by the I/O controller. - Introduce a new queue limit, max_dev_sectors, which is used by the ULD to signal the maximum sectors for a REQ_TYPE_FS request. - Ensure that max_dev_sectors is correctly stacked and taken into account when overriding max_sectors through sysfs. - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer values for later processing. - In sd_revalidate() set the queue's max_dev_sectors based on the MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value is not reported, fall back to a cap based on the CDB TRANSFER LENGTH field size. - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits VPD--if reported and sane--to signal the preferred device transfer size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS. - blk_limits_max_hw_sectors() is no longer used and can be removed. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581 Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: sweeneygj@gmx.com Tested-by: Arzeets <anatol.pomozov@gmail.com> Tested-by: David Eisner <david.eisner@oriel.oxon.org> Tested-by: Mario Kicherer <dev@kicherer.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>