summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2014-11-14mm: Update generic gup implementation to handle hugepage directoryAneesh Kumar K.V
Update generic gup implementation with powerpc specific details. On powerpc at pmd level we can have hugepte, normal pmd pointer or a pointer to the hugepage directory. Tested-by: Steve Capper <steve.capper@linaro.org> Acked-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/chelsio/cxgb4vf/sge.c drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c sge.c was overlapping two changes, one to use the new __dev_alloc_page() in net-next, and one to use s->fl_pg_order in net. ixgbe_phy.c was a set of overlapping whitespace changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) sunhme driver lacks DMA mapping error checks, based upon a report by Meelis Roos. 2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee. 3) DMA memory allocation sizes are wrong in systemport ethernet driver, fix from Florian Fainelli. 4) Fix use after free in mac80211 defragmentation code, from Johannes Berg. 5) Some networking uapi headers missing from Kbuild file, from Stephen Hemminger. 6) TUN driver gets csum_start offset wrong when VLAN accel is enabled, and macvtap has a similar bug, from Herbert Xu. 7) Adjust several tunneling drivers to set dev->iflink after registry, because registry sets that to -1 overwriting whatever we did. From Steffen Klassert. 8) Geneve forgets to set inner tunneling type, causing GSO segmentation to fail on some NICs. From Jesse Gross. 9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and Giuseppe CAVALLARO. 10) Fix spurious timeouts with NewReno on low traffic connections, from Marcelo Leitner. 11) Fix descriptor updates in enic driver, from Govindarajulu Varadarajan. 12) PPP calls bpf_prog_create() with locks held, which isn't kosher. Fix from Takashi Iwai. 13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel Borkmann. 14) psock_fanout selftest accesses past the end of the mmap ring, fix from Shuah Khan. 15) Fix PTP timestamping for VLAN packets, from Richard Cochran. 16) netlink_unbind() calls in netlink pass wrong initial argument, from Hiroaki SHIMODA. 17) vxlan socket reuse accidently reuses a socket when the address family is different, so we have to explicitly check this, from Marcelo Lietner. 18) Fix missing include in nft_reject_bridge.c breaking the build on ppc and other architectures, from Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits) vxlan: Do not reuse sockets for a different address family smsc911x: power-up phydev before doing a software reset. lib: rhashtable - Remove weird non-ASCII characters from comments net/smsc911x: Fix delays in the PHY enable/disable routines net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode netlink: Properly unbind in error conditions. net: ptp: fix time stamp matching logic for VLAN packets. cxgb4 : dcb open-lldp interop fixes selftests/net: psock_fanout seg faults in sock_fanout_read_ring() net: bcmgenet: apply MII configuration in bcmgenet_open() net: bcmgenet: connect and disconnect from the PHY state machine net: qualcomm: Fix dependency ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx net: phy: Correctly handle MII ioctl which changes autonegotiation. ipv6: fix IPV6_PKTINFO with v4 mapped net: sctp: fix memory leak in auth key management net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet net: ppp: Don't call bpf_prog_create() in ppp_lock net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set cxgb4 : Fix bug in DCB app deletion ...
2014-11-13mem-hotplug: reset node managed pages when hot-adding a new pgdatTang Chen
In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has node been onlined, /sys/device/system/node/nodeXXX/meminfo has wrong value: hot-add node2 (memory not onlined) cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by reset node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13mm/page_alloc: fix incorrect isolation behavior by rechecking migratetypeJoonsoo Kim
Before describing bugs itself, I first explain definition of freepage. 1. pages on buddy list are counted as freepage. 2. pages on isolate migratetype buddy list are *not* counted as freepage. 3. pages on cma buddy list are counted as CMA freepage, too. Now, I describe problems and related patch. Patch 1: There is race conditions on getting pageblock migratetype that it results in misplacement of freepages on buddy list, incorrect freepage count and un-availability of freepage. Patch 2: Freepages on pcp list could have stale cached information to determine migratetype of buddy list to go. This causes misplacement of freepages on buddy list and incorrect freepage count. Patch 4: Merging between freepages on different migratetype of pageblocks will cause freepages accouting problem. This patch fixes it. Without patchset [3], above problem doesn't happens on my CMA allocation test, because CMA reserved pages aren't used at all. So there is no chance for above race. With patchset [3], I did simple CMA allocation test and get below result: - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation - run kernel build (make -j16) on background - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval - Result: more than 5000 freepage count are missed With patchset [3] and this patchset, I found that no freepage count are missed so that I conclude that problems are solved. On my simple memory offlining test, these problems also occur on that environment, too. This patch (of 4): There are two paths to reach core free function of buddy allocator, __free_one_page(), one is free_one_page()->__free_one_page() and the other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page(). Each paths has race condition causing serious problems. At first, this patch is focused on first type of freepath. And then, following patch will solve the problem in second type of freepath. In the first type of freepath, we got migratetype of freeing page without holding the zone lock, so it could be racy. There are two cases of this race. 1. pages are added to isolate buddy list after restoring orignal migratetype CPU1 CPU2 get migratetype => return MIGRATE_ISOLATE call free_one_page() with MIGRATE_ISOLATE grab the zone lock unisolate pageblock release the zone lock grab the zone lock call __free_one_page() with MIGRATE_ISOLATE freepage go into isolate buddy list, although pageblock is already unisolated This may cause two problems. One is that we can't use this page anymore until next isolation attempt of this pageblock, because freepage is on isolate buddy list. The other is that freepage accouting could be wrong due to merging between different buddy list. Freepages on isolate buddy list aren't counted as freepage, but ones on normal buddy list are counted as freepage. If merge happens, buddy freepage on normal buddy list is inevitably moved to isolate buddy list without any consideration of freepage accouting so it could be incorrect. 2. pages are added to normal buddy list while pageblock is isolated. It is similar with above case. This also may cause two problems. One is that we can't keep these freepages from being allocated. Although this pageblock is isolated, freepage would be added to normal buddy list so that it could be allocated without any restriction. And the other problem is same as case 1, that it, incorrect freepage accouting. This race condition would be prevented by checking migratetype again with holding the zone lock. Because it is somewhat heavy operation and it isn't needed in common case, we want to avoid rechecking as much as possible. So this patch introduce new variable, nr_isolate_pageblock in struct zone to check if there is isolated pageblock. With this, we can avoid to re-check migratetype in common case and do it only if there is isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above mentioned problems. Changes from v3: Add one more check in free_one_page() that checks whether migratetype is MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13rhashtable: Drop gfp_flags arg in insert/remove functionsThomas Graf
Reallocation is only required for shrinking and expanding and both rely on a mutex for synchronization and callers of rhashtable_init() are in non atomic context. Therefore, no reason to continue passing allocation hints through the API. Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow for silent fall back to vzalloc() without the OOM killer jumping in as pointed out by Eric Dumazet and Eric W. Biederman. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Support more than 64 VFsMatan Barak
We now allow up to 126 VFs. Note though that certain firmware versions only allow up to 80 VFs. Moreover, old HCAs only support 64 VFs. In these cases, we limit the maximum number of VFs to 64. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for ↵Matan Barak
PF/VFs Previously, the driver queried the firmware in order to get the number of supported EQs. Under SRIOV, since this was done before the driver notified the firmware how many VFs it actually needs, the firmware had to take into account a worst case scenario and always allocated four EQs per VF, where one was used for events while the others were used for completions. Now, when the firmware supports the asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X vectors per function. Moreover, when running in the new firmware/driver mode, the limitation that the number of EQs should be a power of two is lifted. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13rhashtable: Add parent argument to mutex_is_heldHerbert Xu
Currently mutex_is_held can only test locks in the that are global since it takes no arguments. This prevents rhashtable from being used in places where locks are lock, e.g., per-namespace locks. This patch adds a parent field to mutex_is_held and rhashtable_params so that local locks can be used (and tested). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13rhashtable: Move mutex_is_held under PROVE_LOCKINGHerbert Xu
The rhashtable function mutex_is_held is only used when PROVE_LOCKING is enabled. This patch makes the mutex_is_held field in rhashtable optional depending on PROVE_LOCKING. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', ↵Paul E. McKenney
'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD cpu.2014.11.03a: Changes for per-CPU variables. doc.2014.11.13a: Documentation updates. fixes.2014.11.13a: Miscellaneous fixes. signal.2014.10.29a: Signal changes. rt.2014.10.29a: Real-time changes. torture.2014.11.03a: torture-test changes.
2014-11-13rcu: More info about potential deadlocks with rcu_read_unlock()Oleg Nesterov
The comment above rcu_read_unlock() explains the potential deadlock if the caller holds one of the locks taken by rt_mutex_unlock() paths, but it is not clear from this documentation that any lock which can be taken from interrupt can lead to deadlock as well and we need to take rt_mutex_lock() into account too. The problem is that rt_mutex_lock() takes wait_lock without disabling irqs, and thus an interrupt taking some LOCK can obviously race with rcu_read_unlock_special() called with the same LOCK held. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13rcu: Optimize cond_resched_rcu_qs()Paul E. McKenney
The current implementation of cond_resched_rcu_qs() can invoke rcu_note_voluntary_context_switch() twice in the should_resched() case, once via the call to __schedule() and once directly. However, as noted by Joe Lawrence in a patch to the team subsystem, cond_resched() returns an indication as to whether or not the call to __schedule() actually happened. This commit therefore changes cond_resched_rcu_qs() so as to invoke rcu_note_voluntary_context_switch() only when __schedule() was not called. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13rcu: Add sparse check for RCU_INIT_POINTER()Pranith Kumar
Add a sparse check when RCU_INIT_POINTER() is used to assign a non __rcu annotated pointer. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13PCI: Remove unused and broken to_hotplug_slot()Gavin Shan
to_hotplug_slot() is unused and wouldn't work anyway, because struct hotplug_slot no longer contains a struct kobject (it was removed by f46753c5e354 ("PCI: introduce pci_slot")). Remove to_hotplug_slot(). [bhelgaas: changelog] Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-11-13ARM: OMAP3: clock: add support for dpll4_set_rate_and_parentTero Kristo
Expand the support of omap4 per-dpll to provide set_rate_and_parent. This is required for proper behavior of clk_change_rate with determine_rate support. Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
2014-11-13ARM: OMAP4: clock: add support for determine_rate for omap4 regm4xen DPLLTero Kristo
Similarly to OMAP3 noncore DPLL, the implementation of this DPLL clock type is wrong. This patch adds basic functionality for determine_rate for this clock type which will be taken into use in the patches following later. Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
2014-11-13ARM: OMAP3: clock: add new rate changing logic support for noncore DPLLsTero Kristo
Currently, DPLL code hides the re-parenting within its internals, which is wrong. This needs to be exposed to the common clock code via determine_rate and set_rate_and_parent APIs. This patch adds support for these, which will be taken into use in the following patches. Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
2014-11-13crypto: doc - HASH API documentationStephan Mueller
The API function calls exported by the kernel crypto API for message digests to be used by consumers are documented. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-13crypto: doc - CIPHER API documentationStephan Mueller
The API function calls exported by the kernel crypto API for signle block ciphers to be used by consumers are documented. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-13crypto: doc - BLKCIPHER API documentationStephan Mueller
The API function calls exported by the kernel crypto API for synchronous block ciphers to be used by consumers are documented. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-13crypto: doc - AEAD API documentationStephan Mueller
The API function calls exported by the kernel crypto API for AEAD ciphers to be used by consumers are documented. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-13crypto: doc - ABLKCIPHER API documentationStephan Mueller
The API function calls exported by the kernel crypto API for asynchronous block ciphers to be used by consumers are documented. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-13crypto: doc - cipher data structuresStephan Mueller
The data structure of struct crypto_alg together with various other data structures needed by cipher developers is documented wit all parameters that can be set by a developer of a transformation. All parameters that are internal to the crypto API are marked as such. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-13video/hdmi: Relicense header under MIT licenseThierry Reding
OpenBSD wants to reuse this file but needs the license to be more permissive. Acked-by: Alban Bedel <alban.bedel@avionic-design.de> Acked-by: Daniel Vetter <daniel.vetter@intel.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2014-11-13mac802154: add interframe spacing time handlingAlexander Aring
This patch adds a new interframe spacing time handling into mac802154 layer. Interframe spacing time is a time period between each transmit. This patch adds a high resolution timer into mac802154 and starts on xmit complete with corresponding interframe spacing expire time if ifs_handling is true. We make it variable because it depends if interframe spacing time is handled by transceiver or mac802154. At the timer complete function we wake the netdev queue again. This avoids new frame transmit in range of interframe spacing time. For synced driver we add no handling of interframe spacing time. This is currently a lack of support in all synced xmit drivers. I suppose it's working because the latency of workqueue which is needed to call spi_sync. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12Merge tag 'trace-fixes-v3.18-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Rabin Vincent found a way that tracing could cause an infinite loop in the kernel. The splice logic wants a full page from the ring buffer but the ring_buffer_wait() returns when there's any data in the ring buffer. The splice code would then continue the loop waiting for a full page. But if a full page never happens, the splice code will never sleep and just continue to loop. There's another case that Rabin fixed that could loop if there's no memory and kmalloc() constantly returns NULL" * tag 'trace-fixes-v3.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Do not risk busy looping in buffer splice tracing: Do not busy wait in buffer splice
2014-11-12Merge tag 'mfd-fixes-3.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD fixes from Lee Jones: - register offset fix for stmpe - eradicate build warning when !PM in rtsx_pcr - fix device ID collision when multiple boards are connected in viperboard - use correct Regmap handle - fixing unhanded IRQs in max77693 - unmask MUIC IRQs in max77693 - clear VBUS & CHG bits so board doesn't reboot instead of poweroff in twl4030 * tag 'mfd-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: twl4030-power: Fix poweroff with PM configuration enabled mfd: max77693: Fix always masked MUIC interrupts mfd: max77693: Use proper regmap for handling MUIC interrupts mfd: viperboard: Fix platform-device id collision mfd: rtsx: Fix build warnings for !PM mfd: stmpe: Fix STMPE24xx GPMR LSB
2014-11-12cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logicDaniel Lezcano
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry method is not set. Otherwise for all the drivers, the time can be correctly measured. Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers for all the states, just invert the logic by replacing it by the flag CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle driver, remove the former flag from all the drivers and invert the logic with this flag in the different governor. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12nfs: fix pnfs direct write memory leakPeng Tao
For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets. So we need to take care to free them when freeing dreq. Ideally this needs to be done inside layout driver where ds_cinfo.buckets are allocated. But buckets are attached to dreq and reused across LD IO iterations. So I feel it's OK to free them in the generic layer. Cc: stable@vger.kernel.org [v3.4+] Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12net: phy: add module_phy_driver macroJohan Hovold
Add helper macro for PHY drivers which do not do anything special in module init/exit. This will allow us to eliminate a lot of boilerplate code. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12usb: renesas_usbhs: expand USB-DMAC channels for R-Car Gen2Yoshihiro Shimoda
This patch expands USB-DMAC channels for R-Car Gen2 SoCs. The SoCs have 4 channels. If d{2,3}_{t,x}x_id are not set, this driver never uses the expanded USB-DMAC channels. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
2014-11-12PCI/MSI: Rename "struct msi_chip" to "struct msi_controller"Yijing Wang
"msi_chip" isn't very descriptive, so rename it to "msi_controller". That tells a little more about what it does and is already used in device tree bindings. No functional change. [bhelgaas: changelog, change *only* the struct name so it's reviewable] Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-11-12scsi: add new scsi-command flag for tagged commandsChristoph Hellwig
Currently scsi piggy backs on the block layer to define the concept of a tagged command. But we want to be able to have block-level host-wide tags assigned even for untagged commands like the initial INQUIRY, so add a new SCSI-level flag for commands that are tagged at the scsi level, so that even commands without that set can have tags assigned to them. Note that this alredy is the case for the blk-mq code path, and this just lets the old path catch up with it. We also set this flag based upon sdev->simple_tags instead of the block queue flag, so that it is entirely independent of the block layer tagging, and thus always correct even if a driver doesn't use block level tagging yet. Also remove the old blk_rq_tagged; it was only used by SCSI drivers, and removing it forces them to look for the proper replacement. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12blk-mq: add blk_mq_unique_tag()Bart Van Assche
The queuecommand() callback functions in SCSI low-level drivers need to know which hardware context has been selected by the block layer. Since this information is not available in the request structure, and since passing the hctx pointer directly to the queuecommand callback function would require modification of all SCSI LLDs, add a function to the block layer that allows to query the hardware context index. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12net: Remove __skb_alloc_page and __skb_alloc_pagesAlexander Duyck
Remove the two functions which are now dead code. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: Add device Rx page allocation functionAlexander Duyck
This patch implements __dev_alloc_pages and __dev_alloc_page. These are meant to replace the __skb_alloc_pages and __skb_alloc_page functions. The reason for doing this is that it occurred to me that __skb_alloc_page is supposed to be passed an sk_buff pointer, but it is NULL in all cases where it is used. Worse is that in the case of ixgbe it is passed NULL via the sk_buff pointer in the rx_buffer info structure which means the compiler is not correctly stripping it out. The naming for these functions is based on dev_alloc_skb and __dev_alloc_skb. There was originally a netdev_alloc_page, however that was passed a net_device pointer and this function is not so I thought it best to follow that naming scheme since that is the same difference between dev_alloc_skb and netdev_alloc_skb. In the case of anything greater than order 0 it is assumed that we want a compound page so __GFP_COMP is set for all allocations as we expect a compound page when assigning a page frag. The other change in this patch is to exploit the behaviors of the page allocator in how it handles flags. So for example we can always set __GFP_COMP and __GFP_MEMALLOC since they are ignored if they are not applicable or are overridden by another flag. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12ieee820154: add short_addr setting supportAlexander Aring
This patch adds support for setting short address via nl802154 framework. Also added a comment because a 0xffff seems to be valid address that we don't have a short address. This is a valid setting but we need more checks in upper layers to don't allow this address as source address. Also the current netlink interface doesn't allow to set the short_addr to 0xffff. Same for the 0xfffe short address which describes a not allocated short address. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12ieee820154: add pan_id setting supportAlexander Aring
This patch adds support for setting pan_id via nl802154 framework. Adding a comment because setting 0xffff as pan_id seems to be valid setting. The pan_id 0xffff as source pan is invalid. I am not sure now about this setting but for the current netlink interface this is an invalid setting, so we do the same now. Maybe we need to change that when we have coordinator support and association support. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-11s390/MSI: Use __msi_mask_irq() instead of default_msi_mask_irq()Yijing Wang
Now only s390/MSI use default_msi_mask_irq() and default_msix_mask_irq(), replace them with the common MSI mask IRQ functions __msi_mask_irq() and __msix_mask_irq(). Remove default_msi_mask_irq() and default_msix_mask_irq(). Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com> CC: linux-s390@vger.kernel.org
2014-11-11Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()"Yijing Wang
The problem fixed by 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") has been fixed in a simpler way by a previous commit ("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask Bits"). The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505a9 are no longer needed, so revert the commit. default_msi_mask_irq() and default_msix_mask_irq() were added by 0e4ccb1505a9 and are still used by s390, so keep them for now. [bhelgaas: changelog] Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: David Vrabel <david.vrabel@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: xen-devel@lists.xenproject.org
2014-11-11bcma: make it possible to specify a IRQ num in bcma_core_irq()Hauke Mehrtens
This moves bcma_core_irq() to main.c and add a extra parameter with a number so that we can return different irq number for devices with more than one. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-11Merge branch 'for-upstream' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
2014-11-11net: kill netif_copy_real_num_queues()WANG Cong
vlan was the only user of netif_copy_real_num_queues(), but it no longer calls it after commit 4af429d29b341bb1735f04c2fb960178 ("vlan: lockless transmit path"). So we can just remove it. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11PM / Domains: Fix initial default state of the need_restore flagUlf Hansson
The initial state of the device's need_restore flag should'nt depend on the current state of the PM domain. For example it should be perfectly valid to attach an inactive device to a powered PM domain. The pm_genpd_dev_need_restore() API allow us to update the need_restore flag to somewhat cope with such scenarios. Typically that should have been done from drivers/buses ->probe() since it's those that put the requirements on the value of the need_restore flag. Until recently, the Exynos SOCs were the only user of the pm_genpd_dev_need_restore() API, though invoking it from a centralized location while adding devices to their PM domains. Due to that Exynos now have swithed to the generic OF-based PM domain look-up, it's no longer possible to invoke the API from a centralized location. The reason is because devices are now added to their PM domains during the probe sequence. Commit "ARM: exynos: Move to generic PM domain DT bindings" did the switch for Exynos to the generic OF-based PM domain look-up, but it also removed the call to pm_genpd_dev_need_restore(). This caused a regression for some of the Exynos drivers. To handle things more properly in the generic PM domain, let's change the default initial value of the need_restore flag to reflect that the state is unknown. As soon as some of the runtime PM callbacks gets invoked, update the initial value accordingly. Moreover, since the generic PM domain is verifying that all devices are both runtime PM enabled and suspended, using pm_runtime_suspended() while pm_genpd_poweroff() is invoked from the scheduled work, we can be sure of that the PM domain won't be powering off while having active devices. Do note that, the generic PM domain can still only know about active devices which has been activated through invoking its runtime PM resume callback. In other words, buses/drivers using pm_runtime_set_active() during ->probe() will still suffer from a race condition, potentially probing a device without having its PM domain being powered. That issue will have to be solved using a different approach. This a log from the boot regression for Exynos5, which is being fixed in this patch. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 308 at ../drivers/clk/clk.c:851 clk_disable+0x24/0x30() Modules linked in: CPU: 0 PID: 308 Comm: kworker/0:1 Not tainted 3.18.0-rc3-00569-gbd9449f-dirty #10 Workqueue: pm pm_runtime_work [<c0013c64>] (unwind_backtrace) from [<c0010dec>] (show_stack+0x10/0x14) [<c0010dec>] (show_stack) from [<c03ee4cc>] (dump_stack+0x70/0xbc) [<c03ee4cc>] (dump_stack) from [<c0020d34>] (warn_slowpath_common+0x64/0x88) [<c0020d34>] (warn_slowpath_common) from [<c0020d74>] (warn_slowpath_null+0x1c/0x24) [<c0020d74>] (warn_slowpath_null) from [<c03107b0>] (clk_disable+0x24/0x30) [<c03107b0>] (clk_disable) from [<c02cc834>] (gsc_runtime_suspend+0x128/0x160) [<c02cc834>] (gsc_runtime_suspend) from [<c0249024>] (pm_generic_runtime_suspend+0x2c/0x38) [<c0249024>] (pm_generic_runtime_suspend) from [<c024f44c>] (pm_genpd_default_save_state+0x2c/0x8c) [<c024f44c>] (pm_genpd_default_save_state) from [<c024ff2c>] (pm_genpd_poweroff+0x224/0x3ec) [<c024ff2c>] (pm_genpd_poweroff) from [<c02501b4>] (pm_genpd_runtime_suspend+0x9c/0xcc) [<c02501b4>] (pm_genpd_runtime_suspend) from [<c024a4f8>] (__rpm_callback+0x2c/0x60) [<c024a4f8>] (__rpm_callback) from [<c024a54c>] (rpm_callback+0x20/0x74) [<c024a54c>] (rpm_callback) from [<c024a930>] (rpm_suspend+0xd4/0x43c) [<c024a930>] (rpm_suspend) from [<c024bbcc>] (pm_runtime_work+0x80/0x90) [<c024bbcc>] (pm_runtime_work) from [<c0032a9c>] (process_one_work+0x12c/0x314) [<c0032a9c>] (process_one_work) from [<c0032cf4>] (worker_thread+0x3c/0x4b0) [<c0032cf4>] (worker_thread) from [<c003747c>] (kthread+0xcc/0xe8) [<c003747c>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c) ---[ end trace 40cd58bcd6988f12 ]--- Fixes: a4a8c2c4962bb655 (ARM: exynos: Move to generic PM domain DT bindings) Reported-and-tested0by: Sylwester Nawrocki <s.nawrocki@samsung.com> Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Reviewed-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-11net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETEShani Michaeli
When processing received traffic, pass CHECKSUM_COMPLETE status to the stack, with calculated checksum for non TCP/UDP packets (such as GRE or ICMP). Although the stack expects checksum which doesn't include the pseudo header, the HW adds it. To address that, we are subtracting the pseudo header checksum from the checksum value provided by the HW. In the IPv6 case, we also compute/add the IP header checksum which is not added by the HW for such packets. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11ftrace: Add more information to ftrace_bug() outputSteven Rostedt (Red Hat)
With the introduction of the dynamic trampolines, it is useful that if things go wrong that ftrace_bug() produces more information about what the current state is. This can help debug issues that may arise. Ftrace has lots of checks to make sure that the state of the system it touchs is exactly what it expects it to be. When it detects an abnormality it calls ftrace_bug() and disables itself to prevent any further damage. It is crucial that ftrace_bug() produces sufficient information that can be used to debug the situation. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11virtio: Fix comment typo 'CONFIG_S_FAILED'Paul Bolle
Without the VIRTIO_ prefix CONFIG_S_FAILED looks like a Kconfig macro. So use that prefix here too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11module: Replace module_ref with atomic_t refcntMasami Hiramatsu
Replace module_ref per-cpu complex reference counter with an atomic_t simple refcnt. This is for code simplification. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-10tracing: Do not busy wait in buffer spliceRabin Vincent
On a !PREEMPT kernel, attempting to use trace-cmd results in a soft lockup: # trace-cmd record -e raw_syscalls:* -F false NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61] ... Call Trace: [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90 [<ffffffff81092e25>] wait_on_pipe+0x35/0x40 [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0 [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0 [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290 [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff8112415d>] do_splice_to+0x6d/0x90 [<ffffffff81126971>] SyS_splice+0x7c1/0x800 [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8 The problem is this: tracing_buffers_splice_read() calls ring_buffer_wait() to wait for data in the ring buffers. The buffers are not empty so ring_buffer_wait() returns immediately. But tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1, meaning it only wants to read a full page. When the full page is not available, tracing_buffers_splice_read() tries to wait again with ring_buffer_wait(), which again returns immediately, and so on. Fix this by adding a "full" argument to ring_buffer_wait() which will make ring_buffer_wait() wait until the writer has left the reader's page, i.e. until full-page reads will succeed. Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.16+ Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function") Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>