Age | Commit message | Author
2016-07-15 | net: ethernet: tc35815: use phy_ethtool_{get|set}_link_ksettings | Philippe Reynes
There are two generic functions, phy_ethtool_{get|set}_link_ksettings, so we can use them instead of defining the same code in the driver. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | net: ethernet: tc35815: use phydev from struct net_device | Philippe Reynes
The private structure contains a pointer to phydev, but struct net_device already contains such a pointer. So we can remove the pointer phy from the private structure and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | lkdtm: silence warnings about function declarations | Kees Cook
When building under W=1, the lack of lkdtm.h in lkdtm_usercopy.c and lkdtm_rodata.c was discovered. This fixes the issue and consolidates the common header and the pr_fmt macro for simplicity and regularity across each test source file. Signed-off-by: Kees Cook <keescook@chromium.org>
2016-07-15 | lkdtm: hide unused functions | Arnd Bergmann
A conversion of the lkdtm core module added an "#ifdef CONFIG_KPROBES" check, but a number of functions then become unused:

  drivers/misc/lkdtm_core.c:340:16: error: 'lkdtm_debugfs_entry' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:122:12: error: 'jp_generic_ide_ioctl' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:114:12: error: 'jp_scsi_dispatch_cmd' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:106:12: error: 'jp_hrtimer_start' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:97:22: error: 'jp_shrink_inactive_list' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:89:13: error: 'jp_ll_rw_block' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:83:13: error: 'jp_tasklet_action' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:75:20: error: 'jp_handle_irq_event' defined but not used [-Werror=unused-function]
  drivers/misc/lkdtm_core.c:68:21: error: 'jp_do_irq' defined but not used [-Werror=unused-function]

This adds the same #ifdef everywhere. There is probably a better way to do the same thing, but for now this avoids the new warnings. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: c479e3fd8870 ("lkdtm: use struct arrays instead of enums") [kees: moved some code around to better consolidate the #ifdefs] Signed-off-by: Kees Cook <keescook@chromium.org>
2016-07-15 | net: bgmac: Fix infinite loop in bgmac_dma_tx_add() | Florian Fainelli
Nothing decrements the index "i" while we are cleaning up the fragments we could not successfully transmit. Fixes: 9cde94506eacf ("bgmac: implement scatter/gather support") Reported-by: coverity (CID 1352048) Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
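The bug class is easy to reproduce outside the driver: an error path that walks back over already-mapped fragments must actually move its index. The sketch below is not the bgmac code; `struct tx_ring`, `map_frag()`, `unmap_frag()` and the `fail_at` knob are hypothetical stand-ins used only to show an unwind loop that is guaranteed to terminate.

```c
#include <stdbool.h>

#define NR_FRAGS 8

/* Hypothetical per-slot mapping state standing in for DMA mappings. */
struct tx_ring {
    bool mapped[NR_FRAGS];
};

/* Hypothetical mapper: fails on slot fail_at to exercise the error path. */
static bool map_frag(struct tx_ring *ring, int i, int fail_at)
{
    if (i == fail_at)
        return false;
    ring->mapped[i] = true;
    return true;
}

static void unmap_frag(struct tx_ring *ring, int i)
{
    ring->mapped[i] = false;
}

/* Map nr fragments (nr <= NR_FRAGS); on failure undo the ones already
 * mapped. Returns 0 on success, -1 on failure. */
static int tx_add(struct tx_ring *ring, int nr, int fail_at)
{
    int i;

    for (i = 0; i < nr; i++) {
        if (!map_frag(ring, i, fail_at))
            goto err_unmap;
    }
    return 0;

err_unmap:
    /* Slots 0..i-1 are mapped. Decrementing i in the loop condition is
     * what guarantees termination; a cleanup loop that never changes i
     * spins forever. */
    while (i-- > 0)
        unmap_frag(ring, i);
    return -1;
}
```

Putting the decrement in the loop condition makes it impossible to write the loop body without moving the index.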
2016-07-15 | net: fixup for tracepoint napi:napi_poll | Jesper Dangaard Brouer
The recent change to tracepoint napi:napi_poll changed the order of the parameters that perf scripts see; the printk was correct. The problem was that the new parameters (work and budget) were pushed in front of dev_name. The new parameters obviously need to be appended to keep backward compatibility. Fixes: 1db19db7f5ff ("net: tracepoint napi:napi_poll add work and budget") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | f2fs: reset default idle interval value | Chao Yu
The default idle interval is 2 mins, but most of the time, even when the screen is off, there are still operations within that 2-min interval, and GC's sleep time is about 30 to 60 secs, so the GC thread has almost no chance to do garbage collection. Change the default idle interval from 2 mins to 5 secs to fix this. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 | f2fs: use blk_plug in all the possible paths | Jaegeuk Kim
This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging), and adds blk_plug in write paths additionally. The main reason is that blk_start_plug can be used to wake up from low-power mode before submitting further bios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 | f2fs: fix to avoid data update racing between GC and DIO | Chao Yu
Data in a file can be operated on by GC and DIO simultaneously, so we can face the following races.

For the write case:

  Thread A                              Thread B
  - generic_file_direct_write
   - invalidate_inode_pages2_range
   - f2fs_direct_IO
    - do_blockdev_direct_IO
     - do_direct_IO
      - get_more_blocks
                                        - f2fs_gc
                                         - do_garbage_collect
                                          - gc_data_segment
                                           - move_data_page
                                            - do_write_data_page
                                            migrate data block to new block address
     - dio_bio_submit
     update user data to old block address

For the read case:

  Thread A                              Thread B
  - generic_file_direct_write
   - invalidate_inode_pages2_range
   - f2fs_direct_IO
    - do_blockdev_direct_IO
     - do_direct_IO
      - get_more_blocks
                                        - f2fs_balance_fs
                                         - f2fs_gc
                                          - do_garbage_collect
                                           - gc_data_segment
                                            - move_data_page
                                             - do_write_data_page
                                             migrate data block to new block address
                                          - write_checkpoint
                                           - do_checkpoint
                                            - clear_prefree_segments
                                             - f2fs_issue_discard
                                             discard old block address
     - dio_bio_submit
     update user buffer from obsolete block address

To fix this, for a given file, DIO and GC must exclude each other. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
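One way to picture the required exclusion: DIO holds a per-file guard while its block addresses are in use, and GC refuses to migrate that file's blocks while the guard is held. This is a user-space sketch under that assumption, not the f2fs fix; it swaps in a C11 `atomic_flag` for the kernel's sleeping locks, and `struct file_state`, `dio_begin()`, `dio_end()` and `gc_try_migrate()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-file state: one flag gives DIO and GC mutual exclusion. */
struct file_state {
    atomic_flag busy;        /* set while DIO or GC owns this file's blocks */
    unsigned long blkaddr;   /* current on-disk address of a data block */
};

static void file_state_init(struct file_state *f, unsigned long addr)
{
    atomic_flag_clear(&f->busy);
    f->blkaddr = addr;
}

/* DIO path: wait until GC is done with this file, then own it. A kernel
 * implementation would sleep on a lock rather than spin. */
static void dio_begin(struct file_state *f)
{
    while (atomic_flag_test_and_set(&f->busy))
        ;
}

static void dio_end(struct file_state *f)
{
    atomic_flag_clear(&f->busy);
}

/* GC path: migrate the block only if no DIO is in flight; otherwise skip
 * this file for now. Returns true if the block was moved. */
static bool gc_try_migrate(struct file_state *f, unsigned long new_addr)
{
    if (atomic_flag_test_and_set(&f->busy))
        return false;
    f->blkaddr = new_addr;
    atomic_flag_clear(&f->busy);
    return true;
}
```

Letting GC skip a busy file, rather than wait, keeps background collection from stalling behind long direct I/O.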
2016-07-15 | f2fs: add maximum prefree segments | Jaegeuk Kim
On 1TB storage, we would need to admit 22841 prefree segments, which can consume too many segments. This patch caps the prefree segments at 8GB in that case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
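For scale, assuming the usual f2fs segment size of 2 MiB (512 blocks of 4 KiB), 22841 prefree segments is roughly 44.6 GiB, while an 8 GiB cap corresponds to 4096 segments. The clamp below is a hypothetical illustration of that arithmetic, not the patch itself.

```c
#include <stdint.h>

/* Assumption: 2 MiB f2fs segments (512 blocks x 4 KiB). */
#define SEG_SIZE_BYTES    (2ULL << 20)
#define PREFREE_CAP_BYTES (8ULL << 30)   /* 8 GiB, from the commit message */

/* Hypothetical helper: clamp a prefree-segment count to the 8 GiB cap. */
static uint64_t cap_prefree_segs(uint64_t wanted)
{
    const uint64_t max_segs = PREFREE_CAP_BYTES / SEG_SIZE_BYTES;  /* 4096 */

    return wanted < max_segs ? wanted : max_segs;
}
```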
2016-07-15 | f2fs: disable extent_cache for fcollapse/finsert inodes | Jaegeuk Kim
This reduces the elapsed time to do xfstests/generic/017. Before: 458 s After: 390 s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 | f2fs: refactor __exchange_data_block for speed up | Jaegeuk Kim
This reduces the elapsed time to do xfstests/generic/017. Before: 715 s After: 458 s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 | f2fs: fix ERR_PTR returned by bio | Jaegeuk Kim
This is to fix wrong error pointer handling flow reported by Dan. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-16 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input | Linus Torvalds
Pull input fixes from Dmitry Torokhov: "A few last-minute updates for the input subsystem"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ts4800-ts - add missing of_node_put after calling of_parse_phandle
  Input: synaptics-rmi4 - use of_get_child_by_name() to fix refcount
  Revert "Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE"
  Input: xpad - validate USB endpoint count during probe
  Input: add SW_PEN_INSERTED define
2016-07-15 | Input: pixcir_ts - add support for axis inversion / swapping | Hans de Goede
Add support for axis inversion / swapping using the new touchscreen_parse_properties() and touchscreen_set_mt_pos() functionality. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-07-15 | Input: icn8318 - use of_touchscreen helpers for inverting / swapping axes | Hans de Goede
Use the touchscreen_parse_properties() and touchscreen_report_pos() to perform coordinates transformation, instead of DIY code, which results in a nice cleanup. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-07-15 | Input: edt-ft5x06 - add support for inverting / swapping axes | Hans de Goede
Add support for inverting / swapping axes using the new touchscreen_parse_properties() and touchscreen_report_pos() functionality. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-07-15 | Input: of_touchscreen - add support for inverted / swapped axes | Hans de Goede
Extend touchscreen_parse_properties() with support for the touchscreen-inverted-x/y and touchscreen-swapped-x-y properties and add touchscreen_set_mt_pos() and touchscreen_report_pos() helper functions for storing coordinates into an input_mt_pos struct, or directly reporting them, taking these properties into account. This commit also modifies the existing callers of touchscreen_parse_properties() to pass in NULL for the new third argument, keeping the existing behavior. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
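The coordinate transformation these properties describe can be sketched as follows. `struct ts_props` and `ts_apply_props()` are hypothetical; the sketch assumes inversion is applied against the raw axis maxima before any swap, which is one reasonable ordering rather than a statement of what the helpers do.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three DT properties named above. */
struct ts_props {
    unsigned int max_x, max_y;
    bool invert_x;   /* touchscreen-inverted-x */
    bool invert_y;   /* touchscreen-inverted-y */
    bool swap_x_y;   /* touchscreen-swapped-x-y */
};

/* Invert each axis against its raw maximum, then optionally swap axes. */
static void ts_apply_props(const struct ts_props *p,
                           unsigned int *x, unsigned int *y)
{
    if (p->invert_x)
        *x = p->max_x - *x;
    if (p->invert_y)
        *y = p->max_y - *y;
    if (p->swap_x_y) {
        unsigned int t = *x;
        *x = *y;
        *y = t;
    }
}
```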
2016-07-15 | Merge branch 'mlxsw-fixes' | David S. Miller
Jiri Pirko says: ==================== mlxsw: Couple of fixes Couple of fixes for mlxsw driver from Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | mlxsw: spectrum: Prevent invalid ingress buffer mapping | Ido Schimmel
Packets entering the switch are mapped to a Switch Priority (SP) according to their PCP value (untagged frames are mapped to SP 0). The packets are classified to a priority group (PG) buffer in the port's headroom according to their SP. The switch maintains another mapping (SP to IEEE priority), which is used to generate PFC frames for lossless PGs. This mapping is initialized to IEEE = SP % 8. Therefore, when mapping SP 'x' to PG 'y' we create a situation in which an IEEE priority is mapped to two different PGs:

  IEEE 'x' ---> SP 'x'     ---> PG 'y'
  IEEE 'x' ---> SP 'x + 8' ---> PG '0' (default)

which is invalid, as a flow can use only one PG buffer. Fix this by mapping both SP 'x' and 'x + 8' to the same PG buffer. Fixes: 8e8dfe9fdf06 ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
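The invariant the fix restores (every IEEE priority resolves to exactly one PG) can be checked with a small table model. This assumes 16 SPs, as implied by the 'x + 8' aliasing above; `struct prio_map`, `map_prio_to_pg()` and `map_is_valid()` are hypothetical names, not mlxsw functions.

```c
#include <stdbool.h>

#define NUM_SP        16   /* assumption: SPs 0..15, IEEE priority = SP % 8 */
#define NUM_IEEE_PRIO 8

/* Hypothetical SP -> PG table; all SPs start in PG 0 (the default). */
struct prio_map {
    unsigned char sp_to_pg[NUM_SP];
};

/* Map IEEE priority x (0..7) to PG pg by updating both SPs that alias to x. */
static void map_prio_to_pg(struct prio_map *m, unsigned int x, unsigned char pg)
{
    m->sp_to_pg[x] = pg;
    m->sp_to_pg[x + 8] = pg;
}

/* Valid iff every SP lands in the same PG as the other SP sharing its
 * IEEE priority. */
static bool map_is_valid(const struct prio_map *m)
{
    for (unsigned int sp = 0; sp < NUM_SP; sp++)
        if (m->sp_to_pg[sp] != m->sp_to_pg[sp % NUM_IEEE_PRIO])
            return false;
    return true;
}
```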
2016-07-15 | mlxsw: spectrum: Prevent overwrite of DCB capability fields | Ido Schimmel
The number of supported traffic classes that can have ETS and PFC simultaneously enabled is not subject to user configuration, so make sure we always initialize them to the correct values following a set operation. Fixes: 8e8dfe9fdf06 ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support") Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | mlxsw: spectrum: Don't emit errors when PFC is disabled | Ido Schimmel
We can't have PAUSE frames and PFC both enabled on the same port, but the fact that ieee_setpfc() was called doesn't necessarily mean PFC is enabled. Only emit errors when PAUSE frames and PFC are enabled simultaneously. Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | mlxsw: spectrum: Indicate support for autonegotiation | Ido Schimmel
The device supports link autonegotiation, so let the user know about it by indicating support via ethtool ops. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | mlxsw: spectrum: Force link training according to admin state | Ido Schimmel
When setting a new speed we need to disable and enable the port for the changes to take effect. We currently only do that if the operational state of the port is up. However, setting a new speed following link training failure will require us to explicitly set the port down and then up. Instead, disable and enable the port based on its administrative state. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | macvtap: switch to use skb array | Jason Wang
This patch switches to using an skb array instead of sk_receive_queue to avoid spinlock contention. Tests show about a 21% improvement in guest rx pps: Before: 1472731 pkts/s After: 1786289 pkts/s Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | macvtap: avoid hash calculating for single queue | Jason Wang
We decide the rxq by calculating its hash, which is not necessary if we only have one rx queue. So this patch skips this and just returns queue 0. Tests show a 22% improvement in guest rx pps. Before: 1201504 pkts/s After: 1472731 pkts/s Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
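The single-queue shortcut amounts to an early return ahead of the hash. In this sketch `pick_rxq()`, `demo_hash()` and the `hash_calls` counter are hypothetical, and modulo stands in for whatever hash-to-queue reduction the driver actually uses.

```c
#include <stdint.h>

static unsigned int hash_calls;   /* counts hash invocations for the demo */

/* Hypothetical stand-in for a flow hash; real code hashes the flow keys. */
static uint32_t demo_hash(const void *pkt)
{
    hash_calls++;
    return (uint32_t)(uintptr_t)pkt * 2654435761u;
}

/* Pick an rx queue; with a single queue there is nothing to choose, so
 * skip the hash entirely. numqueues must be at least 1. */
static unsigned int pick_rxq(uint32_t (*hash_fn)(const void *),
                             const void *pkt, unsigned int numqueues)
{
    if (numqueues == 1)
        return 0;
    return hash_fn(pkt) % numqueues;
}
```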
2016-07-16 | Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds
Pull workqueue fix from Tejun Heo: "The optimization for setting unbound worker affinity masks collided with recent scheduler changes, triggering warning messages. This late pull request fixes the bug by removing the optimization."

* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix setting affinity of unbound worker threads
2016-07-15 | PM / tools: scripts: AnalyzeSuspend v4.2 | Todd Brandt
Update AnalyzeSuspend to v4.2:
 - kprobe support for function tracing
 - config file support in lieu of command line options
 - advanced callgraph support for function debug
 - dev mode for monitoring common sources of delay, e.g. msleep, udelay
 - many bug fixes and formatting upgrades

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-16 | xfs: fix type confusion in xfs_ioc_swapext | Jann Horn
Without this check, the following XFS_I invocations would return bad pointers when used on non-XFS inodes (perhaps pointers into preceding allocator chunks). This could be used by an attacker to trick xfs_swap_extents into performing locking operations on attacker-chosen structures in kernel memory, potentially leading to code execution in the kernel. (I have not investigated how likely this is to be usable for an attack in practice.) Signed-off-by: Jann Horn <jann@thejh.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
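XFS_I() converts a generic VFS inode into the enclosing XFS inode with container_of()-style pointer arithmetic, which is only meaningful if the inode really is embedded in an XFS inode. The sketch below models that with hypothetical simplified types and guards the conversion by comparing the file-operations pointer; treat that particular check as an assumption about how such a guard can be written, not as the text of the patch.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified VFS objects. */
struct file_ops { const char *fs_name; };
struct inode    { int ino; };
struct file     { const struct file_ops *f_op; struct inode *inode; };

/* An XFS-style private inode that embeds the generic one. */
struct xfs_like_inode {
    long lock_word;          /* fields that precede the embedded inode */
    struct inode vfs_inode;
};

static const struct file_ops xfs_like_fops = { "xfs-like" };

/* Only valid if inode really is embedded in a struct xfs_like_inode;
 * otherwise the result points into whatever memory precedes it. */
static struct xfs_like_inode *XFS_LIKE_I(struct inode *inode)
{
    return container_of(inode, struct xfs_like_inode, vfs_inode);
}

/* The guard: refuse files from other filesystems before converting. */
static struct xfs_like_inode *checked_xfs_inode(struct file *f)
{
    if (f->f_op != &xfs_like_fops)
        return NULL;
    return XFS_LIKE_I(f->inode);
}
```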
2016-07-15 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue | David S. Miller
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2016-07-14 This series contains fixes to i40e and ixgbe. Alex fixes issues found in i40e_rx_checksum(), which was broken: it reported the checksum as valid when it was not. Kiran fixes a bug, found when a cable is abruptly removed, that caused a panic. Another fix sets the VSI broadcast promiscuous mode during the VSI add sequence and prevents adding a MAC filter if the specified MAC address is broadcast. Paolo Abeni fixes a bug by returning the actual work done, capped to weight - 1, since the core doesn't allow returning the full budget when the driver modifies the NAPI status. Guilherme Piccoli fixes an issue where the q_vector initialization routine sets the affinity_mask of a q_vector based on the v_idx value. Because the loop iterates on v_idx, an incrementing value, and builds the cpumask from it, this is a problem on systems with multiple logical CPUs per core (as in SMT scenarios). The fix changes the way the q_vector's affinity_mask is created to resolve the issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | r8152: add MODULE_VERSION | Grant Grundler
ethtool -i provides a driver version that is hard coded. Export the same value via "modinfo". Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | Merge branch 'bpf-event-output-helper-improvements' | David S. Miller
Daniel Borkmann says: ==================== BPF event output helper improvements This set adds improvements to the BPF event output helper to support non-linear data sampling, here specifically, for skb context. For details please see individual patches. The set is based against net-next tree. v1 -> v2: - Integrated and adapted Peter's diff into patch 1, updated the remaining ones accordingly. Thanks Peter! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | bpf: avoid stack copy and use skb ctx for event output | Daniel Borkmann
This work addresses a couple of issues bpf_skb_event_output() helper currently has: i) We need two copies instead of just a single one for the skb data when it should be part of a sample. The data can be non-linear and thus needs to be extracted via bpf_skb_load_bytes() helper first, and then copied once again into the ring buffer slot. ii) Since bpf_skb_load_bytes() currently needs to be used first, the helper needs to see a constant size on the passed stack buffer to make sure BPF verifier can do sanity checks on it during verification time. Thus, just passing skb->len (or any other non-constant value) wouldn't work, but changing bpf_skb_load_bytes() is also not the proper solution, since the two copies are generally still needed. iii) bpf_skb_load_bytes() is just for rather small buffers like headers, since they need to sit on the limited BPF stack anyway. Instead of working around in bpf_skb_load_bytes(), this work improves the bpf_skb_event_output() helper to address all 3 at once. We can make use of the passed in skb context that we have in the helper anyway, and use some of the reserved flag bits as a length argument. The helper will use the new __output_custom() facility from perf side with bpf_skb_copy() as callback helper to walk and extract the data. It will pass the data for setup to bpf_event_output(), which generates and pushes the raw record with an additional frag part. The linear data used in the first frag of the record serves as programmatically defined meta data passed along with the appended sample. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | bpf, perf: split bpf_perf_event_output | Daniel Borkmann
Split the bpf_perf_event_output() helper as a preparation into two parts. The new bpf_perf_event_output() will prepare the raw record itself and test for unknown flags from BPF trace context, where the __bpf_perf_event_output() does the core work. The latter will be reused later on from bpf_event_output() directly. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 | perf, events: add non-linear data support for raw records | Daniel Borkmann
This patch adds support for non-linear data on raw records. It extends raw records to have one or multiple fragments that will be written linearly into the ring slot, where each fragment can optionally have a custom callback handler to walk and extract complex, possibly non-linear data. If a callback handler is provided for a fragment, then the new __output_custom() will be used instead of __output_copy() for the perf_output_sample() part. perf_prepare_sample() does all the size calculation only once, so perf_output_sample() doesn't need to redo the same work anymore, meaning real_size and padding will be cached in the raw record. The raw record becomes 32 bytes in size without holes; to not increase it further and to avoid doing unnecessary recalculations in fast-path, we can reuse next pointer of the last fragment, idea here is borrowed from ZERO_OR_NULL_PTR(), which should keep the perf_output_sample() path for PERF_SAMPLE_RAW minimal. This facility is needed for BPF's event output helper as a first user that will, in a follow-up, add an additional perf_raw_frag to its perf_raw_record in order to be able to more efficiently dump skb context after a linear head meta data related to it. skbs can be non-linear and thus need a custom output function to dump buffers. Currently, the skb data needs to be copied twice; with the help of __output_custom() this work only needs to be done once. Future users could be things like XDP/BPF programs that work on different context though and would thus also have a different callback function. The few users of raw records are adapted to initialize their frag data from the raw record itself, no change in behavior for them. The code is based upon a PoC diff provided by Peter Zijlstra [1]. [1] http://thread.gmane.org/gmane.linux.network/421294 Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
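The fragment chain described above can be sketched in user space: each fragment is either plain linear data copied with memcpy() or carries a callback that knows how to gather non-linear data, and the total size is computed once before anything is written. `struct raw_frag`, `output_frags()` and the two-chunk demo source are hypothetical and do not mirror the real perf structures field for field.

```c
#include <stddef.h>
#include <string.h>

/* Custom copier for non-linear data: writes len bytes from src to dst. */
typedef void (*frag_copy_fn)(void *dst, const void *src, size_t len);

/* Hypothetical fragment: a chain of pieces written back to back. */
struct raw_frag {
    const struct raw_frag *next;   /* NULL terminates the chain */
    frag_copy_fn copy;             /* NULL means plain memcpy() */
    const void *data;
    size_t size;
};

/* Demo non-linear source: two separate chunks, like a head plus a frag. */
struct two_chunks { const char *head; size_t head_len; const char *tail; };

static void copy_two_chunks(void *dst, const void *src, size_t len)
{
    const struct two_chunks *c = src;
    size_t n = len < c->head_len ? len : c->head_len;

    memcpy(dst, c->head, n);
    memcpy((char *)dst + n, c->tail, len - n);
}

/* Size the whole chain once, then write it linearly into the slot.
 * Returns bytes written, or 0 if the slot is too small. */
static size_t output_frags(const struct raw_frag *f, void *slot, size_t slot_len)
{
    size_t total = 0;
    char *dst = slot;

    for (const struct raw_frag *p = f; p; p = p->next)
        total += p->size;
    if (total > slot_len)
        return 0;

    for (const struct raw_frag *p = f; p; p = p->next) {
        if (p->copy)
            p->copy(dst, p->data, p->size);   /* walk non-linear data */
        else
            memcpy(dst, p->data, p->size);    /* linear data */
        dst += p->size;
    }
    return total;
}
```

Here the first fragment plays the role of linear metadata and the second carries data that must be gathered by a callback, so the sample is written in a single pass.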
2016-07-15 | tcp: enable per-socket rate limiting of all 'challenge acks' | Jason Baron
The per-socket rate limit for 'challenge acks' was introduced in the context of limiting ack loops: commit f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock") I think it can be extended to rate limit all 'challenge acks' on a per-socket basis. Since we have the global tcp_challenge_ack_limit, this patch allows tcp_challenge_ack_limit to be set to a large value, effectively relying on the per-socket limit, or to a lower value while still preventing a single connection from consuming the entire challenge ack quota. It further moves in the direction of eliminating the global limit at some point, as Eric Dumazet has suggested. This is a follow-up to: Subject: tcp: make challenge acks less predictable Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Yue Cao <ycao009@ucr.edu> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
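A per-socket limit layered under a global quota might look like the following sketch. The field names, the millisecond clock, the one-second global window and the per-socket minimum gap are all hypothetical parameters chosen for illustration; the kernel's actual accounting differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state; times are in milliseconds. */
struct global_quota { uint32_t window_start; uint32_t sent; uint32_t limit; };
struct sock_limit   { uint32_t last_sent; bool has_sent; uint32_t min_gap; };

/* Allow a challenge ACK only if both the per-socket gap and the global
 * per-second quota permit it. */
static bool challenge_ack_allowed(struct sock_limit *s,
                                  struct global_quota *g, uint32_t now)
{
    if (s->has_sent && now - s->last_sent < s->min_gap)
        return false;                      /* this socket is too chatty */

    if (now - g->window_start >= 1000) {   /* start a new 1 s window */
        g->window_start = now;
        g->sent = 0;
    }
    if (g->sent >= g->limit)
        return false;                      /* global quota exhausted */

    g->sent++;
    s->last_sent = now;
    s->has_sent = true;
    return true;
}
```

With a large global limit the per-socket gap does the real work, which matches the configuration the message describes.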
2016-07-15 | rxrpc: checking for IS_ERR() instead of NULL | Dan Carpenter
The rxrpc_lookup_peer() function returns NULL on error, it never returns error pointers. Fixes: 8496af50eb38 ('rxrpc: Use RCU to access a peer's service connection tree') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
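Why the wrong check is silent: the kernel's error-pointer convention reserves the top MAX_ERRNO addresses for encoded errno values, so `IS_ERR(NULL)` is false and a NULL failure sails straight past an `IS_ERR()` test. The sketch re-creates `IS_ERR()`/`ERR_PTR()` in user space for demonstration; `lookup_peer()` is a hypothetical stand-in for a lookup that, like `rxrpc_lookup_peer()` per the message above, returns NULL on error.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space re-creation of the error-pointer convention: the top
 * MAX_ERRNO addresses encode negative errno values. */
#define MAX_ERRNO 4095

static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

/* Hypothetical lookup that signals failure with NULL, not ERR_PTR(). */
static void *lookup_peer(bool found)
{
    static int peer;

    return found ? &peer : NULL;
}
```

For a NULL-returning function the correct caller test is simply `if (!peer)`.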
2016-07-16 | media: fix airspy usb probe error path | James Patrick-Evans
Fix a memory leak on probe error in the airspy USB device driver. The problem is triggered when more than 64 USB devices register with v4l2 as type VFL_TYPE_SDR or VFL_TYPE_SUBDEV. The memory leak is caused by the probe function of the airspy driver mishandling errors and not freeing the corresponding control structures when an error occurs while registering the device with the v4l2 core. A badusb device can emulate 64 of these devices and then, through continual emulated connect/disconnect of the 65th device, cause the kernel to run out of RAM and crash, resulting in a local DoS vulnerability. Fixes CVE-2016-5400 Signed-off-by: James Patrick-Evans <james@jmp-e.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org # 3.17+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
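The leak follows a common probe-function pattern: resources allocated before a registration step must be released on every failure path after it. This goto-based sketch uses hypothetical names (`struct demo_dev`, `ctrl_alloc()`, `register_video()`, `demo_probe()`) and a counter to show that repeated failed probes no longer accumulate allocations; it is not the airspy code.

```c
#include <stdlib.h>

/* Hypothetical device state; ctrls stands in for the control structures. */
struct demo_dev {
    int *ctrls;
};

static int allocations_live;   /* outstanding allocations, for the demo */

static int *ctrl_alloc(void)
{
    int *p = calloc(4, sizeof(*p));

    if (p)
        allocations_live++;
    return p;
}

static void ctrl_free(int *p)
{
    if (p) {
        free(p);
        allocations_live--;
    }
}

/* Hypothetical registration step that fails when no minors are left. */
static int register_video(int minors_left)
{
    return minors_left > 0 ? 0 : -1;
}

static int demo_probe(struct demo_dev *d, int minors_left)
{
    int ret;

    d->ctrls = ctrl_alloc();
    if (!d->ctrls)
        return -1;

    ret = register_video(minors_left);
    if (ret)
        goto err_free_ctrls;   /* returning directly here would leak */

    return 0;

err_free_ctrls:
    ctrl_free(d->ctrls);
    d->ctrls = NULL;
    return ret;
}
```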
2016-07-16 | EDAC, sb_edac: Fix Knights Landing | Tony Luck
In commit 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver detection") I broke Knights Landing because I failed to notice that it called a wrapper macro "sbridge_get_all_devices_knl" instead of "sbridge_get_all_devices" like all the other types. Now that we include the processor type in the pci_id_table structure we can skip the wrappers and just have the sbridge_get_all_devices() check the type to decide whether to allow duplicate devices and controllers to have registers spread across buses. Fixes: 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver detection") Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15 | ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel | Lv Zheng
This patch enables ACPI_MUTEX_DEBUG for Linux kernel so that the ACPICA lock order issues can be captured by ACPICA itself. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-15 | x86 / hibernate: Use hlt_play_dead() when resuming from hibernation | Rafael J. Wysocki
On Intel hardware, native_play_dead() uses mwait_play_dead() by default and only falls back to the other methods if that fails. That also happens during resume from hibernation, when the restore (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs except for the boot one offline. However, that is problematic, because the address passed to __monitor() in mwait_play_dead() is likely to be written to in the last phase of hibernate image restoration and that causes the "dead" CPU to start executing instructions again. Unfortunately, the page containing the address in that CPU's instruction pointer may not be valid any more at that point. First, that page may have been overwritten with image kernel memory contents already, so the instructions the CPU attempts to execute may simply be invalid. Second, the page tables previously used by that CPU may have been overwritten by image kernel memory contents, so the address in its instruction pointer is impossible to resolve then. A report from Varun Koyyalagunta and investigation carried out by Chen Yu show that the latter sometimes happens in practice. To prevent it from happening, temporarily change the smp_ops.play_dead pointer during resume from hibernation so that it points to a special "play dead" routine which uses hlt_play_dead() and avoids the inadvertent "revivals" of "dead" CPUs this way. A slightly unpleasant consequence of this change is that if the system is hibernated with one or more CPUs offline, it will generally draw more power after resume than it did before hibernation, because the physical state entered by CPUs via hlt_play_dead() is higher-power than the mwait_play_dead() one in the majority of cases. It is possible to work around this, but it is unclear how much of a problem that's going to be in practice, so the workaround will be implemented later if it turns out to be necessary. 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371 Reported-by: Varun Koyyalagunta <cpudebug@centtech.com> Original-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org>
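The temporary hook swap described above follows a save, replace, restore pattern around the CPU-offlining step. In this sketch `struct smp_ops_like`, `demo_smp_ops`, and the two `*_play_dead_demo()` functions are hypothetical stand-ins; real play-dead routines never return, whereas these return a tag so the effect can be observed.

```c
enum dead_method { DEAD_MWAIT, DEAD_HLT };

/* Hypothetical ops table with a replaceable "play dead" hook. */
struct smp_ops_like {
    enum dead_method (*play_dead)(void);
};

static enum dead_method mwait_play_dead_demo(void) { return DEAD_MWAIT; }
static enum dead_method hlt_play_dead_demo(void)   { return DEAD_HLT; }

static struct smp_ops_like demo_smp_ops = { mwait_play_dead_demo };

/* Stand-in for taking non-boot CPUs offline: each "dead" CPU runs the hook. */
static enum dead_method offline_nonboot_cpus(void)
{
    return demo_smp_ops.play_dead();
}

/* Swap in the HLT-based hook only while CPUs are taken offline during
 * resume, then put the original back. Returns the method the CPUs used. */
static enum dead_method resume_offline_cpus(void)
{
    enum dead_method (*saved)(void) = demo_smp_ops.play_dead;
    enum dead_method used;

    demo_smp_ops.play_dead = hlt_play_dead_demo;
    used = offline_nonboot_cpus();
    demo_smp_ops.play_dead = saved;
    return used;
}
```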
2016-07-15 | objtool: Initialize variable to silence old compiler | Arnaldo Carvalho de Melo
gcc version 4.1.2 20080704 (Red Hat 4.1.2-55) barfs with:

  CC /tmp/build/objtool/builtin-check.o
  cc1: warnings being treated as errors
  builtin-check.c: In function 'cmd_check':
  builtin-check.c:667: warning: 'prev_rela' may be used uninitialized in this function
  mv: cannot stat `/tmp/build/objtool/.builtin-check.o.tmp': No such file or directory
  make[1]: *** [/tmp/build/objtool/builtin-check.o] Error 1

Init it to NULL to silence it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qolo31rl2ojlwj1lj9dhemyz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-15 | objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi | Arnaldo Carvalho de Melo
So that it can find asm/bitsperlong.h to get the __BITS_PER_LONG definition. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-pr3pvskh65pey4po7t122z4j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-15 | perf record: Add --tail-synthesize option | Wang Nan
When working with an overwritable ring buffer there's an inconvenience: if perf dumps data long after it starts, non-sample events may be lost, which leaves a subsequent 'perf report' unable to identify process names and the mmap layout. For example:

  # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
        dd if=/dev/zero of=/dev/null

Send SIGUSR2 after dd has run long enough. The resulting perf.data lacks the correct comm and mmap events:

  # perf script -i perf.data.2016061522374354
   perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512
   ^^^^
   Should be 'dd'
             27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
             203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
             b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
       7f47c417edf0 [unknown] ([unknown])
       ^^^^^^^^^^^^
       Fail to unwind

This patch provides a '--tail-synthesize' option, allowing perf to collect system status when finalizing the output file. In the resulting output file, the non-sample events reflect the system status at the time the data is dumped.

After this patch:

  # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
        dd if=/dev/zero of=/dev/null
  # perf script -i perf.data.2016061600544998
   dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
   ^^
   Correct comm
             203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
             203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
             203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
             b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
              d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
              ^^^^^
              Correct unwind

This option doesn't aim to solve this problem completely. If a process terminates before SIGUSR2, we still lose its COMM and MMAP events.
For example, we can't unwind correctly from the final perf.data we get from the previous example, because when perf collects the final output file (when we press C-c), 'dd' has already terminated, so its '/proc/<pid>/mmap' is empty. However, this is a cheaper choice. To completely solve this problem we would need to continuously output non-sample events. To satisfy the requirement of daemonization, we would need to merge them periodically. That is possible but requires much more code and many more cycles. Automatically select --tail-synthesize when --overwrite is provided. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-15 | perf session: Don't warn about out of order event if write_backward is used | Wang Nan
If the write_backward attribute is set, records are written into the kernel ring buffer from end to beginning, but read from beginning to end. To avoid the 'XX out of order events recorded' warning message (the timestamps of records are in reverse order when using write_backward), suppress the warning if write_backward is selected by at least one event. Result:

Before this patch:

  # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                     -e raw_syscalls:sys_enter \
                     dd if=/dev/zero of=/dev/null count=300
  300+0 records in
  300+0 records out
  153600 bytes (154 kB) copied, 0.000601617 s, 255 MB/s
  [ perf record: Woken up 5 times to write data ]
  Warning: 40 out of order events recorded.
  [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

After this patch:

  # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                     -e raw_syscalls:sys_enter \
                     dd if=/dev/zero of=/dev/null count=300
  300+0 records in
  300+0 records out
  153600 bytes (154 kB) copied, 0.000644873 s, 238 MB/s
  [ perf record: Woken up 5 times to write data ]
  [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1468485287-33422-15-git-send-email-wangnan0@huawei.com Signed-off-by: He Kuang <hekuang@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-15perf tools: Enable overwrite settingsWang Nan
This patch allows the following config terms and option:
Globally set events to overwrite:
  # perf record --overwrite ...
Set specific events to overwrite or no-overwrite:
  # perf record --event cycles/overwrite/ ...
  # perf record --event cycles/no-overwrite/ ...
Add the missing config terms and update the config term array size, because the longest string length has changed.
For overwritable events, attr.write_backward is selected automatically, since perf requires such ring buffers to be written backward for reading.
Test result:
  # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
  # perf evlist -v
  syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
Signed-off-by: Wang Nan <wangnan0@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1468485287-33422-14-git-send-email-wangnan0@huawei.com Signed-off-by: He Kuang <hekuang@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
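The following C sketch shows one way the global --overwrite option, the per-event overwrite/no-overwrite terms and the automatic write_backward selection could fit together. It makes several assumptions and is not perf's parser. `struct toy_attr`, `enum toy_overwrite`, `toy_parse_term()` and `toy_apply()` are hypothetical. The rule that a per-event term overrides the global option is also an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the one perf_event_attr bit that matters here. */
struct toy_attr {
    bool write_backward;
};

/* Per-event setting parsed from cycles/<term>/; UNSET means no term given. */
enum toy_overwrite { TOY_OW_UNSET = -1, TOY_OW_OFF = 0, TOY_OW_ON = 1 };

/* Recognise the two terms; return false for anything else. */
static bool toy_parse_term(const char *term, enum toy_overwrite *ow)
{
    assert(term != NULL && ow != NULL);
    if (strcmp(term, "overwrite") == 0) {
        *ow = TOY_OW_ON;
        return true;
    }
    if (strcmp(term, "no-overwrite") == 0) {
        *ow = TOY_OW_OFF;
        return true;
    }
    return false;
}

/* Assumed precedence: a per-event term beats the global --overwrite option.
 * An overwritable event is made to write backward, as the commit describes. */
static void toy_apply(struct toy_attr *attr, bool global_overwrite,
                      enum toy_overwrite per_event)
{
    bool overwrite = (per_event == TOY_OW_UNSET) ? global_overwrite
                                                 : (per_event == TOY_OW_ON);
    attr->write_backward = overwrite;
}
```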
2016-07-15perf evlist: Make {pause,resume} internal helpersWang Nan
There's no user of these two functions outside evlist.c. Remove them from the public namespace. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: He Kuang <hekuang@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1468485287-33422-13-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-15perf record: Read from overwritable ring bufferWang Nan
Drive the evlist->bkw_mmap_state state machine during draining and when SIGUSR2 is received. Read the backward ring buffer in record__mmap_read_all. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1468485287-33422-12-git-send-email-wangnan0@huawei.com Signed-off-by: He Kuang <hekuang@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
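A common way for a record loop to react to SIGUSR2 is to let the signal handler do nothing but set a flag, which the main loop later consumes. The C sketch below assumes that pattern. `toy_sigusr2_pending`, `toy_install_sigusr2()` and `toy_record_loop_iteration()` are hypothetical names, and the pause/read/resume work on the backward buffers is reduced to a counter.

```c
#define _POSIX_C_SOURCE 200809L /* for struct sigaction */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Only this flag is touched from signal context (async-signal-safe). */
static volatile sig_atomic_t toy_sigusr2_pending;

static void toy_sigusr2_handler(int sig)
{
    (void)sig;
    toy_sigusr2_pending = 1;
}

/* Install the handler; returns 0 on success, -1 on error (as sigaction()). */
static int toy_install_sigusr2(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = toy_sigusr2_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* One pass of a hypothetical record loop. If SIGUSR2 arrived, consume the
 * flag and count one snapshot; in perf this is roughly where the backward
 * buffers would be paused, read and resumed. A signal arriving between the
 * test and the reset is folded into this same snapshot. Returns true if a
 * snapshot was taken. */
static bool toy_record_loop_iteration(int *snapshots_taken)
{
    assert(snapshots_taken != NULL);
    if (!toy_sigusr2_pending)
        return false;
    toy_sigusr2_pending = 0;
    (*snapshots_taken)++;
    return true;
}
```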
2016-07-15perf evlist: Setup backward mmap state machineWang Nan
Introduce a bkw_mmap_state state machine to evlist:
     .________________(forbid)_____________.
     |                                     V
 NOTREADY --(0)--> RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
                     ^  ^              |   ^               |
                     |  |__(forbid)____/   |___(forbid)___/|
                     |                                     |
                      \_________________(3)_______________/
 NOTREADY     : Backward ring buffers are not ready
 RUNNING      : Backward ring buffers are recording
 DATA_PENDING : We are required to collect data from backward ring buffers
 EMPTY        : We have collected data from backward ring buffers.
 (0): Set up backward ring buffer
 (1): Pause ring buffers for reading
 (2): Read from ring buffers
 (3): Resume ring buffers for recording
We can't avoid this complexity. We deliberately drop records from the overwritable ring buffer, so the ring buffer itself gives us no way to check what remains in it (by checking the head and old pointers). Therefore, we need the DATA_PENDING and EMPTY states to record what we have done to the ring buffer.
In record__mmap_read_evlist(), drive this state machine from DATA_PENDING to EMPTY.
In perf_evlist__mmap_per_evsel(), drive this state machine from NOTREADY to RUNNING when creating the backward mmap.
Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: He Kuang <hekuang@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1468485287-33422-11-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-15perf evlist: Drop evlist->backwardWang Nan
Now there's no real user of evlist->backward. Drop it. We are going to use evlist->backward_mmap as a container for the backward ring buffers. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: He Kuang <hekuang@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1468485287-33422-10-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>