summaryrefslogtreecommitdiff
path: root/include/trace
AgeCommit message (Collapse)Author
2017-11-26Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: a documentation fix, a Sparse warning fix and a debugging fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Fix task state recording/printout sched/deadline: Don't use dubious signed bitfields sched/deadline: Fix the description of runtime accounting in the documentation
2017-11-24sched/debug: Fix task state recording/printoutThomas Gleixner
The recent conversion of the task state recording to use task_state_index() broke the sched_switch tracepoint task state output. task_state_index() returns surprisingly an index (0-7) which is then printed with __print_flags() applying bitmasks. Not really working and resulting in weird states like 'prev_state=t' instead of 'prev_state=I'. Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a bitmask from the return value of task_state_index() and store it in entry->prev_state, which makes __print_flags() work as expected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@vger.kernel.org Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-18Merge tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Lots of good bugfixes, including: - fix a number of races in the NFSv4+ state code - fix some shutdown crashes in multiple-network-namespace cases - relax our 4.1 session limits; if you've an artificially low limit to the number of 4.1 clients that can mount simultaneously, try upgrading" * tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits) SUNRPC: Improve ordering of transport processing nfsd: deal with revoked delegations appropriately svcrdma: Enqueue after setting XPT_CLOSE in completion handlers nfsd: use nfs->ns.inum as net ID rpc: remove some BUG()s svcrdma: Preserve CB send buffer across retransmits nfds: avoid gettimeofday for nfssvc_boot time fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t lockd: double unregister of inetaddr notifiers nfsd4: catch some false session retries nfsd4: fix cached replies to solo SEQUENCE compounds sunrcp: make function _svc_create_xprt static SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status nfsd: use ARRAY_SIZE nfsd: give out fewer session slots as limit approaches nfsd: increase DRC cache limit nfsd: remove unnecessary nofilehandle checks nfs_common: convert int to bool ...
2017-11-17Merge tag 'trace-v4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from - allow module init functions to be traced - clean up some unused or not used by config events (saves space) - clean up of trace histogram code - add support for preempt and interrupt enabled/disable events - other various clean ups * tag 'trace-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits) tracing, thermal: Hide cpu cooling trace events when not in use tracing, thermal: Hide devfreq trace events when not in use ftrace: Kill FTRACE_OPS_FL_PER_CPU perf/ftrace: Small cleanup perf/ftrace: Fix function trace events perf/ftrace: Revert ("perf/ftrace: Fix double traces of perf on ftrace:function") tracing, dma-buf: Remove unused trace event dma_fence_annotate_wait_on tracing, memcg, vmscan: Hide trace events when not in use tracing/xen: Hide events that are not used when X86_PAE is not defined tracing: mark trace_test_buffer as __maybe_unused printk: Remove superfluous memory barriers from printk_safe ftrace: Clear hashes of stale ips of init memory tracing: Add support for preempt and irq enable/disable events tracing: Prepare to add preempt and irq trace events ftrace/kallsyms: Have /proc/kallsyms show saved mod init functions ftrace: Add freeing algorithm to free ftrace_mod_maps ftrace: Save module init functions kallsyms symbols for tracing ftrace: Allow module init functions to be traced ftrace: Add a ftrace_free_mem() function for modules to use tracing: Reimplement log2 ...
2017-11-17Merge tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - Revalidate "." and ".." correctly on open - Avoid RCU usage in tracepoints - Fix ugly referral attributes - Fix a typo in nomigration mount option - Revert "NFS: Move the flock open mode check into nfs_flock()" Features: - Implement a stronger send queue accounting system for NFS over RDMA - Switch some atomics to the new refcount_t type Other bugfixes and cleanups: - Clean up access mode bits - Remove special-case revalidations in nfs_opendir() - Improve invalidating NFS over RDMA memory for async operations that time out - Handle NFS over RDMA replies with a worqueue - Handle NFS over RDMA sends with a workqueue - Fix up replaying interrupted requests - Remove dead NFS over RDMA definitions - Update NFS over RDMA copyright information - Be more consistent with bool initialization and comparisons - Mark expected switch fall throughs - Various sunrpc tracepoint cleanups - Fix various OPEN races - Fix a typo in nfs_rename() - Use common error handling code in nfs_lock_and_join_request() - Check that some structures are properly cleaned up during net_exit() - Remove net pointer from dprintk()s" * tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (62 commits) NFS: Revert "NFS: Move the flock open mode check into nfs_flock()" NFS: Fix typo in nomigration mount option nfs: Fix ugly referral attributes NFS: super: mark expected switch fall-throughs sunrpc: remove net pointer from messages nfs: remove net pointer from messages sunrpc: exit_net cleanup check added nfs client: exit_net cleanup check added nfs/write: Use common error handling code in nfs_lock_and_join_requests() NFSv4: Replace closed stateids with the "invalid special stateid" NFSv4: nfs_set_open_stateid must not trigger state recovery for closed state NFSv4: Check the open stateid when searching for expired state NFSv4: Clean up nfs4_delegreturn_done NFSv4: cleanup nfs4_close_done NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturn pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close NFSv4: Don't try to CLOSE if the stateid 'other' field has changed NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID. NFS: Fix a typo in nfs_rename() NFSv4: Fix open create exclusive when the server reboots ...
2017-11-17sunrpc: Add rpc_request static trace pointChuck Lever
Display information about the RPC procedure being requested in the trace log. This sometimes critical information cannot always be derived from other RPC trace entries. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17SUNRPC: Fix parsing failure in trace points with XIDsChuck Lever
mount.nf-11159 8.... 905.248380: xprt_transmit: [FAILED TO PARSE] xid=351291440 status=0 addr=192.168.2.5 port=20049 mount.nf-11159 8.... 905.248381: rpc_task_sleep: task:6210@1 flags=0e80 state=0005 status=0 timeout=60000 queue=xprt_pending kworker/-1591 1.... 905.248419: xprt_lookup_rqst: [FAILED TO PARSE] xid=351291440 status=0 addr=192.168.2.5 port=20049 kworker/-1591 1.... 905.248423: xprt_complete_rqst: [FAILED TO PARSE] xid=351291440 status=24 addr=192.168.2.5 port=20049 Byte swapping is not available during trace-cmd report. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17Merge tag 'libnvdimm-for-4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
2017-11-16Merge tag 'f2fs-for-4.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we introduce sysfile-based quota support which is required for Android by default. In addition, we allow that users are able to reserve some blocks in runtime to mitigate performance drops in low free space. Enhancements: - assign proper data segments according to write_hints given by user - issue cache_flush on dirty devices only among multiple devices - exploit cp_error flag and add more faults to enhance fault injection test - conduct more readaheads during f2fs_readdir - add a range for discard commands Bug fixes: - fix zero stat->st_blocks when inline_data is set - drop crypto key and free stale memory pointer while evict_inode is failing - fix some corner cases in free space and segment management - fix wrong last_disk_size This series includes lots of clean-ups and code enhancement in terms of xattr operations, discard/flush command control. In addition, it adds versatile debugfs entries to monitor f2fs status" * tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits) f2fs: deny accessing encryption policy if encryption is off f2fs: inject fault in inc_valid_node_count f2fs: fix to clear FI_NO_PREALLOC f2fs: expose quota information in debugfs f2fs: separate nat entry mem alloc from nat_tree_lock f2fs: validate before set/clear free nat bitmap f2fs: avoid opened loop codes in __add_ino_entry f2fs: apply write hints to select the type of segments for buffered write f2fs: introduce scan_curseg_cache for cleanup f2fs: optimize the way of traversing free_nid_bitmap f2fs: keep scanning until enough free nids are acquired f2fs: trace checkpoint reason in fsync() f2fs: keep isize once block is reserved cross EOF f2fs: avoid race in between GC and block exchange f2fs: save a multiplication for last_nid calculation f2fs: fix summary info corruption f2fs: remove dead code in update_meta_page f2fs: remove unneeded semicolon f2fs: don't bother with inode->i_version f2fs: check curseg space before foreground GC ...
2017-11-16Merge tag 'afs-next-20171113' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS updates from David Howells: "kAFS filesystem driver overhaul. The major points of the overhaul are: (1) Preliminary groundwork is laid for supporting network-namespacing of kAFS. The remainder of the namespacing work requires some way to pass namespace information to submounts triggered by an automount. This requires something like the mount overhaul that's in progress. (2) sockaddr_rxrpc is used in preference to in_addr for holding addresses internally and add support for talking to the YFS VL server. With this, kAFS can do everything over IPv6 as well as IPv4 if it's talking to servers that support it. (3) Callback handling is overhauled to be generally passive rather than active. 'Callbacks' are promises by the server to tell us about data and metadata changes. Callbacks are now checked when we next touch an inode rather than actively going and looking for it where possible. (4) File access permit caching is overhauled to store the caching information per-inode rather than per-directory, shared over subordinate files. Whilst older AFS servers only allow ACLs on directories (shared to the files in that directory), newer AFS servers break that restriction. To improve memory usage and to make it easier to do mass-key removal, permit combinations are cached and shared. (5) Cell database management is overhauled to allow lighter locks to be used and to make cell records autonomous state machines that look after getting their own DNS records and cleaning themselves up, in particular preventing races in acquiring and relinquishing the fscache token for the cell. (6) Volume caching is overhauled. The afs_vlocation record is got rid of to simplify things and the superblock is now keyed on the cell and the numeric volume ID only. The volume record is tied to a superblock and normal superblock management is used to mediate the lifetime of the volume fscache token. (7) File server record caching is overhauled to make server records independent of cells and volumes. A server can be in multiple cells (in such a case, the administrator must make sure that the VL services for all cells correctly reflect the volumes shared between those cells). Server records are now indexed using the UUID of the server rather than the address since a server can have multiple addresses. (8) File server rotation is overhauled to handle VMOVED, VBUSY (and similar), VOFFLINE and VNOVOL indications and to handle rotation both of servers and addresses of those servers. The rotation will also wait and retry if the server says it is busy. (9) Data writeback is overhauled. Each inode no longer stores a list of modified sections tagged with the key that authorised it in favour of noting the modified region of a page in page->private and storing a list of keys that made modifications in the inode. This simplifies things and allows other keys to be used to actually write to the server if a key that made a modification becomes useless. (10) Writable mmap() is implemented. This allows a kernel to be build entirely on AFS. Note that Pre AFS-3.4 servers are no longer supported, though this can be added back if necessary (AFS-3.4 was released in 1998)" * tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (35 commits) afs: Protect call->state changes against signals afs: Trace page dirty/clean afs: Implement shared-writeable mmap afs: Get rid of the afs_writeback record afs: Introduce a file-private data record afs: Use a dynamic port if 7001 is in use afs: Fix directory read/modify race afs: Trace the sending of pages afs: Trace the initiation and completion of client calls afs: Fix documentation on # vs % prefix in mount source specification afs: Fix total-length calculation for multiple-page send afs: Only progress call state at end of Tx phase from rxrpc callback afs: Make use of the YFS service upgrade to fully support IPv6 afs: Overhaul volume and server record caching and fileserver rotation afs: Move server rotation code into its own file afs: Add an address list concept afs: Overhaul cell database management afs: Overhaul permit caching afs: Overhaul the callback handling afs: Rename struct afs_call server member to cm_server ...
2017-11-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-15mm: remove __GFP_COLDMel Gorman
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: remove cold parameter from free_hot_cold_page*Mel Gorman
Most callers users of free_hot_cold_page claim the pages being released are cache hot. The exception is the page reclaim paths where it is likely that enough pages will be freed in the near future that the per-cpu lists are going to be recycled and the cache hotness information is lost. As no one really cares about the hotness of pages being released to the allocator, just ditch the parameter. The APIs are renamed to indicate that it's no longer about hot/cold pages. It should also be less confusing as there are subtle differences between them. __free_pages drops a reference and frees a page when the refcount reaches zero. free_hot_cold_page handled pages whose refcount was already zero which is non-obvious from the name. free_unref_page should be more obvious. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. [mgorman@techsingularity.net: add pages to head, not tail] Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15kmemcheck: remove whats left of NOTRACK flagsLevin, Alexander (Sasha Levin)
Now that kmemcheck is gone, we don't need the NOTRACK flags. Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-14Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block layer updates from Jens Axboe: "This is the main pull request for block storage for 4.15-rc1. Nothing out of the ordinary in here, and no API changes or anything like that. Just various new features for drivers, core changes, etc. In particular, this pull request contains: - A patch series from Bart, closing the whole on blk/scsi-mq queue quescing. - A series from Christoph, building towards hidden gendisks (for multipath) and ability to move bio chains around. - NVMe - Support for native multipath for NVMe (Christoph). - Userspace notifications for AENs (Keith). - Command side-effects support (Keith). - SGL support (Chaitanya Kulkarni) - FC fixes and improvements (James Smart) - Lots of fixes and tweaks (Various) - bcache - New maintainer (Michael Lyle) - Writeback control improvements (Michael) - Various fixes (Coly, Elena, Eric, Liang, et al) - lightnvm updates, mostly centered around the pblk interface (Javier, Hans, and Rakesh). - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph) - Writeback series that fix the much discussed hundreds of millions of sync-all units. This goes all the way, as discussed previously (me). - Fix for missing wakeup on writeback timer adjustments (Yafang Shao). - Fix laptop mode on blk-mq (me). - {mq,name} tupple lookup for IO schedulers, allowing us to have alias names. This means you can use 'deadline' on both !mq and on mq (where it's called mq-deadline). (me). - blktrace race fix, oopsing on sg load (me). - blk-mq optimizations (me). - Obscure waitqueue race fix for kyber (Omar). - NBD fixes (Josef). - Disable writeback throttling by default on bfq, like we do on cfq (Luca Miccio). - Series from Ming that enable us to treat flush requests on blk-mq like any other request. This is a really nice cleanup. - Series from Ming that improves merging on blk-mq with schedulers, getting us closer to flipping the switch on scsi-mq again. - BFQ updates (Paolo). - blk-mq atomic flags memory ordering fixes (Peter Z). - Loop cgroup support (Shaohua). - Lots of minor fixes from lots of different folks, both for core and driver code" * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits) nvme: fix visibility of "uuid" ns attribute blk-mq: fixup some comment typos and lengths ide: ide-atapi: fix compile error with defining macro DEBUG blk-mq: improve tag waiting setup for non-shared tags brd: remove unused brd_mutex blk-mq: only run the hardware queue if IO is pending block: avoid null pointer dereference on null disk fs: guard_bio_eod() needs to consider partitions xtensa/simdisk: fix compile error nvme: expose subsys attribute to sysfs nvme: create 'slaves' and 'holders' entries for hidden controllers block: create 'slaves' and 'holders' entries for hidden gendisks nvme: also expose the namespace identification sysfs files for mpath nodes nvme: implement multipath access to nvme subsystems nvme: track shared namespaces nvme: introduce a nvme_ns_ids structure nvme: track subsystems block, nvme: Introduce blk_mq_req_flags_t block, scsi: Make SCSI quiesce and resume work reliably block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag ...
2017-11-14Merge branch 'for-4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "There are some new user features and the usual load of invisible enhancements or cleanups. New features: - extend mount options to specify zlib compression level, -o compress=zlib:9 - v2 of ioctl "extent to inode mapping", addressing a usecase where we want to retrieve more but inaccurate results and do the postprocessing in userspace, aiding defragmentation or deduplication tools - populate compression heuristics logic, do data sampling and try to guess compressibility by: looking for repeated patterns, counting unique byte values and distribution, calculating Shannon entropy; this will need more benchmarking and possibly fine tuning, but the base should be good enough - enable indexing for btrfs as lower filesystem in overlayfs - speedup page cache readahead during send on large files Internal enhancements: - more sanity checks of b-tree items when reading them from disk - more EINVAL/EUCLEAN fixups, missing BLK_STS_* conversion, other errno or error handling fixes - remove some homegrown IO-related logic, that's been obsoleted by core block layer changes (batching, plug/unplug, own counters) - add ref-verify, optional debugging feature to verify extent reference accounting - simplify code handling outstanding extents, make it more clear where and how the accounting is done - make delalloc reservations per-inode, simplify the code and make the logic more straightforward - extensive cleanup of delayed refs code Notable fixes: - fix send ioctl on 32bit with 64bit kernel" * 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (102 commits) btrfs: Fix bug for misused dev_t when lookup in dev state hash table. Btrfs: heuristic: add Shannon entropy calculation Btrfs: heuristic: add byte core set calculation Btrfs: heuristic: add byte set calculation Btrfs: heuristic: add detection of repeated data patterns Btrfs: heuristic: implement sampling logic Btrfs: heuristic: add bucket and sample counters and other defines Btrfs: compression: separate heuristic/compression workspaces btrfs: move btrfs_truncate_block out of trans handle btrfs: don't call btrfs_start_delalloc_roots in flushoncommit btrfs: track refs in a rb_tree instead of a list btrfs: add a comp_refs() helper btrfs: switch args for comp_*_refs btrfs: make the delalloc block rsv per inode btrfs: add tracepoints for outstanding extents mods Btrfs: rework outstanding_extents btrfs: increase output size for LOGICAL_INO_V2 ioctl btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents btrfs: send: remove unused code ...
2017-11-13Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core updates from Thomas Gleixner: "A rather large update for the interrupt core code and the irq chip drivers: - Add a new bitmap matrix allocator and supporting changes, which is used to replace the x86 vector allocator which comes with separate pull request. This allows to replace the convoluted nested loop allocation function in x86 with a facility which supports the recently added property of managed interrupts proper and allows to switch to a best effort vector reservation scheme, which addresses problems with vector exhaustion. - A large update to the ARM GIC-V3-ITS driver adding support for range selectors. - New interrupt controllers: - Meson and Meson8 GPIO - BCM7271 L2 - Socionext EXIU If you expected that this will stop at some point, I have to disappoint you. There are new ones posted already. Sigh! - STM32 interrupt controller support for new platforms. - A pile of fixes, cleanups and updates to the MIPS GIC driver - The usual small fixes, cleanups and updates all over the place. Most visible one is to move the irq chip drivers Kconfig switches into a separate Kconfig menu" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) genirq: Fix type of shifting literal 1 in __setup_irq() irqdomain: Drop pointless NULL check in virq_debug_show_one genirq/proc: Return proper error code when irq_set_affinity() fails irq/work: Use llist_for_each_entry_safe irqchip: mips-gic: Print warning if inherited GIC base is used irqchip/mips-gic: Add pr_fmt and reword pr_* messages irqchip/stm32: Move the wakeup on interrupt mask irqchip/stm32: Fix initial values irqchip/stm32: Add stm32h7 support dt-bindings/interrupt-controllers: Add compatible string for stm32h7 irqchip/stm32: Add multi-bank management irqchip/stm32: Select GENERIC_IRQ_CHIP irqchip/exiu: Add support for Socionext Synquacer EXIU controller dt-bindings: Add description of Socionext EXIU interrupt controller irqchip/gic-v3-its: Fix VPE activate callback return value irqchip: mips-gic: Make IPI bitmaps static irqchip: mips-gic: Share register writes in gic_set_type() irqchip: mips-gic: Remove gic_vpes variable irqchip: mips-gic: Use num_possible_cpus() to reserve IPIs irqchip: mips-gic: Configure EIC when CPUs come online ...
2017-11-13afs: Protect call->state changes against signalsDavid Howells
Protect call->state changes against the call being prematurely terminated due to a signal. What can happen is that a signal causes afs_wait_for_call_to_complete() to abort an afs_call because it's not yet complete whilst afs_deliver_to_call() is delivering data to that call. If the data delivery causes the state to change, this may overwrite the state of the afs_call, making it not-yet-complete again - but no further notifications will be forthcoming from AF_RXRPC as the rxrpc call has been aborted and completed, so kAFS will just hang in various places waiting for that call or on page bits that need clearing by that call. A tracepoint to monitor call state changes is also provided. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Trace page dirty/cleanDavid Howells
Add a trace event that logs the dirtying and cleaning of pages attached to AFS inodes. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Fix directory read/modify raceDavid Howells
Because parsing of the directory wasn't being done under any sort of lock, the pages holding the directory content can get invalidated whilst the parsing is ongoing. Further, the directory page check function gets called outside of the page lock, so if the page gets cleared or updated, this may return reports of bad magic numbers in the directory page. Also, the directory may change size whilst checking and parsing are ongoing, so more care needs to be taken here. Fix this by: (1) Perform the page check from the page filling function before we set PageUptodate and drop the page lock. (2) Check for the file having shrunk and the page having been abandoned before checking the page contents. (3) Lock the page whilst parsing it for the directory iterator. Whilst we're at it, add a tracepoint to report check failure. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Trace the sending of pagesDavid Howells
Add a pair of tracepoints to log the sending of pages for an FS.StoreData or FS.StoreData64 operation. Tracepoint afs_send_pages notes each set of pages added to the operation. There may be several of these per operation as we get up at most 8 contiguous pages in one go because the bvec we're using is on the stack. Tracepoint afs_sent_pages notes the end of adding data from a whole run of pages to the operation and the completion of the request phase. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Trace the initiation and completion of client callsDavid Howells
Add tracepoints to trace the initiation and completion of client calls within the kafs filesystem. The afs_make_vl_call tracepoint watches calls to the volume location database server. The afs_make_fs_call tracepoint watches calls to the file server. The afs_call_done tracepoint watches for call completion. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-10net: allow per netns sysctl_rmem and sysctl_wmem for protosEric Dumazet
As we want to gradually implement per netns sysctl_rmem and sysctl_wmem on per protocol basis, add two new fields in struct proto, and two new helpers : sk_get_wmem0() and sk_get_rmem0() First user will be TCP. Then UDP and SCTP can be easily converted, while DECNET probably wont get this support. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-06f2fs: trace checkpoint reason in fsync()Chao Yu
This patch slightly changes need_do_checkpoint to return the detail info that indicates why we need do checkpoint, then caller could print it with trace message. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03dax: Implement dax_finish_sync_fault()Jan Kara
Implement a function that filesystems can call to finish handling of synchronous page faults. It takes care of syncing appropriare file range and insertion of page table entry. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-03dax: Inline dax_pmd_insert_mapping() into the callsiteJan Kara
dax_pmd_insert_mapping() has only one callsite and we will need to further fine tune what it does for synchronous faults. Just inline it into the callsite so that we don't have to pass awkward bools around. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-03tcp: add tracepoint trace_tcp_retransmit_synack()Song Liu
This tracepoint can be used to trace synack retransmits. It maintains pointer to struct request_sock. We cannot simply reuse trace_tcp_retransmit_skb() here, because the sk here is the LISTEN socket. The IP addresses and ports should be extracted from struct request_sock. Note that, like many other tracepoints, this patch uses IS_ENABLED in TP_fast_assign macro, which triggers sparse warning like: ./include/trace/events/tcp.h:274:1: error: directive in argument list ./include/trace/events/tcp.h:281:1: error: directive in argument list However, there is no good solution to avoid these warnings. To the best of our knowledge, these warnings are harmless. Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02Merge tag 'v4.14-rc3' into irq/irqchip-4.15Marc Zyngier
Required merge to get mainline irqchip updates. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01btrfs: add tracepoints for outstanding extents modsJosef Bacik
This is handy for tracing problems with modifying the outstanding extents counters. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30btrfs: remove delayed_ref_node from ref_headJosef Bacik
This is just excessive information in the ref_head, and makes the code complicated. It is a relic from when we had the heads and the refs in the same tree, which is no longer the case. With this removal I've cleaned up a bunch of the cruft around this old assumption as well. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-30btrfs: declare TRACE_DEFINE_ENUM for each of show_flush_state enumAnand Jain
So that perf can show the state symbol. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-10-26f2fs: trace f2fs_readdirChao Yu
This patch adds trace for f2fs_readdir. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: trace f2fs_lookupChao Yu
This patch adds trace for f2fs_lookup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-26f2fs: trace f2fs_remove_discardChao Yu
This patch adds tracepoint to trace f2fs_remove_discard. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-25bpf: permit multiple bpf attachments for a single perf eventYonghong Song
This patch enables multiple bpf attachments for a kprobe/uprobe/tracepoint single trace event. Each trace_event keeps a list of attached perf events. When an event happens, all attached bpf programs will be executed based on the order of attachment. A global bpf_event_mutex lock is introduced to protect prog_array attaching and detaching. An alternative will be introduce a mutex lock in every trace_event_call structure, but it takes a lot of extra memory. So a global bpf_event_mutex lock is a good compromise. The bpf prog detachment involves allocation of memory. If the allocation fails, a dummy do-nothing program will replace to-be-detached program in-place. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add tracepoint trace_tcp_set_state()Song Liu
This patch adds tracepoint trace_tcp_set_state. Besides usual fields (s/d ports, IP addresses), old and new state of the socket is also printed with TP_printk, with __print_symbolic(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add tracepoint trace_tcp_destroy_sockSong Liu
This patch adds trace event trace_tcp_destroy_sock. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add tracepoint trace_tcp_receive_resetSong Liu
New tracepoint trace_tcp_receive_reset is added and called from tcp_reset(). This tracepoint is define with a new class tcp_event_sk. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add tracepoint trace_tcp_send_resetSong Liu
New tracepoint trace_tcp_send_reset is added and called from tcp_v4_send_reset(), tcp_v6_send_reset() and tcp_send_active_reset(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: mark trace event arguments sk and skb as constSong Liu
Some functions that we plan to add trace points require const sk and/or skb. So we mark these fields as const in the tracepoint. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24tcp: add trace event class tcp_event_sk_skbSong Liu
Introduce event class tcp_event_sk_skb for tcp tracepoints that have arguments sk and skb. Existing tracepoint trace_tcp_retransmit_skb() falls into this class. This patch rewrites the definition of trace_tcp_retransmit_skb() with tcp_event_sk_skb. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21ipv6: let trace_fib6_table_lookup() dereference the fib tablePaolo Abeni
The perf traces for ipv6 routing code show a relevant cost around trace_fib6_table_lookup(), even if no trace is enabled. This is due to the fib6_table de-referencing currently performed by the caller. Let's the tracing code pay this overhead, passing to the trace helper the table pointer. This gives small but measurable performance improvement under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20tcp: Remove use of inet6_sk and add IPv6 checks to tracepointDavid Ahern
386fd5da401d ("tcp: Check daddr_cache before use in tracepoint") was the second version of the tracepoint fixup patch. This patch is the delta between v2 and v3. Specifically, remove the use of inet6_sk and check sk_family as requested by Eric and add IS_ENABLED(CONFIG_IPV6) around the use of sk_v6_rcv_saddr and sk_v6_daddr as done in sock_common (noted by Cong). Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18tcp: Check daddr_cache before use in tracepointDavid Ahern
Running perf in one window to capture tcp_retransmit_skb tracepoint: $ perf record -e tcp:tcp_retransmit_skb -a And causing a retransmission on an active TCP session (e.g., dropping packets in the receiver, changing MTU on the interface to 500 and back to 1500) triggers a panic: [ 58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145 [ 58.546770] PGD 0 P4D 0 [ 58.547472] Oops: 0000 [#1] SMP [ 58.548328] Modules linked in: vrf [ 58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26 [ 58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000 [ 58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145 [ 58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282 [ 58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098 [ 58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68 [ 58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a [ 58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0 [ 58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00 [ 58.563845] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 [ 58.564873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0 [ 58.566538] Call Trace: [ 58.566865] <IRQ> [ 58.567140] __tcp_retransmit_skb+0x4ab/0x4c6 [ 58.567704] ? tcp_set_ca_state+0x22/0x3f [ 58.568231] tcp_retransmit_skb+0x14/0xa3 [ 58.568754] tcp_retransmit_timer+0x472/0x5e3 [ 58.569324] ? tcp_write_timer_handler+0x1e9/0x1e9 [ 58.569946] tcp_write_timer_handler+0x95/0x1e9 [ 58.570548] tcp_write_timer+0x2a/0x58 Check that daddr_cache is non-NULL before de-referencing. Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18tcp: Use pI6c in tcp tracepointDavid Ahern
The compact form for IPv6 addresses is more user friendly than the full version. For example: compact: 2001:db8:1::1 full: 2001:0db8:0001:0000:0000:0000:0000:0004i Update the tcp tracepoint to show the compact form. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: cpumap add tracepointsJesper Dangaard Brouer
This adds two tracepoint to the cpumap. One for the enqueue side trace_xdp_cpumap_enqueue() and one for the kthread dequeue side trace_xdp_cpumap_kthread(). To mitigate the tracepoint overhead, these are invoked during the enqueue/dequeue bulking phases, thus amortizing the cost. The obvious use-cases are for debugging and monitoring. The non-intuitive use-case is using these as a feedback loop to know the system load. One can imagine auto-scaling by reducing, adding or activating more worker CPUs on demand. V4: tracepoint remove time_limit info, instead add sched info V8: intro struct bpf_cpu_map_entry members cpu+map_id in this patch Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>