linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-03-19	bpf: sk_msg program helper bpf_sk_msg_pull_data	John Fastabend
	Currently, if a bpf sk msg program is run the program can only parse data that the (start,end) pointers already consumed. For sendmsg hooks this is likely the first scatterlist element. For sendpage this will be the range (0,0) because the data is shared with userspace and by default we want to avoid allowing userspace to modify data while (or after) BPF verdict is being decided. To support pulling in additional bytes for parsing use a new helper bpf_sk_msg_pull(start, end, flags) which works similar to cls tc logic. This helper will attempt to point the data start pointer at 'start' bytes offest into msg and data end pointer at 'end' bytes offset into message. After basic sanity checks to ensure 'start' <= 'end' and 'end' <= msg_length there are a few cases we need to handle. First the sendmsg hook has already copied the data from userspace and has exclusive access to it. Therefor, it is not necessesary to copy the data. However, it may be required. After finding the scatterlist element with 'start' offset byte in it there are two cases. One the range (start,end) is entirely contained in the sg element and is already linear. All that is needed is to update the data pointers, no allocate/copy is needed. The other case is (start, end) crosses sg element boundaries. In this case we allocate a block of size 'end - start' and copy the data to linearize it. Next sendpage hook has not copied any data in initial state so that data pointers are (0,0). In this case we handle it similar to the above sendmsg case except the allocation/copy must always happen. Then when sending the data we have possibly three memory regions that need to be sent, (0, start - 1), (start, end), and (end + 1, msg_length). This is required to ensure any writes by the BPF program are correctly transmitted. Lastly this operation will invalidate any previous data checks so BPF programs will have to revalidate pointers after making this BPF call. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	bpf: sockmap, add msg_cork_bytes() helper	John Fastabend
	In the case where we need a specific number of bytes before a verdict can be assigned, even if the data spans multiple sendmsg or sendfile calls. The BPF program may use msg_cork_bytes(). The extreme case is a user can call sendmsg repeatedly with 1-byte msg segments. Obviously, this is bad for performance but is still valid. If the BPF program needs N bytes to validate a header it can use msg_cork_bytes to specify N bytes and the BPF program will not be called again until N bytes have been accumulated. The infrastructure will attempt to coalesce data if possible so in many cases (most my use cases at least) the data will be in a single scatterlist element with data pointers pointing to start/end of the element. However, this is dependent on available memory so is not guaranteed. So BPF programs must validate data pointer ranges, but this is the case anyways to convince the verifier the accesses are valid. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	bpf: sockmap, add bpf_msg_apply_bytes() helper	John Fastabend
	A single sendmsg or sendfile system call can contain multiple logical messages that a BPF program may want to read and apply a verdict. But, without an apply_bytes helper any verdict on the data applies to all bytes in the sendmsg/sendfile. Alternatively, a BPF program may only care to read the first N bytes of a msg. If the payload is large say MB or even GB setting up and calling the BPF program repeatedly for all bytes, even though the verdict is already known, creates unnecessary overhead. To allow BPF programs to control how many bytes a given verdict applies to we implement a bpf_msg_apply_bytes() helper. When called from within a BPF program this sets a counter, internal to the BPF infrastructure, that applies the last verdict to the next N bytes. If the N is smaller than the current data being processed from a sendmsg/sendfile call, the first N bytes will be sent and the BPF program will be re-run with start_data pointing to the N+1 byte. If N is larger than the current data being processed the BPF verdict will be applied to multiple sendmsg/sendfile calls until N bytes are consumed. Note1 if a socket closes with apply_bytes counter non-zero this is not a problem because data is not being buffered for N bytes and is sent as its received. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data	John Fastabend
	This implements a BPF ULP layer to allow policy enforcement and monitoring at the socket layer. In order to support this a new program type BPF_PROG_TYPE_SK_MSG is used to run the policy at the sendmsg/sendpage hook. To attach the policy to sockets a sockmap is used with a new program attach type BPF_SK_MSG_VERDICT. Similar to previous sockmap usages when a sock is added to a sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT program type attached then the BPF ULP layer is created on the socket and the attached BPF_PROG_TYPE_SK_MSG program is run for every msg in sendmsg case and page/offset in sendpage case. BPF_PROG_TYPE_SK_MSG Semantics/API: BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and SK_DROP. Returning SK_DROP free's the copied data in the sendmsg case and in the sendpage case leaves the data untouched. Both cases return -EACESS to the user. Returning SK_PASS will allow the msg to be sent. In the sendmsg case data is copied into kernel space buffers before running the BPF program. The kernel space buffers are stored in a scatterlist object where each element is a kernel memory buffer. Some effort is made to coalesce data from the sendmsg call here. For example a sendmsg call with many one byte iov entries will likely be pushed into a single entry. The BPF program is run with data pointers (start/end) pointing to the first sg element. In the sendpage case data is not copied. We opt not to copy the data by default here, because the BPF infrastructure does not know what bytes will be needed nor when they will be needed. So copying all bytes may be wasteful. Because of this the initial start/end data pointers are (0,0). Meaning no data can be read or written. This avoids reading data that may be modified by the user. A new helper is added later in this series if reading and writing the data is needed. The helper call will do a copy by default so that the page is exclusively owned by the BPF call. The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg in the sendmsg() case and the entire page/offset in the sendpage case. This avoids ambiguity on how to handle mixed return codes in the sendmsg case. Again a helper is added later in the series if a verdict needs to apply to multiple system calls and/or only a subpart of the currently being processed message. The helper msg_redirect_map() can be used to select the socket to send the data on. This is used similar to existing redirect use cases. This allows policy to redirect msgs. Pseudo code simple example: The basic logic to attach a program to a socket is as follows, // load the programs bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog); // lookup the sockmap bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map"); // get fd for sockmap map_fd_msg = bpf_map__fd(bpf_map_msg); // attach program to sockmap bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0); Adding sockets to the map is done in the normal way, // Add a socket 'fd' to sockmap at location 'i' bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY); After the above any socket attached to "my_sock_map", in this case 'fd', will run the BPF msg verdict program (msg_prog) on every sendmsg and sendpage system call. For a complete example see BPF selftests or sockmap samples. Implementation notes: It seemed the simplest, to me at least, to use a refcnt to ensure psock is not lost across the sendmsg copy into the sg, the bpf program running on the data in sg_data, and the final pass to the TCP stack. Some performance testing may show a better method to do this and avoid the refcnt cost, but for now use the simpler method. Another item that will come after basic support is in place is supporting MSG_MORE flag. At the moment we call sendpages even if the MSG_MORE flag is set. An enhancement would be to collect the pages into a larger scatterlist and pass down the stack. Notice that bpf_tcp_sendmsg() could support this with some additional state saved across sendmsg calls. I built the code to support this without having to do refactoring work. Other features TBD include ZEROCOPY and the TCP_RECV_QUEUE/TCP_NO_QUEUE support. This will follow initial series shortly. Future work could improve size limits on the scatterlist rings used here. Currently, we use MAX_SKB_FRAGS simply because this was being used already in the TLS case. Future work could extend the kernel sk APIs to tune this depending on workload. This is a trade-off between memory usage and throughput performance. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	net: generalize sk_alloc_sg to work with scatterlist rings	John Fastabend
	The current implementation of sk_alloc_sg expects scatterlist to always start at entry 0 and complete at entry MAX_SKB_FRAGS. Future patches will want to support starting at arbitrary offset into scatterlist so add an additional sg_start parameters and then default to the current values in TLS code paths. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG	John Fastabend
	When calling do_tcp_sendpages() from in kernel and we know the data has no references from user side we can omit SKBTX_SHARED_FRAG flag. This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used to omit setting SKBTX_SHARED_FRAG. The flag is not exposed to userspace because the sendpage call from the splice logic masks out all bits except MSG_MORE. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	sockmap: convert refcnt to an atomic refcnt	John Fastabend
	The sockmap refcnt up until now has been wrapped in the sk_callback_lock(). So its not actually needed any locking of its own. The counter itself tracks the lifetime of the psock object. Sockets in a sockmap have a lifetime that is independent of the map they are part of. This is possible because a single socket may be in multiple maps. When this happens we can only release the psock data associated with the socket when the refcnt reaches zero. There are three possible delete sock reference decrement paths first through the normal sockmap process, the user deletes the socket from the map. Second the map is removed and all sockets in the map are removed, delete path is similar to case 1. The third case is an asyncronous socket event such as a closing the socket. The last case handles removing sockets that are no longer available. For completeness, although inc does not pose any problems in this patch series, the inc case only happens when a psock is added to a map. Next we plan to add another socket prog type to handle policy and monitoring on the TX path. When we do this however we will need to keep a reference count open across the sendmsg/sendpage call and holding the sk_callback_lock() here (on every send) seems less than ideal, also it may sleep in cases where we hit memory pressure. Instead of dealing with these issues in some clever way simply make the reference counting a refcnt_t type and do proper atomic ops. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	sock: make static tls function alloc_sg generic sock helper	John Fastabend
	The TLS ULP module builds scatterlists from a sock using page_frag_refill(). This is going to be useful for other ULPs so move it into sock file for more general use. In the process remove useless goto at end of while loop. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19	RDMA/verbs: Remove restrack entry from XRCD structure	Leon Romanovsky
	XRCD object is not implemented in the restrack, so lets remove it. Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19	RDMA/ucma: Fix use-after-free access in ucma_close	Leon Romanovsky
	The error in ucma_create_id() left ctx in the list of contexts belong to ucma file descriptor. The attempt to close this file descriptor causes to use-after-free accesses while iterating over such list. Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Reported-by: <syzbot+dcfd344365a56fbebd0f@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19	power: reset: at91-reset: Switch from the pr_() to the dev_() logging ↵	Ladislav Michl
	functions Use dev_info() instead of pr_info(). Signed-off-by: Ladislav Michl <ladis@linux-mips.org> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
2018-03-19	power: reset: at91-poweroff: Remove redundant dev_err call in ↵	Ladislav Michl
	at91_poweroff_probe() There is an error message within devm_ioremap_resource already, so remove the dev_err call to avoid redundancy. Signed-off-by: Ladislav Michl <ladis@linux-mips.org> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
2018-03-19	power: reset: at91-poweroff: Switch from the pr_() to the dev_() logging ↵	Ladislav Michl
	functions Use dev_info() instead of pr_info(). Signed-off-by: Ladislav Michl <ladis@linux-mips.org> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
2018-03-19	Merge tag 'perf-core-for-mingo-4.17-20180319' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: - Fixes for problems experienced with new GCC 8 warnings, that treated as errors, broke the build, related to snprintf and casting issues. (Arnaldo Carvalho de Melo, Jiri Olsa, Josh Poinboeuf) - Fix build of new breakpoint 'perf test' entry with clang < 6, noticed on fedora 25, 26 and 27 (Arnaldo Carvalho de Melo) - Workaround problem with symbol resolution in 'perf annotate', using the symbol name already present in the objdump output (Arnaldo Carvalho de Melo) - Document 'perf top --ignore-vmlinux' (Arnaldo Carvalho de Melo) - Fix out of bounds access on array fd when cnt is 100 in one of the 'perf test' entries, detected using 'cpptest' (Colin Ian King) - Add support for the forced leader feature, i.e. 'perf report --group' for a group of events not really grouped when scheduled (without using {} to enclose the list of events in the command line) in pipe mode, e.g.: $ perf record -e cycles,instructions -o - kill \| perf report --group -i - - Use right type to access array elements in 'perf probe' (Masami Hiramatsu) - Update POWER9 vendor events (those described in JSON format) (Sukadev Bhattiprolu) - Discard head in overwrite_rb_find_range() (Yisheng Xie) - Avoid setting 'quiet' to 'true' unnecessarily (Yisheng Xie) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-19	Merge tag 'v4.16-rc6' into perf/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-19	Merge branch '40GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-03-19 This series contains updates to i40e and i40evf only. Alex fixes a potential deadlock in the configure_clsflower function in i40evf, where we exit with the "IN_CRITICAL_TASK" bit set while notifying the PF of flower filters. Jan fixed an issue where it was possible to set a mode that is not allowed which resulted in link being down, so fixed the parity between i40e_set_link_ksettings() and i40e_get_link_ksettings(). Patryk fixes a bug where a backplane device was allowing the setting of link settings, which is not allowed. Shiraz fixes a crash when entering S3 because the client interface was freeing the MSIx vectors while they are still in use. Jake fixes up a function header comment to document a newly added parameter. Also cleaned up flags that were never used. Doug fixes the incorrect return type for i40e_aq_add_cloud_filters(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19	drm/amd/pp: Remove unneeded void * casts for Vega10	Rex Zhu
	Removes unneeded void * casts for the following pointers: hwmgr->backend hwmgr->smu_backend Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	drm/amd/pp: Delete get_xclk function in powerplay (v2)	Rex Zhu
	use asic's callback function get_xclk in amdgpu v2: squash in removal of leftover debug info (drm/amd/pp: Delete debug info in smu7_hwmgr.c) (Rex) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	drm/amd/pp: Clean up header file for Vega10	Rex Zhu
	Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	drm/amd/pp: Move functions to smu backend table for vega10	Rex Zhu
	Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	drm/amd/pp: Mark bunches of functins in vega10_smumgr.c static	Rex Zhu
	Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	drm/amd/pp: Remove dead functions in vega10_smumgr.c	Rex Zhu
	use smc_table_manager function to copy/save tables to/from smu. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	drm/amdgpu: Delete dead code when early init	Rex Zhu
	Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	fs/aio: Use rcu_work instead of explicit rcu and work item	Tejun Heo
	Workqueue now has rcu_work. Use it instead of open-coding rcu -> work item bouncing. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-19	cgroup: Use rcu_work instead of explicit rcu and work item	Tejun Heo
	Workqueue now has rcu_work. Use it instead of open-coding rcu -> work item bouncing. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-19	RCU, workqueue: Implement rcu_work	Tejun Heo
	There are cases where RCU callback needs to be bounced to a sleepable context. This is currently done by the RCU callback queueing a work item, which can be cumbersome to write and confusing to read. This patch introduces rcu_work, a workqueue work variant which gets executed after a RCU grace period, and converts the open coded bouncing in fs/aio and kernel/cgroup. v3: Dropped queue_rcu_work_on(). Documented rcu grace period behavior after queue_rcu_work(). v2: Use rcu_barrier() instead of synchronize_rcu() to wait for completion of previously queued rcu callback as per Paul. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-19	percpu_ref: Update doc to dissuade users from depending on internal RCU ↵	Tejun Heo
	grace periods percpu_ref internally uses sched-RCU to implement the percpu -> atomic mode switching and the documentation suggested that this could be depended upon. This doesn't seem like a good idea. * percpu_ref uses sched-RCU which has different grace periods regular RCU. Users may combine percpu_ref with regular RCU usage and incorrectly believe that regular RCU grace periods are performed by percpu_ref. This can lead to, for example, use-after-free due to premature freeing. * percpu_ref has a grace period when switching from percpu to atomic mode. It doesn't have one between the last put and release. This distinction is subtle and can lead to surprising bugs. * percpu_ref allows starting in and switching to atomic mode manually for debugging and other purposes. This means that there may not be any grace periods from kill to release. This patch makes it clear that the grace periods are percpu_ref's internal implementation detail and can't be depended upon by the users. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-19	i40e: Fix the polling mechanism of GLGEN_RSTAT.DEVSTATE	Paweł Jabłoński
	This fixes the polling mechanism of GLGEN_RSTAT.DEVSTATE in the PF Reset path when Global Reset is in progress. While the driver is polling for the end of the PF Reset and the Global Reset is triggered, abandon the PF Reset path and prepare for the upcoming Global Reset. Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	i40evf: remove flags that are never used	Jacob Keller
	These flags were defined, but there is no use within the driver code, so we don't need to keep them. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	i40e: Prevent setting link speed on I40E_DEV_ID_25G_B	Patryk Małek
	Setting link settings on backplane devices shouldn't be allowed. This patch adds one more device id to the list which we check that against. Signed-off-by: Patryk Małek <patryk.malek@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	i40e: Fix incorrect return types	Doug Dziggel
	Fix return types from i40e_status to enum i40e_status_code. Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	perf tests bp_account: Fix build with clang-6	Arnaldo Carvalho de Melo
	To shut up this compiler warning: CC /tmp/build/perf/tests/bp_account.o CC /tmp/build/perf/tests/task-exit.o CC /tmp/build/perf/tests/sw-clock.o tests/bp_account.c:106:20: error: pointer type mismatch ('int ()(void)' and 'void ') [-Werror,-Wpointer-type-mismatch] void addr = is_x ? test_function : (void ) &the_var; ^ ~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~ 1 error generated. Noticed with clang 6 on fedora rawhide. [perfbuilder@44490f0e7241 perf]$ clang -v clang version 6.0.0 (tags/RELEASE_600/final) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/8 Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/8 Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/8 Candidate multilib: .;@m64 Candidate multilib: 32;@m32 Selected multilib: .;@m64 [perfbuilder@44490f0e7241 perf]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 032db28e5fa3 ("perf tests: Add breakpoint accounting/modify test") Link: https://lkml.kernel.org/n/tip-a3jnkzh4xam0l954de5tn66d@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-19	objtool, perf: Fix GCC 8 -Wrestrict error	Josh Poimboeuf
	Starting with recent GCC 8 builds, objtool and perf fail to build with the following error: ../str_error_r.c: In function ‘str_error_r’: ../str_error_r.c:25:3: error: passing argument 1 to restrict-qualified parameter aliases with argument 5 [-Werror=restrict] snprintf(buf, buflen, "INTERNAL ERROR: strerror_r(%d, %p, %zd)=%d", errnum, buf, buflen, err); The code seems harmless, but there's probably no benefit in printing the 'buf' pointer in this situation anyway, so just remove it to make GCC happy. Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20180316031154.juk2uncs7baffctp@treble Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-19	perf probe: Use right type to access array elements	Masami Hiramatsu
	Current 'perf probe' converts the type of array-elements incorrectly. It always converts the types as a pointer of array. This passes the "array" type DIE to the type converter so that it can get correct "element of array" type DIE from it. E.g. ==== $ cat hello.c #include <stdio.h> void foo(int a[]) { printf("%d\n", a[1]); } void main() { int a[3] = {4, 5, 6}; printf("%d\n", a[0]); foo(a); } $ gcc -g hello.c -o hello $ perf probe -x ./hello -D "foo a[1]" ==== Without this fix, above outputs ==== p:probe_hello/foo /tmp/hello:0x4d3 a=+4(-8(%bp)):u64 ==== The "u64" means "int *", but a[1] is "int". With this, ==== p:probe_hello/foo /tmp/hello:0x4d3 a=+4(-8(%bp)):s32 ==== So, "int" correctly converted to "s32" Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: linux-kselftest@vger.kernel.org Cc: linux-trace-users@vger.kernel.org Fixes: b2a3c12b7442 ("perf probe: Support tracing an entry of array") Link: http://lkml.kernel.org/r/152129114502.31874.2474068470011496356.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-19	perf annotate: Use ops->target.name when available for unresolved call targets	Arnaldo Carvalho de Melo
	There is a bug where when using 'perf annotate timerqueue_add' the target for its only routine called with the 'callq' instruction, 'rb_insert_color', doesn't get resolved from its address when parsing that 'callq' instruction. That symbol resolution works when using 'perf report --tui' and then doing annotation for 'timerqueue_add' from there, the vmlinux dso->symbols rb_tree somehow gets in a state that we can't find that address, that is a bug that has to be further investigated. But since the objdump output has the function name, i.e. the raw objdump disassembled line looks like: So, before: # perf annotate timerqueue_add │ mov %rbx,%rdi │ mov %rbx,(%rdx) │ → callq *ffffffff8184dc80 │ mov 0x8(%rbp),%rdx │ test %rdx,%rdx │ ↓ je 67 # perf report │ mov %rbx,%rdi │ mov %rbx,(%rdx) │ → callq rb_insert_color │ mov 0x8(%rbp),%rdx │ test %rdx,%rdx │ ↓ je 67 And after both look the same: # perf annotate timerqueue_add │ mov %rbx,%rdi │ mov %rbx,(%rdx) │ → callq rb_insert_color │ mov 0x8(%rbp),%rdx │ test %rdx,%rdx │ ↓ je 67 From 'perf report' one can annotate and navigate to that 'rb_insert_color' function, but not directly from 'perf annotate timerqueue_add', that remains to be investigated and fixed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-nkktz6355rhqtq7o8atr8f8r@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-19	perf top: Document --ignore-vmlinux	Arnaldo Carvalho de Melo
	We've had this since 2013, document it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Willy Tarreau <w@1wt.eu> Fixes: fc2be6968e99 ("perf symbols: Add new option --ignore-vmlinux for perf top") Link: https://lkml.kernel.org/n/tip-0jwfueooddwfsw9r603belxi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-19	i40e: add doxygen comment for new mode parameter	Jacob Keller
	A recent patch updated the signature for i40e_aq_set_switch_config() to add a new parameter 'mode'. It forgot to document the parameter in the doxygen function header comment. Add the parameter to the function description now. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	i40e: Close client on suspend and restore client MSIx on resume	Shiraz Saleem
	During suspend client MSIx vectors are freed while they are still in use causing a crash on entering S3. Fix this calling client close before freeing up its MSIx vectors. Also update the client MSIx vectors on resume before client open is called. Fixes commit b980c0634fe5 ("i40e: shutdown all IRQs and disable MSI-X when suspended") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	i40e: Prevent setting link speed on KX_X722	Patryk Małek
	Setting link settings on backplane devices shouldn't be allowed. This patch adds one more device id to the list which we check that against. Signed-off-by: Patryk Małek <patryk.malek@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	i40e: Properly check allowed advertisement capabilities	Jan Sokolowski
	The i40e_set_link_ksettings and i40e_get_link_ksettings use different codepaths to check available and supported advertisement modes. This creates scenarios where it's possible to set a mode that's not allowed, resulting in a link down. Fix setting advertisement in i40e_set_link_ksettings by calling i40e_get_link_ksettings to check what modes are allowed. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	perf tools: Fix python extension build for gcc 8	Jiri Olsa
	The gcc 8 compiler won't compile the python extension code with the following errors (one example): python.c:830:15: error: cast between incompatible function types from \ ‘PyObject * ()(struct pyrf_evsel , PyObject , PyObject )’ \ uct _object * ()(struct pyrf_evsel , struct _object , struct _object )’} to \ ‘PyObject * ()(PyObject , PyObject )’ {aka ‘struct _object ()(struct _objeuct \ _object )’} [-Werror=cast-function-type] .ml_meth = (PyCFunction)pyrf_evsel__open, The problem with the PyMethodDef::ml_meth callback is that its type is determined based on the PyMethodDef::ml_flags value, which we set as METH_VARARGS \| METH_KEYWORDS. That indicates that the callback is expecting an extra PyObject* arg, and is actually PyCFunctionWithKeywords type, but the base PyMethodDef::ml_meth type stays PyCFunction. Previous gccs did not find this, gcc8 now does. Fixing this by silencing this warning for python.c build. Commiter notes: Do not do that for CC=clang, as it breaks the build in some clang versions, like the ones in fedora up to fedora27: fedora:25:error: unknown warning option '-Wno-cast-function-type'; did you mean '-Wno-bad-function-cast'? [-Werror,-Wunknown-warning-option] fedora:26:error: unknown warning option '-Wno-cast-function-type'; did you mean '-Wno-bad-function-cast'? [-Werror,-Wunknown-warning-option] fedora:27:error: unknown warning option '-Wno-cast-function-type'; did you mean '-Wno-bad-function-cast'? [-Werror,-Wunknown-warning-option] # those have: clang version 3.9.1 (tags/RELEASE_391/final) The one in rawhide accepts that: clang version 6.0.0 (tags/RELEASE_600/final) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Link: http://lkml.kernel.org/r/20180319082902.4518-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-19	mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()	Kirill Tkhai
	In case of memory deficit and low percpu memory pages, pcpu_balance_workfn() takes pcpu_alloc_mutex for a long time (as it makes memory allocations itself and waits for memory reclaim). If tasks doing pcpu_alloc() are choosen by OOM killer, they can't exit, because they are waiting for the mutex. The patch makes pcpu_alloc() to care about killing signal and use mutex_lock_killable(), when it's allowed by GFP flags. This guarantees, a task does not miss SIGKILL from OOM killer. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-19	percpu: include linux/sched.h for cond_resched()	Tejun Heo
	microblaze build broke due to missing declaration of the cond_resched() invocation added recently. Let's include linux/sched.h explicitly. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: kbuild test robot <fengguang.wu@intel.com>
2018-03-19	i40evf: Reorder configure_clsflower to avoid deadlock on error	Alexander Duyck
	While doing some code review I noticed that we can get into a state where we exit with the "IN_CRITICAL_TASK" bit set while notifying the PF of flower filters. This patch is meant to address that plus tweak the ordering of the while loop waiting on it slightly so that we don't wait an extra period after we have failed for the last time. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19	clk: bcm2835: Protect sections updating shared registers	Boris Brezillon
	CM_PLLx and A2W_XOSC_CTRL registers are accessed by different clock handlers and must be accessed with ->regs_lock held. Update the sections where this protection is missing. Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio domain clocks") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-03-19	clk: bcm2835: Fix ana->maskX definitions	Boris Brezillon
	ana->maskX values are already '~'-ed in bcm2835_pll_set_rate(). Remove the '~' in the definition to fix ANA setup. Note that this commit fixes a long standing bug preventing one from using an HDMI display if it's plugged after the FW has booted Linux. This is because PLLH is used by the HDMI encoder to generate the pixel clock. Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio domain clocks") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-03-19	drm/amd/display: fix dereferencing possible ERR_PTR()	Shirish S
	This patch fixes static checker warning caused by "36cc549d5986: "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	drm/amd/display: Refine disable VGA	Clark Zheng
	bad case won't follow normal sense, it will not enable vga1 as usual, but vga2,3,4 is on. Signed-off-by: Clark Zheng <clark.zheng@amd.com> Reviewed-by: Tony Cheng <tony.cheng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-19	ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit	Kirill Marinushkin
	Currently, the offsets in the UAC2 processing unit descriptor are calculated incorrectly. It causes an issue when connecting the device which provides such a feature: ~~~~ [84126.724420] usb 1-1.3.1: invalid Processing Unit descriptor (id 18) ~~~~ After this patch is applied, the UAC2 processing unit inits w/o this error. Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0") Signed-off-by: Kirill Marinushkin <k.marinushkin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-03-19	libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version	Hans de Goede
	When commit 9c7be59fc519af ("libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs") was added it inherited the ATA_HORKAGE_NO_NCQ_TRIM quirk from the existing "Crucial_CTMX100" entry, but that entry sets model_rev to "MU01", where as the entry adding the NOLPM quirk sets it to NULL. This means that after this commit we no apply the NO_NCQ_TRIM quirk to all "Crucial_CT512MX100" SSDs even if they have the fixed "MU02" firmware. This commit splits the "Crucial_CT512MX100" quirk into 2 quirks, one for the "MU01" firmware and one for all other firmware versions, so that we once again only apply the NO_NCQ_TRIM quirk to the "MU01" firmware version. Fixes: 9c7be59fc519af ("libata: Apply NOLPM quirk to ... MX100 512GB SSDs") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>