summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-05-24xdp: add tracepoint for devmap like cpumap haveJesper Dangaard Brouer
Notice how this allow us get XDP statistic without affecting the XDP performance, as tracepoint is no-longer activated on a per packet basis. V5: Spotted by John Fastabend. Fix 'sent' also counted 'drops' in this patch, a later patch corrected this, but it was a mistake in this intermediate step. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24bpf: devmap prepare xdp frames for bulkingJesper Dangaard Brouer
Like cpumap create queue for xdp frames that will be bulked. For now, this patch simply invoke ndo_xdp_xmit foreach frame. This happens, either when the map flush operation is envoked, or when the limit DEV_MAP_BULK_SIZE is reached. V5: Avoid memleak on error path in dev_map_update_elem() Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24bpf: devmap introduce dev_map_enqueueJesper Dangaard Brouer
Functionality is the same, but the ndo_xdp_xmit call is now simply invoked from inside the devmap.c code. V2: Fix compile issue reported by kbuild test robot <lkp@intel.com> V5: Cleanups requested by Daniel - Newlines before func definition - Use BUILD_BUG_ON checks - Remove unnecessary use return value store in dev_map_enqueue Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24Merge branch 'bpf-task-fd-query'Alexei Starovoitov
Yonghong Song says: ==================== Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks for this approaches. First, bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, this command will return bpf related information to user space. Right now it only supports tracepoint/kprobe/uprobe perf event fd's. For such a fd, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Patch #1 adds function perf_get_event() in kernel/events/core.c. Patch #2 implements the bpf subcommand BPF_TASK_FD_QUERY. Patch #3 syncs tools bpf.h header and also add bpf_task_fd_query() in the libbpf library for samples/selftests/bpftool to use. Patch #4 adds ksym_get_addr() utility function. Patch #5 add a test in samples/bpf for querying k[ret]probes and u[ret]probes. Patch #6 add a test in tools/testing/selftests/bpf for querying raw_tracepoint and tracepoint. Patch #7 add a new subcommand "perf" to bpftool. Changelogs: v4 -> v5: . return strlen(buf) instead of strlen(buf) + 1 in the attr.buf_len. As long as user provides non-empty buffer, it will be filed with empty string, truncated string, or full string based on the buffer size and the length of to-be-copied string. v3 -> v4: . made attr buf_len input/output. The length of actual buffter is written to buf_len so user space knows what is actually needed. If user provides a buffer with length >= 1 but less than required, do partial copy and return -ENOSPC. . code simplification with put_user. . changed query result attach_info to fd_type. . add tests at selftests/bpf to test zero len, null buf and insufficient buf. v2 -> v3: . made perf_get_event() return perf_event pointer const. this was to ensure that event fields are not meddled. . detect whether newly BPF_TASK_FD_QUERY is supported or not in "bpftool perf" and warn users if it is not. v1 -> v2: . changed bpf subcommand name from BPF_PERF_EVENT_QUERY to BPF_TASK_FD_QUERY. . fixed various "bpftool perf" issues and added documentation and auto-completion. ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24tools/bpftool: add perf subcommandYonghong Song
The new command "bpftool perf [show | list]" will traverse all processes under /proc, and if any fd is associated with a perf event, it will print out related perf event information. Documentation is also added. Below is an example to show the results using bcc commands. Running the following 4 bcc commands: kprobe: trace.py '__x64_sys_nanosleep' kretprobe: trace.py 'r::__x64_sys_nanosleep' tracepoint: trace.py 't:syscalls:sys_enter_nanosleep' uprobe: trace.py 'p:/home/yhs/a.out:main' The bpftool command line and result: $ bpftool perf pid 21711 fd 5: prog_id 5 kprobe func __x64_sys_write offset 0 pid 21765 fd 5: prog_id 7 kretprobe func __x64_sys_nanosleep offset 0 pid 21767 fd 5: prog_id 8 tracepoint sys_enter_nanosleep pid 21800 fd 5: prog_id 9 uprobe filename /home/yhs/a.out offset 1159 $ bpftool -j perf [{"pid":21711,"fd":5,"prog_id":5,"fd_type":"kprobe","func":"__x64_sys_write","offset":0}, \ {"pid":21765,"fd":5,"prog_id":7,"fd_type":"kretprobe","func":"__x64_sys_nanosleep","offset":0}, \ {"pid":21767,"fd":5,"prog_id":8,"fd_type":"tracepoint","tracepoint":"sys_enter_nanosleep"}, \ {"pid":21800,"fd":5,"prog_id":9,"fd_type":"uprobe","filename":"/home/yhs/a.out","offset":1159}] $ bpftool prog 5: kprobe name probe___x64_sys tag e495a0c82f2c7a8d gpl loaded_at 2018-05-15T04:46:37-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 4 7: kprobe name probe___x64_sys tag f2fdee479a503abf gpl loaded_at 2018-05-15T04:48:32-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 7 8: tracepoint name tracepoint__sys tag 5390badef2395fcf gpl loaded_at 2018-05-15T04:48:48-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 8 9: kprobe name probe_main_1 tag 0a87bdc2e2953b6d gpl loaded_at 2018-05-15T04:49:52-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 9 $ ps ax | grep "python ./trace.py" 21711 pts/0 T 0:03 python ./trace.py __x64_sys_write 21765 pts/0 S+ 0:00 python ./trace.py r::__x64_sys_nanosleep 21767 pts/2 S+ 0:00 python ./trace.py t:syscalls:sys_enter_nanosleep 21800 pts/3 S+ 0:00 python ./trace.py p:/home/yhs/a.out:main 22374 pts/1 S+ 0:00 grep --color=auto python ./trace.py Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24tools/bpf: add two BPF_TASK_FD_QUERY tests in test_progsYonghong Song
The new tests are added to query perf_event information for raw_tracepoint and tracepoint attachment. For tracepoint, both syscalls and non-syscalls tracepoints are queries as they are treated slightly differently inside the kernel. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24samples/bpf: add a samples/bpf test for BPF_TASK_FD_QUERYYonghong Song
This is mostly to test kprobe/uprobe which needs kernel headers. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24tools/bpf: add ksym_get_addr() in trace_helpersYonghong Song
Given a kernel function name, ksym_get_addr() will return the kernel address for this function, or 0 if it cannot find this function name in /proc/kallsyms. This function will be used later when a kernel address is used to initiate a kprobe perf event. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24tools/bpf: sync kernel header bpf.h and add bpf_task_fd_query in libbpfYonghong Song
Sync kernel header bpf.h to tools/include/uapi/linux/bpf.h and implement bpf_task_fd_query() in libbpf. The test programs in samples/bpf and tools/testing/selftests/bpf, and later bpftool will use this libbpf function to query kernel. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24bpf: introduce bpf subcommand BPF_TASK_FD_QUERYYonghong Song
Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks for this approaches. First, bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, if the <pid, fd> is associated with a tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24perf/core: add perf_get_event() to return perf_event given a struct fileYonghong Song
A new extern function, perf_get_event(), is added to return a perf event given a struct file. This function will be used in later patches. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-25Merge branch 'drm-next-4.18' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-next Last feature request for 4.18. Mostly vega20 support. - Vega20 support - clock and powergating for VCN - misc bug fixes Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180524152427.32713-1-alexander.deucher@amd.com
2018-05-25Merge branch 'vmwgfx-fixes-4.17' of ↵Dave Airlie
git://people.freedesktop.org/~thomash/linux into drm-fixes Three fixes for vmwgfx. Two are cc'd stable and fix host logging and its error paths on 32-bit VMs. One is a fix for a hibernate flaw introduced with the 4.17 merge window. * 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux: drm/vmwgfx: Schedule an fb dirty update after resume drm/vmwgfx: Fix host logging / guestinfo reading error paths drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
2018-05-24Merge branch 'stable/for-linus-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fix from Konrad Rzeszutek Wilk: "One single fix in here: under Xen the DMA32 heap (in the hypervisor) would end up looking like swiss cheese. The reason being that for every coherent DMA allocation we didn't do the proper hypercall to tell Xen to return the page back to the DMA32 heap. End result was (eventually) no DMA32 space if you (for example) continously unloaded and loaded modules" * 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
2018-05-24RDMA/hns: Increase checking CMQ status timeout valueWei Hu(Xavier)
This patch increases checking CMQ status timeout value and uses the same value with NIC driver to avoid deficiency of time. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaustWei Hu(Xavier)
This patch modified uar allocation algorithm in hns_roce_uar_alloc function to avoid bitmap exhaust. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24net/mlx5: IPSec, Fix a race between concurrent sandbox QP commandsYossi Kuperman
Sandbox QP Commands are retired in the order they are sent. Outstanding commands are stored in a linked-list in the order they appear. Once a response is received and the callback gets called, we pull the first element off the pending list, assuming they correspond. Sending a message and adding it to the pending list is not done atomically, hence there is an opportunity for a race between concurrent requests. Bind both send and add under a critical section. Fixes: bebb23e6cb02 ("net/mlx5: Accel, Add IPSec acceleration interface") Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: When RXFCS is set, add FCS data into checksum calculationEran Ben Elisha
When RXFCS feature is enabled, the HW do not strip the FCS data, however it is not present in the checksum calculated by the HW. Fix that by manually calculating the FCS checksum and adding it to the SKB checksum field. Add helper function to find the FCS data for all SKB forms (linear, one fragment or more). Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: Receive buffer support for DCBXHuy Nguyen
Add dcbnl's set/get buffer configuration callback that allows user to set/get buffer size configuration and priority to buffer mapping. By default, firmware controls receive buffer configuration and priority of buffer mapping based on the changes in pfc settings. When set buffer call back is triggered, the buffer configuration changes to manual mode. The manual mode means mlx5 driver will adjust the buffer configuration accordingly based on the changes in pfc settings. ConnectX buffer stride is 128 Bytes. If the buffer size is not multiple of 128, the buffer size will be rounded down to the nearest multiple of 128. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: Receive buffer configurationHuy Nguyen
Add APIs for buffer configuration based on the changes in pfc configuration, cable len, buffer size configuration, and priority to buffer mapping. Note that the xoff fomula is as below xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B] xoff_threshold = buffer_size - xoff xon_threshold = xoff_threshold - MTU Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5: PPTB and PBMC register firmware command supportHuy Nguyen
Add firmware command interface to read and write PPTB and PBMC registers. PPTB register enables mappings priority to a specific receive buffer. PBMC registers enables changing the receive buffer's configuration such as buffer size, xon/xoff thresholds, buffer's lossy property and buffer's shared property. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5: Add pbmc and pptb in the port_access_reg_cap_maskHuy Nguyen
Add pbmc and pptb in the port_access_reg_cap_mask. These two bits determine if device supports receive buffer configuration. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: Move port speed code from en_ethtool.c to en/port.cHuy Nguyen
Move four below functions from en_ethtool.c to en/port.c. These functions are used by both en_ethtool.c and en_main.c. Future code can use these functions without ethtool link mode dependency. u32 mlx5e_port_ptys2speed(u32 eth_proto_oper); int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); u32 mlx5e_port_speed2linkmodes(u32 speed); Delete the speed field from table mlx5e_build_ptys2ethtool_map. This table only keeps the mapping between the mlx5e link mode and ethtool link mode. Add new table mlx5e_link_speed for translation from mlx5e link mode to actual speed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/dcb: Add dcbnl buffer attributeHuy Nguyen
In this patch, we add dcbnl buffer attribute to allow user change the NIC's buffer configuration such as priority to buffer mapping and buffer size of individual buffer. This attribute combined with pfc attribute allows advanced user to fine tune the qos setting for specific priority queue. For example, user can give dedicated buffer for one or more priorities or user can give large buffer to certain priorities. The dcb buffer configuration will be controlled by lldptool. lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6 lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0 sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to choose dcbnl over devlink interface since this feature is intended to set port attributes which are governed by the netdev instance of that port, where devlink API is more suitable for global ASIC configurations. We present an use case scenario where dcbnl buffer attribute configured by advance user helps reduce the latency of messages of different sizes. Scenarios description: On ConnectX-5, we run latency sensitive traffic with small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive traffic with large messages sizes 512KB and 1MB. We group small, medium, and large message sizes to their own pfc enables priorities as follow. Priorities 1 & 2 (64B, 256B and 1KB) Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB) Priorities 5 & 6 (512KB and 1MB) By default, ConnectX-5 maps all pfc enabled priorities to a single lossless fixed buffer size of 50% of total available buffer space. The other 50% is assigned to lossy buffer. Using dcbnl buffer attribute, we create three equal size lossless buffers. Each buffer has 25% of total available buffer space. Thus, the lossy buffer size reduces to 25%. Priority to lossless buffer mappings are set as follow. Priorities 1 & 2 on lossless buffer #1 Priorities 3 & 4 on lossless buffer #2 Priorities 5 & 6 on lossless buffer #3 We observe improvements in latency for small and medium message sizes as follows. Please note that the large message sizes bandwidth performance is reduced but the total bandwidth remains the same. 256B message size (42 % latency reduction) 4K message size (21% latency reduction) 64K message size (16% latency reduction) CC: Ido Schimmel <idosch@idosch.org> CC: Jakub Kicinski <jakub.kicinski@netronome.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Or Gerlitz <gerlitz.or@gmail.com> CC: Parav Pandit <parav@mellanox.com> CC: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "This is pretty much just the usual array of smallish driver bugs. - remove bouncing addresses from the MAINTAINERS file - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers - various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers - two fixes for patches already sent during the merge window - a long-standing bug related to not decreasing the pinned pages count in the right MM was found and fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits) RDMA/hns: Move the location for initializing tmp_len RDMA/hns: Bugfix for cq record db for kernel IB/uverbs: Fix uverbs_attr_get_obj RDMA/qedr: Fix doorbell bar mapping for dpi > 1 IB/umem: Use the correct mm during ib_umem_release iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()' RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint RDMA/i40iw: Avoid reference leaks when processing the AEQ RDMA/i40iw: Avoid panic when objects are being created and destroyed RDMA/hns: Fix the bug with NULL pointer RDMA/hns: Set NULL for __internal_mr RDMA/hns: Enable inner_pa_vld filed of mpt RDMA/hns: Set desc_dma_addr for zero when free cmq desc RDMA/hns: Fix the bug with rq sge RDMA/hns: Not support qp transition from reset to reset for hip06 RDMA/hns: Add return operation when configured global param fail RDMA/hns: Update convert function of endian format RDMA/hns: Load the RoCE dirver automatically RDMA/hns: Bugfix for rq record db for kernel RDMA/hns: Add rq inline flags judgement ...
2018-05-24drm/amdgpu: Use dev_info() to report amdkfd is not supported for this ASICTom Stellard
This is an important message, so it should be visible to users without having to enable extra debugging. Signed-off-by: Tom Stellard <tstellar@redhat.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-24ipmi: Properly release srcu locks on error conditionsCorey Minyard
When SRCU was added for handling hotplug, some error conditions were not handled properly. Signed-off-by: Corey Minyard <cminyard@mvista.com>
2018-05-24leds: class: ensure workqueue is initialized before setting brightnessLuis Henriques
An application can try to set brightness before all the initialization is done, in particular before the workqueue is initialized with the call to led_init_core(). Here's a WARNING easy to trigger: [ 36.780813] WARNING: CPU: 3 PID: 1411 at ../kernel/workqueue.c:1444 __queue_work+0x37b/0x420 [ 36.780815] Modules linked in: ... [ 36.780868] CPU: 3 PID: 1411 Comm: systemd-backlig Not tainted 4.16.9-1-default #1 openSUSE Tumbleweed (unreleased) [ 36.780868] Hardware name: Dell Inc. Precision 5510/0N8J4R, BIOS 1.6.1 12/11/2017 [ 36.780870] RIP: 0010:__queue_work+0x37b/0x420 [ 36.780871] RSP: 0018:ffffaced048b7d78 EFLAGS: 00010086 [ 36.780873] RAX: 0000000000000000 RBX: ffffffffb3f01440 RCX: 0000000000000000 [ 36.780873] RDX: ffffffffc05a90d8 RSI: 0000000000000000 RDI: ffff8eac7dce2700 [ 36.780874] RBP: ffff8ea547c16400 R08: ffff8ea547800000 R09: ffff8eac7dc22700 [ 36.780875] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000003 [ 36.780876] R13: 0000000000000200 R14: ffffffffc05a90d0 R15: ffff8eac7dce8600 [ 36.780877] FS: 00007f871e61cf40(0000) GS:ffff8eac7dcc0000(0000) knlGS:0000000000000000 [ 36.780878] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 36.780879] CR2: 000055c91115e308 CR3: 0000000883ee0005 CR4: 00000000003606e0 [ 36.780880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 36.780880] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 36.780881] Call Trace: [ 36.780886] queue_work_on+0x81/0x90 [ 36.780889] brightness_store+0x5d/0x90 [ 36.780892] kernfs_fop_write+0x105/0x180 [ 36.780894] __vfs_write+0x26/0x150 [ 36.780897] ? common_file_perm+0x51/0x150 [ 36.780900] ? security_file_permission+0x3c/0xb0 [ 36.780901] vfs_write+0xad/0x1a0 [ 36.780903] SyS_write+0x42/0x90 [ 36.780906] do_syscall_64+0x76/0x140 [ 36.780908] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 36.780910] RIP: 0033:0x7f871dd04c94 [ 36.780910] RSP: 002b:00007ffeb3a57d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 36.780912] RAX: ffffffffffffffda RBX: 000055c91115c810 RCX: 00007f871dd04c94 [ 36.780912] RDX: 0000000000000001 RSI: 000055c91115c810 RDI: 0000000000000004 [ 36.780913] RBP: 00007ffeb3a57e10 R08: 0000000000000003 R09: 0000000000000000 [ 36.780914] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 36.780914] R13: 000055c911158f30 R14: 000055c90f3a9a4e R15: 0000000000000004 [ 36.780917] Code: 74 18 e8 49 80 00 00 48 85 c0 74 0e 48 8b 40 20 48 3b 68 08 0f 84 c2 fc ff ff 0f 0b 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9 82 fd ff ff 83 cd 02 49 8d 57 60 e9 69 fd ff ff 80 3d [ 36.780942] ---[ end trace 1fce4edad54c4017 ]--- This patch initializes and acquires the led_access mutex early in the of_led_classdev_register function, so that any application trying to write to sysfs to set brightness will block until initialization ends. Signed-off-by: Luis Henriques <lhenriques@suse.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
2018-05-24block drivers/block: Use octal not symbolic permissionsJoe Perches
Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24net: phy: replace bool members in struct phy_device with bit-fieldsHeiner Kallweit
In struct phy_device we have a number of flags being defined as type bool. Similar to e.g. struct pci_dev we can save some space by using bit-fields. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24Merge tag 'for-4.17-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A one-liner that prevents leaking an internal error value 1 out of the ftruncate syscall. This has been observed in practice. The steps to reproduce make a common pattern (open/write/fync/ftruncate) but also need the application to not check only for negative values and happens only for compressed inlined files. The conditions are narrow but as this could break userspace I think it's better to merge it now and not wait for the merge window" * tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix error handling in btrfs_truncate()
2018-05-24ALSA: hda - Fix runtime PMLukas Wunner
Before commit 3b5b899ca67d ("ALSA: hda: Make use of core codec functions to sync power state"), hda_set_power_state() returned the response to the Get Power State verb, a 32-bit unsigned integer whose expected value is 0x233 after transitioning a codec to D3, and 0x0 after transitioning it to D0. The response value is significant because hda_codec_runtime_suspend() does not clear the codec's bit in the codec_powered bitmask unless the AC_PWRST_CLK_STOP_OK bit (0x200) is set in the response value. That in turn prevents the HDA controller from runtime suspending because azx_runtime_idle() checks that the codec_powered bitmask is zero. Since commit 3b5b899ca67d, hda_set_power_state() only returns 0x0 or 0x1, thereby breaking runtime PM for any HDA controller. That's because an inline function introduced by the commit returns a bool instead of a 32-bit unsigned int. The change was likely erroneous and resulted from copying and pasting snd_hda_check_power_state(), which is immediately preceding the newly introduced inline function. Fix it. Link: https://bugs.freedesktop.org/show_bug.cgi?id=106597 Fixes: 3b5b899ca67d ("ALSA: hda: Make use of core codec functions to sync power state") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Abhijeet Kumar <abhijeet.kumar@intel.com> Reported-and-tested-by: Gunnar Krüger <taijian@posteo.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-05-24PCI: rcar: Remove IRQ mappings in rcar_pcie_enable_msi() failpathMarek Vasut
The rcar_pcie_enable_msi() creates IRQ mappings using irq_create_mapping() before requesting the IRQs using devm_request_irq(). If devm_request_irq() fails for some reason, rcar_pcie_enable_msi() does not remove the mapping. Pull out the code for disposing IRQ mappings from rcar_pcie_teardown_msi() into a separate function and call it from both rcar_pcie_teardown_msi() and rcar_pcie_enable_msi() failpath to remove the mappings correctly. Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: linux-renesas-soc@vger.kernel.org
2018-05-24PCI: rcar: Teardown MSI setup if rcar_pcie_enable() failsMarek Vasut
If the rcar_pcie_enable() fails and MSIs are enabled, the setup done in rcar_pcie_enable_msi() is never undone. Add a function to tear down the MSI setup by disabling the MSI handling in the PCIe block, deallocating the pages requested for the MSIs and zapping the IRQ mapping. Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: linux-renesas-soc@vger.kernel.org
2018-05-24PCI: rcar: Add missing irq_dispose_mapping() into failpathMarek Vasut
The rcar_pcie_get_resources() is another misnomer with a side effect. The function does not only get resources, but also maps MSI IRQs via irq_of_parse_and_map(). In case anything fails afterward, the IRQ mapping must be disposed through irq_dispose_mapping() which is not done. This patch handles irq_of_parse_and_map() failures in by disposing of the mapping in rcar_pcie_get_resources() as well as in probe. Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: linux-renesas-soc@vger.kernel.org
2018-05-24PCI: rcar: Pull bus clock enable/disable from rcar_pcie_get_resources()Marek Vasut
The rcar_pcie_get_resources() is another misnomer with a side effect. The function does not only get resources, but also enables/disables bus clock. This is forgotten in the probe() function though and if anything in probe() fails after rcar_pcie_get_resources() is called, the bus clock are never disabled. This patch pulls the clock handling out of the rcar_pcie_get_resources() and enables clock after all the resources were requested. Moreover, this patch also always disables the clock in case of failure. Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: linux-renesas-soc@vger.kernel.org
2018-05-24PM / Domain: Return 0 on error from of_genpd_opp_to_performance_state()Viresh Kumar
of_genpd_opp_to_performance_state() should return 0 on errors, as its doc comment describes. While it follows that mostly, it returns a negative error number on one of the failures. Fix that. Fixes: 6e41766a6a50 "PM / Domain: Implement of_genpd_opp_to_performance_state()" Reported-by: Rajendra Nayak <rnayak@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-24Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"Joonsoo Kim
This reverts the following commits that change CMA design in MM. 3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y") 1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA") bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE") Ville reported a following error on i386. Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) microcode: microcode updated early to revision 0x4, date = 2013-06-28 Initializing CPU#0 Initializing HighMem for node 0 (000377fe:00118000) Initializing Movable for node 0 (00000001:00118000) BUG: Bad page state in process swapper pfn:377fe page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0 flags: 0x80000000() raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145 Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013 Call Trace: dump_stack+0x60/0x96 bad_page+0x9a/0x100 free_pages_check_bad+0x3f/0x60 free_pcppages_bulk+0x29d/0x5b0 free_unref_page_commit+0x84/0xb0 free_unref_page+0x3e/0x70 __free_pages+0x1d/0x20 free_highmem_page+0x19/0x40 add_highpages_with_active_regions+0xab/0xeb set_highmem_pages_init+0x66/0x73 mem_init+0x1b/0x1d7 start_kernel+0x17a/0x363 i386_start_kernel+0x95/0x99 startup_32_smp+0x164/0x168 The reason for this error is that the span of MOVABLE_ZONE is extended to whole node span for future CMA initialization, and, normal memory is wrongly freed here. I submitted the fix and it seems to work, but, another problem happened. It's so late time to fix the later problem so I decide to reverting the series. Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-24fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystemsSeth Forshee
The user in control of a super block should be allowed to freeze and thaw it. Relax the restrictions on the FIFREEZE and FITHAW ioctls to require CAP_SYS_ADMIN in s_user_ns. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24capabilities: Allow privileged user in s_user_ns to set security.* xattrsEric W. Biederman
A privileged user in s_user_ns will generally have the ability to manipulate the backing store and insert security.* xattrs into the filesystem directly. Therefore the kernel must be prepared to handle these xattrs from unprivileged mounts, and it makes little sense for commoncap to prevent writing these xattrs to the filesystem. The capability and LSM code have already been updated to appropriately handle xattrs from unprivileged mounts, so it is safe to loosen this restriction on setting xattrs. The exception to this logic is that writing xattrs to a mounted filesystem may also cause the LSM inode_post_setxattr or inode_setsecurity callbacks to be invoked. SELinux will deny the xattr update by virtue of applying mountpoint labeling to unprivileged userns mounts, and Smack will deny the writes for any user without global CAP_MAC_ADMIN, so loosening the capability check in commoncap is safe in this respect as well. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24fs: Allow superblock owner to access do_remount_sb()Eric W. Biederman
Superblock level remounts are currently restricted to global CAP_SYS_ADMIN, as is the path for changing the root mount to read only on umount. Loosen both of these permission checks to also allow CAP_SYS_ADMIN in any namespace which is privileged towards the userns which originally mounted the filesystem. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24PCI: rcar: Poll more often in rcar_pcie_wait_for_dl()Marek Vasut
The data link active signal usually takes ~20 uSec to be asserted, poll the bit more often to avoid useless delays in this function. Use udelay() instead of usleep() for such a small delay as suggested by the timer documentation. Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: linux-renesas-soc@vger.kernel.org
2018-05-24blk-mq: avoid starving tag allocation after allocating process migratesMing Lei
When the allocation process is scheduled back and the mapped hw queue is changed, fake one extra wake up on previous queue for compensating wake up miss, so other allocations on the previous queue won't be starved. This patch fixes one request allocation hang issue, which can be triggered easily in case of very low nr_request. The race is as follows: 1) 2 hw queues, nr_requests are 2, and wake_batch is one 2) there are 3 waiters on hw queue 0 3) two in-flight requests in hw queue 0 are completed, and only two waiters of 3 are waken up because of wake_batch, but both the two waiters can be scheduled to another CPU and cause to switch to hw queue 1 4) then the 3rd waiter will wait for ever, since no in-flight request is in hw queue 0 any more. 5) this patch fixes it by the fake wakeup when waiter is scheduled to another hw queue Cc: <stable@vger.kernel.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Modified commit message to make it clearer, and make it apply on top of the 4.18 branch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24fs: Allow superblock owner to replace invalid owners of inodesEric W. Biederman
Allow users with CAP_SYS_CHOWN over the superblock of a filesystem to chown files when inode owner is invalid. Ordinarily the capable_wrt_inode_uidgid check is sufficient to allow access to files but when the underlying filesystem has uids or gids that don't map to the current user namespace it is not enough, so the chown permission checks need to be extended to allow this case. Calling chown on filesystem nodes whose uid or gid don't map is necessary if those nodes are going to be modified as writing back inodes which contain uids or gids that don't map is likely to cause filesystem corruption of the uid or gid fields. Once chown has been called the existing capable_wrt_inode_uidgid checks are sufficient to allow the owner of a superblock to do anything the global root user can do with an appropriate set of capabilities. An ordinary filesystem mountable by a userns root will limit all uids and gids in s_user_ns or the INVALID_UID and INVALID_GID to flag all others. So having this added permission limited to just INVALID_UID and INVALID_GID is sufficient to handle every case on an ordinary filesystem. Of the virtual filesystems at least proc is known to set s_user_ns to something other than &init_user_ns, while at the same time presenting some files owned by GLOBAL_ROOT_UID. Those files the mounter of proc in a user namespace should not be able to chown to get access to. Limiting the relaxation in permission to just the minimum of allowing changing INVALID_UID and INVALID_GID prevents problems with cases like that. The original version of this patch was written by: Seth Forshee. I have rewritten and rethought this patch enough so it's really not the same thing (certainly it needs a different description), but he deserves credit for getting out there and getting the conversation started, and finding the potential gotcha's and putting up with my semi-paranoid feedback. Inspired-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24Input: atmel_mxt_ts - fix reset-gpio for level based irqsSebastian Reichel
The current reset-gpio support triggers an interrupt storm on platforms using the maxtouch with level based interrupt. The Motorola Droid 4, which I used for some of the tests is not affected, since it uses a edge based interrupt. This change avoids the interrupt storm by enabling the device while its interrupt is disabled. Afterwards we wait 100ms. This is important for two reasons: The device is unresponsive for some time (~22ms for mxt224E) and the CHG (interrupt) line is not working properly for 100ms. We don't need to wait for any following interrupts, since the following mxt_initialize() checks for bootloader mode anyways. This fixes a boot issue on GE PPD (watchdog kills device due to interrupt storm) and does not cause regression on Motorola Droid 4. Fixes: f657b00df22e ("Input: atmel_mxt_ts - add support for reset line") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-05-24vfs: Allow userns root to call mknod on owned filesystems.Eric W. Biederman
These filesystems already always set SB_I_NODEV so mknod will not be useful for gaining control of any devices no matter their permissions. This will allow overlayfs and applications like to fakeroot to use device nodes to represent things on disk. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-05-24vfs: Don't allow changing the link count of an inode with an invalid uid or gidEric W. Biederman
Changing the link count of an inode via unlink or link will cause a write back of that inode. If the uids or gids are invalid (aka not known to the kernel) writing the inode back may change the uid or gid in the filesystem. To prevent possible filesystem and to avoid the need for filesystem maintainers to worry about it don't allow operations on inodes with an invalid uid or gid. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-05-24PCI: vmd: Add an additional VMD device id to driver device id tableJon Derrick
Allow VMD devices with PCI id 8086:28c0 to bind to VMD driver. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> [lorenzo.pieralisi@arm.com: updated commit subject] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-05-24x86/PCI: Add additional VMD device root ports to VMD AER quirkJon Derrick
VMD devices change the source id of messages from child devices to the VMD endpoint. This patch adds additional VMD root port device ids to the AER quirk which requires walking the bus to determine which devices were throwing the error. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-24PCI: vmd: Add offset to bus numbers if necessaryJon Derrick
Depending on platform configuration, certain VMD devices may have an additional configuration option which specifies the range of bus numbers allowed in a VMD PCIe domain. We determine this requirement by checking the value of two vendor specific config registers in the VMD endpoint: VMCAP[0] | VMCONFIG[9:8] | Bus Numbers ---------------------------------------- 0 | * | 0-255 1 | 00 | 0-127 1 | 01 | 128-255 1 | 10 | 0-255 This feature is also added as a bit in driver_data, to allow future conforming device ids which support these features to be enabled through sysfs new_id. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> [lorenzo.pieralisi@arm.com: updated commit subject] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>