2018-10-09Merge branch 'unsupported-map-lookup'Alexei Starovoitov
Prashant Bhole says:

====================
Currently, when a map lookup fails, the user space API cannot distinguish whether the given key was not found or whether lookup is not supported by the particular map. In this series we modify the return value of maps which do not support lookup: lookup on such a map implementation will return -EOPNOTSUPP, and the bpf() syscall with the BPF_MAP_LOOKUP_ELEM command will set errno to EOPNOTSUPP. We also handle this error in bpftool to print an appropriate message.

Patch 1: adds handling of the BPF_MAP_LOOKUP_ELEM command of the bpf syscall such that errno will be set to EOPNOTSUPP when the map doesn't support lookup

Patch 2: modifies the return value of map_lookup_elem() to -EOPNOTSUPP for maps which do not support lookup

Patch 3: splits do_dump() in bpftool/map.c. The element printing code is moved out into a new function, dump_map_elem(). This was done in order to reduce deep indentation and accommodate further changes.

Patch 4: changes bpftool to print the strerror() message when a lookup error occurs. This results in an appropriate message like "Operation not supported" when the map doesn't support lookup.

Patch 5: test_verifier: change the fixup map naming convention as suggested by Alexei

Patch 6: adds verifier tests to check whether the verifier rejects a call to bpf_map_lookup_elem from a bpf program, for all map types that do not support map lookup
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
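As a quick user-space check of the new behaviour, a minimal sketch using libbpf's bpf_map_lookup_elem() wrapper (the map fd, key and value layout here are illustrative assumptions, not part of the series):

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include <bpf/bpf.h>

	/* Distinguish "key not present" from "this map type does not
	 * support lookup at all" once the series is applied. */
	static void try_lookup(int map_fd, const void *key, void *value)
	{
		if (bpf_map_lookup_elem(map_fd, key, value) == 0)
			printf("element found\n");
		else if (errno == ENOENT)
			printf("key not present\n");
		else if (errno == EOPNOTSUPP)
			printf("map type does not support lookup\n");
		else
			printf("lookup failed: %s\n", strerror(errno));
	}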
2018-10-09selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf progPrashant Bhole
map_lookup_elem isn't supported by certain map types, such as:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Let's add verifier tests to check whether the verifier prevents a bpf_map_lookup_elem call on the above map types from a bpf program.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-09selftests/bpf: test_verifier, change names of fixup mapsPrashant Bhole
Currently the fixup maps are named like fixup_map1, fixup_map2, and so on. As suggested by Alexei, let's change the map names such that we can identify the map type by looking at the name. This patch is basically a find-and-replace change:

fixup_map1 -> fixup_map_hash_8b
fixup_map2 -> fixup_map_hash_48b
fixup_map3 -> fixup_map_hash_16b
fixup_map4 -> fixup_map_array_48b

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-09tools/bpf: bpftool, print strerror when map lookup error occursPrashant Bhole
Since a map lookup error can be ENOENT or EOPNOTSUPP, let's print the strerror() message as the error message in both normal and JSON output. This patch adds a helper function, print_entry_error(), to print an entry when a lookup error occurs.

Example: the following dumps a map which does not support lookup.

Output before:

root# bpftool map -jp dump id 40
[
    "key": ["0x0a","0x00","0x00","0x00"],
    "value": {
        "error": "can't lookup element"
    },
    "key": ["0x0b","0x00","0x00","0x00"],
    "value": {
        "error": "can't lookup element"
    }
]
root# bpftool map dump id 40
can't lookup element with key: 0a 00 00 00
can't lookup element with key: 0b 00 00 00
Found 0 elements

Output after the changes:

root# bpftool map dump -jp id 45
[
    "key": ["0x0a","0x00","0x00","0x00"],
    "value": {
        "error": "Operation not supported"
    },
    "key": ["0x0b","0x00","0x00","0x00"],
    "value": {
        "error": "Operation not supported"
    }
]
root# bpftool map dump id 45
key: 0a 00 00 00 value: Operation not supported
key: 0b 00 00 00 value: Operation not supported
Found 0 elements

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-09tools/bpf: bpftool, split the function do_dump()Prashant Bhole
The do_dump() function in bpftool/map.c has deep indentation. In order to reduce it, let's move the element printing code out of do_dump() into a new dump_map_elem() function.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-09bpf: return EOPNOTSUPP when map lookup isn't supportedPrashant Bhole
Return ERR_PTR(-EOPNOTSUPP) from the map_lookup_elem() methods of the below map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
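A minimal sketch of what such a lookup stub looks like for one of these map types (the function name is illustrative; the actual per-map callbacks differ):

	/* sketch: a ->map_lookup_elem callback for a map type whose
	 * contents cannot be read via the lookup interface */
	static void *prog_array_map_lookup_elem(struct bpf_map *map, void *key)
	{
		return ERR_PTR(-EOPNOTSUPP);
	}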
2018-10-09bpf: error handling when map_lookup_elem isn't supportedPrashant Bhole
The error value returned by map_lookup_elem doesn't differentiate whether the lookup failed because of an invalid key or because lookup is not supported. Let's add handling for an -EOPNOTSUPP return value from a map's map_lookup_elem() method, with the expectation that a map's implementation returns -EOPNOTSUPP if lookup is not supported. The errno of the bpf syscall for the BPF_MAP_LOOKUP_ELEM command will then be set to EOPNOTSUPP if map lookup is not supported.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
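A hedged sketch of how the BPF_MAP_LOOKUP_ELEM syscall path can propagate that error (the surrounding locals and the copy-out step are simplified, not lifted from the patch):

	ptr = map->ops->map_lookup_elem(map, key);
	if (IS_ERR(ptr)) {
		err = PTR_ERR(ptr);	/* e.g. -EOPNOTSUPP */
	} else if (!ptr) {
		err = -ENOENT;		/* key not present */
	} else {
		err = 0;
		memcpy(value, ptr, value_size);
	}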
2018-10-09bpf: btf: Fix a missing check bugWenwen Wang
In btf_parse_hdr(), the length of the btf data header is first copied from user space to 'hdr_len' and checked to see whether it is larger than 'btf_data_size'. If yes, an error code EINVAL is returned. Otherwise, the whole header is copied again from user space to 'btf->hdr'. However, after the second copy, there is no check between 'btf->hdr->hdr_len' and 'hdr_len' to confirm that the two copies get the same value. Given that the btf data is in user space, a malicious user can race to change the data between the two copies. By doing so, the user can provide malicious data to the kernel and cause undefined behavior.

This patch adds a necessary check after the second copy, to make sure 'btf->hdr->hdr_len' has the same value as 'hdr_len'. Otherwise, an error code EINVAL will be returned.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
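The shape of such a double-fetch check, as a hedged sketch (the struct layout and variable names are assumptions based on the description above):

	u32 hdr_len;

	/* first fetch: only the header length */
	if (copy_from_user(&hdr_len, &uhdr->hdr_len, sizeof(hdr_len)))
		return -EFAULT;
	if (hdr_len > btf_data_size)
		return -EINVAL;

	/* second fetch: the whole header */
	if (copy_from_user(btf->hdr, uhdr, hdr_len))
		return -EFAULT;

	/* the user buffer may have changed between the two copies */
	if (btf->hdr->hdr_len != hdr_len)
		return -EINVAL;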
2018-10-09mfd: cros-ec: copy the whole event in get_next_event_xferEmil Karlson
Commit 57e94c8b974db2d83c60e1139c89a70806abbea0 caused cros-ec keyboard events to be truncated on many Chromebooks, so that the Left and Right keys on column 12 were always 0. The old code was using ec_dev->event_size, which is the event payload/data size excluding the event_type header, as the length of the memcpy operation. Use ret as the memcpy length instead, to avoid the off-by-one and copy the whole msg->data.

Fixes: 57e94c8b974d ("mfd: cros-ec: Increase maximum mkbp event size")
Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Emil Karlson <jekarlson@gmail.com>
Signed-off-by: Benson Leung <bleung@chromium.org>
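A hedged sketch of the shape of the fix in get_next_event_xfer() (the surrounding code is paraphrased from the description, not quoted from the driver):

	ret = cros_ec_cmd_xfer(ec_dev, msg);
	if (ret > 0) {
		ec_dev->event_size = ret - 1;	/* payload only, minus event_type */
		/* copy the whole message (event_type + payload), i.e. ret
		 * bytes, rather than only ec_dev->event_size bytes */
		memcpy(&ec_dev->event_data, msg->data, ret);
	}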
2018-10-09ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()Al Viro
We get lower layer dentries, find their parents, do lock_rename() and proceed to vfs_rename(). However, we do not check that dentries still have the same parents and are not unlinked. Need to check that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
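A hedged sketch of the kind of revalidation this calls for after lock_rename() (the lower_* variable names follow the usual eCryptfs convention but are assumptions here):

	trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
	/* the lower dentries may have been moved or unlinked while we were
	 * not holding the locks; re-check before calling vfs_rename() */
	rc = -EINVAL;
	if (lower_old_dentry->d_parent != lower_old_dir_dentry)
		goto out_lock;
	if (lower_new_dentry->d_parent != lower_new_dir_dentry)
		goto out_lock;
	if (d_unhashed(lower_old_dentry) || d_unhashed(lower_new_dentry))
		goto out_lock;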
2018-10-09sparc: Wire up io_pgetevents system call.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10HID: google: add dependency on Cros EC for HammerDmitry Torokhov
Whiskers tablet mode support needs access to Chrome Embedded Controller, so we need to add dependency on MFD_CROS_EC. Reported-by: kbuild test robot <lkp@intel.com> Fixes: eb1aac4c8744 ("HID: google: add support tablet mode switch for Whiskers") Signed-off-by: Dmitry Torokhov <dtor@chromium.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-10-09security: fix LSM description locationRandy Dunlap
Fix Documentation location reference for where LSM descriptions should be placed.

Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Signed-off-by: James Morris <james.morris@microsoft.com>
2018-10-09mm: Preserve _PAGE_DEVMAP across mprotect() callsJan Kara
Currently _PAGE_DEVMAP bit is not preserved in mprotect(2) calls. As a result we will see warnings such as:

BUG: Bad page map in process JobWrk0013  pte:800001803875ea25 pmd:7624381067
addr:00007f0930720000 vm_flags:280000f9 anon_vma: (null) mapping:ffff97f2384056f0 index:0
file:457-000000fe00000030-00000009-000000ca-00000001_2001.fileblock fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage: (null)
CPU: 3 PID: 15848 Comm: JobWrk0013 Tainted: G        W 4.12.14-2.g7573215-default #1 SLE12-SP4 (unreleased)
Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
Call Trace:
 dump_stack+0x5a/0x75
 print_bad_pte+0x217/0x2c0
 ? enqueue_task_fair+0x76/0x9f0
 _vm_normal_page+0xe5/0x100
 zap_pte_range+0x148/0x740
 unmap_page_range+0x39a/0x4b0
 unmap_vmas+0x42/0x90
 unmap_region+0x99/0xf0
 ? vma_gap_callbacks_rotate+0x1a/0x20
 do_munmap+0x255/0x3a0
 vm_munmap+0x54/0x80
 SyS_munmap+0x1d/0x30
 do_syscall_64+0x74/0x150
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
...

when mprotect(2) gets used on DAX mappings. Also there is a wide variety of other failures that can result from the missing _PAGE_DEVMAP flag when the area gets used by get_user_pages() later.

Fix the problem by including _PAGE_DEVMAP in a set of flags that get preserved by mprotect(2).

Fixes: 69660fd797c3 ("x86, mm: introduce _PAGE_DEVMAP")
Fixes: ebd31197931d ("powerpc/mm: Add devmap support for ppc64")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
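On x86, a fix of this kind plausibly comes down to adding _PAGE_DEVMAP to the mask of bits that pte_modify()/pmd_modify() preserve; the exact flag list below is illustrative, not a quote of the patch:

	/* sketch: bits that survive a protection change must include
	 * _PAGE_DEVMAP, otherwise DAX mappings lose it on mprotect() */
	#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |	\
				 _PAGE_SPECIAL | _PAGE_ACCESSED |	\
				 _PAGE_DIRTY | _PAGE_SOFT_DIRTY |	\
				 _PAGE_DEVMAP)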
2018-10-09dm: fix report zone remapping to account for partition offsetDamien Le Moal
If dm-linear or dm-flakey are layered on top of a partition of a zoned block device, remapping of the start sector and write pointer position of the zones reported by a report zones BIO must be modified to account for the target table entry mapping (start offset within the device and entry mapping with the dm device). If the target's backing device is a partition of a whole disk, the start sector on the physical device of the partition must also be accounted for when modifying the zone information. However, dm_remap_zone_report() was not considering this last case, resulting in incorrect zone information remapping with targets using disk partitions. Fix this by calculating the target backing device start sector using the position of the completed report zones BIO and the unchanged position and size of the original report zone BIO. With this value calculated, the start sector and write pointer position of the target zones can be correctly remapped. Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-09cxgb4: Add thermal zone supportGanesh Goudar
Add thermal zone support to monitor ASIC's temperature. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09net/mlx4_en: Use minimal rx and tx ring sizes on kdump kernelAlaa Hleihel
When memory is limited (on kdump kernel), reduce size of rx and tx rings. Also reduce the number of rx rings. Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09dm cache: destroy migration_cache if cache target registration failedShenghui Wang
Commit 7e6358d244e47 ("dm: fix various targets to dm_register_target after module __init resources created") inadvertently introduced this bug when it moved dm_register_target() after the call to KMEM_CACHE(). Fixes: 7e6358d244e47 ("dm: fix various targets to dm_register_target after module __init resources created") Cc: stable@vger.kernel.org Signed-off-by: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
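A hedged sketch of the corresponding error path in the cache target's module init code (the error message and flow are illustrative):

	r = dm_register_target(&cache_target);
	if (r) {
		DMERR("cache target register failed %d", r);
		kmem_cache_destroy(migration_cache);
		return r;
	}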
2018-10-09Merge branch 'ena-fixes'David S. Miller
Arthur Kiyanovski says:

====================
minor bug fixes for ENA Ethernet driver

Arthur Kiyanovski (4):
  net: ena: fix warning in rmmod caused by double iounmap
  net: ena: fix rare bug when failed restart/resume is followed by driver removal
  net: ena: fix NULL dereference due to untimely napi initialization
  net: ena: fix auto casting to boolean
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09net: ena: fix auto casting to booleanArthur Kiyanovski
Eliminate potential auto casting compilation error. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09net: ena: fix NULL dereference due to untimely napi initializationArthur Kiyanovski
napi poll functions should be initialized before running request_irq(), to handle a rare condition where there is a pending interrupt and the ISR fires immediately, before the poll function has been set, causing a NULL dereference.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09net: ena: fix rare bug when failed restart/resume is followed by driver removalArthur Kiyanovski
In a rare scenario when ena_device_restore() fails, followed by device remove, an FLR will not be issued. In this case, the device will keep sending asynchronous AENQ keep-alive events, even after driver removal, leading to memory corruption. Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09net: ena: fix warning in rmmod caused by double iounmapArthur Kiyanovski
Memory mapped with devm_ioremap is automatically freed when the driver is disconnected from the device. Therefore there is no need to explicitly call devm_iounmap. Fixes: 0857d92f71b6 ("net: ena: add missing unmap bars on device removal") Fixes: 411838e7b41c ("net: ena: fix rare kernel crash when bar memory remap fails") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09MIPS: Provide actually relaxed MMIO accessorsMaciej W. Rozycki
Improve performance for the relevant systems and remove the DMA ordering barrier from `readX_relaxed' and `writeX_relaxed' MMIO accessors, where it is not needed according to our requirements[1]. For consistency make the same arrangement with low-level port I/O accessors, but do not actually provide any accessors making use of it. References: [1] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt, Section "KERNEL I/O BARRIER EFFECTS" Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20865/ Cc: Ralf Baechle <ralf@linux-mips.org>
2018-10-09MIPS: Enforce strong ordering for MMIO accessorsMaciej W. Rozycki
Architecturally the MIPS ISA does not specify ordering requirements for uncached bus accesses such as MMIO operations normally use and therefore explicit barriers have to be inserted between MMIO accesses where unspecified ordering of operations would cause unpredictable results.

For example the R2020 write buffer implements write gathering and combining[1] and as used with the DECstation models 2100 and 3100 for MMIO accesses it bypasses the read buffer entirely, because conflicts are resolved by the memory controller for DRAM accesses only[2] (NB the R2020 and R3020 buffers are the same except for the maximum clock rate).

Consequently if a device has say a 16-bit control register at offset 0, a 16-bit event mask register at offset 2 and a 16-bit reset register at offset 4, and the initial value of the control register is 0x1111, then in the absence of barriers a hypothetical code sequence like this:

u16 init_dev(u16 __iomem *dev)
{
	u16 x;

	write16(dev + 2, 0xffff);
	write16(dev + 0, 0x2222);
	x = read16(dev + 0);
	write16(dev + 1, 0x3333);
	write16(dev + 0, 0x4444);
	return x;
}

will return 0x1111 and issue a single 32-bit write of 0x33334444 (in the little-endian bus configuration) to offset 0 on the system bus.

This is because the read to set `x' from offset 0 bypasses the write of 0x2222 that is still in the write buffer pending the completion of the write of 0xffff to the reset register. Then the write of 0x3333 to the event mask register is merged with the preceding write to the control register as they share the same word address, making it a 32-bit write of 0x33332222 to offset 0. Finally the write of 0x4444 to the control register is combined with the outstanding 32-bit write of 0x33332222 to offset 0, because, again, it shares the same address.

This is an example from a legacy system, given here because it is well documented and affects a machine we actually support. But likewise modern MIPS systems may implement weak MMIO ordering, possibly even without having it clearly documented except for being compliant with the architecture specification with respect to the currently defined SYNC instruction variants[3].

Considering the above and that we are required to implement MMIO accessors such that individual accesses made with them are strongly ordered with respect to each other[4], add the necessary barriers to our `inX', `outX', `readX' and `writeX' handlers, as well as the associated special use variants.

It's up to platforms then to possibly define the respective barriers so as to expand to nil if no ordering enforcement is actually needed for a given system; SYNC is supposed to be as cheap as a NOP on strongly ordered MIPS implementations though.

Retain the option to generate weakly-ordered accessors, so that the arrangement for `war_io_reorder_wmb' is not lost in case we need it for fully raw accessors in the future. The reason for this is that it is unclear from commit 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT") and especially commit 8faca49a6731 ("MIPS: Modify core io.h macros to account for the Octeon Errata Core-301.") why they are needed there under the previous assumption that these accessors can be weakly ordered.

References:

[1] "LR3020 Write Buffer", LSI Logic Corporation, September 1988, Section "Byte Gathering", pp. 6-7

[2] "DECstation 3100 Desktop Workstation Functional Specification", Digital Equipment Corporation, Revision 1.3, August 28, 1990, Section 6.1 "Processor", p. 4

[3] "MIPS Architecture For Programmers, Volume II-A: The MIPS32 Instruction Set Manual", Imagination Technologies LTD, Document Number: MD00086, Revision 6.06, December 15, 2016, Table 5.5 "Encodings of the Bits[10:6] of the SYNC instruction; the SType Field", p. 409

[4] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt, Section "KERNEL I/O BARRIER EFFECTS"

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
References: 8faca49a6731 ("MIPS: Modify core io.h macros to account for the Octeon Errata Core-301.")
References: 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT")
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20864/
Cc: Ralf Baechle <ralf@linux-mips.org>
2018-10-09MIPS: Correct `mmiowb' barrier for `wbflush' platformsMaciej W. Rozycki
Redefine `mmiowb' in terms of `iobarrier_w' so that it works correctly for MIPS I platforms, which have no SYNC machine instruction and use a call to `wbflush' instead. This doesn't change the semantics for CONFIG_CPU_CAVIUM_OCTEON, because `iobarrier_w' expands to `wmb', which is ultimately the same as the current arrangement. For MIPS I platforms this not only makes any code that would happen to use `mmiowb' build and run, but it actually enforces the ordering required as well, as `iobarrier_w' has it already covered with the use of `wmb'. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20863/ Cc: Ralf Baechle <ralf@linux-mips.org>
2018-10-09MIPS: Define MMIO ordering barriersMaciej W. Rozycki
Define MMIO ordering barriers as separate operations so as to allow making places where such a barrier is required distinct from places where a memory or a DMA barrier is needed.

Architecturally MIPS does not specify ordering requirements for uncached bus accesses such as MMIO operations normally use and therefore explicit barriers have to be inserted between MMIO accesses where unspecified ordering of operations would cause unpredictable results.

MIPS MMIO ordering barriers are implemented using the same underlying mechanism that memory or DMA ordering barriers use, that is either a suitable SYNC instruction or a platform-specific `wbflush' call. However platforms may implement different ordering rules for different kinds of bus activity, so having a separate API makes it possible to remove unnecessary barriers and avoid the performance hit they may cause due to unrelated bus activity, by making their implementation expand to nil while keeping the necessary ones.

Also, having distinct barriers for each kind of use makes it easier for the reader to understand what the code has been intended to do.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20862/
Cc: Ralf Baechle <ralf@linux-mips.org>
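As a hedged sketch, such ordering barriers plausibly reduce to the existing memory barrier primitives unless a platform overrides them (the exact mapping below is an assumption, not a quote of the patch):

	/* sketch: MMIO ordering barriers in terms of existing primitives */
	#define iobarrier_rw()	mb()	/* order MMIO reads and writes */
	#define iobarrier_r()	rmb()	/* order MMIO reads */
	#define iobarrier_w()	wmb()	/* order MMIO writes */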
2018-10-09MIPS: mscc: add PCB120 to the ocelot fitImageQuentin Schulz
PCB120 and PCB123 are both development boards based on Microsemi Ocelot so let's use the same fitImage for both. Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20871/ Cc: ralf@linux-mips.org Cc: jhogan@kernel.org Cc: robh+dt@kernel.org Cc: mark.rutland@arm.com Cc: davem@davemloft.net Cc: andrew@lunn.ch Cc: f.fainelli@gmail.com Cc: allan.nielsen@microchip.com Cc: linux-mips@linux-mips.org Cc: devicetree@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: thomas.petazzoni@bootlin.com Cc: antoine.tenart@bootlin.com
2018-10-09MIPS: mscc: add DT for Ocelot PCB120Quentin Schulz
The Ocelot PCB120 evaluation board is different from the PCB123 in that it has 4 external VSC8584 (or VSC8574) PHYs. It uses the SoC's second MDIO bus for external PHYs which have a reversed address on the bus (i.e. PHY4 is on address 3, PHY5 is on address 2, PHY6 on 1 and PHY7 on 0).

Here is how the PHYs are connected to the switch ports:

port 0: phy0 (internal)
port 1: phy1 (internal)
port 2: phy2 (internal)
port 3: phy3 (internal)
port 4: phy7
port 5: phy4
port 6: phy6
port 9: phy5

Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20869/
Cc: ralf@linux-mips.org
Cc: jhogan@kernel.org
Cc: robh+dt@kernel.org
Cc: mark.rutland@arm.com
Cc: davem@davemloft.net
Cc: andrew@lunn.ch
Cc: f.fainelli@gmail.com
Cc: allan.nielsen@microchip.com
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: thomas.petazzoni@bootlin.com
Cc: antoine.tenart@bootlin.com
2018-10-09MIPS: memset: Limit excessive `noreorder' assembly mode useMaciej W. Rozycki
Rewrite to use the `reorder' assembly mode and remove manually scheduled delay slots except where GAS cannot schedule a delay-slot instruction due to a data dependency or a section switch (as is the case with the EX macro). No change in machine code produced. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> [paul.burton@mips.com: Fix conflict with commit 932afdeec18b ("MIPS: Add Kconfig variable for CPUs with unaligned load/store instructions")] Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20834/ Cc: Ralf Baechle <ralf@linux-mips.org>
2018-10-09EDAC, skx_edac: Fix logical channel intermediate decodingQiuxu Zhuo
The code "lchan = (lchan << 1) | ~lchan" for logical channel intermediate decoding is wrong. The wrong intermediate decoding result is {0xffffffff, 0xfffffffe}. Fix it by replacing '~' with '!'. The correct intermediate decoding result is {0x1, 0x2}. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Aristeu Rozanski <aris@redhat.com> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20181009172025.18594-1-tony.luck@intel.com
2018-10-09MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regressionMaciej W. Rozycki
Fix a commit 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset") regression and remove assembly warnings:

arch/mips/lib/memset.S: Assembler messages:
arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot

triggering with the CPU_DADDI_WORKAROUNDS option set and this code:

	PTR_SUBU	a2, t1, a0
	jr		ra
	 PTR_ADDIU	a2, 1

This is because with that option in place the DADDIU instruction, which the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which in turn expands to an LI/DADDU (or actually ADDIU/DADDU) sequence:

	13c:	01a4302f	dsubu	a2,t1,a0
	140:	03e00008	jr	ra
	144:	24010001	li	at,1
	148:	00c1302d	daddu	a2,a2,at
	...

Correct this by switching off the `noreorder' assembly mode and letting GAS schedule this jump's delay slot, as there is nothing special about it that would require manual scheduling. With this change in place correct code is produced:

	13c:	01a4302f	dsubu	a2,t1,a0
	140:	24010001	li	at,1
	144:	03e00008	jr	ra
	148:	00c1302d	daddu	a2,a2,at
	...

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset")
Patchwork: https://patchwork.linux-mips.org/patch/20833/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: stable@vger.kernel.org # 4.17+
2018-10-09Merge tag 'kvmarm-fixes-for-4.19-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm fixes for 4.19, take #2 - Correctly order GICv3 SGI registers in the cp15 array
2018-10-09KVM: x86: support CONFIG_KVM_AMD=y with CONFIG_CRYPTO_DEV_CCP_DD=mPaolo Bonzini
SEV requires access to the AMD cryptographic device APIs, and this does not work when KVM is builtin and the crypto driver is a module. Actually the Kconfig conditions for CONFIG_KVM_AMD_SEV try to disable SEV in that case, but it does not work because the actual crypto calls are not culled, only sev_hardware_setup() is. This patch adds two CONFIG_KVM_AMD_SEV checks that gate all the remaining SEV code; it fixes this particular configuration, and drops 5 KiB of code when CONFIG_KVM_AMD_SEV=n. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
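A hedged sketch of one way to express the gating (the helper name and the max_sev_asid variable are assumptions, not quoted from the patch):

	/* sketch: every remaining SEV path is guarded by this predicate,
	 * so the code compiles away when CONFIG_KVM_AMD_SEV=n */
	static inline bool svm_sev_enabled(void)
	{
		return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
	}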
2018-10-09gfs2: Fix iomap buffered write support for journaled filesAndreas Gruenbacher
Commit 64bc06bb32ee broke buffered writes to journaled files (chattr +j): we'll try to journal the buffer heads of the page being written to in gfs2_iomap_journaled_page_done. However, the iomap code no longer creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix that by creating the buffer heads ourselves when needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
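A hedged sketch of the kind of fallback this describes (the call site is paraphrased; the buffer state bits passed to create_empty_buffers() are omitted for brevity):

	/* sketch: iomap no longer attaches buffer heads, so create them
	 * before journaling the page's buffers */
	if (!page_has_buffers(page))
		create_empty_buffers(page, inode->i_sb->s_blocksize, 0);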
2018-10-09arm64: mm: Drop the unused cpu parameterShaokun Zhang
The cpu parameter is never used in flush_context(); remove it.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-10-09resource: Clean it up a bitBorislav Petkov
- Drop BUG_ON()s and do normal error handling instead, in find_next_iomem_res().
- Align function arguments on opening braces.
- Get rid of local var sibling_only in find_next_iomem_res().
- Shorten unnecessarily long first_level_children_only arg name.

Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Brijesh Singh <brijesh.singh@amd.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Lianbo Jiang <lijiang@redhat.com>
CC: Takashi Iwai <tiwai@suse.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
CC: bhe@redhat.com
CC: dan.j.williams@intel.com
CC: dyoung@redhat.com
CC: kexec@lists.infradead.org
CC: mingo@redhat.com
Link: <new submission>
2018-10-09resource: Fix find_next_iomem_res() iteration issueBjorn Helgaas
Previously find_next_iomem_res() used "*res" as both an input parameter for the range to search and the type of resource to search for, and an output parameter for the resource we found, which makes the interface confusing. The current callers use find_next_iomem_res() incorrectly because they allocate a single struct resource and use it for repeated calls to find_next_iomem_res(). When find_next_iomem_res() returns a resource, it overwrites the start, end, flags, and desc members of the struct. If we call find_next_iomem_res() again, we must update or restore these fields. The previous code restored res.start and res.end, but not res.flags or res.desc. Since the callers did not restore res.flags, if they searched for flags IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would incorrectly skip resources unless they were also marked as IORESOURCE_SYSRAM. Fix this by restructuring the interface so it takes explicit "start, end, flags" parameters and uses "*res" only as an output parameter. Based on a patch by Lianbo Jiang <lijiang@redhat.com>. [ bp: While at it: - make comments kernel-doc style. - Originally-by: http://lore.kernel.org/lkml/20180921073211.20097-2-lijiang@redhat.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812916.1157.177580438135143788.stgit@bhelgaas-glaptop.roam.corp.google.com
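A hedged sketch of the reshaped interface this describes (the desc/first_lvl parameters and all names are assumptions; only the explicit "start, end, flags" inputs and the output-only *res come from the text):

	/* sketch: search [start, end] for a resource matching flags/desc;
	 * "*res" is used purely as an output parameter */
	static int find_next_iomem_res(resource_size_t start, resource_size_t end,
				       unsigned long flags, unsigned long desc,
				       bool first_lvl, struct resource *res);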
2018-10-09resource: Include resource end in walk_*() interfacesBjorn Helgaas
find_next_iomem_res() finds an iomem resource that covers part of a range described by "start, end". All callers expect that range to be inclusive, i.e., both start and end are included, but find_next_iomem_res() doesn't handle the end address correctly.

If it finds an iomem resource that contains exactly the end address, it skips it, e.g., if "start, end" is [0x0-0x10000] and there happens to be an iomem resource [mem 0x10000-0x10000] (the single byte at 0x10000), we skip it:

  find_next_iomem_res(...)
  {
	start = 0x0;
	end = 0x10000;

	for (p = next_resource(...)) {
		# p->start = 0x10000;
		# p->end = 0x10000;

		# we *should* return this resource, but this condition is false:
		if ((p->end >= start) && (p->start < end))
			break;

Adjust find_next_iomem_res() so it allows a resource that includes the single byte at the end of the range. This is a corner case that we probably don't see in practice.

Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Brijesh Singh <brijesh.singh@amd.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Lianbo Jiang <lijiang@redhat.com>
CC: Takashi Iwai <tiwai@suse.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
CC: bhe@redhat.com
CC: dan.j.williams@intel.com
CC: dyoung@redhat.com
CC: kexec@lists.infradead.org
CC: mingo@redhat.com
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.stgit@bhelgaas-glaptop.roam.corp.google.com
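Given the comparison quoted above, the fix plausibly amounts to making the end check inclusive; a hedged one-line sketch (not a quote of the patch):

	/* sketch: also accept a resource that starts exactly at "end" */
	if ((p->end >= start) && (p->start <= end))
		break;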
2018-10-09x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one errorBjorn Helgaas
The only use of KEXEC_BACKUP_SRC_END is as an argument to walk_system_ram_res():

  int crash_load_segments(struct kimage *image)
  {
	...
	walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
			    image, determine_backup_region);

walk_system_ram_res() expects "start, end" arguments that are inclusive, i.e., the range to be walked includes both the start and end addresses.

KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the first address *past* the desired 0-640KB range. Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC region is [0-0x9ffff], not [0-0xa0000].

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Brijesh Singh <brijesh.singh@amd.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Lianbo Jiang <lijiang@redhat.com>
CC: Takashi Iwai <tiwai@suse.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: baiyaowei@cmss.chinamobile.com
CC: bhe@redhat.com
CC: dan.j.williams@intel.com
CC: dyoung@redhat.com
CC: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/153805811578.1157.6948388946904655969.stgit@bhelgaas-glaptop.roam.corp.google.com
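The change this implies, written as a hedged sketch of the header definitions (the START value is an assumption):

	/* sketch: make the end address inclusive, to match what
	 * walk_system_ram_res() expects */
	#define KEXEC_BACKUP_SRC_START	(0UL)
	#define KEXEC_BACKUP_SRC_END	(640 * 1024UL - 1)	/* 640K */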
2018-10-09x86/mm: Remove spurious fault pkey checkDave Hansen
Spurious faults only ever occur in the kernel's address space. They are also constrained specifically to faults with one of these error codes:

	X86_PF_WRITE | X86_PF_PROT
	X86_PF_INSTR | X86_PF_PROT

So, it's never even possible to reach spurious_kernel_fault_check() with X86_PF_PK set.

In addition, the kernel's address space never has pages with user-mode protections. Protection Keys are only enforced on pages with user-mode protection.

This gives us lots of reasons to not check for protection keys in our spurious kernel fault handling. But, let's also add some warnings to ensure that these assumptions about protection keys hold true.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160231.243A0D6A@viggo.jf.intel.com
2018-10-09x86/mm/vsyscall: Consider vsyscall page part of user address spaceDave Hansen
The vsyscall page is weird. It is in what is traditionally part of the kernel address space. But, it has user permissions and we handle faults on it like we would on a user page: interrupts on. Right now, we handle vsyscall emulation in the "bad_area" code, which is used for both user-address-space and kernel-address-space faults. Move the handling to the user-address-space code *only* and ensure we get there by "excluding" the vsyscall page from the kernel address space via a check in fault_in_kernel_space(). Since the fault_in_kernel_space() check is used on 32-bit, also add a 64-bit check to make it clear we only use this path on 64-bit. Also move the unlikely() to be in is_vsyscall_vaddr() itself. This helps clean up the kernel fault handling path by removing a case that can happen in normal[1] operation. (Yeah, yeah, we can argue about the vsyscall page being "normal" or not.) This also makes sanity checks easier, like the "we never take pkey faults in the kernel address space" check in the next patch. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160230.6E9336EE@viggo.jf.intel.com
2018-10-09x86/mm: Add vsyscall address helperDave Hansen
We will shortly be using this check in two locations. Put it in a helper before we do so. Let's also insert PAGE_MASK instead of the open-coded ~0xfff. It is easier to read and also more obviously correct considering the implicit type conversion that has to happen when it is not an implicit 'unsigned long'. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160228.C593509B@viggo.jf.intel.com
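A hedged sketch of such a helper (the exact form in the patch may differ; VSYSCALL_ADDR is the existing fixed vsyscall address):

	/* sketch: is this fault address inside the vsyscall page? */
	static bool is_vsyscall_vaddr(unsigned long vaddr)
	{
		return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
	}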
2018-10-09x86/mm: Fix exception table commentsDave Hansen
The comments here are wrong. They are too absolute about where faults can occur when running in the kernel. The comments are also a bit hard to match up with the code. Trim down the comments, and make them more precise. Also add a comment explaining why we are doing the bad_area_nosemaphore() path here. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160227.077DDD7A@viggo.jf.intel.com
2018-10-09x86/mm: Add clarifying comments for user addr spaceDave Hansen
The SMAP and Reserved checking do not have nice comments. Add some to clarify and make it match everything else. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160225.FFD44B8D@viggo.jf.intel.com
2018-10-09x86/mm: Break out user address space handlingDave Hansen
The last patch broke out kernel address space handing into its own helper. Now, do the same for user address space handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
2018-10-09x86/mm: Break out kernel address space handlingDave Hansen
The page fault handler (__do_page_fault()) basically has two sections: one for handling faults in the kernel portion of the address space and another for faults in the user portion of the address space. But, these two parts don't stick out that well. Let's make that more clear from code separation and naming. Pull kernel fault handling into its own helper, and reflect that naming by renaming spurious_fault() -> spurious_kernel_fault(). Also, rewrite the vmalloc() handling comment a bit. It was a bit stale and also glossed over the reserved bit handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160222.401F4E10@viggo.jf.intel.com
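A hedged sketch of the resulting top-level dispatch (the helper names are assumptions based on the description, not quoted from the patch):

	/* sketch: __do_page_fault() split by address space */
	if (unlikely(fault_in_kernel_space(address))) {
		do_kern_addr_fault(regs, error_code, address);
		return;
	}
	do_user_addr_fault(regs, error_code, address);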
2018-10-09x86/mm: Clarify hardware vs. software "error_code"Dave Hansen
We pass around a variable called "error_code" all around the page fault code. Sounds simple enough, especially since "error_code" looks like it exactly matches the values that the hardware gives us on the stack to report the page fault error code (PFEC in SDM parlance). But, that's not how it works. For part of the page fault handler, "error_code" does exactly match PFEC. But, during later parts, it diverges and starts to mean something a bit different. Give it two names for its two jobs. The place it diverges is also really screwy. It's only in a spot where the hardware tells us we have kernel-mode access that occurred while we were in usermode accessing user-controlled address space. Add a warning in there. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160220.4A2272C9@viggo.jf.intel.com
2018-10-09x86/mm/tlb: Make lazy TLB mode lazierRik van Riel
Lazy TLB mode can result in an idle CPU being woken up by a TLB flush, when all it really needs to do is reload %CR3 at the next context switch, assuming no page table pages got freed. Memory ordering is used to prevent race conditions between switch_mm_irqs_off, which checks whether .tlb_gen changed, and the TLB invalidation code, which increments .tlb_gen whenever page table entries get invalidated. The atomic increment in inc_mm_tlb_gen is its own barrier; the context switch code adds an explicit barrier between reading tlbstate.is_lazy and next->context.tlb_gen. CPUs in lazy TLB mode remain part of the mm_cpumask(mm), both because that allows TLB flush IPIs to be sent at page table freeing time, and because the cache line bouncing on the mm_cpumask(mm) was responsible for about half the CPU use in switch_mm_irqs_off(). We can change native_flush_tlb_others() without touching other (paravirt) implementations of flush_tlb_others() because we'll be flushing less. The existing implementations flush more and are therefore still correct. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-8-riel@surriel.com
2018-10-09x86/mm/tlb: Add freed_tables element to flush_tlb_infoRik van Riel
Pass the information on to native_flush_tlb_others. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-7-riel@surriel.com