summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-01-12lightnvm: introduce ioctl to initialize deviceMatias Bjørling
Based on the previous patch, we now introduce an ioctl to initialize the device using nvm_init_sysblock and create the necessary system blocks. The user may specify the media manager that they wish to instantiate on top. Default from user-space will be "gennvm". Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: core on-disk initializationMatias Bjørling
An Open-Channel SSD shall be initialized before use. To initialize, we define an on-disk format, that keeps a small set of metadata to bring up the media manager on top of the device. The initial step is introduced to allow a user to format the disks for a given media manager. During format, a system block is stored on one to three separate luns on the device. Each lun has the system block duplicated. During initialization, the system block can be retrieved and the appropriate media manager can initialized. The on-disk format currently covers (struct nvm_system_block): - Magic value "NVMS". - Monotonic increasing sequence number. - The physical block erase count. - Version of the system block format. - Media manager type. - Media manager superblock physical address. The interface provides three functions to manage the system block: int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *) int nvm_get_sysblock(struct nvm *dev, struct nvm_sb_info *) int nvm_update_sysblock(struct nvm *dev, struct nvm_sb_info *) Each implement a part of the logic to manage the system block. The initialization creates the first system blocks and mark them on the device. Get retrieves the latest system block by scanning all pages in the associated system blocks. The update sysblock writes new metadata and allocates new block if necessary. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: introduce mlc lower page table mappingsMatias Bjørling
NAND MLC memories have both lower and upper pages. When programming, both of these must be written, before data can be read. However, these lower and upper pages might not placed at even and odd flash pages, but can be skipped. Therefore each flash memory has its lower pages defined, which can then be used when programming and to know when padding are necessary. This patch implements the lower page definition in the specification, and exposes it through a simple lookup table at dev->lptbl. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: add mccap supportMatias Bjørling
Some flash media has extended capabilities, such as programming SLC pages on MLC/TLC flash, erase/program suspend, scramble and encryption. MCCAP is introduced to detect support for these capabilities in the command set. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: manage open and closed blocks separatelyJavier González
LightNVM targets need to know the state of the flash block when doing flash optimizations. An example is implementing a write buffer to respect the flash page size. Currently, block state is not accounted for; the media manager only differentiates among free, bad and in-use blocks. This patch adds the logic in the generic media manager to enable targets manage blocks into open and close separately, and it implements such management in rrpc. It also adds a set of flags to describe the state of the block (open, closed, free, bad). In order to avoid taking two locks (nvm_lun and rrpc_lun) consecutively, we introduce lockless get_/put_block primitives so that the open and close list locks and future common logic is handled within the nvm_lun lock. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: fix missing grown bad block typeMatias Bjørling
The get/set bad block interface defines good block, factory bad block, grown bad block, device reserved block, and host reserved block. Unfortunately the grown bad block was missing, leaving the offsets wrong for device and host side reserved blocks. This patch adds the missing type and corrects the offsets. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: reference rrpc lun in rrpc blockJavier González
Currently, a rrpc block only points to its nvm_lun. If a user wants to find the associated rrpc lun, it will have to calculate the index and look it up manually. By referencing the rrpc lun directly, this step can be omitted, at the cost of a larger memory footprint. This is important for upcoming patches that implement write buffering in rrpc. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: introduce nvm_submit_ppaMatias Bjørling
Internal logic for both core and media managers, does not have a backing bio for issuing I/Os. Introduce nvm_submit_ppa to allow raw I/Os to be submitted to the underlying device driver. The function request the device, ppa, data buffer and its length and will submit the I/O synchronously to the device. The return value may therefore be used to detect any errors regarding the issued I/O. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: move rq->error to nvm_rq->errorMatias Bjørling
Instead of passing request error into the LightNVM modules, incorporate it into the nvm_rq. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: support multiple ppas in nvm_erase_ppaMatias Bjørling
Sometimes a user want to erase multiple PPAs at the same time. Extend nvm_erase_ppa to take multiple ppas and number of ppas to be erased. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: move the pages per block check out of the loopWenwei Tao
There is no need to check whether dev's pages per block is beyond rrpc support every time we init a lun, we only need to check it once before enter the lun init loop. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: sectors first in ppa listMatias Bjørling
The Westlake controller requires that the PPA list has sectors defined sequentially. Currently, the PPA list is created with planes first, then sectors. Change this to sectors first, then planes. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: fix locking and mempool in rrpc_lun_gcWenwei Tao
This patch fix two issues in rrpc_lun_gc 1. prio_list is protected by rrpc_lun's lock not nvm_lun's, so acquire rlun's lock instead of lun's before operate on the list. 2. we delete block from prio_list before allocating gcb, but gcb allocation may fail, we end without putting it back to the list, this makes the block won't get reclaimed in the future. To solve this issue, delete block after gcb allocation. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: put block back to gc list on its reclaim failWenwei Tao
We delete a block from the gc list before reclaim it, so put it back to the list on its reclaim fail, otherwise this block will not get reclaimed and be programmable in the future. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: check bi_error in gcWenwei Tao
We should check last io completion status before starting another one. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: return the get_bb_tbl return valueMatias Bjørling
During get_bb_tbl, a callback is used to allow an user-specific scan function to be called. The callback may return an error, and in that case, the return value is overridden. However, the callback error is needed when the fault is a user error and not a kernel error. For example, when a user tries to initialize the same device twice. The get_bb_tbl callback should be able to communicate this. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: refactor end_io functions for syncMatias Bjørling
To implement sync I/O support within the LightNVM core, the end_io functions are refactored to take an end_io function pointer instead of testing for initialized media manager, followed by calling its end_io function. Sync I/O can then be implemented using a callback that signal I/O completion. This is similar to the logic found in blk_to_execute_io(). By implementing it this way, the underlying device I/Os submission logic is abstracted away from core, targets, and media managers. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: refactor rqd ppa list into set/freeMatias Bjørling
A device may be driven in single, double or quad plane mode. In that case, the rqd must have either one, two, or four PPAs set for a single PPA sent to the device. Refactor this logic into their own functions to be shared by program/erase/read in the core. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: move ppa erase logic to coreMatias Bjørling
A device may function in single, dual or quad plane mode. The gennvm media manager manages this with explicit helpers. They convert a single ppa to 1, 2 or 4 separate ppas in a ppa list. To aid implementation of recovery and system blocks, this functionality can be moved directly into the core. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: unlock rq and free ppa_list on submission failWenwei Tao
When rrpc_write_ppalist_rq and rrpc_read_ppalist_rq succeed, we setup rq correctly, but nvm_submit_io may afterward fail since it cannot allocate request or nvme_nvm_command, we return error but forget to cleanup the previous work. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: add check after mempool allocationJavier Gonzalez
The mempool allocation might fail. Make sure to return error when it does, instead of causing a kernel panic. Signed-off-by: Javier Gonzalez <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: fix incorrect nr_free_blocks statChao Yu
When initing bad block list in gennvm_block_bb, once we move bad block from free_list to bb_list, we should maintain both stat info nr_free_blocks and nr_bad_blocks. So this patch fixes to add missing operation related to nr_free_blocks. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12lightnvm: fix bio submission issueWenwei Tao
Put bio when submission fails, since we get it before submission. And return error when backend device driver doesn't provide a submit_io method, thus we can end IO properly. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-13drm/exynos: fix kernel panic issue at drm releasingInki Dae
This patch fixes a kernel panic issue which happened when drm driver is closed while modetest. This issue could be reproduced easily by launching modetest with page flip repeatedly. The reason is that invalid drm_file object could be accessed by send_vblank_event function when finishing page flip if the drm_file object was removed by drm_release and there was a pended page flip event which was already committed to hardware. So this patch makes the pended page flip event to be cancelled by preclose callback which is called at front of drm_release function. Changelog v2: - free vblank event objects belonging to the request process, increment event space and decrease pending_update when cancelling the event Signed-off-by: Inki Dae <inki.dae@samsung.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Acked-by: Daniel Vetter <daniel@ffwll.ch>
2016-01-13drm/exynos: crtc: do not wait for the scanout completionInki Dae
This patch removes exynos_drm_crtc_complete_scanout function call which makes sure for overlay data to be updated to real hardware when drm driver is released. With atomic modeset support, it doesn't need the funtion anymore because atomic modeset interface makes sure that. Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: mixer: properly update all planes on the same vblank eventMarek Szyprowski
This patch also moves mixer_vsync_set_update() to newly introduced mixer_atomic_begin/flush callbacks. This ensures that all mixer planes will be updated on the same vsync event. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: crtc: rework atomic_{begin,flush}Marek Szyprowski
Some CRTC drivers (like Exynos DRM Mixer) can handle blocking register updates only on per-device level, not per-plane level. This patch changes exynos_crts atomic_begin/atomic_flush callbacks to handle the entire crtc, instead of given planes, so driver can handle both cases on their own. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: mixer: unify a check for video-processor windowMarek Szyprowski
Always use macro instead of hard-coded '2' value in conditions related to video processor window. Additional checks are not needed, because video layer is registered only when video processor is available. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: mixer: also allow ARGB1555 and ARGB4444Tobias Jakobi
Allow the remaining alpha formats now that blending is properly setup. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: mixer: refactor layer setupMarek Szyprowski
Properly configure blending properties of given hardware layer based on the selected pixel format. Currently only per-pixel-based alpha is possible when respective pixel format has been selected. Configuration of global, per-plane alpha value, color key and background color will be added later. This patch is heavily inspired by earlier work done by Tobias Jakobi <tjakobi@math.uni-bielefeld.de>. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: mixer: remove all static blending setupTobias Jakobi
Previously blending setup was static and most of it was done in mixer_win_reset(). Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: mixer: set window priority based on zposMarek Szyprowski
'zpos' plane property is configurable, so adjust hardware layers priority based on the zpos value. 'zpos' value shifted by one can be used directly as hw priority value and stored to the registers, because mixer accepts priority values from 1 to 15 (0 means that layer is disabled). This patch also changes the default layer priority to match already exposed initial zpos values. The initial configuration is now: [top] video > gfx layer1 > gfx layer0 [bottom]. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: make zpos property configurableMarek Szyprowski
This patch adds all infrastructure to make zpos plane property configurable from userspace. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-13drm/exynos: rename zpos to indexMarek Szyprowski
This patch renames zpos entry to index, because in most places it is used as index for selecting hardware layer/window instead of configurable layer position. This will later enable to make the zpos property configurable. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-01-12kbuild: Demote 'sign-compare' warning to W=2Lee Jones
Ideally, a kernel compile with W=1 enabled should complete cleanly; however, when we run one currently we are presented with ~25k warnings. 'sign-compare' accounts for ~22k of those ~25k. In this patch we're demoting 'sign-compare' warnings to W=2, with a view to fixing the remaining 3k W=1 warnings required for a clean build. Arnd adds: "As per our discussion, I'd add that this was inadvertedly introduced by Behan when he moved the clang specific warnings into an ifdef block and did not notice that -Wsign-compare was interpreted by both gcc and clang. Earlier, it was introduced in just the same way by Jan-Simon as part of 3d3d6b847420 ("kbuild: LLVMLinux: Adapt warnings for compilation with clang")." Acked-by: Arnd Bergmann <arnd@arndb.de> Fixes: 26ea6bb1fef0 ("kbuild, LLVMLinux: Supress warnings unless W=1-3") Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Michal Marek <mmarek@suse.com>
2016-01-12perf tools: Fix mmap2 event allocation in synthesize codeWang Nan
perf_event__synthesize_mmap_events() issues mmap2 events, but the memory of that event is allocated using: mmap_event = malloc(sizeof(mmap_event->mmap) + machine->id_hdr_size); If path of mmap source file is long (near PATH_MAX), random crash would happen. Should use sizeof(mmap_event->mmap2). Fix two memory allocations. Signed-off-by: Wang Nan <wangnan0@huawei.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: He Kuang <hekuang@huawei.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1452593524-138970-1-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-12perf stat: Fix recort_usage typoJiri Olsa
Markus reported gcc 6 complains when compiling perf stat command: builtin-stat.c:1591:27: error: ‘recort_usage’ defined but not used [-Werror=unused-const-variable] static const char * const recort_usage[] = { ^~~~~~~~~~~~ I fixed the typo and realized we already export record_usage, so I also prefixed it with stat (and included report_usage). Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1452591329-27620-1-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-12Merge branch 'devel-stable' into for-linusRussell King
2016-01-12ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnectTakashi Iwai
ALSA PCM may still have a leftover instance after disconnection and it delays its release. The problem is that the PCM close code path of USB-audio driver has a call of snd_usb_autosuspend(). This involves with the call of usb_autopm_put_interface() and it may lead to a kernel Oops due to the NULL object like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000190 IP: [<ffffffff815ae7ef>] usb_autopm_put_interface+0xf/0x30 PGD 0 Call Trace: [<ffffffff8173bd94>] snd_usb_autosuspend+0x14/0x20 [<ffffffff817461bc>] snd_usb_pcm_close.isra.14+0x5c/0x90 [<ffffffff8174621f>] snd_usb_playback_close+0xf/0x20 [<ffffffff816ef58a>] snd_pcm_release_substream.part.36+0x3a/0x90 [<ffffffff816ef6b3>] snd_pcm_release+0xa3/0xb0 [<ffffffff816debb0>] snd_disconnect_release+0xd0/0xe0 [<ffffffff8114d417>] __fput+0x97/0x1d0 [<ffffffff8114d589>] ____fput+0x9/0x10 [<ffffffff8109e452>] task_work_run+0x72/0x90 [<ffffffff81088510>] do_exit+0x280/0xa80 [<ffffffff8108996a>] do_group_exit+0x3a/0xa0 [<ffffffff8109261f>] get_signal+0x1df/0x540 [<ffffffff81040903>] do_signal+0x23/0x620 [<ffffffff8114c128>] ? do_readv_writev+0x128/0x200 [<ffffffff810012e1>] prepare_exit_to_usermode+0x91/0xd0 [<ffffffff810013ba>] syscall_return_slowpath+0x9a/0x120 [<ffffffff817587cd>] ? __sys_recvmsg+0x5d/0x70 [<ffffffff810d2765>] ? ktime_get_ts64+0x45/0xe0 [<ffffffff8115dea0>] ? SyS_poll+0x60/0xf0 [<ffffffff818d2327>] int_ret_from_sys_call+0x25/0x8f We have already a check of disconnection in snd_usb_autoresume(), but the check is missing its counterpart. The fix is just to put the same check in snd_usb_autosuspend(), too. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109431 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-01-12x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]Mario Kleiner
Without the reboot=pci method, the iMac 10,1 simply hangs after printing "Restarting system" at the point when it should reboot. This fixes it. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12lguest: Map switcher text R/ORusty Russell
Pavel noted that lguest maps the switcher code executable and read-write. This is a bad idea for any kernel text, but particularly for text mapped at a fixed address. Create two vmas, one for the text (PAGE_KERNEL_RX) and another for the stacks (PAGE_KERNEL). Use VM_NO_GUARD to map them adjacent (as expected by the rest of the code). Reported-by: Pavel Machek <pavel@ucw.cz> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12x86/boot: Hide local labels in verify_cpu()Borislav Petkov
... from the final ELF image's symbol table as they're not really needed there. Before: $ readelf -a vmlinux | grep verify_cpu 43: ffffffff810001a9 0 NOTYPE LOCAL DEFAULT 1 verify_cpu 45: ffffffff8100028f 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_no_longmode 46: ffffffff810001de 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_noamd 47: ffffffff8100022b 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_check 48: ffffffff8100021c 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_clear_xd 49: ffffffff81000263 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_sse_test 50: ffffffff81000296 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_sse_ok After: $ readelf -a vmlinux | grep verify_cpu 43: ffffffff810001a9 0 NOTYPE LOCAL DEFAULT 1 verify_cpu No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1451860733-21163-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12x86/fpu: Disable AVX when eagerfpu is offyu-cheng yu
When "eagerfpu=off" is given as a command-line input, the kernel should disable AVX support. The Task Switched bit used for lazy context switching does not support AVX. If AVX is enabled without eagerfpu context switching, one task's AVX state could become corrupted or leak to other tasks. This is a bug and has bad security implications. This only affects systems that have AVX/AVX2/AVX512 and this issue will be found only when one actually uses AVX/AVX2/AVX512 _AND_ does eagerfpu=off. Reference: Intel Software Developer's Manual Vol. 3A Sec. 2.5 Control Registers: TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task. Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87 FPU and SSE State When the TS flag is set, the processor monitors the instruction stream for x87 FPU, MMX, SSE instructions. When the processor detects one of these instructions, it raises a device-not-available exeception (#NM) prior to executing the instruction. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12x86/fpu: Disable MPX when eagerfpu is offyu-cheng yu
This issue is a fallout from the command-line parsing move. When "eagerfpu=off" is given as a command-line input, the kernel should disable MPX support. The decision for turning off MPX was made in fpu__init_system_ctx_switch(), which is after the selection of the XSAVE format. This patch fixes it by getting that decision done earlier in fpu__init_system_xstate(). Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12x86/fpu: Disable XGETBV1 when no XSAVEyu-cheng yu
When "noxsave" is given as a command-line input, the kernel should disable XGETBV1. This issue currently does not cause any actual problems. XGETBV1 is only useful if we have something using the 'init optimization' (i.e. xsaveopt, xsaves). We already clear both of those in fpu__xstate_clear_all_cpu_caps(). But this is good for completeness. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12x86/fpu: Fix early FPU command-line parsingyu-cheng yu
The function fpu__init_system() is executed before parse_early_param(). This causes wrong FPU configuration. This patch fixes this issue by parsing boot_command_line in the beginning of fpu__init_system(). With all four patches in this series, each parameter disables features as the following: eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx no387: fpu nofxsr: fxsr, fxsropt, xmm noxsave: xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512, mpx, xgetbv1 noxsaveopt: xsaveopt noxsaves: xsaves Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROLHuaitong Han
vmx_cpuid_tries to update SECONDARY_VM_EXEC_CONTROL in the VMCS, but it will cause a vmwrite error on older CPUs because the code does not check for the presence of CPU_BASED_ACTIVATE_SECONDARY_CONTROLS. This will get rid of the following trace on e.g. Core2 6600: vmwrite error: reg 401e value 10 (err 12) Call Trace: [<ffffffff8116e2b9>] dump_stack+0x40/0x57 [<ffffffffa020b88d>] vmx_cpuid_update+0x5d/0x150 [kvm_intel] [<ffffffffa01d8fdc>] kvm_vcpu_ioctl_set_cpuid2+0x4c/0x70 [kvm] [<ffffffffa01b8363>] kvm_arch_vcpu_ioctl+0x903/0xfa0 [kvm] Fixes: feda805fe7c4ed9cf78158e73b1218752e3b4314 Cc: stable@vger.kernel.org Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-12x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNEDKefeng Wang
Use PAGE_ALIGEND macro in <linux/mm.h> to simplify code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <guohanjun@huawei.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452565170-11083-1-git-send-email-wangkefeng.wang@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12selftests/x86: Disable the ldt_gdt_64 test for nowAndy Lutomirski
ldt_gdt.c relies on cross-cpu invalidation of SS to do one of its tests. On 32-bit builds, this works fine, but on 64-bit builds, it only works if the kernel has proper SS sigcontext handling for 64-bit user programs. Since the SS fixes are currently reverted, restrict the test case to 32 bits for now. In principle, I could change the test to use a different segment register, but it would be messy: CS can't point to the LDT for 64-bit code, and the other registers don't result in immediate faults because they aren't reloaded on kernel -> user transitions. When we fix sigcontext (in 4.6?), we can revert this. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/231591d9122d282402d8f53175134f8db5b3bc73.1452561752.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12x86/mm/pat: Make split_page_count() check for empty levels to fix ↵Dave Jones
/proc/meminfo output In CONFIG_PAGEALLOC_DEBUG=y builds, we disable 2M pages. Unfortunatly when we split up mappings during boot, split_page_count() doesn't take this into account, and starts decrementing an empty direct_pages_count[] level. This results in /proc/meminfo showing crazy things like: DirectMap2M: 18446744073709543424 kB Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>