path: root/arch/x86
Age  Commit message  Author
2019-06-25  x86/resctrl: Cleanup cbm_ensure_valid()  (Reinette Chatre)
A recent fix to the cbm_ensure_valid() function left some coding style issues that are now addressed:

 - Return a value instead of using a function parameter as input and output
 - Use if (!val) instead of if (val == 0)
 - Follow reverse fir tree ordering of variable declarations

Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/15ba03856f1d944468ee6f44e3fd7aa548293ede.1561408280.git.reinette.chatre@intel.com
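For illustration, a hedged sketch of the post-cleanup shape; the body is a simplified reconstruction using the kernel's generic bitmap helpers, not the verbatim upstream function:

	static u32 cbm_ensure_valid(u32 _val, struct rdt_resource *r)
	{
		unsigned int cbm_len = r->cache.cbm_len;	/* longest line first (reverse fir tree) */
		unsigned long first_bit, zero_bit;
		unsigned long val = _val;

		if (!val)					/* was: if (val == 0) */
			return 0;

		first_bit = find_first_bit(&val, cbm_len);
		zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);

		/* Clear any non-consecutive bits so the CBM stays contiguous */
		bitmap_clear(&val, zero_bit, cbm_len - zero_bit);

		return (u32)val;	/* returned, not written back through a pointer */
	}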
2019-06-25  Merge branch 'x86/urgent' into x86/cache  (Thomas Gleixner)
Pick up pending upstream fixes to meet dependencies
2019-06-25  x86/jump_label: Make tp_vec_nr static  (YueHaibing)
Fix sparse warning:

  arch/x86/kernel/jump_label.c:106:5: warning: symbol 'tp_vec_nr' was not declared. Should it be static?

It's only used in jump_label.c, so make it static. Fixes: ba54f0c3f7c4 ("x86/jump_label: Batch jump label updates") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <bp@alien8.de> Cc: <hpa@zytor.com> Cc: <peterz@infradead.org> Cc: <bristot@redhat.com> Cc: <namit@vmware.com> Link: https://lkml.kernel.org/r/20190625034548.26392-1-yuehaibing@huawei.com
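The change itself is one keyword; an illustrative diff of the declaration named in the warning:

	--- a/arch/x86/kernel/jump_label.c
	+++ b/arch/x86/kernel/jump_label.c
	-int tp_vec_nr;
	+static int tp_vec_nr;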
2019-06-24  perf/x86/rapl: Get quirk state from new probe framework  (Jiri Olsa)
Get the apply_quirk bool from the new rapl_model_match array. Because apply_quirk was the last remaining piece of data in rapl_cpu_match, replace that table with rapl_model_match as the device table. This completes the switch to the new perf_msr_probe detection API. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-9-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86/rapl: Get attributes from new probe framework  (Jiri Olsa)
We no longer need the model-specific attribute arrays, because all of this is now detected in rapl_events_attrs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-8-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86/rapl: Get MSR values from new probe framework  (Jiri Olsa)
There's no need for special code to get the bit and MSR value for a given event. We can now easily get it from the rapl_msrs array. Also get rid of RAPL_IDX_*, which is no longer needed, and replace the INTEL_RAPL* enums with PERF_RAPL*. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-7-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86/rapl: Get rapl_cntr_mask from new probe framework  (Jiri Olsa)
Get rapl_cntr_mask from the perf_msr_probe() call, as a replacement for the current intel_rapl_init_fun::cntr_mask value for each model. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-6-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86/rapl: Use new MSR detection interface  (Jiri Olsa)
Use the perf_msr_probe() function to probe for RAPL MSRs. Add a new rapl_model_match device table which gathers event info for a given model, following the MSR and cstate module design. It will replace the current rapl_cpu_match device table and detection code in the following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-5-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86/cstate: Use new probe function  (Jiri Olsa)
Use the perf_msr_probe() function to probe for cstate events. The functionality is the same, with one exception: perf_msr_probe() checks that rdmsr returns a non-zero value for the given MSR. Use the new attribute groups and add the events via pmu::attr_update. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-4-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86/msr: Use new probe function  (Jiri Olsa)
Use the perf_msr_probe() function to probe for msr events. The functionality is the same, with one exception: perf_msr_probe() checks that rdmsr returns a non-zero value for the given MSR. Use the new attribute groups and add the events via pmu::attr_update. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-3-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86: Add MSR probe interface  (Jiri Olsa)
Add a perf_msr_probe() function to provide an interface for checking an MSR and setting the related attribute group's visibility. The user defines the following struct for each MSR:

  struct perf_msr {
	u64			  msr;
	struct attribute_group	 *grp;
	bool			(*test)(int idx, void *data);
	bool			  no_check;
  };

Where:
  msr      - the MSR address
  grp      - the attribute group to add if the check passes
  test     - a test function pointer
  no_check - a bool that bypasses the check and adds the attribute group without any test

The array of struct perf_msr is passed into:

  perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)

Together with:
  cnt  - the number of struct perf_msr array elements
  data - a user pointer passed to the test function
  zero - allow counters that return zero on rdmsr

perf_msr_probe() executes the test function, reads the MSR and checks that the value is non-zero. If all these tests pass, the related attribute group is kept visible. Also add the PMU_EVENT_GROUP helper macro to define an attribute group for a single attribute. It will be used in the following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190616140358.27799-2-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
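A hedged usage sketch of this interface; the event name, test function and MSR choice below are hypothetical placeholders rather than code from a specific upstream file:

	/* assumes an event attribute attr_example defined elsewhere, e.g. via PMU_EVENT_ATTR_STRING */
	PMU_EVENT_GROUP(events, example);	/* creates group_example holding that one attribute */

	static bool test_example(int idx, void *data)
	{
		return boot_cpu_has(X86_FEATURE_APERFMPERF);	/* placeholder check */
	}

	enum { PERF_MSR_EXAMPLE, PERF_MSR_EVENT_MAX };

	static struct perf_msr example_msrs[] = {
		[PERF_MSR_EXAMPLE] = { MSR_IA32_APERF, &group_example, test_example, false },
	};

	/* in PMU init: returns a bitmask of the MSRs whose checks passed */
	mask = perf_msr_probe(example_msrs, PERF_MSR_EVENT_MAX, false, NULL);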
2019-06-24  Merge branch 'x86/cpu' into perf/core, to pick up dependent patches  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  Merge tag 'v5.2-rc6' into perf/core, to refresh branch  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  Merge tag 'v5.2-rc6' into sched/core, to refresh the branch  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86: Remove pmu->pebs_no_xmm_regs  (Kan Liang)
We don't need pmu->pebs_no_xmm_regs anymore; the capability flag PERF_PMU_CAP_EXTENDED_REGS can be used to check whether XMM register collection is supported. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86: Clean up PEBS_XMM_REGS  (Kan Liang)
Use the generic macro PERF_REG_EXTENDED_MASK to replace PEBS_XMM_REGS and avoid duplication. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  perf/x86/regs: Check reserved bits  (Kan Liang)
The perf fuzzer triggers a warning which maps to:

  if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset)))
	  return 0;

The bits between the XMM registers and the generic registers are reserved, but perf_reg_validate() doesn't check these bits. Add PERF_REG_X86_RESERVED for the reserved bits on x86 and check the reserved bits in perf_reg_validate(). Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
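A minimal sketch of the added check, assuming the existing PERF_REG_X86_64_MAX and PERF_REG_X86_XMM0 enum values; the exact mask definition is a reconstruction:

	/* Bits between the last generic register and the first XMM register */
	#define PERF_REG_X86_RESERVED	(((1ULL << PERF_REG_X86_XMM0) - 1) & \
					 ~((1ULL << PERF_REG_X86_64_MAX) - 1))

	int perf_reg_validate(u64 mask)
	{
		/* Reject an empty mask or one that touches reserved bits */
		if (!mask || (mask & PERF_REG_X86_RESERVED))
			return -EINVAL;

		return 0;
	}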
2019-06-24  perf/x86: Disable extended registers for non-supported PMUs  (Kan Liang)
The perf fuzzer caused a Skylake machine to crash:

 [ 9680.085831] Call Trace:
 [ 9680.088301]  <IRQ>
 [ 9680.090363]  perf_output_sample_regs+0x43/0xa0
 [ 9680.094928]  perf_output_sample+0x3aa/0x7a0
 [ 9680.099181]  perf_event_output_forward+0x53/0x80
 [ 9680.103917]  __perf_event_overflow+0x52/0xf0
 [ 9680.108266]  ? perf_trace_run_bpf_submit+0xc0/0xc0
 [ 9680.113108]  perf_swevent_hrtimer+0xe2/0x150
 [ 9680.117475]  ? check_preempt_wakeup+0x181/0x230
 [ 9680.122091]  ? check_preempt_curr+0x62/0x90
 [ 9680.126361]  ? ttwu_do_wakeup+0x19/0x140
 [ 9680.130355]  ? try_to_wake_up+0x54/0x460
 [ 9680.134366]  ? reweight_entity+0x15b/0x1a0
 [ 9680.138559]  ? __queue_work+0x103/0x3f0
 [ 9680.142472]  ? update_dl_rq_load_avg+0x1cd/0x270
 [ 9680.147194]  ? timerqueue_del+0x1e/0x40
 [ 9680.151092]  ? __remove_hrtimer+0x35/0x70
 [ 9680.155191]  __hrtimer_run_queues+0x100/0x280
 [ 9680.159658]  hrtimer_interrupt+0x100/0x220
 [ 9680.163835]  smp_apic_timer_interrupt+0x6a/0x140
 [ 9680.168555]  apic_timer_interrupt+0xf/0x20
 [ 9680.172756]  </IRQ>

The XMM registers can only be collected by PEBS hardware events on platforms with PEBS baseline support, e.g. Icelake, not by software/probe events. Add the capability flag PERF_PMU_CAP_EXTENDED_REGS to indicate PMUs which support extended registers. For x86, the extended registers are the XMM registers. Add has_extended_regs() to check whether extended registers are requested. The generic code defines the mask of extended registers as 0 if the arch headers haven't overridden it. Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24  x86/umwait: Add sysfs interface to control umwait maximum time  (Fenghua Yu)
IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta that the processor can stay in C0.1 or C0.2. A zero value means no maximum time. Each instruction sets its own deadline in the instruction's implicit input EDX:EAX value. The instruction wakes up if the time-stamp counter reaches or exceeds the specified deadline, or the umwait maximum time expires, or a store happens in the monitored address range of umwait. The administrator can write an unsigned 32-bit number to /sys/devices/system/cpu/umwait_control/max_time to change the default value. Note that a value of zero means there is no limit. The lower two bits of the value must be zero. [ tglx: Simplify the write function. Massage changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Andy Lutomirski" <luto@kernel.org> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com
2019-06-24  x86/umwait: Add sysfs interface to control umwait C0.2 state  (Fenghua Yu)
The C0.2 state in the umwait and tpause instructions can be enabled or disabled on a processor through the IA32_UMWAIT_CONTROL MSR. By default, C0.2 is enabled and the user wait instructions result in lower power consumption with slower wakeup time. But real time systems require a faster wakeup time, even though the power savings may be smaller, so the administrator needs to be able to disable C0.2 so that all umwait invocations from user applications use C0.1. Create a sysfs interface which allows the administrator to control the C0.2 state during run time. Andy Lutomirski suggested turning off local irqs before writing the MSR to ensure the cached control value is not changed by a concurrent sysfs write from a different CPU via IPI. [ tglx: Simplified the update logic in the write function and got rid of all the convoluted type casts. Added a shared update function and made the namespace consistent. Moved the sysfs create invocation. Massaged changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Andy Lutomirski" <luto@kernel.org> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com
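A hedged sketch of the shared update helper this describes, assuming a cached control word and the MSR_IA32_UMWAIT_CONTROL_* constants introduced in this series; the details are a reconstruction:

	static u32 umwait_control_cached;	/* protected by umwait_lock */
	static DEFINE_MUTEX(umwait_lock);

	/* Runs on each CPU with interrupts disabled (IPI context) */
	static void umwait_update_control_msr(void *unused)
	{
		lockdep_assert_irqs_disabled();
		wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
	}

	static void umwait_update_control(u32 maxtime, bool c02_enable)
	{
		u32 ctrl = maxtime & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;

		if (!c02_enable)
			ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;

		mutex_lock(&umwait_lock);
		WRITE_ONCE(umwait_control_cached, ctrl);
		/* Propagate the cached value to all CPUs */
		on_each_cpu(umwait_update_control_msr, NULL, 1);
		mutex_unlock(&umwait_lock);
	}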
2019-06-24  x86/umwait: Initialize umwait control values  (Fenghua Yu)
umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
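Concretely, with the mask name from the tglx note above, the default could be initialized as in this hedged one-liner (the macro shape is illustrative):

	/* C0.2 enabled (the disable bit stays clear), max time 100000 TSC-quanta */
	static u32 umwait_control_cached = 100000 & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;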
2019-06-24  x86/cpufeatures: Enumerate user wait instructions  (Fenghua Yu)
umonitor, umwait, and tpause are a set of user wait instructions. umonitor arms address monitoring hardware using an address. The address range is determined by using CPUID.0x5. A store to an address within the specified address range triggers the monitoring hardware to wake up the processor waiting in umwait. umwait instructs the processor to enter an implementation-dependent optimized state while monitoring a range of addresses. The optimized state may be either a light-weight power/performance optimized state (C0.1) or an improved power/performance optimized state (C0.2). tpause instructs the processor to enter an implementation-dependent optimized state, C0.1 or C0.2, and wake up when the time-stamp counter reaches the specified timeout. The three instructions may be executed at any privilege level. The instructions provide a power saving method while waiting in user space. Additionally, they can allow a sibling hyperthread to make faster progress while this thread is waiting. One example of an application usage of umwait is when waiting for input data from another application, such as a user level multi-threaded packet processing engine. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG (CPUID.0x07.0x0:ECX[5]). Detailed information on the instructions and the WAITPKG CPUID flag can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference and the Intel 64 and IA-32 Architectures Software Developer's Manual. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua.yu@intel.com
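For illustration, a hedged user-space sketch using the compiler intrinsics for these instructions (GCC 9+ with -mwaitpkg); the deadline value is arbitrary:

	#include <x86intrin.h>		/* _umonitor(), _umwait(), __rdtsc() */
	#include <stdint.h>

	/* Spin-wait for *flag to become non-zero, sleeping in C0.1/C0.2 meanwhile */
	static void wait_for_flag(volatile uint32_t *flag)
	{
		while (!*flag) {
			_umonitor((void *)flag);		/* arm the address monitor */
			if (!*flag)
				_umwait(0, __rdtsc() + 100000);	/* 0 = allow C0.2; wakes on store or deadline */
		}
	}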
2019-06-24  x86/vdso: Give the [ph]vclock_page declarations real types  (Andy Lutomirski)
Clean up the vDSO code a bit by giving pvclock_page and hvclock_page their actual types instead of u8[PAGE_SIZE]. This shouldn't materially affect the generated code. Heavily based on a patch from Linus. [ tglx: Adapted to the unified VDSO code ] Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/6920c5188f8658001af1fc56fd35b815706d300c.1561241273.git.luto@kernel.org
2019-06-23  smp: Remove smp_call_function() and on_each_cpu() return values  (Nadav Amit)
The return value is fixed. Remove it and amend the callers. [ tglx: Fixup arm/bL_switcher and powerpc/rtas ] Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20190613064813.8102-2-namit@vmware.com
2019-06-23  x86/apic: Use non-atomic operations when possible  (Nadav Amit)
Using __clear_bit() and __cpumask_clear_cpu() is more efficient than using their atomic counterparts. Use them when atomicity is not needed, such as when manipulating bitmasks that are on the stack. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20190613064813.8102-10-namit@vmware.com
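An illustrative before/after on a stack-local cpumask, in the spirit of this change (the surrounding context is hypothetical):

	struct cpumask allbutself;

	cpumask_copy(&allbutself, cpu_online_mask);

	/* Before: atomic RMW with a lock prefix, unnecessary for a mask on the stack */
	cpumask_clear_cpu(smp_processor_id(), &allbutself);

	/* After: the plain load/store variant suffices */
	__cpumask_clear_cpu(smp_processor_id(), &allbutself);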
2019-06-22  x86/vdso: Add clock_gettime64() entry point  (Vincenzo Frascino)
Linux 5.1 gained the new clock_gettime64() syscall to address the Y2038 problem on 32-bit systems. The x86 vDSO is missing support for this variant of clock_gettime(). Update the x86 specific vDSO library accordingly so it exposes the new time getter. [ tglx: Massaged changelog ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Cc: Shijith Thotton <sthotton@marvell.com> Cc: Andre Przywara <andre.przywara@arm.com> Link: https://lkml.kernel.org/r/20190621095252.32307-25-vincenzo.frascino@arm.com
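A hedged sketch of what such an entry point looks like when wired to the generic vDSO library; the function names follow the generic library's convention and should be treated as a reconstruction:

	/* 32-bit vDSO entry: 64-bit time_t clock_gettime, Y2038-safe */
	int __vdso_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts)
	{
		return __cvdso_clock_gettime(clock, ts);
	}

	int clock_gettime64(clockid_t, struct __kernel_timespec *)
		__attribute__((weak, alias("__vdso_clock_gettime64")));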
2019-06-22  x86/vdso: Add clock_getres() entry point  (Vincenzo Frascino)
The generic vDSO library provides an implementation of clock_getres() that can be leveraged by each architecture. Add the clock_getres() VDSO entry point on x86. [ tglx: Massaged changelog and cleaned up the function signature formatting ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Cc: Shijith Thotton <sthotton@marvell.com> Cc: Andre Przywara <andre.przywara@arm.com> Link: https://lkml.kernel.org/r/20190621095252.32307-24-vincenzo.frascino@arm.com
2019-06-22  x86/vdso: Switch to generic vDSO implementation  (Vincenzo Frascino)
The x86 vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library. Introduce the following changes:

 - Modification of vdso.c to be compliant with the common vdso datapage
 - Use of lib/vdso for gettimeofday

[ tglx: Massaged changelog and cleaned up the function signature formatting ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Cc: Shijith Thotton <sthotton@marvell.com> Cc: Andre Przywara <andre.przywara@arm.com> Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frascino@arm.com
2019-06-22  x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs  (Konstantin Khlebnikov)
Since commit 7d5905dc14a8 ("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"), opening and reading /proc/cpuinfo sends an IPI to all CPUs. Many applications read /proc/cpuinfo at startup for trivial reasons like counting cores or detecting CPU features, while sensitive workloads like DPDK network polling don't tolerate any interrupts. Integrate this feature with CPU isolation: do not send IPIs to CPUs without the housekeeping flag HK_FLAG_MISC (set by nohz_full). Code that requests the CPU frequency, like show_cpuinfo(), falls back to the last frequency set by the cpufreq driver if this method returns 0. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/155790354043.1104.15333317408370209.stgit@buzz
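A minimal sketch of the gating check, assuming the housekeeping API from <linux/sched/isolation.h>; the surrounding function body is an abridged reconstruction:

	unsigned int aperfmperf_get_khz(int cpu)
	{
		if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
			return 0;

		/* Isolated CPU: skip the IPI; the caller falls back to cpufreq's value */
		if (!housekeeping_cpu(cpu, HK_FLAG_MISC))
			return 0;

		/* Otherwise IPI the target CPU and return the sampled frequency */
		aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
		return per_cpu(samples.khz, cpu);
	}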
2019-06-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22  timekeeping: Use proper clock specifier names in functions  (Jason A. Donenfeld)
This makes boot uniformly boottime and tai uniformly clocktai, to address the remaining oversights. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190621203249.3909-2-Jason@zx2c4.com
2019-06-22  x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz  (Colin Ian King)
The left shift of the unsigned int cpu_khz will overflow for large values of cpu_khz, so cast it to a long long before shifting it to avoid the overflow. For example, this can happen when cpu_khz is 4194305, i.e. ~4.2 GHz. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on kexec") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20190619181446.13635-1-colin.king@canonical.com
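Illustratively, the fix amounts to widening the operand before the shift so it is computed in 64 bits (a reconstruction of the pattern, not the verbatim hunk):

	-	max_loops = cpu_khz << 10;		/* 32-bit shift, can wrap */
	+	max_loops = (long long)cpu_khz << 10;	/* widened, no overflow */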
2019-06-22  x86/asm: Pin sensitive CR0 bits  (Kees Cook)
With the sensitive CR4 bits pinned now, it's possible that the WP bit of CR0 might become a target as well. Following the same reasoning as for the CR4 pinning, pin CR0's WP bit. Contrary to the CPU-feature-dependent CR4 pinning, this can be done with a constant value. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: kernel-hardening@lists.openwall.com Link: https://lkml.kernel.org/r/20190618045503.39105-4-keescook@chromium.org
2019-06-22  x86/asm: Pin sensitive CR4 bits  (Kees Cook)
Several recent exploits have used direct calls to the native_write_cr4() function to disable SMEP and SMAP before continuing their exploits using userspace memory access. Direct calls of this form can be mitigated by pinning bits of CR4 so that they cannot be changed through a common function. This is not intended to be a general ROP protection (which would require CFI to defend against properly), but rather a way to avoid trivial direct function calling (or CFI bypasses via a matching function prototype) as seen in: https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html (https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)

The goals of this change:

 - Pin specific bits (SMEP, SMAP, and UMIP) when writing CR4.
 - Avoid setting the bits too early (they must become pinned only after CPU feature detection and selection has finished).
 - The pinning mask needs to be read-only during normal runtime.
 - The pinning needs to be checked after the write to validate the CR4 state.

Using __ro_after_init on the mask is done so it can't first be disabled with a malicious write. Since these bits are global state (once established by the boot CPU and kernel boot parameters), they are safe to write to secondary CPUs before those CPUs have finished feature detection. As such, the bits are set at the first CR4 write, so that CR4 write bugs can be detected (instead of silently papered over). This uses a few bytes less storage of a location we don't have: read-only per-CPU data. A check is performed after the register write because an attack could simply skip directly to the register write. Such a direct jump is possible because of how this function may be built by the compiler (especially due to the removal of frame pointers), where it doesn't add a stack frame (function exit may only be a retq without pops), which is sufficient for trivial exploitation like the timer overwrites mentioned above. The asm argument constraints gain the "+" modifier to convince the compiler that it shouldn't make ordering assumptions about the arguments or memory, and to treat them as changed. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: kernel-hardening@lists.openwall.com Link: https://lkml.kernel.org/r/20190618045503.39105-3-keescook@chromium.org
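A hedged sketch of the write-then-verify pattern described above; the variable and static-key names follow the changelog, but the body is a simplified reconstruction:

	static unsigned long cr4_pinned_bits __ro_after_init;
	static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);

	void native_write_cr4(unsigned long val)
	{
		unsigned long bits_missing = 0;

	set_register:
		/* "+" constraints: no ordering assumptions about val or the mask */
		asm volatile("mov %0,%%cr4" : "+r" (val), "+m" (cr4_pinned_bits));

		if (static_branch_likely(&cr_pinning)) {
			if (unlikely((val & cr4_pinned_bits) != cr4_pinned_bits)) {
				bits_missing = ~val & cr4_pinned_bits;
				val |= bits_missing;
				goto set_register;	/* restore the pinned bits */
			}
			/* Warn only after the missing bits have been restored */
			WARN_ONCE(bits_missing, "CR4 bits went missing: %lx!?\n",
				  bits_missing);
		}
	}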
2019-06-22  x86/acpi/cstate: Add Zhaoxin processors support for cache flush policy in C3  (Tony W Wang-oc)
As with Intel, Zhaoxin MP CPUs support shared cache in C3, and on all recent Zhaoxin platforms ARB_DISABLE is a nop. So set the related flags correctly, in the same way as Intel does. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "hpa@zytor.com" <hpa@zytor.com> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net> Cc: "lenb@kernel.org" <lenb@kernel.org> Cc: David Wang <DavidWang@zhaoxin.com> Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com> Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com> Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com> Link: https://lkml.kernel.org/r/a370503660994669991a7f7cda7c5e98@zhaoxin.com
2019-06-22  x86/cpu: Create Zhaoxin processors architecture support file  (Tony W Wang-oc)
Add x86 architecture support for the new Zhaoxin processors. Carve out the initialization code needed by Zhaoxin processors into a separate compilation unit. To identify Zhaoxin CPUs, add a new vendor type, X86_VENDOR_ZHAOXIN, for system recognition. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "hpa@zytor.com" <hpa@zytor.com> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net> Cc: "lenb@kernel.org" <lenb@kernel.org> Cc: David Wang <DavidWang@zhaoxin.com> Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com> Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com> Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com> Link: https://lkml.kernel.org/r/01042674b2f741b2aed1f797359bdffb@zhaoxin.com
2019-06-22  x86/cpu: Split Tremont based Atoms from the rest  (Andy Shevchenko)
Split the Tremont-based Atoms from the rest to keep the logical grouping. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20190617115537.33309-1-andriy.shevchenko@linux.intel.com
2019-06-22  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2  (Andi Kleen)
The kernel needs to explicitly enable FSGSBASE, so an application needs to know whether it can safely use these instructions. Just looking at the CPUID bit is not enough because it may be running on a kernel that does not enable the instructions. One way for the application would be to just try them and catch the SIGILL, but that is difficult to do in libraries, which may not want to overwrite the signal handlers of the main application. Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF aux vector. AT_HWCAP2 is already used by PPC for similar purposes. The application can access it open-coded or by using the getauxval() function in newer versions of glibc. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com
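A hedged user-space probe built on this, assuming the bit-1 assignment stated above (the HWCAP2_FSGSBASE macro name is illustrative):

	#include <sys/auxv.h>
	#include <stdio.h>

	#ifndef HWCAP2_FSGSBASE
	#define HWCAP2_FSGSBASE	(1UL << 1)	/* bit 1 of AT_HWCAP2, per the changelog */
	#endif

	int main(void)
	{
		unsigned long hwcap2 = getauxval(AT_HWCAP2);

		if (hwcap2 & HWCAP2_FSGSBASE)
			puts("kernel has enabled FSGSBASE; RD/WR{FS,GS}BASE are safe to use");
		else
			puts("FSGSBASE not enabled by the kernel");
		return 0;
	}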
2019-06-22  x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit  (Andy Lutomirski)
Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable FSGSBASE by default, and add nofsgsbase to disable it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit  (Chang S. Bae)
Without FSGSBASE, user space cannot change GSBASE other than through a PRCTL. The kernel enforces that the user space GSBASE value is positive, as negative values are used for detecting the kernel space GSBASE value in the paranoid entry code. If FSGSBASE is enabled, user space can set arbitrary GSBASE values without kernel intervention, including negative ones, which breaks the paranoid entry assumptions. To avoid this, paranoid entry needs to unconditionally save the current GSBASE value independent of the interrupted context, retrieve and write the kernel GSBASE and unconditionally restore the saved value on exit. The restore happens either in paranoid_exit or in the special exit path of the NMI low level code. All other entry code paths which use unconditional SWAPGS are not affected, as they do not depend on the actual content. [ tglx: Massaged changelogs and comments ] Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/1557309753-24073-13-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/entry/64: Introduce the FIND_PERCPU_BASE macro  (Chang S. Bae)
GSBASE is used to find per-CPU data in the kernel. But when GSBASE is unknown, the per-CPU base can be found from the per_cpu_offset table with a CPU number. The CPU number is extracted from the limit field of the CPUNODE entry in the GDT, or by the RDPID instruction. This is a prerequisite for using FSGSBASE in the low level entry code. Also, add a GAS-compatible RDPID macro, as binutils 2.21 does not support the instruction; support was added in version 2.27. [ tglx: Massaged changelog ] Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/1557309753-24073-12-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/entry/64: Switch CR3 before SWAPGS in paranoid entry  (Chang S. Bae)
When FSGSBASE is enabled, the GSBASE handling in paranoid entry will need to retrieve the kernel GSBASE which requires that the kernel page table is active. As the CR3 switch to the kernel page tables (PTI is active) does not depend on kernel GSBASE, move the CR3 switch in front of the GSBASE handling. Comment the EBX content while at it. No functional change. [ tglx: Rewrote changelog and comments ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-11-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/process/64: Use FSGSBASE instructions on thread copy and ptrace  (Chang S. Bae)
When FSGSBASE is enabled, copying threads and reading fsbase and gsbase using ptrace must read the actual values. When copying a thread, use save_fsgs() and copy the saved values. For ptrace, the bases must be read from memory regardless of the selector if FSGSBASE is enabled. [ tglx: Invoke __rdgsbase_inactive() with interrupts disabled ] [ luto: Massage changelog ] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-9-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/process/64: Use FSGSBASE in switch_to() if available  (Andy Lutomirski)
With the new FSGSBASE instructions, FSBASE and GSBASE can be efficiently read and written in __switch_to(). Use that capability to preserve the full state. This will enable user code to do whatever it wants with the new instructions without any kernel-induced gotchas. (There can still be architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE if WRGSBASE was used, but users are expected to read the CPU manual before doing things like that.) This is a considerable speedup. It seems to save about 100 cycles per context switch compared to the baseline 4.6-rc1 behavior on a Skylake laptop. [ chang: 5~10% performance improvements were seen with a context switch benchmark that ran threads with different FS/GSBASE values (relative to the baseline 4.16). Minor edit on the changelog. ] [ tglx: Massage changelog ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-8-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions  (Chang S. Bae)
Add CPU-feature-conditional FSGSBASE access to the relevant helper functions. That allows accelerating certain FS/GS base operations in subsequent changes. Note that, while possible, the user space entry/exit GSBASE operations are not going to use the new FSGSBASE instructions. The reason is that it would require additional storage for the user space value, which adds more complexity to the low level code, and experiments have shown marginal benefit. This may be revisited later, but for now the SWAPGS based handling in the entry code is preserved, except for the paranoid entry/exit code. To preserve the SWAPGS entry mechanism, introduce __[rd|wr]gsbase_inactive() helpers. Note that for Xen PV, paravirt hooks can be added later as they might allow a very efficient but different implementation. [ tglx: Massaged changelog ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions  (Andi Kleen)
[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and make <asm/fsgsbase.h> safe to include on 32-bit kernels. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-6-git-send-email-chang.seok.bae@intel.com
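A hedged sketch of what such intrinsics look like as inline-asm wrappers, shown for GSBASE; the names are modeled on <asm/fsgsbase.h> and the details are a reconstruction:

	static __always_inline unsigned long rdgsbase(void)
	{
		unsigned long gsbase;

		asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
		return gsbase;
	}

	static __always_inline void wrgsbase(unsigned long gsbase)
	{
		asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
	}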
2019-06-22  x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE  (Andy Lutomirski)
This is temporary. It will allow the next few patches to be tested incrementally. Setting unsafe_fsgsbase is a root hole. Don't do it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com
2019-06-22  x86/ptrace: Prevent ptrace from clearing the FS/GS selector  (Chang S. Bae)
When a ptracer writes a ptracee's FS/GSBASE with a different value, the selector is also cleared. This behavior is not correct, as the selector should be preserved. Update only the base value and leave the selector intact. To simplify the code further, remove the conditional check for the same value, as this code is not performance critical. The only recognizable downside of this change is when the selector is already nonzero on write: the base will be reloaded according to the selector. But this case is highly unexpected in real usage. [ tglx: Massage changelog ] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/9040CFCD-74BD-4C17-9A01-B9B713CF6B10@intel.com
2019-06-21  x86/vdso: Prevent segfaults due to hoisted vclock reads  (Andy Lutomirski)
GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock pages before the vclock mode checks. This creates a path through vclock_gettime() in which no vclock is enabled at all (due to disabled TSC on old CPUs, for example) but the pvclock or hvclock page is nevertheless read. This will segfault on bare metal. This fixes commit 459e3a21535a ("gcc-9: properly declare the {pv,hv}clock_page storage") in the sense that, before that commit, GCC didn't seem to generate the offending code. There was nothing wrong with that commit per se, and -stable maintainers should backport this to all supported kernels regardless of whether the offending commit was present, since the same crash could just as easily be triggered by the phase of the moon. On GCC 9.1.1, this doesn't seem to affect the generated code at all, so I'm not too concerned about performance regressions from this fix. Cc: stable@vger.kernel.org Cc: x86@kernel.org Cc: Borislav Petkov <bp@alien8.de> Reported-by: Duncan Roe <duncan_roe@optusnet.com.au> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-21  x86/defconfigs: Remove useless UEVENT_HELPER_PATH  (Krzysztof Kozlowski)
Remove the CONFIG_UEVENT_HELPER_PATH because:

 1. It is disabled since commit 1be01d4a5714 ("driver: base: Disable CONFIG_UEVENT_HELPER by default") as its dependency (UEVENT_HELPER) was made default to 'n',
 2. It is not recommended (help message: "This should not be used today [...] creates a high system load") and was kept only for ancient userland,
 3. Certain userland specifically requests it to be disabled (systemd README: "Legacy hotplug slows down the system and confuses udev").

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Adam Borowski <kilobyte@angband.pl> Cc: "Ahmed S. Darwish" <darwish.07@gmail.com> Cc: Alexey Brodkin <alexey.brodkin@synopsys.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1559635284-21696-1-git-send-email-krzk@kernel.org