Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 seccomp changes from Ingo Molnar:
"This tree includes x86 seccomp filter speedups and related preparatory
work, which touches core seccomp facilities as well.
The main idea is to split seccomp into two phases, to be able to enter
a simple fast path for syscalls with ptrace side effects.
There's no substantial user-visible (and ABI) effects expected from
this, except a change in how we emit a better audit record for
SECCOMP_RET_TRACE events"
* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
x86: Split syscall_trace_enter into two phases
x86, entry: Only call user_exit if TIF_NOHZ
x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
seccomp: Document two-phase seccomp and arch-provided seccomp_data
seccomp: Allow arch code to provide seccomp_data
seccomp: Refactor the filter callback and the API
seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
Hansen)
- Various sched/idle refinements for better idle handling (Nicolas
Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)
- sched/numa updates and optimizations (Rik van Riel)
- sysbench speedup (Vincent Guittot)
- capacity calculation cleanups/refactoring (Vincent Guittot)
- Various cleanups to thread group iteration (Oleg Nesterov)
- Double-rq-lock removal optimization and various refactorings
(Kirill Tkhai)
- various sched/deadline fixes
... and lots of other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
sched/fair: Delete resched_cpu() from idle_balance()
sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
sched: Improve sysbench performance by fixing spurious active migration
sched/x86: Fix up typo in topology detection
x86, sched: Add new topology for multi-NUMA-node CPUs
sched/rt: Use resched_curr() in task_tick_rt()
sched: Use rq->rd in sched_setaffinity() under RCU read lock
sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
sched: Use dl_bw_of() under RCU read lock
sched/fair: Remove duplicate code from can_migrate_task()
sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
sched: print_rq(): Don't use tasklist_lock
sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
sched: Fix the task-group check in tg_has_rt_tasks()
sched/fair: Leverage the idle state info when choosing the "idlest" cpu
sched: Let the scheduler see CPU idle states
sched/deadline: Fix inter- exclusive cpusets migrations
sched/deadline: Clear dl_entity params when setscheduling to different class
sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
...
|
|
git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull dma-mapping update from Marek Szyprowski:
"Provide the dma write coherent api (available previously on ARM
architecture) for all other architectures, which use dma_ops-based dma
mapping implementation.
This lets one to use the same code in the device drivers regardless of
the selected architecture"
* 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
dma-mapping: Provide write-combine allocations
s390: Implement dma_{alloc,free}_attrs()
|
|
The different architectures used their own (and different) declarations:
extern __visible const void __nosave_begin, __nosave_end;
extern const void __nosave_begin, __nosave_end;
extern long __nosave_begin, __nosave_end;
Consolidate them using the first variant in <asm/sections.h>.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Main changes:
- Fix the deadlock reported by Dave Jones et al
- Clean up and fix nohz_full interaction with arch abilities
- nohz init code consolidation/cleanup"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: nohz full depends on irq work self IPI support
nohz: Consolidate nohz full init code
arm64: Tell irq work about self IPI support
arm: Tell irq work about self IPI support
x86: Tell irq work about self IPI support
irq_work: Force raised irq work to run on irq work interrupt
irq_work: Introduce arch_irq_work_has_interrupt()
nohz: Move nohz full init call to tick init
|
|
Pull networking updates from David Miller:
"Most notable changes in here:
1) By far the biggest accomplishment, thanks to a large range of
contributors, is the addition of multi-send for transmit. This is
the result of discussions back in Chicago, and the hard work of
several individuals.
Now, when the ->ndo_start_xmit() method of a driver sees
skb->xmit_more as true, it can choose to defer the doorbell
telling the driver to start processing the new TX queue entires.
skb->xmit_more means that the generic networking is guaranteed to
call the driver immediately with another SKB to send.
There is logic added to the qdisc layer to dequeue multiple
packets at a time, and the handling mis-predicted offloads in
software is now done with no locks held.
Finally, pktgen is extended to have a "burst" parameter that can
be used to test a multi-send implementation.
Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
virtio_net
Adding support is almost trivial, so export more drivers to
support this optimization soon.
I want to thank, in no particular or implied order, Jesper
Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
David Tat, Hannes Frederic Sowa, and Rusty Russell.
2) PTP and timestamping support in bnx2x, from Michal Kalderon.
3) Allow adjusting the rx_copybreak threshold for a driver via
ethtool, and add rx_copybreak support to enic driver. From
Govindarajulu Varadarajan.
4) Significant enhancements to the generic PHY layer and the bcm7xxx
driver in particular (EEE support, auto power down, etc.) from
Florian Fainelli.
5) Allow raw buffers to be used for flow dissection, allowing drivers
to determine the optimal "linear pull" size for devices that DMA
into pools of pages. The objective is to get exactly the
necessary amount of headers into the linear SKB area pre-pulled,
but no more. The new interface drivers use is eth_get_headlen().
From WANG Cong, with driver conversions (several had their own
by-hand duplicated implementations) by Alexander Duyck and Eric
Dumazet.
6) Support checksumming more smoothly and efficiently for
encapsulations, and add "foo over UDP" facility. From Tom
Herbert.
7) Add Broadcom SF2 switch driver to DSA layer, from Florian
Fainelli.
8) eBPF now can load programs via a system call and has an extensive
testsuite. Alexei Starovoitov and Daniel Borkmann.
9) Major overhaul of the packet scheduler to use RCU in several major
areas such as the classifiers and rate estimators. From John
Fastabend.
10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
Duyck.
11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
Dumazet.
12) Add Datacenter TCP congestion control algorithm support, From
Florian Westphal.
13) Reorganize sk_buff so that __copy_skb_header() is significantly
faster. From Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
netlabel: directly return netlbl_unlabel_genl_init()
net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
net: description of dma_cookie cause make xmldocs warning
cxgb4: clean up a type issue
cxgb4: potential shift wrapping bug
i40e: skb->xmit_more support
net: fs_enet: Add NAPI TX
net: fs_enet: Remove non NAPI RX
r8169:add support for RTL8168EP
net_sched: copy exts->type in tcf_exts_change()
wimax: convert printk to pr_foo()
af_unix: remove 0 assignment on static
ipv6: Do not warn for informational ICMP messages, regardless of type.
Update Intel Ethernet Driver maintainers list
bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
tipc: fix bug in multicast congestion handling
net: better IFF_XMIT_DST_RELEASE support
net/mlx4_en: remove NETDEV_TX_BUSY
3c59x: fix bad split of cpu_to_le32(pci_map_single())
net: bcmgenet: fix Tx ring priority programming
...
|
|
Pull KVM updates from Paolo Bonzini:
"Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
- s390 moves closer towards host large page support
- PowerPC has improved support for debugging (both inside the guest
and via gdbstub) and support for e6500 processors
- ARM/ARM64 support read-only memory (which is necessary to put
firmware in emulated NOR flash)
- x86 has the usual emulator fixes and nested virtualization
improvements (including improved Windows support on Intel and
Jailhouse hypervisor support on AMD), adaptive PLE which helps
overcommitting of huge guests. Also included are some patches that
make KVM more friendly to memory hot-unplug, and fixes for rare
caching bugs.
Two patches have trivial mm/ parts that were acked by Rik and Andrew.
Note: I will soon switch to a subkey for signing purposes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
kvm: do not handle APIC access page if in-kernel irqchip is not in use
KVM: s390: count vcpu wakeups in stat.halt_wakeup
KVM: s390/facilities: allow TOD-CLOCK steering facility bit
KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
arm/arm64: KVM: Report correct FSC for unsupported fault types
arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
kvm: Fix kvm_get_page_retry_io __gup retval check
arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
kvm: x86: Unpin and remove kvm_arch->apic_access_page
kvm: vmx: Implement set_apic_access_page_addr
kvm: x86: Add request bit to reload APIC access page address
kvm: Add arch specific mmu notifier for page invalidation
kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
kvm: Fix page ageing bugs
kvm/x86/mmu: Pass gfn and level to rmapp callback.
x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
kvm: x86: use macros to compute bank MSRs
KVM: x86: Remove debug assertion of non-PAE reserved bits
kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
kvm: Faults which trigger IO release the mmap_sem
...
|
|
On 32 bit systems cmpxchg cannot handle 64 bit values, so
some additional magic is required to allow a 32 bit system
with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y enabled to build.
Make sure the correct cmpxchg function is used when doing
an atomic swap of a cputime_t.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux390@de.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20140930155947.070cdb1f@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch introduces the halt_wakeup counter used by common code and uses it to
count vcpu wakeups done in s390 arch specific code.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
There is nothing to do for KVM to support TOD-CLOCK steering.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
|
|
|
|
Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.") removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
arch/mips/net/bpf_jit.c
drivers/net/can/flexcan.c
Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull KVM fixes from Paolo Bonzini:
"Two very simple bugfixes, affecting all supported architectures"
[ Two? There's three commits in here. Oh well, I guess Paolo didn't
count the preparatory symbol export ]
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: correct null pid check in kvm_vcpu_yield_to()
KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()
mm: export symbol dependencies of is_zero_pfn()
|
|
Using the new kvm_register_device_ops() interface makes us get rid of
an #ifdef in common code.
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to make the static inline function is_zero_pfn() callable by
modules, export its symbol dependencies 'zero_pfn' and (for s390 and
mips) 'zero_page_mask'.
We need this for KVM, as CONFIG_KVM is a tristate for all supported
architectures except ARM and arm64, and testing a pfn whether it refers
to the zero page is required to correctly distinguish the zero page
from other special RAM ranges that may also have the PG_reserved bit
set, but need to be treated as MMIO memory.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently we have 2 pkt_type_offset functions doing the same thing and
spread across the architecture files. Remove those and replace them
with a PKT_TYPE_OFFSET macro helper which gets the constant value from a
zero sized sk_buff member right in front of the bitfield with offsetof.
This new offset marker does not change size of struct sk_buff.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
r is already initialized to 0.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
|
|
The old handling of prefix pages was broken in the diag10 ballooner.
We now rely on gmap_discard to check for start > end and do a
slow path if the prefix swap pages are affected:
1. discard the pages from start to prefix
2. discard the absolute 0 pages
3. discard the pages after prefix swap to end
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
|
|
Due to the earlier check we know that ipte_lock_count must be 0.
No need to add a useless if. Let's make clear that we are going
to always wakeup when we execute that code.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
We must not fallthrough if the conditions for external call are not met.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
|
|
Currently we fill up a full 5 level page table to hold the guest
mapping. Since commit "support gmap page tables with less than 5
levels" we can do better.
Having more than 4 TB might be useful for some testing scenarios,
so let's just limit ourselves to 16TB guest size.
Having more than that is totally untested as I do not have enough
swap space/memory.
We continue to allow ucontrol the full size.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
We now propagate interrupt injection errors back to the ioctl. We
should mark functions that might fail with __must_check.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
|
|
We have to provide a per guest crypto block for the CPUs to
enable MSA4 instructions. According to icainfo on z196 or
later this enables CCM-AES-128, CMAC-AES-128, CMAC-AES-192
and CMAC-AES-256.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
|
|
Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.
We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.
As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduced in commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") and later on replicated in aa2d2c73c21f
("s390/bpf,jit: address randomize and write protect jit code") for
s390 architecture, write protection for BPF JIT images got added and
a random start address of the JIT code, so that it's not on a page
boundary anymore.
Since both use a very similar allocator for the BPF binary header,
we can consolidate this code into the BPF core as it's mostly JIT
independant anyway.
This will also allow for future archs that support DEBUG_SET_MODULE_RONX
to just reuse instead of reimplementing it.
JIT tested on x86_64 and s390x with BPF test suite.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"A bug fix for the vdso code, the loadparm for booting from SCSI is
added and the access permissions for the dasd module parameters are
corrected"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vdso: remove NULL pointer check from clock_gettime
s390/ipl: Add missing SCSI loadparm attributes to /sys/firmware
s390/dasd: Make module parameter visible in sysfs
|
|
|
|
Pull kvm fixes from Paolo Bonzini:
"A smattering of bug fixes across most architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
powerpc/kvm/cma: Fix panic introduces by signed shift operation
KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
KVM: s390/mm: Fix storage key corruption during swapping
arm/arm64: KVM: Complete WFI/WFE instructions
ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU
KVM: s390/mm: try a cow on read only pages for key ops
KVM: s390: Fix user triggerable bug in dead code
|
|
With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate. This patch moves the to be interpreted bytecode to
read-only pages.
In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").
This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).
Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.
Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables. Brad Spengler proposed the same idea
via Twitter during development of this patch.
Joint work with Hannes Frederic Sowa.
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The secure_computing function took a syscall number parameter, but
it only paid any attention to that parameter if seccomp mode 1 was
enabled. Rather than coming up with a kludge to get the parameter
to work in mode 2, just remove the parameter.
To avoid churn in arches that don't have seccomp filters (and may
not even support syscall_get_nr right now), this leaves the
parameter in secure_computing_strict, which is now a real function.
For ARM, this is a bit ugly due to the fact that ARM conditionally
supports seccomp filters. Fixing that would probably only be a
couple of lines of code, but it should be coordinated with the audit
maintainers.
This will be a slight slowdown on some arches. The right fix is to
pass in all of seccomp_data instead of trying to make just the
syscall nr part be fast.
This is a prerequisite for making two-phase seccomp work cleanly.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
commit 0944fe3f4a32 ("s390/mm: implement software referenced bits")
triggered another paging/storage key corruption. There is an
unhandled invalid->valid pte change where we have to set the real
storage key from the pgste.
When doing paging a guest page might be swapcache or swap and when
faulted in it might be read-only and due to a parallel scan old.
An do_wp_page will make it writeable and young. Due to software
reference tracking this page was invalid and now becomes valid.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
|
|
Since 3.12 or more precisely commit 0944fe3f4a32 ("s390/mm:
implement software referenced bits") guest storage keys get
corrupted during paging. This commit added another valid->invalid
translation for page tables - namely ptep_test_and_clear_young.
We have to transfer the storage key into the pgste in that case.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
|
|
The explicit NULL pointer check on the timespec argument is only
required for clock_getres but not for clock_gettime.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Currently the loadparm is only supported for CCW IPL. But also for SCSI
IPL it can be specified either on the HMC load panel respectively
z/VM console or via diagnose 308.
So fix this for SCSI and add the required sysfs attributes for reading the
IPL loadparm and for setting the loadparm for re-IPL.
With this patch the following two sysfs attributes are introduced:
- /sys/firmware/ipl/loadparm (for system that have been IPLed from SCSI)
- /sys/firmware/reipl/fcp/loadparm
Because the loadparm is now available for SCSI and CCW it is moved
now from "struct ipl_block_ccw" to the generic "struct ipl_list_hdr".
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
New system call depends on crypto. As it did not have a separate config
option, CONFIG_KEXEC was modified to select CRYPTO and CRYPTO_SHA256.
But now previous patch introduced a new config option for new syscall.
So CONFIG_KEXEC does not require crypto. Remove that dependency.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.
Remove unnecessary arguments that stem from this.
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)
This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
e790d9ef6 KVM: add kvm_arch_sched_in
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).
Move them from individual files to a generic place in linux/kvm_types.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
commit ab3f285f227f ("KVM: s390/mm: try a cow on read only pages for
key ops")' misaligned a code block. Let's fixup the indentation.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes and features for 3.18 part 1
1. The usual cleanups: get rid of duplicate code, use defines, factor
out the sync_reg handling, additional docs for sync_regs, better
error handling on interrupt injection
2. We use KVM_REQ_TLB_FLUSH instead of open coding tlb flushes
3. Additional registers for kvm_run sync regs. This is usually not
needed in the fast path due to eventfd/irqfd, but kvm stat claims
that we reduced the overhead of console output by ~50% on my system
4. A rework of the gmap infrastructure. This is the 2nd step towards
host large page support (after getting rid of the storage key
dependency). We introduces two radix trees to store the guest-to-host
and host-to-guest translations. This gets us rid of most of
the page-table walks in the gmap code. Only one in __gmap_link is left,
this one is required to link the shadow page table to the process page
table. Finally this contains the plumbing to support gmap page tables
with less than 5 levels.
|
|
The radix tree rework removed all code that uses the gmap_rmap
and gmap_pgtable data structures. Remove these outdated definitions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Add an addressing limit to the gmap address spaces and only allocate
the page table levels that are needed for the given limit. The limit
is fixed and can not be changed after a gmap has been created.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Store the target address for the gmap segments in a radix tree
instead of using invalid segment table entries. gmap_translate
becomes a simple radix_tree_lookup, gmap_fault is split into the
address translation with gmap_translate and the part that does
the linking of the gmap shadow page table with the process page
table.
A second radix tree is used to keep the pointers to the segment
table entries for segments that are mapped in the guest address
space. On unmap of a segment the pointer is retrieved from the
radix tree and is used to carry out the segment invalidation in
the gmap shadow page table. As the radix tree can only store one
pointer, each host segment may only be mapped to exactly one
guest location.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
The S390 architecture advertises support for HAVE_DMA_ATTRS when PCI is
enabled. Patches to unify some of the DMA API would like to rely on the
dma_alloc_attrs() and dma_free_attrs() functions to be provided when an
architecture supports DMA attributes.
Rename dma_alloc_coherent() and dma_free_coherent() to dma_alloc_attrs()
and dma_free_attrs() since they are functionally equivalent and alias
the former to the latter for compatibility.
For consistency with other architectures, also reuse the existing symbol
HAVE_DMA_ATTRS defined in arch/Kconfig instead of providing a duplicate.
Select it when PCI is enabled.
While at it, drop a redundant 'default n' from the PCI Kconfig symbol.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Here are two fixes for s390 KVM code that prevent:
1. a malicious user to trigger a kernel BUG
2. a malicious user to change the storage key of read-only pages
|
|
Make the order of arguments for the gmap calls more consistent,
if the gmap pointer is passed it is always the first argument.
In addition distinguish between guest address and user address
by naming the variables gaddr for a guest address and vmaddr for
a user address.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Revert git commit c3a23b9874c1 ("remove unnecessary parameter from
gmap_do_ipte_notify").
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Revert git commit 1b7fd6952063 ("remove unecessary parameter from
pgste_ipte_notify")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|