Age | Commit message (Collapse) | Author |
|
The nat extension structure is 32bytes in size on x86_64:
struct nf_conn_nat {
struct hlist_node bysource; /* 0 16 */
struct nf_conn * ct; /* 16 8 */
union nf_conntrack_nat_help help; /* 24 4 */
int masq_index; /* 28 4 */
/* size: 32, cachelines: 1, members: 4 */
/* last cacheline: 32 bytes */
};
The hlist is needed to quickly check for possible tuple collisions
when installing a new nat binding. Storing this in the extension
area has two drawbacks:
1. We need ct backpointer to get the conntrack struct from the extension.
2. When reallocation of extension area occurs we need to fixup the bysource
hash head via hlist_replace_rcu.
We can avoid both by placing the hlist_head in nf_conn and place nf_conn in
the bysource hash rather than the extenstion.
We can also remove the ->move support; no other extension needs it.
Moving the entire nat extension into nf_conn would be possible as well but
then we have to add yet another callback for deletion from the bysource
hash table rather than just using nat extension ->destroy hook for this.
nf_conn size doesn't increase due to aligment, followup patch replaces
hlist_node with single pointer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We don't need to acquire the bucket lock during early drop, we can
use lockless traveral just like ____nf_conntrack_find.
The timer deletion serves as synchronization point, if another cpu
attempts to evict same entry, only one will succeed with timer deletion.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
From: Liping Zhang <liping.zhang@spreadtrum.com>
Similar to ctnl_untimeout, when hash resize happened, we should try
to do unhelp from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Imagine such situation, nf_conntrack_htable_size now is 4096, we are doing
ctnl_untimeout, and iterate on 3000# bucket.
Meanwhile, another user try to reduce hash size to 2048, then all nf_conn
are removed to the new hashtable. When this hash resize operation finished,
we still try to itreate ct begin from 3000# bucket, find nothing to do and
just return.
We may miss unlinking some timeout objects. And later we will end up with
invalid references to timeout object that are already gone.
So when we find that hash resize happened, try to unlink timeout objects
from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack
hash table via /sys/module/nf_conntrack/parameters/hashsize, race will
happen, because reader can observe a newly allocated hash but the old size
(or vice versa). So oops will happen like follows:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack]
Call Trace:
[<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
[<ffffffff81261a1c>] seq_read+0x2cc/0x390
[<ffffffff812a8d62>] proc_reg_read+0x42/0x70
[<ffffffff8123bee7>] __vfs_read+0x37/0x130
[<ffffffff81347980>] ? security_file_permission+0xa0/0xc0
[<ffffffff8123cf75>] vfs_read+0x95/0x140
[<ffffffff8123e475>] SyS_read+0x55/0xc0
[<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
It is very easy to reproduce this kernel crash.
1. open one shell and input the following cmds:
while : ; do
echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize
done
2. open more shells and input the following cmds:
while : ; do
cat /proc/net/nf_conntrack
done
3. just wait a monent, oops will happen soon.
The solution in this patch is based on Florian's Commit 5e3c61f98175
("netfilter: conntrack: fix lookup race during hash resize"). And
add a wrapper function nf_conntrack_get_ht to get hash and hsize
suggested by Florian Westphal.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When working with Device Tree we iterate over children of "gpio-leds"
compatible node and create LED device for each of them. We take care of
all common DT properties (label, default trigger, state, etc.), however
some triggers may want to support more of them.
It could be useful for timer trigger to support setting delay on/off
values with Device Tree property. Or for transient trigger to support
e.g. duration property.
We obviously should handle such properties in triggers, not in generic
code. To let trigger drivers read properties from DT node we need to set
of_node to point the relevant node. This change allows using all kind of
of helpers in e.g. "activate" callbacks.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
|
|
This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
the dwmac is set to sgmii.
Signed-off-by: Tien Hock Loh <thloh@altera.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) that it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
Fixes: b9b4bb26af01 ("tmpfs: don't undo fallocate past its last page")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the target needs more time to process the received PDU, it sends
Response Timeout Extension (RTOX) PDU.
When the initiator receives a RTOX PDU, it must reply with a RTOX PDU
and extends the current rwt value with the formula:
rwt_int = rwt * rtox
This patch takes care of the rtox value passed by the target in the RTOX
PDU and extends the timeout for the next response accordingly.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
When sending an ATR_REQ, the initiator must wait for the ATR_RES at
least 'RWT(nfcdep,activation) + dRWT(nfcdep)' and no more than
'RWT(nfcdep,activation) + dRWT(nfcdep) + dT(nfcdep,initiator)'. This
gives a timeout value between 1237 ms and 1337 ms. This patch defines
DIGITAL_ATR_RES_RWT to 1337 used for the timeout value of ATR_REQ
command.
For other DEP PDUs, the initiator must wait between 'RWT + dRWT(nfcdep)'
and 'RWT + dRWT(nfcdep) + dT(nfcdep,initiator)' where RWT is given by
the following formula: '(256 * 16 / f(c)) * 2^wt' where wt is the value
of the TO field in the ATR_RES response and is in the range between 0
and 14. This patch declares a mapping table for wt values and gives RWT
max values between 100 ms and 5049 ms.
This patch also defines DIGITAL_ATR_RES_TO_WT, the maximum wt value in
target mode, to 8.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
This patch frees the RTOX resp sk_buff in initiator mode. It also makes
use of the free_resp exit point for ATN supervisor PDUs in both
initiator and target mode.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
With this patch, ACK PDU sk_buffs are now freed and code has been
refactored for better errors handling.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
When the target receives a NACK PDU, it re-sends the last sent PDU.
ACK PDUs are received by the target as a reply from the initiator to
chained I-PDUs. There are 3 cases to handle:
- If the target has previously received 1 or more ATN PDUs and the PNI
in the ACK PDU is equal to the target PNI - 1, then it means that the
initiator did not received the last issued PDU from the target. In
this case it re-sends this PDU.
- If the target has received 1 or more ATN PDUs but the ACK PNI is not
the target PNI - 1, then this means that this ACK is the reply of the
previous chained I-PDU sent by the target. The target did not received
it on the first attempt and it is being re-sent by the initiator. The
process continues as usual.
- No ATN PDU received before this ACK PDU. This is the reply of a
chained I-PDU. The target keeps on processing its chained I-PDU.
The code has been refactored to avoid too many indentation levels.
Also, ACK and NACK PDUs were not freed. This is now fixed.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
When the initiator sends a DEP_REQ I-PDU, the target device may not
reply in a timely manner. In this case the initiator device must send an
attention PDU (ATN) and if the recipient replies with an ATN PDU in
return, then the last I-PDU must be sent again by the initiator.
This patch fixes how the target handles I-PDU received after an ATN PDU
has been received.
There are 2 possible cases:
- The target has received the initial DEP_REQ and sends back the DEP_RES
but the initiator did not receive it. In this case, after the
initiator has sent an ATN PDU and the target replied it (with an ATN
as well), the initiator sends the saved skb of the initial DEP_REQ
again and the target replies with the saved skb of the initial
DEP_RES.
- Or the target did not even received the initial DEP_REQ. In this case,
after the ATN PDUs exchange, the initiator sends the saved skb and the
target simply passes it up, just as usual.
This behavior is controlled using the atn_count and the PNI field of the
digital device structure.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
When allocating chained I-PDUs, there is no need to call skb_reserve()
since it's already done by digital_alloc_skb() and contains enough room
for the driver head and tail data.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
This patch fixes the way an I-PDU is saved in case it needs to be sent
again. It is now copied using pskb_copy() and not simply referenced
using skb_get() since it could be modified by the driver.
digital_in_send_saved_skb() and digital_tg_send_saved_skb() still get a
reference on the saved skb which is re-sent but release it if the send
operation fails. That way the caller doesn't have to take care about skb
ref in case of error.
RTOX supervisor PDU must not be saved as this can override a previously
saved I-PDU that should be re-sent later on.
Signed-off-by: Thierry Escande <thierry.escande@collabora.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
This reverts commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a.
Stephane reported the following regression:
> Since Andi added:
>
> commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a
> Author: Andi Kleen <ak@linux.intel.com>
> Date: Thu Jun 9 06:14:38 2016 -0700
>
> perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
>
> $ perf stat -e ref-cycles ls
> <not counted> ....
>
> fails systematically because the ref-cycles is now used by the
> watchdog and given this is a system-wide pinned event, it monopolizes
> the fixed counter 2 which is the only counter able to measure this event.
Since the next merge window is near, fix the regression for now
by reverting the commit.
Reported-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently, a schedule while atomic error prints the stack trace to the
kernel log and the system continue running.
Although it is possible to collect the kernel log messages and analyze
it, often more information are needed. Furthermore, keep the system
running is not always the best choice. For example, when the preempt
count underflows the system will not stop to complain about scheduling
while atomic, so the kernel log can wrap around overwriting the first
stack trace, tuning the analysis even more challenging.
This patch uses the kernel.panic_on_warn sysctl to help out on these
more complex situations.
When kernel.panic_on_warn is set to 1, the kernel will panic() in the
schedule while atomic detection.
The default value of the sysctl is 0, maintaining the current behavior.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e8f7b80f353aa22c63bd8557208163989af8493d.1464983675.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.
The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.
The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.
When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.
Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.
Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.
The following should be a comprehensive list of affected models:
iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16]
iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16]
Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12]
Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12]
MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):
irq 17: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff81374370>] pcie_isr
[<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
[<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
Disabling IRQ #17
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus
Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We used to scan secondary buses until the following commit that
was applied in 2009:
8659c406ade3 ("x86: only scan the root bus in early PCI quirks")
which commit constrained early quirks to the root bus only. Its
motivation was to prevent application of the nvidia_bugs quirk
on secondary buses.
We're about to add a quirk to reset the Broadcom 4331 wireless card on
2011/2012 Macs, which is located on a secondary bus behind a PCIe root
port. To facilitate that, reintroduce scanning of secondary buses.
The commit message of 8659c406ade3 notes that scanning only the root bus
"saves quite some unnecessary scanning work". The algorithm used prior
to 8659c406ade3 was particularly time consuming because it scanned
buses 0 to 31 brute force. To avoid lengthening boot time, employ a
recursive strategy which only scans buses that are actually reachable
from the root bus.
Yinghai Lu pointed out that the secondary bus number read from a
bridge's config space may be invalid, in particular a value of 0 would
cause an infinite loop. The PCI core goes beyond that and recurses to a
child bus only if its bus number is greater than the parent bus number
(see pci_scan_bridge()). Since the root bus is numbered 0, this implies
that secondary buses may not be 0. Do the same on early scanning.
If this algorithm is found to significantly impact boot time or cause
infinite loops on broken hardware, it would be possible to limit its
recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
at depth 0, so the bus need not be scanned deeper than that for now. An
alternative approach would be to revert to scanning only the root bus,
and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
and 8086:1e16. Apple always positioned the card behind either of these
three ports. The quirk would then check presence of the card in slot 0
below the root port and do its deed.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since the following commit:
8659c406ade3 ("x86: only scan the root bus in early PCI quirks")
... early quirks are only applied to devices on the root bus.
The motivation was to prevent application of the nvidia_bugs quirk on
secondary buses.
We're about to reintroduce scanning of secondary buses for a quirk to
reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent
regressions, open code the requirement to apply nvidia_bugs only on the
root bus.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This is to fix some conflicts in iwlwifi.
Conflicts:
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
drivers/net/wireless/intel/iwlwifi/mvm/scan.c
|
|
This showed up as "6Failed to access..." here.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1b74dde7c47c ("x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)")
Link: http://lkml.kernel.org/r/1468170841-17045-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch removes the most parts of internal crypto codes.
And then, it modifies and adds some ext4-specific crypt codes to use the generic
facility.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Pull MIPS fix from Ralf Baechle:
"Another week with just a single 4.7 fix.
This fixes a possible 'loss' of the huge page bit from pmd on
permission change"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Fix page table corruption on THP permission changes.
|
|
The vertical indentation is kinda chaotic in intel-mid.h. Let's be
consistent with it.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465992260-29897-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The new physical address randomized KASLR implementation can cause the
kernel to be aligned close to the end of physical memory. In this case,
_brk_end aligned to PMD will go beyond what is expected safe and hit
the assert in __phys_addr_symbol():
VIRTUAL_BUG_ON(y >= KERNEL_IMAGE_SIZE);
Instead, perform an inclusive range check to avoid incorrectly triggering
the assert:
kernel BUG at arch/x86/mm/physaddr.c:38!
invalid opcode: 0000 [#1] SMP
...
RIP: 0010:[<ffffffffbe055721>] __phys_addr_symbol+0x41/0x50
...
Call Trace:
[<ffffffffbe052eb9>] cpa_process_alias+0xa9/0x210
[<ffffffffbe109011>] ? do_raw_spin_unlock+0xc1/0x100
[<ffffffffbe051eef>] __change_page_attr_set_clr+0x8cf/0xbd0
[<ffffffffbe201a4d>] ? vm_unmap_aliases+0x7d/0x210
[<ffffffffbe05237c>] change_page_attr_set_clr+0x18c/0x4e0
[<ffffffffbe0534ec>] set_memory_4k+0x2c/0x40
[<ffffffffbefb08b3>] check_bugs+0x28/0x2a
[<ffffffffbefa4f52>] start_kernel+0x49d/0x4b9
[<ffffffffbefa4120>] ? early_idt_handler_array+0x120/0x120
[<ffffffffbefa4423>] x86_64_start_reservations+0x29/0x2b
[<ffffffffbefa4568>] x86_64_start_kernel+0x143/0x152
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Link: http://lkml.kernel.org/r/20160615190545.GA26071@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Mathieu Desnoyers reported that the STACK_FRAME_NON_STANDARD macro
wasn't working with the lttng_filter_interpret_bytecode() function in
the lttng-modules code.
Usually the relocation created by STACK_FRAME_NON_STANDARD creates a
reference to a section symbol like this:
Offset Type Value Addend Name
000000000000000000 X86_64_64 000000000000000000 +3136 .text
But in this case it created a reference to a function symbol:
Offset Type Value Addend Name
000000000000000000 X86_64_64 0x00000000000003a0 +0 lttng_filter_interpret_bytecode
To be honest I have no idea what causes gcc to decide to do one over the
other. But both are valid ELF, so add support for the function symbol.
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lttng-dev@lists.lttng.org
Link: http://lkml.kernel.org/r/9cee42843bc6d94e990a152e4e0319cfdf6756ef.1466023450.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The arrays xstate_offsets[] and xstate_sizes[] record XSAVE standard-
format offsets and sizes. Values for non-extended state components
fpu and xmm's were not initialized or used. Ptrace format conversion
needs them. Fix it.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cf3ea36cf30e2a99e37da6483e65446d018ff0a7.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Component offset print out was incorrect for XSAVES. Correct it and move
to a separate function.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/86602a8ac400626c6eca7125c3e15934866fc38e.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
XSAVES uses compacted format and is a kernel instruction. The kernel
should use standard-format, non-supervisor state data for PTRACE.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
[ Edited away artificial linebreaks. ]
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/de3d80949001305fe389799973b675cab055c457.1466179491.git.yu-cheng.yu@intel.com
[ Made various readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
CPUID function 0x0d, sub function (i, i > 1) returns in ebx the offset of
xstate component i. Zero is returned for a supervisor state. A supervisor
state can only be saved by XSAVES and XSAVES uses a compacted format.
There is no fixed offset for a supervisor state. This patch checks and
makes sure a supervisor state offset is not recorded or mis-used. This has
no effect in practice as we currently use no supervisor states, but it
would be good to fix.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/81b29e40d35d4cec9f2511a856fe769f34935a3f.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
CPUID function 0x0d, sub function (i, i > 1) returns in ecx[1] the
alignment requirement of component 'i' when the compacted format is used.
If ecx[1] is 0, component 'i' is located immediately following the preceding
component. If ecx[1] is 1, component 'i' is located on the next 64-byte
boundary following the preceding component.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/331e2bef1a0a7a584f06adde095b6bbfbe166472.1466179491.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
There are several notes and warning mesages in the middle of
the media docbook. Use the ReST tags for that, as it makes
them visually better and hightlights them.
While here, modify a few ones to make them clearer.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
per the Intel 64 and IA-32 Architecture Software Developer's Manual...
Add the reference clock for Intel Atom Processors
Based on the Airmont Microarchitecture.
Reported-by: Stephane Gasparini <stephane.gasparini@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/abc6a0f4b18281410da1a3f26e2819d8e03e144f.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Atom processors use a 19.2 MHz crystal oscillator.
Early processors generate 100 MHz via 19.2 MHz * 26 / 5 = 99.84 MHz.
Later preocessor generate 100 MHz via 19.2 MHz * 125 / 24 = 100 MHz.
Update the Silvermont-based tables accordingly,
matching the Software Developers Manual.
Also, correct a 166 MHz entry that should have been 116 MHz,
and add a missing 80 MHz entry.
Reported-by: Stephane Gasparini <stephane.gasparini@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5d7561655dfb066ff10801b423405bae4d1cfbe2.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Syntax only, no functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/8653a2dba21fef122fc7b29eafb750e2004d3976.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Debugging messages are not necessary after all of the
possible hardware failures that never occur.
Instead, this code can simply return 0.
This code also doesn't need to print in the success case.
tsc_init() already prints the TSC frequency,
and apic=debug is available if anybody really is
interested in printing the LAPIC frequency.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/cf03279a125b95dfa9b8d3d5b4a66de09cd04050.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
try_msr_calibrate_tsc() is currently Intel-specific,
and should not execute on any other vendor's parts.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1fe23c052826bdcfeb3d45045aa02246078cb5a7.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This reverts commit:
e2724e9d9692 ("x86/tsc: Add missing Cherrytrail frequency to the table")
... as it is incomplete, and is replaced by a more complete patch
later in this series.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/2199d0e959f7f71a18827268b5d060f8d3831639.1466138954.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This command:
echo 5 > /proc/irq/10/smp_affinity
means only the first and third (not fourth) CPUs can handle irqs
That is, CPU0 is the first CPU and CPU2 is the third cpu
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1466168715-8410-1-git-send-email-jkacur@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
There are several notes for this DTV property. Some are
outdated, so take some care of it, making it updated.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
Make the note better formatted and documented.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
This matches what is already done for prepare_exit_to_usermode(),
and saves about 60 clock cycles (4% speedup) with the benchmark
in the previous commit message.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1466434712-31440-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Thanks to all the work that was done by Andy Lutomirski and others,
enter_from_user_mode() and prepare_exit_to_usermode() are now called only with
interrupts disabled. Let's provide them a version of user_enter()/user_exit()
that skips saving and restoring the interrupt flag.
On an AMD-based machine I tested this patch on, with force-enabled
context tracking, the speed-up in system calls was 90 clock cycles or 6%,
measured with the following simple benchmark:
#include <sys/signal.h>
#include <time.h>
#include <unistd.h>
#include <stdio.h>
unsigned long rdtsc()
{
unsigned long result;
asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n"
"or %%rdx, %%rax" : "=a" (result) : : "rdx");
return result;
}
int main()
{
unsigned long tsc1, tsc2;
int pid = getpid();
int i;
tsc1 = rdtsc();
for (i = 0; i < 100000000; i++)
kill(pid, SIGWINCH);
tsc2 = rdtsc();
printf("%ld\n", tsc2 - tsc1);
}
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1466434712-31440-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Unfortunately, captions are new on Sphinx for c blocks: it was
added only on version 1.3. Also, it were already bad enough
not being able to auto-numerate them.
So, let's give up and use, instead, titles before the examples.
Not much is lost, and, as a side track, we don't need to
numerate them anymore.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
Now that the LIRC header was added, we can cross-reference it
and identify the documentation gaps.
There are lots of stuff missing there, but at least now we
can avoid the gap to increase.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
Just like the other parts of the document, let's add the LIRC
header, as it is part of the API.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|