Age | Commit message (Collapse) | Author |
|
Amend orion_mdio_wait_ready so that the same timeout is used when
polling or using wait_event_timeout. Set the timeout to 1ms.
Replace udelay with usleep_range to avoid a busy loop, and set the
polling interval range as 45us to 55us, so that the first sleep
will be enough in almost all cases.
Generate the same log message at timeout when polling or using
wait_event_timeout.
Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function needn't to be public, so to make it as static.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The interface type, which is being traced by "struct be_adapter::
if_type", isn't used currently. So we can remove that safely
according to Sathya's comments.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use a more current logging style.
Convert printks to pr_<level>.
Consolidate multiple printks into a single printk to avoid
any possible dmesg interleaving. Add a default "event" msg
in case the listed types are ever expanded.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently multicast code attempts to extrace the vlan id from
the skb even when vlan filtering is disabled. This can lead
to mdb entries being created with the wrong vlan id.
Pass the already extracted vlan id to the multicast
filtering code to make the correct id is used in
creation as well as lookup.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:
====================
One patch for net/3.12 fixing an issue where devices could be in an
invalid state they are removed while still attached to OVS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com
Signed-off-by: Michael Drüing <michael@drueing.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:
0: No match
-1: Select classid given in "tc filter ..." command
else: flowid, overwrite the default one
As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:
echo 1 > /proc/sys/net/core/bpf_jit_enable
tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3
BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:
1) People familiar with tcpdump-like filters:
tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf
2) People that want to low-level program their filters or use BPF
extensions that lack support by libpcap's compiler:
bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf
ssh.ops example code:
ldh [12]
jne #0x800, drop
ldb [23]
jneq #6, drop
ldh [20]
jset #0x1fff, drop
ldxb 4 * ([14] & 0xf)
ldh [%x + 14]
jeq #0x16, pass
ldh [%x + 16]
jne #0x16, drop
pass: ret #-1
drop: ret #0
It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This cleans code a bit and will be useful when allocating buffers in
other places (like RX path, to avoid skb_copy_from_linear_data_offset).
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
This pull request contains the following netfilter fix:
* fix --queue-bypass in xt_NFQUEUE revision 3. While adding the
revision 3 of this target, the bypass flags were not correctly
handled anymore, thus, breaking packet bypassing if no application
is listening from userspace, patch from Holger Eitzenberger,
reported by Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to Software User's Manual, the event of last-level-cache
read/write misses is mapped to even counters. Odd counters of that
event number count miss cycles.
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6036/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
Reported by "make includecheck"
Tested that the corresponding sources still compile well on x86
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that
really should use the vm_iomap_memory() helper. This trivially converts
two of them to the helper, and comments about why the third one really
needs to continue to use remap_pfn_range(), and adds the missing size
check.
Reported-by: Nico Golde <nico@ngolde.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
"This contains five tooling fixes:
- fix a remaining mmap2 assumption which resulted in perf top output
breakage
- fix mmap ring-buffer processing bug that corrupts data
- fix for a severe python scripting memory leak
- fix broken (and user-visible) -g option handling
- fix stdio output
The diffstat size is larger than what we'd like to see this late :-/"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Fixup mmap event consumption
perf top: Split -G and --call-graph
perf record: Split -g and --call-graph
perf hists: Add color overhead for stdio output buffer
perf tools: Fix up /proc/PID/maps parsing
perf script python: Fix mem leak due to missing Py_DECREFs on dict entries
|
|
Without the timer debugging, the delayed kobject release will just
result in undebuggable oopses if it triggers any latent bugs. That
doesn't actually help debugging at all.
So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid
having people enable one without the other.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the firmware image that we attempt to load doesn't
actually exist we have a broken firmware file or other
code not checking things correctly, so warn in such a
case. Also avoid assigning cur_ucode/ucode_loaded then.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
When writing the disable_power_off value, the LPRX
enable value also gets written unintentionally, so
fix that by adding the missing break statement.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
This can be useful when using the device as a sniffer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
Having a WARN_ON() followed by a printed message is
less useful than having the message in the warning
so move the message.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
When we disassociate, mac80211 removes the station and
then, it sets the bss it unsets the assoc bool in bss_info.
Since the firwmware wants it the opposite (first set the
MAC context as unassoc, and only then, remove the STA of
the API), we have a small period of time in which the STA
in firmware doesn't have a valid ieee80211_sta pointer.
During that time, iwl_mvm_vif->ap_sta_id, is still set
to the STA in firmware that represent the AP.
This avoids:
[ 4481.476246] BUG: unable to handle kernel NULL pointer dereference at 00000045
[ 4481.479521] IP: [<f8416a6a>] iwl_mvm_bt_coex_reduced_txp+0x7a/0x190 [iwlmvm]
[ 4481.482023] *pde = 00000000
[ 4481.484332] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 4481.486897] Modules linked in: netconsole configfs autofs4 rfcomm(O) bnep(O) nfsd nfs_acl auth_rpcgss exportfs nfs lockd binfmt_misc sunrpc fscache arc4 iwlmvm(O) mac80211(O) btusb(O) iwlwifi(O) bluetooth(O) cfg80211(O) snd_hda_codec_hdmi coretemp dell_wmi snd_hda_codec_idt compat(O) dell_laptop aesni_intel i915 sparse_keymap dcdbas cryptd psmouse serio_raw aes_i586 microcode snd_hda_intel drm_kms_helper snd_hda_codec drm snd_pcm snd_timer i2c_algo_bit video intel_agp intel_gtt snd soundcore snd_page_alloc crc32c_intel ahci sdhci_pci libahci sdhci mmc_core e1000e xhci_hcd [last unloaded: configfs]
[ 4481.502983]
[ 4481.505599] Pid: 6507, comm: kworker/0:1 Tainted: G O 3.4.43-dev #1 Dell Inc. Latitude E6430/0CMDYV
[ 4481.508575] EIP: 0060:[<f8416a6a>] EFLAGS: 00010246 CPU: 0
[ 4481.511248] EIP is at iwl_mvm_bt_coex_reduced_txp+0x7a/0x190 [iwlmvm]
[ 4481.513947] EAX: ffffffea EBX: 00000002 ECX: 00000001 EDX: 00000001
[ 4481.516710] ESI: ec6f0f28 EDI: 00000000 EBP: e8175dfc ESP: e8175d9c
[ 4481.519445] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 4481.522185] CR0: 8005003b CR2: 00000045 CR3: 01a5e000 CR4: 001407d0
[ 4481.524950] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 4481.527768] DR6: ffff0ff0 DR7: 00000400
[ 4481.530565] Process kworker/0:1 (pid: 6507, ti=e8174000 task=e8032b20 task.ti=e8174000)
[ 4481.533447] Stack:
[ 4481.536379] e472439f 00003a12 e8032b20 e8033048 00000001 e8175ddc 00000246 e8033040
[ 4481.540132] 00000002 01814990 ec4d1ddc e8175dcc 00000000 00000000 00000000 00000000
[ 4481.543867] 00000000 00000000 00000001 000001c8 009b0002 ec4d1ddc ec6f0f28 00000000
[ 4481.547633] Call Trace:
[ 4481.550578] [<f8418027>] iwl_mvm_bt_rssi_event+0x197/0x220 [iwlmvm]
[ 4481.553537] [<f840919c>] iwl_mvm_stat_iterator+0xdc/0x240 [iwlmvm]
[ 4481.556582] [<f8d129c2>] __iterate_active_interfaces+0xe2/0x1f0 [mac80211]
[ 4481.559544] [<f84090c0>] ? iwl_mvm_update_smps+0x90/0x90 [iwlmvm]
[ 4481.562519] [<f84090c0>] ? iwl_mvm_update_smps+0x90/0x90 [iwlmvm]
[ 4481.565498] [<f8d12b0c>] ieee80211_iterate_active_interfaces+0x3c/0x50 [mac80211]
[ 4481.568421] [<f8409b43>] iwl_mvm_rx_statistics+0xb3/0x130 [iwlmvm]
[ 4481.571349] [<f8405431>] iwl_mvm_async_handlers_wk+0xc1/0xf0 [iwlmvm]
[ 4481.574251] [<c1052915>] ? process_one_work+0x105/0x5c0
[ 4481.577162] [<c1052991>] process_one_work+0x181/0x5c0
[ 4481.580025] [<c1052915>] ? process_one_work+0x105/0x5c0
[ 4481.582861] [<f8405370>] ? iwl_mvm_rx_fw_logs+0x20/0x20 [iwlmvm]
[ 4481.585722] [<c10530f1>] worker_thread+0x121/0x2c0
[ 4481.588536] [<c1052fd0>] ? rescuer_thread+0x1d0/0x1d0
[ 4481.591323] [<c105af0d>] kthread+0x7d/0x90
[ 4481.594059] [<c105ae90>] ? flush_kthread_worker+0x120/0x120
[ 4481.596868] [<c15b7cc2>] kernel_thread_helper+0x6/0x10
[ 4481.599605] Code: 9d de c3 c8 85 c0 74 0d 80 3d f8 ae 42 f8 00 0f 84 dc 00 00 00 8b 45 c8 0f b6 d3 31 ff 89 55 c0 8b 84 90 d8 03 00 00 0f b6 55 c7 <38> 50 5b 89 45 bc 0f 84 a8 00 00 00 a1 e4 d2 04 c2 85 c0 0f 84
[ 4481.611782] EIP: [<f8416a6a>] iwl_mvm_bt_coex_reduced_txp+0x7a/0x190 [iwlmvm] SS:ESP 0068:e8175d9c
[ 4481.614985] CR2: 0000000000000045
[ 4481.687441] ---[ end trace b11bc915fbac4412 ]---
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
The number of commands can never be negative, so it should
be using an unsigned type. This also shuts up an smatch
warning elsewhere in the code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
Change old UAPSD bit to PM_CMD_SUPPORT, and add a new bit to indicate
real UAPSD support.
Don't use UAPSD when the firmware doesn't support it.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
Originally I've thought that this is leftover hw state dirt from the
BIOS. But after way too much helpless flailing around on my part I've
noticed that the actual bug is when we change the state of an already
active pipe.
For example when we change the fdi lines from 2 to 3 without switching
off outputs in-between we'll never see the crucial on->off transition
in the ->modeset_global_resources hook the current logic relies on.
Patch version 2 got this right by instead also checking whether the
pipe is indeed active. But that in turn broke things when pipes have
been turned off through dpms since the bifurcate enabling is done in
the ->crtc_mode_set callback.
To address this issues discussed with Ville in the patch review move
the setting of the bifurcate bit into the ->crtc_enable hook. That way
we won't wreak havoc with this state when userspace puts all other
outputs into dpms off state. This also moves us forward with our
overall goal to unify the modeset and dpms on paths (which we need to
have to allow runtime pm in the dpms off state).
Unfortunately this requires us to move the bifurcate helpers around a
bit.
Also update the commit message, I've misanalyzed the bug rather badly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70507
Tested-by: Jan-Michael Brummer <jan.brummer@tabos.org>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
V3 of the NFQUEUE target ignores the --queue-bypass flag,
causing packets to be dropped when the userspace listener
isn't running.
Regression is in since 8746ddcf12bb26 ("netfilter: xt_NFQUEUE:
introduce CPU fanout").
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fix to return -ENOMEM in the memory alloc error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch removes the need to keep a zero_base variable in the adapter
structure. Now we just use two different macros to set the non-zero and
zero base. This adds to readability and shortens some of the structure
initialization under 80 columns. The gathering of status for ethtool was
slightly modified to again better fit into 80 columns and become a bit
more readable.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch adds the extended statistics similar to the ixgbe driver. These
statistics keep track of how often the busy polling yields, as well as how many
packets are cleaned or missed by the polling routine.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch enables CONFIG_NET_RX_BUSY_POLL support in the VF code. This enables
sockets which have enabled the SO_BUSY_POLL socket option to use the
ndo_busy_poll_recv operation which could result in lower latency, at the cost
of higher CPU utilization, and increased power usage. This support is similar
to how the ixgbe driver works.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
OK, so what I'm actually seeing on my WSM is that sched/clock.c is
'broken' for the purpose we're using it for.
What triggered it is that my WSM-EP is broken :-(
[ 0.001000] tsc: Fast TSC calibration using PIT
[ 0.002000] tsc: Detected 2533.715 MHz processor
[ 0.500180] TSC synchronization [CPU#0 -> CPU#6]:
[ 0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
For some reason it consistently detects TSC skew, even though NHM+
should have a single clock domain for 'reasonable' systems.
This marks sched_clock_stable=0, which means that we do fancy stuff to
try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
it either wakes up or gets kicked by another cpu.
While this is perfectly fine for the scheduler -- it only cares about
actually running stuff, and when we're running stuff we're obviously not
idle. This does somewhat break down for perf which can trigger events
just fine on an otherwise idle cpu.
So I've got NMIs get get 'measured' as taking ~1ms, which actually
don't last nearly that long:
<idle>-0 [013] d.h. 886.311970: rcu_nmi_enter <-do_nmi
...
<idle>-0 [013] d.h. 886.311997: perf_sample_event_took: HERE!!! : 1040990
So ftrace (which uses sched_clock(), not the fancy bits) only sees
~27us, but we measure ~1ms !!
Now since all this measurement stuff lives in x86 code, we can actually
fix it.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.
When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.
Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Rather than return true/false indicating whether there was budget left, return
the total packets cleaned. This currently has no use, but will be used in a
following patch which enables CONFIG_NET_RX_BUSY_POLL support in order to track
how many packets were cleaned during the busy poll as part of the extended
statistics.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch adds ixgbevf_rx_skb in line with how ixgbe handles the variations on
how packets can be received. It will be extended in a following patch for
CONFIG_NET_RX_BUSY_POLL support.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch removes the unnecessary display of PCIe bandwidth twice. Since the
ixgbe_check_minimum_link does a better job, and ensures accurate detection on
even complex chains, this older check is no longer necessary.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch updates the ixgbe_check_minimum_link function to correctly show that
there is some minor loss of encoding, even though we don't calculate it in the
max GT/s equation. It is small enough to not bother, but is better to report it
than not.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
A THP PMD update is accounted for as 512 pages updated in vmstat. This is
large difference when estimating the cost of automatic NUMA balancing and
can be misleading when comparing results that had collapsed versus split
THP. This patch addresses the accounting issue.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-10-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
THP migration uses the page lock to guard against parallel allocations
but there are cases like this still open
Task A Task B
--------------------- ---------------------
do_huge_pmd_numa_page do_huge_pmd_numa_page
lock_page
mpol_misplaced == -1
unlock_page
goto clear_pmdnuma
lock_page
mpol_misplaced == 2
migrate_misplaced_transhuge
pmd = pmd_mknonnuma
set_pmd_at
During hours of testing, one crashed with weird errors and while I have
no direct evidence, I suspect something like the race above happened.
This patch extends the page lock to being held until the pmd_numa is
cleared to prevent migration starting in parallel while the pmd_numa is
being cleared. It also flushes the old pmd entry and orders pagetable
insertion before rmap insertion.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
There are three callers of task_numa_fault():
- do_huge_pmd_numa_page():
Accounts against the current node, not the node where the
page resides, unless we migrated, in which case it accounts
against the node we migrated to.
- do_numa_page():
Accounts against the current node, not the node where the
page resides, unless we migrated, in which case it accounts
against the node we migrated to.
- do_pmd_numa_page():
Accounts not at all when the page isn't migrated, otherwise
accounts against the node we migrated towards.
This seems wrong to me; all three sites should have the same
sementaics, furthermore we should accounts against where the page
really is, we already know where the task is.
So modify all three sites to always account; we did after all receive
the fault; and always account to where the page is after migration,
regardless of success.
They all still differ on when they clear the PTE/PMD; ideally that
would get sorted too.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
THP migrations are serialised by the page lock but on its own that does
not prevent THP splits. If the page is split during THP migration then
the pmd_same checks will prevent page table corruption but the unlock page
and other fix-ups potentially will cause corruption. This patch takes the
anon_vma lock to prevent parallel splits during migration.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-7-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The locking for migrating THP is unusual. While normal page migration
prevents parallel accesses using a migration PTE, THP migration relies on
a combination of the page_table_lock, the page lock and the existance of
the NUMA hinting PTE to guarantee safety but there is a bug in the scheme.
If a THP page is currently being migrated and another thread traps a
fault on the same page it checks if the page is misplaced. If it is not,
then pmd_numa is cleared. The problem is that it checks if the page is
misplaced without holding the page lock meaning that the racing thread
can be migrating the THP when the second thread clears the NUMA bit
and faults a stale page.
This patch checks if the page is potentially being migrated and stalls
using the lock_page if it is potentially being migrated before checking
if the page is misplaced or not.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-6-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If another task handled a hinting fault in parallel then do not double
account for it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-5-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
ixgbe_napi_disable_all calls napi_disable on each queue, however the busy
polling code introduced a local_bh_disable()d context around the napi_disable.
The original author did not realize that napi_disable might sleep, which would
cause a sleep while atomic BUG. In addition, on a single processor system, the
ixgbe_qv_lock_napi loop shouldn't have to mdelay. This patch adds an
ixgbe_qv_disable along with a new IXGBE_QV_STATE_DISABLED bit, which it uses to
indicate to the poll and napi routines that the q_vector has been disabled. Now
the ixgbe_napi_disable_all function will wait until all pending work has been
finished and prevent any future work from being started.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: Alexander Duyck <alexander.duyck@intel.com>
Cc: Hyong-Youb Kim <hykim@myri.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Dmitry Kravkov <dmitry@broadcom.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
napi_disable uses an msleep() call to wait for outstanding napi work to be
finished after setting the disable bit. It does not always sleep incase there
was no outstanding work. This resulted in a rare bug in ixgbe_down operation
where a napi_disable call took place inside of a local_bh_disable()d context.
In order to enable easier detection of future sleep while atomic BUGs, this
patch adds a might_sleep() call, so that every use of napi_disable during
atomic context will be visible.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: Alexander Duyck <alexander.duyck@intel.com>
Cc: Hyong-Youb Kim <hykim@myri.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Dmitry Kravkov <dmitry@broadcom.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch removes the burden from the NIC drivers to check if the
vxlan driver is enabled in the kernel and also makes available
the vxlan headrooms to them.
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
* Add color overhead for stdio output buffer, which fixes
--stdio output being chopped up on the hot (red) entries,
fix from Jiri Olsa.
* Get 'perf record -g -a sleep 1' working again, removing the
need for -- separating perf options from the workload, restoring
ages old behaviour, fix from Jiri Olsa.
More patches allowing ~/.perfconfig setting up of default
callchain collecting method ("fp" or "dwarf") left for next
merge window.
* Fixup mmap event consumption, where we were acking the
consumption by writing the tail before actually accessing
the event, which could lead to using overwritten records
in things like 'perf record --call-graph'. From Zhouyi Zhou.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
struct esp_data consists of a single pointer, vanishing the need for it
to be a structure. Fold the pointer into 'data' direcly, removing one
level of pointer indirection.
Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The padlen member of struct esp_data is always zero. Get rid of it.
Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
time_after_eq() only works if the delta is < MAX_ULONG/2.
For a 32bit Dom0, if netfront sends packets at a very low rate, the time
between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
and the test for timer_after_eq() will be incorrect. Credit will not be
replenished and the guest may become unable to send packets (e.g., if
prior to the long gap, all credit was exhausted).
Use jiffies_64 variant to mitigate this problem for 32bit Dom0.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jason Luan <jianhai.luan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|