2011-12-06sched, nohz: Implement sched group, domain aware nohz idle load balancingSuresh Siddha
When there are many logical cpus that enter and exit idle often, members of the global nohz data structure are modified very frequently, causing a lot of cache-line contention. Make the nohz idle load balancing more scalable by using the sched domain topology and the 'nr_busy_cpus' field in struct sched_group_power. Idle load balance is kicked on one of the idle cpus when there is at least one idle cpu and:

 - a busy rq has more than one task, or
 - a busy rq's scheduler group shares package resources (like HT/MC siblings) and has more than one busy member in that group, or
 - for the SD_ASYM_PACKING domain, the lower-numbered cpus in that domain are idle compared to the busy ones.

This will help in kicking the idle load balancing request only when there is a potential imbalance. And once it is mostly balanced, these kicks will be minimized. These changes improved, by 2x, a workload that is context-switch intensive between a number of task pairs on an 8-socket NHM-EX based system. Reported-by: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20111202010832.602203411@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
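A rough userspace sketch of the kick policy described above; all structure and function names here are illustrative, not the kernel's actual code:

  #include <stdbool.h>

  struct group_stats {
          int  nr_busy_cpus;           /* busy members in this sched group */
          bool shares_pkg_resources;   /* HT/MC siblings share resources */
  };

  /* Should an idle cpu be kicked to run nohz idle load balancing? */
  static bool nohz_kick_needed(int rq_nr_running, const struct group_stats *g)
  {
          if (rq_nr_running > 1)        /* busy rq with more than one task */
                  return true;
          if (g->shares_pkg_resources && g->nr_busy_cpus > 1)
                  return true;          /* more than one busy sibling */
          /* (the SD_ASYM_PACKING check is omitted in this sketch) */
          return false;                 /* mostly balanced: no kick */
  }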
2011-12-06sched, nohz: Track nr_busy_cpus in the sched_group_powerSuresh Siddha
Introduce nr_busy_cpus in the struct sched_group_power [Not in sched_group because sched groups are duplicated for the SD_OVERLAP scheduler domain] and for each cpu that enters and exits idle, this parameter will be updated in each scheduler group of the scheduler domain that this cpu belongs to. To avoid the frequent update of this state as the cpu enters and exits idle, the update of the stat during idle exit is delayed to the first timer tick that happens after the cpu becomes busy. This is done using NOHZ_IDLE flag in the struct rq's nohz_flags. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20111202010832.555984323@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
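A simplified model of the delayed update described above; the real kernel walks every sched_group_power in the cpu's domain hierarchy, and these names are illustrative:

  struct cpu_state {
          int *nr_busy_cpus;      /* points into shared sched_group_power */
          int  nohz_idle;         /* models the NOHZ_IDLE flag */
  };

  static void enter_idle(struct cpu_state *c)
  {
          (*c->nr_busy_cpus)--;   /* account on idle entry, tick stopping */
          c->nohz_idle = 1;
  }

  /* The increment is deferred to the first timer tick after the cpu
   * becomes busy, instead of being done in the idle-exit fast path. */
  static void first_busy_tick(struct cpu_state *c)
  {
          if (c->nohz_idle) {
                  (*c->nr_busy_cpus)++;
                  c->nohz_idle = 0;
          }
  }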
2011-12-06sched, nohz: Introduce nohz_flags in 'struct rq'Suresh Siddha
Introduce nohz_flags in the struct rq, which will track these two flags for now. NOHZ_TICK_STOPPED keeps track of the tick stopped status that gets set when the tick is stopped. It will be used to update the nohz idle load balancer data structures during the first busy tick after the tick is restarted. At this first busy tick after tickless idle, NOHZ_TICK_STOPPED flag will be reset. This will minimize the nohz idle load balancer status updates that currently happen for every tickless exit, making it more scalable when there are many logical cpu's that enter and exit idle often. NOHZ_BALANCE_KICK will track the need for nohz idle load balance on this rq. This will replace the nohz_balance_kick in the rq, which was not being updated atomically. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20111202010832.499438999@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
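A sketch of the atomic flag update, using C11 atomics in place of the kernel's bitops; the point is that NOHZ_BALANCE_KICK is set with an atomic read-modify-write, which the old rq->nohz_balance_kick int was not:

  #include <stdatomic.h>
  #include <stdbool.h>

  enum {
          NOHZ_TICK_STOPPED = 1u << 0,
          NOHZ_BALANCE_KICK = 1u << 1,
  };

  /* Request a balance kick; returns true only for the caller that wins. */
  static bool kick_requested(atomic_uint *nohz_flags)
  {
          unsigned int old = atomic_fetch_or(nohz_flags, NOHZ_BALANCE_KICK);
          return !(old & NOHZ_BALANCE_KICK);
  }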
2011-12-06sched/rt: Code cleanup, remove a redundant function callShan Hai
The second call to sched_rt_period() is redundant, because the value of the rt_runtime was already read and it was protected by the ->rt_runtime_lock. Signed-off-by: Shan Hai <haishan.bai@gmail.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322535836-13590-2-git-send-email-haishan.bai@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06sched: Fix the sched group node allocation for SD_OVERLAP domainsSuresh Siddha
For the SD_OVERLAP domain, sched_groups for each CPU's sched_domain are privately allocated and not shared with any other cpu. So the sched group allocation should come from the cpu's node for which SD_OVERLAP sched domain is being setup. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111118230554.164910950@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06sched: Set skip_clock_update in yield_task_fair()Mike Galbraith
This is another case where we are on our way to schedule(), so can save a useless clock update and resulting microscopic vruntime update. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1321971686.6855.18.camel@marge.simson.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06sched: Use rt.nr_cpus_allowed to recover select_task_rq() cyclesMike Galbraith
rt.nr_cpus_allowed is always available; use it to bail from select_task_rq() when only one cpu can be used, saving some cycles for pinned tasks. See the line marked with '*' below:

# taskset -c 3 pipe-test

   PerfTop:     997 irqs/sec  kernel:89.5%  exact:  0.0% [1000Hz cycles],  (all, CPU: 3)
------------------------------------------------------------------------------------------------

                 Virgin                                     Patched
  samples  pcnt function                       samples  pcnt function
  _______ _____ ___________________________    _______ _____ ___________________________

  2880.00 10.2% __schedule                     3136.00 11.3% __schedule
  1634.00  5.8% pipe_read                      1615.00  5.8% pipe_read
  1458.00  5.2% system_call                    1534.00  5.5% system_call
  1382.00  4.9% _raw_spin_lock_irqsave         1412.00  5.1% _raw_spin_lock_irqsave
  1202.00  4.3% pipe_write                     1255.00  4.5% copy_user_generic_string
  1164.00  4.1% copy_user_generic_string       1241.00  4.5% __switch_to
  1097.00  3.9% __switch_to                     929.00  3.3% mutex_lock
   872.00  3.1% mutex_lock                      846.00  3.0% mutex_unlock
   687.00  2.4% mutex_unlock                    804.00  2.9% pipe_write
   682.00  2.4% native_sched_clock              713.00  2.6% native_sched_clock
   643.00  2.3% system_call_after_swapgs        653.00  2.3% _raw_spin_unlock_irqrestore
   617.00  2.2% sched_clock_local               633.00  2.3% fsnotify
   612.00  2.2% fsnotify                        605.00  2.2% sched_clock_local
   596.00  2.1% _raw_spin_unlock_irqrestore     593.00  2.1% system_call_after_swapgs
   542.00  1.9% sysret_check                    559.00  2.0% sysret_check
   467.00  1.7% fget_light                      472.00  1.7% fget_light
   462.00  1.6% finish_task_switch              461.00  1.7% finish_task_switch
   437.00  1.5% vfs_write                       442.00  1.6% vfs_write
   431.00  1.5% do_sync_write                   428.00  1.5% do_sync_write
*  413.00  1.5% select_task_rq_fair             404.00  1.5% _raw_spin_lock_irq
   386.00  1.4% update_curr                     402.00  1.4% update_curr
   385.00  1.4% rw_verify_area                  389.00  1.4% do_sync_read
   377.00  1.3% _raw_spin_lock_irq              378.00  1.4% vfs_read
   369.00  1.3% do_sync_read                    340.00  1.2% pipe_iov_copy_from_user
   360.00  1.3% vfs_read                        316.00  1.1% __wake_up_sync_key
   342.00  1.2% hrtick_start_fair               313.00  1.1% __wake_up_common

Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1321971504.6855.15.camel@marge.simson.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
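The bail-out itself is tiny; a hypothetical sketch of the early return for pinned tasks:

  /* Sketch: if the task may run on exactly one cpu, there is nothing
   * to select, so skip the whole domain walk marked '*' above. */
  static int select_task_rq(int prev_cpu, int nr_cpus_allowed)
  {
          if (nr_cpus_allowed == 1)
                  return prev_cpu;
          /* ... normal wake-balancing path ... */
          return prev_cpu;
  }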
2011-12-06sched: Clean up domain traversal in select_idle_sibling()Suresh Siddha
Instead of going through the scheduler domain hierarchy multiple times (to give priority to an idle core over an idle SMT sibling in a busy core), start with the highest scheduler domain with the SD_SHARE_PKG_RESOURCES flag and traverse the domain hierarchy down until we find an idle group. This cleanup also addresses an issue reported by Mike where the recent changes returned the busy thread even in the presence of an idle SMT sibling on single-socket platforms. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1321556904.15339.25.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
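A sketch of the single top-down traversal, with hypothetical types standing in for struct sched_domain/sched_group:

  #include <stdbool.h>
  #include <stddef.h>

  struct domain {
          struct domain *child;      /* next lower level, NULL at bottom */
          int            ngroups;
          bool         (*group_is_idle)(const struct domain *, int);
          int          (*first_cpu)(const struct domain *, int);
  };

  /* Start at the highest SD_SHARE_PKG_RESOURCES domain and walk down,
   * so a fully idle core is preferred over an idle SMT sibling. */
  static int select_idle_sibling(struct domain *top, int target)
  {
          for (struct domain *sd = top; sd; sd = sd->child)
                  for (int g = 0; g < sd->ngroups; g++)
                          if (sd->group_is_idle(sd, g))
                                  return sd->first_cpu(sd, g);
          return target;   /* nothing idle found: stay put */
  }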
2011-12-06events, sched: Add tracepoint for accounting blocked timeAndrew Vagin
This tracepoint shows how long a task is sleeping in uninterruptible state. E.g. it may show how long and where a mutex is waited for. Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, core: Rate limit perf_sched_events jump_label patchingGleb Natapov
jump_label patching is a very expensive operation that involves pausing all cpus. The patching of the perf_sched_events jump_label is easily controllable from userspace by an unprivileged user. When the user runs a loop like this: "while true; do perf stat -e cycles true; done" ... the performance of my test application, which just increments a counter for one second, drops by 4%. This is on a 16 cpu box with my test application using only one of them. The impact on a real server doing real work will be worse. Performance of the KVM PMU drops nearly 50% due to jump_label for "perf record", since the KVM PMU implementation creates and destroys perf events frequently. This patch introduces a way to rate limit jump_label patching and uses it to fix the above problem. I believe that as jump_label use spreads, the problem will become more common, and thus solving it in generic code is appropriate. Also, fixing it in the perf code would mean moving the jump_label accounting logic into perf code, with all the ifdefs for the JUMP_LABEL=n case. With this patch all details are nicely hidden inside the jump_label code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111127155909.GO2557@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
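A userspace model of the rate limiting idea: the expensive patch-out is deferred for a window, so rapid create/destroy cycles collapse into at most one patching operation. All names here are illustrative, not the API the patch adds:

  #include <time.h>

  struct deferred_key {
          int    refcount;
          time_t patch_due;    /* 0 = no patch-out pending */
          int    window;       /* rate-limit window, seconds */
  };

  static void key_inc(struct deferred_key *k)
  {
          if (k->refcount++ == 0)
                  k->patch_due = 0;   /* cancel any pending patch-out; if
                                       * already patched out, patch in now
                                       * (required for correctness) */
  }

  static void key_dec(struct deferred_key *k)
  {
          if (--k->refcount == 0)
                  k->patch_due = time(NULL) + k->window;  /* patch out
                                                           * later, batched */
  }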
2011-12-06perf: Fix enable_on_exec for sibling eventsPeter Zijlstra
Deng-Cheng Zhu reported that sibling events that were created disabled with enable_on_exec would never get enabled. Iterate all events instead of the group lists. Reported-by: Deng-Cheng Zhu <dczhu@mips.com> Tested-by: Deng-Cheng Zhu <dczhu@mips.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322048382.14799.41.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf: Remove superfluous argumentsPeter Zijlstra
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-yv4o74vh90suyghccgykbnry@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Prefer fixed-purpose counters when schedulingPeter Zijlstra
This avoids a scheduling failure for cases like: cycles, cycles, instructions, instructions (on Core2) Which would end up being programmed like: PMC0, PMC1, FP-instructions, fail Because all events will have the same weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Fix event scheduler for constraints with overlapping countersRobert Richter
The current x86 event scheduler fails to resolve scheduling problems of certain combinations of events and constraints. This happens if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight, e.g. constraints of the AMD family 15h pmu:

                        counter mask    weight
  amd_f15_PMC30         0x09            2      <--- overlapping counters
  amd_f15_PMC20         0x07            3
  amd_f15_PMC53         0x38            3

The scheduler then fails to find an existing solution. Here is an example:

  event code      counter         failure         possible solution
  0x02E           PMC[3,0]        0               3
  0x043           PMC[2:0]        1               0
  0x045           PMC[2:0]        2               1
  0x046           PMC[2:0]        FAIL            2

The event scheduler may not select the correct counter in the first cycle because it needs to know which subsequent events will be scheduled. It may then fail to schedule the events. To solve this, we now save the scheduler state of events with overlapping counter constraints. If we fail to schedule the events, we roll back to those states and try to use another free counter. Constraints with overlapping counters are marked with a newly introduced overlap flag. We set the overlap flag for such constraints to give the scheduler a hint which events to select for counter rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this. Care must be taken, as the rescheduling algorithm is O(n!), which will increase scheduling cycles dramatically on an over-committed system. The number of such EVENT_CONSTRAINT_OVERLAP() macros and their counter masks must be kept at a minimum. Thus, the current stack is limited to 2 states to limit the number of loops the algorithm takes in the worst case. On systems with no overlapping-counter constraints, this implementation does not increase the loop count compared to the previous algorithm.

V2:
* Renamed redo -> overlap.
* Reimplementation using perf scheduling helper functions.

V3:
* Added WARN_ON_ONCE() if out of save states.
* Changed function interface of perf_sched_restore_state() to use bool as return value.

Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
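A self-contained sketch of the rollback idea: recursive backtracking over counter masks. The real implementation uses an explicit two-entry state stack instead of unbounded recursion, and only events whose constraints carry the overlap flag are rescheduled; names here are illustrative:

  #include <stdbool.h>

  /* Assign each event a counter from its mask; on failure, roll back
   * and retry earlier events on other free counters. */
  static bool sched_events(const unsigned int *mask, int nevents,
                           int idx, unsigned int used, int *assign)
  {
          if (idx == nevents)
                  return true;
          for (int c = 0; c < 32; c++) {
                  unsigned int bit = 1u << c;

                  if (!(mask[idx] & bit) || (used & bit))
                          continue;
                  assign[idx] = c;
                  if (sched_events(mask, nevents, idx + 1, used | bit, assign))
                          return true;   /* else: roll back, try next counter */
          }
          return false;
  }

With the masks from the table above (0x09 for the first event, 0x07 for the next three), the greedy pick of PMC0 for the first event dead-ends, and backtracking retries it on PMC3, yielding the 3/0/1/2 solution shown.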
2011-12-06perf, x86: Implement event scheduler helper functionsRobert Richter
This patch introduces x86 perf scheduler code helper functions. We need this to later add more complex functionality to support overlapping counter constraints (next patch). The algorithm is modified so that the range of weight values is now generated from the constraints. There shouldn't be other functional changes. The scheduler is controlled through these helper functions: there are functions to initialize, traverse the event list, find unused counters, etc. The scheduler keeps its own state.

V3:
* Added macro for_each_set_bit_cont().
* Changed function interfaces of perf_sched_find_counter() and perf_sched_next_event() to use bool as return value.
* Added some comments to make the code easier to understand.

V4:
* Fix broken event assignment if the weight of the first event is not wmin (perf_sched_init()).

Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf: Avoid a useless pmu_disable() in the perf-tickPeter Zijlstra
Gleb writes:

 > Currently pmu is disabled and re-enabled on each timer interrupt even
 > when no rotation or frequency adjustment is needed. On Intel CPU this
 > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal
 > it does not cause significant slowdown, but when running perf in a virtual
 > machine it leads to 20% slowdown on my machine.

Cure this by keeping a perf_event_context::nr_freq counter that counts the number of active events that require frequency adjustments, and use this in a similar fashion to the already existing nr_events != nr_active test in perf_rotate_context(). By being able to exclude both rotation and frequency adjustments a priori for the common case, we can avoid the otherwise superfluous PMU disable. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
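A condensed sketch of the a priori test; nr_freq counts active events needing frequency adjustment as described above, everything else is illustrative:

  #include <stdbool.h>

  struct ctx {
          int nr_events;
          int nr_active;
          int nr_freq;    /* active events needing freq adjustment */
  };

  static void perf_tick(struct ctx *ctx)
  {
          bool rotate = ctx->nr_events != ctx->nr_active;

          if (!rotate && ctx->nr_freq == 0)
                  return;   /* common case: no PERF_GLOBAL_CTRL writes */
          /* pmu_disable(); adjust frequencies / rotate; pmu_enable(); */
  }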
2011-12-06x86: Clean up and extend do_int3()Srikar Dronamraju
Since there is a possibility of !KPROBES int3 listeners (such as kgdb) and since DIE_TRAP is currently not being used by anybody, notify all listeners with DIE_INT3. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20111025142159.GB21225@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86: Call do_notify_resume() with interrupts enabledSrikar Dronamraju
do_notify_resume() gets called with interrupts disabled on x86_32. This is different from the x86_64 behavior, where interrupts are enabled at the time. Queries on lkml about this issue haven't yielded any clear answer. Let's make x86_32 behave the same as x86_64, unless there is a real reason to maintain the status quo. Please refer to https://lkml.org/lkml/2011/9/27/130 for more details. A similar change was suggested for ARM: https://lkml.org/lkml/2011/8/25/231 My 32-bit machine works fine (tm) with this patch. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20111025141812.GA21225@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06lockdep: Print lock name in lockdep_init_error()Ming Lei
This patch prints the name of the lock which is acquired before lockdep_init() is called, so that users can easily find which lock triggered the lockdep init error warning. This patch also removes the lockdep_init_error() message of "Arch code didn't call lockdep_init() early enough?" since lockdep_init() is called in arch independent code now. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1321508072-23853-2-git-send-email-tom.leiming@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06init/main.c: Execute lockdep_init() as early as possibleMing Lei
This patch fixes a lockdep warning on ARM platforms:

 [ 0.000000] WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?
 [ 0.000000] Call stack leading to lockdep invocation was:
 [ 0.000000] [<c00164bc>] save_stack_trace_tsk+0x0/0x90
 [ 0.000000] [<ffffffff>] 0xffffffff

The warning is caused by a printk inside smp_setup_processor_id(). It is safe to call lockdep_init() earlier, because it doesn't depend on smp_setup_processor_id(); doing so lets printk be called as early as possible without a lockdep complaint. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321508072-23853-1-git-send-email-tom.leiming@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()Yong Zhang
Since commit f59de89 ("lockdep: Clear whole lockdep_map on initialization"), lockdep_init_map() clears the whole struct. But this breaks lock_set_class()/lock_set_subclass(). A typical race condition looks like:

  CPU A                                CPU B
  lock_set_subclass(lockA);
                                       lock_set_class(lockA);
    lockdep_init_map(lockA);
      /* lockA->name is cleared */
      memset(lockA);
                                         __lock_acquire(lockA);
                                           /* lockA->class_cache[] is cleared */
                                           register_lock_class(lockA);
                                             look_up_lock_class(lockA);
                                               WARN_ON_ONCE(class->name != lock->name);
  lock->name = name;

So restore what we had before commit f59de89, but annotate ->lock with kmemcheck_mark_initialized() to suppress the kmemcheck warning reported in commit f59de89. Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Borislav Petkov <bp@alien8.de> Suggested-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111109080451.GB8124@zhy Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06lockdep, rtmutex, bug: Show taint flags on errorBen Hutchings
Show the taint flags in all lockdep and rtmutex-debug error messages. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1319773015.6759.30.camel@deadeye Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdepPeter Zijlstra
It's unlikely that TAINT_FIRMWARE_WORKAROUND causes false lockdep messages, so do not disable lockdep in that case. We still want to keep lockdep disabled in the TAINT_OOT_MODULE case: - bin-only modules can cause various instabilities in their and in unrelated kernel code - they are impossible to debug for kernel developers - they also typically do not have the copyright license permission to link to the GPL-ed lockdep code. Suggested-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-xopopjjens57r0i13qnyh2yo@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06Merge commit 'v3.2-rc4' into core/lockingIngo Molnar
Merge reason: Pick up post-rc1 fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06MAINTAINERS: Update tip.git related git treesPeter Zijlstra
Update the six major subsystem trees hosted in the tip tree to the new location (or add the location if it was missing). Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-w0z98as3kwy9bo1o3k2mmuvi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code supportSteffen Persvold
I used "ifdef CONFIG_NUMA" simply because it doesn't make sense in a non-numa configuration even with SMP enabled. Besides, the only place where it is called right now is in kernel/cpu/amd.c:srat_detect_node() within the "CONFIG_NUMA" protected part. Signed-off-by: Steffen Persvold <sp@numascale.com> Cc: Daniel J Blueman <daniel@numascale-asia.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06iscsi-target: Fix hex2bin warn_unused compile messageNicholas Bellinger
Fix the following compile warning with hex2bin() usage:

 drivers/target/iscsi/iscsi_target_auth.c: In function ‘chap_string_to_hex’:
 drivers/target/iscsi/iscsi_target_auth.c:35: warning: ignoring return value of ‘hex2bin’, declared with attribute warn_unused_result

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: Don't return an error if disabling unsupported featuresAndy Grover
If an attribute is present (but not yet supported) it should be OK to write 0 (a no-op) to the attribute. This is an issue because userspace should be able to save and restore all set attribute values without error. Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target/rd: fix or rewrite the copy routineSebastian Andrzej Siewior
The code assumed that the sg list is a flat array, while in reality loopback SGL memory passed via scsi_cmnd into target-core may already be chained. This patch converts the ramdisk code to use the sg_miter logic from scatterlist.h in order to properly support passthrough SGL usage with transport_generic_map_mem_to_cmd() via loopback. With this patch the bug goes away. However, after umount/mount of the device my files were gone, so something was still not right. After looking at it for a while I decided to rewrite that part of the code, and now things do work for me. For reference:

 - http://article.gmane.org/gmane.linux.scsi.target.devel/595 (the sg_next() conversion)
 - http://article.gmane.org/gmane.linux.scsi.target.devel/602 (the rewrite of the copy code)

(nab: Fix compile warning in rd_MEMCPY) Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
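A kernel-style sketch of an sg_miter copy loop of the kind this conversion uses; the helper itself is illustrative, but sg_miter_start()/sg_miter_next()/sg_miter_stop() are the real scatterlist.h API and handle chained SGLs transparently:

  #include <linux/kernel.h>
  #include <linux/string.h>
  #include <linux/scatterlist.h>

  static void sgl_copy_out(struct scatterlist *sgl, unsigned int nents,
                           void *dst, size_t len)
  {
          struct sg_mapping_iter miter;
          size_t copied = 0;

          sg_miter_start(&miter, sgl, nents, SG_MITER_FROM_SG);
          while (copied < len && sg_miter_next(&miter)) {
                  size_t n = min(miter.length, len - copied);

                  memcpy(dst + copied, miter.addr, n);  /* miter maps each
                                                         * segment for us */
                  copied += n;
          }
          sg_miter_stop(&miter);
  }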
2011-12-06target/rd: simplify the page/offset computationSebastian Andrzej Siewior
Breakout rd_MEMCPY_do_task() usage of do_div() to tmp value during rd_request->rd_page assignment. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: remove the unused se_dev_listChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target/file: walk properly over sg listSebastian Andrzej Siewior
This patch changes fileio to use for_each_sg() when walking the se_task->task_sg memory passed in from the loopback LLD's struct scsi_cmnd scatterlist memory. This addresses an issue where FILEIO backends with loopback were hitting the following OOPs with mkfs.ext2:

 |kernel BUG at include/linux/scatterlist.h:97!
 |invalid opcode: 0000 [#1] PREEMPT SMP
 |Modules linked in: sd_mod tcm_loop target_core_stgt scsi_tgt target_core_pscsi target_core_file target_core_iblock target_core_mod configfs scsi_mod
 |
 |Pid: 671, comm: LIO_fileio Not tainted 3.1.0-rc10+ #139 Bochs Bochs
 |EIP: 0060:[<e0afd746>] EFLAGS: 00010202 CPU: 0
 |EIP is at fd_do_task+0x396/0x420 [target_core_file]
 | [<e0aa7884>] __transport_execute_tasks+0xd4/0x190 [target_core_mod]
 | [<e0aa797c>] transport_execute_tasks+0x3c/0xf0 [target_core_mod]
 |EIP: [<e0afd746>] fd_do_task+0x396/0x420 [target_core_file] SS:ESP 0068:dea47e90

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
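For reference, the difference between indexed access and for_each_sg(); the function below is an illustrative sketch, but for_each_sg() is the real helper and follows chain entries via sg_next():

  #include <linux/scatterlist.h>

  static u32 task_sg_bytes(struct scatterlist *sgl, unsigned int nents)
  {
          struct scatterlist *sg;
          unsigned int i;
          u32 total = 0;

          for_each_sg(sgl, sg, nents, i)   /* chain-safe iteration */
                  total += sg->length;
          /* plain sgl[i] indexing walks straight into a chain entry,
           * which is what trips the BUG in scatterlist.h above */
          return total;
  }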
2011-12-06target: remove unused struct fieldsJörn Engel
Some are never used, some are set but never read, dev_hoq_count is incremented and decremented, but never read. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: Fix page length in emulated INQUIRY VPD page 86hRoland Dreier
The LSB of the page length is at offset 3, not 2. Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
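For context, a sketch of the VPD page header layout involved; SPC defines a two-byte, big-endian PAGE LENGTH at bytes 2-3, so the LSB of the length belongs in byte 3 (the helper name is hypothetical):

  /* buf[0] = peripheral qualifier/device type
   * buf[1] = page code (0x86)
   * buf[2] = PAGE LENGTH (MSB)
   * buf[3] = PAGE LENGTH (LSB)   <-- where the length byte must go */
  static void set_vpd_page_length(unsigned char *buf, unsigned short len)
  {
          buf[2] = (len >> 8) & 0xff;
          buf[3] = len & 0xff;
  }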
2011-12-06target: Handle 0 correctly in transport_get_sectors_6()Roland Dreier
SBC-3 says: A TRANSFER LENGTH field set to zero specifies that 256 logical blocks shall be written. Any other value specifies the number of logical blocks that shall be written. The old code was always just returning the value in the TRANSFER LENGTH byte. Fix this to return 256 if the byte is 0. Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
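The fix reduces to one expression; a sketch with a simplified signature (the real function takes more context), using the TRANSFER LENGTH byte position from the READ(6)/WRITE(6) CDB format:

  /* 6-byte CDB: TRANSFER LENGTH is byte 4; zero means 256 blocks. */
  static unsigned int get_sectors_6(const unsigned char *cdb)
  {
          return cdb[4] ? cdb[4] : 256;
  }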
2011-12-06target: Don't return an error status for 0-length READ and WRITERoland Dreier
IO commands with a TRANSFER LENGTH of 0 are not an error; for example, for READ (10) and WRITE (10), SBC-3 says: A TRANSFER LENGTH field set to zero specifies that no logical blocks shall be read. This condition shall not be considered an error. In case we have nothing to do, just complete the command with good status. Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06iscsi-target: Use kmemdup rather than duplicating its implementationThomas Meyer
The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
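The pattern that scripts/coccinelle/api/memdup.cocci collapses, sketched:

  #include <linux/slab.h>
  #include <linux/string.h>

  /* Open-coded duplication ... */
  static void *dup_open_coded(const void *src, size_t len)
  {
          void *dst = kmalloc(len, GFP_KERNEL);

          if (dst)
                  memcpy(dst, src, len);
          return dst;
  }

  /* ... becomes a single call with identical semantics: */
  static void *dup_kmemdup(const void *src, size_t len)
  {
          return kmemdup(src, len, GFP_KERNEL);
  }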
2011-12-06iscsi-target: Add missing F_BIT for iscsi_tm_rspNicholas Bellinger
This patch sets the missing ISCSI_FLAG_CMD_FINAL bit in iscsit_send_task_mgt_rsp() for a struct iscsi_tm_rsp PDU. This usage is hardcoded for all TM response PDUs in RFC-3720 section 10.6. Reported-by: whucecil <whucecil1999@gmail.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06iscsi-target: Fix residual count handling + remove iscsi_cmd->residual_countNicholas Bellinger
This patch fixes iscsi-target handling of underflow where residual data is causing an OOPs by using the incorrect iscsi_cmd_t->data_length initially assigned in iscsit_allocate_se_cmd(). It resets iscsi_cmd_t->data_length from se_cmd_t->data_length after transport_generic_allocate_tasks() has been invoked in iscsit_handle_scsi_cmd() RX context, and converts iscsi_cmd->residual_count usage to access iscsi_cmd->se_cmd.residual_count to get the proper residual count set by target-core. Reported-by: <lists@internyc.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: Reject SCSI data overflow for fabrics using transport_generic_map_mem_to_cmdNicholas Bellinger
This patch changes transport_generic_map_mem_to_cmd() to reject SCSI data overflow and to send exception status with CHECK_CONDITION + TCM_INVALID_CDB_FIELD for fabrics that are passing a pre-populated struct scatterlist (eg: tcm_loop and iscsi-target) being mapped into se_cmd->t_data_sg and se_cmd->t_data_nents. This addresses an OOPs where transport_allocate_data_tasks() would walk the incorrect post-OVERFLOW cmd->data_length value beyond the end of the passed scatterlist. Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: remove the unused t_task_pt_sgl and t_task_pt_sgl_num se_cmd fieldsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: remove the t_tasks_bidi se_cmd fieldChristoph Hellwig
And use a SCF_BIDI flag instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: remove the t_tasks_fua se_cmd fieldChristoph Hellwig
And use a SCF_FUA flag instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: remove the se_ordered_node se_cmd fieldChristoph Hellwig
We never walk ordered_cmd_list in the se_device, so remove all code related to supporting it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: remove the se_obj_ptr and se_orig_obj_ptr se_cmd fieldsChristoph Hellwig
We already have a perfectly valid se_device pointer in the command, so remove the mostly useless duplicates. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: Drop config_item_name usage in fabric TFO->free_wwn()Nicholas Bellinger
This patch removes informational config_item_name() usage in TFO->free_wwn() treewide in loopback, tcm_fc, ib_srpt and tcm_vhost module code. Using v4 target_core_fabric_configfs.c logic, a fabric call to config_item_name() in TFO->drop_wwn() context returns NULL, as target_fabric_drop_wwn() invoking config_item_put() -> config_group_put() will release fabric_port->port_wwn.wwn_group before the last config_item_put() -> TFO->drop_wwn() is invoked. Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: Get rid of unused se_cmd_cacheRoland Dreier
Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06target: Avoid compiler warnings about signed one-bit bitfieldsBart Van Assche
Convert to unsigned bit fields for active I/O shutdown fields. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
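The warning in question comes from assigning 1 to a signed one-bit field, which can only represent 0 and -1; an illustrative example (field names hypothetical):

  struct shutdown_flags {
          int          bad_active:1;    /* signed: holds 0 or -1, so
                                         * '= 1' triggers the warning */
          unsigned int good_active:1;   /* unsigned: holds 0 or 1 */
  };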
2011-12-06target: Improve system responsivity during I/OBart Van Assche
While testing ib_srpt I noticed that the target system became rather unresponsive during intensive I/O. The patch below made my target system responsive again during I/O without decreasing performance. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-06iscsi-target: Fix sess allocation leak in iscsi_login_zero_tsih_s1Nicholas Bellinger
This patch adds a missing kfree() for an allocation in the iscsi_login_zero_tsih_s1() code, and makes the transport_init_session() return value be checked with IS_ERR(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>