diff options
| author | Alex Estrin <alex.estrin@intel.com> | 2016-02-11 16:30:51 -0500 | 
|---|---|---|
| committer | Doug Ledford <dledford@redhat.com> | 2016-02-12 14:53:22 -0500 | 
| commit | 08bc327629cbd63bb2f66677e4b33b643695097c (patch) | |
| tree | 55f556c8853f61743d31e514957f619be18941d1 /lib/timerqueue.c | |
| parent | ee50aeac60ba5c4c7e072fbc0c9aa2043c8896e6 (diff) | |
IB/ipoib: fix for rare multicast join race condition
A narrow window for race condition still exist between
multicast join thread and *dev_flush workers.
A kernel crash caused by prolong erratic link state changes
was observed (most likely a faulty cabling):
[167275.656270] BUG: unable to handle kernel NULL pointer dereference at
0000000000000020
[167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
[167275.674443] PGD 0
[167275.677373] Oops: 0000 [#1] SMP
...
[167275.977530] Call Trace:
[167275.982225]  [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
[167275.992024]  [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490
[ib_ipoib]
[167276.002149]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[167276.010754]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[167276.019088]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[167276.027737]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
Here was a hit spot:
ipoib_mcast_join() {
..............
      rec.qkey      = priv->broadcast->mcmember.qkey;
                                       ^^^^^^^
.....
 }
Proposed patch should prevent multicast join task to continue
if link state change is detected.
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Changes from v4:
- as suggested by Doug Ledford, optimized spinlock usage,
i.e. ipoib_mcast_join() is called with lock held.
Changes from v3:
- sync with priv->lock before flag check.
Chages from v2:
- Move check for OPER_UP flag state to mcast_join() to
ensure no event worker is in progress.
- minor style fixes.
Changes from v1:
- No need to lock again if error detected.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Diffstat (limited to 'lib/timerqueue.c')
0 files changed, 0 insertions, 0 deletions
