Created by: bhaan
I came across this output while debugging race conditions in my own project using the -race
flag:
==================
WARNING: DATA RACE
Read by main goroutine:
runtime.mapiterinit()
/usr/local/go/src/runtime/hashmap.go:607 +0x0
[my_project]/vendor/github.com/quickfixgo/quickfix.(*Acceptor).Stop()
[my_project]/vendor/github.com/quickfixgo/quickfix/acceptor.go:58 +0x9d
main.main()
[my_project]/main.go:248 +0xf81
Previous write by goroutine 17:
runtime.mapdelete()
/usr/local/go/src/runtime/hashmap.go:545 +0x0
[my_project]/vendor/github.com/quickfixgo/quickfix.(*Acceptor).Start.func1()
[my_project]/vendor/github.com/quickfixgo/quickfix/acceptor.go:45 +0x348
Goroutine 17 (running) created at:
[my_project]/vendor/github.com/quickfixgo/quickfix.(*Acceptor).Start()
[my_project]/vendor/github.com/quickfixgo/quickfix/acceptor.go:48 +0x49e
main.main()
[my_project]/main.go:238 +0xdd9
==================
Problem
The data race appears to be around operations on a map[int]chan bool
in the Acceptor, where the chan
is used to send a "quit" signal to goroutines that manage individual sessions.
The concurrent operations on the map occur while closing the channels (iterating over the map) during a call to acceptor.Stop()
:
for _, channel := range a.quitChans {
close(channel)
}
Then the close
of the channel will trigger a delete
operation on the map from a different goroutine:
case j := <-a.doneChannel:
delete(a.quitChans, j)
}
Solution
After looking at this for a bit, I concluded that creating a dedicated chan bool
(stop signal) for each goroutine managing a session was unnecessary. Instead, only one channel needs to be created and passed to the other goroutines because the channel is only used as a signal, not for passing real information. Then a single call to close
will release all goroutines blocking on that channel.
This approach reduces the overall complexity of managing the session specific goroutines by removing both the map of quitChans
, and a corresponding doneChannel
for each goroutine.
Let me know if there's another preferred approach, or if I've overlooked anything.