Created by: johndydo
Sending multiple messages in rapid succession (or concurrently) causes deadlock.
While building an Acceptor implementation, I experienced the symptoms of a deadlock when the server happened to receive two requests in quick succession. The messages would be processed on the same goroutine as the FromApp callback and a response sent back using SendToTarget. The session thread got stuck in the second call to SendToTarget.
To debug the issue, I created an Initiator (which runs in a separate process on the same machine) that generates many requests:
for {
time.Sleep(time.Second * 10)
for i := 0; i < 10; i++ {
go func() {
log.Println("before send")
must(quickfix.SendToTarget(NewRequestMessage(), sessionId))
log.Println("after send")
}()
}
}
Surprisingly the Initiator would also deadlock quickly and stop printing "after send".
The deadlock occurs between the session thread and the EventTimer thread (for the stateTimer heartbeat timer).
Here is a diagram of the two threads:
Session EventTimer stateTimer
------ ----------------------
t=0 select select
t=1 case Timer fires
EventTimer.f()
blocking send to unbuffered channel sessionEvent
--- waiting for Session to synchronously receive from sessionEvent ---
--- never gets to select case which would read reset channel ---
t=2 case messageEvent
SendAppMessages() with at least 2 messages
t=3 send msg #1 to socket
t=4 stateTimer.Reset(s.HeartBtInt)
buffered send to Timer's reset channel (fills the buffer, but does not block)
t=5 send msg #2 to socket
t=6 stateTimer.Reset(s.HeartBtInt)
buffered send to Timer's reset channel (blocks since the channel buffer is full)
--- send is waiting for Timer's reset channel to not be full ---
--- never gets to the select case which reads sessionEvent ---
In summary, the session thread's select happens to enter the case for messageEvent, but there is a race with the timer thread which enters the case for executing the timer action.
The timer's action is to send to an unbuffered channel that can only get read when the session thread is in a different select case than it is currently in.
If there happens to be more than 1 message to send, then the session thread will call EventTimer.Reset more than once, which will guarantee that the reset channel will become full and now these threads are deadlocked.
The deadlock won't happen if EventTimer.Reset simply resets the timer in a nonblocking fashion.
Past commit https://github.com/quickfixgo/quickfix/commit/b75fbeaf62d9e9b242c0456681e6f5b0c887380b#diff-8dc0bfb75b5413b6f4ed2d880e648ad9 moved from using time.After to the current implementation.
This PR changes the EventTimer to instead use time.Timer directly, without the need for a reset channel, as any thread can call Reset on it to a future time, without the possibility of getting blocked.