[Feature] re-architecting the non-polling mechanism
Today we use the Redis command BRPOPLPUSH, which lets us move a job atomically from the wait list to the active list. The advantages are:
- very low latency to start processing a job (with empty queue).
- atomically move a job from one list to another.
- low CPU usage since no polling is required.
However, there are a number of disadvantages which, in my opinion, are too much of a burden:
- cannot perform the move operation and other operations (such as taking the job lock) atomically.
- requires a dedicated extra connection to Redis.
So I have been wondering whether we can have a new mechanism that keeps the advantages without the disadvantages listed above. This is what I propose:
- poll only once after processing a job.
- use pubsub to notify workers that new jobs have been added.
- listen to close/connect events and, on reconnect, check whether there are jobs to process in the wait list.
Basically, when a worker starts, the first thing it does is subscribe to the "add job" event. After that it checks whether there are jobs to process in the wait list; if not, it idles until an "add job" event is received. If the connection to Redis is lost, the worker starts the process again on reconnect by polling the wait list once.
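To make the worker life cycle concrete, here is a minimal sketch that uses an in-memory stand-in instead of Redis. `FakeQueue`, `QueueWorker`, `onJobAdded` and `takeNext` are hypothetical names chosen for illustration, not existing APIs, and the synchronous callbacks only approximate real pub/sub delivery.

```typescript
// Hypothetical in-memory stand-in for Redis; none of this is the library's real API.
type SketchJob = { id: string };

class FakeQueue {
  wait: SketchJob[] = [];
  active: SketchJob[] = [];
  private subscribers: Array<() => void> = [];

  // Stands in for SUBSCRIBE on an "add job" channel.
  onJobAdded(callback: () => void): void {
    this.subscribers.push(callback);
  }

  // Stands in for pushing onto the wait list followed by PUBLISH.
  add(job: SketchJob): void {
    this.wait.unshift(job);
    for (const notify of this.subscribers) notify();
  }

  // Stands in for an atomic wait -> active move; the oldest job goes first.
  takeNext(): SketchJob | undefined {
    const job = this.wait.pop();
    if (job !== undefined) this.active.push(job);
    return job;
  }
}

class QueueWorker {
  processed: string[] = [];
  private queue: FakeQueue;

  constructor(queue: FakeQueue) {
    this.queue = queue;
  }

  start(): void {
    // Subscribe first, so a job added between the two steps is not missed.
    this.queue.onJobAdded(() => this.drain());
    // Then poll once for jobs that were already waiting.
    this.drain();
  }

  // Poll again after every job; return (go idle) once the wait list is empty.
  private drain(): void {
    let job = this.queue.takeNext();
    while (job !== undefined) {
      this.processed.push(job.id); // placeholder for real job processing
      job = this.queue.takeNext();
    }
  }
}
```

Subscribing before the first poll is the design choice that matters here: with the opposite order, a job added between the poll and the subscription would sit in the wait list until the next notification.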
When the worker finds a job in the queue, it can move the job from wait to active and set the lock on the job in a single atomic operation.
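The combined move-and-lock step could look roughly like the following in-memory model. `FakeStore` and `moveAndLock` are hypothetical names; against a real Redis server the body of `moveAndLock` would have to run as one atomic unit (presumably a server-side script), which this sketch only imitates by doing both writes in one synchronous call.

```typescript
// Hypothetical in-memory model of the wait list, active list and job locks.
class FakeStore {
  wait: string[] = [];
  active: string[] = [];
  locks: { [jobId: string]: string } = {}; // job id -> token of the owning worker

  // Move the oldest job id from wait to active and lock it for `token`.
  // Both writes happen in one synchronous step, so no other caller can
  // observe a job that is active but not yet locked.
  moveAndLock(token: string): string | undefined {
    const jobId = this.wait.pop();
    if (jobId === undefined) return undefined; // empty wait list: worker idles
    this.active.unshift(jobId);
    this.locks[jobId] = token;
    return jobId;
  }
}
```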
If there is a guarantee that Redis delivers all published messages as long as there is a valid connection, then this mechanism should work. According to the Redis documentation: "Because Redis Pub/Sub is fire and forget currently there is no way to use this feature if your application demands reliable notification of events, that is, if your Pub/Sub client disconnects, and reconnects later, all the events delivered during the time the client was disconnected are lost." So the question is whether we can "reliably" detect that a client has been disconnected. If so, we can poll on the next reconnect and everything should be fine.
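The failure mode and the proposed recovery can be modelled in a few lines. This is only a sketch under the assumption that the client is always told about a disconnect and the following reconnect; `LossyChannel` and `ReconnectingWorker` are hypothetical names, and the channel simply drops anything published while the subscriber is disconnected.

```typescript
// Hypothetical model of fire-and-forget pub/sub: messages published while the
// subscriber is disconnected are dropped and never replayed.
class LossyChannel {
  connected = true;
  private messageHandler: (() => void) | undefined;
  private reconnectHandler: (() => void) | undefined;

  subscribe(handler: () => void): void {
    this.messageHandler = handler;
  }

  onReconnect(handler: () => void): void {
    this.reconnectHandler = handler;
  }

  publish(): void {
    if (this.connected && this.messageHandler) this.messageHandler();
  }

  disconnect(): void {
    this.connected = false;
  }

  reconnect(): void {
    this.connected = true;
    if (this.reconnectHandler) this.reconnectHandler();
  }
}

class ReconnectingWorker {
  processed: string[] = [];
  private waitList: string[];

  constructor(waitList: string[], channel: LossyChannel) {
    this.waitList = waitList;
    channel.subscribe(() => this.drainWaitList());
    // Poll on every reconnect to catch jobs whose notification was lost.
    channel.onReconnect(() => this.drainWaitList());
    this.drainWaitList(); // initial poll at startup
  }

  private drainWaitList(): void {
    let jobId = this.waitList.pop();
    while (jobId !== undefined) {
      this.processed.push(jobId); // placeholder for real job processing
      jobId = this.waitList.pop();
    }
  }
}
```

In this model a job added during the outage stays in the wait list until `reconnect()` fires, at which point the single poll picks it up. The open question from the paragraph above remains: the scheme only holds if the disconnect is always detected.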