Possible infinite loop if BRPOPLPUSH fails?
Created by: smennesson
Description
Hello, today we ran into a problem that looks like an infinite loop in the Bull library. Our service is hosted on Heroku with the Redis add-on, and today we reached the memory quota of the Redis DB. As a result, our logs were flooded with entries like this:
BRPOPLPUSH { ReplyError: OOM command not allowed when used memory > 'maxmemory'.
at parseError (/app/node_modules/ioredis/node_modules/redis-parser/lib/parser.js:179:12)
at parseType (/app/node_modules/ioredis/node_modules/redis-parser/lib/parser.js:302:14)
command:
{ name: 'brpoplpush',
args:
[ 'bull:<name-of-our-job>:wait',
'bull:<name-of-our-job>:active',
'5' ] } }
The logs grew to about 1 GB within a few minutes, until we fixed the quota issue.
Looking at the code in lib/queue.js, it seems that the error on BRPOPLPUSH is ignored in Queue.prototype.getNextJob. So my guess is that the loop that looks for new jobs to process kept retrying immediately and hitting the same error every time.
I don't know enough about how Bull works internally to propose a fix, but I think this should be handled, perhaps by detecting when several errors occur in a row on BRPOPLPUSH and adding a wait when this happens too frequently.
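To illustrate the kind of handling I mean, here is a minimal sketch of a polling loop that backs off after consecutive errors. This is not Bull's actual code: `backoffDelay`, `pollWithBackoff`, `fetchJob`, and `handleJob` are hypothetical names, `fetchJob` only stands in for the BRPOPLPUSH call, and the delay values are arbitrary assumptions.

```javascript
// Hypothetical helper (not part of Bull): how long to wait after
// `consecutiveErrors` failures in a row. Doubles each time, capped at maxMs.
function backoffDelay(consecutiveErrors, baseMs = 100, maxMs = 5000) {
  if (consecutiveErrors <= 0) return 0;
  return Math.min(maxMs, baseMs * 2 ** (consecutiveErrors - 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical polling loop. `fetchJob` stands in for the blocking
// BRPOPLPUSH call and is assumed to reject when Redis replies with an error.
async function pollWithBackoff(fetchJob, handleJob, options = {}) {
  const { maxIterations = Infinity, baseMs = 100, maxMs = 5000 } = options;
  let consecutiveErrors = 0;

  for (let i = 0; i < maxIterations; i++) {
    try {
      const job = await fetchJob();
      consecutiveErrors = 0; // a successful call resets the backoff
      if (job) await handleJob(job);
    } catch (err) {
      consecutiveErrors += 1;
      const delay = backoffDelay(consecutiveErrors, baseMs, maxMs);
      console.error(
        `fetch failed ${consecutiveErrors} time(s) in a row, retrying in ${delay} ms: ${err.message}`
      );
      await sleep(delay); // wait instead of retrying immediately
    }
  }
}
```

With these assumed defaults, a persistent OOM error would be logged at most about once every 5 seconds once the cap is reached, instead of on every immediate retry.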
Bull version
v3.7.0
(I just saw that 3.8 is out; judging from the changelog, it doesn't look like this is fixed in that version.)