Jobs with delay < -new Date().getTime() never complete and cause 100% CPU usage
Created by: harris-m
Description
If you add a job with a delay value less than the negative unix epoch, it never runs and bull thrashes constantly running updateDelaySet
.
Minimal, Working Test code to reproduce the issue.
(An easy to reproduce test case will dramatically decrease the resolution time.)
var Queue = require("bull");
var testQueue = new Queue("test_queue", "redis://127.0.0.1:6379");
testQueue.process(function (job, done) {
console.log("job complete");
done();
});
// this works fine
// testQueue.add({}, { delay: -new Date().getTime() });
// subtract 1
// this never completes and locks CPU at 100%
testQueue.add({}, { delay: -new Date().getTime() - 1 });
Bull version
3.13.0
Additional information
I have created a pull request which fixes this issue: https://github.com/OptimalBits/bull/pull/1706
Explanation (as best as I could work out):
I noticed that my application using bull had maxed out CPU usage, on a node worker and redis. I used redis-cli monitor
to investigate:
1587536426.277449 [0 172.17.0.1:47268] "evalsha" "8b912cdc5b4c20108ef73d952464fba3a7470d7b" "6" "bull:test_queue:delayed" "bull:test_queue:active" "bull:test_queue:wait" "bull:test_queue:priority" "bull:test_queue:paused" "bull:test_queue:meta-paused" "bull:test_queue:" "1587536426276" "e94d3501-42b5-459d-8f15-854d6900ef18"
1587536426.277481 [0 lua] "ZRANGEBYSCORE" "bull:test_queue:delayed" "0" "6502549202026496" "LIMIT" "0" "1000"
1587536426.277491 [0 lua] "ZRANGE" "bull:test_queue:delayed" "0" "0"
"WITHSCORES"
1587536426.277500 [0 lua] "PUBLISH" "bull:test_queue:delayed" "-3.1539841246358106e+17"
I found these lines repeating infinitely. I noticed the negative numbers and tried clamping my application's job's delays to 0, and the issue went away. Looking into bull's codebase, I found this line which led me to the core issue. delayedTimestamp
is expected to be a positive value (a time in the future). However if it's a large enough negative value (less than negative unix epoch), delay
is set to a value <= 0 and the function repeats, causing this infinite loop.