completed and failed jobs should be removed from the stalled set
Created by: tomgrossman
Description
When a job moves to failed or completed status, it should be removed from the stalled set if it's in there.
If a process took longer than expected, it makes sense that the job will be considered stalled, because it was unlocked. But if the worker didn't crash and just got stuck in one of the sub-processes, the job will eventually be completed or failed, which means it was never really stalled.
To avoid re-running the same job, you can check whether the job is in the stalled set and remove it from there.
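The idea can be sketched with a small in-memory model. This is not Bull's implementation (Bull keeps these structures in Redis); the class `StalledSetModel` and its methods are hypothetical names used only to illustrate the proposed behaviour.

```javascript
// Hypothetical in-memory model of the proposal; not Bull's actual code.
class StalledSetModel {
  constructor() {
    this.stalled = new Set(); // IDs flagged as possibly stalled
    this.wait = [];           // IDs queued for (re)processing
    this.jobs = new Map();    // job ID -> job data
  }

  addJob(id, data) {
    this.jobs.set(id, data);
  }

  markStalled(id) {
    this.stalled.add(id);
  }

  // Proposed behaviour: finishing a job (completed or failed)
  // also removes it from the stalled set.
  finish(id, { removeData = false } = {}) {
    this.stalled.delete(id);
    if (removeData) this.jobs.delete(id);
  }

  // Stalled check: move whatever is still flagged back to wait.
  requeueStalled() {
    for (const id of this.stalled) this.wait.push(id);
    this.stalled.clear();
  }
}

const q = new StalledSetModel();
q.addJob('job-1', { payload: 42 });
q.markStalled('job-1');                  // the job ran long and was flagged
q.finish('job-1', { removeData: true }); // the worker finished it; data was cleaned
q.requeueStalled();
console.log(q.wait.length);              // nothing is re-queued
```

Because `finish` clears the flag before the next stalled check, the finished job never returns to wait, and no worker is handed a job whose data no longer exists.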
I know this is the current design and I can adjust the stalledInterval and maxStalledCount settings. But if it can be avoided easily in the way I described, why not?
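For reference, the existing workaround looks roughly like this. The values are assumptions chosen for illustration, and the queue name `my-queue` is hypothetical; the option names `stalledInterval` and `maxStalledCount` are the ones mentioned above.

```javascript
// Illustrative values only; Bull's documented defaults are
// stalledInterval: 30000 and maxStalledCount: 1.
const settings = {
  stalledInterval: 60000, // check for stalled jobs every 60 s (0 disables the check)
  maxStalledCount: 1,     // maximum number of times a stalled job is re-processed
};

// With Bull installed and Redis running, the settings would be passed as:
//   const Queue = require('bull');
//   const queue = new Queue('my-queue', { settings });
```

Tuning these values only makes a false stall less likely; it does not remove the case where a slow but healthy job is picked up a second time.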
Worse than that, suppose the job was completed and cleaned. The stalled job will eventually be returned to wait, but the job data is already deleted, so the worker will crash because the job's data is missing.
This can also be avoided by the suggested fix.
Bull version
3.11.0