Created by: Xirider
Patch Description
Added a new flag max_loss_to_skip_batch that, if set to a maximum acceptable loss value, aborts the iteration before the optimizer step whenever the batch loss exceeds that value.
The loss value used for the comparison is the same one used in the logs; it may or may not differ from the one reported in TensorBoard.
The logic is similar to our skip_gradient_update_on_clip_norm flag, which also skips batches whenever the gradient norm is above the clip value, and to how we handle overflows.
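A minimal sketch of the intended behavior, assuming a standard PyTorch training loop; the toy model, optimizer, data, and the local max_loss_to_skip_batch variable are illustrative placeholders, not the actual implementation:

```python
import torch

# Hypothetical stand-in for the new flag's value (None means disabled).
max_loss_to_skip_batch = 10.0

# Toy model and optimizer, just to make the loop runnable.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step, (x, y) in enumerate([(torch.randn(8, 4), torch.randn(8, 1))] * 3):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()

    # Skip the optimizer step if the (logged) loss exceeds the threshold,
    # mirroring how skip_gradient_update_on_clip_norm and overflow handling
    # abort an iteration before the weights are updated.
    if max_loss_to_skip_batch is not None and loss.item() > max_loss_to_skip_batch:
        print(f"step {step}: loss {loss.item():.3f} above threshold, skipping batch")
        continue

    optimizer.step()
```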
Testing steps
Tested this with our small sweep script. I think our disks are full, so I couldn't test this with a longer run. For testing I increased the loss and checked that batches are skipped correctly.