Created by: stephenroller
Patch Description
These metrics have been clogging our stderr, but they're no longer relevant to us. pnorm, gnorm, etc. all going to zero was an artifact of fp16 having a very poor range for xmins (the smallest representable positive values). While this was important (and harrowing) to track before, the switch to bfloat16 has eliminated this concern for us.
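For reviewers who want to see the range difference concretely, here is a minimal standard-library sketch. It is not code from this patch: the helpers `to_fp16` and `to_bf16` are hypothetical names, and bfloat16 is only emulated by truncating a float32 to its top 16 bits (real hardware typically rounds to nearest).

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision using struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    Assumption: plain truncation, which is enough to illustrate range.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# A small magnitude, chosen only for illustration.
tiny = 1e-8

fp16_tiny = to_fp16(tiny)  # below fp16's smallest subnormal, so it underflows
bf16_tiny = to_bf16(tiny)  # bfloat16 shares float32's 8-bit exponent

print("fp16:", fp16_tiny)
print("bf16:", bf16_tiny)
```

Under these assumptions the fp16 value underflows to zero while the emulated bfloat16 value stays nonzero, which is the kind of collapse the norm metrics were guarding against.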
Testing steps
2022-08-20 12:47:21 | INFO | train_inner | {"epoch": 1, "actv_norm": "937.49", "pos_norm": "0.701", "tok_norm": "1", "emb_norm": "0.008", "docsperex": "3.82", "loss": "10.436", "ppl": "1385.32", "wps": "419672", "ups": "0.4", "wpb": "1.04858e+06", "bsz": "512", "num_updates": "293", "lr": "9.84874e-05", "gnorm": "1.139", "clip": "100", "train_wall": "2", "cuda_gb_allocated": "14.7", "cuda_gb_reserved": "21.9", "cuda_gb_free": "64.5", "wall": "965"}