Created by: KUNAL1612
Patch Description: Added a small GPU test to ensure training works as expected. A small dataset (~18 MB) is downloaded from a public S3 bucket, training runs on it for 5 epochs, and the downloaded files are then cleaned up.
Testing steps: Launched a Slurm job with 8 GPUs to run the test. Compared the returned loss value with the expected loss value after 5 updates/5 epochs. Implemented PR #44 and some other changes to deliberately break training, and confirmed that the assertions in the test failed. Also ran the test in a non-GPU environment to confirm that it fails there too.
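For reviewers, the shape of the check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `check_training_run`, `train_fn`, `gpu_count`, and the `EXPECTED_LOSS` value are all hypothetical names and numbers chosen for the example.

```python
import math
import shutil
import tempfile
from pathlib import Path

# Hypothetical reference value; the real expected loss is not stated in this PR.
EXPECTED_LOSS = 0.5


def check_training_run(train_fn, gpu_count, expected_loss=EXPECTED_LOSS, rel_tol=1e-3):
    """Run train_fn for 5 epochs in a scratch directory and check its loss.

    train_fn is a hypothetical callable (workdir, epochs) -> final loss.
    """
    # Fail fast when no GPU is present, so the test fails in non-GPU environments.
    if gpu_count < 1:
        raise AssertionError("this test requires at least one GPU")

    # Scratch directory standing in for where the dataset would be downloaded.
    workdir = Path(tempfile.mkdtemp(prefix="gpu_test_"))
    try:
        loss = train_fn(workdir, epochs=5)
        if not math.isclose(loss, expected_loss, rel_tol=rel_tol):
            raise AssertionError(f"loss {loss} differs from expected {expected_loss}")
        return loss
    finally:
        # Clean up the files whether the assertions pass or fail.
        shutil.rmtree(workdir, ignore_errors=True)
```

In an actual test, `train_fn` would download the dataset into `workdir` and launch training, and `gpu_count` would come from the training framework's device query. Doing the cleanup in a `finally` block means the downloaded files are removed even when the loss assertion fails.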