Currently, loss-based checkpointing uses the training batch loss reported at the end of each epoch.
When training is run with validation, checkpointing should instead use the loss value reported by `validate_batch`. When validation is not used, the current behaviour should be preserved.
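A minimal sketch of the proposed selection logic, assuming hypothetical names (`checkpoint_metric`, `BestLossCheckpointer`, `end_of_epoch`) that stand in for whatever the codebase actually uses:

```python
def checkpoint_metric(train_loss, val_loss=None):
    """Return the loss the checkpointer should compare against.

    If validation ran this epoch (val_loss is not None), prefer the
    validation loss; otherwise fall back to the training loss, which
    matches the current behaviour.
    """
    return val_loss if val_loss is not None else train_loss


class BestLossCheckpointer:
    """Save a checkpoint whenever the chosen loss improves."""

    def __init__(self):
        self.best = float("inf")
        self.saved_epochs = []  # stand-in for actually writing weights

    def end_of_epoch(self, epoch, train_loss, val_loss=None):
        loss = checkpoint_metric(train_loss, val_loss)
        if loss < self.best:
            self.best = loss
            self.saved_epochs.append(epoch)


ckpt = BestLossCheckpointer()
ckpt.end_of_epoch(0, train_loss=1.0, val_loss=0.9)  # val loss improves: save
ckpt.end_of_epoch(1, train_loss=0.5, val_loss=1.2)  # val loss worse: no save
ckpt.end_of_epoch(2, train_loss=0.4)                # no validation: train loss used
```

The key point is that the comparison metric is chosen per epoch, so runs that mix validated and non-validated epochs still behave sensibly.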