Validation

Validation Command

Quick Start

To run validation, execute:

./scripts/val.sh \
  --checkpoint /path/to/checkpoint.pt \
  --model_config /path/to/model_config.yaml \
  --val_manifests /datasets/LibriSpeech/librispeech-dev-other-flac.cuts.jsonl.gz  \
  --val_data_dir /datasets/LibriSpeech/ \
  --val_standardizer_lang en \
  --val_batch_duration 3600

Arguments

Customise validation by specifying the --checkpoint, --model_config, and --val_manifests arguments to adjust the model checkpoint, model YAML configuration, and validation manifest file(s), respectively.

Predictions are saved as described here.

See args/val.py and args/shared.py for the complete set of arguments and their respective docstrings.

Further Detail

  • All references and hypotheses are normalized with the normalizer before WERs are calculated, as described in the WER calculation docs; the first sketch after this list shows the general idea. Use --val_standardizer_lang to set the language used for normalization. To switch off normalization, set the relevant config file entry to standardize_wer: false.
  • During validation, the state resets technique is applied by default to increase the model’s accuracy; see the second sketch after this list.
  • Accuracy can be improved further by using beam search together with an n-gram language model (see the shallow-fusion sketch below).
  • Validation on long utterances is calibrated so that it does not run out of memory on a single 11 GB GPU. If a smaller GPU is used, or utterances are longer than 2 hours, refer to this document.

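To make the normalization bullet concrete, the following is a minimal, self-contained sketch of how normalization interacts with WER scoring. The normalize() function is a hypothetical stand-in for the repository's standardizer (which is selected via --val_standardizer_lang); it only illustrates that both the reference and the hypothesis are normalized before the error rate is computed.

import re

def normalize(text):
    """Illustrative stand-in for the real standardizer: lowercase, strip
    punctuation (keeping apostrophes) and collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(substitution, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# With standardize_wer: true, casing and punctuation differences are not counted
# as errors, so this prints 0.0; with normalization switched off, the raw strings
# would be compared and the WER would be non-zero.
print(word_error_rate(normalize("Hello, World!"), normalize("hello world")))
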
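The state resets applied during validation can be pictured as follows. This is only a sketch of the idea under assumptions (the model interface, the segment length, and the absence of any overlap between segments are all hypothetical); the repository's implementation may retain some context across the reset.

SEGMENT_SAMPLES = 16_000 * 60  # hypothetical: reset roughly once per minute of 16 kHz audio

def transcribe_with_state_resets(model, audio):
    """Decode long audio segment by segment, starting each segment from a fresh state."""
    pieces = []
    for start in range(0, len(audio), SEGMENT_SAMPLES):
        state = model.initial_state()  # the "reset": state is not carried across segments
        segment = audio[start:start + SEGMENT_SAMPLES]
        text, state = model.decode(segment, state)
        pieces.append(text)
    return " ".join(pieces)
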
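Beam search with an n-gram language model is commonly combined by shallow fusion: at each decoding step, the acoustic log-probabilities are added to a weighted LM score before the beam is pruned. The weight, beam width, and the lm interface below are assumptions used for illustration; the repository's decoder may combine the scores differently.

import math

LM_WEIGHT = 0.5  # assumed scale applied to the LM log-probability
BEAM_WIDTH = 4

def extend_beam(beam, asr_log_probs, lm):
    """Extend each hypothesis by one token, add a weighted LM score, then prune.

    beam:          list of (tokens, score) pairs
    asr_log_probs: mapping from candidate token to acoustic log-probability
    lm:            object exposing lm.log_prob(context_tokens, next_token)
    """
    candidates = []
    for tokens, score in beam:
        for token, acoustic_lp in asr_log_probs.items():
            lm_lp = lm.log_prob(tokens, token)
            candidates.append((tokens + [token], score + acoustic_lp + LM_WEIGHT * lm_lp))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:BEAM_WIDTH]

class UniformLM:
    """Toy LM that assigns every candidate token the same probability."""
    def log_prob(self, context_tokens, next_token):
        return math.log(0.25)

# Single-step example: two candidate tokens scored against a uniform toy LM.
beam = extend_beam([([], 0.0)], {"cat": math.log(0.6), "cap": math.log(0.4)}, UniformLM())
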
Next Step

See the hardware export documentation for instructions on exporting a hardware checkpoint for inference on an accelerator.