# Validation

## Validation Command

### Quick Start
To run validation, execute:
```bash
./scripts/val.sh \
    --checkpoint /path/to/checkpoint.pt \
    --model_config /path/to/model_config.yaml \
    --val_manifests /datasets/LibriSpeech/librispeech-dev-other-flac.cuts.jsonl.gz \
    --val_data_dir /datasets/LibriSpeech/ \
    --val_standardizer_lang en \
    --val_batch_duration 3600
```
### Arguments
Customise validation by specifying the `--checkpoint`, `--model_config`, and `--val_manifests` arguments to adjust the model checkpoint, the model YAML configuration, and the validation manifest file(s), respectively.
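For instance, a run that evaluates both LibriSpeech dev sets might look like the sketch below. The dev-clean manifest filename and the space-separated multi-manifest syntax are assumptions, not taken from these docs; check `args/val.py` for the exact convention:

```bash
# Hypothetical example: evaluate two dev sets in one run.
# The dev-clean manifest name and the space-separated list syntax are assumptions.
./scripts/val.sh \
    --checkpoint /path/to/checkpoint.pt \
    --model_config /path/to/model_config.yaml \
    --val_manifests /datasets/LibriSpeech/librispeech-dev-clean-flac.cuts.jsonl.gz \
                    /datasets/LibriSpeech/librispeech-dev-other-flac.cuts.jsonl.gz \
    --val_data_dir /datasets/LibriSpeech/ \
    --val_standardizer_lang en \
    --val_batch_duration 3600
```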
Predictions are saved as described here.
See `args/val.py` and `args/shared.py` for the complete set of arguments and their respective docstrings.
### Further Detail
- All references and hypotheses are normalized with the normalizer before calculating WERs, as described in the WER calculation docs. Use `--val_standardizer_lang` to set the language for normalization. To switch off normalization, modify the respective config file entry to read `standardize_wer: false` (see the YAML sketch after this list).
- During validation, the state resets technique is applied by default to increase the model's accuracy.
- The model's accuracy can be improved by using beam search and an n-gram language model (the usual shallow-fusion scoring rule is sketched after this list).
- Validation on long utterances is calibrated so that it does not run out of memory on a single 11 GB GPU. If you use a smaller GPU, or your utterances are longer than 2 hours, refer to this document.
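The exact location of the `standardize_wer` key depends on the layout of your model YAML, so the fragment below is only a sketch of what the entry looks like when normalization is switched off; only the key name comes from these docs:

```yaml
# Sketch of a model_config.yaml fragment.
# The surrounding structure is an assumption; only the key name
# standardize_wer: false is taken from the documentation above.
standardize_wer: false
```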
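For reference, combining beam search with an external n-gram language model is commonly done via shallow fusion, in which each beam hypothesis is rescored with a weighted language-model term. The weight symbol below is illustrative and is not a parameter name from this repository:

$$
\hat{y} \;=\; \operatorname*{arg\,max}_{y}\,\bigl[\,\log P_{\text{ASR}}(y \mid x) \;+\; \beta \,\log P_{\text{LM}}(y)\,\bigr]
$$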
## Next Step
See the hardware export documentation for instructions on exporting a hardware checkpoint for inference on an accelerator.