Log-mel feature normalization

We normalize the acoustic log mel features based on the global mean and variance recorded over the training dataset.

Record dataset stats

The script generate_mel_stats.py computes these statistics and stores them in /datasets/stats/<dataset_name+window_size> as PyTorch tensors. For example usage see:

scripts/make_json_artifacts.sh

Empirically, it was found that normalizing the input activations with dataset global mean and variance makes the early stage of training unstable. As such, the default behaviour is to move between two modes of normalization on a schedule during training.

This is controlled by the arguments in mel_feat_norm.py.

Validation

When running validation, the dataset global mean and variance are always used for normalization regardless of how far through the schedule the model is.

Backwards compatibility

Prior to v1.9.0, the per-utterance stats were used for normalization during training (and then streaming normalization was used during inference). To evaluate a model trained on <=v1.8.0, use the --norm_over_utterance flag to the val.sh script.

CAIMAN-ASR

Log-mel feature normalization

Record dataset stats

Training stability

Validation

Backwards compatibility