End of input (EOI) ‘token’

During training and validation, if the append_EOI option is set to true in the configuration file, for example:

input_val: # or input_train
  audio_dataset:
    append_EOI: true

Then a special EOI mel-spectrogram is appended to the end of each audio sample. This signals to the model that the audio sample is about to terminate, allowing it to ‘dump’ any internally buffered predictions. Empirically, this has been found to reduce word error rate (WER).
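As a rough illustration of the appending step, the sketch below adds one extra frame to the end of a sequence of mel-spectrogram frames. The function name append_eoi, the representation of a sample as a list of frames, and the fill value used for the EOI frame are all assumptions made for this example; the actual EOI mel-spectrogram is defined by the training code and may look different.

```python
from typing import List

Frame = List[float]

# Hypothetical fill value for the EOI frame (an assumption for illustration only).
EOI_FILL_VALUE = 0.0


def append_eoi(
    frames: List[Frame], n_mels: int, eoi_value: float = EOI_FILL_VALUE
) -> List[Frame]:
    """Return a copy of `frames` with one extra EOI frame at the end."""
    eoi_frame = [eoi_value] * n_mels
    # Copy each frame so the caller's data is left untouched.
    return [list(frame) for frame in frames] + [eoi_frame]


# Usage: a two-frame sample with two mel bins gains a third, EOI frame.
sample = [[1.0, 2.0], [3.0, 4.0]]
with_eoi = append_eoi(sample, n_mels=2)
```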

During inference on the FPGA, this token is appended automatically when the stream is closed.

EOI-drop regularization

Without the EOI token, the model receives no warning that the audio is about to end, which incentivises early emissions to minimise the risk of not emitting in time. Appending the EOI token removes this incentive and therefore increases emission latency. To counter this, the EOI token is randomly removed from a fraction of the training utterances. This fraction is controlled via the --eoi_drop_fraction argument, which defaults to 0.5 (50%).
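The dropping step might be sketched as follows. This assumes each utterance is treated independently, with the EOI frame omitted with probability eoi_drop_fraction; the real implementation may select the fraction differently. The function name maybe_append_eoi and the list-of-frames representation are hypothetical.

```python
import random
from typing import List, Optional

Frame = List[float]


def maybe_append_eoi(
    frames: List[Frame],
    eoi_frame: Frame,
    eoi_drop_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Append `eoi_frame` unless this utterance is randomly chosen to drop it."""
    if rng is None:
        rng = random.Random()
    # rng.random() is uniform on [0, 1), so the EOI frame is dropped
    # with probability eoi_drop_fraction (assumed per-utterance sampling).
    if rng.random() < eoi_drop_fraction:
        return list(frames)
    return list(frames) + [list(eoi_frame)]


# Usage: with the default fraction, about half of the utterances keep the EOI frame.
rng = random.Random(0)
utterance = [[1.0, 2.0]]
augmented = maybe_append_eoi(utterance, eoi_frame=[0.0, 0.0], rng=rng)
```

With eoi_drop_fraction set to 0.0 every utterance keeps its EOI frame, and with 1.0 every utterance loses it, so in this sketch the argument interpolates between the two behaviours described above.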