# Keyword boosting
Keyword boosting is a technique to improve the recognition of domain-specific words/phrases and proper nouns. It works by boosting the probability of specific tokens at inference time. The framework is inspired by TurboBias [1].
The framework supports up to 100 keywords, although 10-40 are recommended for optimal performance.
Keyword boosting is enabled for both greedy and beam decoding, but it is designed around beam decoding and may not perform as well with greedy decoding.
## Keyword list format
Create a JSON file containing your keywords, in the following format:
```json
{
    "keywords": [
        "car",
        " cat",
        " Bat ",
        " New York "
    ]
}
```
Keywords are case- and space-sensitive; they should be formatted using the same character set as the output of your decoder.
In the above example, the framework would increase the probabilities of words containing the sequence `"car"`, words starting with `" cat"`, the whole word `" Bat "`, and the whole phrase `" New York "`.
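As a quick illustration of these matching semantics (a character-level sketch only; the framework applies the bias token-by-token at inference time, and the text below is a hypothetical decoder output):

```python
# Leading/trailing spaces control where a pattern can anchor.
# Hypothetical decoder output; note the spaces around whole words.
text = " my car scared a cat and a Bat in New York "

for pattern in ["car", " cat", " Bat ", " New York "]:
    # " Bat " requires a space on both sides, so it would not match
    # inside e.g. "Batman"; "car" matches anywhere, even "carpet".
    print(repr(pattern), "matches" if pattern in text else "no match")
```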
## Keyword metrics
When a valid JSON is passed to `--keywords_path`, additional metrics are calculated (one possible computation is sketched after the list):
- Biased WER (b-WER): WER over utterances whose reference contains at least one keyword.
- Unbiased WER (u-WER): WER over utterances whose reference contains no keywords.
- Precision: TP/(TP+FP) - Accuracy of keyword predictions (i.e., how many of the keywords that the model predicted were correct).
- Recall: TP/(TP+FN) - Model’s ability to find all keywords (i.e., what proportion of keywords in the ground truth did the model correctly predict).
- F1 Score: Harmonic mean of precision and recall.
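A minimal sketch of how these counts could be derived is below. The occurrence-based counting (and the `keyword_metrics` helper itself) is an assumption for illustration; the framework's exact matching rules may differ.

```python
def keyword_metrics(keywords, reference, hypothesis):
    """Compute keyword precision/recall/F1 for one utterance pair.

    Assumption: each occurrence of a keyword counts once; matched
    occurrences are true positives, extra hypothesis occurrences are
    false positives, missed reference occurrences are false negatives.
    """
    tp = fp = fn = 0
    for kw in keywords:
        ref_n = reference.count(kw)
        hyp_n = hypothesis.count(kw)
        tp += min(ref_n, hyp_n)
        fp += max(hyp_n - ref_n, 0)
        fn += max(ref_n - hyp_n, 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```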
## Usage

### Compute metrics
To compute metrics on your keywords, pass a valid JSON to the `--keywords_path` argument:
```bash
./scripts/val.sh \
    # ... other args ...
    --keywords_path path/to/words.json
```
### Boost keywords
To boost the keywords, additionally supply the `--boost_keywords` flag:
```bash
./scripts/val.sh \
    # ... other args ...
    --keywords_path path/to/words.json \
    --boost_keywords
```
## How it works
We build an Aho-Corasick (AC) automaton over your phrases. Each node has a depth-shaped potential:
```
shape(d) = 0                     if d = 0
           c0                    if d = 1
           c0 * beta + ln(d)     if d >= 2

S(d) = context_score * shape(d)
```
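For instance, with the defaults `c0 = 0.3`, `beta = 0.9`, and `context_score = 1.0` (beam decoder), the cumulative bias available at each matched depth works out as follows:

```python
import math

c0, beta, context_score = 0.3, 0.9, 1.0  # beam-decoder defaults

def S(d):
    shape = 0.0 if d == 0 else (c0 if d == 1 else c0 * beta + math.log(d))
    return context_score * shape

print([round(S(d), 3) for d in range(5)])
# [0.0, 0.3, 0.963, 1.369, 1.656]
```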
At each token, we follow AC failure links (refunding partial bias), then take a goto edge (adding the incremental bias). Completed keywords keep their reward; partial near-misses net ≈ 0. This design is deterministic, efficient, and robust to overlaps, large lists and out-of-domain distractors.
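Below is a minimal sketch of this mechanism, with two simplifying assumptions: it matches characters rather than decoder tokens, and it resets to the root when a keyword completes instead of handling overlapping continuations. All names (`Node`, `build`, `step`) are illustrative, not the framework's API.

```python
import math
from collections import deque

C0, BETA, CONTEXT = 0.3, 0.9, 1.0  # defaults (beam decoder)

def S(d):
    """Depth-shaped potential S(d) from the formula above."""
    shape = 0.0 if d == 0 else (C0 if d == 1 else C0 * BETA + math.log(d))
    return CONTEXT * shape

class Node:
    def __init__(self, depth):
        self.goto = {}       # outgoing edges: symbol -> Node
        self.fail = None     # Aho-Corasick failure link
        self.depth = depth
        self.is_end = False  # a keyword ends at this node

def build(keywords):
    """Build the Aho-Corasick automaton (trie + BFS failure links)."""
    root = Node(0)
    root.fail = root
    for kw in keywords:
        node = root
        for ch in kw:
            node = node.goto.setdefault(ch, Node(node.depth + 1))
        node.is_end = True
    queue = deque()
    for child in root.goto.values():  # depth-1 nodes fail to the root
        child.fail = root
        queue.append(child)
    while queue:
        node = queue.popleft()
        for ch, child in node.goto.items():
            f = node.fail
            while f is not root and ch not in f.goto:
                f = f.fail
            child.fail = f.goto.get(ch, root)
            queue.append(child)
    return root

def step(node, ch, root):
    """Advance one symbol; return (next_node, bias_delta).

    The per-step bias is the potential difference S(next) - S(current),
    so falling back along failure links automatically refunds bias that
    was granted for a partial match that did not complete.
    """
    start_depth = node.depth
    while node is not root and ch not in node.goto:
        node = node.fail  # the refund happens through the potential drop
    nxt = node.goto.get(ch, root)
    delta = S(nxt.depth) - S(start_depth)
    if nxt.is_end:
        # Completed keyword: reset without refunding, so the keyword
        # keeps its accumulated reward (simplification: no overlaps).
        return root, delta
    return nxt, delta
```

With this accounting, a partial match that falls apart is refunded down to roughly zero net bias, while a completed keyword retains its full reward of S(len(keyword)).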
## Hyperparameters
- `--keyword_boost_c0` (default 0.3): start-of-keyword boost (depth 1).
- `--keyword_boost_beta` (default 0.9): extra front-loading at depth 2; later steps taper via ln(depth).
- `--keyword_boost_context_score` (default 0.4 for the greedy decoder, 1.0 for the beam decoder): global strength.
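For example, setting all three knobs explicitly for a beam-decoder run might look like this (a sketch; it assumes `val.sh` accepts the flags listed above, with other required arguments elided):

```bash
# ... other args elided ...
./scripts/val.sh \
    --keywords_path path/to/words.json \
    --boost_keywords \
    --keyword_boost_context_score 1.0 \
    --keyword_boost_c0 0.3 \
    --keyword_boost_beta 0.9
```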
## Recommendations
Keep `c0` and `beta` at their defaults; they work well across domains and for different keyword list sizes (up to 100).
- **Greedy Decoder**: if you wish to trade F1 against WER (figure 1), adjust only `context_score`:
    - Try 0.1-0.6.
    - Lowering towards 0.1 yields slightly better WER and precision, and slightly lower F1 and recall.
    - Exceeding 0.6 typically degrades both WER and F1.
    - The default of 0.4 is recommended.

  *Figure 1: WER and F1 vs context score, averaged across 4 test datasets with 10-15 keywords each.*
- **Beam Decoder**: if you wish to trade F1 against WER (figure 2), adjust only `context_score`:
    - Try 0.6-1.0.
    - Lowering towards 0.6 yields slightly better WER and precision, and slightly lower F1 and recall.
    - Exceeding 1.0 typically degrades both WER and F1.
    - The default of 1.0 is recommended.

  *Figure 2: WER and F1 vs context score, averaged across 4 test datasets with 10-15 keywords each.*
See figure 3 for scaling with the number of boosted keywords. Guidance:

- Recommended: 10-40 keywords.
- Maximum: 100 keywords.
- Although the framework is remarkably robust to out-of-domain "distractors", best results come from keeping the keyword list relevant.

*Figure 3: WER and F1 versus the number of boosted keywords for one test dataset. The shaded region (>75) adds out-of-domain "distractor" phrases in addition to the original 75 relevant ones.*
## Expected Results
When boosting is enabled:
- F1: 10-40% relative improvement
- WER: up to ~1% relative improvement
Actual gains vary by domain, audio conditions, and list composition.
## References
[1] A. Andrusenko, V. Bataev, L. Grigoryan, V. Lavrukhin, and B. Ginsburg, "TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree," arXiv [eess.AS], 2025.