Keyword boosting
Keyword boosting is a technique to improve the recognition of domain-specific words/phrases and proper nouns. It works by boosting/suppressing the probability of specific tokens at inference time.
Keyword boosting is currently only available in the beam decoder. To use
keyword boosting create a json
file containing your keywords and their
corresponding boost values. The json
file should have the following format:
{
"keywords": {
"keyword": <exponential boost factor>,
}
}
Keywords are case and space sensitive, they should be formatted using the same character-set as the output of your decoder. The boost factors should be numeric values. The typical boost factors are in the range -1 to 1.
As an example to discuss how keyword boosting works, consider the following example (note: the empty spaces are important as the keywords are space sensitive):
{
"keywords": {
"car": 1.0,
" cat": 2.0,
" bat ": -1.0,
}
}
This would increase the probability of words containing the sequence car
;
increase the probability of words starting with cat
(more strongly than the
car
boost); and decrease the probability of the whole word bat
.
Picking the boost values is a domain-specific task that requires some trial and error.