N-gram Metrics¶
GAICo provides several n-gram-based metrics for evaluating text similarity and quality.
These metrics are useful for tasks such as machine translation evaluation, text summarization, and general text comparison.
gaico.metrics.ngram_metrics.BLEU ¶
Bases: TextualMetric
BLEU (Bilingual Evaluation Understudy) score implementation. This class provides methods to calculate BLEU scores for individual sentence pairs and for batches of sentences, using the NLTK library for the underlying computation.
__init__ ¶
__init__(n=4, smoothing_function=None)
Initialize the BLEU scorer with the specified parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | The maximum n-gram order to use for BLEU calculation. | `4` |
| `smoothing_function` | `Optional[Callable]` | The smoothing function to use for BLEU; defaults to `SmoothingFunction.method1` from NLTK. | `None` |
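To make the parameters concrete, the sketch below shows the computation BLEU performs: clipped n-gram precisions up to order `n`, combined by a geometric mean and scaled by a brevity penalty. This is an illustrative, stdlib-only implementation of the metric itself, not GAICo's or NLTK's actual code; the add-one smoothing is only loosely analogous to NLTK's `SmoothingFunction.method1`.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(reference, candidate, max_n=4):
    """Minimal BLEU sketch: geometric mean of clipped n-gram
    precisions (orders 1..max_n) times a brevity penalty."""
    if not candidate:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        # Add-one smoothing when an order has no matches, so the
        # geometric mean does not collapse to zero.
        precisions.append((clipped + 1) / (total + 1) if clipped == 0
                          else clipped / total)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean


reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
print(bleu(reference, reference))   # identical sentences score 1.0
print(bleu(reference, candidate))   # near-match scores between 0 and 1
```

Raising `max_n` demands longer exact matches, so scores drop; this is why `n=4` is the conventional default.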
gaico.metrics.ngram_metrics.ROUGE ¶
Bases: TextualMetric
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score implementation using the `rouge_score` library.
__init__ ¶
__init__(rouge_types=None, use_stemmer=True, **kwargs)
Initialize the ROUGE scorer with the specified ROUGE types and parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rouge_types` | `Optional[List[str]]` | The ROUGE types to calculate, chosen from `"rouge1"`, `"rouge2"`, and `"rougeL"`. Pass a single type in a list to return that type's F1 score; pass multiple types to return a dictionary of F1 scores keyed by type. `None` (the default) is equivalent to passing `["rouge1", "rouge2", "rougeL"]` and returns a dictionary of all scores. | `None` |
| `use_stemmer` | `bool` | Whether to use stemming for ROUGE calculation. | `True` |
| `kwargs` | `Any` | Additional parameters to pass to the ROUGE calculation. | `{}` |
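The F1 scores these parameters produce come from n-gram overlap between candidate and reference. The sketch below illustrates ROUGE-N with a stdlib-only implementation; it is not GAICo's or `rouge_score`'s actual code, and it omits stemming and the longest-common-subsequence variant (`rougeL`).

```python
from collections import Counter


def rouge_n_f1(reference, candidate, n=1):
    """Minimal ROUGE-N sketch: F1 over n-gram overlap.
    Recall = overlap / reference n-grams,
    precision = overlap / candidate n-grams."""
    def counts(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    ref, cand = counts(reference), counts(candidate)
    # Clipped overlap: each n-gram counts at most as often as in the reference.
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    if overlap == 0:
        return 0.0
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
print(rouge_n_f1(reference, candidate, n=1))  # ROUGE-1 F1
print(rouge_n_f1(reference, candidate, n=2))  # ROUGE-2 F1
```

Because ROUGE balances recall against precision, a candidate that merely repeats reference words gains recall but loses precision, keeping the F1 score honest.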