Welcome to GAICo¶

GenAI Results Comparator (GAICo) is a Python library for comparing, analyzing, and visualizing outputs from Large Language Models (LLMs). It offers an extensible range of metrics, including standard text similarity scores and specialized metrics for structured data like planning sequences and time-series.
Check out our latest release updates!
Description¶
At its core, the library provides a set of metrics for evaluating various types of outputs—from plain text strings to structured data like planning sequences and time-series. These metrics produce normalized scores (typically 0 to 1), where 1 indicates a perfect match, enabling robust analysis and visualization of LLM performance.
Class Structure: All metrics are implemented as extensible classes inheriting from `BaseMetric`. Each metric requires just one method: `calculate()`.

The `calculate()` method takes two main parameters:

- `generated_texts`: A single generated output or an iterable (list, NumPy array, etc.) of outputs.
- `reference_texts`: A single reference output or an iterable of outputs.
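As an illustration of this contract, here is a minimal, self-contained sketch of a custom metric that follows the pattern described above. `BaseMetricSketch` and `TokenJaccard` are hypothetical stand-in names, not part of GAICo's API; see the API reference for the real class names and import paths.

```python
from abc import ABC, abstractmethod


class BaseMetricSketch(ABC):
    """Illustrative stand-in for GAICo's BaseMetric (names are hypothetical)."""

    @abstractmethod
    def calculate(self, generated_texts, reference_texts):
        """Return a score in [0, 1], or a list of scores for iterable inputs."""


class TokenJaccard(BaseMetricSketch):
    """Toy word-level Jaccard similarity following the calculate() contract."""

    def _score(self, generated: str, reference: str) -> float:
        a, b = set(generated.lower().split()), set(reference.lower().split())
        if not a and not b:
            return 1.0  # two empty texts count as a perfect match
        return len(a & b) / len(a | b)

    def calculate(self, generated_texts, reference_texts):
        # Single string pair -> single float; iterables -> list of floats.
        if isinstance(generated_texts, str):
            return self._score(generated_texts, reference_texts)
        return [self._score(g, r) for g, r in zip(generated_texts, reference_texts)]
```

A perfect token overlap scores 1.0 and a disjoint one scores 0.0, matching the normalized 0-to-1 range described above.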
Important

Handling Missing References: If `reference_texts` is `None` or empty, GAICo will automatically use the first item from `generated_texts` as the reference for comparison. A warning will be printed to the console.
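The fallback described above can be sketched in plain Python. The helper name `resolve_references` is hypothetical, not GAICo's internal function:

```python
import warnings


def resolve_references(generated_texts, reference_texts):
    """Fallback sketch: with no references, reuse the first generated item."""
    if reference_texts is None or len(reference_texts) == 0:
        warnings.warn(
            "reference_texts is missing; using the first generated item as the reference."
        )
        return [generated_texts[0]] * len(generated_texts)
    return reference_texts
```

Note that under this fallback, the first generated item is compared against itself and will score a perfect match.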
Note

Batch Processing: When you provide iterables as input, `calculate()` assumes a one-to-one mapping between generated and reference items. If a single reference is provided for multiple generated items, it is broadcast for comparison against each one.
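A sketch of this pairing rule in plain Python, assuming a hypothetical helper `pair_inputs` (not GAICo's actual internals):

```python
def pair_inputs(generated, references):
    """Pairing sketch: one-to-one when lengths match; broadcast one reference."""
    if isinstance(references, str):
        references = [references]
    if len(references) == 1 and len(generated) > 1:
        references = references * len(generated)  # broadcast the single reference
    if len(references) != len(generated):
        raise ValueError("counts must match, or provide a single reference")
    return list(zip(generated, references))
```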
Note

Optional Dependencies: The standard `pip install gaico` is lightweight. Some metrics with heavy dependencies (like `BERTScore` or `JSDivergence`) require optional installation.
Inspiration: The design and evaluation metrics are inspired by Microsoft's article on evaluating LLM-generated content. GAICo currently focuses on reference-based metrics.

Features¶
- Comprehensive Metric Library:
- Textual Similarity: Jaccard, Cosine, Levenshtein, Sequence Matcher.
- N-gram Based: BLEU, ROUGE, JS Divergence.
- Semantic Similarity: BERTScore.
- Structured Data: Specialized metrics for planning sequences (`PlanningLCS`, `PlanningJaccard`) and time-series data (`TimeSeriesElementDiff`, `TimeSeriesDTW`).
- Streamlined Evaluation Workflow:
- A high-level `Experiment` class to easily compare multiple models, apply thresholds, generate plots, and create CSV reports.
- Powerful Visualization:
- Generate bar charts and radar plots to compare model performance using Matplotlib and Seaborn.
- Efficient & Flexible:
- Supports batch processing for efficient computation on datasets.
- Optimized for various input types (lists, NumPy arrays, Pandas Series).
- Easily extensible architecture for adding new custom metrics.
- Robust and Reliable:
- Includes a comprehensive test suite using Pytest.
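As an illustration of the kind of workflow the `Experiment` class streamlines, the sketch below scores several model outputs against one reference with a toy word-overlap metric and applies a pass/fail threshold. All names here are plain-Python stand-ins, not GAICo's API:

```python
def jaccard(generated: str, reference: str) -> float:
    """Toy word-overlap score in [0, 1]; stands in for any GAICo metric."""
    a, b = set(generated.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def compare_models(outputs: dict, reference: str, threshold: float = 0.5) -> dict:
    """Score every model against one reference and flag threshold passes."""
    report = {}
    for name, text in outputs.items():
        score = jaccard(text, reference)
        report[name] = {"score": round(score, 3), "passed": score >= threshold}
    return report


report = compare_models(
    {
        "model-a": "paris is the capital of france",
        "model-b": "berlin is a large city",
    },
    reference="the capital of france is paris",
)
```

The resulting per-model scores and threshold flags are exactly what the library's plotting and CSV-report helpers consume in the real workflow.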
Installation¶
Important
We strongly recommend using a Python virtual environment to manage dependencies and avoid conflicts with other packages.
GAICo can be installed using pip.
- Create and activate a virtual environment (e.g., named `gaico-env`):

```shell
# For Python 3.10+
python3 -m venv gaico-env
source gaico-env/bin/activate  # On macOS/Linux
# gaico-env\Scripts\activate   # On Windows
```
- Install GAICo: Once your virtual environment is active, install GAICo using pip:

```shell
pip install gaico
```

This installs the core GAICo library.
Using GAICo with Jupyter Notebooks/Lab¶
If you plan to use GAICo within Jupyter Notebooks or JupyterLab (recommended for exploring examples and interactive analysis), install them into the same activated virtual environment:
```shell
# (Ensure your 'gaico-env' is active)
pip install notebook    # For Jupyter Notebook
# OR
# pip install jupyterlab  # For JupyterLab
```

Then, launch Jupyter from the same terminal where your virtual environment is active:

```shell
# (Ensure your 'gaico-env' is active)
jupyter notebook
# OR
# jupyter lab
```
New notebooks created in this session should automatically use the `gaico-env` Python environment. For troubleshooting kernel issues, please see our FAQ document.
Optional Installations¶
The default `pip install gaico` is lightweight. Some metrics require extra dependencies, which you can install as needed.

- To include the JSDivergence metric (requires SciPy and NLTK): `pip install 'gaico[jsd]'`
- To include the CosineSimilarity metric (requires scikit-learn): `pip install 'gaico[cosine]'`
- To include the BERTScore metric (which has larger dependencies like PyTorch): `pip install 'gaico[bertscore]'`
- To install with all optional features: `pip install 'gaico[jsd,cosine,bertscore]'`
Tip

The `dev` extra, used for development installs, also includes all optional features.
Installation Size Comparison¶
The following table provides an estimated overview of the relative disk space impact of different installation options. Actual sizes may vary depending on your operating system, Python version, and existing packages. These are primarily to illustrate the relative impact of optional dependencies.
Note: Core dependencies include `levenshtein`, `matplotlib`, `numpy`, `pandas`, `rouge-score`, and `seaborn`.
| Installation Command | Dependencies | Estimated Total Size Impact |
|---|---|---|
| `pip install gaico` | Core | 215 MB |
| `pip install 'gaico[jsd]'` | Core + `scipy`, `nltk` | 310 MB |
| `pip install 'gaico[cosine]'` | Core + `scikit-learn` | 360 MB |
| `pip install 'gaico[bertscore]'` | Core + `bert-score` (includes `torch`, `transformers`, etc.) | 800 MB |
| `pip install 'gaico[jsd,cosine,bertscore]'` | Core + all dependencies from above | 960 MB |
For Developers (Installing from source)¶
If you want to contribute to GAICo or install it from source for development:
- Clone the repository:

```shell
git clone https://github.com/ai4society/GenAIResultsComparator.git
cd GenAIResultsComparator
```
- Set up a virtual environment and install dependencies:

We recommend using UV for fast environment and dependency management.

```shell
# Create a virtual environment (Python 3.10-3.12 recommended)
uv venv
# Activate the environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
# Install in editable mode with all development dependencies
uv pip install -e ".[dev]"
```

If you prefer not to use `uv`, you can use `pip`:

```shell
# Create a virtual environment (Python 3.10-3.12 recommended)
python3 -m venv .venv
# Activate the environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
# Install the package in editable mode with development extras
pip install -e ".[dev]"
```
The `dev` extra installs GAICo with all optional features, plus dependencies for testing, linting, and documentation.

- Set up pre-commit hooks (recommended for contributors):

Pre-commit hooks help maintain code quality by running checks automatically before you commit.

```shell
pre-commit install
```
Citation¶
If you find this project useful, please consider citing it in your work:
```bibtex
@software{AI4Society_GAICo_GenAI_Results,
  author  = {Gupta, Nitin and Koppisetti, Pallav and Srivastava, Biplav},
  license = {MIT},
  title   = {{GAICo: GenAI Results Comparator}},
  year    = {2025},
  url     = {https://github.com/ai4society/GenAIResultsComparator}
}
```
Acknowledgments¶
- The library is developed by Nitin Gupta, Pallav Koppisetti, and Biplav Srivastava. Members of AI4Society contributed to this tool as part of ongoing discussions. Major contributors are credited.
- This library uses several open-source packages including NLTK, scikit-learn, and others. Special thanks to the creators and maintainers of the implemented metrics.
Contact¶
If you have any questions, feel free to reach out to us at ai4societyteam@gmail.com.