Visualization¶
GAICo includes functions to help you visualize comparison results, making it easier to understand model performance across different metrics. These functions are used internally by the `Experiment` class when `plot=True` is set in the `compare()` method, but you can also use them directly for custom plotting needs.
The primary visualization functions are found in the `gaico.visualize` module:

- `plot_metric_comparison()`: Generates a bar plot comparing different models based on a single metric.
- `plot_radar_comparison()`: Generates a radar (spider) plot comparing multiple models across several metrics.
Prerequisites: These functions require `matplotlib`, `seaborn`, `numpy`, and `pandas` to be installed. GAICo attempts to import them and will raise an `ImportError` if they are not available.
Preparing Data for Plotting¶
Both plotting functions expect input data as a pandas DataFrame in a "long" format. This DataFrame typically has columns for model names, metric names, and scores. The `gaico.utils.prepare_results_dataframe()` function is designed to convert a nested dictionary of scores into this format.

prepare_results_dataframe()¶
This utility takes a dictionary where keys are model names and values are dictionaries of metric names to scores (or nested score dictionaries, like those from ROUGE or BERTScore).
```python
from gaico.utils import prepare_results_dataframe
import pandas as pd

# Example raw scores (perhaps from direct metric calculations)
raw_scores_data = {
    'Model_A': {'Jaccard': 0.8, 'ROUGE': {'rouge1': 0.75, 'rougeL': 0.70}, 'Levenshtein': 0.85},
    'Model_B': {'Jaccard': 0.6, 'ROUGE': {'rouge1': 0.65, 'rougeL': 0.60}, 'Levenshtein': 0.70},
    'Model_C': {'Jaccard': 0.9, 'ROUGE': {'rouge1': 0.80, 'rougeL': 0.78}, 'Levenshtein': 0.92}
}

# Convert to a long-format DataFrame
plot_df = prepare_results_dataframe(raw_scores_data)
print("Prepared DataFrame for Plotting:")
print(plot_df)

# Expected Output:
# Prepared DataFrame for Plotting:
#    model_name   metric_name  score
# 0     Model_A       Jaccard   0.80
# 1     Model_A  ROUGE_rouge1   0.75
# 2     Model_A  ROUGE_rougeL   0.70
# 3     Model_A   Levenshtein   0.85
# 4     Model_B       Jaccard   0.60
# 5     Model_B  ROUGE_rouge1   0.65
# 6     Model_B  ROUGE_rougeL   0.60
# 7     Model_B   Levenshtein   0.70
# 8     Model_C       Jaccard   0.90
# 9     Model_C  ROUGE_rouge1   0.80
# 10    Model_C  ROUGE_rougeL   0.78
# 11    Model_C   Levenshtein   0.92
```
`plot_df` is now ready to be used with the visualization functions.
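As context for the `aggregate_func` parameter described below: when the long-format DataFrame contains multiple scores per (model, metric) pair, the plotting functions reduce them to a single value before drawing. The following sketch reproduces that reduction with plain pandas and the library's default column names; the duplicate scores here are hypothetical, not output from GAICo.

```python
import pandas as pd

# Hypothetical long-format data with two Jaccard scores per model
long_df = pd.DataFrame({
    "model_name": ["Model_A", "Model_A", "Model_B", "Model_B"],
    "metric_name": ["Jaccard", "Jaccard", "Jaccard", "Jaccard"],
    "score": [0.8, 0.7, 0.6, 0.5],
})

# Collapse to one score per (model, metric) pair, mirroring the
# default aggregate_func=numpy.mean behavior
agg = long_df.groupby(["model_name", "metric_name"], as_index=False)["score"].mean()
print(agg)  # Model_A -> 0.75, Model_B -> 0.55
```

Passing a different `aggregate_func` (e.g., `numpy.median`) changes this reduction step, not the plotting itself.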
1. Bar Plot: `plot_metric_comparison()`¶
Use this function to compare models on a single, specific metric.
Key Parameters:¶

- `df` (`pd.DataFrame`): The long-format DataFrame (from `prepare_results_dataframe`).
- `metric_name` (`str`): Required. The name of the metric to plot (e.g., "Jaccard", "ROUGE_rouge1").
- `aggregate_func` (`Optional[Callable]`): Aggregation function if multiple scores exist per model for the chosen metric. Defaults to `numpy.mean`.
- `model_col`, `score_col`, `metric_col` (`Optional[str]`): Names of the columns for model, score, and metric. Default to "model_name", "score", "metric_name".
- `title`, `xlabel`, `ylabel` (`Optional[str]`): Plot customization.
- `figsize` (`Optional[tuple]`): Figure size.
- `axis` (`Optional[matplotlib.axes.Axes]`): Existing Matplotlib Axes to plot on.
Example:¶
```python
from gaico.visualize import plot_metric_comparison
import matplotlib.pyplot as plt  # For plt.show()

# Assuming plot_df is available from the previous example

# Plot Jaccard scores
plot_metric_comparison(plot_df, metric_name="Jaccard", title="Jaccard Similarity Comparison")
plt.show()

# Plot ROUGE_rouge1 scores
plot_metric_comparison(plot_df, metric_name="ROUGE_rouge1", title="ROUGE-1 F1 Score Comparison")
plt.show()
```
2. Radar Plot: `plot_radar_comparison()`¶
Use this function to get a multi-dimensional view of how models perform across several metrics simultaneously. Radar plots are most effective with roughly 3 to 12 metrics; beyond that, the axes become crowded and hard to read.
Key Parameters:¶

- `df` (`pd.DataFrame`): The long-format DataFrame.
- `metrics` (`Optional[List[str]]`): A list of metric names (e.g., ["Jaccard", "ROUGE_rouge1", "Levenshtein"]) to include in the radar plot. If `None`, all metrics present in the `df` for the models are used.
- `aggregate_func` (`Optional[Callable]`): Aggregation function. Defaults to `numpy.mean`.
- `model_col`, `score_col`, `metric_col` (`Optional[str]`): Column names.
- `title` (`Optional[str]`): Plot title.
- `figsize` (`Optional[tuple]`): Figure size.
- `axis` (`Optional[matplotlib.axes.Axes]`): Existing Matplotlib polar Axes to plot on.
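Conceptually, the radar plot first reshapes the long-format DataFrame into one row per model and one column (axis) per metric, applying `aggregate_func` along the way. This sketch reproduces that reshaping with plain pandas, using the example scores from earlier; it illustrates the data shape, not GAICo's internal code.

```python
import pandas as pd

# Long-format data matching the earlier prepare_results_dataframe example
long_df = pd.DataFrame({
    "model_name": ["Model_A"] * 3 + ["Model_B"] * 3,
    "metric_name": ["Jaccard", "ROUGE_rougeL", "Levenshtein"] * 2,
    "score": [0.80, 0.70, 0.85, 0.60, 0.60, 0.70],
})

# One row per model, one column per metric: each column is a radar axis
wide = long_df.pivot_table(
    index="model_name", columns="metric_name", values="score", aggfunc="mean"
)
print(wide)
```

Each row of `wide` becomes one model's shape on the radar; each column becomes one axis.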
Example:¶
```python
from gaico.visualize import plot_radar_comparison
import matplotlib.pyplot as plt  # For plt.show()

# Assuming plot_df is available

# Define which metrics to include in the radar plot
metrics_for_radar = ["Jaccard", "ROUGE_rougeL", "Levenshtein"]
plot_radar_comparison(plot_df, metrics=metrics_for_radar, title="Overall Model Performance Radar")
plt.show()

# To plot all available metrics (that have scores for the models):
# plot_radar_comparison(plot_df, title="Overall Model Performance Radar (All Metrics)")
# plt.show()
```
The resulting plot has one axis for each metric in `metrics_for_radar`, and each model (Model_A, Model_B, Model_C) is represented by a colored shape connecting its scores on these axes.
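If you need a fully custom radar chart (for example, to match a report's styling), the underlying matplotlib recipe is short. The sketch below is not GAICo's implementation; it is a minimal hand-rolled version with made-up scores, showing the standard polar-axes technique: evenly spaced angles, one per metric, with the first point repeated to close each shape.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

metrics = ["Jaccard", "ROUGE_rougeL", "Levenshtein"]
scores = {"Model_A": [0.80, 0.70, 0.85], "Model_B": [0.60, 0.60, 0.70]}

# One angle per metric, evenly spaced around the circle,
# plus a repeat of the first angle to close the polygon
angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
for model, vals in scores.items():
    closed = vals + vals[:1]  # repeat the first score to close the shape
    ax.plot(angles, closed, label=model)
    ax.fill(angles, closed, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(metrics)
ax.legend()
fig.savefig("radar_sketch.png")
```

This is essentially what any radar plot does under the hood; GAICo's `plot_radar_comparison()` packages the same idea with aggregation and sensible defaults.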
By using these visualization tools directly, you can create custom plots tailored to specific analyses or integrate them into larger reporting dashboards. Remember to have the necessary plotting libraries installed in your environment.