Taxonomy On Incorporating Large Language Models in Planning
Accepted at ICAPS 2024, 2025

Leaderboard for automated paper categorization methods

Classification Performance Comparison

This leaderboard compares automated approaches for categorizing LLM-planning papers. D₁ contains 126 papers (until Nov 2023); D₂ contains 47 papers (until Sep 2024). The single-label setup assigns exactly one category per paper; the multi-label setup allows a paper to belong to several categories. Reported scores are F1.
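The difference between the two setups can be illustrated with a small scoring sketch. This is a minimal example, not the evaluation code behind the leaderboard: the averaging scheme (micro-averaged F1 here) is an assumption, the function name `micro_f1` is hypothetical, and the toy papers and the "Plan Generation" label are invented for illustration. Each paper's labels are represented as a set, so the single-label setup is simply the case where every set has one element.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over per-paper label sets (assumed averaging scheme)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # categories correctly assigned
        fp += len(pred - truth)   # categories assigned but not in the ground truth
        fn += len(truth - pred)   # ground-truth categories that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: single-label, one category per paper.
single_true = [{"Plan Generation"}, {"Replanning"}, {"Goal Decomposition"}]
single_pred = [{"Plan Generation"}, {"Goal Decomposition"}, {"Goal Decomposition"}]

# Hypothetical toy data: multi-label, a paper may carry several categories.
multi_true = [{"Plan Generation", "Replanning"}, {"Goal Decomposition"}]
multi_pred = [{"Plan Generation", "Replanning"}, {"Goal Decomposition", "Replanning"}]

print(f"single-label F1: {micro_f1(single_true, single_pred):.3f}")
print(f"multi-label F1:  {micro_f1(multi_true, multi_pred):.3f}")
```

In the multi-label case a prediction can be partially right: a paper with one correct and one spurious category still contributes a true positive, which is why scores in the two setups are not directly comparable.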

Key Finding: Among automated methods, Decision Trees (DT) achieve the highest score (F1: 0.349, multi-label setup on D₂), while SVM leads in both single-label setups. Human-augmented classification reaches 0.83 in the same multi-label D₂ setting, demonstrating the value of expert review for identifying emerging categories such as "Goal Decomposition" and "Replanning."

Classifier Name    Single-Label D₁    Single-Label D₂    Multi-Label D₁    Multi-Label D₂
SVM                0.222              0.346              0.123             0.280
DT                 0.124              0.258              0.233             0.349
RF                 0.117              0.213              0.044             0.215
BERT               0.049              0.043              0.102             0.069
SciBERT            0.000              0.013              0.102             0.150
Human-augmented    -                  -                  -                 0.83
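
The per-setup winners can be read off the table programmatically. The sketch below copies the automated classifiers' scores from the table; the names `SETUPS`, `SCORES`, and `best_per_setup` are hypothetical helpers introduced for illustration. The human-augmented row is left out because it reports a score for only one setup.

```python
# Column order follows the table above.
SETUPS = ["single-label D1", "single-label D2", "multi-label D1", "multi-label D2"]

# Scores copied from the leaderboard (automated classifiers only).
SCORES = {
    "SVM":     [0.222, 0.346, 0.123, 0.280],
    "DT":      [0.124, 0.258, 0.233, 0.349],
    "RF":      [0.117, 0.213, 0.044, 0.215],
    "BERT":    [0.049, 0.043, 0.102, 0.069],
    "SciBERT": [0.000, 0.013, 0.102, 0.150],
}


def best_per_setup(scores, setups):
    """Return {setup: (classifier, score)} for the top scorer in each column."""
    best = {}
    for i, setup in enumerate(setups):
        name = max(scores, key=lambda clf: scores[clf][i])
        best[setup] = (name, scores[name][i])
    return best


for setup, (name, score) in best_per_setup(SCORES, SETUPS).items():
    print(f"{setup}: {name} ({score:.3f})")
```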