State-Transition Generalized Planning Pipeline

Interactive Visualization of a Blocksworld Example

Problem Setup

Domain: Blocksworld

Objects: O = {A, B}

Predicates: P = {on(x,y), ontable(x), clear(x), handempty, holding(x)}

Operators: A = {pickup, putdown, stack, unstack}

Initial State (s₀)

[A] [B]  (both blocks on the table)

Goal State (g)

[A] stacked on [B]

Initial State Predicates

ontable(A)
ontable(B)
clear(A)
clear(B)
handempty

Goal Condition

on(A, B)
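As a concrete reference for the stages below, the instance can be written down as plain Python data. This is a minimal sketch; the names `s0`, `goal`, and `satisfies` are illustrative choices, not part of the pipeline's API:

```python
# Ground atoms as tuples, states as frozensets of atoms (illustrative encoding).
s0 = frozenset({
    ("ontable", "A"), ("ontable", "B"),
    ("clear", "A"), ("clear", "B"),
    ("handempty",),
})

goal = frozenset({("on", "A", "B")})

def satisfies(state, goal):
    # The goal is satisfied when every goal atom holds in the state.
    return goal <= state

print(satisfies(s0, goal))  # False: on(A, B) does not hold yet
```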

Stage 1: Symbolic Plan Generation

Using Fast Downward Planner

Configuration: A* search with landmark-cut heuristic (60s timeout)

s₀ (Initial State) ──pickup(A)──▶ s₁ (Intermediate) ──stack(A, B)──▶ s₂ (Goal Reached!)

Plan π = ⟨pickup(A), stack(A, B)⟩
Plan Length: 2 actions
Optimal: Yes

Key Point

The planner generates a valid action sequence, but we need the intermediate states for learning the transition model. That's what Stage 2 provides!

Stage 2: State Trajectory Reconstruction

Using VAL (Plan Validator)

VAL applies each action symbolically and outputs the complete state sequence
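VAL itself parses PDDL, but the core operation it performs is simple STRIPS progression: the successor is γ(s, a) = (s \ del(a)) ∪ add(a). A minimal sketch, with the effect sets hand-written for the two plan actions rather than parsed from the domain:

```python
def gamma(state, add, delete):
    """Apply a ground action's add/delete effects to a state."""
    return (state - delete) | add

s0 = frozenset({("ontable", "A"), ("ontable", "B"),
                ("clear", "A"), ("clear", "B"), ("handempty",)})

# pickup(A): delete {ontable(A), clear(A), handempty}, add {holding(A)}
s1 = gamma(s0,
           add=frozenset({("holding", "A")}),
           delete=frozenset({("ontable", "A"), ("clear", "A"), ("handempty",)}))

# stack(A, B): delete {holding(A), clear(B)}, add {on(A,B), clear(A), handempty}
s2 = gamma(s1,
           add=frozenset({("on", "A", "B"), ("clear", "A"), ("handempty",)}),
           delete=frozenset({("holding", "A"), ("clear", "B")}))

print(("on", "A", "B") in s2)  # True: the goal holds in s2
```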

State s₀

[A] [B]  (both blocks on the table)
ontable(A)
ontable(B)
clear(A)
clear(B)
handempty

State s₁

[A] held in the gripper; [B] on the table

holding(A)
ontable(B)
clear(B)

State s₂ (Goal)

[A] stacked on [B]
ontable(B)
on(A, B)
clear(A)
handempty

Key Point

Notice how predicates change between states:

  • s₀ → s₁: Deleted {ontable(A), clear(A), handempty}, Added {holding(A)}
  • s₁ → s₂: Deleted {clear(B), holding(A)}, Added {on(A,B), clear(A), handempty}

This sparse structure motivates the residual (delta) prediction approach!
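The add/delete sets listed above fall out of plain set differences between consecutive states; most predicates are untouched, which is exactly the sparsity a delta representation exploits. A quick sketch with string atoms:

```python
# Consecutive states from the trajectory, written as sets of atom strings.
s0 = {"ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "handempty"}
s1 = {"holding(A)", "ontable(B)", "clear(B)"}

deleted = s0 - s1   # atoms that stopped holding
added   = s1 - s0   # atoms that started holding

print(sorted(deleted), sorted(added))
# ['clear(A)', 'handempty', 'ontable(A)'] ['holding(A)']
```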

Stage 3: Instance Learning Graph (ILG) Construction

Building the Relational Graph G_{s₀,g} = (V, E, F_cat, L)

We encode the state-goal pair (s₀, g) as a graph where nodes represent objects and predicates, and edges connect predicates to their arguments.

Interactive Graph Visualization for (s₀, g)

Legend:

Object Nodes
State Predicates (apn)
Goal Predicates (upg)

Node Categories

Object Nodes
A, B
State Predicates
ontable(A), ontable(B), clear(A), clear(B), handempty
Goal Predicates
on(A, B)

Edge List with Labels

Source (Predicate)   Target (Object)   Edge Label (Arg Position)
ontable(A)           A                 1
ontable(B)           B                 1
clear(A)             A                 1
clear(B)             B                 1
on(A, B)             A                 1
on(A, B)             B                 2

Key Point

Node Features (Fcat): Each node has a categorical feature:

  • "apn" = achieved & present in state, but not in goal
  • "upg" = unachieved & present in goal
  • "apg" = achieved & present in both state and goal

This rich encoding captures both current state and goal information!
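The node categories and labeled edges above can be assembled in a few lines. This is a sketch: `atom_id` and the dict/list layout are my own choices, not the paper's data structures:

```python
objects = ["A", "B"]
state = [("ontable", "A"), ("ontable", "B"),
         ("clear", "A"), ("clear", "B"), ("handempty",)]
goal = [("on", "A", "B")]

nodes = {}    # node id -> categorical feature
edges = []    # (predicate node, object node, arg position)

for obj in objects:
    nodes[obj] = "object"

def atom_id(atom):
    # Render a ground atom as a readable node id, e.g. "on(A, B)".
    return f"{atom[0]}({', '.join(atom[1:])})" if len(atom) > 1 else atom[0]

for atom in state:
    nid = atom_id(atom)
    nodes[nid] = "apg" if atom in goal else "apn"   # achieved, in goal or not
    for pos, arg in enumerate(atom[1:], start=1):
        edges.append((nid, arg, pos))

for atom in goal:
    if atom not in state:                            # unachieved goal atom
        nid = atom_id(atom)
        nodes[nid] = "upg"
        for pos, arg in enumerate(atom[1:], start=1):
            edges.append((nid, arg, pos))

print(nodes["on(A, B)"], len(edges))  # upg 6
```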

Stage 4: Weisfeiler-Leman Color Refinement

Iterative Color Refinement (k=2 iterations)

WL algorithm iteratively refines node colors by aggregating neighborhood information, creating unique structural fingerprints.

Iteration 0: Initial Colors (based on Fcat)

Node          Initial Color c⁽⁰⁾
A             "object"
B             "object"
ontable(A)    ("ontable", "apn")
ontable(B)    ("ontable", "apn")
clear(A)      ("clear", "apn")
clear(B)      ("clear", "apn")
handempty     ("handempty", "apn")
on(A, B)      ("on", "upg")

Iteration 1: First Refinement

Formula: c⁽¹⁾(v) = HASH(c⁽⁰⁾(v), {{(c⁽⁰⁾(u), label(u,v)) | u ∈ neighbors(v)}})

Example for node A:

Own color: "object"

Neighbors: {ontable(A), clear(A), on(A,B)}

Neighborhood multiset:
{{(("ontable", "apn"), 1), (("clear", "apn"), 1), (("on", "upg"), 1)}}

New color c⁽¹⁾(A): HASH("object", neighborhood) → "obj_A_config1"

Example for node B:

Own color: "object"

Neighbors: {ontable(B), clear(B), on(A,B)}

Neighborhood multiset:
{{(("ontable", "apn"), 1), (("clear", "apn"), 1), (("on", "upg"), 2)}}

New color c⁽¹⁾(B): HASH("object", neighborhood) → "obj_B_config1"

📝 Important Observation

Notice that A and B get different colors in iteration 1, even though they both start as "object"!

This is because they have different roles in on(A, B):

  • A connects to on(A,B) at position 1
  • B connects to on(A,B) at position 2

This structural difference is captured by WL!

Iteration 2: Second Refinement

The process continues, using c⁽¹⁾ colors to compute c⁽²⁾. Each iteration creates more specific structural signatures.

After k=2 iterations, we collect all colors from all iterations:

C⁽²⁾ = {{ c⁽⁰⁾(A), c⁽¹⁾(A), c⁽²⁾(A), c⁽⁰⁾(B), c⁽¹⁾(B), c⁽²⁾(B), c⁽⁰⁾(ontable(A)), c⁽¹⁾(ontable(A)), c⁽²⁾(ontable(A)), ... (all nodes across all iterations) }}
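A minimal sketch of the refinement step on this graph, using the (own color, sorted neighbor multiset) pair itself in place of HASH; node ids, colors, and the `refine` helper are illustrative:

```python
# Undirected labeled edges of the ILG: (u, v, arg position)
edges = [("ontable(A)", "A", 1), ("ontable(B)", "B", 1),
         ("clear(A)", "A", 1), ("clear(B)", "B", 1),
         ("on(A, B)", "A", 1), ("on(A, B)", "B", 2)]
init = {"A": "object", "B": "object",
        "ontable(A)": ("ontable", "apn"), "ontable(B)": ("ontable", "apn"),
        "clear(A)": ("clear", "apn"), "clear(B)": ("clear", "apn"),
        "handempty": ("handempty", "apn"), "on(A, B)": ("on", "upg")}

neighbors = {v: [] for v in init}
for u, v, lab in edges:
    neighbors[u].append((v, lab))
    neighbors[v].append((u, lab))

def refine(colors):
    # One WL iteration: pair each node's color with the sorted multiset
    # of (neighbor color, edge label); the pair stands in for HASH(...).
    new = {}
    for v in colors:
        multiset = sorted((colors[u], lab) for u, lab in neighbors[v])
        new[v] = (colors[v], tuple(multiset))
    return new

c1 = refine(init)
# A and B now differ: A sees on(A,B) at position 1, B at position 2.
print(c1["A"] != c1["B"])  # True
```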

Stage 5: Fixed-Dimensional Embeddings

From Color Multisets to Vectors

We transform the color multiset C⁽²⁾ into a fixed-dimensional vector φ(s, g) ∈ ℝ⁵⁸⁷

Step 1: Vocabulary Construction (during training)

Collect all unique colors across ALL training graphs to form vocabulary V:

V = {color_1, color_2, color_3, ..., color_587}

Example colors in vocabulary:

  • "object"
  • "obj_A_config1"
  • ("ontable", "apn")
  • ("on", "upg")
  • ... (587 unique color strings total for Blocksworld)

Step 2: Histogram Embedding

For each graph, compute a count vector:

φ(s₀, g)[i] = COUNT(C⁽²⁾, color_i) for i ∈ [1, 587]
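The counting step can be sketched with `collections.Counter` over a toy vocabulary; in the paper the vocabulary has 587 entries collected over all training graphs:

```python
from collections import Counter

# Hand-made 5-entry vocabulary (the real one has 587 entries).
vocab = ["object", ("ontable", "apn"), ("clear", "apn"),
         ("handempty", "apn"), ("on", "upg")]

# All colors collected across nodes (iteration-0 colors only, for brevity).
colors = ["object", "object", ("ontable", "apn"), ("ontable", "apn"),
          ("clear", "apn"), ("clear", "apn"), ("handempty", "apn"),
          ("on", "upg")]

counts = Counter(colors)
phi = [counts.get(c, 0) for c in vocab]   # unseen colors count as 0
print(phi)  # [2, 2, 2, 1, 1]
```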

Visual representation (first 50 dimensions):

[0, 0, 2, 1, 0, 0, 1, 0, 3, 0, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 1, 0,
 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

... continues for 587 total dimensions ...

Embeddings for Each State

Component   Dimension   Description
φ(s₀)       ℝ⁵⁸⁷        Embedding of initial state
φ(s₁)       ℝ⁵⁸⁷        Embedding after pickup(A)
φ(s₂)       ℝ⁵⁸⁷        Embedding of goal state
φ(g)        ℝ⁵⁸⁷        Embedding of goal condition

Key Properties of WL Embeddings

  • Size Invariant: Dimension D=587 is fixed, regardless of |O|. A 100-block problem also maps to ℝ⁵⁸⁷
  • Permutation Invariant: Swapping A↔B doesn't change the histogram counts
  • Sparse for STRIPS: Only a few dimensions change between states, perfect for delta prediction!

Training & Inference Pipeline

Training Data Construction

We create training examples from the state trajectory:

Training Example 1:
  Input: [φ(s₀), φ(g)]          // concatenated embeddings
  Target (state mode): φ(s₁)
  Target (delta mode): Δ₀ = φ(s₁) − φ(s₀)

Training Example 2:
  Input: [φ(s₁), φ(g)]
  Target (state mode): φ(s₂)
  Target (delta mode): Δ₁ = φ(s₂) − φ(s₁)
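Assembling these pairs is a simple pass over the embedded trajectory. A sketch with toy 3-dimensional embeddings (the vectors and names here are made up; real embeddings are 587-dimensional):

```python
# Toy φ values for each trajectory state and for the goal (illustrative).
phis = {"s0": [2, 1, 0], "s1": [1, 1, 1], "s2": [2, 0, 1]}
phi_g = [0, 0, 1]

def delta(a, b):
    # Elementwise difference b - a, i.e. the residual target.
    return [y - x for x, y in zip(a, b)]

examples = []
traj = ["s0", "s1", "s2"]
for cur, nxt in zip(traj, traj[1:]):
    x = phis[cur] + phi_g                               # concatenated [φ(s), φ(g)]
    examples.append((x, delta(phis[cur], phis[nxt])))   # delta-mode target

print(examples[0])  # ([2, 1, 0, 0, 0, 1], [-1, 0, 1])
```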

Model Architectures

LSTM (Parametric)

  • Layers: 2-layer LSTM
  • Hidden units: 256
  • Embedding dim: 32
  • Parameters: 927K
  • Loss (state): Cosine
  • Loss (delta): MSE

XGBoost (Non-parametric)

  • Type: Gradient boosting
  • Tree depth: 8
  • Learning rate: 0.1
  • Nodes: ~115K
  • Loss: MSE
  • Early stopping: 10 rounds

Inference: Neuro-Symbolic Plan Decoding (Algorithm 1)

Step 1: Encode State

Current symbolic state: s₀

Embed: φ(s₀), φ(g)

Step 2: Neural Prediction

Predict successor embedding:

v₀ = φ(s₀) + f_θ(φ(s₀), φ(g))

Step 3: Enumerate Successors

Symbolic successors:

Succ(s₀) = {
  γ(s₀, pickup(A)),
  γ(s₀, pickup(B))
}

Step 4: Nearest Neighbor

Find closest symbolic successor:

s₁ = argmin_{s′ ∈ Succ(s₀)} ‖φ(s′) − v₀‖₂

Step 5: Recover Action

Extract the action that led to s₁:

a₀ = pickup(A)

Step 6: Update & Repeat

Update current state:

s_current ← s₁

Append action to plan

Repeat until goal reached!
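The six steps above amount to a short loop. A self-contained sketch with toy 2-dimensional embeddings, a stubbed predictor in place of f_θ, and a hand-written successor table (all of these stand-ins are illustrative, not the paper's implementation):

```python
import math

# Toy embeddings for four reachable states (illustrative).
phi = {"s0": [0.0, 0.0], "s1": [1.0, 0.0],
       "s1b": [0.0, 1.0], "s2": [1.0, 1.0]}

# Hand-written Succ(s): (action, successor) pairs.
succ = {"s0": [("pickup(A)", "s1"), ("pickup(B)", "s1b")],
        "s1": [("stack(A, B)", "s2"), ("putdown(A)", "s0")],
        "s1b": [("putdown(B)", "s0")]}

def predict(v, g):
    # Stub for the learned delta model f_theta: points roughly toward
    # the correct successor at each step.
    return [0.9, 0.1] if v == phi["s0"] else [0.1, 0.9]

def decode(s, goal, max_steps=10):
    plan = []
    for _ in range(max_steps):
        if phi[s] == phi[goal]:
            return plan
        d = predict(phi[s], phi[goal])
        v = [a + b for a, b in zip(phi[s], d)]          # v = φ(s) + f_θ(φ(s), φ(g))
        # Nearest symbolic successor to the predicted embedding.
        action, s = min(succ[s], key=lambda p: math.dist(phi[p[1]], v))
        plan.append(action)
    return plan

print(decode("s0", "s2"))  # ['pickup(A)', 'stack(A, B)']
```

Because the next state is always drawn from Succ(s), every action in the returned plan is applicable by construction, even when the predictor is inexact.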

Why This Works

  • Symbolic Validity: Every action is guaranteed to be applicable because we enumerate from Succ(s)
  • Neural Guidance: The learned model steers search toward the goal
  • Error Correction: If neural prediction is slightly off, nearest-neighbor matching corrects it
  • Generalization: WL embeddings enable transfer to larger problem instances!

Complete Pipeline Summary

1. Symbolic Planning → 2. State Trajectory → 3. ILG Graph → 4. WL Refinement → 5. Embeddings → 6. Learn & Infer

Key Results from the Paper

  • Blocksworld Extrapolation: WL-XGB (delta) achieves 0.45 success rate vs 0.13 for SATr
  • VisitAll Extrapolation: WL-XGB (delta) achieves 0.87 success rate vs 0.64 for SATr
  • Model Size: 115K nodes (XGBoost) vs 25-35M parameters (SATr)
  • Training Data: Competitive with orders of magnitude fewer training instances
  • Key Insight: Explicit transition modeling + size-invariant representations = sample-efficient generalization!