State-Transition Generalized Planning Pipeline

Interactive Visualization of a Blocksworld Example

Problem Setup

Domain: Blocksworld

Objects: O = {A, B}

Predicates: P = {on(x,y), ontable(x), clear(x), handempty, holding(x)}

Operators: A = {pickup, putdown, stack, unstack}

Initial State (s₀)

[A] [B]  (both blocks on the table)

Goal State (g)

[A] stacked on [B]

Initial State Predicates

ontable(A)
ontable(B)
clear(A)
clear(B)
handempty

Goal Condition

on(A, B)
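As a concrete reference for the stages below, the instance can be written down as plain Python data. This is a minimal sketch; the names `s0`, `goal`, and `satisfies` are illustrative choices, not part of the pipeline's API:

```python
# Ground atoms as tuples, states as frozensets of atoms (illustrative encoding).
s0 = frozenset({
    ("ontable", "A"), ("ontable", "B"),
    ("clear", "A"), ("clear", "B"),
    ("handempty",),
})

goal = frozenset({("on", "A", "B")})

def satisfies(state, goal):
    # The goal is satisfied when every goal atom holds in the state.
    return goal <= state

print(satisfies(s0, goal))  # False: on(A, B) does not hold yet
```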

Stage 1: Symbolic Plan Generation

Using Fast Downward Planner

Configuration: A* search with landmark-cut heuristic (60s timeout)

s₀ (Initial State) ──pickup(A)──▶ s₁ (Intermediate) ──stack(A, B)──▶ s₂ (Goal Reached!)

Plan π = ⟨pickup(A), stack(A, B)⟩
Plan Length: 2 actions
Optimal: Yes

Key Point

The planner generates a valid action sequence, but we need the intermediate states for learning the transition model. That's what Stage 2 provides!

Stage 2: State Trajectory Reconstruction

Using VAL (Plan Validator)

VAL applies each action symbolically and outputs the complete state sequence
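VAL itself parses PDDL, but the core operation it performs is simple STRIPS progression: the successor is γ(s, a) = (s \ del(a)) ∪ add(a). A minimal sketch, with the effect sets hand-written for the two plan actions rather than parsed from the domain:

```python
def gamma(state, add, delete):
    """Apply a ground action's add/delete effects to a state."""
    return (state - delete) | add

s0 = frozenset({("ontable", "A"), ("ontable", "B"),
                ("clear", "A"), ("clear", "B"), ("handempty",)})

# pickup(A): delete {ontable(A), clear(A), handempty}, add {holding(A)}
s1 = gamma(s0,
           add=frozenset({("holding", "A")}),
           delete=frozenset({("ontable", "A"), ("clear", "A"), ("handempty",)}))

# stack(A, B): delete {holding(A), clear(B)}, add {on(A,B), clear(A), handempty}
s2 = gamma(s1,
           add=frozenset({("on", "A", "B"), ("clear", "A"), ("handempty",)}),
           delete=frozenset({("holding", "A"), ("clear", "B")}))

print(("on", "A", "B") in s2)  # True: the goal holds in s2
```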

State s₀

[A] [B]  (both blocks on the table)
ontable(A)
ontable(B)
clear(A)
clear(B)
handempty

State s₁

[A] held in the gripper; [B] on the table

holding(A)
ontable(B)
clear(B)

State s₂ (Goal)

[A] stacked on [B]
ontable(B)
on(A, B)
clear(A)
handempty

Key Point

Notice how predicates change between states:

  • s₀ → s₁: Deleted {ontable(A), clear(A), handempty}, Added {holding(A)}
  • s₁ → s₂: Deleted {clear(B), holding(A)}, Added {on(A,B), clear(A), handempty}

This sparse structure motivates the residual (delta) prediction approach!
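The add/delete sets listed above fall out of plain set differences between consecutive states; most predicates are untouched, which is exactly the sparsity a delta representation exploits. A quick sketch with string atoms:

```python
# Consecutive states from the trajectory, written as sets of atom strings.
s0 = {"ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "handempty"}
s1 = {"holding(A)", "ontable(B)", "clear(B)"}

deleted = s0 - s1   # atoms that stopped holding
added   = s1 - s0   # atoms that started holding

print(sorted(deleted), sorted(added))
# ['clear(A)', 'handempty', 'ontable(A)'] ['holding(A)']
```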

Stage 3: Instance Learning Graph (ILG) Construction

Building the Relational Graph G_{s₀,g} = (V, E, F_cat, L)

We encode the state-goal pair (s₀, g) as a graph where nodes represent objects and predicates, and edges connect predicates to their arguments.

Interactive Graph Visualization for (s₀, g)

Legend:

Object Nodes
State Predicates (apn)
Goal Predicates (upg)

Node Categories

Object Nodes
A, B
State Predicates
ontable(A), ontable(B), clear(A), clear(B), handempty
Goal Predicates
on(A, B)

Edge List with Labels

Source (Predicate)   Target (Object)   Edge Label (Arg Position)
ontable(A)           A                 1
ontable(B)           B                 1
clear(A)             A                 1
clear(B)             B                 1
on(A, B)             A                 1
on(A, B)             B                 2

Key Point

Node Features (Fcat): Each node has a categorical feature:

  • "apn" = achieved & present in state, but not in goal
  • "upg" = unachieved & present in goal
  • "apg" = achieved & present in both state and goal

This rich encoding captures both current state and goal information!
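The node categories and labeled edges above can be assembled in a few lines. This is a sketch: `atom_id` and the dict/list layout are my own choices, not the paper's data structures:

```python
objects = ["A", "B"]
state = [("ontable", "A"), ("ontable", "B"),
         ("clear", "A"), ("clear", "B"), ("handempty",)]
goal = [("on", "A", "B")]

nodes = {}    # node id -> categorical feature
edges = []    # (predicate node, object node, arg position)

for obj in objects:
    nodes[obj] = "object"

def atom_id(atom):
    # Render a ground atom as a readable node id, e.g. "on(A, B)".
    return f"{atom[0]}({', '.join(atom[1:])})" if len(atom) > 1 else atom[0]

for atom in state:
    nid = atom_id(atom)
    nodes[nid] = "apg" if atom in goal else "apn"   # achieved, in goal or not
    for pos, arg in enumerate(atom[1:], start=1):
        edges.append((nid, arg, pos))

for atom in goal:
    if atom not in state:                            # unachieved goal atom
        nid = atom_id(atom)
        nodes[nid] = "upg"
        for pos, arg in enumerate(atom[1:], start=1):
            edges.append((nid, arg, pos))

print(nodes["on(A, B)"], len(edges))  # upg 6
```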

Stage 4: Weisfeiler-Leman Color Refinement

Iterative Color Refinement (k=2 iterations)

WL algorithm iteratively refines node colors by aggregating neighborhood information, creating unique structural fingerprints.

Iteration 0: Initial Colors (based on Fcat)

Node          Initial Color c⁽⁰⁾
A             "object"
B             "object"
ontable(A)    ("ontable", "apn")
ontable(B)    ("ontable", "apn")
clear(A)      ("clear", "apn")
clear(B)      ("clear", "apn")
handempty     ("handempty", "apn")
on(A, B)      ("on", "upg")

Iteration 1: First Refinement

Formula: c⁽¹⁾(v) = HASH(c⁽⁰⁾(v), {{(c⁽⁰⁾(u), label(u,v)) | u ∈ neighbors(v)}})

Example for node A:

Own color: "object"

Neighbors: {ontable(A), clear(A), on(A,B)}

Neighborhood multiset:
{{(("ontable", "apn"), 1), (("clear", "apn"), 1), (("on", "upg"), 1)}}

New color c⁽¹⁾(A): HASH("object", neighborhood) → "obj_A_config1"

Example for node B:

Own color: "object"

Neighbors: {ontable(B), clear(B), on(A,B)}

Neighborhood multiset:
{{(("ontable", "apn"), 1), (("clear", "apn"), 1), (("on", "upg"), 2)}}

New color c⁽¹⁾(B): HASH("object", neighborhood) → "obj_B_config1"

📝 Important Observation

Notice that A and B get different colors in iteration 1, even though they both start as "object"!

This is because they have different roles in on(A, B):

  • A connects to on(A,B) at position 1
  • B connects to on(A,B) at position 2

This structural difference is captured by WL!

Iteration 2: Second Refinement

The process continues, using c⁽¹⁾ colors to compute c⁽²⁾. Each iteration creates more specific structural signatures.

After k=2 iterations, we collect all colors from all iterations:

C⁽²⁾ = {{ c⁽⁰⁾(A), c⁽¹⁾(A), c⁽²⁾(A), c⁽⁰⁾(B), c⁽¹⁾(B), c⁽²⁾(B), c⁽⁰⁾(ontable(A)), c⁽¹⁾(ontable(A)), c⁽²⁾(ontable(A)), ... (all nodes across all iterations) }}
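A minimal sketch of the refinement step on this graph, using the (own color, sorted neighbor multiset) pair itself in place of HASH; node ids, colors, and the `refine` helper are illustrative:

```python
# Undirected labeled edges of the ILG: (u, v, arg position)
edges = [("ontable(A)", "A", 1), ("ontable(B)", "B", 1),
         ("clear(A)", "A", 1), ("clear(B)", "B", 1),
         ("on(A, B)", "A", 1), ("on(A, B)", "B", 2)]
init = {"A": "object", "B": "object",
        "ontable(A)": ("ontable", "apn"), "ontable(B)": ("ontable", "apn"),
        "clear(A)": ("clear", "apn"), "clear(B)": ("clear", "apn"),
        "handempty": ("handempty", "apn"), "on(A, B)": ("on", "upg")}

neighbors = {v: [] for v in init}
for u, v, lab in edges:
    neighbors[u].append((v, lab))
    neighbors[v].append((u, lab))

def refine(colors):
    # One WL iteration: pair each node's color with the sorted multiset
    # of (neighbor color, edge label); the pair stands in for HASH(...).
    new = {}
    for v in colors:
        multiset = sorted((colors[u], lab) for u, lab in neighbors[v])
        new[v] = (colors[v], tuple(multiset))
    return new

c1 = refine(init)
# A and B now differ: A sees on(A,B) at position 1, B at position 2.
print(c1["A"] != c1["B"])  # True
```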

Stage 5: Fixed-Dimensional Embeddings

From Color Multisets to Vectors

We transform the color multiset C⁽²⁾ into a fixed-dimensional vector φ(s, g) ∈ ℝ⁵⁸⁷

Step 1: Vocabulary Construction (during training)

Collect all unique colors across ALL training graphs to form vocabulary V:

V = {color_1, color_2, color_3, ..., color_587}

Example colors in vocabulary:

  • "object"
  • "obj_A_config1"
  • ("ontable", "apn")
  • ("on", "upg")
  • ... (587 unique color strings total for Blocksworld)

Step 2: Histogram Embedding

For each graph, compute a count vector:

φ(s₀, g)[i] = COUNT(C⁽²⁾, color_i) for i ∈ [1, 587]
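The counting step can be sketched with `collections.Counter` over a toy vocabulary; in the paper the vocabulary has 587 entries collected over all training graphs:

```python
from collections import Counter

# Hand-made 5-entry vocabulary (the real one has 587 entries).
vocab = ["object", ("ontable", "apn"), ("clear", "apn"),
         ("handempty", "apn"), ("on", "upg")]

# All colors collected across nodes (iteration-0 colors only, for brevity).
colors = ["object", "object", ("ontable", "apn"), ("ontable", "apn"),
          ("clear", "apn"), ("clear", "apn"), ("handempty", "apn"),
          ("on", "upg")]

counts = Counter(colors)
phi = [counts.get(c, 0) for c in vocab]   # unseen colors count as 0
print(phi)  # [2, 2, 2, 1, 1]
```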

Visual representation (first 50 dimensions):

[0, 0, 2, 1, 0, 0, 1, 0, 3, 0, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 1, 0,
 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

... continues for 587 total dimensions ...

Embeddings for Each State

Component   Dimension   Description
φ(s₀)       ℝ⁵⁸⁷        Embedding of initial state
φ(s₁)       ℝ⁵⁸⁷        Embedding after pickup(A)
φ(s₂)       ℝ⁵⁸⁷        Embedding of goal state
φ(g)        ℝ⁵⁸⁷        Embedding of goal condition

Key Properties of WL Embeddings

  • Size Invariant: Dimension D=587 is fixed, regardless of |O|. A 100-block problem also maps to ℝ⁵⁸⁷
  • Permutation Invariant: Swapping A↔B doesn't change the histogram counts
  • Sparse for STRIPS: Only a few dimensions change between states, perfect for delta prediction!

Training & Inference Pipeline

Training Data Construction

We create training examples from the state trajectory:

Training Example 1:
  Input: [φ(s₀), φ(g)]          // concatenated embeddings
  Target (state mode): φ(s₁)
  Target (delta mode): Δ₀ = φ(s₁) − φ(s₀)

Training Example 2:
  Input: [φ(s₁), φ(g)]
  Target (state mode): φ(s₂)
  Target (delta mode): Δ₁ = φ(s₂) − φ(s₁)
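Assembling these pairs is a simple pass over the embedded trajectory. A sketch with toy 3-dimensional embeddings (the vectors and names here are made up; real embeddings are 587-dimensional):

```python
# Toy φ values for each trajectory state and for the goal (illustrative).
phis = {"s0": [2, 1, 0], "s1": [1, 1, 1], "s2": [2, 0, 1]}
phi_g = [0, 0, 1]

def delta(a, b):
    # Elementwise difference b - a, i.e. the residual target.
    return [y - x for x, y in zip(a, b)]

examples = []
traj = ["s0", "s1", "s2"]
for cur, nxt in zip(traj, traj[1:]):
    x = phis[cur] + phi_g                               # concatenated [φ(s), φ(g)]
    examples.append((x, delta(phis[cur], phis[nxt])))   # delta-mode target

print(examples[0])  # ([2, 1, 0, 0, 0, 1], [-1, 0, 1])
```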

Model Architectures

LSTM (Parametric)

  • Layers: 2-layer LSTM
  • Hidden units: 256
  • Embedding dim: 32
  • Parameters: 927K
  • Loss (state): Cosine
  • Loss (delta): MSE

XGBoost (Non-parametric)

  • Type: Gradient boosting
  • Tree depth: 8
  • Learning rate: 0.1
  • Nodes: ~115K
  • Loss: MSE
  • Early stopping: 10 rounds

Inference: Neuro-Symbolic Plan Decoding (Algorithm 1)

Step 1: Encode State

Current symbolic state: s₀

Embed: φ(s₀), φ(g)

Step 2: Neural Prediction

Predict successor embedding:

v₀ = φ(s₀) + f_θ(φ(s₀), φ(g))

Step 3: Enumerate Successors

Symbolic successors:

Succ(s₀) = {
  γ(s₀, pickup(A)),
  γ(s₀, pickup(B))
}

Step 4: Nearest Neighbor

Find closest symbolic successor:

s₁ = argmin_{s′ ∈ Succ(s₀)} ‖φ(s′) − v₀‖₂

Step 5: Recover Action

Extract the action that led to s₁:

a₀ = pickup(A)

Step 6: Update & Repeat

Update current state:

s_current ← s₁

Append action to plan

Repeat until goal reached!
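The six steps above amount to a short loop. A self-contained sketch with toy 2-dimensional embeddings, a stubbed predictor in place of f_θ, and a hand-written successor table (all of these stand-ins are illustrative, not the paper's implementation):

```python
import math

# Toy embeddings for four reachable states (illustrative).
phi = {"s0": [0.0, 0.0], "s1": [1.0, 0.0],
       "s1b": [0.0, 1.0], "s2": [1.0, 1.0]}

# Hand-written Succ(s): (action, successor) pairs.
succ = {"s0": [("pickup(A)", "s1"), ("pickup(B)", "s1b")],
        "s1": [("stack(A, B)", "s2"), ("putdown(A)", "s0")],
        "s1b": [("putdown(B)", "s0")]}

def predict(v, g):
    # Stub for the learned delta model f_theta: points roughly toward
    # the correct successor at each step.
    return [0.9, 0.1] if v == phi["s0"] else [0.1, 0.9]

def decode(s, goal, max_steps=10):
    plan = []
    for _ in range(max_steps):
        if phi[s] == phi[goal]:
            return plan
        d = predict(phi[s], phi[goal])
        v = [a + b for a, b in zip(phi[s], d)]          # v = φ(s) + f_θ(φ(s), φ(g))
        # Nearest symbolic successor to the predicted embedding.
        action, s = min(succ[s], key=lambda p: math.dist(phi[p[1]], v))
        plan.append(action)
    return plan

print(decode("s0", "s2"))  # ['pickup(A)', 'stack(A, B)']
```

Because the next state is always drawn from Succ(s), every action in the returned plan is applicable by construction, even when the predictor is inexact.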

Why This Works

  • Symbolic Validity: Every action is guaranteed to be applicable because we enumerate from Succ(s)
  • Neural Guidance: The learned model steers search toward the goal
  • Error Correction: If neural prediction is slightly off, nearest-neighbor matching corrects it
  • Generalization: WL embeddings enable transfer to larger problem instances!

Complete Pipeline Summary

1. Symbolic Planning → 2. State Trajectory → 3. ILG Graph → 4. WL Refinement → 5. Embeddings → 6. Learn & Infer

Key Results from the Paper

  • Blocksworld Extrapolation: WL-XGB (delta) achieves 0.45 success rate vs 0.13 for SATr
  • VisitAll Extrapolation: WL-XGB (delta) achieves 0.87 success rate vs 0.64 for SATr
  • Model Size: 115K nodes (XGBoost) vs 25-35M parameters (SATr)
  • Training Data: Competitive with orders of magnitude fewer training instances
  • Key Insight: Explicit transition modeling + size-invariant representations = sample-efficient generalization!