Problem Setup
Domain: Blocksworld
Objects: O = {A, B}
Predicates: P = {on(x,y), ontable(x), clear(x), handempty, holding(x)}
Operators: A = {pickup, putdown, stack, unstack}
Initial State (s₀): {ontable(A), ontable(B), clear(A), clear(B), handempty}
Goal Condition (g): {on(A, B)}
Stage 1: Symbolic Plan Generation
Using Fast Downward Planner
Configuration: A* search with landmark-cut heuristic (60s timeout)
Resulting plan: π = ⟨pickup(A), stack(A, B)⟩
Key Point
The planner generates a valid action sequence, but we need the intermediate states for learning the transition model. That's what Stage 2 provides!
Stage 2: State Trajectory Reconstruction
Using VAL (Plan Validator)
VAL applies each action symbolically and outputs the complete state sequence
State s₀: {ontable(A), ontable(B), clear(A), clear(B), handempty}
State s₁ (after pickup(A)): {ontable(B), clear(B), holding(A)}
State s₂ (after stack(A, B), goal reached): {ontable(B), on(A, B), clear(A), handempty}
Key Point
Notice how predicates change between states:
- s₀ → s₁: Deleted {ontable(A), clear(A), handempty}, Added {holding(A)}
- s₁ → s₂: Deleted {clear(B), holding(A)}, Added {on(A,B), clear(A), handempty}
This sparse structure motivates the residual (delta) prediction approach!
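Because STRIPS states are just sets of ground atoms, the deltas above fall out of plain set differences. A minimal sketch (atom strings are illustrative stand-ins for a proper atom representation):

```python
# States as sets of ground atoms; STRIPS transition deltas are set differences.
s0 = {"ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "handempty"}
s1 = {"ontable(B)", "clear(B)", "holding(A)"}
s2 = {"ontable(B)", "on(A,B)", "clear(A)", "handempty"}

def delta(s, t):
    """Return (deleted, added) atoms for the transition s -> t."""
    return s - t, t - s

deleted, added = delta(s0, s1)
# deleted == {"ontable(A)", "clear(A)", "handempty"}; added == {"holding(A)"}
```

Only a handful of atoms change per transition, which is exactly the sparsity the residual prediction exploits.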
Stage 3: Instance Learning Graph (ILG) Construction
Building the Relational Graph G(s₀, g) = (V, E, Fcat, L)
We encode the state-goal pair (s₀, g) as a graph where nodes represent objects and predicates, and edges connect predicates to their arguments.
Graph Structure for (s₀, g)
Legend:
Node Categories
- Object nodes: A, B
- State predicate nodes ("apn"): ontable(A), ontable(B), clear(A), clear(B), handempty
- Goal predicate node ("upg"): on(A, B)
Edge List with Labels
| Source (Predicate) | Target (Object) | Edge Label (Arg Position) |
|---|---|---|
| ontable(A) | A | 1 |
| ontable(B) | B | 1 |
| clear(A) | A | 1 |
| clear(B) | B | 1 |
| on(A, B) | A | 1 |
| on(A, B) | B | 2 |
Key Point
Node Features (Fcat): Each node has a categorical feature:
- "apn" = achieved & present in state, but not in goal
- "upg" = unachieved & present in goal
- "apg" = achieved & present in both state and goal
This rich encoding captures both current state and goal information!
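The node and edge lists above can be built mechanically from (s₀, g). A sketch, where `args` is a hypothetical helper that parses argument lists out of atom strings:

```python
# Build the ILG for (s0, g): nodes are objects and ground atoms,
# edges connect each atom to its arguments, labelled by argument position.
objects = ["A", "B"]
state = ["ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "handempty"]
goal = ["on(A,B)"]

nodes = {}                          # node -> categorical feature Fcat
for o in objects:
    nodes[o] = "object"
for atom in state:
    nodes[atom] = "apg" if atom in goal else "apn"
for atom in goal:
    if atom not in state:
        nodes[atom] = "upg"

def args(atom):
    """Hypothetical helper: parse 'on(A,B)' -> ['A', 'B']."""
    inside = atom[atom.index("(") + 1 : -1] if "(" in atom else ""
    return [a.strip() for a in inside.split(",")] if inside else []

edges = [(atom, obj, pos + 1)       # (predicate node, object node, arg position)
         for atom in nodes if nodes[atom] != "object"
         for pos, obj in enumerate(args(atom))]
```

Running this reproduces the six labelled edges in the table: `handempty` is 0-ary and contributes none, `on(A,B)` contributes two.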
Stage 4: Weisfeiler-Leman Color Refinement
Iterative Color Refinement (k=2 iterations)
WL algorithm iteratively refines node colors by aggregating neighborhood information, creating unique structural fingerprints.
Iteration 0: Initial Colors (based on Fcat)
| Node | Initial Color c⁽⁰⁾ |
|---|---|
| A | "object" |
| B | "object" |
| ontable(A) | ("ontable", "apn") |
| ontable(B) | ("ontable", "apn") |
| clear(A) | ("clear", "apn") |
| clear(B) | ("clear", "apn") |
| handempty | ("handempty", "apn") |
| on(A, B) | ("on", "upg") |
Iteration 1: First Refinement
Formula: c⁽¹⁾(v) = HASH(c⁽⁰⁾(v), {{(c⁽⁰⁾(u), label(u,v)) | u ∈ neighbors(v)}})
Node A:
- Own color: "object"
- Neighbors: {ontable(A), clear(A), on(A,B)}
- Neighborhood multiset: {{(("ontable", "apn"), 1), (("clear", "apn"), 1), (("on", "upg"), 1)}}
- New color c⁽¹⁾(A): HASH("object", neighborhood) → "obj_A_config1"
Node B:
- Own color: "object"
- Neighbors: {ontable(B), clear(B), on(A,B)}
- Neighborhood multiset: {{(("ontable", "apn"), 1), (("clear", "apn"), 1), (("on", "upg"), 2)}}
- New color c⁽¹⁾(B): HASH("object", neighborhood) → "obj_B_config1"
📝 Important Observation
Notice that A and B get different colors in iteration 1, even though they both start as "object"!
This is because they have different roles in on(A, B):
- A connects to on(A,B) at position 1
- B connects to on(A,B) at position 2
This structural difference is captured by WL!
Iteration 2: Second Refinement
The process continues, using c⁽¹⁾ colors to compute c⁽²⁾. Each iteration creates more specific structural signatures.
After k = 2 iterations, we collect all colors from all iterations into one multiset: C⁽²⁾ = c⁽⁰⁾(V) ⊎ c⁽¹⁾(V) ⊎ c⁽²⁾(V).
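The refinement step can be sketched directly on the small graph from Stage 3. For clarity this keeps colors as nested tuples instead of opaque hash values; the behavior is the same since only color equality matters:

```python
# WL color refinement on the ILG (colors as tuples rather than hashes).
edges = [("ontable(A)", "A", 1), ("ontable(B)", "B", 1),
         ("clear(A)", "A", 1), ("clear(B)", "B", 1),
         ("on(A,B)", "A", 1), ("on(A,B)", "B", 2)]
color = {"A": "object", "B": "object",
         "ontable(A)": ("ontable", "apn"), "ontable(B)": ("ontable", "apn"),
         "clear(A)": ("clear", "apn"), "clear(B)": ("clear", "apn"),
         "handempty": ("handempty", "apn"), "on(A,B)": ("on", "upg")}

def refine(color):
    """c'(v) = (c(v), sorted multiset of (neighbor color, edge label))."""
    nbrs = {v: [] for v in color}
    for pred, obj, label in edges:       # aggregate over both edge endpoints
        nbrs[pred].append((color[obj], label))
        nbrs[obj].append((color[pred], label))
    return {v: (color[v], tuple(sorted(nbrs[v], key=str))) for v in color}

c1 = refine(color)   # iteration 1: A and B already split (labels 1 vs 2)
c2 = refine(c1)      # iteration 2: the split propagates to their neighbors
```

Note how the distinction spreads: `ontable(A)` and `ontable(B)` still share a color after iteration 1, but differ after iteration 2 because their object neighbors were separated.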
Stage 5: Fixed-Dimensional Embeddings
From Color Multisets to Vectors
We transform the color multiset C⁽²⁾ into a fixed-dimensional vector φ(s, g) ∈ ℝ⁵⁸⁷
Embeddings for Each State
| Component | Dimension | Description |
|---|---|---|
| φ(s₀) | ℝ⁵⁸⁷ | Embedding of initial state |
| φ(s₁) | ℝ⁵⁸⁷ | Embedding after pickup(A) |
| φ(s₂) | ℝ⁵⁸⁷ | Embedding of goal state |
| φ(g) | ℝ⁵⁸⁷ | Embedding of goal condition |
Key Properties of WL Embeddings
- Size Invariant: Dimension D=587 is fixed, regardless of |O|. A 100-block problem also maps to ℝ⁵⁸⁷
- Permutation Invariant: Swapping A↔B doesn't change the histogram counts
- Sparse for STRIPS: Only a few dimensions change between states, perfect for delta prediction!
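The histogram step itself is simple: a color-to-dimension index is fixed at training time (D = 587 in the paper's setup), and each state's colors are counted into that index. A toy sketch with an illustrative five-color index (the index contents are an assumption, not the paper's):

```python
# Map a color multiset to a fixed-dimensional count vector.
# color_index is frozen at training time; unseen colors are dropped.
color_index = {"object": 0, ("ontable", "apn"): 1, ("clear", "apn"): 2,
               ("handempty", "apn"): 3, ("on", "upg"): 4}
D = len(color_index)

def embed(colors):
    """Histogram of colors over the fixed index."""
    vec = [0] * D
    for c in colors:
        if c in color_index:
            vec[color_index[c]] += 1
    return vec
```

Because the result is a pure count vector, reordering the input multiset (e.g. swapping A and B everywhere) cannot change it, which is the permutation invariance claimed above.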
Training & Inference Pipeline
Training Data Construction
We create training examples from the state trajectory: each transition sᵢ → sᵢ₊₁ yields the input (φ(sᵢ), φ(g)) and the regression target Δᵢ = φ(sᵢ₊₁) − φ(sᵢ).
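A minimal sketch of the pair construction, using tiny stand-in vectors in place of the 587-dimensional WL embeddings (the `phi` values are illustrative, not real embeddings):

```python
# Build (input, target) pairs for delta regression from one trajectory.
phi = {"s0": [2, 2, 2, 1, 1],    # toy stand-ins for WL embeddings
       "s1": [1, 1, 2, 0, 1],
       "s2": [2, 1, 2, 1, 0],
       "g":  [0, 0, 0, 0, 1]}
trajectory = ["s0", "s1", "s2"]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

# One example per transition s_i -> s_{i+1}
X = [phi[cur] + phi["g"] for cur in trajectory[:-1]]      # input: state ++ goal
y = [sub(phi[nxt], phi[cur])                              # target: sparse delta
     for cur, nxt in zip(trajectory, trajectory[1:])]
```

A regressor such as XGBoost (one output per embedding dimension) is then fit on (X, y); the targets stay mostly zero thanks to the STRIPS sparsity noted in Stage 2.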
Inference: Neuro-Symbolic Plan Decoding (Algorithm 1)
Step 1: Encode State
Current symbolic state: s₀
Embed: φ(s₀), φ(g)
Step 2: Neural Prediction
Predict the successor embedding via the learned delta model:
φ̂(s₁) = φ(s₀) + Δ̂, where Δ̂ = f(φ(s₀), φ(g))
Step 3: Enumerate Successors
Symbolic successors:
Succ(s₀) = { γ(s₀, pickup(A)), γ(s₀, pickup(B)) }
Step 4: Nearest Neighbor
Find the closest symbolic successor in embedding space:
s₁ = argmin_{s′ ∈ Succ(s₀)} ‖φ(s′) − φ̂(s₁)‖
Step 5: Recover Action
Extract the action that led to s₁:
a₀ = pickup(A)
Step 6: Update & Repeat
Update current state:
s_current ← s₁
Append action to plan
Repeat until goal reached!
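Steps 1–6 form a greedy decoding loop. The sketch below mocks the planning machinery with a toy 1-D state space so the loop itself is runnable; in the real pipeline `embed` is the WL embedding, `predict_delta` the learned model, and `successors` the STRIPS expansion (all three names here are illustrative stand-ins):

```python
def embed(s):                      # stand-in for phi(s)
    return [float(s)]

def predict_delta(s, goal_emb):    # stand-in for the learned delta model
    return [1.0]                   # always "one step toward the goal"

def successors(s):                 # stand-in for {(a, gamma(s, a))}
    return [(f"+1 from {s}", s + 1), (f"-1 from {s}", s - 1)]

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def decode_plan(s, goal, max_steps=100):
    plan = []
    for _ in range(max_steps):
        if s == goal:
            return plan
        # Step 2: predicted successor embedding = current + predicted delta
        target = [a + d for a, d in zip(embed(s), predict_delta(s, embed(goal)))]
        # Steps 3-5: enumerate symbolic successors, keep the nearest one
        action, s = min(successors(s), key=lambda p: dist(embed(p[1]), target))
        plan.append(action)
    return None                    # no plan found within the horizon
```

Because the next state is always drawn from `successors(s)`, every action in the returned plan is applicable by construction, even when the neural prediction is imprecise.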
Why This Works
- Symbolic Validity: Every action is guaranteed to be applicable because we enumerate from Succ(s)
- Neural Guidance: The learned model steers search toward the goal
- Error Correction: If neural prediction is slightly off, nearest-neighbor matching corrects it
- Generalization: WL embeddings enable transfer to larger problem instances!
Key Results from the Paper
- Blocksworld Extrapolation: WL-XGB (delta) achieves 0.45 success rate vs 0.13 for SATr
- VisitAll Extrapolation: WL-XGB (delta) achieves 0.87 success rate vs 0.64 for SATr
- Model Size: 115K nodes (XGBoost) vs 25-35M parameters (SATr)
- Training Data: Competitive with orders of magnitude fewer training instances
- Key Insight: Explicit transition modeling + size-invariant representations = sample-efficient generalization!