Classical Planning in Deep Latent Space: From Unlabelled Images to PDDL (and back)
Masataro Asai, Alex Fukunaga, The University of Tokyo
20+ min
Made by guicho2.71828 (Masataro Asai)
1 Hello, Machine Learning people!
First of all, hello, machine learning people. I'm not a machine learning person; I usually study a kind of symbolic AI, and I came here expecting interesting conversations with machine learning professionals.
2 Hello, Logic Programming, Data Mining, AGI, Theoretical CS people, too!
And also, hello other symbolic AI people. Although I am not doing exactly the same thing, we have common roots.
Let me ask you one question before presenting my paper…
3 How many people have learned AI with these books?
- Nice.
4 I am from the Planning community
International Conference on Automated Planning and Scheduling
(AAAI sister conference, 33% avg. accept ratio, since 1990, ECP+AIPS→ICAPS)
5 Planning people are working on Automated Planners
- Compute the high-level plan for an agent: plan = $(action_1, action_2, \ldots, action_n)$
- Search-based planning projects many steps further into the future than RL
- ↔ reflex (0-step), limited deliberation (1,2-steps)
- High-dimensional, implicit state space search
- Entire space does not fit in memory, only transition rules are given
- STRIPS-like PDDL modelling language, based on
- 1st-order logic, closed-world assumption
- Soundness, completeness, optimality matters
- Expensive Applications: Satellite operations (Deep Space 1), Mars Exploration Rover (MER), underwater vehicles
6 High-level Planning: Sliding tile puzzle (a.k.a 8-puzzle)
- Slide a tile to an empty place (each step)
- → make the tile placement identical to the goal
- → (abstracts the low level control e.g. arm motion)
Search states represented by conjunctions of 1st-order logic (objects, predicates, propositions)
(empty x0 y0)      ; or, empty(x0, y0)
(is panel6 x1 y0)  ; or, is(panel6, x1, y0)
(up y0 y1)...
(down y1 y0)...
(right x0 x1)...
(left x0 x1)...
initial state
goal state
To symbolic systems, names do not matter (they matter only to humans), as long as they are used consistently
(car banana0 kiwi0)          ; numbers (0,1) don't matter either
(shark bogo6 banana1 kiwi0)  ; just to make sense for us
(apple kiwi0 kiwi1)...
(pinapple kiwi1 kiwi0)...
(pen banana0 banana1)...
(applepen banana0 banana1)...
6.1 High-level Planning Task: State Transitions
How to represent "sliding up tile 7" ? → actions
Transition Rules of "sliding up":
- You can slide up a tile only to an empty location.
- When you slide it, the original location becomes empty.
- The new location is no longer empty; it is occupied by the tile.
initial state
When $Empty(x, y_{old}) \wedge is(panel, x, y_{new}) \wedge up(y_{new}, y_{old})$;
then $\neg Empty(x, y_{old}) \wedge Empty(x, y_{new}) \wedge \neg is(panel, x, y_{new}) \wedge is(panel, x, y_{old})$
;; Translates to the PDDL model below:
(:action slide-up ...
 :precondition (and (empty ?x ?y-old) (is ?panel ?x ?y-new) ...)
 :effect (and (not (empty ?x ?y-old)) (empty ?x ?y-new)
              (not (is ?panel ?x ?y-new)) (is ?panel ?x ?y-old)))
But where does this representation come from?
6.2 Solving these problems: Heuristic state-space search
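As a reference point, here is a minimal textbook A* search loop in Python (illustrative only, not LatPlan code; off-the-shelf planners implement far more sophisticated versions with domain-independent heuristics). With an admissible heuristic `h`, the returned plan is optimal.

```python
import heapq
from itertools import count

def astar(init, goal, successors, h):
    """Textbook A*: successors(s) yields (action, next_state, cost);
    states must be hashable (e.g. tuples); h is an admissible heuristic."""
    tie = count()                                  # tie-breaker so the heap never compares states
    frontier = [(h(init), next(tie), 0, init, [])]
    best_g = {init: 0}
    while frontier:
        f, _, g, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan
        if g > best_g.get(state, float("inf")):
            continue                               # stale queue entry
        for action, succ, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), next(tie), g2, succ, plan + [action]))
    return None                                    # no plan exists
```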
6.3 High-level Planning Task: State Transitions
How to represent "sliding up tile 7" ? → actions
Transition Rules of "sliding up":
- You can slide up a tile only to an empty location.
- When you slide it, the original location becomes empty.
- The new location is no longer empty; it is occupied by the tile.
initial state
When $Empty(x, y_{old}) \wedge is(panel, x, y_{new}) \wedge up(y_{new}, y_{old})$;
then $\neg Empty(x, y_{old}) \wedge Empty(x, y_{new}) \wedge \neg is(panel, x, y_{new}) \wedge is(panel, x, y_{old})$
;; Translates to the PDDL model below:
(:action slide-up ...
 :precondition (and (empty ?x ?y-old) (is ?panel ?x ?y-new) ...)
 :effect (and (not (empty ?x ?y-old)) (empty ?x ?y-new)
              (not (is ?panel ?x ?y-new)) (is ?panel ?x ?y-old)))
But where does this representation come from?
6.4 PDDL representations come from human encoding.
Knowledge-Acquisition Bottleneck (Cullen, 1988):
The cost of the human effort involved in converting real-world problems into inputs for domain-independent symbolic systems
Symbols: Labels for identifiable entities (it's not just about objects!)
Types of symbols:
| Type | Example |
|---|---|
| Object symbols | panel7, x0, y0 … |
| Predicate symbols | (empty ?x ?y), (up ?y0 ?y1) |
| Propositions | p28 = (empty x0 y0) (a predicate application) |
| Action symbols | (slide-up panel7 x0 y1) |
Action Model: Description of actions using symbols
When $Empty(x, y_{old}) \wedge \ldots$; then $\neg Empty(x, y_{old}) \wedge \ldots$
Cullen, J. and Bryman, A. "The knowledge acquisition bottleneck: time for reassessment?" Expert Systems, Vol. 5, No. 3 (August 1988), pp. 216–225
6.5 Can we solve an 8-puzzle optimally, w/o human encoding?
- 3x3 Image-based Sliding Tile Puzzle: maximum 31 steps optimal plan
- 4x4, 5x5: infeasible under blind search (memory exhaustion)
- Symbolic systems can compute optimal solutions w/ A* + admissible heuristics
BUT
WE
DO
NOT
HAVE
SYMBOLS
NOR
AN ACTION MODEL!
7 Background
Survey of Existing Action Model Acquisition Techniques
i.e. Systems that find action models
7.1 Limitations of Existing Systems
So far, ALL existing AMA systems require symbolic / near-symbolic, accurate state inputs and/or discrete action labels.
i.e. They need symbols to find an action model
7.2 ARMS (Action-Relation Modelling System) (Yang AIJ07)
Taking the symbolic inputs
7.3 LOCM (ICAPS09)
Taking the symbolic inputs
7.4 Framer (ICAPS17)
Near-Symbols : Parses natural language sentences with a clear grammatical structure.
- Alleviates the burden of domain experts, but still requires human
Not handling "Natural Language":
Pick up that parcel over there … yeah, it has a label on it, it says Parcel1, you can see it from here, the Location B. Then put it in the car, I mean the truck, the red one.
7.5 Konidaris, Kaelbling (AAAI14, IJCAI15)
"Constructing Symbolic Representations for High-Level Planning" (AAAI14)
- What it does
Converting a Semi-MDP Model to a PDDL Model by set-theoretic representation
i.e. Model-to-Model conversion, not generating a model from scratch
- Semi-MDP contains Action Labels: move and interact (the Playroom domain)
- Sensor inputs are structured (labels for "state variables" are known)
x/y-distance, light level, whether a monkey cries
→ Each sensor has a distinct meaning (no overlap)
7.6 Learning from Observation (Argall AIJ09, Mourao UAI12) skip
Noisy, incomplete, but symbolic states/actions
7.7 Learning from Video for Board Game (Bardu ICRA10; Kaiser AAAI12; Kirk 16) skip
- Handles Images, but with strong assumptions (almost symbol)
e.g. Understands Tic-Tac-Toe with 3x3 Ellipse Detector (Bardu 10)
Almost immediately provides propositions
Domain-dependent ("3x3 grid" is hard-coded)
8 Problem Description
We do not have SYMBOLS nor ACTION MODELS
So we cannot apply existing AMA methods!
8.1 Task: Solving Image-Based 8-puzzle w/o Prior Explicit Knowledge
No Prior Knowledge : labels/symbols such as "9 tiles", "moving"
8.2 Task: Solving Image-Based 8-puzzle w/o Prior Explicit Knowledge
No Prior Knowledge : labels/symbols such as "9 tiles", "moving"
8.3 Task: Solving ANY Image-Based Tasks w/o Prior Explicit Knowledge
No Prior Knowledge : Domain-independent Image-based planner
Tower of Hanoi
Lights-Out
8.4 Input1: Training Inputs – Image Pairs
Image pairs showing the before/after states of valid actions (randomly sampled)
8.5 Input1: Training Inputs – Image Pairs
8.6 Input2: Planning Inputs – Initial Image & Goal Image
Visual depiction of the initial state and a single goal state
8.7 Task: Solving ANY Image-Based Tasks w/o Prior Explicit Knowledge
8.8 Task: Solving ANY Image-Based Tasks w/o Prior Explicit Knowledge
9 Our Core Contribution : State AutoEncoder (SAE)
SAE is a neural network which provides two functions:
- $b = Encode(r)$: Function that maps a raw datum $r$ to a bit vector $b$ (propositions)
- $\tilde{r} = Decode(b)$: Function that maps a bit vector $b$ to a raw datum $\tilde{r}$
- A bidirectional mapping between subsymbolic & symbolic representation
9.1 AutoEncoder : Unsupervised Learning
Auto = "self" — Autoencoding = "encoding itself"
- Target Function: Identity $x=f^*(x)$
- Compression: $X \rightarrow Z \rightarrow X$
- Map the input $x\;$to a latent vector $z$, then back to $x$
- Extract the de/compressor: $Encode: x \mapsto z, Decode: z \mapsto x$
→ However, ✘ Latent vector is real-valued
INCOMPATIBLE to classical planning
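For reference, a bare-bones dense autoencoder in Keras (illustrative code with made-up layer sizes, not the paper's network). The latent $z$ it produces is real-valued, which is exactly the incompatibility noted above.

```python
from tensorflow import keras
from tensorflow.keras import layers

x = keras.Input(shape=(1764,))                    # flattened 42x42 image
z = layers.Dense(32, activation="relu")(x)        # real-valued latent vector
y = layers.Dense(1764, activation="sigmoid")(z)   # reconstruct the input
autoencoder = keras.Model(x, y)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# The training target is the input itself: autoencoder.fit(images, images, ...)
```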
9.2 Gumbel-Softmax VAE (Jang, Gu, ICLR2017)
A reparametrization trick for categorical distribution: 1-hot vector e.g. $\langle 0,0,0,1,0,0 \rangle$.
Below: an MNIST image represented with 30 variables of 8 categories each.
Key idea: These categorical variables are directly usable
as a source of PDDL/SAS
In particular, 2 categories → propositional variables (true/false)
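To make the trick concrete, here is a minimal NumPy sketch of Gumbel-Softmax sampling (my own illustrative code, not the paper's implementation; the temperature annealing and straight-through details of Jang & Gu are omitted).

```python
import numpy as np

def gumbel_softmax(logits, temperature=1.0, rng=np.random):
    """Sample approximately one-hot categorical vectors from unnormalized logits."""
    # Gumbel(0,1) noise via the inverse-CDF trick (epsilons avoid log(0))
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u + 1e-20) + 1e-20)
    y = (logits + g) / temperature
    y = np.exp(y - y.max(axis=-1, keepdims=True))   # numerically stable softmax
    return y / y.sum(axis=-1, keepdims=True)

# e.g. 30 categorical variables with 8 categories each, as in the MNIST example:
z = gumbel_softmax(np.random.randn(30, 8), temperature=0.5)
# As the temperature approaches 0, each row approaches a one-hot vector (a discrete symbol).
```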
9.3 State Autoencoder (before training)
9.4 State Autoencoder (after training)
10 Using the SAE, we propose LatPlan, an architecture,
and LatPlan α, an implementation
10.1 Step 1: State Autoencoder (Core Contribution)
A neural network bridging the Symbolic/Subsymbolic boundary
10.2 Step 2: Action Model Acquisition (NOT our contribution)
Bitvectors → PDDL domain description
10.2.1 Our PDDL generation in LatPlan α (NOT our contribution)
Placeholder for existing/future Action Model Acquisition methods
Individual actions are mapped to PDDL actions (trivial Action Model Acquisition)
0011 → 0101
↓
(:action ...
 :precondition (and (b0-false) (b1-false) (b2-true) (b3-true)) ; before-state
 :effect (and (not (b1-false)) (b1-true)                        ; state diff
              (not (b2-true)) (b2-false)))
Conversion from a bit to a proposition:
$i$-th bit is 1 → Proposition ($b_i$-true)
$i$-th bit is 0 → Proposition ($b_i$-false)
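For illustration, a small Python sketch of this trivial encoding (a hypothetical helper, not the authors' code): each observed bit-vector transition becomes one grounded PDDL action, mirroring the 0011 → 0101 example above.

```python
def transition_to_pddl_action(before, after, name):
    """Trivial AMA (sketch): one observed bit-vector transition -> one grounded action."""
    def prop(i, bit):
        return f"(b{i}-true)" if bit else f"(b{i}-false)"
    pre = " ".join(prop(i, b) for i, b in enumerate(before))
    eff = " ".join(f"(not {prop(i, b)}) {prop(i, a)}"
                   for i, (b, a) in enumerate(zip(before, after)) if b != a)
    return (f"(:action {name} :parameters ()\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {eff}))")

print(transition_to_pddl_action([0, 0, 1, 1], [0, 1, 0, 1], "a0011-0101"))
```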
Disclaimer: We have a new NN-based AMA method (not yet disclosed)
10.2.2 Example PDDL
(define (domain latent)
 (:requirements :strips)
 (:predicates (b0-true) (b0-false) (b1-true) ... (b24-false))
 (:action a10000010010110111100011111000010001011111110011111
  :parameters ()
  :precondition (and (b0-true) (b1-false) (b2-false) ... (b24-true))
  :effect (and (not (b5-false)) (b5-true)
               (not (b6-true)) (b6-false)
               (not (b13-false)) (b13-true)
               (not (b20-false)) (b20-true)))
 (:action a10000010010110111100011110000001001011011110001110 ...
10.2.3 Step 2: Action Model Acquisition (NOT our contribution)
Bitvectors → PDDL domain description
10.3 Step 3: Solve the PDDL instance w/ off-the-shelf planner
10.4 Step 4: Executing the symbolic plan
We get a bit-vector state sequence that is incomprehensible to humans
10.5 Step 5: Mapping the plan to images (human-comprehensible)
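Putting Steps 1–5 together, a high-level sketch of the LatPlan α pipeline (the callables `sae`, `transitions_to_pddl`, and `run_planner` are hypothetical stand-ins for the components described above, passed in as arguments).

```python
def latplan_alpha(sae, transitions_to_pddl, run_planner,
                  image_pairs, init_image, goal_image):
    """End-to-end sketch; the three callables stand in for Steps 1-3."""
    # Step 2: encode observed image pairs and emit a PDDL domain (trivial AMA)
    transitions = [(sae.encode(pre), sae.encode(suc)) for pre, suc in image_pairs]
    domain = transitions_to_pddl(transitions)
    # Step 3: encode the planning inputs and invoke an off-the-shelf classical planner
    init, goal = sae.encode(init_image), sae.encode(goal_image)
    bit_plan = run_planner(domain, init, goal)
    # Steps 4-5: decode the returned bit-vector states back into images
    return [sae.decode(state) for state in bit_plan]
```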
11 Results (MNIST 8-puzzle)
8-puzzle using digits from MNIST database
An instance whose optimal solution length is known
→ 31 step optimal plan
11.1 Results with photographic, unseparated tiles (Mandrill 8-puzzle)
The MNIST 8-puzzle has cleanly separated objects → this domain does not.
11.2 Results with photographic, unseparated tiles (Mandrill 8-puzzle)
→ Optimal Solution
11.3 Tower of Hanoi (3 disks, 4 disks)
A completely different puzzle can be solved with no change
→ Optimal Solution (7 steps,15 steps)
11.4 Lights Out
A completely different puzzle can be solved with no change
→ Optimal Solution
11.5 Twisted Lights Out
Does not assume grid-like structures
→ Optimal Solution
11.6 Handling the Noisy Input
→ Optimal Solutions
12 Why bother with an off-the-shelf planner? Wouldn't blind search do?
Domain-independent heuristics work, SURPRISINGLY!
This is NOT a trivial finding!
Heuristics are…
- tailored for man-made domains
- assuming a certain structure
PDBs may even sometimes lose to blind search on IPC instances (Edelkamp 12)
- Would allow solving more difficult problems (future work)
| Domain (nodes expanded) | Dijkstra | A*+PDB | Speedup |
|---|---|---|---|
| MNIST 8-puzzle | 193924 | 109096 | x2 |
| MNIST 8-puzzle | 201156 | 111642 | x2 |
| MNIST 8-puzzle | 186767 | 84561 | x2 |
| MNIST 8-puzzle | 183336 | 82518 | x2 |
| MNIST 8-puzzle | 169907 | 52084 | x3 |
| MNIST 8-puzzle | 130863 | 26967 | x5 |
| Hanoi (4 peg) | 55 | 17 | x3 |
| LightsOut (4x4) | 952 | 27 | x30 |
| Spiral LightsOut (3x3) | 522 | 214 | x2.5 |
| Mandrill 8-puzzle | 335378 | 88851 | x4 |
→ Leverage the effort from the ICAPS community!
13 Conclusion
- Input: Unlabelled pairs of images, initial image, goal image
- Output: Visualized plans to achieve the goal
- State AutoEncoder(SAE): VAE with Gumbel-Softmax
Contribution : SAE generates propositions from raw data
→ compatible with symbolic KE systems and classical planners
LatPlan α : A simplified prototype
→ Can leverage the developments of both symbolic and subsymbolic research
13.1 Future Work (input format)
LatPlan is an architecture : Any system with SAE is a LatPlan implementation
A different SAE allows LatPlan to reason about different types of raw data
- Autoencoders for unstructured text [Li et al. 2015], audio [Deng, Li, et al. 2010]
- Examples:
This is an apple, this is a pen → oh, ApplePen! (ugh, embarrassing)
when actions resembling "word concatenation" were learned
SAE + Automated Planning will bring AI to a whole new level
e.g. 1000 steps of optimal reasoning over natural language
Hire me! I'm looking for a postdoc position ☺
"A hierarchical neural autoencoder for paragraphs and documents." (2015)
"Binary coding of speech spectrograms using a deep auto-encoder." (2010)
14 Appendix
14.1 I expect mixed responses such as:
- Wait, what!? You solved the symbol grounding!? (No)
- Huh, I hate deep learning hype; NNs cannot be trusted. (LatPlan can be trusted)
- This is not action model acquisition / not finding symbols. (We don't do AMA; we find different symbols)
14.2 Did we find symbols? It doesn't sound like what I think symbols are
You solved the symbol grounding!? / This is not finding symbols. / This isn't action model acquisition.
PDDL involves several kinds of symbols, and we addressed only one of them.
Each requires a different approach; LatPlan should be combined with this existing work.
| Type of symbol | Addressed by |
|---|---|
| Propositions | SAE |
| Objects | R-CNN (Computer Vision) |
| Predicates | ??? |
| Actions | ??? |
14.3 Did we find symbols? Why not individual pixels? Why DL?
Systems based on individual pixels lack generalization
Noise / variations can make the data entirely different
- must acquire generalized features
- = a nonlinear function that recognizes the entanglement between multiple pixels
14.4 Learning vs Planning
Main differences: Purposes and the abstraction layer
Machine Learning, Neural Networks
for Recognition, Reflex
Subsymbolic Input (continuous)
Images, Audio, unstructured text:
Soft Intelligence:
Reflex Agent, Immediate actions
Pavlov's dog : bell → drool
Autonomous Driving : Pedestrian → Stop.
Machine Translation : Sentence → Sentence
Eval. Function for Go : board → win-rate
☺ Efficient 1-to-1 mapping
☹ Simple tasks
Deliberation, Search
for Planning, Game, Theorem Proving
Symbolic Input/Output
Logic, objects, induction rules
Hard Intelligence by Logic:
Multi-step strategies
Rescue Robot : actions → help the survivor
Theorem Proving : theorems → QED
Compiler : source code → x86 instructions
Game of Go : stones → Win
☺ Ordering constraint + complex tasks
- AlphaGo = Subsymbolic (DLNN eval. function) + Symbolic (MCTS)
14.5 Human-Competitive Systems
AlphaGo = Subsymbolic (NN eval. func) + Symbolic (MCTS)
- However, domain-specific – specialized in Go, "Grids" / "Stones" are known
- Huge expert trace DB — Not applicable when data are scarce (e.g. space exploration)
Is supervised learning necessary for humans?
True intelligence should search / collect data by itself
DQN = Subsymbolic (DLNN) + Reinforcement Learning (DLNN)
Domain-independent Atari game solver (Space Invaders, Pac-Man…), however:
- RL Acting: Greedily follow the learned policy → no deliberation!
- You can survive most Atari games by reflex
14.6 Latplan Advantages
Perception based on DLNN
— Robust systems augmented by the latest DL tech
Decision Making based on Classical Planning
— Better Theoretical Guarantee than Reinforcement Learning
Completeness (Finds solution whenever possible), Solution Optimality
— Decision Making Independent from Learning
Unsupervised (No data required), Explainable (Search by logic)
14.7 When does LatPlan return a wrong solution?
Machine learning may contain errors (convergence only as $t\rightarrow\infty$, not in finite time)
- Images → faulty symbols/model/graph
Optimal path on a faulty graph, or graph disconnected
A* completeness, soundness, optimality (admissible heuristics)
- Faulty visualized plan (noisy) / no plan found
LatPlan may make wrong observations, but no wrong decisions
BTW, "correctness" is defined by error-prone human observations anyway …
(completeness, optimality) → better reliability than Reinforcement Learning
14.7.1 Reinforcement Learning
Not only perception but decision making also depends on training
- Each training result does not have admissibility
- When the learned policy is wrong, the solution could be suboptimal
… AlphaGo was unprepared for Lee Sedol’s Move 78 because it didn’t think that a human would ever play it.
Cade Metz. "In Two Moves, AlphaGo and Lee Sedol Redefined the Future." Wired, 2016
RL may make wrong decisions.
14.8 Future Work (SAE)
SAE can generate propositional symbols (state $s = \{q,r\ldots\}$)
- 1st-order logic (predicate $p(a,b)$)
- We need object recognition from images (parameters $a,b$)
- SAE with networks for object recognition (e.g. R-CNN) should achieve this
14.9 SAE implementation in LatPlan α
Keras, Adam optimizer (learning rate: 0.001)
Input: 1764 (42x42)
[→ FC(4000, ReLU) → BatchNorm → Dropout(0.4)] × 2
→ FC(49, GumbelSoftmax) (variational loss)
[→ FC(4000, ReLU) → BatchNorm → Dropout(0.4)] × 2
→ Output: 1764 (42x42) (loss: binary cross-entropy)
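A minimal Keras sketch of this architecture (an assumption-laden reconstruction from the spec above, not the released code; the Gumbel-Softmax layer is abbreviated to a plain softmax placeholder, and the annealed sampling and variational loss from Jang & Gu would replace it in a real implementation).

```python
from tensorflow import keras
from tensorflow.keras import layers

def dense_block(x, units=4000, dropout=0.4):
    # FC(4000, ReLU) -> BatchNorm -> Dropout(0.4), as in the spec above
    x = layers.Dense(units, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    return layers.Dropout(dropout)(x)

inp = keras.Input(shape=(1764,))                  # flattened 42x42 image
h = dense_block(dense_block(inp))                 # encoder: two dense blocks
logits = layers.Dense(49 * 2)(h)                  # 49 binary (2-category) latent variables
logits = layers.Reshape((49, 2))(logits)
z = layers.Softmax()(logits)                      # placeholder for Gumbel-Softmax sampling
h = layers.Flatten()(z)
h = dense_block(dense_block(h))                   # decoder: two dense blocks
out = layers.Dense(1764, activation="sigmoid")(h)

sae = keras.Model(inp, out)
sae.compile(optimizer=keras.optimizers.Adam(1e-3),
            loss="binary_crossentropy")
# sae.fit(train_images, train_images, ...)        # trained on ~12000 8-puzzle images
```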
- Why fully-connected layers?
Main focus is on whether the propositions made by the SAE are feasible
→ No complications from CNNs, ResNets, etc. (but I have a working implementation)
- Training on 8-puzzle
- 12000 images out of all 362880 states → Generalization
14.10 Domain Acquisition implementation in LatPlan α (NOT our core contribution)
Entire transitions $R$ → $Encode(R)$→ PDDL
Why? → Trivial domain acquisition, no generalization
Ground actions: 0011 → 0101 (state variables are fully specified)
Generalized actions: *01* → *10* (state variables are partially specified)
- LatPlan α: No generalization w.r.t. domain acquisition
Main focus is on the SAE: whether the propositions made by the SAE are feasible
→ No complication due to domain acquisition
Developing a generalizer is an entirely different topic; we leave it to existing work
- We made flour and demonstrated a sugarless cookie with it.
Others study how to make the most beautiful piece of chocolate cake from given flour
— which is future work. (flour = symbol)
Domain acquisition typically requires existing models (such as Semi-MDP)
We made inputs for existing work.
[Konidaris et.al. 14; Cresswell et al 13]
14.11 Why symbols?
Symbols are strong abstraction mechanisms because
- Meanings do not matter
You do not have to understand it: does a symbol $X$ mean an apple or a car?
Logical reasoning can be performed by mechanical application of rules
Domain-independent planning : mystery vs nomystery
Logistics domains where symbol names are mangled (truck → shark)
- Composable
A latent vector is a conjunction (and)
Heuristic functions use modus ponens to derive guidance
14.12 More details
GTX1070, PhenomII X6 (3.4GHz OC), 16GB Mem
- Training: ~30 min
- Solving: ~3 sec
14.12.1 State AutoEncoder (Train data)
Rows: 1: $x$, 2: $z$, 3: $y$, 4: $round(z)$, 5: $Decode(round(z))$
14.12.2 State AutoEncoder (Validation)
Rows: 1: $x$, 2: $z$, 3: $y$, 4: $round(z)$, 5: $Decode(round(z))$
14.12.3 State AutoEncoder
Input 2: initial/goal images
14.12.4 PDDL Domain Definition
Example with $N=25$ (in the paper we bypassed the PDDL-SAS converter, though)
(define (domain latent)
 (:requirements :strips :negative-preconditions)
 (:predicates (z0) (z1) (z2) (z3) (z4) (z5) (z6) (z7) (z8) (z9) (z10) (z11) (z12)
              (z13) (z14) (z15) (z16) (z17) (z18) (z19) (z20) (z21) (z22) (z23) (z24))
 (:action a10000010010110111100011111000010001011111110011111
  :parameters ()
  :precondition (and (z0) (not (z1)) (not (z2)) (not (z3)) (not (z4)) (not (z5)) (z6)
                     (not (z7)) (not (z8)) (z9) (not (z10)) (z11) (z12) (not (z13))
                     (z14) (z15) (z16) (z17) (not (z18)) (not (z19)) (not (z20))
                     (z21) (z22) (z23) (z24))
  :effect (and (z5) (not (z6)) (z13) (z20)))
 (:action a10000010010110111100011110000001001011011110001110 ...
14.12.5 Results in one place
14.13 Does it have something to do with symbolic planning (BDDs) ?
No.