Classical Planning in Deep Latent Space: From Unlabelled Images to PDDL (and back)

Masataro Asai, Alex Fukunaga, The University of Tokyo

20+ min

Made by guicho2.71828 (Masataro Asai)

1

Hello, Machine Learning people!

First of all, hello machine learning people. Im not a machine learning person. Im usually studying a type of symbolic AI and I visited here expecting an interesting conversation with machine learning professionals.

2

Hello, Logic Programming, Data Mining, AGI, Theoretical CS people, too!

And also, hello other symbolic AI people. Although I am not doing exactly the same thing, we have common roots.

Let me give you one question before presenting my paper…

3 How many people have learned AI with these books?

aima1.jpg

aima2.jpg

aima3.jpg

  • Nice.

4

I am from Planning community

icapslogo.gif

International Conference on Automated Planning and Scheduling

(AAAI sister conference, 33% avg. accept ratio, since 1990, ECP+AIPS→ICAPS)

5 Planning people are working on Automated Planners

  • Compute the high-level plan for an agent: plan = (action1, action2, … actionn)
    • Search-based planning projects many steps further into the future than RL
    • ↔ reflex (0-step), limited deliberation (1,2-steps)
  • High-dimensional, implicit state space search
    • Entire space does not fit in memory, only transition rules are given
  • STRIPS-like PDDL modelling language, based on
    • 1st-order logic, close-world assumption
  • Soundness, completeness, optimality matters
    • Expensive Applications: Satellite operations (Deep Space 1), Mars Exploration Rover (MER), underwater vehicles

6 High-level Planning: Sliding tile puzzle (a.k.a 8-puzzle)

  • Slide a tile to an empty place (each step)
  • → make the tile placement identical to the goal
  • → (abstracts the low level control e.g. arm motion)
  • Search states represented by conjunctions of 1st-order logic (objects, predicates, propositions)

    (empty x0 y0)     ; or, empty(x0, y0)
    (is panel6 x1 y0) ; or, is(panel6, x1, y0)
    (up    y0 y1)... (down  y1 y0)...
    (right x0 x1)... (left  x0 x1)...
    

initial state 8puzzle-standard.png

goal state 8puzzle-standard-goal.png

  • To symbolic systems names do not matter (just matters to human) as long as there is consistency

    (car   banana0 kiwi0)           ; numbers (0,1) don't matter either
    (shark bogo6 banana1 kiwi0)     ; just to make sense for us
    (apple  kiwi0 kiwi1)... (pinapple kiwi1 kiwi0)...
    (pen    banana0 banana1)... (applepen banana0 banana1)...
    

6.1 High-level Planning Task: State Transitions

How to represent "sliding up tile 7" ? → actions

Transition Rules of "sliding up":

  • You can slide up a tile only to an empty location.
  • When you slide it, the orignal location becomes empty.
  • New location is no longer empty, is occupied by the tile.

initial state 8puzzle-standard-tile7.png

  • When Empty(x, yold) ∧ is(panel, x, ynew) ∧ up(ynew, yold) ;

    then ¬ Empty(x,yold) ∧ Empty(x,ynew) ∧ ¬ is(panel, x, ynew) ∧ is(panel, x, yold)

  • ;; Translates to a PDDL model below:
    (:action slide-up ...
     :precondition (and (empty ?x ?y-old)
                        (is ?panel ?x ?y-new) ...)
     :effects (and (not (empty ?x ?y-old))     (empty ?x ?y-new)
                   (not (is ?panel ?x ?y-new)) (is ?panel ?x ?y-old)))
    
  • But where does this representation come from?

6.2 Solving these problems: Heuristic state-space search

graph.png

6.3 High-level Planning Task: State Transitions

How to represent "sliding up tile 7" ? → actions

Transition Rules of "sliding up":

  • You can slide up a tile only to an empty location.
  • When you slide it, the orignal location becomes empty.
  • New location is no longer empty, is occupied by the tile.

initial state 8puzzle-standard-tile7.png

  • When Empty(x, yold) ∧ is(panel, x, ynew) ∧ up(ynew, yold) ;

    then ¬ Empty(x,yold) ∧ Empty(x,ynew) ∧ ¬ is(panel, x, ynew) ∧ is(panel, x, yold)

  • ;; Translates to a PDDL model below:
    (:action slide-up ...
     :precondition (and (empty ?x ?y-old)
                        (is ?panel ?x ?y-new) ...)
     :effects (and (not (empty ?x ?y-old))     (empty ?x ?y-new)
                   (not (is ?panel ?x ?y-new)) (is ?panel ?x ?y-old)))
    
  • But where does this representation come from?

6.4 PDDL representations come from human encoding.

Knowledge-Acquisition Bottleneck (Cullen, 1988):

The cost of human involved for converting real-world problems into inputs for domain-independent symbolic systems

  • Symbols: Labels for identifiable entities (it's not just about objects!)

    Types of symbols  
    Object symbols panel7, x0, y0
    Predicate symbols (empty ?x ?y) (up ?y0 ?y1)
    Propositions p28 = (empty x0 y0) — application
    Action symbols (slide-up panel7 x0 y1)
  • Action Model: Description of Actions using Symbols

     

    When Empty(x, yold) ∧ … ; Then /¬ Empty(x,yold) ∧ / …

The knowledge acquisition bottleneck: time for reassessment? : Cullen, J and Bryman, A Expert Syst. Vol 5 No 3 (August 1988) pp 216-225

6.5 Can we solve an 8-puzzle optimally, w/o human encoding?

  • 3x3 Image-based Sliding Tile Puzzle: maximum 31 steps optimal plan
  • 4x4、5x5 : infeasible under blind search (memory exhaust)
  • Symbolic systems can compute optimal solutions w/ A* + admissible heuristics

puzzle.jpg

  • BUT

    WE

    DO

    NOT

    HAVE

    SYMBOLS

    NOR

    AN ACTION MODEL!

7 Backgrounds

Survey of Exisiting Action Model Acquisition Techniques

 

i.e. Systems that find action models

7.1 Limitations of Existing Systems

So far, ALL existing AMA systems require symbolic / near-symbolic, accurate state inputs and/or discrete action labels.

i.e. They need symbols to find an action model

7.2 ARMS (Action-Relation Modelling System) (Yang AIJ07)

Taking the symbolic inputs

arms.png

7.3 LOCM (ICAPS09)

Taking the symbolic inputs

locm.png

7.4 Framer (ICAPS17)

Near-Symbols : Parses natural language sentences with a clear grammatical structure.

framer.png

  • Alleviates the burden of domain experts, but still requires human
  • Not handling "Natural Language":

    Pick up that parcel over there … yeah, it has a label on it, it says Parcel1, you can see it from here, the Location B. Then put it in the car, I mean the truck, the red one.

7.5 Konidaris, Kaelbring (AAAI14, IJCAI15)

"Constructing Symbolic Representations for High-Level Planning" (AAAI14)

What it does

Converting a Semi-MDP Model to a PDDL Model by set-theoretic representation

i.e. Model-to-Model conversion, not generating a model from the scratch

Semi-MDP contains Action Labels
move and interact (Playroom)
Sensor inputs are structured (Labels for "State Variable" are known)

x/y-distance, light level, whether a monkey cries

→ Each sensor has a distinct meaning (no overwrap)

7.6 Learning from Observation (Argall AIJ09, Mourao UAI12)   skip

Noisy, incomplete, but symbolic states/actions

mourao.png

mourao2.png

7.7 Learning from Video for Board Game (Bardu ICRA10; Kaiser AAAI12; Kirk 16)   skip

Handles Images, but with strong assumptions (almost symbol)

e.g. Understands Tic-Tac-Toe with 3x3 Ellipse Detector (Bardu 10)

Almost immediately provides propositions

Domain-dependent ("3x3 grid" is hard-coded)

8 Problem Description

We do not have SYMBOLS nor ACTION MODELS

So we cannot apply existing AMA methods!

puzzle.jpg

8.1 Task: Solving Imaged-Based 8-puzzle w/o Prior Explicit Knowledge

No Prior Knowledge : labels/symbols such as "9 tiles", "moving"

puzzle.jpg

8.2 Task: Solving Imaged-Based 8-puzzle w/o Prior Explicit Knowledge

No Prior Knowledge : labels/symbols such as "9 tiles", "moving"

puzzle.jpg

8.3 Task: Solving ANY Imaged-Based tasks w/o Prior Explicit Knowledge

No Prior Knowledge : Domain-independent Image-based planner

Tower of Hanoi

hanoi.jpg

Lights-Out

lightsout.jpg

8.4 Input1: Training Inputs – Image Pairs

Image pairs showing the before/after states of valid actions (randomly sampled)

1.png

8.5 Input1: Training Inputs – Image Pairs

2.png

8.6 Input2: Planning Inputs – Initial Image & Goal Image

Visual depiction of the initial state and a single goal state

input2.png

8.7 Task: Solving ANY Imaged-Based tasks w/o Prior Explicit Knowledge

 

 

8.8 Task: Solving ANY Imaged-Based tasks w/o Prior Explicit Knowledge

9 Our Core Contribution : State AutoEncoder (SAE)

SAE is a neural network which provides two functions:

  • $b = Encode(r)$: Function that maps a raw datum $r\;$to a bit vector $b\;$(propositions)
  • $\tilde{r} = Decode(b)$: Function that maps a bit vector $b\;$to a raw datum $\tilde{r}$
  • A bidirectional mapping between subsymbolic & symbolic representation

9.1 AutoEncoder : Unsupervised Learning

Auto = "self" — Autoencoding = "encoding itself"

  • Target Function: Identity $x=f^*(x)$
  • Compression: $X \rightarrow Z \rightarrow X$
  • Map the input $x\;$to a latent vector $z$, then back to $x$
  • Extract the de/compressor: $Encode: x \mapsto z, Decode: z \mapsto x$

autoenc.png

  • → However, ✘ Latent vector is real-valued

    INCOMPATIBLE to classical planning

9.2 Gumbel-Softmax VAE (Jang, Gu, ICLR2017)

A reparametrization trick for categorical distribution: 1-hot vector e.g. $\langle 0,0,0,1,0,0 \rangle$.

Below: Represent an MNIST image with 30 variables of 8 categories.

256
ReLU
[Not supported by viewer]
512
ReLU
[Not supported by viewer]
512
ReLU
[Not supported by viewer]
256
ReLU
[Not supported by viewer]

Key idea: These categorical variables are directly usable

as a source of PDDL/SAS

In particular, 2 categories → propositional variables (true/false)

9.3 State Autoencoder (before training)

state-ae-before.png

9.4 State Autoencoder (after training)

state-ae.png

10 Using SAE, we propose LatPlan, an architechture

and LatPlanα, an implementation

planning1.png

10.1 Step 1: State Autoencoder (Core Contribution)

A neural network bridging the Symbolic/Subsymbolic boundary

planning2.png

10.2 Step 2: Action Model Acquisition (NOT our contribution)

Bitvectors → PDDL domain description

planning3.png

10.2.1 Our PDDL generation in LatPlan α (NOT our contribution)

Placeholder for existing/future Action Model Acquisition methods

Individual actions are mapped to PDDL actions (trivial Actiom Model Acquisition)

 0011 → 0101

   ↓

(:action ...
 :precondition (and (b0-false) (b1-false) (b2-true) (b3-true)) ; before-state
 :effect       (and (not (b1-false)) (b1-true)    ; state diff
                    (not (b2-true))  (b2-false)))

Conversion from a bit to a proposition:

$i$-th bit is 1 → Proposition ($b_i$-true)

$i$-th bit is 0 → Proposition ($b_i$-false)

Disclaimer: We have new AMA method based on NN (we do not disclose it yet)

10.2.2 Example PDDL

(define (domain latent)
 (:requirements :strips)
 (:predicates (b0-true) (b0-false) (b1-true) ... (b24-false))

 (:action a10000010010110111100011111000010001011111110011111
  :parameters () :precondition
  (and (b0-true) (b1-false) (b2-false) ... (b24-true))
  :effect (and (not (b5-false))  (b5-true)
               (not (b6-true))   (b6-false)
               (not (b13-false)) (b13-true)
               (not (b20-false)) (b20-true)))

 (:action a10000010010110111100011110000001001011011110001110
  ...

10.2.3 Step 2: Action Model Acquisition (NOT our contribution)

Bitvectors → PDDL domain description

planning3.png

10.3 Step 3: Solve the PDDL instance w/ off-the-shelf planner

planning4.png

10.4 Step 4: Executing the symbolic plan

planning5.png

We get an incomprehensive state sequence

10.5 Step 5: Mapping the plan to images (human-comprehensive)

planning6.png

11 Results (MNIST 8-puzzle)

8-puzzle using digits from MNIST database

mnist-plan.png

An instance whose the optimal solution length is known

31 step optimal plan

11.1 Results with photographic, unseparated tiles (Mandrill 8-puzzle)

MNIST 8-puzzle has cleanly separated objects -> This domain does not.

mandrill-intro.png

11.2 Results with photographic, unseparated tiles (Mandrill 8-puzzle)

mandrill-plan.png

Optimal Solution

11.3 Tower of Hanoi (3 disks, 4 disks)

Completely different puzzle problem can be solved with no change

hanoi3.png

hanoi4.png

Optimal Solution (7 steps,15 steps)

11.4 Lights Out

Completely different puzzle problem can be solved with no change

lights-out.png

Optimal Solution

11.5 Twisted Lights Out

Does not assume grid-like structures

lights-out-skewed.png

Optimal Solution

11.6 Handling the Noisy Input

noise.png

Optimal Solutions

12 Why bother the off-the-shelf planner? Shouldn't the blind search do?

Domain-independent heuristics works, SURPRISINGLY!

  • This is NOT a trivial finding!

    Heuristics are…

    • taylored for man-made domains
    • assuming a certain structure

    PDB may even sometimes lose to Blind on IPC instances (Edelkamp 12)

  • Would allow to solve the more difficult problems (future work)
Expanded nodes Dijkstra A*+PDB Speedup
MNIST 8-puzzle 193924 109096 x2
MNIST 8-puzzle 201156 111642 x2
MNIST 8-puzzle 186767 84561 x2
MNIST 8-puzzle 183336 82518 x2
MNIST 8-puzzle 169907 52084 x3
MNIST 8-puzzle 130863 26967 x5
Hanoi (4 peg) 55 17 x3
LightsOut (4x4) 952 27 x30
Spiral LightsOut (3x3) 522 214 x2.5
Mandrill 8-puzzle 335378 88851 x4

Leverage the effort from ICAPS community!

13 Conclusion

  • Input: Unlabelled pairs of images, initial image, goal image
  • Output: Visualized plans to achieve the goal
  • State AutoEncoder(SAE): VAE with Gumbel-Softmax
  • Contribution : SAE generates propositions from raw data

    compatible to the symbolic KE systems, classical planners

  • Latplan α : A simplified prototype

    → Can leverege from the development of both symbolic/subsymbolic research

13.1 Future Work (input format)

LatPlan is an architecture : Any system with SAE is a LatPlan implementation

Different SAE allows LatPlan to reason about different types of raw data

  • Autoencoders for unstructured text [Li et.al. 2015], audio [Deng, Li, et al. 2010]
  • Examples:
    • This is an apple, this is a pen → oh, ApplePen! (ugh embarassing)

      when actions resembling "word concatenation" was learned

  • SAE + Automated Planning will bring AI to a whole new level

    e.g. 1000 steps of optimal reasoning over natural language

  • Hire me! I'm finding a Postdoc position ☺

"A hierarchical neural autoencoder for paragraphs and documents." (2015)

"Binary coding of speech spectrograms using a deep auto-encoder." (2010)

14 Appendix

14.1 I expect mixed responses such as:

  • Wait, what!? You solved the symbol grounding!? (No)
  • Huh, I hate deep learning hype, NN cannot be trusted.

    (LatPlan can be trusted)

  • This is not an action model acquisition / not finding symbols.

    (We don't do AML; we find different symbols)

14.2 Did we find symbols? It doesn't sound like what I think symbols are

You solved the symbol grounding!? / This is not finding symbols. / This isn't a action model acquisition.

PDDL implies there are several kinds of symbols and we solved only one issue

  • Each issue requires a different approach. LatPlan should be combined with these work.

    Types of symbols addressed by
    Propositions SAE
    Objects R-CNN (Computer Vision)
    Predicates ???
    Actions ???

14.3 Did we find symbols? Why not individual pixels? Why DL?

Systems based on individual pixels lack generalization

  • Noise / variations can make the data entirely different

    noise.png

  • must acquire the generalized features
  • = a nonlinear function that recognize the entanglements between multiple pixels

14.4 Learning vs Planning

Main differences: Purposes and the abstraction layer

Machine Learning, Neural Networks

for Recognition, Reflex

  • Subsymbolic Input (continuous)

    Images, Audio, unstructured text:

  • Soft Intelligence:

      Reflex Agent, Immediate actions

    Pavlov's dog : bell → drool

    Autonomous Driving : Pedestrian → Stop.

    Machine Translation : Sentence → Sentence

    Eval. Function for Go : board → win-rate

    ☺ Efficient 1-to-1 mapping

    ☹ Simple tasks

Deliberation, Search

for Planning, Game, Theorem Proving

  • Symbolic Input/Output

    Logic, objects, induction rules

  • Hard Intelligence by Logic:

      Multi-step strategies

    Rescue Robot : actions → help the surviver

    Theorem Proving : theorems → QED

    Compiler : x86 instructions

    Game of Go : stones → Win

    ☺ Ordering constraint + complex tasks

  • AlphaGo = Subsymbolic (DLNN eval. function) + Symbolic (MCTS)

14.5 Human-Competitive Systems

AlphaGo = Subsymbolic (NN eval. func) + Symbolic (MCTS)

  • However, domain-specific – specialized in Go, "Grids" / "Stones" are known
  • Huge expert trace DB — Not applicable when data are scarse (e.g. space exploration)
  • Is supervised learning necessary for human?

    True intelligence should search / collect data by itself

DQN = Subsymbolic (DLNN) + Reinforcement Learning (DLNN)

Domain-independent Atari Game solver (Invader, Packman…), however:

  • RL Acting: Greedily follow the learned policy → no deliberation!
  • You can survive most Atari games by reflex

14.6 Latplan Advantages

Perception based on DLNN

— Robust systems augmented by the latest DL tech

Decision Making based on Classical Planning

Better Theoretical Guarantee than Reinforcement Learning

Completeness (Finds solution whenever possible), Solution Optimality

Decision Making Independent from Learning

Unsupervised (No data required), Explainable (Search by logic)

14.7 When Latplan returns a wrong solution?

Machine learning may contain errors (convergence only on $t\rightarrow \infty$, not on real time)

  • Images → Fraud symbols/model/graph
  • Optimal path on a fraud graph or graph disconnected

    A* completeness, soundness, optimality (admissible heuristics)

  • Fraud visualized plan (noisy) / no plan found

LatPlan may make wrong observations but no wrong decisions

BTW, "correctness" is defined by error prone observations by humans anyways …

(completeness, optimality) → better reliablility than Reinforcement Learning

14.7.1 Reinforcement Learning

Not only perception but decision making also depends on training

  • Each training result does not have admissibility
  • When the learned policy is wrong, the solution could be suboptimal

… AlphaGo was unprepared for Lee Sedol’s Move 78 because it didn’t think that a human would ever play it.

Cade Metz. "In Two Moves, that Redifined the Future." Wired, 2016

RL may make wrong decisions.

14.8 Future Work (SAE)

SAE can generate propositional symbols (state $s = \{q,r\ldots\}$)

  • 1st-order logic (predicate $p(a,b)$)
  • We need object recognition from images (parameters $a,b$)
  • SAE with networks for object recognition (e.g. R-CNN) should achieve this

14.9 SAE implementation in LatPlan α

Keras, Adam optimizer (learning rate:0.001)

1764(42x42)

[→FC(4000,ReLu)→Batchnorm→Dropout(0.4)] × 2

→FC(49,GumbelSoftmax) (variational loss)

[→FC(4000,ReLu)→Batchnorm→Dropout(0.4)] × 2

→1764(42x42) (loss: Binary crossentropy)

 

Why full-connected layers ?

Main focus is on whether propositions made by SAE are feasible

→No complication due to CNN, ResNet, etc… (but I have a working implementation)

Training on 8-puzzle
12000 images out of entire states (362880) → Generalization

14.10 Domain Acquisition implementation in LatPlan α (NOT our core contribution)

Entire transitions $R$ → $Encode(R)$→ PDDL

Why? → Trivial domain acquisition, no generalization

Ground      actions 0011 → 0101  (state variables are fully specified)
Generalized actions *01* → *10*  (state variables are partially specified)
LatPlanα: No generalization wrto domain acquisition

Main focus is on SAE: whether propositions made by SAE are feasible

→No complication due to domain acquisition

Developping a generalizer is an entirely different topic, we leave it to existing work

We made flour and demonstrated a sugarless cookie with it.

Others study how to make the most beautiful piece of chocolate cake from given flour

— which is future work. (flour = symbol)

Domain acquisition typically requires exisiting models (such as Semi-MDP)

We made inputs for existing work.

[Konidaris et.al. 14; Cresswell et al 13]

14.11 Why symbols?

Symbols are strong abstraction mechanisms becasue

Meanings do not matter

You do not have to understand it: Does a symbol $X$mean an apple or a car?

Logical reasoning can be performed by mechanical application of rules

  • Domain-independent planning : mystery vs nomystery

    Logistic domains where symbol names are mangled (truck → shark)

Composable

A latent vector is a conjunction (and)

Heuristic functions use modus ponens to derive guidance

14.12 More details

GTX1070, PhenomII X6 (3.4GHz OC), 16GB Mem

  • Training: ~30 min
  • Solving: ~3 sec

14.12.1 State AutoEncoder (Train data)

1: $x$2: $z$3: $y$4: $round(z)$5: $Decode(round(z))$

autoencoding_train.png

14.12.2 State AutoEncoder (Validation)

1: $x$2: $z$3: $y$4: $round(z)$5: $Decode(round(z))$

autoencoding_test.png

14.12.3 State AutoEncoder

Input 2: intial/goal images

init_goal.png

14.12.4 PDDL Domain Definition

Examples in $N=25$(in the paper we bypassed PDDL-SAS converter though)

(define (domain latent)
 (:requirements :strips :negative-preconditions)
 (:predicates (z0) (z1) (z2) (z3) (z4) (z5) (z6) (z7) (z8) (z9) (z10)
  (z11) (z12) (z13) (z14) (z15) (z16) (z17) (z18) (z19) (z20) (z21)
  (z22) (z23) (z24))
 (:action a10000010010110111100011111000010001011111110011111
  :parameters () :precondition
  (and (z0) (not (z1)) (not (z2)) (not (z3)) (not (z4)) (not (z5))
       (z6) (not (z7)) (not (z8)) (z9) (not (z10)) (z11) (z12)
       (not (z13)) (z14) (z15) (z16) (z17) (not (z18)) (not (z19))
       (not (z20)) (z21) (z22) (z23) (z24))
  :effect (and (z5) (not (z6)) (z13) (z20)))
 (:action a10000010010110111100011110000001001011011110001110
  ...

14.12.5 Results in one place

mnist-plan.png

mandrill-plan.png

hanoi3.png

hanoi4.png

lights-out.png

lights-out-skewed.png

14.13 Does it have something to do with symbolic planning (BDDs) ?

No.

Author: Masataro Asai

Created: 2017-07-18 火 09:33

Validate