Classical Planning in

Deep Latent Space:

Bridging the Subsymbolic-Symbolic Boundary

Masataro Asai

The University of Tokyo

Note to Basel people: this presentation is aimed at a general audience who may not be familiar with, or may never have heard of, AI planning.

Made by guicho2.71828 (Masataro Asai)

1 Introduction

Hello, Machine Learning people!

Deep Learning, RL, data mining, etc…

  • Do you identify yourself as a connectionist? ✋

First of all, hello, machine learning people! I'm not a machine learning person; I usually study a type of symbolic AI, and I visited here expecting interesting conversations with machine learning professionals.

1.1 Introduction

Hello, Symbolic AI people!

SAT, CP, Logic, TCS, …

  • Have you tried any DL libraries? ✋

And also, hello, other symbolic AI people. Although I am not doing exactly the same thing, we share common roots.

Let me ask you one question before presenting my paper…

1.2 Introduction

I am from the AI Planning community (symbolic, logic-based)

icapslogo.gif

International Conference on Automated Planning and Scheduling

(AAAI sister conference, 33% avg. acceptance ratio, since 1990, ECP+AIPS → ICAPS)

1.3 We are working on Automated Planners

 

High-level plan for agents

plan = $(action_1, action_2, \ldots, action_n)$

Initial state → 🚶 → 🚗 → 🔨 → Goal state

 

1st-Order Logic

STRIPS/PDDL

modelling language

Closed-world assumption

Combinatorial Explosion

High-dimensional

implicit state space

PSPACE-Hard

Expensive, Critical Applications

Satellite Operation

Deep Space 1

Mars Exploration Rover

 

  • Soundness, Completeness, Optimality

    of the algorithms are important

    (compared to "planning" in other AI fields)

1.4 Introduction

Today, I will talk about how to integrate DL and logic-based planning systems.

2 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

3 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

4 Sliding tile puzzle (a.k.a. 8-puzzle)

8puzzle-standard.png Initial State

8puzzle-standard-goal.png Goal State

8puzzle.gif

  • You need to find a plan to reach the goal.

4.1 States

States are 1st-order logic formulae, described in the PDDL language.

Empty(x0, y0)

Is(panel6, x1, y0)

Up(y0, y1), Down(y1, y0)…

Right(x0, x1), Left(x0, x1)…

(empty x0 y0)
(is panel6 x1 y0)
(up    y0 y1), (down y1 y0)...
(right x0 x1), (left x0 x1)...

8puzzle-standard.png Initial State

8puzzle-standard-goal.png Goal State

8puzzle.gif

4.2 Transitions / Actions

Representing "sliding up": Transition Rules (Actions)

  • When the new location is empty:
  • the current location becomes empty, and
  • the new location becomes occupied (no longer empty).

initial state 8puzzle-standard-tile7.png

  • First-order Logic Formula

    When Empty(x, y_old) ∧ Is(panel, x, y_new) ∧ Up(y_new, y_old);

    then ¬Empty(x, y_old) ∧ Empty(x, y_new) ∧ ¬Is(panel, x, y_new) …

  • PDDL Model : Actual Input to the Planner

    (:action slide-up ...
     :precondition (and (empty ?x ?y-old) ...)
 :effect (and (not (empty ?x ?y-old)) (empty ?x ?y-new) ...))
    

4.3 Symbolic planners cannot solve an Image-based 8-puzzle

  • The PDDL model of the 8-puzzle can be solved optimally in ≪ 1 sec

puzzle.jpg

  • BUT WE DO NOT HAVE A PDDL MODEL!

  • Impractical in an unknown environment where no human is available

    E.g. autonomous space exploration!

4.4 Knowledge-Acquisition Bottleneck (Cullen, 1988):

The cost of human involvement in converting real-world problems into inputs for domain-independent symbolic systems

  • For image-based tasks, we must automate 2 processes:
  • 1. Symbol Grounding:

    Symbols = identifiable entities

    Types Examples
    Objects panel7, x0, y0
    Predicates (empty ?x ?y)
    Propositions p28 = (empty x0 y0)
    Actions (slide-up panel7 x0 y1)
  • 2. Action Model Acquisition (AMA):

    Describe actions.

     

    When Empty(x, y_old) ∧ … ;

    then ¬Empty(x, y_old) ∧ …

  • A Long-Standing Problem
  • Now we propose a system which automates these processes…

4.5 Latent-Space Planner (Latplan)

latplanlogo.png

(embedding the lambda calculus inside a neural architecture)

5 Survey of Existing Action Model Acquisition Techniques

Survey of Existing Action Model Acquisition Techniques

 

i.e. Systems that find action models

5.1 Limitations of Existing Systems

So far, ALL existing AMA systems require symbolic or near-symbolic, accurate state inputs and/or discrete action labels.

i.e. They need symbols to find an action model

5.2 So far, ALL existing AMA systems require symbolic inputs

ARMS (Yang et al. AIJ07), LOCM (ICAPS09), Argall et al. (AIJ09), Mourao et al. (UAI12)

All take symbolic inputs to find the action models

locm.png

5.3 Framer (ICAPS17)

Near-symbolic: parses natural-language sentences with a clear grammatical structure.

framer.png

  • Alleviates the burden on domain experts, but still requires a human
  • Does not handle truly "natural" language:

    Pick up that .. ur .. parcel over there … yes, it has a label on it, it says Parcel1, you can see it from here, the Location B. Then put it in the car, I mean the truck, the red one whose rear bumper is a bit rusty.

5.4 Konidaris & Kaelbling (AAAI14, IJCAI15)

"Constructing Symbolic Representations for High-Level Planning" (AAAI14)

What it does

Converts a Semi-MDP model to a PDDL model via a set-theoretic representation

i.e. a model-to-model conversion, not generating a model from scratch

The Semi-MDP contains action symbols:
move and interact (the Playroom domain)

Sensor inputs are structured (labels for the "state variables" are known):

x/y-distance, light level, whether a monkey cries

→ Each sensor has a distinct meaning (no overlap)

5.5 Learning from Video for Board Games (Barbu ICRA10; Kaiser AAAI12; Kirk 16)

Handles images, but with strong assumptions (almost symbolic), e.g.

Tic-Tac-Toe with ellipse detectors (Barbu 10)

→ Almost immediately provides propositions

→ Also domain-dependent (the "3x3 grid" and "ellipse" are hard-coded)

6 Problem Setting

Problem Setting

of

Latent-Space Planner

6.1 Task: Solving an Image-Based 8-puzzle w/o Prior Explicit Knowledge

No Prior Knowledge : labels/symbols such as "9 tiles", "moving"

puzzle.jpg

6.2 Task: Solving an Image-Based 8-puzzle w/o Prior Explicit Knowledge

No Prior Knowledge : labels/symbols such as "9 tiles", "moving"

puzzle.jpg

6.3 Task: Solving ANY Image-Based Task w/o Prior Explicit Knowledge

No Prior Knowledge : Domain-independent Image-based planner

Tower of Hanoi

hanoi.jpg

Lights-Out

lightsout.jpg

6.4 Inputs

The system is given 2 types of input:

  • Training Input
  • Planning Input

6.5 Input1: Training Input – Image Pairs

1.png

6.6 Input1: Training Input – Image Pairs

2.png

They are randomly sampled image transitions

  • No state descriptions (unlike AlphaGo (Silver et al. '16))
  • No expert traces (unlike AlphaGo)
  • No rewards (unlike DRL systems)
  • No access to the simulator (unlike DQN (Mnih et al. '15) for Atari)

    → cannot ask for more examples

  • No action symbols (no existing system handles this, to our knowledge)

      (e.g. ↑↓←→AB in Atari, grid positions in Go)

    → the system doesn't know what's happening in each transition

6.7 Input2: Planning Input – Initial Image & Goal Image

input2.png


7 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

8 Latent-Space Planner (Latplan) Architecture

planning1.png

8.1 Step 1: Propositional Symbol Grounding

planning2.png

8.2 Step 2: Action Model Acquisition (AMA)

planning3.png

8.3 Step 3: Solve the Symbolic Planning Problem

planning4.png

8.4 Step 4: Executing the Symbolic Plan

planning5.png

8.5 Step 5: Obtaining the Visualization

planning6.png

8.6 Summary

Latplan: Latent-space Planner.

  • bridges real-world and propositional representations.
  • performs sound, propositional, logical reasoning.
  • returns a real-world output (e.g. images).
  • runs in a completely automated, unsupervised manner.

9 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

10 State AutoEncoder (SAE)

The SAE is a neural network which provides two functions:

  • $b = Encode(r)$: maps a raw datum $r$ to a bit vector $b$
  • $\tilde{r} = Decode(b)$: maps a bit vector $b$ to a raw datum $\tilde{r}$
  • A bidirectional mapping between
    • a subsymbolic representation (image array) and
    • a symbolic representation

      (bit vectors = propositional variables)

10.1 Neural Network 101

1.png

10.2 Neural Network 101

2.png

10.3 Neural Network 101

3.png

10.4 Stochastic Gradient Descent + GPU

gradient-descent.png

Plus misc. techniques, e.g. Batchnorm, Dropout

Pretty much everything is covered by standard online tutorials / lecture courses / MOOCs

Good libraries (Tensorflow, Keras): you can learn them in 1-2 months

10.5 NN for Standard Classification Tasks (Supervised)

Task Input x Output y
Image classification Image Label (1=car, 2=cat, 3=monkey …)
  • This is not suitable for our task:
    • there are no labels provided by humans, i.e.
    • there is no ground-truth answer for symbol grounding

      (People also ground symbols differently;

      cf. how many colors are in a 🌈: 5, 6, or 7?)

10.6 Unsupervised Learning for NN: AutoEncoder (AE)

 

Target Function: Identity $x=f(x)$

  • Encode $x$ to a latent vector $z$
  • Decode $z$ back to the input $x$
  • Training: minimize $|x - f(x)|$ (the reconstruction loss)

autoenc.png

  • → However, ✘ the latent vector $z$ is real-valued:

    INCOMPATIBLE with propositional reasoning
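
For concreteness, here is a minimal vanilla-autoencoder sketch in Keras (one of the libraries mentioned above); the layer sizes and the MNIST-like input shape are illustrative assumptions, not Latplan's exact network:

    # A minimal sketch of a vanilla autoencoder (illustrative sizes).
    from tensorflow import keras
    from tensorflow.keras import layers

    x = keras.Input(shape=(28 * 28,))                    # flattened input image
    h = layers.Dense(256, activation='relu')(x)
    z = layers.Dense(16)(h)                              # real-valued latent vector z
    h2 = layers.Dense(256, activation='relu')(z)
    y = layers.Dense(28 * 28, activation='sigmoid')(h2)  # reconstruction of x

    ae = keras.Model(x, y)
    ae.compile(optimizer='adam', loss='binary_crossentropy')  # reconstruction loss
    # ae.fit(images, images, ...)   # the target equals the input: f(x) = x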

10.7 Variational AutoEncoder (VAE, Kingma 2014)

An AutoEncoder that enforces a certain distribution on $Z \subset \mathbb{R}^n$ over the dataset $X$

You have $X =$ { 10k images of apples }. If you train a Gaussian VAE on $X$, then $Z = Encode(X) \approx N(\mu,\sigma)$ for some $\mu, \sigma \in \mathbb{R}^n$.

A VAE needs the reparametrization trick because sampling from a random distribution is non-differentiable.

Reparametrization for $N(\mu,\sigma)$: $\mu + \sigma \cdot N(0,1)$

$\mu$ and $\sigma$ are differentiable (trainable) vectors; $N(0,1)$ is not.
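
A minimal sketch of the trick in TensorFlow (the function name is mine, for illustration):

    import tensorflow as tf

    def sample_gaussian(mu, log_sigma):
        # Reparametrization trick: mu + sigma * N(0,1).
        # Gradients flow into mu and log_sigma; the noise eps is a constant
        # with respect to backpropagation.
        eps = tf.random.normal(tf.shape(mu))
        return mu + tf.exp(log_sigma) * eps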

10.8 Gumbel-Softmax VAE (Jang, Gu, ICLR2017)

An additional optimization penalty which enforces $Z \sim \textbf{Categorical}$:

   → $z$ converges to a 1-hot vector, e.g. $\langle 0,0,1,0 \rangle$.

Example: Represent an MNIST image with 30 variables of 8 categories.

(Network diagram: fully-connected encoder/decoder with ReLU layers of width 256, 512, 512, and 256 around the categorical latent layer.)
  • Key idea: These categorical variables are directly usable

    as the source of propositional models

    In particular, 2 categories → propositional variables (true/false)
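
A sketch of the Gumbel-Softmax sampling step in TensorFlow (a straightforward transcription of Jang et al.'s formula; the function name is mine):

    import tensorflow as tf

    def gumbel_softmax(logits, temperature):
        # Sample Gumbel(0,1) noise: -log(-log(U)), U ~ Uniform(0,1).
        u = tf.random.uniform(tf.shape(logits), minval=1e-20, maxval=1.0)
        gumbel = -tf.math.log(-tf.math.log(u))
        # Softmax of (logits + noise) / tau is differentiable, and approaches
        # a one-hot categorical sample as the temperature tau is annealed to 0.
        return tf.nn.softmax((logits + gumbel) / temperature)

    # E.g. the MNIST example above: logits shaped (batch, 30, 8); the softmax
    # is taken over the last axis (the 8 categories of each variable).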

10.9 State Autoencoder (before training)

state-ae-before.png

10.10 State Autoencoder (after training)

state-ae.png

11 Verifying the feasibility of Latplan system and SAE

Q: Does the SAE (a neural network) produce

sound propositions for reasoning?

AMA1: a trivial, oracular AMA method without generalization.

  • Encode all raw state transitions in the environment
  • For each transition, perform the following conversion:

     0011 → 0101  ;; encoded bit vectors of a transition
    
       ↓       ;; one action per transition
    (:action       action-0011-0101
                   ;; before-state is embedded directly
     :precondition (and (b0-false) (b1-false) (b2-true) (b3-true))
                   ;; effect = state diff
     :effect       (and (not (b1-false)) (b1-true)
                        (not (b2-true))  (b2-false)))
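
A sketch of this conversion as code (a hypothetical helper, mirroring the example above):

    def ama1_action(before, after):
        # One PDDL action per observed bit-vector transition (no generalization).
        name = "action-%s-%s" % ("".join(map(str, before)), "".join(map(str, after)))
        pre = " ".join("(b%d-%s)" % (i, "true" if v else "false")
                       for i, v in enumerate(before))
        eff = " ".join("(not (b%d-%s)) (b%d-%s)"
                       % (i, "true" if u else "false", i, "true" if v else "false")
                       for i, (u, v) in enumerate(zip(before, after)) if u != v)
        return ("(:action %s\n :precondition (and %s)\n :effect (and %s))"
                % (name, pre, eff))

    print(ama1_action([0, 0, 1, 1], [0, 1, 0, 1]))  # reproduces the action above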
    

11.1 Step 3: Solve the Symbolic Planning Problem

planning4.png

11.2 Step 4: Executing the Symbolic Plan

planning5.png

11.3 Step 5: Obtaining the Visualization

planning6.png

12 AMA1 Experiments

SAE: trained with 20k images (note: the 8-puzzle has > 360k states in total)

  • SAE is generalizing

AMA1: requires the entire set of transitions (note: > 1M transitions in the 8-puzzle)

  • AMA1 is NOT generalizing, being an oracle

Planner: a state-of-the-art planner, Fast Downward (Helmert 2006)

  • $A^*$ (optimal search): it must find an optimal solution
  • Runtime: ~3 sec (the instances are too small for symbolic systems)

12.1 8-puzzle Results with MNIST tiles (MNIST 8-puzzle)

The 8-puzzle using digits from the MNIST database

mnist-plan.png

An instance whose optimal solution length is known

A 31-step optimal plan

12.2 Results with photographic, unseparated tiles (Mandrill 8-puzzle)

The MNIST 8-puzzle has cleanly separated objects → this domain does not.

mandrill-intro.png

12.3 Results with photographic, unseparated tiles (Mandrill 8-puzzle)

mandrill-plan.png

Optimal Solution

12.4 Results with photographic, unseparated tiles (Spider 8-puzzle)

spider-plan-new.png

Notice that Latplan has no idea that this is an 8-puzzle; thus MNIST, Mandrill, and Spider are entirely different domains to it (their state encodings are also different)

12.5 Tower of Hanoi (3 disks, 4 disks)

Completely different puzzle problems can be solved by the same system

hanoi3.png

hanoi4.png

Optimal Solutions (7 steps, 15 steps)

12.6 Lights Out

Does not assume "objects" are static (objects may disappear)

lightsout_new4x4.png

Optimal Solution

12.7 Twisted Lights Out

Does not assume grid-like structures

lightsout_twisted_new4x4.png

Optimal Solution

12.8 Handling the Noisy Input

The SAE implementation uses a denoising layer (Denoising AE, Vincent 08)

noise-new.png

It benefits from existing DL methods immediately:

Improved speed, robustness, accuracy

13 Why bother with an off-the-shelf planner? Wouldn't blind search do?

Domain-independent lower bounds work, SURPRISINGLY!

  • This is NOT a trivial finding!

    lower bounds are…

    • tailored for man-made domains
    • assuming the domain has a structure

    Blind search sometimes even outperforms sophisticated methods on man-made instances (Edelkamp 12)

  • lower bounds work → more difficult problems can be solved (future work)

 

Number of states expanded, mean (stdev.)

domain     N    Dijkstra      A*+PDB
MNIST      13   210k (50k)    97k (53k)
Mandrill   12   176k (13k)    112k (57k)
Spider     13   275k (65k)    58k (30k)
Hanoi      30   20.7 (29.7)   12.6 (22.0)
LightsOut  16   433 (610.4)   130 (164)
Twisted    16   398 (683.1)   29.3 (62.4)

N : number of instances.

14 Why is Gumbel-Softmax necessary?

  • Alternative 1: use a normal NN and round the encoder output?
    • ✘ The decoder is not trained on 0/1 values, so it cannot produce meaningful output
  • Alternative 2: include the rounding operation in the NN?
    • ✘ Rounding is non-differentiable / backpropagation is impossible (see the snippet below)
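
A two-line demonstration in TensorFlow (illustrative, not from the paper):

    import tensorflow as tf

    x = tf.Variable([0.3, 0.7])
    with tf.GradientTape() as tape:
        y = tf.round(x)          # hard 0/1 rounding
    print(tape.gradient(y, x))   # None: no useful gradient flows through round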

15 State Autoencoder Conclusion

  • The SAE can learn from small examples:
    20k training images → it learns to map 360k unseen images
  • SAE-based propositions are sound:
    given the oracular AMA1, planners can reason over the propositions
  • Latplan maintains the theoretical guarantees of the search algorithm:
    given the complete state space graph,
    • an optimizing algorithm (A*) returns an optimal solution
    • completeness is guaranteed

16 * ☕ Break ☕ *

Background

Latplan Architecture

State AutoEncoder (SAE)

☕ Break ☕

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

17 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

18 SAE Feasible! Now what?

AMA1 is impractical: we cannot obtain the entire transition data in the real world

  • An AMA method should

    learn / generalize

    from small examples

ama1.png

  • AMA2: a novel neural architecture

19 The Task of AMA2 : The Real AMA method

Input: propositional transitions $\{ (s,t), \ldots \}$

($Encode$d from the Training Input)

  • Action Symbol Grounding

    (:action slide-up-tile7-at-x0-y1 ...
    
  • Action Preconditon Learning

    :precondition (and (empty x0 y0) ...)
    
  • Action Effect Learning

:effect       (and (empty x0 y1) (not ...)))
    

19.1 Why are action symbols necessary?

Planners typically perform forward search (Dijkstra, $A^*$)

Without action symbols, successor generation becomes challenging:

  • With an SAE with $|z| = 36$ bits

    → generate $2^{36}$ potential successors, then filter out the invalid ones

  • With action symbols (e.g. up, down, left, right)
  • → we can enumerate the candidates in constant time (see the sketch below).
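
The contrast, as a sketch (hypothetical helpers, illustrative numbers):

    num_bits = 36
    candidates_without_actions = 2 ** num_bits   # ~6.9e10 bit vectors to test

    actions = ["up", "down", "left", "right"]    # grounded action symbols
    def successors(state, apply_action):         # apply_action: hypothetical
        return [apply_action(a, state) for a in actions]  # |actions| candidates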

19.2 Challenge: Training Input lacks action symbols

The system must identify the number of action schemas from what is and is not affected

action-symbol.png

19.3 Challenge: Precondition/Effect learning is not trivial

Case 1: Linear Space

  • We do not need an action label; the transition is linear anyway
  • Then AMA ≡ a prediction task ($\approx$ scene prediction in video)
    • We can train a neural network $a(s) = t$ by minimizing $|t - a(s)|$
    • Most DL literature follows this path

linear.png

19.4 Challenge: Precondition/Effect learning is not trivial

Case 2: Graphs

  • Multiple action labels
  • A varying number of successors

    for each state

  • We cannot train $a(s) = t$ for each $a$ because we don't know which $(s,t)$ belongs to which $a$.

non-linear.png

  • The correct question: what is the right function to learn?

20 AMA2 Overview

overview0.png

20.1 AMA2 Overview

overview1.png

20.2 AMA2 Overview

overview1-1.png

20.3 AMA2 Overview

overview2.png

20.4 AMA2 Overview

overview3.png

21 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

22 Action AutoEncoder

What is the right function to learn?

  • $a(s) = t$?

    We cannot train this:

  • $a$ is a variable!

    $apply(a,s) = t$

  • A transition is a mapping

    action $a$ → successor $t$,

    conditioned by

    the current state $s$

  • This is the right function to learn!

22.1 Implementing $(a \rightarrow t) | s$

aae-0.png

22.2 Implementing $(a \rightarrow t) | s$

aae-1.png

22.3 Implementing $(a \rightarrow t) | s$

aae-2.png

22.4 Implementing $(a \rightarrow t) | s$

aae-3.png
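
A minimal Keras sketch of this conditioned mapping (the sizes and names are illustrative assumptions, not the paper's exact network; the Gumbel-Softmax noise and temperature annealing shown earlier are omitted for brevity):

    from tensorflow import keras
    from tensorflow.keras import layers

    N, A = 36, 128                           # propositions, max action labels
    s = keras.Input(shape=(N,))              # current state
    t = keras.Input(shape=(N,))              # observed successor

    # Encoder: infer a discrete action label a from the pair (s, t).
    h = layers.Dense(400, activation='relu')(layers.Concatenate()([s, t]))
    a = layers.Softmax()(layers.Dense(A)(h))

    # Decoder: apply(a, s) -> reconstructed successor, conditioned on s.
    g = layers.Dense(400, activation='relu')(layers.Concatenate()([a, s]))
    t_rec = layers.Dense(N, activation='sigmoid')(g)

    aae = keras.Model([s, t], t_rec)
    aae.compile(optimizer='adam', loss='binary_crossentropy')
    # aae.fit([S, T], T, ...)  # reconstruct t from (a, s); a becomes the action symbol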

23 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

24 Action Discriminator learns preconditions

overview2.png

24.1 Action Discriminator learns Preconditions = Binary Classifier

A binary classifier telling which transitions are valid

ad.png

  • Trained on a positive and a negative dataset
  • We have data for the valid transitions (the observed data).
  • Where is the negative (invalid) dataset?

24.2 Negative dataset

  • Invalid transitions: all transitions that are not valid.
    • We lack a definition of them; they cannot be generated.
  • Can we collect the invalid data?
    • Teleportation violates the laws of physics (at least at the macro scale).

      teleportation.png

  • Invalid transitions never happen → an autonomous agent cannot possibly collect such data in the real world.
  • Collecting a negative dataset is fundamentally impossible

24.3 Solution: PU-Learning (Elkan & Noto, KDD '08)

Training a positive/negative classifier from positive and unlabelled datasets.

  • Positive: the Training Input. Observed data are guaranteed to be valid
  • Unlabelled: successor candidates generated by the AAE
    • Some are valid, some are invalid
  • Now we can train the AD! A sketch follows below.

    (SD = State Discriminator; explanation skipped)
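
A sketch of PU-learning following Elkan & Noto's calibration recipe, with scikit-learn's LogisticRegression standing in for the neural discriminator (an assumption; `positives` = observed valid transitions, `unlabeled` = AAE-generated candidates, both as feature arrays):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pu_train(positives, unlabeled, held_out_positives):
        X = np.vstack([positives, unlabeled])
        y = np.r_[np.ones(len(positives)), np.zeros(len(unlabeled))]
        g = LogisticRegression().fit(X, y)    # g(x) estimates P(labeled | x)
        # Calibration constant c = P(labeled | positive), on held-out positives.
        c = g.predict_proba(held_out_positives)[:, 1].mean()
        # Elkan & Noto: P(positive | x) = P(labeled | x) / c
        return lambda X_new: g.predict_proba(X_new)[:, 1] / c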

25 Planning using AMA2

The AAE enumerates the candidates; the AD and SD filter out the invalid ones.

\begin{align*} Succ(s) = \{ t = apply(a,s) \;|\; & a \in \{0,\ldots,127\} \\ & \land AD(s,t) \geq 0.5 \\ & \land SD(t) \geq 0.5 \} \end{align*}

We have action symbols and propositional states!

Now we can run $A^*$ with the goal-count heuristic (sketch below)
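
A direct transcription of $Succ(s)$ and the goal-count heuristic (`aae_apply`, `AD`, `SD` stand in for the trained networks; the names are mine):

    def successors(s, aae_apply, AD, SD):
        candidates = [aae_apply(a, s) for a in range(128)]   # one per action label
        return [t for t in candidates
                if AD(s, t) >= 0.5 and SD(t) >= 0.5]         # keep plausible ones

    def goal_count(state, goal):
        # Goal-count heuristic: number of propositions differing from the goal.
        return sum(1 for p, q in zip(state, goal) if p != q)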

26 AMA2 Experiments 1

Is it feasible to do planning with the AAE and AD?

  • 100 instances for each domain
    • self-avoiding random walks from the goal state
    • (benchmark A) 7-step
    • (benchmark B) 14-step
  • 180 sec. time limit
  • Domain-specific plan validators.

26.1 Results

Noise is applied to the planning inputs (init/goal images)

G: Gaussian noise, s/p: salt/pepper noise

  • Easy instances: the majority of instances are solved
  • Harder instances: still, many instances are solved

step         7    7    7   14   14   14
noise      std    G  s/p  std    G  s/p
MNIST       72   64   64    6    4    3
Mandrill   100  100  100    9   14   14
Spider      94   99   98   29   36   38
LightsOut  100   99  100   59   60   51
Twisted LO  96   65   98   75   68   72

27 AMA2 Experiments 2

How accurate are Action Discriminators?

Measure the type-1 / type-2 errors in %

           type-1  type-2
MNIST        1.55    6.15
Mandrill     1.10    2.93
Spider       1.22    4.97
L. Out       0.03    1.64
Twisted      0.02    1.82
Hanoi        0.25    3.79

(AD type-1) Generate all valid transitions and count the number of misclassifications.

(AD type-2) For 1000 randomly selected valid states, generate all successors, prune them with the SD, remove the valid transitions (using the validator), then count the transitions misclassified as valid.

  • Reasonably accurate.

28 Conclusion

  • Latplan: the first system which completely automatically generates a classical, symbolic planning model from scratch
  • State AutoEncoder (SAE): real-world states → propositional states

    Action AutoEncoder (AAE): transitions → action labels, effects

    Action Discriminator (AD): transition → bool, via PU-Learning

    Two of the major grounding problems are solved!

    Types of symbols
    Propositional symbols   Solved!
    Action symbols          Solved!
    Object symbols          Computer Vision techniques?
    Predicate symbols       ???

    Future work: improved accuracy, runtime efficiency.

    This opens many avenues for future research.

28.1 Future Work (Input format)

Latplan is an architecture: any system with an SAE is an implementation

Different SAE → reason about different types of raw data

  • Autoencoders for text, audio [Li 2015, Deng 2010]
  • Example:
    • Transition rule "This is an apple, this is a pen → oh, ApplePen!"
  • When this happens, Latplan brings AI to a new level e.g.

    1000 steps of natural language reasoning with logic, not by reflex

    • Fundamentally impossible for short-sighted / greedy agents

"A hierarchical neural autoencoder for paragraphs and documents." (2015)

"Binary coding of speech spectrograms using a deep auto-encoder." (2010)

28.2 Future Work (Extended planning formalism)

  • Latplan assumes nothing about the environment machinery (grids, movable tiles…)
  • Latplan assumes fully-observable, deterministic domains
  • Next step: Extending Latplan to MDP, POMDP
    • Gumbel-Softmax layer → just a Softmax layer? (probability)
    • $AO^*$, $LAO^*$algorithms for optimal planning under uncertainty

28.3 Future Work (Propositional → First-order)

SAE can generate propositional symbols (state $s = \{q,r\ldots\}$)

  • 1st-order logic (predicate $p(a,b)$)
  • We need object recognition from images (parameters $a,b$)
  • Combining the SAE with networks for object recognition (e.g. R-CNN) should achieve this

28.4 Future Work (Knowledge extraction)

AMA2 (AAE/AD) is a neural network, thus the preconditions/effects are in a black box

This hinders deriving heuristic functions

Possible remedies: knowledge extraction by Konidaris et al. ('14, '15), or δ-ILP (DeepMind)?

29 Appendix

Using the Remaining Time (Discussion)

29.1 Konidaris et al. (2014, 2015):

Structured input (e.g. light switch, x/y-distance) ↔ unstructured images

Action symbols (move, interact)

Just converts a Semi-MDP model to a PDDL model.

Could be used for extracting PDDL from the AAE/AD

29.2 NNs for solving combinatorial tasks:

TSP (Hopfield and Tank 1985)

Neurosolver for ToH (Bieszczad and Kuchar 2015)

Their inputs/outputs are symbolic.

29.3 Other Work Combining Symbolic Search and NNs

These embed NNs inside a search to provide search control knowledge

(i.e. a node evaluation function)

Sliding-tile puzzle and Rubik’s Cube (Arfaee et al. 2011)

Classical planning (Satzger and Kramer 2013)

The game of Go (AlphaGo, Silver et al. 2016)

29.4 Deep Reinforcement Learning (DRL) (Mnih et al. 2015, DQN)

DQN assumes predetermined action symbols (↑↓←→ + buttons).

DQN relies on simulators ↔ Latplan reverse-engineers a simulator.

DQN does not work when it does not know what actions are even possible!

29.5 Other Interesting Systems

SCAN system (DeepMind)

  • Maps a continuous latent vector to a human-provided symbolic vector

    • This does not make sense; they are missing the point,
    • because they don't know what symbols are for:
    • they are for efficient, automated reasoning!
    • Floats are not efficient, and humans shouldn't be involved

δ-ILP (Inductive Logic Programming)

  • An ILP method robust to noise
    • Extract rules from the AAE/AD to form a PDDL?

29.6 Why not individual pixels? Why DL?

Systems based on individual pixels lack generalization

  • Noise / variations can make the data entirely different

    noise.png

  • We must acquire generalized features
  • = a nonlinear function that recognizes the entanglement between multiple pixels

29.7 Learning vs Planning

Main differences: Purposes and the abstraction layer

Machine Learning, Neural Networks

for Recognition, Reflex

  • Subsymbolic Input (continuous):

    Images, Audio, unstructured text

  • Soft Intelligence:

      Reflex Agent, Immediate actions

    Pavlov's dog : bell → drool

    Autonomous Driving : Pedestrian → Stop.

    Machine Translation : Sentence → Sentence

    Eval. Function for Go : board → win-rate

    ☺ Efficient 1-to-1 mapping

    ☹ Simple tasks

Deliberation, Search

for Planning, Games, Theorem Proving

  • Symbolic Input/Output

    Logic, objects, induction rules

  • Hard Intelligence by Logic:

      Multi-step strategies

    Rescue Robot : actions → help the survivor

    Theorem Proving : theorems → QED

    Compiler : program → x86 instructions

    Game of Go : stones → Win

    ☺ Ordering constraint + complex tasks

  • AlphaGo = Subsymbolic (DLNN eval. function) + Symbolic (MCTS)

29.8 Human-Competitive Systems

AlphaGo = Subsymbolic (NN eval. func) + Symbolic (MCTS)

  • However, it is domain-specific: specialized in Go; "grids" / "stones" are known
  • Huge expert trace DB: not applicable when data are scarce (e.g. space exploration)
  • Is supervised learning necessary for humans?

    True intelligence should search / collect data by itself

DQN = Subsymbolic (DLNN) + Reinforcement Learning (DLNN)

A domain-independent Atari game solver (Invaders, Pac-Man…), however:

  • RL acting: greedily follows the learned policy → no deliberation!
  • You can survive most Atari games by reflex

29.9 Latplan Advantages

Perception based on DLNN

— Robust systems augmented by the latest DL tech

Decision Making based on Classical Planning

Better theoretical guarantees than Reinforcement Learning:

Completeness (finds a solution whenever one exists), solution optimality

Decision Making Independent from Learning

Unsupervised (no labels required), Explainable (search by logic)

29.10 When does Latplan return a wrong solution?

Machine learning may contain errors (convergence is guaranteed only as $t \rightarrow \infty$, not in finite time)

  • Images → flawed symbols/model/graph
  • An optimal path on a flawed graph, or a disconnected graph

    A* retains completeness, soundness, optimality (with admissible heuristics)

  • A flawed visualized plan (noisy) / no plan found

Latplan may make wrong observations, but no wrong decisions

BTW, "correctness" is defined by error-prone human observations anyway…

(completeness, optimality) → better reliability than Reinforcement Learning

29.11 Reinforcement Learning

Both perception and decision making depend on training

  • When the learned policy is wrong,
    • the solution could be suboptimal
    • It could even fail to find solutions (incomplete agent)

… AlphaGo was unprepared for Lee Sedol’s Move 78 because it didn’t think that a human would ever play it.

Cade Metz. "In Two Moves, AlphaGo and Lee Sedol Redefined the Future." Wired, 2016.

RL may make wrong decisions.

29.12 Why symbols?

Symbols are strong abstraction mechanisms because:

Meanings do not matter

No understanding is necessary: does a symbol $X$ mean an apple or a car? It does not matter.

Logical reasoning by mechanical application of rules

  • Domain-independent planning: mystery vs. nomystery

    Logistics domains where the symbol names are mangled (truck → shark)

Composable

A latent vector is a conjunction (and)

Heuristic functions use modus ponens to derive guidance

30 This is the start of neural-symbolic AI!

latplanlogo-in-sicp.png

(yeah, I really like a catchy phrase)

Author: Masataro Asai

Created: 2018-02-01 Thu 02:03
