Classical Planning in

Deep Latent Space:

Bridging the Subsymbolic-Symbolic Boundary

Masataro Asai

The University of Tokyo

Note to Basel people: this presentation is aimed at a general audience who may not be familiar with, or may never have heard of, AI planning.

Made by guicho2.71828 (Masataro Asai)

1 Introduction

Hello, Machine Learning people!

Deep Learning, RL, data mining, etc…

  • Do you identify yourself as a connectionist? ✋

First of all, hello, machine learning people! I'm not a machine learning person; I usually study a type of symbolic AI, and I visited here expecting interesting conversations with machine learning professionals.

1.1 Introduction

Hello, Symbolic AI people!

SAT, CP, Logic, TCS, …

  • Have you tried any DL libraries? ✋

And also, hello, other symbolic AI people. Although I am not doing exactly the same thing, we share common roots.

Let me ask you one question before presenting my paper…

1.2 Introduction

I am from the AI Planning community (symbolic, logic-based)

icapslogo.gif

International Conference on Automated Planning and Scheduling

(AAAI sister conference, 33% avg. acceptance ratio, since 1990, ECP+AIPS → ICAPS)

1.3 We are working on Automated Planners

 

High-level plan for agents

plan = $(action_1, action_2, \ldots, action_n)$

Initial state → 🚶 → 🚗 → 🔨 → Goal state

 

1st-Order Logic

STRIPS/PDDL

modelling language

Closed-world assumption

Combinatorial Explosion

High-dimensional

implicit state space

PSPACE-Hard

Expensive, Critical Applications

Satellite Operation

Deep Space 1

Mars Exploration Rover

 

  • Soundness, Completeness, Optimality

    of the algorithms are important

    (compared to "planning" in other AI fields)

1.4 Introduction

Today, I will talk about how to integrate DL and logic-based planning systems.

2 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

3 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

4 Sliding tile puzzle (a.k.a. 8-puzzle)

8puzzle-standard.png Initial State

8puzzle-standard-goal.png Goal State

8puzzle.gif

  • You need to find a plan to reach the goal.

4.1 States

States are 1st-order logic formulae, described in the PDDL language.

Empty(x0, y0)

Is(panel6, x1, y0)

Up(y0, y1), Down(y1, y0)…

Right(x0, x1), Left(x0, x1)…

(empty x0 y0)
(is panel6 x1 y0)
(up    y0 y1), (down y1 y0)...
(right x0 x1), (left x0 x1)...

8puzzle-standard.png Initial State

8puzzle-standard-goal.png Goal State

8puzzle.gif

4.2 Transitions / Actions

Representing "sliding up": Transition Rules (Actions)

  • When the new location is empty:
  • the current location becomes empty, and
  • the new location becomes occupied (no longer empty).

initial state 8puzzle-standard-tile7.png

  • First-order Logic Formula

    When Empty(x, y_old) ∧ Is(panel, x, y_new) ∧ Up(y_new, y_old);

    then ¬Empty(x, y_old) ∧ Empty(x, y_new) ∧ ¬Is(panel, x, y_new) …

  • PDDL Model : Actual Input to the Planner

    (:action slide-up ...
     :precondition (and (empty ?x ?y-old) ...)
 :effect (and (not (empty ?x ?y-old)) (empty ?x ?y-new) ...))
    

4.3 Symbolic planners cannot solve an Image-based 8-puzzle

  • The PDDL model of the 8-puzzle can be solved optimally in ≪ 1 sec

puzzle.jpg

  • BUT WE DO NOT HAVE A PDDL MODEL!

  • Impractical in an unknown environment where no human is available

    E.g. autonomous space exploration!

4.4 Knowledge-Acquisition Bottleneck (Cullen, 1988):

The cost of human involvement in converting real-world problems into inputs for domain-independent symbolic systems

  • For image-based tasks, we must automate 2 processes:
  • 1. Symbol Grounding:

    Symbols = identifiable entities

    Types Examples
    Objects panel7, x0, y0
    Predicates (empty ?x ?y)
    Propositions p28 = (empty x0 y0)
    Actions (slide-up panel7 x0 y1)
  • 2. Action Model Acquisition (AMA):

    Describe actions.

     

    When Empty(x, y_old) ∧ … ;

    then ¬Empty(x, y_old) ∧ …

  • A Long-Standing Problem
  • Now we propose a system which automates these processes…

4.5 Latent-Space Planner (Latplan)

latplanlogo.png

(embedding the lambda calculus inside a neural architecture)

5 Survey of Existing Action Model Acquisition Techniques

Survey of Existing Action Model Acquisition Techniques

 

i.e. Systems that find action models

5.1 Limitations of Existing Systems

So far, ALL existing AMA systems require symbolic or near-symbolic, accurate state inputs and/or discrete action labels.

i.e. They need symbols to find an action model

5.2 So far, ALL existing AMA systems require symbolic inputs

ARMS (Yang et al. AIJ07), LOCM (ICAPS09), Argall et al. (AIJ09), Mourao et al. (UAI12)

All take symbolic inputs to find the action models

locm.png

5.3 Framer (ICAPS17)

Near-symbolic: parses natural-language sentences with a clear grammatical structure.

framer.png

  • Alleviates the burden on domain experts, but still requires a human
  • Does not handle truly "natural" language:

    Pick up that .. ur .. parcel over there … yes, it has a label on it, it says Parcel1, you can see it from here, the Location B. Then put it in the car, I mean the truck, the red one whose rear bumper is a bit rusty.

5.4 Konidaris & Kaelbling (AAAI14, IJCAI15)

"Constructing Symbolic Representations for High-Level Planning" (AAAI14)

What it does

Converts a Semi-MDP model to a PDDL model via a set-theoretic representation

i.e. a model-to-model conversion, not generating a model from scratch

The Semi-MDP contains action symbols:
move and interact (the Playroom domain)

Sensor inputs are structured (labels for the "state variables" are known):

x/y-distance, light level, whether a monkey cries

→ Each sensor has a distinct meaning (no overlap)

5.5 Learning from Video for Board Games (Barbu ICRA10; Kaiser AAAI12; Kirk 16)

Handles images, but with strong assumptions (almost symbolic), e.g.

Tic-Tac-Toe with ellipse detectors (Barbu 10)

→ Almost immediately provides propositions

→ Also domain-dependent (the "3x3 grid" and "ellipse" are hard-coded)

6 Problem Setting

Problem Setting

of

Latent-Space Planner

6.1 Task: Solving an Image-Based 8-puzzle w/o Prior Explicit Knowledge

No Prior Knowledge : labels/symbols such as "9 tiles", "moving"

puzzle.jpg

6.2 Task: Solving an Image-Based 8-puzzle w/o Prior Explicit Knowledge

No Prior Knowledge : labels/symbols such as "9 tiles", "moving"

puzzle.jpg

6.3 Task: Solving ANY Image-Based Task w/o Prior Explicit Knowledge

No Prior Knowledge : Domain-independent Image-based planner

Tower of Hanoi

hanoi.jpg

Lights-Out

lightsout.jpg

6.4 Inputs

The system is given 2 types of input:

  • Training Input
  • Planning Input

6.5 Input1: Training Input – Image Pairs

1.png

6.6 Input1: Training Input – Image Pairs

2.png

They are randomly sampled image transitions

  • No state descriptions (unlike AlphaGo (Silver et al. '16))
  • No expert traces (unlike AlphaGo)
  • No rewards (unlike DRL systems)
  • No access to the simulator (unlike DQN (Mnih et al. '15) for Atari)

    → cannot ask for more examples

  • No action symbols (no existing system handles this, to our knowledge)

      (e.g. ↑↓←→AB in Atari, grid positions in Go)

    → the system doesn't know what's happening in each transition

6.7 Input2: Planning Input – Initial Image & Goal Image

input2.png


7 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

8 Latent-Space Planner (Latplan) Architecture

planning1.png

8.1 Step 1: Propositional Symbol Grounding

planning2.png

8.2 Step 2: Action Model Acquisition (AMA)

planning3.png

8.3 Step 3: Solve the Symbolic Planning Problem

planning4.png

8.4 Step 4: Executing the Symbolic Plan

planning5.png

8.5 Step 5: Obtaining the Visualization

planning6.png

8.6 Summary

Latplan: Latent-space Planner.

  • bridges real-world and propositional representations.
  • performs sound, propositional, logical reasoning.
  • returns a real-world output (e.g. images).
  • runs in a completely automated, unsupervised manner.

9 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

10 State AutoEncoder (SAE)

The SAE is a neural network which provides two functions:

  • $b = Encode(r)$: maps a raw datum $r$ to a bit vector $b$
  • $\tilde{r} = Decode(b)$: maps a bit vector $b$ to a raw datum $\tilde{r}$
  • A bidirectional mapping between
    • a subsymbolic representation (image array) and
    • a symbolic representation

      (bit vectors = propositional variables)

10.1 Neural Network 101

1.png

10.2 Neural Network 101

2.png

10.3 Neural Network 101

3.png

10.4 Stochastic Gradient Descent + GPU

gradient-descent.png

Plus misc. techniques, e.g. Batchnorm, Dropout

Pretty much everything is covered by standard online tutorials / lecture courses / MOOCs

Good libraries (Tensorflow, Keras): you can learn them in 1-2 months

10.5 NN for Standard Classification Tasks (Supervised)

Task Input x Output y
Image classification Image Label (1=car, 2=cat, 3=monkey …)
  • This is not suitable for our task:
    • there are no labels provided by humans, i.e.
    • there is no ground-truth answer for symbol grounding

      (People also ground symbols differently;

      cf. how many colors are in a 🌈: 5, 6, or 7?)

10.6 Unsupervised Learning for NN: AutoEncoder (AE)

 

Target Function: Identity $x=f(x)$

  • Encode $x$ to a latent vector $z$
  • Decode $z$ back to the input $x$
  • Training: minimize $|x - f(x)|$ (the reconstruction loss)

autoenc.png

  • → However, ✘ the latent vector $z$ is real-valued:

    INCOMPATIBLE with propositional reasoning
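
For concreteness, here is a minimal vanilla-autoencoder sketch in Keras (one of the libraries mentioned above); the layer sizes and the MNIST-like input shape are illustrative assumptions, not Latplan's exact network:

    # A minimal sketch of a vanilla autoencoder (illustrative sizes).
    from tensorflow import keras
    from tensorflow.keras import layers

    x = keras.Input(shape=(28 * 28,))                    # flattened input image
    h = layers.Dense(256, activation='relu')(x)
    z = layers.Dense(16)(h)                              # real-valued latent vector z
    h2 = layers.Dense(256, activation='relu')(z)
    y = layers.Dense(28 * 28, activation='sigmoid')(h2)  # reconstruction of x

    ae = keras.Model(x, y)
    ae.compile(optimizer='adam', loss='binary_crossentropy')  # reconstruction loss
    # ae.fit(images, images, ...)   # the target equals the input: f(x) = x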

10.7 Variational AutoEncoder (VAE, Kingma 2014)

An AutoEncoder that enforces a certain distribution on $Z \subset \mathbb{R}^n$ over the dataset $X$

You have $X =$ { 10k images of apples }. If you train a Gaussian VAE on $X$, then $Z = Encode(X) \approx N(\mu,\sigma)$ for some $\mu, \sigma \in \mathbb{R}^n$.

A VAE needs the reparametrization trick because sampling from a random distribution is non-differentiable.

Reparametrization for $N(\mu,\sigma)$: $\mu + \sigma \cdot N(0,1)$

$\mu$ and $\sigma$ are differentiable (trainable) vectors; $N(0,1)$ is not.
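
A minimal sketch of the trick in TensorFlow (the function name is mine, for illustration):

    import tensorflow as tf

    def sample_gaussian(mu, log_sigma):
        # Reparametrization trick: mu + sigma * N(0,1).
        # Gradients flow into mu and log_sigma; the noise eps is a constant
        # with respect to backpropagation.
        eps = tf.random.normal(tf.shape(mu))
        return mu + tf.exp(log_sigma) * eps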

10.8 Gumbel-Softmax VAE (Jang, Gu, ICLR2017)

An additional optimization penalty which enforces $Z \sim \textbf{Categorical}$:

   → $z$ converges to a 1-hot vector, e.g. $\langle 0,0,1,0 \rangle$.

Example: Represent an MNIST image with 30 variables of 8 categories.

(Network diagram: fully-connected encoder/decoder with ReLU layers of width 256, 512, 512, and 256 around the categorical latent layer.)
  • Key idea: These categorical variables are directly usable

    as the source of propositional models

    In particular, 2 categories → propositional variables (true/false)
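
A sketch of the Gumbel-Softmax sampling step in TensorFlow (a straightforward transcription of Jang et al.'s formula; the function name is mine):

    import tensorflow as tf

    def gumbel_softmax(logits, temperature):
        # Sample Gumbel(0,1) noise: -log(-log(U)), U ~ Uniform(0,1).
        u = tf.random.uniform(tf.shape(logits), minval=1e-20, maxval=1.0)
        gumbel = -tf.math.log(-tf.math.log(u))
        # Softmax of (logits + noise) / tau is differentiable, and approaches
        # a one-hot categorical sample as the temperature tau is annealed to 0.
        return tf.nn.softmax((logits + gumbel) / temperature)

    # E.g. the MNIST example above: logits shaped (batch, 30, 8); the softmax
    # is taken over the last axis (the 8 categories of each variable).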

10.9 State Autoencoder (before training)

state-ae-before.png

10.10 State Autoencoder (after training)

state-ae.png

11 Verifying the feasibility of Latplan system and SAE

Q: Does the SAE (a neural network) produce

sound propositions for reasoning?

AMA1: a trivial, oracular AMA method without generalization.

  • Encode all raw state transitions in the environment
  • For each transition, perform the following conversion:

     0011 → 0101  ;; encoded bit vectors of a transition
    
       ↓       ;; one action per transition
    (:action       action-0011-0101
                   ;; before-state is embedded directly
     :precondition (and (b0-false) (b1-false) (b2-true) (b3-true))
                   ;; effect = state diff
     :effect       (and (not (b1-false)) (b1-true)
                        (not (b2-true))  (b2-false)))
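
A sketch of this conversion as code (a hypothetical helper, mirroring the example above):

    def ama1_action(before, after):
        # One PDDL action per observed bit-vector transition (no generalization).
        name = "action-%s-%s" % ("".join(map(str, before)), "".join(map(str, after)))
        pre = " ".join("(b%d-%s)" % (i, "true" if v else "false")
                       for i, v in enumerate(before))
        eff = " ".join("(not (b%d-%s)) (b%d-%s)"
                       % (i, "true" if u else "false", i, "true" if v else "false")
                       for i, (u, v) in enumerate(zip(before, after)) if u != v)
        return ("(:action %s\n :precondition (and %s)\n :effect (and %s))"
                % (name, pre, eff))

    print(ama1_action([0, 0, 1, 1], [0, 1, 0, 1]))  # reproduces the action above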
    

11.1 Step 3: Solve the Symbolic Planning Problem

planning4.png

11.2 Step 4: Executing the Symbolic Plan

planning5.png

11.3 Step 5: Obtaining the Visualization

planning6.png

12 AMA1 Experiments

SAE: trained with 20k images (note: the 8-puzzle has > 360k states in total)

  • SAE is generalizing

AMA1: requires the entire set of transitions (note: > 1M transitions in the 8-puzzle)

  • AMA1 is NOT generalizing, being an oracle

Planner: a state-of-the-art planner, Fast Downward (Helmert 2006)

  • $A^*$ (optimal search): it must find an optimal solution
  • Runtime: ~3 sec (the instances are too small for symbolic systems)

12.1 8-puzzle Results with MNIST tiles (MNIST 8-puzzle)

The 8-puzzle using digits from the MNIST database

mnist-plan.png

An instance whose optimal solution length is known

A 31-step optimal plan

12.2 Results with photographic, unseparated tiles (Mandrill 8-puzzle)

The MNIST 8-puzzle has cleanly separated objects → this domain does not.

mandrill-intro.png

12.3 Results with photographic, unseparated tiles (Mandrill 8-puzzle)

mandrill-plan.png

Optimal Solution

12.4 Results with photographic, unseparated tiles (Spider 8-puzzle)

spider-plan-new.png

Notice that Latplan has no idea that this is an 8-puzzle; thus MNIST, Mandrill, and Spider are entirely different domains to it (their state encodings are also different)

12.5 Tower of Hanoi (3 disks, 4 disks)

Completely different puzzle problems can be solved by the same system

hanoi3.png

hanoi4.png

Optimal Solutions (7 steps, 15 steps)

12.6 Lights Out

Does not assume "objects" are static (objects may disappear)

lightsout_new4x4.png

Optimal Solution

12.7 Twisted Lights Out

Does not assume grid-like structures

lightsout_twisted_new4x4.png

Optimal Solution

12.8 Handling the Noisy Input

The SAE implementation uses a denoising layer (Denoising AE, Vincent 08)

noise-new.png

It benefits from existing DL methods immediately:

Improved speed, robustness, accuracy

13 Why bother with an off-the-shelf planner? Wouldn't blind search do?

Domain-independent lower bounds work, SURPRISINGLY!

  • This is NOT a trivial finding!

    lower bounds are…

    • tailored for man-made domains
    • assuming the domain has a structure

    Blind search sometimes even outperforms sophisticated methods on man-made instances (Edelkamp 12)

  • lower bounds work → more difficult problems can be solved (future work)

 

Number of states expanded, mean (stdev.)

domain     N    Dijkstra      A*+PDB
MNIST      13   210k (50k)    97k (53k)
Mandrill   12   176k (13k)    112k (57k)
Spider     13   275k (65k)    58k (30k)
Hanoi      30   20.7 (29.7)   12.6 (22.0)
LightsOut  16   433 (610.4)   130 (164)
Twisted    16   398 (683.1)   29.3 (62.4)

N : number of instances.

14 Why is Gumbel-Softmax necessary?

  • Alternative 1: use a normal NN and round the encoder output?
    • ✘ The decoder is not trained on 0/1 values, so it cannot produce meaningful output
  • Alternative 2: include the rounding operation in the NN?
    • ✘ Rounding is non-differentiable / backpropagation is impossible (see the snippet below)
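
A two-line demonstration in TensorFlow (illustrative, not from the paper):

    import tensorflow as tf

    x = tf.Variable([0.3, 0.7])
    with tf.GradientTape() as tape:
        y = tf.round(x)          # hard 0/1 rounding
    print(tape.gradient(y, x))   # None: no useful gradient flows through round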

15 State Autoencoder Conclusion

  • The SAE can learn from small examples:
    20k training images → it learns to map 360k unseen images
  • SAE-based propositions are sound:
    given the oracular AMA1, planners can reason over the propositions
  • Latplan maintains the theoretical guarantees of the search algorithm:
    given the complete state space graph,
    • an optimizing algorithm (A*) returns an optimal solution
    • completeness is guaranteed

16 * ☕ Break ☕ *

Background

Latplan Architecture

State AutoEncoder (SAE)

☕ Break ☕

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

17 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

18 SAE Feasible! Now what?

AMA1 is impractical: we cannot obtain the entire transition data in the real world

  • An AMA method should

    learn / generalize

    from small examples

ama1.png

  • AMA2: a novel neural architecture

19 The Task of AMA2 : The Real AMA method

Input: propositional transitions $\{ (s,t), \ldots \}$

($Encode$d from the Training Input)

  • Action Symbol Grounding

    (:action slide-up-tile7-at-x0-y1 ...
    
  • Action Preconditon Learning

    :precondition (and (empty x0 y0) ...)
    
  • Action Effect Learning

:effect       (and (empty x0 y1) (not ...)))
    

19.1 Why are action symbols necessary?

Planners typically perform forward search (Dijkstra, $A^*$)

Without action symbols, successor generation becomes challenging:

  • With an SAE with $|z| = 36$ bits

    → generate $2^{36}$ potential successors, then filter out the invalid ones

  • With action symbols (e.g. up, down, left, right)
  • → we can enumerate the candidates in constant time (see the sketch below).
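
The contrast, as a sketch (hypothetical helpers, illustrative numbers):

    num_bits = 36
    candidates_without_actions = 2 ** num_bits   # ~6.9e10 bit vectors to test

    actions = ["up", "down", "left", "right"]    # grounded action symbols
    def successors(state, apply_action):         # apply_action: hypothetical
        return [apply_action(a, state) for a in actions]  # |actions| candidates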

19.2 Challenge: Training Input lacks action symbols

The system must identify the number of action schemas from what is and is not affected

action-symbol.png

19.3 Challenge: Precondition/Effect learning is not trivial

Case 1: Linear Space

  • We do not need an action label; the transition is linear anyway
  • Then AMA ≡ a prediction task ($\approx$ scene prediction in video)
    • We can train a neural network $a(s) = t$ by minimizing $|t - a(s)|$
    • Most DL literature follows this path

linear.png

19.4 Challenge: Precondition/Effect learning is not trivial

Case 2: Graphs

  • Multiple action labels
  • A varying number of successors

    for each state

  • We cannot train $a(s) = t$ for each $a$ because we don't know which $(s,t)$ belongs to which $a$.

non-linear.png

  • The correct question: what is the right function to learn?

20 AMA2 Overview

overview0.png

20.1 AMA2 Overview

overview1.png

20.2 AMA2 Overview

overview1-1.png

20.3 AMA2 Overview

overview2.png

20.4 AMA2 Overview

overview3.png

21 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

22 Action AutoEncoder

What is the right function to learn?

  • $a(s) = t$?

    We cannot train this:

  • $a$ is a variable!

    $apply(a,s) = t$

  • A transition is a mapping

    action $a$ → successor $t$,

    conditioned by

    the current state $s$

  • This is the right function to learn!

22.1 Implementing $(a \rightarrow t) | s$

aae-0.png

22.2 Implementing $(a \rightarrow t) | s$

aae-1.png

22.3 Implementing $(a \rightarrow t) | s$

aae-2.png

22.4 Implementing $(a \rightarrow t) | s$

aae-3.png
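
A minimal Keras sketch of this conditioned mapping (the sizes and names are illustrative assumptions, not the paper's exact network; the Gumbel-Softmax noise and temperature annealing shown earlier are omitted for brevity):

    from tensorflow import keras
    from tensorflow.keras import layers

    N, A = 36, 128                           # propositions, max action labels
    s = keras.Input(shape=(N,))              # current state
    t = keras.Input(shape=(N,))              # observed successor

    # Encoder: infer a discrete action label a from the pair (s, t).
    h = layers.Dense(400, activation='relu')(layers.Concatenate()([s, t]))
    a = layers.Softmax()(layers.Dense(A)(h))

    # Decoder: apply(a, s) -> reconstructed successor, conditioned on s.
    g = layers.Dense(400, activation='relu')(layers.Concatenate()([a, s]))
    t_rec = layers.Dense(N, activation='sigmoid')(g)

    aae = keras.Model([s, t], t_rec)
    aae.compile(optimizer='adam', loss='binary_crossentropy')
    # aae.fit([S, T], T, ...)  # reconstruct t from (a, s); a becomes the action symbol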

23 Overview

Background

Latplan Architecture

State AutoEncoder (SAE)

break

AMA2 Overview

Action AutoEncoder (AAE)

Action Discriminator (AD)

24 Action Discriminator learns preconditions

overview2.png

24.1 Action Discriminator learns Preconditions = Binary Classifier

A binary classifier telling which transitions are valid

ad.png

  • Trained on a positive and a negative dataset
  • We have data for the valid transitions (the observed data).
  • Where is the negative (invalid) dataset?

24.2 Negative dataset

  • Invalid transitions: all transitions that are not valid.
    • We lack a definition of them; they cannot be generated.
  • Can we collect the invalid data?
    • Teleportation violates the laws of physics (at least at the macro scale).

      teleportation.png

  • Invalid transitions never happen → an autonomous agent cannot possibly collect such data in the real world.
  • Collecting a negative dataset is fundamentally impossible

24.3 Solution: PU-Learning (Elkan & Noto, KDD '08)

Training a positive/negative classifier from positive and unlabelled datasets.

  • Positive: the Training Input. Observed data are guaranteed to be valid
  • Unlabelled: successor candidates generated by the AAE
    • Some are valid, some are invalid
  • Now we can train the AD! A sketch follows below.

    (SD = State Discriminator; explanation skipped)
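
A sketch of PU-learning following Elkan & Noto's calibration recipe, with scikit-learn's LogisticRegression standing in for the neural discriminator (an assumption; `positives` = observed valid transitions, `unlabeled` = AAE-generated candidates, both as feature arrays):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pu_train(positives, unlabeled, held_out_positives):
        X = np.vstack([positives, unlabeled])
        y = np.r_[np.ones(len(positives)), np.zeros(len(unlabeled))]
        g = LogisticRegression().fit(X, y)    # g(x) estimates P(labeled | x)
        # Calibration constant c = P(labeled | positive), on held-out positives.
        c = g.predict_proba(held_out_positives)[:, 1].mean()
        # Elkan & Noto: P(positive | x) = P(labeled | x) / c
        return lambda X_new: g.predict_proba(X_new)[:, 1] / c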

25 Planning using AMA2

The AAE enumerates the candidates; the AD and SD filter out the invalid ones.

\begin{align*} Succ(s) = \{ t = apply(a,s) \;|\; & a \in \{0,\ldots,127\} \\ & \land AD(s,t) \geq 0.5 \\ & \land SD(t) \geq 0.5 \} \end{align*}

We have action symbols and propositional states!

Now we can run $A^*$ with the goal-count heuristic (sketch below)
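
A direct transcription of $Succ(s)$ and the goal-count heuristic (`aae_apply`, `AD`, `SD` stand in for the trained networks; the names are mine):

    def successors(s, aae_apply, AD, SD):
        candidates = [aae_apply(a, s) for a in range(128)]   # one per action label
        return [t for t in candidates
                if AD(s, t) >= 0.5 and SD(t) >= 0.5]         # keep plausible ones

    def goal_count(state, goal):
        # Goal-count heuristic: number of propositions differing from the goal.
        return sum(1 for p, q in zip(state, goal) if p != q)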

26 AMA2 Experiments 1

Is it feasible to do planning with the AAE and AD?

  • 100 instances for each domain
    • self-avoiding random walks from the goal state
    • (benchmark A) 7-step
    • (benchmark B) 14-step
  • 180 sec. time limit
  • Domain-specific plan validators.

26.1 Results

Noise is applied to the planning inputs (init/goal images)

G: Gaussian noise, s/p: salt/pepper noise

  • Easy instances: the majority of instances are solved
  • Harder instances: still, many instances are solved

step         7    7    7   14   14   14
noise      std    G  s/p  std    G  s/p
MNIST       72   64   64    6    4    3
Mandrill   100  100  100    9   14   14
Spider      94   99   98   29   36   38
LightsOut  100   99  100   59   60   51
Twisted LO  96   65   98   75   68   72

27 AMA2 Experiments 2

How accurate are Action Discriminators?

Measure the type-1 / type-2 errors in %

           type-1  type-2
MNIST        1.55    6.15
Mandrill     1.10    2.93
Spider       1.22    4.97
L. Out       0.03    1.64
Twisted      0.02    1.82
Hanoi        0.25    3.79

(AD type-1) Generate all valid transitions and count the number of misclassifications.

(AD type-2) For 1000 randomly selected valid states, generate all successors, prune them with the SD, remove the valid transitions (using the validator), then count the transitions misclassified as valid.

  • Reasonably accurate.

28 Conclusion

  • Latplan: the first system which completely automatically generates a classical, symbolic planning model from scratch
  • State AutoEncoder (SAE): real-world states → propositional states

    Action AutoEncoder (AAE): transitions → action labels, effects

    Action Discriminator (AD): transition → bool, via PU-Learning

    Two of the major grounding problems are solved!

    Types of symbols
    Propositional symbols   Solved!
    Action symbols          Solved!
    Object symbols          Computer Vision techniques?
    Predicate symbols       ???

    Future work: improved accuracy, runtime efficiency.

    This opens many avenues for future research.

28.1 Future Work (Input format)

Latplan is an architecture: any system with an SAE is an implementation

Different SAE → reason about different types of raw data

  • Autoencoders for text, audio [Li 2015, Deng 2010]
  • Example:
    • Transition rule "This is an apple, this is a pen → oh, ApplePen!"
  • When this happens, Latplan brings AI to a new level e.g.

    1000 steps of natural language reasoning with logic, not by reflex

    • Fundamentally impossible for short-sighted / greedy agents

"A hierarchical neural autoencoder for paragraphs and documents." (2015)

"Binary coding of speech spectrograms using a deep auto-encoder." (2010)

28.2 Future Work (Extended planning formalism)

  • Latplan assumes nothing about the environment machinery (grids, movable tiles…)
  • Latplan assumes fully-observable, deterministic domains
  • Next step: Extending Latplan to MDP, POMDP
    • Gumbel-Softmax layer → just a Softmax layer? (probability)
    • $AO^*$, $LAO^*$algorithms for optimal planning under uncertainty

28.3 Future Work (Propositional → First-order)

SAE can generate propositional symbols (state $s = \{q,r\ldots\}$)

  • 1st-order logic (predicate $p(a,b)$)
  • We need object recognition from images (parameters $a,b$)
  • Combining the SAE with networks for object recognition (e.g. R-CNN) should achieve this

28.4 Future Work (Knowledge extraction)

AMA2 (AAE/AD) is a neural network, thus the preconditions/effects are in a black box

This hinders deriving heuristic functions

Possible remedies: knowledge extraction by Konidaris et al. ('14, '15), or δ-ILP (DeepMind)?

29 Appendix

Using the Remaining Time (Discussion)

29.1 Konidaris et al. (2014, 2015):

Structured input (e.g. light switch, x/y-distance) ↔ unstructured images

Action symbols (move, interact)

Just converts a Semi-MDP model to a PDDL model.

Could be used for extracting PDDL from the AAE/AD

29.2 NNs for solving combinatorial tasks:

TSP (Hopfield and Tank 1985)

Neurosolver for ToH (Bieszczad and Kuchar 2015)

Their inputs/outputs are symbolic.

29.3 Other Work Combining Symbolic Search and NNs

These embed NNs inside a search to provide search control knowledge

(i.e. a node evaluation function)

Sliding-tile puzzle and Rubik’s Cube (Arfaee et al. 2011)

Classical planning (Satzger and Kramer 2013)

The game of Go (AlphaGo, Silver et al. 2016)

29.4 Deep Reinforcement Learning (DRL) (Mnih et al. 2015, DQN)

DQN assumes predetermined action symbols (↑↓←→ + buttons).

DQN relies on simulators ↔ Latplan reverse-engineers a simulator.

DQN does not work when it does not know what actions are even possible!

29.5 Other Interesting Systems

SCAN system (DeepMind)

  • Maps a continuous latent vector to a human-provided symbolic vector

    • This does not make sense; they are missing the point,
    • because they don't know what symbols are for:
    • they are for efficient, automated reasoning!
    • Floats are not efficient, and humans shouldn't be involved

δ-ILP (Inductive Logic Programming)

  • An ILP method robust to noise
    • Extract rules from the AAE/AD to form a PDDL?

29.6 Why not individual pixels? Why DL?

Systems based on individual pixels lack generalization

  • Noise / variations can make the data entirely different

    noise.png

  • We must acquire generalized features
  • = a nonlinear function that recognizes the entanglement between multiple pixels

29.7 Learning vs Planning

Main differences: Purposes and the abstraction layer

Machine Learning, Neural Networks

for Recognition, Reflex

  • Subsymbolic Input (continuous):

    Images, Audio, unstructured text

  • Soft Intelligence:

      Reflex Agent, Immediate actions

    Pavlov's dog : bell → drool

    Autonomous Driving : Pedestrian → Stop.

    Machine Translation : Sentence → Sentence

    Eval. Function for Go : board → win-rate

    ☺ Efficient 1-to-1 mapping

    ☹ Simple tasks

Deliberation, Search

for Planning, Games, Theorem Proving

  • Symbolic Input/Output

    Logic, objects, induction rules

  • Hard Intelligence by Logic:

      Multi-step strategies

    Rescue Robot : actions → help the survivor

    Theorem Proving : theorems → QED

    Compiler : program → x86 instructions

    Game of Go : stones → Win

    ☺ Ordering constraint + complex tasks

  • AlphaGo = Subsymbolic (DLNN eval. function) + Symbolic (MCTS)

29.8 Human-Competitive Systems

AlphaGo = Subsymbolic (NN eval. func) + Symbolic (MCTS)

  • However, it is domain-specific: specialized in Go; "grids" / "stones" are known
  • Huge expert trace DB: not applicable when data are scarce (e.g. space exploration)
  • Is supervised learning necessary for humans?

    True intelligence should search / collect data by itself

DQN = Subsymbolic (DLNN) + Reinforcement Learning (DLNN)

A domain-independent Atari game solver (Invaders, Pac-Man…), however:

  • RL acting: greedily follows the learned policy → no deliberation!
  • You can survive most Atari games by reflex

29.9 Latplan Advantages

Perception based on DLNN

— Robust systems augmented by the latest DL tech

Decision Making based on Classical Planning

Better theoretical guarantees than Reinforcement Learning:

Completeness (finds a solution whenever one exists), solution optimality

Decision Making Independent from Learning

Unsupervised (no labels required), Explainable (search by logic)

29.10 When does Latplan return a wrong solution?

Machine learning may contain errors (convergence is guaranteed only as $t \rightarrow \infty$, not in finite time)

  • Images → flawed symbols/model/graph
  • An optimal path on a flawed graph, or a disconnected graph

    A* retains completeness, soundness, optimality (with admissible heuristics)

  • A flawed visualized plan (noisy) / no plan found

Latplan may make wrong observations, but no wrong decisions

BTW, "correctness" is defined by error-prone human observations anyway…

(completeness, optimality) → better reliability than Reinforcement Learning

29.11 Reinforcement Learning

Both perception and decision making depend on training

  • When the learned policy is wrong,
    • the solution could be suboptimal
    • It could even fail to find solutions (incomplete agent)

… AlphaGo was unprepared for Lee Sedol’s Move 78 because it didn’t think that a human would ever play it.

Cade Metz. "In Two Moves, AlphaGo and Lee Sedol Redefined the Future." Wired, 2016.

RL may make wrong decisions.

29.12 Why symbols?

Symbols are strong abstraction mechanisms because:

Meanings do not matter

No understanding is necessary: does a symbol $X$ mean an apple or a car? It does not matter.

Logical reasoning by mechanical application of rules

  • Domain-independent planning: mystery vs. nomystery

    Logistics domains where the symbol names are mangled (truck → shark)

Composable

A latent vector is a conjunction (and)

Heuristic functions use modus ponens to derive guidance

30 This is the start of neural-symbolic AI!

latplanlogo-in-sicp.png

(yeah, I really like a catchy phrase)

Author: Masataro Asai

Created: 2018-02-01 Thu 02:03
