Classical Planning in Deep Latent Space: From Unlabelled Images to PDDL (and back)
Masataro Asai, Alex Fukunaga, The University of Tokyo
20+ min
Made by guicho2.71828 (Masataro Asai)
1 Hello, Machine Learning people!
First of all, hello, machine learning people. I'm not a machine learning person; I usually study a kind of symbolic AI, and I came here expecting interesting conversations with machine learning professionals.
2 Hello, Logic Programming, Data Mining, AGI, Theoretical CS people, too!
And also, hello other symbolic AI people. Although I am not doing exactly the same thing, we have common roots.
Let me ask you one question before presenting my paper…
3 How many people have learned AI with these books?
- Nice.
4 I am from the Planning community
International Conference on Automated Planning and Scheduling
(AAAI sister conference, 33% avg. accept ratio, since 1990, ECP+AIPS→ICAPS)
5 Planning people are working on Automated Planners
- Compute the high-level plan for an agent: plan = $(action_1, action_2, \ldots, action_n)$
- Search-based planning projects many steps further into the future than RL
- ↔ reflex (0-step), limited deliberation (1,2-steps)
- High-dimensional, implicit state space search
- Entire space does not fit in memory, only transition rules are given
- STRIPS-like PDDL modelling language, based on
- 1st-order logic, closed-world assumption
- Soundness, completeness, optimality matters
- Expensive Applications: Satellite operations (Deep Space 1), Mars Exploration Rover (MER), underwater vehicles
6 High-level Planning: Sliding tile puzzle (a.k.a 8-puzzle)
- Slide a tile to an empty place (each step)
- → make the tile placement identical to the goal
- → (abstracts the low level control e.g. arm motion)
Search states represented by conjunctions of 1st-order logic (objects, predicates, propositions)
(empty x0 y0)      ; or, empty(x0, y0)
(is panel6 x1 y0)  ; or, is(panel6, x1, y0)
(up y0 y1)...
(down y1 y0)...
(right x0 x1)...
(left x0 x1)...
initial state
goal state
To symbolic systems, names do not matter (they matter only to humans), as long as they are used consistently
(car banana0 kiwi0)          ; numbers (0,1) don't matter either
(shark bogo6 banana1 kiwi0)  ; just to make sense for us
(apple kiwi0 kiwi1)...
(pinapple kiwi1 kiwi0)...
(pen banana0 banana1)...
(applepen banana0 banana1)...
6.1 High-level Planning Task: State Transitions
How to represent "sliding up tile 7" ? → actions
Transition Rules of "sliding up":
- You can slide up a tile only to an empty location.
- When you slide it, the original location becomes empty.
- The new location is no longer empty; it is occupied by the tile.
initial state
When $Empty(x, y_{old}) \wedge is(panel, x, y_{new}) \wedge up(y_{new}, y_{old})$;
then $\neg Empty(x, y_{old}) \wedge Empty(x, y_{new}) \wedge \neg is(panel, x, y_{new}) \wedge is(panel, x, y_{old})$
;; Translates to the PDDL model below:
(:action slide-up ...
 :precondition (and (empty ?x ?y-old) (is ?panel ?x ?y-new) ...)
 :effect (and (not (empty ?x ?y-old)) (empty ?x ?y-new)
              (not (is ?panel ?x ?y-new)) (is ?panel ?x ?y-old)))
But where does this representation come from?
6.2 Solving these problems: Heuristic state-space search
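As a reference point, here is a minimal textbook A* search loop in Python (illustrative only, not LatPlan code; off-the-shelf planners implement far more sophisticated versions with domain-independent heuristics). With an admissible heuristic `h`, the returned plan is optimal.

```python
import heapq
from itertools import count

def astar(init, goal, successors, h):
    """Textbook A*: successors(s) yields (action, next_state, cost);
    states must be hashable (e.g. tuples); h is an admissible heuristic."""
    tie = count()                                  # tie-breaker so the heap never compares states
    frontier = [(h(init), next(tie), 0, init, [])]
    best_g = {init: 0}
    while frontier:
        f, _, g, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan
        if g > best_g.get(state, float("inf")):
            continue                               # stale queue entry
        for action, succ, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), next(tie), g2, succ, plan + [action]))
    return None                                    # no plan exists
```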
6.3 High-level Planning Task: State Transitions
How to represent "sliding up tile 7" ? → actions
Transition Rules of "sliding up":
- You can slide up a tile only to an empty location.
- When you slide it, the original location becomes empty.
- The new location is no longer empty; it is occupied by the tile.
initial state
When $Empty(x, y_{old}) \wedge is(panel, x, y_{new}) \wedge up(y_{new}, y_{old})$;
then $\neg Empty(x, y_{old}) \wedge Empty(x, y_{new}) \wedge \neg is(panel, x, y_{new}) \wedge is(panel, x, y_{old})$
;; Translates to the PDDL model below:
(:action slide-up ...
 :precondition (and (empty ?x ?y-old) (is ?panel ?x ?y-new) ...)
 :effect (and (not (empty ?x ?y-old)) (empty ?x ?y-new)
              (not (is ?panel ?x ?y-new)) (is ?panel ?x ?y-old)))
But where does this representation come from?
6.4 PDDL representations come from human encoding.
Knowledge-Acquisition Bottleneck (Cullen, 1988):
The cost of the human effort involved in converting real-world problems into inputs for domain-independent symbolic systems
Symbols: Labels for identifiable entities (it's not just about objects!)
Types of symbols:
| Type | Example |
|---|---|
| Object symbols | panel7, x0, y0 … |
| Predicate symbols | (empty ?x ?y), (up ?y0 ?y1) |
| Propositions | p28 = (empty x0 y0) (a predicate application) |
| Action symbols | (slide-up panel7 x0 y1) |
Action Model: Description of actions using symbols
When $Empty(x, y_{old}) \wedge \ldots$; then $\neg Empty(x, y_{old}) \wedge \ldots$
Cullen, J. and Bryman, A. "The knowledge acquisition bottleneck: time for reassessment?" Expert Systems, Vol. 5, No. 3 (August 1988), pp. 216–225
6.5 Can we solve an 8-puzzle optimally, w/o human encoding?
- 3x3 Image-based Sliding Tile Puzzle: maximum 31 steps optimal plan
- 4x4, 5x5: infeasible under blind search (memory exhaustion)
- Symbolic systems can compute optimal solutions w/ A* + admissible heuristics
BUT
WE
DO
NOT
HAVE
SYMBOLS
NOR
AN ACTION MODEL!
7 Background
Survey of Existing Action Model Acquisition Techniques
i.e. Systems that find action models
7.1 Limitations of Existing Systems
So far, ALL existing AMA systems require symbolic / near-symbolic, accurate state inputs and/or discrete action labels.
i.e. They need symbols to find an action model
7.2 ARMS (Action-Relation Modelling System) (Yang AIJ07)
Taking the symbolic inputs
7.3 LOCM (ICAPS09)
Taking the symbolic inputs
7.4 Framer (ICAPS17)
Near-Symbols : Parses natural language sentences with a clear grammatical structure.
- Alleviates the burden of domain experts, but still requires human
Not handling "Natural Language":
Pick up that parcel over there … yeah, it has a label on it, it says Parcel1, you can see it from here, the Location B. Then put it in the car, I mean the truck, the red one.
7.5 Konidaris, Kaelbling (AAAI14, IJCAI15)
"Constructing Symbolic Representations for High-Level Planning" (AAAI14)
- What it does
Converting a Semi-MDP Model to a PDDL Model by set-theoretic representation
i.e. Model-to-Model conversion, not generating a model from scratch
- Semi-MDP contains Action Labels: move and interact (the Playroom domain)
- Sensor inputs are structured (labels for "state variables" are known)
x/y-distance, light level, whether a monkey cries
→ Each sensor has a distinct meaning (no overlap)
7.6 Learning from Observation (Argall AIJ09, Mourao UAI12) skip
Noisy, incomplete, but symbolic states/actions
7.7 Learning from Video for Board Game (Bardu ICRA10; Kaiser AAAI12; Kirk 16) skip
- Handles Images, but with strong assumptions (almost symbol)
e.g. Understands Tic-Tac-Toe with 3x3 Ellipse Detector (Bardu 10)
Almost immediately provides propositions
Domain-dependent ("3x3 grid" is hard-coded)
8 Problem Description
We do not have SYMBOLS nor ACTION MODELS
So we cannot apply existing AMA methods!
8.1 Task: Solving Image-Based 8-puzzle w/o Prior Explicit Knowledge
No Prior Knowledge : labels/symbols such as "9 tiles", "moving"
8.2 Task: Solving Image-Based 8-puzzle w/o Prior Explicit Knowledge
No Prior Knowledge : labels/symbols such as "9 tiles", "moving"
8.3 Task: Solving ANY Image-Based Tasks w/o Prior Explicit Knowledge
No Prior Knowledge : Domain-independent Image-based planner
Tower of Hanoi
Lights-Out
8.4 Input1: Training Inputs – Image Pairs
Image pairs showing the before/after states of valid actions (randomly sampled)
8.5 Input1: Training Inputs – Image Pairs
8.6 Input2: Planning Inputs – Initial Image & Goal Image
Visual depiction of the initial state and a single goal state
8.7 Task: Solving ANY Image-Based Tasks w/o Prior Explicit Knowledge
8.8 Task: Solving ANY Image-Based Tasks w/o Prior Explicit Knowledge
9 Our Core Contribution : State AutoEncoder (SAE)
SAE is a neural network which provides two functions:
- $b = Encode(r)$: Function that maps a raw datum $r$ to a bit vector $b$ (propositions)
- $\tilde{r} = Decode(b)$: Function that maps a bit vector $b$ to a raw datum $\tilde{r}$
- A bidirectional mapping between subsymbolic & symbolic representation
9.1 AutoEncoder : Unsupervised Learning
Auto = "self" — Autoencoding = "encoding itself"
- Target Function: Identity $x=f^*(x)$
- Compression: $X \rightarrow Z \rightarrow X$
- Map the input $x\;$to a latent vector $z$, then back to $x$
- Extract the de/compressor: $Encode: x \mapsto z, Decode: z \mapsto x$
→ However, ✘ Latent vector is real-valued
INCOMPATIBLE to classical planning
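For reference, a bare-bones dense autoencoder in Keras (illustrative code with made-up layer sizes, not the paper's network). The latent $z$ it produces is real-valued, which is exactly the incompatibility noted above.

```python
from tensorflow import keras
from tensorflow.keras import layers

x = keras.Input(shape=(1764,))                    # flattened 42x42 image
z = layers.Dense(32, activation="relu")(x)        # real-valued latent vector
y = layers.Dense(1764, activation="sigmoid")(z)   # reconstruct the input
autoencoder = keras.Model(x, y)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# The training target is the input itself: autoencoder.fit(images, images, ...)
```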
9.2 Gumbel-Softmax VAE (Jang, Gu, ICLR2017)
A reparametrization trick for categorical distribution: 1-hot vector e.g. $\langle 0,0,0,1,0,0 \rangle$.
Below: an MNIST image represented with 30 variables of 8 categories each.
Key idea: These categorical variables are directly usable
as a source of PDDL/SAS
In particular, 2 categories → propositional variables (true/false)
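To make the trick concrete, here is a minimal NumPy sketch of Gumbel-Softmax sampling (my own illustrative code, not the paper's implementation; the temperature annealing and straight-through details of Jang & Gu are omitted).

```python
import numpy as np

def gumbel_softmax(logits, temperature=1.0, rng=np.random):
    """Sample approximately one-hot categorical vectors from unnormalized logits."""
    # Gumbel(0,1) noise via the inverse-CDF trick (epsilons avoid log(0))
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u + 1e-20) + 1e-20)
    y = (logits + g) / temperature
    y = np.exp(y - y.max(axis=-1, keepdims=True))   # numerically stable softmax
    return y / y.sum(axis=-1, keepdims=True)

# e.g. 30 categorical variables with 8 categories each, as in the MNIST example:
z = gumbel_softmax(np.random.randn(30, 8), temperature=0.5)
# As the temperature approaches 0, each row approaches a one-hot vector (a discrete symbol).
```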
9.3 State Autoencoder (before training)
9.4 State Autoencoder (after training)
10 Using the SAE, we propose LatPlan, an architecture,
and LatPlan α, an implementation
10.1 Step 1: State Autoencoder (Core Contribution)
A neural network bridging the Symbolic/Subsymbolic boundary
10.2 Step 2: Action Model Acquisition (NOT our contribution)
Bitvectors → PDDL domain description
10.2.1 Our PDDL generation in LatPlan α (NOT our contribution)
Placeholder for existing/future Action Model Acquisition methods
Individual actions are mapped to PDDL actions (trivial Action Model Acquisition)
0011 → 0101
↓
(:action ...
 :precondition (and (b0-false) (b1-false) (b2-true) (b3-true)) ; before-state
 :effect (and (not (b1-false)) (b1-true)                        ; state diff
              (not (b2-true)) (b2-false)))
Conversion from a bit to a proposition:
$i$-th bit is 1 → Proposition ($b_i$-true)
$i$-th bit is 0 → Proposition ($b_i$-false)
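For illustration, a small Python sketch of this trivial encoding (a hypothetical helper, not the authors' code): each observed bit-vector transition becomes one grounded PDDL action, mirroring the 0011 → 0101 example above.

```python
def transition_to_pddl_action(before, after, name):
    """Trivial AMA (sketch): one observed bit-vector transition -> one grounded action."""
    def prop(i, bit):
        return f"(b{i}-true)" if bit else f"(b{i}-false)"
    pre = " ".join(prop(i, b) for i, b in enumerate(before))
    eff = " ".join(f"(not {prop(i, b)}) {prop(i, a)}"
                   for i, (b, a) in enumerate(zip(before, after)) if b != a)
    return (f"(:action {name} :parameters ()\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {eff}))")

print(transition_to_pddl_action([0, 0, 1, 1], [0, 1, 0, 1], "a0011-0101"))
```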
Disclaimer: We have a new NN-based AMA method (not yet disclosed)
10.2.2 Example PDDL
(define (domain latent)
 (:requirements :strips)
 (:predicates (b0-true) (b0-false) (b1-true) ... (b24-false))
 (:action a10000010010110111100011111000010001011111110011111
  :parameters ()
  :precondition (and (b0-true) (b1-false) (b2-false) ... (b24-true))
  :effect (and (not (b5-false)) (b5-true)
               (not (b6-true)) (b6-false)
               (not (b13-false)) (b13-true)
               (not (b20-false)) (b20-true)))
 (:action a10000010010110111100011110000001001011011110001110 ...
10.2.3 Step 2: Action Model Acquisition (NOT our contribution)
Bitvectors → PDDL domain description
10.3 Step 3: Solve the PDDL instance w/ off-the-shelf planner
10.4 Step 4: Executing the symbolic plan
We get a bit-vector state sequence that is incomprehensible to humans
10.5 Step 5: Mapping the plan to images (human-comprehensible)
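Putting Steps 1–5 together, a high-level sketch of the LatPlan α pipeline (the callables `sae`, `transitions_to_pddl`, and `run_planner` are hypothetical stand-ins for the components described above, passed in as arguments).

```python
def latplan_alpha(sae, transitions_to_pddl, run_planner,
                  image_pairs, init_image, goal_image):
    """End-to-end sketch; the three callables stand in for Steps 1-3."""
    # Step 2: encode observed image pairs and emit a PDDL domain (trivial AMA)
    transitions = [(sae.encode(pre), sae.encode(suc)) for pre, suc in image_pairs]
    domain = transitions_to_pddl(transitions)
    # Step 3: encode the planning inputs and invoke an off-the-shelf classical planner
    init, goal = sae.encode(init_image), sae.encode(goal_image)
    bit_plan = run_planner(domain, init, goal)
    # Steps 4-5: decode the returned bit-vector states back into images
    return [sae.decode(state) for state in bit_plan]
```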
11 Results (MNIST 8-puzzle)
8-puzzle using digits from MNIST database
An instance whose optimal solution length is known
→ 31 step optimal plan
11.1 Results with photographic, unseparated tiles (Mandrill 8-puzzle)
The MNIST 8-puzzle has cleanly separated objects → this domain does not.
11.2 Results with photographic, unseparated tiles (Mandrill 8-puzzle)
→ Optimal Solution
11.3 Tower of Hanoi (3 disks, 4 disks)
A completely different puzzle can be solved with no change
→ Optimal Solution (7 steps,15 steps)
11.4 Lights Out
A completely different puzzle can be solved with no change
→ Optimal Solution
11.5 Twisted Lights Out
Does not assume grid-like structures
→ Optimal Solution
11.6 Handling the Noisy Input
→ Optimal Solutions
12 Why bother with an off-the-shelf planner? Wouldn't blind search do?
Domain-independent heuristics work, SURPRISINGLY!
This is NOT a trivial finding!
Heuristics are…
- tailored for man-made domains
- assuming a certain structure
PDBs may even sometimes lose to blind search on IPC instances (Edelkamp 12)
- Would allow solving more difficult problems (future work)
| Domain (nodes expanded) | Dijkstra | A*+PDB | Speedup |
|---|---|---|---|
| MNIST 8-puzzle | 193924 | 109096 | x2 |
| MNIST 8-puzzle | 201156 | 111642 | x2 |
| MNIST 8-puzzle | 186767 | 84561 | x2 |
| MNIST 8-puzzle | 183336 | 82518 | x2 |
| MNIST 8-puzzle | 169907 | 52084 | x3 |
| MNIST 8-puzzle | 130863 | 26967 | x5 |
| Hanoi (4 peg) | 55 | 17 | x3 |
| LightsOut (4x4) | 952 | 27 | x30 |
| Spiral LightsOut (3x3) | 522 | 214 | x2.5 |
| Mandrill 8-puzzle | 335378 | 88851 | x4 |
→ Leverage the effort from the ICAPS community!
13 Conclusion
- Input: Unlabelled pairs of images, initial image, goal image
- Output: Visualized plans to achieve the goal
- State AutoEncoder(SAE): VAE with Gumbel-Softmax
Contribution : SAE generates propositions from raw data
→ compatible with symbolic KE systems and classical planners
LatPlan α : A simplified prototype
→ Can leverage the developments of both symbolic and subsymbolic research
13.1 Future Work (input format)
LatPlan is an architecture : Any system with SAE is a LatPlan implementation
A different SAE allows LatPlan to reason about different types of raw data
- Autoencoders for unstructured text [Li et al. 2015], audio [Deng, Li, et al. 2010]
- Examples:
This is an apple, this is a pen → oh, ApplePen! (ugh, embarrassing)
when actions resembling "word concatenation" were learned
SAE + Automated Planning will bring AI to a whole new level
e.g. 1000 steps of optimal reasoning over natural language
Hire me! I'm looking for a postdoc position ☺
"A hierarchical neural autoencoder for paragraphs and documents." (2015)
"Binary coding of speech spectrograms using a deep auto-encoder." (2010)
14 Appendix
14.1 I expect mixed responses such as:
- Wait, what!? You solved the symbol grounding!? (No)
- Huh, I hate deep learning hype; NNs cannot be trusted. (LatPlan can be trusted)
- This is not action model acquisition / not finding symbols. (We don't do AMA; we find different symbols)
14.2 Did we find symbols? It doesn't sound like what I think symbols are
You solved the symbol grounding!? / This is not finding symbols. / This isn't action model acquisition.
PDDL involves several kinds of symbols, and we addressed only one of them.
Each requires a different approach; LatPlan should be combined with this existing work.
| Type of symbol | Addressed by |
|---|---|
| Propositions | SAE |
| Objects | R-CNN (Computer Vision) |
| Predicates | ??? |
| Actions | ??? |
14.3 Did we find symbols? Why not individual pixels? Why DL?
Systems based on individual pixels lack generalization
Noise / variations can make the data entirely different
- must acquire generalized features
- = a nonlinear function that recognizes the entanglement between multiple pixels
14.4 Learning vs Planning
Main differences: Purposes and the abstraction layer
Machine Learning, Neural Networks
for Recognition, Reflex
Subsymbolic Input (continuous)
Images, Audio, unstructured text:
Soft Intelligence:
Reflex Agent, Immediate actions
Pavlov's dog : bell → drool
Autonomous Driving : Pedestrian → Stop.
Machine Translation : Sentence → Sentence
Eval. Function for Go : board → win-rate
☺ Efficient 1-to-1 mapping
☹ Simple tasks
Deliberation, Search
for Planning, Game, Theorem Proving
Symbolic Input/Output
Logic, objects, induction rules
Hard Intelligence by Logic:
Multi-step strategies
Rescue Robot : actions → help the survivor
Theorem Proving : theorems → QED
Compiler : source code → x86 instructions
Game of Go : stones → Win
☺ Ordering constraint + complex tasks
- AlphaGo = Subsymbolic (DLNN eval. function) + Symbolic (MCTS)
14.5 Human-Competitive Systems
AlphaGo = Subsymbolic (NN eval. func) + Symbolic (MCTS)
- However, domain-specific – specialized in Go, "Grids" / "Stones" are known
- Huge expert trace DB — Not applicable when data are scarce (e.g. space exploration)
Is supervised learning necessary for humans?
True intelligence should search / collect data by itself
DQN = Subsymbolic (DLNN) + Reinforcement Learning (DLNN)
Domain-independent Atari game solver (Space Invaders, Pac-Man…), however:
- RL Acting: Greedily follow the learned policy → no deliberation!
- You can survive most Atari games by reflex
14.6 Latplan Advantages
Perception based on DLNN
— Robust systems augmented by the latest DL tech
Decision Making based on Classical Planning
— Better Theoretical Guarantee than Reinforcement Learning
Completeness (Finds solution whenever possible), Solution Optimality
— Decision Making Independent from Learning
Unsupervised (No data required), Explainable (Search by logic)
14.7 When does LatPlan return a wrong solution?
Machine learning may contain errors (convergence only as $t\rightarrow\infty$, not in finite time)
- Images → faulty symbols/model/graph
Optimal path on a faulty graph, or graph disconnected
A* completeness, soundness, optimality (admissible heuristics)
- Faulty visualized plan (noisy) / no plan found
LatPlan may make wrong observations, but no wrong decisions
BTW, "correctness" is defined by error-prone human observations anyway …
(completeness, optimality) → better reliability than Reinforcement Learning
14.7.1 Reinforcement Learning
Not only perception but decision making also depends on training
- Each training result does not have admissibility
- When the learned policy is wrong, the solution could be suboptimal
… AlphaGo was unprepared for Lee Sedol’s Move 78 because it didn’t think that a human would ever play it.
Cade Metz. "In Two Moves, AlphaGo and Lee Sedol Redefined the Future." Wired, 2016
RL may make wrong decisions.
14.8 Future Work (SAE)
SAE can generate propositional symbols (state $s = \{q,r\ldots\}$)
- 1st-order logic (predicate $p(a,b)$)
- We need object recognition from images (parameters $a,b$)
- SAE with networks for object recognition (e.g. R-CNN) should achieve this
14.9 SAE implementation in LatPlan α
Keras, Adam optimizer (learning rate: 0.001)
Input: 1764 (42x42)
[→ FC(4000, ReLU) → BatchNorm → Dropout(0.4)] × 2
→ FC(49, GumbelSoftmax) (variational loss)
[→ FC(4000, ReLU) → BatchNorm → Dropout(0.4)] × 2
→ Output: 1764 (42x42) (loss: binary cross-entropy)
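A minimal Keras sketch of this architecture (an assumption-laden reconstruction from the spec above, not the released code; the Gumbel-Softmax layer is abbreviated to a plain softmax placeholder, and the annealed sampling and variational loss from Jang & Gu would replace it in a real implementation).

```python
from tensorflow import keras
from tensorflow.keras import layers

def dense_block(x, units=4000, dropout=0.4):
    # FC(4000, ReLU) -> BatchNorm -> Dropout(0.4), as in the spec above
    x = layers.Dense(units, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    return layers.Dropout(dropout)(x)

inp = keras.Input(shape=(1764,))                  # flattened 42x42 image
h = dense_block(dense_block(inp))                 # encoder: two dense blocks
logits = layers.Dense(49 * 2)(h)                  # 49 binary (2-category) latent variables
logits = layers.Reshape((49, 2))(logits)
z = layers.Softmax()(logits)                      # placeholder for Gumbel-Softmax sampling
h = layers.Flatten()(z)
h = dense_block(dense_block(h))                   # decoder: two dense blocks
out = layers.Dense(1764, activation="sigmoid")(h)

sae = keras.Model(inp, out)
sae.compile(optimizer=keras.optimizers.Adam(1e-3),
            loss="binary_crossentropy")
# sae.fit(train_images, train_images, ...)        # trained on ~12000 8-puzzle images
```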
- Why fully-connected layers?
Main focus is on whether the propositions made by the SAE are feasible
→ No complications from CNNs, ResNets, etc. (but I have a working implementation)
- Training on 8-puzzle
- 12000 images out of all 362880 states → Generalization
14.10 Domain Acquisition implementation in LatPlan α (NOT our core contribution)
Entire transitions $R$ → $Encode(R)$→ PDDL
Why? → Trivial domain acquisition, no generalization
Ground actions: 0011 → 0101 (state variables are fully specified)
Generalized actions: *01* → *10* (state variables are partially specified)
- LatPlan α: No generalization w.r.t. domain acquisition
Main focus is on the SAE: whether the propositions made by the SAE are feasible
→ No complication due to domain acquisition
Developing a generalizer is an entirely different topic; we leave it to existing work
- We made flour and demonstrated a sugarless cookie with it.
Others study how to make the most beautiful piece of chocolate cake from given flour
— which is future work. (flour = symbol)
Domain acquisition typically requires existing models (such as Semi-MDP)
We made inputs for existing work.
[Konidaris et.al. 14; Cresswell et al 13]
14.11 Why symbols?
Symbols are strong abstraction mechanisms because
- Meanings do not matter
You do not have to understand it: does a symbol $X$ mean an apple or a car?
Logical reasoning can be performed by mechanical application of rules
Domain-independent planning : mystery vs nomystery
Logistics domains where symbol names are mangled (truck → shark)
- Composable
A latent vector is a conjunction (and)
Heuristic functions use modus ponens to derive guidance
14.12 More details
GTX1070, PhenomII X6 (3.4GHz OC), 16GB Mem
- Training: ~30 min
- Solving: ~3 sec
14.12.1 State AutoEncoder (Train data)
Rows: 1: $x$, 2: $z$, 3: $y$, 4: $round(z)$, 5: $Decode(round(z))$
14.12.2 State AutoEncoder (Validation)
Rows: 1: $x$, 2: $z$, 3: $y$, 4: $round(z)$, 5: $Decode(round(z))$
14.12.3 State AutoEncoder
Input 2: initial/goal images
14.12.4 PDDL Domain Definition
Example with $N=25$ (in the paper we bypassed the PDDL-SAS converter, though)
(define (domain latent)
 (:requirements :strips :negative-preconditions)
 (:predicates (z0) (z1) (z2) (z3) (z4) (z5) (z6) (z7) (z8) (z9) (z10) (z11) (z12)
              (z13) (z14) (z15) (z16) (z17) (z18) (z19) (z20) (z21) (z22) (z23) (z24))
 (:action a10000010010110111100011111000010001011111110011111
  :parameters ()
  :precondition (and (z0) (not (z1)) (not (z2)) (not (z3)) (not (z4)) (not (z5)) (z6)
                     (not (z7)) (not (z8)) (z9) (not (z10)) (z11) (z12) (not (z13))
                     (z14) (z15) (z16) (z17) (not (z18)) (not (z19)) (not (z20))
                     (z21) (z22) (z23) (z24))
  :effect (and (z5) (not (z6)) (z13) (z20)))
 (:action a10000010010110111100011110000001001011011110001110 ...
14.12.5 Results in one place
14.13 Does it have something to do with symbolic planning (BDDs) ?
No.