Topic 4590

Efficient Uncertainty Aware Latent Model-Based Optimization

Thesis Information

Thesis type: Master
Supervisor: Mohammd Karam Daaboul
Research group: Angewandte Technisch-Kognitive Systeme
Partner: FZI
Archive number: 4590
Thesis status: Open
Start: 15 November 2020
Submission: unknown

Further Information

Model-Based Reinforcement Learning (MBRL) methods learn a model of the dynamics of the environment and then perform policy optimization within this model. These methods are more sample-efficient than their model-free counterparts, so they can be applied to real-world tasks where low sample complexity is crucial. On the other hand, model-based methods have to learn a global model of the system, which can be extremely challenging for complex robotic tasks. Learning a global model requires a large and highly diverse data set. In addition, policies tend to exploit uncertain regions of the model, where its predictions usually do not match the real environment. This mismatch then leads to catastrophic failures and poor performance in the real world.

Another significant problem with MBRL is that most algorithms learn directly from the state space of the environment and do not work well with high-dimensional input spaces, such as images. Visual control tasks, such as a robot learning to stack Lego cubes with only camera images as input, complicate the problem because the input space becomes high-dimensional and complex. A promising approach to dealing with high-dimensional observations is to learn a compact representation that summarizes them and can be used as a state.
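The basic MBRL loop described above (collect data, learn a dynamics model, optimize a policy inside that model) can be illustrated with a minimal sketch. Everything in it is an assumption chosen for illustration and is not part of the thesis topic itself: the one-dimensional toy environment `true_step`, the linear model form, the proportional policy `action = -gain * state`, and the grid of candidate gains.

```python
import random

def true_step(state, action):
    # Hypothetical toy environment; its dynamics are unknown to the agent.
    return state + 0.5 * action

def collect_data(num_samples, rng):
    # Gather random (state, action, next_state) transitions from the environment.
    data = []
    for _ in range(num_samples):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        data.append((s, a, true_step(s, a)))
    return data

def fit_linear_model(data):
    # Least-squares fit of s' = w_s * s + w_a * a via the 2x2 normal equations.
    sss = sum(s * s for s, a, y in data)
    ssa = sum(s * a for s, a, y in data)
    saa = sum(a * a for s, a, y in data)
    ssy = sum(s * y for s, a, y in data)
    say = sum(a * y for s, a, y in data)
    det = sss * saa - ssa * ssa
    w_s = (ssy * saa - say * ssa) / det
    w_a = (say * sss - ssy * ssa) / det
    return w_s, w_a

def model_rollout_cost(w_s, w_a, gain, start_state=1.0, horizon=10):
    # Roll out the policy inside the learned model only (no environment calls)
    # and accumulate the squared distance of the state from the goal at 0.
    state, cost = start_state, 0.0
    for _ in range(horizon):
        action = -gain * state
        state = w_s * state + w_a * action
        cost += state * state
    return cost

rng = random.Random(0)
w_s, w_a = fit_linear_model(collect_data(200, rng))

# "Policy optimization" here is a simple search over candidate gains.
candidate_gains = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
best_gain = min(candidate_gains, key=lambda k: model_rollout_cost(w_s, w_a, k))
```

In this toy case the model is exact, so the policy found in the model also works in the real environment. The difficulties described above arise when the model is only accurate in the regions covered by the data.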

Task

The goal of this thesis is to use MBRL to train an efficient policy for robot tasks. We will try to solve four challenges:

Collect a lot of diverse data to learn a global model.
Reduce the effect of model uncertainty on policy optimization by penalizing the policy for visiting regions where the model uncertainty is high.
Learn a smooth, structured embedding of the observation space.
Train a policy to drive a robot optimally to different goals in the state space.
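
One common way to realize the uncertainty penalty in the second challenge is to train an ensemble of dynamics models and treat their disagreement as an uncertainty estimate. The following is a minimal sketch under that assumption. The function names, the hand-built toy ensemble, and the `penalty_weight` parameter are hypothetical and do not describe the method the thesis will ultimately use.

```python
import statistics

def make_linear_model(state_weight):
    # Hypothetical 1-D dynamics model: s' = state_weight * s + 0.5 * a.
    def model(state, action):
        return state_weight * state + 0.5 * action
    return model

def ensemble_disagreement(models, state, action):
    # Standard deviation of the next-state predictions across the ensemble.
    predictions = [model(state, action) for model in models]
    return statistics.pstdev(predictions)

def penalized_reward(reward, models, state, action, penalty_weight=1.0):
    # Subtract a penalty proportional to the ensemble disagreement.
    disagreement = ensemble_disagreement(models, state, action)
    return reward - penalty_weight * disagreement

# Toy ensemble whose members agree at state 0 and diverge as |state| grows,
# mimicking models that are reliable only near the training data.
models = [make_linear_model(w) for w in (0.9, 1.0, 1.1)]

reward_near_data = penalized_reward(1.0, models, state=0.0, action=1.0)
reward_far_from_data = penalized_reward(1.0, models, state=10.0, action=1.0)
```

With this shaping, a policy optimized inside the model gains less from visiting states where the ensemble members disagree, which discourages it from exploiting model errors.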

References

Wang, T. et al. “Benchmarking Model-Based Reinforcement Learning.”
Nagabandi, A. et al. “Deep Dynamics Models for Learning Dexterous Manipulation.”
Kurutach, T. et al. “Model-Ensemble Trust-Region Policy Optimization.”
Fu, J. et al. “EX2: Exploration with Exemplar Models for Deep Reinforcement Learning.”
Finn, C. et al. “Deep Spatial Autoencoders for Visuomotor Learning.”
Lee, A. et al. “Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model.”
Florensa, C. et al. “Automatic Goal Generation for Reinforcement Learning Agents.”

We Offer

an interdisciplinary research environment with partners from science and industry
constructive cooperation with bright, motivated employees
a comfortable working atmosphere

We Expect

Knowledge of artificial intelligence and machine learning
Ability to implement both state-of-the-art and experimental algorithms
Good Python knowledge
High creativity and productivity
Experience with Reinforcement Learning is an advantage

Required Documents

current grade report
CV

Contact

Mohammd Karam Daaboul