
Thema4679

From Aifbportal
Revision as of 9 March 2021, 16:50 by Dh1659



Efficient Deep Reinforcement Learning by Combining Variational Autoencoders with Soft Actor Critic


Moritz Nekolla



Thesis information

Thesis type: Bachelor
Supervisor: Mohammd Karam Daaboul
Research group: Angewandte Technisch-Kognitive Systeme (Applied Technical-Cognitive Systems)
Partner: FZI
Archive number: 4679
Thesis status: Completed
Start: 1 July 2020
Submission: 1 December 2020

Further information

Model-free reinforcement learning (RL) has achieved excellent results in a variety of robotic tasks. Usually, dedicated sensors are used to extract relevant information from the agent's environment, such as the agent's position and velocity and the positions of other objects. These sensors are usually expensive. Cameras, by contrast, are relatively inexpensive and provide a great deal of information about the environment. A high-capacity representation such as a neural network allows a good policy to be learned end-to-end from images; in other words, the agent infers its behavior directly from the raw image. Nevertheless, images are high-dimensional measurements of the environment and contain redundant information, which makes policy learning inefficient. Therefore, autonomous feature extraction based only on image data is necessary. To remain scalable, this feature extraction should be trainable in an unsupervised manner. In this work, we develop a combined reinforcement learning model that relies entirely on image data. A variational autoencoder (VAE) is trained to generate a dense representation of the observations, which serves as input to the agent. The VAE is trained entirely unsupervised and offline, that is, before the actual training of the reinforcement learning agent. This approach is intended to increase RL efficiency by enabling faster convergence of training compared to the end-to-end approach.
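
The encoding step can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the architecture, so the sketch assumes the standard VAE formulation in which the encoder outputs a mean and a log-variance per latent dimension and the prior is a standard normal. The function names `kl_to_standard_normal`, `sample_latent`, and `agent_input` are hypothetical, and the encoder network itself is omitted; only the latent-space computations are shown.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I).

    This is the regularization term of the standard VAE objective,
    summed over the latent dimensions (assumed formulation).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def agent_input(mu, log_var, rng=None):
    """Low-dimensional vector handed to the RL agent instead of the raw image.

    Assumption: once the VAE is trained and frozen, the posterior mean is
    used as a deterministic state; pass an rng to draw a sample instead.
    """
    if rng is None:
        return list(mu)
    return sample_latent(mu, log_var, rng)

if __name__ == "__main__":
    # Hypothetical encoder output for one image, with a 2-dimensional latent.
    mu, log_var = [1.0, 0.0], [0.0, 0.0]
    print(kl_to_standard_normal(mu, log_var))
    print(agent_input(mu, log_var))
    print(agent_input(mu, log_var, random.Random(0)))
```

In a two-phase setup like the one described above, the KL term would be used only during the offline VAE training, while `agent_input` stands in for what the agent would receive at every environment step during RL training.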