Thema4536
Version of 27 November 2019, 16:38
Thesis type: Bachelor, Master
Supervisor: Mohammd Karam Daaboul
Research group: Angewandte Technisch-Kognitive Systeme
Partner: FZI Forschungszentrum Informatik
Start: 27 November 2019
Reinforcement learning has achieved remarkable results in areas such as simulated robotics and playing Atari video games. Because reinforcement learning agents learn by trial-and-error exploration, training in the real world could result in undesirable actions that damage the system and its environment.

To train a reinforcement learning agent that uses a function approximator such as a neural network, the agent interacts with its environment, which returns a reward if the chosen action was good and a penalty if it was not. This reward signal is used to train the network. During training, the cumulative reward per episode should increase and the training loss should decrease and stabilize. A poor choice of hyperparameters, such as the network architecture, can lead to poor results, so the hyperparameters may have to be tuned several times to obtain a better result, and every change requires restarting the training.

A major challenge in training an agent in the real world is learning from limited samples. Most real systems are slow, fragile, or expensive to operate, so the data they generate is costly and policy learning must be data-efficient. Off-policy reinforcement learning addresses this by reusing experience collected by earlier policies for sample-efficient learning.
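As a minimal sketch of the interaction loop and of off-policy learning from stored experience, the Python example below uses a hypothetical one-dimensional "road" environment (`ChainRoad`, cells 0 to 5, destination at the last cell) and tabular Q-learning instead of a neural network. A uniformly random behavior policy stands in for the "prior policies": its transitions are stored in a replay buffer, and Q-learning then learns a greedy driving policy from that buffer alone, because its update target uses the maximum over next actions rather than the action the behavior policy actually took. All names, rewards and hyperparameters are illustrative assumptions, not the method of this thesis.

```python
import random
from collections import deque


class ChainRoad:
    """Hypothetical toy environment: a 1-D road with cells 0..length-1.

    The car starts at cell 0; the destination is the last cell.
    Actions: 0 = move left, 1 = move right (clipped at the road ends).
    Reward: +1.0 on reaching the destination, -0.01 per step otherwise.
    """

    def __init__(self, length=6):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else -0.01
        return self.pos, reward, done


def collect_experience(env, episodes, max_steps, rng):
    """Run a uniformly random behavior policy and store (s, a, r, s', done) tuples."""
    buffer = deque(maxlen=10_000)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = rng.randrange(2)
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            if done:
                break
    return buffer


def q_learning_from_buffer(buffer, n_states, n_actions=2, gamma=0.95,
                           alpha=0.1, updates=20_000, rng=None):
    """Off-policy Q-learning on stored transitions.

    The target r + gamma * max_a' Q(s', a') evaluates the greedy policy,
    independent of which policy generated the transition.
    """
    rng = rng or random.Random(0)
    q = [[0.0] * n_actions for _ in range(n_states)]
    data = list(buffer)
    for _ in range(updates):
        s, a, r, s2, done = rng.choice(data)  # sample a stored transition
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Pick the action with the highest Q-value in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    env = ChainRoad(length=6)
    buffer = collect_experience(env, episodes=200, max_steps=50,
                                rng=random.Random(42))
    q = q_learning_from_buffer(buffer, n_states=env.length, rng=random.Random(0))
    # The last cell is terminal, so only cells 0..4 are shown.
    print("greedy action per cell:", greedy_policy(q)[:-1])
```

Because the learner does not need the behavior policy to be good, the same buffer could in principle be reused for further training runs, for example after changing hyperparameters, without collecting new data from the system.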
The goal of this work is to use an off-policy reinforcement learning algorithm to efficiently train a driving policy that drives a car safely to its destination.
We offer:
- An interdisciplinary research environment with partners from science and industry
- Constructive cooperation with bright, motivated colleagues
- A comfortable working atmosphere
Requirements:
- Knowledge of artificial intelligence and machine learning
- Ability to implement both state-of-the-art and experimental algorithms
- Good knowledge of Python
- High creativity and productivity
- Experience with reinforcement learning is an advantage
- current grade report
Mohammd Karam Daaboul