Topic 4870

Using Large Language Models for Data Augmentation

Thesis Information

Thesis type: Bachelor, Master
Supervisor: Nicholas Popovic
Research group: Web Science

Archive number: 4870
Thesis status: Open
Start: 01 March 2022
Submission: unknown

Further Information

Topic

Large language models, such as GPT-3 [1], generate natural language output in response to a prompt and have been shown to perform well on few-shot learning tasks. One key issue that makes these models impractical to use, however, is their size: models with hundreds of billions of parameters require specialized and expensive hardware. For example, the weights of GPT-3 require hundreds of GB of GPU memory, while current high-end consumer GPUs typically have at most 24 GB of memory.
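The memory figure can be checked with a rough calculation. The sketch below assumes the 175 billion parameters reported for GPT-3 in [1] and counts only the weights (no activations or optimizer state). The 16-bit and 32-bit storage formats and the helper name `weight_memory_gb` are illustrative assumptions, not taken from the topic description.

```python
import math

GB = 10**9  # decimal gigabyte, sufficient for a rough estimate


def weight_memory_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to hold the model weights alone, in GB."""
    return num_parameters * bytes_per_parameter / GB


GPT3_PARAMETERS = 175 * 10**9  # parameter count reported in [1]
CONSUMER_GPU_GB = 24           # high-end consumer GPU, as stated above

fp16_gb = weight_memory_gb(GPT3_PARAMETERS, 2)  # assumption: 16-bit weights
fp32_gb = weight_memory_gb(GPT3_PARAMETERS, 4)  # assumption: 32-bit weights

# Number of 24 GB cards needed just to store the 16-bit weights.
gpus_needed = math.ceil(fp16_gb / CONSUMER_GPU_GB)

print(f"fp16 weights: {fp16_gb:.0f} GB")
print(f"fp32 weights: {fp32_gb:.0f} GB")
print(f"24 GB GPUs needed for fp16 weights: {gpus_needed}")
```

Even at 16-bit precision, the weights alone would span more than a dozen consumer GPUs, which is why training a smaller task-specific model is attractive.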

The idea of data augmentation is to artificially increase the amount of training data available for a given task by automatically creating new training examples. The larger training corpus can then be used to train a considerably smaller model for the task at hand. The goal of the proposed thesis is to start from a small set of labeled examples and use a large language model, such as GPT-NeoX-20B [2], to perform data augmentation for relation extraction, that is, the task of detecting relations between entities (such as a person or an organization) mentioned in a text.
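One possible way to set this up is few-shot prompting: the labeled seed examples are serialized into a prompt, the language model continues the pattern, and the continuation is parsed back into a new training example. The sketch below illustrates only the prompt construction and parsing steps. The field layout, the names `RelationExample`, `build_prompt` and `parse_completion`, and the filtering rule are all hypothetical choices, and the model call itself is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RelationExample:
    sentence: str
    head: str      # first entity
    relation: str  # relation label
    tail: str      # second entity


def format_example(ex: RelationExample) -> str:
    # Hypothetical serialization; any consistent layout would work.
    return (
        f"Relation: {ex.relation}\n"
        f"Head: {ex.head}\n"
        f"Tail: {ex.tail}\n"
        f"Sentence: {ex.sentence}\n"
    )


def build_prompt(seed_examples: List[RelationExample], relation: str) -> str:
    """Serialize the seeds, then open a new example for the model to finish."""
    blocks = [format_example(ex) for ex in seed_examples]
    blocks.append(f"Relation: {relation}\nHead:")
    return "\n".join(blocks)


def parse_completion(relation: str, completion: str) -> Optional[RelationExample]:
    """Turn the model's continuation into an example, or None if unusable."""
    fields = {}
    for line in ("Head:" + completion).splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key == "Relation":
            break  # a blank or unlabeled line, or the start of a next example
        if key in ("Head", "Tail", "Sentence"):
            fields.setdefault(key, value.strip())
    head = fields.get("Head")
    tail = fields.get("Tail")
    sentence = fields.get("Sentence")
    if not (head and tail and sentence):
        return None
    # Simple quality filter: both entities must be mentioned in the sentence.
    if head not in sentence or tail not in sentence:
        return None
    return RelationExample(sentence, head, relation, tail)


seeds = [
    RelationExample("Albert Einstein was born in Ulm.",
                    "Albert Einstein", "place_of_birth", "Ulm"),
]
prompt = build_prompt(seeds, "place_of_birth")

# Hand-written stand-in for a model continuation, not real model output.
completion = " Marie Curie\nTail: Warsaw\nSentence: Marie Curie was born in Warsaw.\n"
new_example = parse_completion("place_of_birth", completion)
print(prompt)
print(new_example)
```

Discarding continuations whose entities do not occur in the generated sentence is only a basic safeguard; how to filter noisy generations more reliably is left open here.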

An example of GPT-3 being used for data augmentation can be found in [3].

Prerequisites

Hands-on experience in machine learning and a willingness to implement neural network models (under the guidance of the supervisors).

[1] https://arxiv.org/pdf/2005.14165.pdf

[2] http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf

[3] https://aclanthology.org/2021.findings-emnlp.192.pdf