Seminar Representation Learning for Knowledge Graphs
|Lecturer(s)||Harald Sack, Mehwish Alam|
|Assessment||Each student will be assigned at most two papers on the topic. The student will give a seminar presentation and write a 15-page seminar report explaining, in their own words, the methods from at least one of the two assigned papers.|
Data representation, or feature representation, plays a key role in the performance of machine learning algorithms. In recent years, Representation Learning (RL) of words and Knowledge Graphs (KGs) into low-dimensional vector spaces has grown rapidly, as have its applications to many real-world scenarios. Word embeddings are low-dimensional vector representations of words that capture the context of a word in a document, its semantic similarity to other words, and its relations with them. Similarly, KG embeddings are low-dimensional vector representations of the entities and relations of a KG that preserve its inherent structure and capture the semantic similarity between entities.
Each embedding space exhibits different semantic characteristics depending on the source of information (e.g., text or KGs) and on the embedding algorithm used to learn it. The same algorithm, when applied to different representations of the same training data, leads to different results because the respective representations encode different features. Distributed representations of text, in the form of word and document vectors, and of KGs, in the form of entity and relation vectors, have become key elements of various natural language processing tasks such as Entity Linking and Named Entity Recognition and Disambiguation.
Different embedding spaces are generated for textual documents in different languages, so aligning these embedding spaces has become a stepping stone for machine translation. KGs, in addition to multilingualism and domain-specific information, differ structurally even within the same domain, which makes aligning KG embeddings more challenging.
In order to generate coherent embedding spaces for knowledge-driven applications such as question answering, named entity disambiguation, knowledge graph completion, etc., it is necessary to align the embedding spaces generated from different sources.
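To make the idea of aligning two embedding spaces concrete, here is a minimal sketch using orthogonal Procrustes alignment. This technique is an assumption chosen for illustration; the seminar description does not name it, and the assigned papers may use other methods. The function name `procrustes_align` and the toy data are hypothetical, and the sketch assumes that the rows of the two matrices are embeddings of the same items in the same order.

```python
import numpy as np


def procrustes_align(source, target):
    """Return the orthogonal matrix W that minimizes ||source @ W - target||_F.

    Illustrative helper (not from the seminar material). Row i of `source`
    and row i of `target` are assumed to embed the same item.
    """
    # The SVD of the cross-covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Toy data: the "target" space holds 50 items with 8-dimensional embeddings.
rng = np.random.default_rng(0)
target = rng.normal(size=(50, 8))

# Build a "source" space that is a rotated copy of the target space.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
source = target @ q.T                          # hence source @ q == target

w = procrustes_align(source, target)
aligned = source @ w  # source embeddings mapped into the target space
```

In this toy setting the two spaces differ only by a rotation, so the learned mapping recovers the target embeddings up to numerical precision. Real word and KG embedding spaces are not exact rotations of each other, and only a subset of items is usually known to correspond, which is part of what makes the alignment problem difficult.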
In this seminar, we will study state-of-the-art algorithms for aligning embedding spaces. We will focus on two types of alignment algorithms: (1) Entity-Entity alignment and (2) Entity-Word alignment.
If the authors' code is available, the work also includes re-implementing it for small-scale experiments in Python using Google Colab.
Participation is restricted to a maximum of 10 students.