Semantic Exploration of Text Documents with Multi-Faceted Metadata Employing Document Embeddings: Patent Landscaping Use Case
Supervisors: Harald Sack, Hidir Aras
Research group: Information Service Engineering
Partner: Inovex GmbH
Thesis status: In progress
Start: 7 December 2018
Semantic embeddings are used in natural language processing to capture relationships between text documents. However, positions and distances in the embedding space are not easily explainable, and users can hardly interpret them as they are. Additional data dimensions incorporated into the representation of a semantic space add substantial value. This is especially the case when visually exploring large document collections, where human perception must be supported in finding patterns in the data to prevent cognitive overload. One domain where such exploration takes place is patent landscaping. Patents are a highly valuable source of technology intelligence. They exemplify the problem at hand because they are text documents with a clearly defined structure, rich metadata, and references to other patent documents. By analyzing patents, companies gain competitive advantages and steer their research and development efforts. With about 3.1 million patent applications filed worldwide in 2016 (WIPO Intellectual Property Statistics Data Center, 2018) and thousands of patent documents subject to analysis for a single domain, an effective approach that facilitates this analysis is critically important. Therefore, this work combines semantic embeddings and faceted metadata in an exploratory data visualization approach and applies it to the example task of patent landscaping.
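To make the idea of combining embeddings with faceted metadata more concrete, the following is a minimal sketch and not the method developed in this work. It assumes each document already has an embedding vector and a small set of metadata facets; the toy documents, the facet names (`year`, `applicant`), and the helper functions `cosine` and `explore` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy collection: each document carries an embedding vector
# and a dictionary of metadata facets (names are illustrative assumptions).
DOCS = [
    {"id": "P1", "vec": [1.0, 0.0], "facets": {"year": 2016, "applicant": "A"}},
    {"id": "P2", "vec": [0.8, 0.6], "facets": {"year": 2016, "applicant": "B"}},
    {"id": "P3", "vec": [0.0, 1.0], "facets": {"year": 2015, "applicant": "A"}},
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def explore(docs, query_vec, **facet_filter):
    """Keep documents matching every given facet value, then rank the
    remaining documents by similarity to the query vector (most similar first)."""
    hits = [
        d for d in docs
        if all(d["facets"].get(key) == value for key, value in facet_filter.items())
    ]
    return sorted(hits, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)


if __name__ == "__main__":
    # Restrict the view to one facet value, then order by semantic closeness.
    for doc in explore(DOCS, [1.0, 0.0], year=2016):
        print(doc["id"], doc["facets"])
```

In this sketch the facets narrow the set of documents under consideration, while the embedding determines how the remaining documents are ordered. A visual exploration tool could use the same two ingredients differently, for example by mapping facets to color or grouping.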