
Phdthesis3000: Difference between revisions

(Page created: "{{Publikation Erster Autor |ErsterAutorNachname=Blohm |ErsterAutorVorname=Sebastian }} {{Phdthesis |Title= Large-Scale Pattern-Based Information Extraction from …")
 
 
(3 intermediate revisions by the same user are not shown)
Line 4:
  }}
  {{Phdthesis
- |Title= Large-Scale Pattern-Based Information Extraction from the World Wide Web
+ |Title=Large-Scale Pattern-Based Information Extraction from the World Wide Web
  |Instructor=Prof. Dr. Rudi Studer
  |Date=2010/01/22
Line 15:
  One particular type of Information Extraction models are textual patterns. Textual patterns are underspecified explicit descriptions of text fragments. The automatic induction of such patterns from example text fragments which are known to contain target information is a common way to learn this type of extraction models.
- This thesis explores the potential of using textual patterns for Information Extraction from the World Wide Web. We review and discuss a large body of related work by describing it within a common framework. Then, we empirically analyze the effects of a multitude of design choices in pattern-based Information Extraction systems. In particular, we investigate how patterns can be filtered appropriately. We show how corpora of different nature can be exploited beneficially and how the nature of the patterns influences extraction quality. Finally, we present new ways of mining textual patterns by modelling pattern induction as a well-understood type of Data Mining problems.  
+ This thesis explores the potential of using textual patterns for Information Extraction from the World Wide Web. We review and discuss a large body of related work by describing it within a common framework. Then, we empirically analyze the effects of a multitude of design choices in pattern-based Information Extraction systems. In particular, we investigate how patterns can be filtered appropriately. We show how corpora of different nature can be exploited beneficially and how the nature of the patterns influences extraction quality. Finally, we present new ways of mining textual patterns by modelling pattern induction as a well-understood type of Data Mining problems.
+ |Download=Diss-sebastian-blohm.pdf
  |Link=http://digbib.ubka.uni-karlsruhe.de/volltexte/1000015423
- |DOI Name=urn:nbn:de:swb:90-154237  
+ |DOI Name=urn:nbn:de:swb:90-154237
  |Projekt=X-Media
  |Forschungsgruppe=Wissensmanagement

Latest revision as of 25 January 2010, 12:51

Large-Scale Pattern-Based Information Extraction from the World Wide Web




Date: 22 January 2010
KIT, Fakultät für Wirtschaftswissenschaften
Place of publication: Karlsruhe
Referee(s): Prof. Dr. Rudi Studer


Abstract
Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. Information Extraction systems require a model that describes how to identify relevant target information in texts. These models need to be adapted to the exact nature of the target information and to the nature of the textual input, which is typically accomplished by means of Machine Learning techniques that generate such models from examples. One particular type of Information Extraction model is the textual pattern. Textual patterns are underspecified, explicit descriptions of text fragments. A common way to learn this type of extraction model is to induce such patterns automatically from example text fragments that are known to contain target information.
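
To make the notion of an underspecified textual pattern concrete, the following minimal sketch induces a pattern from two example fragments and applies it to new text. It is an illustration only, not the method developed in the thesis: the `WILDCARD` marker, the function names, the restriction to equal-length token sequences, and the example sentences are all assumptions made here.

```python
from typing import List, Optional

WILDCARD = "*"  # hypothetical marker for an underspecified token position


def induce_pattern(a: List[str], b: List[str]) -> Optional[List[str]]:
    """Generalize two equal-length token sequences into one pattern.

    Positions where the tokens agree stay literal; positions where they
    differ become wildcards. Returns None if the lengths differ
    (a simplification made for this sketch).
    """
    if len(a) != len(b):
        return None
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def matches(pattern: List[str], tokens: List[str]) -> bool:
    """Check whether a token sequence fits the pattern position by position."""
    if len(pattern) != len(tokens):
        return False
    return all(p == WILDCARD or p == t for p, t in zip(pattern, tokens))


# Invented example fragments that are assumed to contain target information.
examples = [
    "Paris is the capital of France".split(),
    "Rome is the capital of Italy".split(),
]
pattern = induce_pattern(examples[0], examples[1])
print(pattern)  # ['*', 'is', 'the', 'capital', 'of', '*']
print(matches(pattern, "Madrid is the capital of Spain".split()))  # True
```

The wildcard positions are where the pattern is underspecified; in an extraction setting they would correspond to the slots from which target information is read.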

This thesis explores the potential of using textual patterns for Information Extraction from the World Wide Web. We review and discuss a large body of related work by describing it within a common framework. Then, we empirically analyze the effects of a multitude of design choices in pattern-based Information Extraction systems. In particular, we investigate how patterns can be filtered appropriately. We show how corpora of different natures can be exploited beneficially and how the nature of the patterns influences extraction quality. Finally, we present new ways of mining textual patterns by modelling pattern induction as a well-understood type of Data Mining problem.
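
As a rough illustration of what filtering patterns can mean, the sketch below scores each candidate pattern by the share of its extractions that are already known seed facts and keeps only patterns above a threshold. This is one simple heuristic chosen for the example, not the filtering criteria evaluated in the thesis; the function name, the `min_overlap` parameter, and all data are hypothetical.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str]


def filter_patterns(
    extractions: Dict[str, Set[Fact]],
    seeds: Set[Fact],
    min_overlap: float,
) -> Dict[str, float]:
    """Keep patterns whose extractions overlap enough with known seed facts.

    The score is |extractions & seeds| / |extractions|, a crude stand-in
    for pattern quality (an assumption of this sketch).
    """
    kept: Dict[str, float] = {}
    for pattern, facts in extractions.items():
        if not facts:
            continue  # a pattern that extracts nothing cannot be scored
        overlap = len(facts & seeds) / len(facts)
        if overlap >= min_overlap:
            kept[pattern] = overlap
    return kept


# Invented seed facts and per-pattern extractions.
seeds = {("Paris", "France"), ("Rome", "Italy")}
extractions = {
    "* is the capital of *": {("Paris", "France"), ("Rome", "Italy")},
    "* and *": {
        ("Paris", "France"),
        ("salt", "pepper"),
        ("cats", "dogs"),
        ("up", "down"),
    },
}
kept = filter_patterns(extractions, seeds, min_overlap=0.5)
print(kept)  # {'* is the capital of *': 1.0}
```

The overly general pattern `* and *` is dropped because only one of its four extractions is a known fact, while the specific pattern is kept.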

Download: Media:Diss-sebastian-blohm.pdf
Further information: http://digbib.ubka.uni-karlsruhe.de/volltexte/1000015423
DOI Link: urn:nbn:de:swb:90-154237

Project

X-Media



Research group

Wissensmanagement (Knowledge Management)


Research area

Machine Learning, Text Mining, Information Extraction, Data Mining