
Article19: Difference between revisions

From Aifbportal
Older revision: m (Added from ontology)
Newer revision: m (Text replacement: "Betriebliche Informations- und Kommunikationssysteme" replaced by "Betriebliche Informationssysteme")
(4 intermediate revisions by 2 users are not shown)

Changes from line 1:
First author: Andreas Hotho, now recorded in a {{Publikation Erster Autor}} template (ErsterAutorNachname=Hotho, ErsterAutorVorname=Andreas) instead of a {{Publikation Author}} entry with Rank=1
Author, Rank=2: Alexander Maedche (entry moved ahead of the Rank=3 entry)
Author, Rank=3: Steffen Staab (unchanged)

Changes from line 22:
Forschungsgruppe: changed from "Effiziente Algorithmen, Komplexitätsmanagement, Betriebliche Informations- und Kommunikationssysteme," to "Betriebliche Informationssysteme, Komplexitätsmanagement"
Forschungsgebiet: the {{Forschungsgebiet Auswahl}} entry for "Wissensmanagement" is replaced by an empty {{Forschungsgebiet Auswahl}}; the "Text Mining" entry is unchanged
Abstract and Download (2002_19_Hotho_Text_Clustering_1.pdf): unchanged

Current revision as of 2 November 2009, 12:52


Text Clustering Based on Good Aggregations





Published: 2002

Journal: Künstliche Intelligenz (KI)
Number: 4
Pages: 48-54

Volume: 16


Refereed publication





Abstract
Text clustering typically involves clustering in a high-dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data using ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
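
As a rough illustration of the feature aggregation idea described in the abstract, the following Python sketch maps terms to ontology concepts before clustering with K-Means. It is not the authors' implementation or their heuristics: the toy ontology `TERM_TO_CONCEPT`, the sample documents, and all function names are hypothetical, and the K-Means routine is a plain textbook version seeded with the first k vectors so that the run is deterministic.

```python
from collections import Counter

# Hypothetical toy ontology: term -> parent concept (an assumption, not from the paper).
TERM_TO_CONCEPT = {
    "beef": "food", "pork": "food", "bread": "food",
    "hotel": "accommodation", "hostel": "accommodation", "campsite": "accommodation",
}


def aggregate(tokens, term_to_concept):
    """Count concepts instead of terms; terms missing from the ontology are dropped."""
    return Counter(term_to_concept[t] for t in tokens if t in term_to_concept)


def to_vector(counts, features):
    """Turn concept counts into a fixed-order numeric vector."""
    return [float(counts.get(f, 0)) for f in features]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain K-Means; centroids start at the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "beef pork".split(),
    "hotel hostel".split(),
    "bread beef".split(),
    "campsite hotel".split(),
]
features = sorted(set(TERM_TO_CONCEPT.values()))  # one dimension per concept
vectors = [to_vector(aggregate(d, TERM_TO_CONCEPT), features) for d in docs]
labels = kmeans(vectors, k=2)
```

In this toy setting the six term dimensions collapse into two concept dimensions, and each cluster can be described by the concept that dominates it. Swapping in a different term-to-concept mapping would yield an alternative representation and, potentially, a different clustering result.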

Download: Media:2002_19_Hotho_Text_Clustering_1.pdf



Research group

Betriebliche Informationssysteme, Komplexitätsmanagement


Research area

Text Mining