Current revision as of 2 November 2009, 12:52

Text Clustering Based on Good Aggregations

Andreas Hotho, Alexander Maedche, Steffen Staab

Published: 2002

Journal: Künstliche Intelligenz (KI)
Issue: 4
Pages: 48-54

Volume: 16

Refereed publication

Abstract
Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data applying an ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
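The pipeline outlined in the abstract (aggregate term features into ontology concepts, build several alternative representations, cluster each with K-Means) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy ontology `TOY_ONTOLOGY`, the helper names `aggregate`, `to_vectors` and `kmeans`, the raw count weighting, and the plain Lloyd-style K-Means loop.

```python
import random
from collections import Counter

# Hypothetical toy ontology (term -> parent concept); not from the paper.
TOY_ONTOLOGY = {
    "apple": "fruit", "banana": "fruit",
    "car": "vehicle", "truck": "vehicle",
}

def aggregate(tokens, concepts):
    """Count features; a term is replaced by its parent concept
    only when that concept is in the selected set."""
    counts = Counter()
    for tok in tokens:
        parent = TOY_ONTOLOGY.get(tok)
        counts[parent if parent in concepts else tok] += 1
    return counts

def to_vectors(docs, concepts):
    """Build one alternative representation for a given concept selection."""
    bags = [aggregate(d.split(), concepts) for d in docs]
    vocab = sorted(set().union(*bags))
    return [[bag[f] for f in vocab] for bag in bags], vocab

def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-Means with squared Euclidean distance (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center for every vector.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = ["apple banana", "banana apple apple", "car truck", "truck truck car"]

# One clustering result per concept selection; the selection itself
# serves as the explanation of the corresponding result.
for concepts in [set(), {"fruit"}, {"fruit", "vehicle"}]:
    vectors, vocab = to_vectors(docs, concepts)
    print(sorted(concepts), vocab, kmeans(vectors, k=2))
```

Each concept selection yields a different feature space, so the resulting clusterings can be compared and described in terms of the concepts that produced them.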

Download: Media:2002_19_Hotho_Text_Clustering_1.pdf

Research group

Betriebliche Informationssysteme, Komplexitätsmanagement

Research area

Text Mining

@@ Line 1: / Line 1: @@
-{{Publikation Author
+{{Publikation Erster Autor
-|Rank=3
+|ErsterAutorNachname=Hotho
-|Author=Steffen Staab
+|ErsterAutorVorname=Andreas
 }}
 {{Publikation Author
@@ Line 8: / Line 8: @@
 }}
 {{Publikation Author
-|Rank=1
+|Rank=3
-|Author=Andreas Hotho
+|Author=Steffen Staab
 }}
 {{Article
@@ Line 22: / Line 22: @@
 {{Publikation Details
 |Abstract=Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data applying an ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
-|VG Wort-Seiten=
 |Download=2002_19_Hotho_Text_Clustering_1.pdf
-|Projekt=
+|Forschungsgruppe=Betriebliche Informationssysteme, Komplexitätsmanagement
-|Forschungsgruppe=Betriebliche Informations- und Kommunikationssysteme, Komplexitätsmanagement, Effiziente Algorithmen,
-}}
-{{Forschungsgebiet Auswahl
-|Forschungsgebiet=Wissensmanagement
 }}
+{{Forschungsgebiet Auswahl}}
 {{Forschungsgebiet Auswahl
 |Forschungsgebiet=Text Mining
 }}

Article19: Difference between revisions
