Article19: Difference between revisions
m (Added from ontology) |
Dhe (Talk | contribs) m (Text replacement - "Betriebliche Informations- und Kommunikationssysteme" replaced with "Betriebliche Informationssysteme") |
||
(4 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | {{Publikation Erster Autor | ||
+ | |ErsterAutorNachname=Hotho | ||
+ | |ErsterAutorVorname=Andreas | ||
+ | }} | ||
{{Publikation Author | {{Publikation Author | ||
− | |Rank= | + | |Rank=2 |
− | |Author= | + | |Author=Alexander Maedche |
}} | }} | ||
{{Publikation Author | {{Publikation Author | ||
|Rank=3 | |Rank=3 | ||
|Author=Steffen Staab | |Author=Steffen Staab | ||
− | |||
− | |||
− | |||
− | |||
}} | }} | ||
{{Article | {{Article | ||
Line 22: | Line 22: | ||
{{Publikation Details | {{Publikation Details | ||
|Abstract=Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data applying an ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy. | |Abstract=Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data applying an ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy. | ||
− | |||
|Download=2002_19_Hotho_Text_Clustering_1.pdf | |Download=2002_19_Hotho_Text_Clustering_1.pdf | ||
− | + | |Forschungsgruppe=Betriebliche Informationssysteme, Komplexitätsmanagement | |
− | |Forschungsgruppe= | ||
− | |||
− | |||
− | |||
}} | }} | ||
+ | {{Forschungsgebiet Auswahl}} | ||
{{Forschungsgebiet Auswahl | {{Forschungsgebiet Auswahl | ||
|Forschungsgebiet=Text Mining | |Forschungsgebiet=Text Mining | ||
}} | }} |
Latest revision as of 12:52, 2 November 2009
Text Clustering Based on Good Aggregations
Published: 2002
Journal: Künstliche Intelligenz (KI)
Number: 4
Pages: 48-54
Volume: 16
Refereed publication
Abstract
Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data applying ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
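The pipeline outlined in the abstract (ontology-based aggregation of features, several alternative text representations, one K-Means run per representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology `TERM_TO_CONCEPT`, the two aggregation levels, the example documents, and the helper names `aggregate`, `to_vectors` and `kmeans` are all assumptions made for illustration, and the K-Means shown is a plain textbook variant seeded with the first k vectors.

```python
from collections import Counter

# Hypothetical toy ontology mapping terms to a parent concept.
# This mapping is an assumption for illustration, not taken from the paper.
TERM_TO_CONCEPT = {
    "beef": "food", "pork": "food", "bread": "food",
    "hotel": "accommodation", "hostel": "accommodation",
}


def aggregate(tokens, level):
    """Count features; level 0 keeps raw terms, level 1 lifts terms to concepts."""
    if level == 0:
        return Counter(tokens)
    return Counter(TERM_TO_CONCEPT.get(t, t) for t in tokens)


def to_vectors(docs, level):
    """Turn documents into count vectors over a sorted feature vocabulary."""
    bags = [aggregate(d.split(), level) for d in docs]
    vocab = sorted(set().union(*bags))
    return [[bag[f] for f in vocab] for bag in bags], vocab


def kmeans(vectors, k, iterations=10):
    """Plain K-Means with squared Euclidean distance.

    The first k vectors seed the centroids (a simplification that keeps
    the example deterministic).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = ["beef pork", "hotel hostel", "bread beef", "hostel hotel hotel"]

# One clustering result per alternative representation (aggregation level).
results = {level: kmeans(to_vectors(docs, level)[0], k=2) for level in (0, 1)}
concept_vocab = to_vectors(docs, 1)[1]
```

At level 1 the feature space shrinks to the two concepts in `concept_vocab`, so each resulting cluster can be described by the concept that dominates it. This mirrors the idea that results can be distinguished and explained by the corresponding selection of concepts in the ontology.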
Download: Media:2002_19_Hotho_Text_Clustering_1.pdf
Betriebliche Informationssysteme, Komplexitätsmanagement