Techreport783
Published: November 2004
Institution: Institute AIFB, University of Karlsruhe
Archive number: 783
Abstract
We present a novel approach to the automatic acquisition of taxonomies
or concept hierarchies from a text corpus. The approach is based on
Formal Concept Analysis (FCA), a method mainly used for the analysis of data,
i.e. for investigating and processing explicitly given information.
We follow Harris' distributional hypothesis and model the context
of a certain term as a vector representing syntactic dependencies
which are automatically acquired from the text corpus with a linguistic parser.
On the basis of this context information, FCA produces a lattice
that we convert into a special kind of partial order constituting
a concept hierarchy.
The approach is evaluated by comparing the resulting concept hierarchies
with hand-crafted taxonomies for two domains: tourism and finance.
We also directly compare our approach with hierarchical agglomerative
clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering
algorithm. Furthermore, we investigate the impact of using different
measures weighting the contribution of each attribute as well as of applying
a particular smoothing technique to cope with data sparseness.
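As a rough illustration of the FCA step described above, the following minimal sketch derives the formal concepts of a small term/attribute context and orders them by inclusion of their extents. The toy context, the attribute names, and the function names are hypothetical and are not taken from the report. The naive enumeration over all term subsets is only feasible for tiny inputs, and the conversion of the lattice into a concept hierarchy, the attribute weighting, and the smoothing are not shown.

```python
from itertools import combinations

# Hypothetical toy context: each term is mapped to the set of attributes
# (e.g. verbs it occurs with in a syntactic dependency) observed for it.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
}


def common_attributes(terms, context, all_attributes):
    """Attributes shared by every term in `terms` (all attributes if empty)."""
    shared = set(all_attributes)
    for term in terms:
        shared &= context[term]
    return frozenset(shared)


def terms_having(attributes, context):
    """Terms whose attribute set contains every attribute in `attributes`."""
    return frozenset(t for t, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of terms."""
    all_attributes = set().union(*context.values())
    terms = sorted(context)
    concepts = set()
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = terms_having(intent, context)
            concepts.add((extent, intent))
    return concepts


def covering_edges(concepts):
    """(child, parent) pairs where child's extent is a proper subset of
    parent's extent and no other concept lies strictly in between."""
    edges = []
    for child in concepts:
        for parent in concepts:
            if child[0] < parent[0] and not any(
                child[0] < mid[0] < parent[0] for mid in concepts
            ):
                edges.append((child, parent))
    return edges


concepts = formal_concepts(context)
edges = covering_edges(concepts)

for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "share", sorted(intent))
```

In this sketch, terms that share more attributes end up lower in the ordering, so the covering edges can be read as candidate is-a links between groups of terms.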
Download: 2004_783_Cimiano_Learning_Concep_1.pdf, 2004_783_Cimiano_Learning_Concep_1.ps