Scaling up Pattern Induction for Web Relation Extraction through Frequent Itemset Mining
Published: September 2008
Editors: Benjamin Adrian, Günter Neumann, Alexander Troussov, Borislav Popov
Book title: Proceedings of the KI 2008 Workshop on Ontology-Based Information Extraction Systems
Refereed publication
Abstract
In this paper, we address the problem of extracting relational information
from the Web at a large scale. In particular, we present a bootstrapping
approach to relation extraction which starts with a few seed tuples of the target
relation and induces patterns which can be used to extract further tuples. Our
contribution in this paper lies in the formulation of the pattern induction task as a
well-known machine learning problem, i.e. the one of determining frequent itemsets
on the basis of a set of transactions representing patterns. The formulation of
the extraction problem as the task of mining frequent itemsets is not only elegant,
but also speeds up the pattern induction step considerably with respect to previous
implementations of the bootstrapping procedure. We evaluate our approach
in terms of standard measures with respect to seven datasets of varying size and
complexity. In particular, by analyzing the extraction rate (extracted tuples per
time) we show that our approach reduces the pattern induction complexity from
quadratic to linear (in the size of the occurrences to be generalized), while maintaining
extraction quality at similar (or even marginally better) levels.
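
To illustrate the idea of casting pattern induction as frequent itemset mining, the following is a minimal Python sketch, not the implementation evaluated in the paper. It assumes that each textual occurrence of a seed tuple is a fixed-length token window in which the tuple arguments have been replaced by the placeholders ARG1 and ARG2, and that an occurrence is encoded as a transaction of (position, token) items. The function names, the example sentences, and the naive level-wise (Apriori-style) miner are all assumptions made for illustration; the miner shown here is not tuned for the linear behaviour reported in the abstract.

```python
from itertools import combinations


def to_transaction(tokens):
    """Encode one occurrence as a set of (position, token) items."""
    return frozenset(enumerate(tokens))


def frequent_itemsets(transactions, min_support):
    """Naive level-wise mining; returns a dict {itemset: support}."""
    # Start with every single item as a candidate.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    while current:
        level = {}
        for cand in current:
            # Support = number of transactions containing the candidate.
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                level[cand] = support
        result.update(level)
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        # Join frequent k-itemsets into (k+1)-item candidates.
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
    return result


def to_pattern(itemset, length):
    """Render an itemset as a pattern; unconstrained positions become '*'."""
    slots = dict(itemset)
    return " ".join(slots.get(i, "*") for i in range(length))


# Hypothetical occurrences of seed tuples for a "capital of" relation.
occurrences = [
    ["ARG1", "is", "the", "capital", "of", "ARG2"],
    ["ARG1", "was", "the", "capital", "of", "ARG2"],
    ["ARG1", "is", "a", "city", "in", "ARG2"],
]
transactions = [to_transaction(o) for o in occurrences]
found = frequent_itemsets(transactions, min_support=2)

# Keep only maximal itemsets, i.e. the most specific frequent patterns.
maximal = [s for s in found if not any(s < other for other in found)]
patterns = sorted(to_pattern(s, 6) for s in maximal)
```

In this toy example, every itemset shared by at least two occurrences corresponds to a candidate pattern, and positions not fixed by the itemset act as wildcards. A real system would additionally filter patterns (for example, requiring both argument slots to be present) and would use an efficient itemset miner.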
Download: Media:2008_1836_Blohm_Scaling_up_Patt_1.pdf
Information Extraction, Data Mining