Abstract: This paper outlines an approach to indirect spatial data extraction that learns restricted finite-state automata
from web documents written in Bulgarian. It uses heuristics to generalize an initial
finite-state automaton, which recognizes only the positive examples and nothing else, into an automaton that recognizes
as large a language as possible without extracting any non-positive examples from the training data set.
The learning method, its software implementation, and experiments are presented. The investigation is carried out in
accordance with the rules of the EU INSPIRE Network.
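The generalization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper's actual heuristics and automaton restrictions are not given here, so the sketch assumes a linear automaton (one transition label per token), a greedy left-to-right widening strategy, and three hypothetical token classes (ANY, WORD, NUM). The example tokens are transliterated from Bulgarian and are invented for illustration.

```python
import re

# Hypothetical token classes, tried from most general to most specific.
TOKEN_CLASSES = {
    "ANY": re.compile(r"\S+"),
    "WORD": re.compile(r"[^\W\d_]+"),  # letters only
    "NUM": re.compile(r"\d+"),
}


def label_accepts(label, token):
    """A label is ("LIT", text) for an exact token or ("CLS", class_name)."""
    kind, value = label
    if kind == "LIT":
        return token == value
    return TOKEN_CLASSES[value].fullmatch(token) is not None


def accepts(automaton, tokens):
    """The linear automaton accepts when every token fits its transition label."""
    return len(automaton) == len(tokens) and all(
        label_accepts(label, tok) for label, tok in zip(automaton, tokens)
    )


def generalize(positive, negatives):
    """Greedily widen each transition while no negative example is accepted."""
    # Initial automaton: recognizes the positive example and nothing else.
    automaton = [("LIT", tok) for tok in positive]
    for i in range(len(automaton)):
        for name in TOKEN_CLASSES:
            candidate = automaton[:i] + [("CLS", name)] + automaton[i + 1:]
            still_ok = accepts(candidate, positive)
            leaks = any(accepts(candidate, neg) for neg in negatives)
            if still_ok and not leaks:
                automaton = candidate
                break  # keep the most general class that is still safe
    return automaton


# Invented training data: "gr." abbreviates "grad" (city), "obl." abbreviates "oblast".
learned = generalize(["gr.", "Plovdiv"], [["obl.", "Plovdiv"], ["gr.", "2014"]])
print(learned)
print(accepts(learned, ["gr.", "Sofia"]))
```

With this training data the first transition stays literal, because any wider class would accept the "obl." negative, while the second widens to WORD, so the learned automaton also accepts unseen city names after "gr.". The greedy pass is order-dependent and handles a single positive seed; a fuller learner would merge automata from many positive examples.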
Keywords: Automatic Data Extraction, Restricted Finite State Automata, Web Documents, Indirect Spatial Data,
INSPIRE Network.
ACM Classification Keywords: H.2.8 Database Applications - Data mining; F.1.1 Models of Computation –
Finite State Automata
Link:
INDIRECT SPATIAL DATA EXTRACTION FROM WEB DOCUMENTS
Dimitar Blagoev, George Totkov, Milena Staneva,
Krassimira Ivanova, Krassimir Markov, Peter Stanchev
http://www.foibg.com/ibs_isc/ibs-14/ibs-14-p11.pdf