Abstract: The paper outlines an approach to indirect spatial data extraction that learns restricted finite-state automata from web documents written in Bulgarian. The approach uses heuristics to generalize an initial finite-state automaton, which recognizes only the positive examples and nothing else, into an automaton that recognizes as large a language as possible without extracting any non-positive examples from the training data set. The learning method, its software implementation, and experiments are presented. The investigation was carried out in accordance with the rules of the EU INSPIRE Network.
Keywords: Indirect Spatial Data, Automatic Data Extraction, Restricted Finite State Automata, Web Documents.
ACM Classification Keywords: H.2.8 Database Applications - Data mining; F.1.1 Models of Computation - Finite State Automata
Link:
INDIRECT SPATIAL DATA EXTRACTION FROM WEB DOCUMENTS
Dimitar Blagoev, George Totkov, Milena Staneva, Krassimira Ivanova, Krassimir Markov, Peter Stanchev
http://foibg.com/ijitk/ijitk-vol03/IJITK03-4-p08.pdf