Abstract: Fact extraction from text is one of the most important areas of Natural Language
Processing (NLP). The majority of existing approaches extract facts from structured textual
information in specific subject areas. This paper proposes a logical-linguistic model for extracting facts
from semi-structured English texts that belong to unrestricted subject areas. A fact is written in the
form of a triplet Subject - Predicate - Object, in which the Predicate defines the relation, and the Subject
and Object denote subjects, objects, or concepts. Our model defines semantic relations via the
grammatical and semantic features of the words in English sentences. To formalize and
represent the participants of the fact triplet explicitly, we introduce subject variables.
The subject variables define a finite set of morphological and syntactic features of the words in sentences. The model was
successfully implemented in a system that extracts and identifies several types of facts: the
fact of lacking, the fact of ownership, the fact of transferring, and the fact that any of these first three fact actions
has an attribute of time, location, or belonging. We evaluated the effectiveness of our
model using precision and recall. The results show that using the model
increases the values of these coefficients.
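The pipeline the abstract outlines (fact triplets, subject variables with finite feature domains, and evaluation by precision and recall) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the subject variables `pos` and `role`, their value domains, the ownership verb list, and the helpers `recognize`, `conj`, `disj`, `extract_ownership`, and `precision_recall` are all hypothetical, and tokens are assumed to arrive already tagged by some upstream parser. Predicates are composed in the spirit of the algebra of finite predicates, whose basic recognition term x^a is true exactly when variable x takes value a.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical subject variables, each with a finite domain of feature values.
SUBJECT_VARIABLES: Dict[str, FrozenSet[str]] = {
    "pos": frozenset({"noun", "pronoun", "verb", "adjective", "preposition", "other"}),
    "role": frozenset({"subject", "predicate", "object", "other"}),
}


@dataclass
class Token:
    """A word assumed to be pre-annotated by an upstream tagger/parser."""
    text: str
    lemma: str
    pos: str   # value of the hypothetical "pos" subject variable
    role: str  # value of the hypothetical "role" subject variable


@dataclass(frozen=True)
class Fact:
    """A fact triplet: Subject - Predicate - Object."""
    subject: str
    predicate: str
    obj: str


Predicate = Callable[[Token], bool]


def recognize(variable: str, value: str) -> Predicate:
    """Recognition term x^a: true iff the token's `variable` equals `value`."""
    if value not in SUBJECT_VARIABLES[variable]:
        raise ValueError(f"{value!r} is not in the domain of {variable!r}")
    return lambda tok: getattr(tok, variable) == value


def conj(*preds: Predicate) -> Predicate:
    """Conjunction of predicates."""
    return lambda tok: all(p(tok) for p in preds)


def disj(*preds: Predicate) -> Predicate:
    """Disjunction of predicates."""
    return lambda tok: any(p(tok) for p in preds)


# Hypothetical rule for the "fact of ownership".
OWNERSHIP_VERBS = {"have", "own", "possess"}
is_subject = conj(recognize("role", "subject"),
                  disj(recognize("pos", "noun"), recognize("pos", "pronoun")))
is_predicate = conj(recognize("role", "predicate"), recognize("pos", "verb"))
is_object = conj(recognize("role", "object"), recognize("pos", "noun"))


def extract_ownership(sentence: List[Token]) -> Optional[Fact]:
    """Return an ownership triplet if the sentence satisfies all three predicates."""
    subj = next((t for t in sentence if is_subject(t)), None)
    pred = next((t for t in sentence
                 if is_predicate(t) and t.lemma in OWNERSHIP_VERBS), None)
    obj = next((t for t in sentence if is_object(t)), None)
    if subj is None or pred is None or obj is None:
        return None
    return Fact(subj.text, pred.lemma, obj.text)


def precision_recall(extracted: Set[Fact], gold: Set[Fact]) -> Tuple[float, float]:
    """Precision = TP / |extracted|, recall = TP / |gold| (0.0 for empty sets)."""
    tp = len(extracted & gold)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


sentences = [
    [Token("John", "john", "noun", "subject"), Token("has", "have", "verb", "predicate"),
     Token("car", "car", "noun", "object")],
    [Token("She", "she", "pronoun", "subject"), Token("sold", "sell", "verb", "predicate"),
     Token("house", "house", "noun", "object")],
]
extracted = {f for f in map(extract_ownership, sentences) if f is not None}
gold = {Fact("John", "have", "car"), Fact("Mary", "own", "book")}
print(precision_recall(extracted, gold))  # (1.0, 0.5)
```

In this toy run only the first sentence satisfies the ownership rule, so the single extracted fact is correct (precision 1.0) while one of the two gold facts is missed (recall 0.5). Keeping each subject variable's domain finite and explicit lets `recognize` reject values outside that domain when a rule is built.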
Keywords: fact extraction from text, Natural Language Processing, semantic relations, the algebra
of finite predicates, recall and precision.
ACM Classification Keywords: H.3.3 Information Search and Retrieval; I.2.4 Knowledge
Representation Formalisms and Methods
Link:
Facts extraction from the semi-structured text information
Nina Khairova, Nataliya Sharonova, Ajit Pratap Singh Gautam
http://www.foibg.com/ijima//vol05/ijima05-01-p07.pdf