Abstract: The paper presents the results of automatic term extraction from a specialized text corpus (a collection of
papers on corpus linguistics) by means of statistical methods (association measures) combined with certain
syntactic models. The approach adopted in the paper is based on lexico-syntactic models that can be viewed
as models of phrases for the Russian language. The Sketch Engine system is a corpus tool that
takes as input a corpus in any language and the corresponding grammar patterns. The system provides information
about a word’s collocability within specific dependency models and generates lists of the most frequent phrases for
a given word based on the appropriate models. The extracted terms belong to various clusters and represent the
lexical structure of the texts in question. The applied method includes statistical analysis that makes it possible to
estimate paradigmatic and syntagmatic relations between lexemes based on their distribution.
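As an illustration of what an association measure computes, the following minimal Python sketch scores candidate word pairs with two widely used measures, pointwise mutual information (MI) and logDice. The abstract does not state which measures were applied, so the choice of these two is an assumption; the function names and the frequency counts are hypothetical and serve only as an example.

```python
import math


def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information: log2 of observed over expected co-occurrence.

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: frequencies of the two words
    n: corpus size in tokens
    """
    return math.log2(f_xy * n / (f_x * f_y))


def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2 of the Dice coefficient; it does not depend on corpus size."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_candidates(pairs):
    """Sort (pair, f_xy, f_x, f_y) tuples by logDice, strongest association first."""
    scored = [(pair, log_dice(f_xy, f_x, f_y)) for pair, f_xy, f_x, f_y in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical counts for illustration only; they are not taken from the paper.
candidates = [
    (("text", "corpus"), 50, 100, 100),
    (("new", "corpus"), 10, 400, 100),
]

for pair, score in rank_candidates(candidates):
    print(pair, round(score, 2))
```

A pair that co-occurs in a large share of the occurrences of both words receives a higher score, which is why such measures are typically used to rank term candidates that match a given syntactic model.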
Keywords: Corpora, distributional and statistical methods, collocations, automatic term extraction, thesaurus.
ACM Classification Keywords: I.2.7 Natural Language Processing
STUDYING SPECIAL TEXT RUSSIAN CORPORA BY THE LEXICO-SYNTACTIC MODELS
Maria Khokhlova, Victor Zakharov
Link: http://foibg.com/ibs_isc/ibs-27/ibs-27-p12.pdf