ITHEA International Scientific Society : MULTI-AGENT SYSTEM FOR SIMILARITY SEARCH IN STRING SETS

MULTI-AGENT SYSTEM FOR SIMILARITY SEARCH IN STRING SETS
By: Katarzyna Harężlak, Michał Sala

Abstract: The aim of the paper is to present the assumptions and the architecture of a system for similarity search in string sets. The research analysed all the required steps of text document processing: text extraction, pruning, stemming and lemmatization. Models for describing text documents and a method of creating a feature vector were developed as well. This vector consists, inter alia, of chosen words and the number of their occurrences. The text analysis is supported by a set of dictionaries: Stop-words, Domain and Lemma dictionaries, all of them considered in the context of the Polish language. Because the Lemma dictionary is expected to contain many entries, an efficient method of optimising access to it was elaborated. Various measures used for calculating the degree of similarity between text documents were also studied. Moreover, a method for determining the quality of the match between user queries and text documents was proposed. The system was implemented in accordance with the idea of multi-agent systems; its functionality is provided by a set of agents, each running in a separate thread. Tests of the system's efficiency were also performed.
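
As an illustration only, the feature vector and similarity comparison mentioned in the abstract might be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the tiny English STOP_WORDS and LEMMAS tables are hypothetical stand-ins for the Polish dictionaries, cosine similarity is an assumed choice (the abstract does not name the measures studied), and the multi-agent, threaded architecture is not modelled.

```python
import math
from collections import Counter

# Hypothetical toy dictionaries for illustration; the paper's Polish
# Stop-words and Lemma dictionaries are not reproduced here.
STOP_WORDS = {"the", "a", "of", "and", "in"}
LEMMAS = {
    "agents": "agent",
    "systems": "system",
    "strings": "string",
    "searching": "search",
}


def feature_vector(text):
    """Map a text to {lemma: occurrence count}, skipping stop-words."""
    counts = Counter()
    for token in text.lower().split():
        word = token.strip(".,;:!?()\"'")
        if not word or word in STOP_WORDS:
            continue
        # Fall back to the surface form when no lemma is known.
        counts[LEMMAS.get(word, word)] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc = feature_vector("Agents searching in the string sets of agent systems.")
query = feature_vector("search agents")
print(cosine_similarity(doc, query))
```

Here the document and the query share the lemmas "agent" and "search", so the score is well above zero; a text with no words in common with the query would score 0.0.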

Keywords: agent systems, text similarity search

ACM Classification Keywords: I.7 Document And Text Processing

Link: http://www.foibg.com/ibs_isc/ibs-26/ibs-26-p09.pdf