Abstract: A new data analysis method is discussed that is based on computing syndromes from training data sets.
Syndromes are defined as sub-regions of the feature space in which the mean value of the target Y deviates from
the mean value of Y over the whole data set. The described method of syndrome construction uses boundaries found
with a modified version of the optimal valid partitioning (OVP) method. The modification is based on a new validation
technique that removes redundant regularities from the output set more effectively. The OVP boundaries are used to
find sub-regions of the feature space with a strong deviation of the target Y from its mean over the whole data set;
these sub-regions are the syndromes. A hierarchical tree method was applied to cluster the objects of the
training data set in the space of binary indicators, each showing whether the feature description of an object
belongs to the corresponding syndrome. This technique allows sets of objects with similar syndromes to be discovered.
Experiments with biomedical data sets are discussed.
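
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The syndrome boundaries below are hypothetical thresholds rather than OVP output, and the Hamming distance and average linkage are assumptions, since the abstract names neither a distance nor a linkage rule. All function names are illustrative.

```python
from itertools import combinations

INF = float("inf")

# Hypothetical syndromes: each is a list of (feature_index, low, high)
# conditions that must all hold. In the method described, such boundaries
# would come from OVP; these values are made up for illustration.
SYNDROMES = [
    [(0, 60.0, INF)],                    # feature 0 >= 60
    [(1, -INF, 5.0)],                    # feature 1 < 5
    [(0, 60.0, INF), (1, 5.0, INF)],     # feature 0 >= 60 and feature 1 >= 5
]

def in_syndrome(x, syndrome):
    """True if the feature vector x lies inside the syndrome's sub-region."""
    return all(low <= x[j] < high for j, low, high in syndrome)

def indicator_vector(x, syndromes=SYNDROMES):
    """Binary indicators: 1 where x belongs to the corresponding syndrome."""
    return tuple(int(in_syndrome(x, s)) for s in syndromes)

def hamming(a, b):
    """Number of positions where two indicator vectors differ (assumed metric)."""
    return sum(u != v for u, v in zip(a, b))

def average_linkage(c1, c2, vectors):
    """Mean pairwise Hamming distance between two clusters (assumed linkage)."""
    total = sum(hamming(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

# Toy objects with two features each (fabricated for the example only).
objects = [(72.0, 3.1), (65.0, 4.0), (40.0, 2.5), (70.0, 8.0), (68.0, 9.5)]
indicators = [indicator_vector(x) for x in objects]
clusters = agglomerate(indicators, n_clusters=3)
print(indicators)
print(clusters)
```

On this toy input, objects 0 and 1 share the same syndrome indicators, as do objects 3 and 4, so each pair ends up in one cluster while object 2 stays alone.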
Keywords: Optimal partitioning, statistical validity, permutation test, regularities, gerontology.
ACM Classification Keywords: H.2.8 Database Applications - Data mining, G.3 Probability and Statistics -
Nonparametric statistics, Probabilistic algorithms
Link:
METHOD OF DATA ANALYSIS BASED ON CLUSTERING IN “SYNDROMES”
INDICATORS SPACE
Senko Oleg, Kuznetsova Anna, Kostomarova Irina
http://www.foibg.com/ijitk/ijitk-vol07/ijitk07-04-p07.pdf