Abstract: Classification and feature selection techniques are among the most commonly used mathematical
approaches for analysis and interpretation of biological data. A key characteristic of any classifier
is its classification error, which must be taken into account for accurate data analysis. The most
popular error estimation techniques (resubstitution, bootstrapping, cross-validation) vary strikingly in performance.
It is well known that the more accurate estimators, such as bootstrapping and cross-validation, are very slow, while
the heavily biased resubstitution is very fast. Recently, a new bolstered error estimation technique has been proposed that
optimally combines speed and accuracy. It uses a Monte Carlo sampling-based algorithm in the
general case, but for linear classification an analytical solution may be applied. In this paper we
introduce a geometric approach to bolstered error estimation and compare its performance with that of other error
estimation algorithms. The results show that geometric bolstered error estimation algorithms are very
fast, with accuracy comparable to leave-one-out (LOO) cross-validation and lower variance.
These algorithms are useful for analyzing extremely large numbers of features and may find applications across
a wide range of -omics data analysis.
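The contrast between the Monte Carlo and analytical variants mentioned above can be illustrated with a minimal sketch. It is not the authors' geometric algorithm. It assumes a fixed linear classifier sign(w·x + b), labels coded as +1/-1, and a spherical Gaussian bolstering kernel with a hand-chosen standard deviation `sigma` (in practice the kernel width is estimated from the data). All function names and the toy data are hypothetical. For a linear boundary, the kernel mass falling on the wrong side depends only on the signed distance of the point to the hyperplane, which gives the closed form used in `bolstered_error_analytic`.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def resubstitution_error(points, labels, w, b):
    """Fraction of training points misclassified by sign(w.x + b)."""
    wrong = sum(1 for x, y in zip(points, labels) if y * (dot(w, x) + b) <= 0)
    return wrong / len(points)


def bolstered_error_analytic(points, labels, w, b, sigma):
    """Closed-form bolstered resubstitution for a linear classifier.

    Each point contributes the mass of its spherical Gaussian kernel
    (standard deviation sigma, an assumed fixed value) that lies on the
    wrong side of the hyperplane.
    """
    norm_w = math.sqrt(dot(w, w))
    total = 0.0
    for x, y in zip(points, labels):
        # Signed distance to the hyperplane; positive if correctly classified.
        distance = y * (dot(w, x) + b) / norm_w
        total += normal_cdf(-distance / sigma)
    return total / len(points)


def bolstered_error_monte_carlo(points, labels, w, b, sigma,
                                n_samples=2000, seed=0):
    """Monte Carlo estimate of the same quantity, by sampling each kernel."""
    rng = random.Random(seed)
    total = 0.0
    for x, y in zip(points, labels):
        wrong = 0
        for _ in range(n_samples):
            sample = [xi + rng.gauss(0.0, sigma) for xi in x]
            if y * (dot(w, sample) + b) <= 0:
                wrong += 1
        total += wrong / n_samples
    return total / len(points)


# Toy data (hypothetical): two points per class, separated by the line x1 = 0.
points = [(2.0, 0.0), (1.0, 1.0), (-2.0, 0.0), (-0.5, 0.5)]
labels = [1, 1, -1, -1]
w, b, sigma = (1.0, 0.0), 0.0, 1.0

resub = resubstitution_error(points, labels, w, b)
analytic = bolstered_error_analytic(points, labels, w, b, sigma)
monte_carlo = bolstered_error_monte_carlo(points, labels, w, b, sigma)
print(resub, analytic, monte_carlo)
```

On this toy set every point is correctly classified, so resubstitution reports zero error, while both bolstered variants report a positive error because points near the boundary spread part of their kernel mass across it. The analytical version needs one distance and one CDF evaluation per point, whereas the Monte Carlo version needs `n_samples` classifier evaluations per point, which illustrates the speed gap the abstract refers to.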
Keywords: Biology and genetics, Classifier design and evaluation, Machine learning.
ACM Classification Keywords: A.0 General Literature - Conference proceedings; I.5.2 - Classifier design and
evaluation, Feature evaluation and selection; J.3 - Biology and genetics
Link:
GEOMETRIC APPROACH FOR GAUSSIAN-KERNEL BOLSTERED ERROR
ESTIMATION FOR LINEAR CLASSIFICATION IN COMPUTATIONAL BIOLOGY
Arsen Arakelyan, Lilit Nerisyan, Aram Gevorgyan, Anna Boyajyan