Abstract: In this paper we study several problems that arise in the analysis of Pareto-distributed scientometric data
(series of citations versus paper ranks). These problems include the appropriate choice of i) the distribution type
(continuous, discrete or finite-size discrete) and ii) the statistical method used to obtain unbiased estimates of the power-law
exponent (maximum likelihood procedure or least-squares regression). Since relatively low magnitudes of the
power-law exponent (less than 2) are widely observed in scientometric databases, the finite-size discrete Pareto
distribution (citations distributed over a finite number of paper ranks) appears to be more adequate for data analysis
than the traditional ones. This conclusion is illustrated with two examples (for synthetic and actual data,
respectively). We also derive empirical relationships, in particular for the dependence of the maximum and the total number of
citations on the Hirsch index. The latter generalize the results of previous studies.
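As a rough illustration of the maximum likelihood option mentioned above, the following sketch fits the exponent of a finite-size discrete power law to a citations-versus-rank series. It is not the authors' procedure: it assumes that each citation falls on rank r = 1..N with probability proportional to r^(-alpha), and the function names, the bisection bounds and the noise-free synthetic data are all hypothetical choices made for this example.

```python
import math


def model_mean_log_rank(alpha, n_ranks):
    """Mean of ln(r) under the assumed model p(r) ~ r**(-alpha), r = 1..n_ranks."""
    weights = [r ** -alpha for r in range(1, n_ranks + 1)]
    total = sum(weights)
    return sum(w * math.log(r) for r, w in enumerate(weights, start=1)) / total


def fit_alpha_mle(citations, lo=0.0, hi=10.0, tol=1e-10):
    """Maximum likelihood exponent for citations[r-1] = citations of rank r.

    Setting the derivative of the log-likelihood to zero gives the condition
    "model mean of ln(r) equals the citation-weighted data mean of ln(r)".
    The model mean decreases monotonically with alpha, so bisection suffices.
    The bounds lo and hi are assumptions of this sketch.
    """
    n_ranks = len(citations)
    total = sum(citations)
    data_mean = sum(c * math.log(r) for r, c in enumerate(citations, start=1)) / total
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model_mean_log_rank(mid, n_ranks) > data_mean:
            lo = mid  # model puts too much weight on high ranks: alpha is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)


def synthetic_citations(alpha, n_ranks, total):
    """Noise-free expected citation counts per rank (hypothetical test data)."""
    weights = [r ** -alpha for r in range(1, n_ranks + 1)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


citations = synthetic_citations(alpha=1.5, n_ranks=200, total=5000.0)
alpha_hat = fit_alpha_mle(citations)
print(f"estimated exponent: {alpha_hat:.6f}")
```

Because the normalization here is a finite sum over the N ranks rather than an infinite series, the fit remains well defined for exponents below 2, which is the regime the abstract highlights.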
Keywords: Scientometrics, Hirsch index, Pareto distributions, data analysis, empirical relationships
ACM Classification Keywords: H. Information Systems, H.2. Database Management, H.2.8. Database
applications, subject: Scientific databases; I. Computing methodologies, I.6 Simulation and Modeling, I.6.4. Model
Validation and Analysis
Link:
THEORETICAL ANALYSIS OF EMPIRICAL RELATIONSHIPS FOR PARETO-DISTRIBUTED
SCIENTOMETRIC DATA
Vladimir Atanassov, Ekaterina Detcheva
http://www.foibg.com/ijima/vol01/ijima01-3-p07.pdf