Nonparametric Analysis of Univariate Heavy-Tailed Data: Research and Practice

Natalia Markovich

ISBN: 978-0-470-72360-9 | November 2007 | 336 pages


Heavy-tailed distributions are typical of phenomena in complex multi-component systems, as studied in biometry, economics, ecology, sociology, web access statistics, internet traffic, bibliometrics, finance and business. Analyzing such distributions requires special estimation methods because of their specific features: not only the slow decay of the tail to zero, but also the violation of Cramér's condition, the possible non-existence of some moments, and sparse observations in the tail of the distribution.

The book focuses on methods for the statistical analysis of heavy-tailed, independent and identically distributed random variables from empirical samples of moderate size. It provides a detailed survey of classical results and recent developments in the theory of nonparametric estimation of the probability density function, the tail index, the hazard rate and the renewal function.

Both asymptotic results, such as convergence rates of the estimates, and results for samples of moderate size, supported by Monte Carlo studies, are considered. The text is illustrated by applying the methodologies to real web traffic measurements.


1. Definitions and rough detection of tail heaviness.

1.1 Definitions and basic properties of classes of heavy-tailed distributions.

1.2 Tail index estimation.

1.2.1 Estimators of a positive-valued tail index.

1.2.2 The choice of k in Hill's estimator.

1.2.3 Estimators of a real-valued tail index.

1.2.4 On-line estimation of the tail index.

1.3 Detection of tail heaviness and dependence.

1.3.1 Rough tests of tail heaviness.

1.3.2 Analysis of Web traffic and TCP flow data.

1.3.3 Dependence detection from univariate data.

1.3.4 Dependence detection from bivariate data.

1.3.5 Bivariate analysis of TCP flow data.

1.4 Notes and comments.

1.5 Exercises.

2. Classical methods of probability density estimation.

2.1 Principles of density estimation.

2.2 Methods of density estimation.

2.2.1 Kernel estimators.

2.2.2 Projection estimators.

2.2.3 Spline estimators.

2.2.4 Smoothing methods.

2.2.5 Illustrative examples.

2.3 Kernel estimation from dependent data.

2.3.1 Statement of the problem.

2.3.2 Numerical calculation of the bandwidth.

2.3.3 Data-driven selection of the bandwidth.

2.4 Applications.

2.4.1 Finance: evaluation of market risk.

2.4.2 Telecommunications.

2.4.3 Population analysis.

2.5 Exercises.

3. Heavy-tailed density estimation.

3.1 Problems of the estimation of heavy-tailed densities.

3.2 Combined parametric-nonparametric method.

3.2.1 Nonparametric estimation of the density by structural risk minimization.

3.2.2 Illustrative examples.

3.2.3 Web data analysis by a combined parametric-nonparametric method.

3.3 Barron's estimator and χ²-optimality.

3.4 Kernel estimators with variable bandwidth.

3.5 Retransformed nonparametric estimators.

3.6 Exercises.

4. Transformations and heavy-tailed density estimation.

4.1 Problems of data transformations.

4.2 Estimates based on a fixed transformation.

4.3 Estimates based on an adaptive transformation.

4.3.1 Estimation algorithm.

4.3.2 Analysis of the algorithm.

4.3.3 Further remarks.

4.4 Estimating the accuracy of retransformed estimates.

4.5 Boundary kernels.

4.6 Accuracy of a nonvariable bandwidth kernel estimator.

4.7 The D method for a nonvariable bandwidth kernel estimator.

4.8 The D method for a variable bandwidth kernel estimator.

4.8.1 Method and results.

4.8.2 Application to Web traffic characteristics.

4.9 The ω² method for the projection estimator.

4.10 Exercises.

5. Classification and retransformed density estimates.

5.1 Classification and quality of density estimation.

5.2 Convergence of the estimated probability of misclassification.

5.3 Simulation study.

5.4 Application of the classification technique to Web data analysis.

5.4.1 Intelligent browser.

5.4.2 Web data analysis by traffic classification.

5.4.3 Web prefetching.

5.5 Exercises.

6. Estimation of high quantiles.

6.1 Introduction.

6.2 Estimators of high quantiles.

6.3 Distribution of high quantile estimates.

6.4 Simulation study.

6.4.1 Comparison of high quantile estimates in terms of relative bias and mean squared error.

6.4.2 Comparison of high quantile estimates in terms of confidence intervals.

6.5 Application to Web traffic data.

6.6 Exercises.

7. Nonparametric estimation of the hazard rate function.

7.1 Definition of the hazard rate function.

7.2 Statistical regularization method.

7.3 Numerical solution of ill-posed problems.

7.4 Estimation of the hazard rate function of heavy-tailed distributions.

7.5 Hazard rate estimation for compactly supported distributions.

7.5.1 Estimation of the hazard rate from the simplest equations.

7.5.2 Estimation of the hazard rate from a special kernel equation.

7.6 Estimation of the ratio of hazard rates.

7.6.1 Failure time detection.

7.6.2 Hormesis detection.

7.7 Hazard rate estimation in teletraffic theory.

7.7.1 Teletraffic processes at the packet level.

7.7.2 Estimation of the intensity of a nonhomogeneous Poisson process.

7.8 Semi-Markov modeling in teletraffic engineering.

7.8.1 The Gilbert-Elliott model.

7.8.2 Estimation of a retrial process.

7.9 Exercises.

8. Nonparametric estimation of the renewal function.

8.1 Traffic modeling by recurrent marked point processes.

8.2 Introduction to renewal function estimation.

8.3 Histogram-type estimator of the renewal function.

8.4 Convergence of the histogram-type estimator.

8.5 Selection of k by a bootstrap method.

8.6 Selection of k by a plot.

8.7 Simulation study.

8.8 Application to the inter-arrival times of TCP connections.

8.9 Conclusions and discussion.

8.10 Exercises.


A Proofs of Chapter 2.

B Proofs of Chapter 4.

C Proofs of Chapter 5.

D Proofs of Chapter 6.

E Proofs of Chapter 7.

F Proofs of Chapter 8.

List of Main Symbols and Abbreviations.



"This book can be recommended to researchers in life sciences for the wealth of information about statistics of extremes and density function estimation. The application to hormesis is interesting for those concerned with this strange positive effect that low doses of toxic substances can have in living organisms." (Biometrics, March 2009)

"It is ideally suited for statisticians, researchers and Ph.D. students in statistics and probability theory. There is also much to benefit those working and studying a wide range of disciplines from computer science, telecommunications and performance evaluation, to demography and population analysis." (Mathematical Reviews, Issue 2009e)

  • Provides comprehensive coverage of a growing area of research.
  • Presents a good balance of theory and real-world applications.
  • Uses examples drawn from finance and from internet traffic management to illustrate the concepts.
  • Presents a detailed survey of classical results alongside recent developments in the theory of nonparametric estimation of the probability density function, the tail index, the hazard rate and the renewal function.
  • Equips the reader with the knowledge to carry out basic and advanced statistical analyses of heavy-tailed data.
  • Accompanied by a website hosting exercises and solutions to problems presented within the book.