Statistical Implications of Turing's Formula
ISBN: 9781119237068
296 pages
November 2016

Description
Features a broad introduction to recent research on Turing’s formula and presents modern applications in statistics, probability, information theory, and other areas of modern data science
Turing's formula is, perhaps, the only known method for estimating underlying distributional characteristics beyond the range of observed data without making any parametric or semiparametric assumptions. This book presents a clear introduction to Turing's formula and its connections to statistics. It includes topics relevant to a variety of fields, such as information theory; statistics; probability; computer science, including artificial intelligence and machine learning; big data; biology; ecology; and genetics. The author examines many core statistical issues within modern data science from Turing's perspective. Longstanding problems such as entropy and mutual information estimation, diversity index estimation, domains of attraction on general alphabets, and tail probability estimation are treated systematically in light of the most up-to-date understanding of Turing's formula. Featuring numerous exercises and examples throughout, the book summarizes the known properties of Turing's formula and explains how and when it works well; discusses the approach derived from Turing's formula for estimating a variety of quantities, most of which come from information theory but are also important for machine learning and ecological applications; and uses Turing's formula to estimate certain heavy-tailed distributions.
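As a rough illustration of the idea, the basic form of Turing's formula estimates the total probability of letters not yet seen in a sample as N1/n, where N1 is the number of letters observed exactly once and n is the sample size. The Python sketch below is not taken from the book; the function name `turing_missing_mass` and the toy sample are assumptions made for illustration.

```python
from collections import Counter


def turing_missing_mass(sample):
    """Estimate the total probability of unseen letters as N1 / n.

    N1 is the number of distinct letters that occur exactly once,
    and n is the sample size. (Hypothetical helper for illustration.)
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    # Count the singletons: letters seen exactly once in the sample.
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


# Toy sample: 'c', 'd', and 'e' each appear once, so N1 = 3 and n = 8.
sample = list("aaabbcde")
print(turing_missing_mass(sample))  # 0.375
```

No parametric model of the underlying distribution is needed here: the estimate depends only on the observed frequency counts.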
In summary, this book:
• Features a unified and broad presentation of Turing’s formula, including its connections to statistics, probability, information theory, and other areas of modern data science
• Provides a presentation on the statistical estimation of information theoretic quantities
• Demonstrates the estimation problems of several statistical functions from Turing's perspective such as Simpson's indices, Shannon's entropy, general diversity indices, mutual information, and Kullback–Leibler divergence
• Includes numerous exercises and examples throughout with a fundamental perspective on the key results of Turing’s formula
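For readers unfamiliar with the quantities named above, the following Python sketch shows the simple plug-in estimators of Shannon's entropy and Simpson's index, which replace the unknown probabilities with observed relative frequencies. It is a baseline illustration only, not the estimators developed in the book from Turing's perspective; the function names are assumptions, and Simpson's index is taken here in its sum-of-squared-probabilities form. The plug-in entropy estimator is known to underestimate entropy on average in small samples.

```python
import math
from collections import Counter


def plugin_shannon_entropy(sample):
    """Plug-in estimate of Shannon's entropy (natural log) from a sample."""
    n = len(sample)
    counts = Counter(sample)
    # Replace each unknown probability p_k with the relative frequency c / n.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def plugin_simpson_index(sample):
    """Plug-in estimate of Simpson's index, the sum of squared probabilities."""
    n = len(sample)
    counts = Counter(sample)
    return sum((c / n) ** 2 for c in counts.values())


# Toy sample with two equally frequent letters.
sample = list("aabb")
print(plugin_shannon_entropy(sample))  # log(2), about 0.693
print(plugin_simpson_index(sample))    # 0.5
```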
Statistical Implications of Turing's Formula is an ideal reference for researchers and practitioners who need a review of the many critical statistical issues of modern data science. This book is also an appropriate learning resource for biologists, ecologists, and geneticists who are involved with the concept of diversity and its estimation and can be used as a textbook for graduate courses in mathematics, probability, statistics, computer science, artificial intelligence, machine learning, big data, and information theory.
Zhiyi Zhang, PhD, is Professor of Mathematics and Statistics at The University of North Carolina at Charlotte. He is an active consultant in both industry and government on a wide range of statistical issues, and his current research interests include Turing's formula and its statistical implications; probability and statistics on countable alphabets; nonparametric estimation of entropy and mutual information; tail probability and biodiversity indices; and applications involving extracting statistical information from low-frequency data space. He earned his PhD in Statistics from Rutgers University.
Table of Contents
Preface xi
1 Turing’s Formula 1
1.1 Turing’s Formula 3
1.2 Univariate Normal Laws 10
1.3 Multivariate Normal Laws 22
1.4 Turing’s Formula Augmented 27
1.5 Goodness-of-Fit by Counting Zeros 33
1.6 Remarks 42
1.7 Exercises 45
2 Estimation of Simpson’s Indices 49
2.1 Generalized Simpson’s Indices 49
2.2 Estimation of Simpson’s Indices 52
2.3 Normal Laws 54
2.4 Illustrative Examples 61
2.5 Remarks 66
2.6 Exercises 68
3 Estimation of Shannon’s Entropy 71
3.1 A Brief Overview 72
3.2 The Plug-In Entropy Estimator 76
3.2.1 When K Is Finite 76
3.2.2 When K Is Countably Infinite 81
3.3 Entropy Estimator in Turing’s Perspective 86
3.3.1 When K Is Finite 88
3.3.2 When K Is Countably Infinite 94
3.4 Appendix 107
3.4.1 Proof of Lemma 3.2 107
3.4.2 Proof of Lemma 3.5 110
3.4.3 Proof of Corollary 3.5 111
3.4.4 Proof of Lemma 3.14 112
3.4.5 Proof of Lemma 3.18 116
3.5 Remarks 120
3.6 Exercises 121
4 Estimation of Diversity Indices 125
4.1 A Unified Perspective on Diversity Indices 126
4.2 Estimation of Linear Diversity Indices 131
4.3 Estimation of Rényi’s Entropy 138
4.4 Remarks 142
4.5 Exercises 145
5 Estimation of Information 149
5.1 Introduction 149
5.2 Estimation of Mutual Information 162
5.2.1 The Plug-In Estimator 163
5.2.2 Estimation in Turing’s Perspective 170
5.2.3 Estimation of Standardized Mutual Information 173
5.2.4 An Illustrative Example 176
5.3 Estimation of Kullback–Leibler Divergence 182
5.3.1 The Plug-In Estimator 184
5.3.2 Properties of the Augmented Plug-In Estimator 186
5.3.3 Estimation in Turing’s Perspective 189
5.3.4 Symmetrized Kullback–Leibler Divergence 193
5.4 Tests of Hypotheses 196
5.5 Appendix 199
5.5.1 Proof of Theorem 5.12 199
5.6 Exercises 204
6 Domains of Attraction on Countable Alphabets 209
6.1 Introduction 209
6.2 Domains of Attraction 212
6.3 Examples and Remarks 223
6.4 Appendix 228
6.4.1 Proof of Lemma 6.3 228
6.4.2 Proof of Theorem 6.2 229
6.4.3 Proof of Lemma 6.6 232
6.5 Exercises 236
7 Estimation of Tail Probability 241
7.1 Introduction 241
7.2 Estimation of Pareto Tail 244
7.3 Statistical Properties of AMLE 248
7.4 Remarks 253
7.5 Appendix 256
7.5.1 Proof of Lemma 7.7 256
7.5.2 Proof of Lemma 7.9 263
7.6 Exercises 267
References 269
Author Index 275
Subject Index 279