Ensemble Classification Methods with Applications in R

Esteban Alfaro, Matías Gámez, Noelia García

ISBN: 978-1-119-42109-2

Oct 2018

300 pages

Description

An essential guide to two burgeoning topics in machine learning – classification trees and ensemble learning 

Ensemble Classification Methods with Applications in R introduces the concepts and principles of ensemble classifier methods and reviews the most commonly used techniques. This resource shows how ensemble classification has become an extension of individual classifiers. The text emphasizes two areas of machine learning: classification trees and ensemble learning. The authors explore the basic characteristics of ensemble classification methods and explain the types of problems that can emerge in their application.

Written by a team of noted experts in the field, the text is divided into two main sections. The first section outlines the theoretical underpinnings of the topic, and the second presents examples of practical applications. The book contains a wealth of illustrative cases drawn from business failure prediction, zoology, ecology and other fields. This vital guide:

  • Offers an important text that has been tested both in the classroom and at tutorials at conferences
  • Contains authoritative information written by leading experts in the field
  • Presents a comprehensive text that can be applied to courses in machine learning, data mining and artificial intelligence 
  • Combines in one volume two of the most intriguing topics in machine learning: ensemble learning and classification trees

Written for researchers in fields such as biostatistics, economics, environmental science and zoology, as well as students of data mining and machine learning, Ensemble Classification Methods with Applications in R focuses on two topics in machine learning: classification trees and ensemble learning.

 

Contributors v

Preface xix

1 Introduction 1

1.1 Introduction 1

1.2 Definition 2

1.3 Taxonomy of supervised classification methods 3

1.4 Estimation of the accuracy of a classification system 5

1.4.1 The apparent error rate 6

1.4.2 Estimation of the true error rate 7

1.4.3 Error rates estimation methods 8

1.4.4 The standard error 12

1.5 Classification trees 12

1.5.1 Classification tree building 15

1.5.2 Splitting rule 16

1.5.3 Splitting criteria 18

1.5.4 Goodness of a split 19

1.5.5 The impurity of a tree 20

1.5.6 Stopping criteria 21

1.5.7 Overfitting in classification trees 23

1.5.8 Pruning rules 26

2 Limitation of the individual classifiers 33

2.1 Introduction 33

2.2 Error decomposition. Bias and variance 35

2.3 Study of classifier instability 42

2.4 Advantages of ensemble classifiers 47

2.5 Bayesian perspective of ensemble classifiers 51

3 Ensemble classifiers methods 53

3.1 Introduction 53

3.2 Taxonomy of ensemble methods 54

3.2.1 Non-generative methods 56

3.2.2 Generative methods 57

3.3 Bagging 59

3.4 Boosting 63

3.4.1 AdaBoost training error 70

3.4.2 AdaBoost and the margin theory 72

3.4.3 Other boosting versions 75

3.4.4 Comparing Bagging and Boosting 81

3.5 Random forests 82

3.6 Out-of-bag estimations 88

4 Classification with individual and ensemble trees in R 91

4.1 Introduction 91

4.2 adabag: an R package for classification with boosting and bagging 94

4.2.1 The bagging, predict.bagging and bagging.cv functions 102

4.2.2 The boosting, predict.boosting and boosting.cv functions 119

4.2.3 The margins, plot.margins, errorevol and plot.errorevol functions 130

4.2.4 The MarginOrderedPruning.Bagging function 137

4.3 The “German Credit” example 144

4.3.1 Classification tree 148

4.3.2 Combination using bagging 154

4.3.3 Combination using boosting 158

4.3.4 Combination using random forest 162

4.3.5 Cross-validation comparison 170

5 Bankruptcy prediction through ensemble trees 173

5.1 Introduction 173

5.2 Problem description 174

5.3 Applications 178

5.3.1 The dichotomous case 178

5.3.2 The three-class case 195

5.4 Conclusions 207

6 Experiments with Adabag in biology classification tasks 209

6.1 Classification of color texture feature patterns extracted from cells in histological images of fish ovary 209

6.2 Direct Kernel Perceptron (DKP): ultra-fast kernel ELM-based classification with non-iterative closed-form weight calculation 213

6.3 Do we need hundreds of classifiers to solve real world classification problems? 218

6.4 On the use of nominal and ordinal classifiers for the discrimination of stages of development in fish oocytes 224

7 Generalization bounds for ranking algorithms 231

7.1 Introduction 231

7.2 Assumptions, main theorem and application 234

7.3 Experiments 237

7.4 Conclusions 239

8 Classification and regression trees for analysing irrigation decisions 241

8.1 Introduction 241

8.2 Theory 245

8.3 Case study and methods 247

8.3.1 Study site and data available 247

8.3.2 Model, specifications and performance evaluation 251

8.4 Results and discussion 253

8.5 Conclusions 259

9 Boosted rule learner and its properties 263

9.1 Introduction 263

9.2 Separate-and-conquer 266

9.3 Boosting in rule induction 268

9.4 Experiments 270

9.5 Conclusions 274

10 Credit scoring with individuals and ensemble trees 277

10.1 Introduction 277

10.2 Measures of accuracy 279

10.3 Data description 281

10.4 Classification of borrowers applying ensemble trees 286

10.5 Conclusions 293

11 An overview of multiple classifier systems based on GAM 295

11.1 Introduction 295

11.2 Multiple classifier systems based on Generalized Additive Models 297

11.2.1 Generalized Additive Models 298

11.2.2 GAM-based multiple classifier systems 300

11.2.3 GAMensPlus: extending GAMens for advanced interpretability 303

11.3 Experiments and applications 304

11.3.1 A multi-domain benchmark study of GAM-based ensemble classifiers 305

11.3.2 Benchmarking GAM-based ensemble classifiers in predictive customer analytics 307

11.3.3 A case study of GAMensPlus to customer churn prediction in financial services 310

11.4 Software implementation in R: the GAMens package 314

11.5 Conclusions 314

Bibliography 317