Data Mining for Business Analytics: Concepts, Techniques, and Applications in R
ISBN: 9781118879368
584 pages
September 2017

Description
"This book has by far the most comprehensive review of business analytics methods that I have ever seen, covering everything from classical approaches such as linear and logistic regression, through to modern methods like neural networks, bagging and boosting, and even much more business-specific procedures such as social network analysis and text mining. If not the bible, it is at the least a definitive manual on the subject."
Gareth M. James, University of Southern California, and coauthor (with Witten, Hastie, and Tibshirani) of the bestselling book "An Introduction to Statistical Learning, with Applications in R"
Incorporating an innovative focus on data visualization and time series forecasting, Data Mining for Business Analytics supplies insightful, detailed guidance on fundamental data mining techniques. The book guides readers through the use of the freely available R software for developing predictive models and techniques in order to describe and find patterns in data. The authors use interesting, real-world examples to build a theoretical and practical understanding of key data mining methods. The book includes discussions of R subroutines, allowing readers to work hands-on with the provided data. Throughout the book, applications of the discussed topics focus on the business problem as motivation and avoid unnecessary statistical theory. Each chapter concludes with exercises that allow readers to expand their comprehension of the presented material. Over a dozen cases that require use of the different data mining techniques are introduced, and a related Web site features over two dozen data sets, exercise solutions, PowerPoint slides, and case solutions. Modern topics include text analytics, recommender systems, social network analysis, getting data from a database into the analytics process, and scoring and deploying the results of an analysis back to a database.
Table of Contents
Foreword 17
Preface to the R Edition 19
Acknowledgments 22
PART I PRELIMINARIES
CHAPTER 1 Introduction 3
1.1 What is Business Analytics? 3
1.2 What is Data Mining? 5
1.3 Data Mining and Related Terms 5
1.4 Big Data 7
1.5 Data Science 8
1.6 Why Are There So Many Different Methods? 8
1.7 Terminology and Notation 9
1.8 Road Maps to This Book 11
Order of Topics 11
CHAPTER 2 Overview of the Data Mining Process 17
2.1 Introduction 17
2.2 Core Ideas in Data Mining 18
Classification 18
Prediction 18
Association Rules and Recommendation Systems 18
Predictive Analytics 19
Data Reduction and Dimension Reduction 19
Data Exploration and Visualization 19
Supervised and Unsupervised Learning 20
2.3 The Steps in Data Mining 21
2.4 Preliminary Steps 23
Organization of Datasets 23
Predicting Home Values in the West Roxbury Neighborhood 23
Loading and Looking at the Data in R 24
Sampling from a Database 26
Oversampling Rare Events in Classification Tasks 27
Preprocessing and Cleaning the Data 28
2.5 Predictive Power and Overfitting 35
Overfitting 35
Creation and Use of Data Partitions 37
2.6 Building a Predictive Model 41
Modeling Process 41
2.7 Using R for Data Mining on a Local Machine 46
2.8 Automating Data Mining Solutions 46
Data Mining Software: The State of the Market (by Herb Edelstein) 47
Problems 51
PART II DATA EXPLORATION AND DIMENSION REDUCTION
CHAPTER 3 Data Visualization 57
3.1 Uses of Data Visualization 57
Base R or ggplot? 59
3.2 Data Examples 59
Example 1: Boston Housing Data 59
Example 2: Ridership on Amtrak Trains 61
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots 61
Distribution Plots: Boxplots and Histograms 62
Heatmaps: Visualizing Correlations and Missing Values 66
3.4 Multidimensional Visualization 67
Adding Variables: Color, Size, Shape, Multiple Panels, and Animation 70
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering 72
Reference: Trend Lines and Labels 75
Scaling up to Large Datasets 77
Multivariate Plot: Parallel Coordinates Plot 77
Interactive Visualization 81
3.5 Specialized Visualizations 83
Visualizing Networked Data 83
Visualizing Hierarchical Data: Treemaps 86
Visualizing Geographical Data: Map Charts 86
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal 90
Prediction 90
Classification 90
Time Series Forecasting 90
Unsupervised Learning 91
Problems 92
CHAPTER 4 Dimension Reduction 95
4.1 Introduction 95
4.2 Curse of Dimensionality 96
4.3 Practical Considerations 96
Example 1: House Prices in Boston 97
4.4 Data Summaries 98
Summary Statistics 98
Aggregation and Pivot Tables 100
4.5 Correlation Analysis 103
4.6 Reducing the Number of Categories in Categorical Variables 103
4.7 Converting a Categorical Variable to a Numerical Variable 104
4.8 Principal Components Analysis 105
Example 2: Breakfast Cereals 105
Principal Components 111
Normalizing the Data 111
Using Principal Components for Classification and Prediction 115
4.9 Dimension Reduction Using Regression Models 115
4.10 Dimension Reduction Using Classification and Regression Trees 117
Problems 118
PART III PERFORMANCE EVALUATION
CHAPTER 5 Evaluating Predictive Performance 123
5.1 Introduction 123
5.2 Evaluating Predictive Performance 124
Naive Benchmark: The Average 124
Prediction Accuracy Measures 125
Comparing Training and Validation Performance 127
Lift Chart 127
5.3 Judging Classifier Performance 128
Benchmark: The Naive Rule 130
Class Separation 130
The Confusion (Classification) Matrix 130
Using the Validation Data 132
Accuracy Measures 133
Propensities and Cutoff for Classification 133
Performance in Case of Unequal Importance of Classes 138
Asymmetric Misclassification Costs 140
Generalization to More Than Two Classes 142
5.4 Judging Ranking Performance 143
Lift Charts for Binary Data 143
Decile Lift Charts 146
Beyond Two Classes 146
Lift Charts Incorporating Costs and Benefits 147
Lift as a Function of Cutoff 147
5.5 Oversampling 148
Oversampling the Training Set 151
Evaluating Model Performance Using a Non-oversampled Validation Set 151
Evaluating Model Performance If Only Oversampled Validation Set Exists 151
Problems 154
PART IV PREDICTION AND CLASSIFICATION METHODS
CHAPTER 6 Multiple Linear Regression 159
6.1 Introduction 159
6.2 Explanatory vs. Predictive Modeling 160
6.3 Estimating the Regression Equation and Prediction 162
Example: Predicting the Price of Used Toyota Corolla Cars 162
6.4 Variable Selection in Linear Regression 168
Reducing the Number of Predictors 168
How to Reduce the Number of Predictors 169
Problems 176
CHAPTER 7 k-Nearest Neighbors (k-NN) 181
7.1 The k-NN Classifier (Categorical Outcome) 181
Determining Neighbors 181
Classification Rule 182
Example: Riding Mowers 183
Choosing k 184
Setting the Cutoff Value 186
k-NN with More Than Two Classes 186
Converting Categorical Variables to Binary Dummies 188
7.2 k-NN for a Numerical Response 188
7.3 Advantages and Shortcomings of k-NN Algorithms 190
Problems 192
CHAPTER 8 The Naive Bayes Classifier 195
8.1 Introduction 195
Cutoff Probability Method 196
Conditional Probability 196
Example 1: Predicting Fraudulent Financial Reporting 196
8.2 Applying the Full (Exact) Bayesian Classifier 197
Using the “Assign to the Most Probable Class” Method 198
Using the Cutoff Probability Method 198
Practical Difficulty with the Complete (Exact) Bayes Procedure 198
Solution: Naive Bayes 199
The Naive Bayes Assumption of Conditional Independence 200
Using the Cutoff Probability Method 200
Example 2: Predicting Fraudulent Financial Reports, Two Predictors 201
Example 3: Predicting Delayed Flights 202
8.3 Advantages and Shortcomings of the Naive Bayes Classifier 207
Problems 210
CHAPTER 9 Classification and Regression Trees 213
9.1 Introduction 213
9.2 Classification Trees 215
Recursive Partitioning 215
Example 1: Riding Mowers 215
Measures of Impurity 218
Tree Structure 220
Classifying a New Record 221
9.3 Evaluating the Performance of a Classification Tree 223
Example 2: Acceptance of Personal Loan 223
9.4 Avoiding Overfitting 229
Stopping Tree Growth: Conditional Inference Trees 229
Pruning the Tree 230
Cross-Validation 230
Best Pruned Tree 234
9.5 Classification Rules from Trees 235
9.6 Classification Trees for More Than Two Classes 235
9.7 Regression Trees 236
Prediction 237
Measuring Impurity 237
Evaluating Performance 237
9.8 Improving Prediction: Random Forests and Boosted Trees 238
Random Forests 238
Boosted Trees 240
9.9 Advantages and Weaknesses of a Tree 241
Problems 243
CHAPTER 10 Logistic Regression 247
10.1 Introduction 247
10.2 The Logistic Regression Model 249
10.3 Example: Acceptance of Personal Loan 250
Model with a Single Predictor 252
Estimating the Logistic Model from Data: Computing Parameter Estimates 253
Interpreting Results in Terms of Odds (for a Profiling Goal) 256
10.4 Evaluating Classification Performance 257
Variable Selection 258
10.5 Example of Complete Analysis: Predicting Delayed Flights 261
Data Preprocessing 265
Model Fitting and Estimation 265
Model Interpretation 265
Model Performance 265
Variable Selection 267
10.6 Appendix: Logistic Regression for Profiling 271
Appendix A: Why Linear Regression Is Problematic for a Categorical Outcome 271
Appendix B: Evaluating Explanatory Power 272
Appendix C: Logistic Regression for More Than Two Classes 276
Problems 280
CHAPTER 11 Neural Nets 283
11.1 Introduction 283
11.2 Concept and Structure of a Neural Network 284
11.3 Fitting a Network to Data 285
Example 1: Tiny Dataset 285
Computing Output of Nodes 286
Preprocessing the Data 289
Training the Model 290
Example 2: Classifying Accident Severity 294
Avoiding Overfitting 295
Using the Output for Prediction and Classification 295
11.4 Required User Input 297
11.5 Exploring the Relationship Between Predictors and Outcome 299
11.6 Advantages and Weaknesses of Neural Networks 301
Problems 302
CHAPTER 12 Discriminant Analysis 305
12.1 Introduction 305
Example 1: Riding Mowers 306
Example 2: Personal Loan Acceptance 306
12.2 Distance of a Record from a Class 308
12.3 Fisher’s Linear Classification Functions 309
12.4 Classification Performance of Discriminant Analysis 312
12.5 Prior Probabilities 314
12.6 Unequal Misclassification Costs 314
12.7 Classifying More Than Two Classes 315
Example 3: Medical Dispatch to Accident Scenes 315
12.8 Advantages and Weaknesses 319
Problems 320
CHAPTER 13 Combining Methods: Ensembles and Uplift Modeling 323
13.1 Ensembles 323
Why Ensembles Can Improve Predictive Power 324
Simple Averaging 326
Bagging 327
Boosting 327
Bagging and Boosting in R 327
Advantages and Weaknesses of Ensembles 327
13.2 Uplift (Persuasion) Modeling 330
A-B Testing 330
Uplift 331
Gathering the Data 331
A Simple Model 333
Modeling Individual Uplift 333
Computing Uplift with R 334
Using the Results of an Uplift Model 336
13.3 Summary 336
Problems 337
PART V MINING RELATIONSHIPS AMONG RECORDS
CHAPTER 14 Association Rules and Collaborative Filtering 341
14.1 Association Rules 342
Discovering Association Rules in Transaction Databases 342
Example 1: Synthetic Data on Purchases of Phone Faceplates 342
Generating Candidate Rules 344
The Apriori Algorithm 345
Selecting Strong Rules 345
Data Format 347
The Process of Rule Selection 349
Interpreting the Results 349
Rules and Chance 351
Example 2: Rules for Similar Book Purchases 353
14.2 Collaborative Filtering 355
Data Type and Format 355
Example 3: Netflix Prize Contest 356
User-Based Collaborative Filtering: “People Like You” 357
Item-Based Collaborative Filtering 360
Advantages and Weaknesses of Collaborative Filtering 360
Collaborative Filtering vs. Association Rules 362
14.3 Summary 363
Problems 365
CHAPTER 15 Cluster Analysis 369
15.1 Introduction 369
Example: Public Utilities 371
15.2 Measuring Distance Between Two Records 373
Euclidean Distance 373
Normalizing Numerical Measurements 374
Other Distance Measures for Numerical Data 374
Distance Measures for Categorical Data 377
Distance Measures for Mixed Data 378
15.3 Measuring Distance Between Two Clusters 378
Minimum Distance 378
Maximum Distance 378
Average Distance 379
Centroid Distance 379
15.4 Hierarchical (Agglomerative) Clustering 381
Single Linkage 381
Complete Linkage 382
Average Linkage 382
Centroid Linkage 382
Ward’s Method 382
Dendrograms: Displaying Clustering Process and Results 383
Validating Clusters 385
Limitations of Hierarchical Clustering 388
15.5 Nonhierarchical Clustering: The k-Means Algorithm 388
Choosing the Number of Clusters (k) 390
Problems 395
PART VI FORECASTING TIME SERIES
CHAPTER 16 Handling Time Series 401
16.1 Introduction 401
16.2 Descriptive vs. Predictive Modeling 403
16.3 Popular Forecasting Methods in Business 403
Combining Methods 403
16.4 Time Series Components 404
Example: Ridership on Amtrak Trains 404
16.5 Data Partitioning and Performance Evaluation 409
Benchmark Performance: Naive Forecasts 410
Generating Future Forecasts 412
Problems 413
CHAPTER 17 Regression-Based Forecasting 417
17.1 A Model with Trend 417
Linear Trend 417
Exponential Trend 421
Polynomial Trend 423
17.2 A Model with Seasonality 423
17.3 A Model with Trend and Seasonality 428
17.4 Autocorrelation and ARIMA Models 430
Computing Autocorrelation 430
Improving Forecasts by Integrating Autocorrelation Information 433
Evaluating Predictability 437
Problems 439
CHAPTER 18 Smoothing Methods 449
18.1 Introduction 449
18.2 Moving Average 450
Centered Moving Average for Visualization 450
Trailing Moving Average for Forecasting 451
Choosing Window Width (w) 455
18.3 Simple Exponential Smoothing 455
Choosing Smoothing Parameter α 456
Relation Between Moving Average and Simple Exponential Smoothing 458
18.4 Advanced Exponential Smoothing 458
Series with a Trend 458
Series with a Trend and Seasonality 459
Series with Seasonality (No Trend) 460
Problems 463
PART VII DATA ANALYTICS
CHAPTER 19 Social Network Analytics 473
19.1 Introduction 473
19.2 Directed vs. Undirected Networks 475
19.3 Visualizing and Analyzing Networks 476
Graph Layout 476
Edge List 479
Adjacency Matrix 479
Using Network Data in Classification and Prediction 479
19.4 Social Data Metrics and Taxonomy 480
Node-Level Centrality Metrics 481
Egocentric Network 482
Network Metrics 484
19.5 Using Network Metrics in Prediction and Classification 486
Link Prediction 486
Entity Resolution 486
Collaborative Filtering 489
Collecting Social Network Data With R 491
Advantages and Disadvantages 493
Problems 495
CHAPTER 20 Text Mining 497
20.1 Introduction 497
20.2 The Tabular Representation of Text: Term-Document Matrix and “Bag-of-Words” 498
20.3 Bag-of-Words vs. Meaning Extraction at Document Level 499
20.4 Preprocessing the Text 500
Tokenization 502
Text Reduction 503
Presence/Absence vs. Frequency 505
Term Frequency-Inverse Document Frequency (TF-IDF) 505
From Terms to Concepts: Latent Semantic Indexing 506
Extracting Meaning 507
20.5 Implementing Data Mining Methods 507
20.6 Example: Online Discussions on Autos and Electronics 508
Importing and Labeling the Records 508
Text Preprocessing in R 510
Producing a Concept Matrix 510
Fitting a Predictive Model 510
Prediction 512
20.7 Summary 512
Problems 513
PART VIII CASES
CHAPTER 21 Cases 517
21.1 Charles Book Club 517
The Book Industry 517
Database Marketing at Charles 518
Data Mining Techniques 520
Assignment 522
21.2 German Credit 524
Background 524
Data 524
Assignment 528
21.3 Tayko Software Cataloger 529
Background 529
The Mailing Experiment 529
Data 529
Assignment 530
21.4 Political Persuasion 533
Background 533
Predictive Analytics Arrives in US Politics 533
Political Targeting 533
Uplift 534
Data 535
Assignment 535
21.5 Taxi Cancellations 537
Business Situation 537
Assignment 537
21.6 Segmenting Consumers of Bath Soap 539
Business Situation 539
Key Problems 539
Data 540
Measuring Brand Loyalty 540
Assignment 540
21.7 Direct-Mail Fundraising 543
Background 543
Data 543
Assignment 543
21.8 Catalog Cross-Selling 546
Background 546
Assignment 546
21.9 Predicting Bankruptcy 548
Predicting Corporate Bankruptcy 548
Assignment 549
21.10 Time Series Case: Forecasting Public Transportation Demand 551
Background 551
Problem Description 551
Available Data 551
Assignment Goal 551
Assignment 552
Tips and Suggested Steps 552
References 553
Data Files Used in the Book 555
Index