Textbook
Quantitative Methods in Linguistics
ISBN: 9781405144254
296 pages
March 2008, ©2008, Wiley-Blackwell

Quantitative Methods in Linguistics introduces the general strategies and methods of quantitative analysis. The book dedicates individual chapters to phonetics, psycholinguistics, sociolinguistics, historical linguistics, and syntax, as well as two introductory chapters on probability distribution and quantitative methods.
Each chapter illustrates key principles with actual data sets contributed by researchers working in the field. The book also provides detailed instruction in the practical aspects of handling quantitative linguistic data, using the statistical software package R to discover patterns in the data and to test linguistic hypotheses. End-of-chapter assignments and a balanced presentation make this an ideal text for students.
Further information and resources are available from the accompanying website at www.blackwellpublishing.com/quantmethods.
Acknowledgements.
Design of the book.
1. Fundamentals of quantitative analysis.
1.1 What we accomplish in quantitative analysis.
1.2 How to describe an observation.
1.3 Frequency distributions - a fundamental building block of quantitative analysis.
1.4 Types of distributions.
1.5 Is normal data, well, normal?
1.6 Measures of Central Tendency.
1.7 Measures of Dispersion.
1.8 Standard deviation of the normal distribution.
1.9 Exercises.
2. Patterns and tests.
2.1 Sampling.
2.2 Data.
2.3 Hypothesis testing.
2.3.1 The Central Limit Theorem.
2.3.2 Score keeping.
2.3.3 H0: μ = 100.
2.3.4 Type I and Type II error.
2.4 Correlation.
2.4.1 Covariance and correlation.
2.4.2 The regression line.
2.4.3 Amount of variance accounted for.
2.5 Exercises.
3. Phonetics.
3.1 Comparing mean values.
3.1.1 Cherokee Voice Onset Time: μ1971 = μ2001.
3.1.2 Samples have equal variance.
3.1.3 If the samples do not have equal variance.
3.1.4 Paired t test: Are men different from women?
3.1.5 The sign test.
3.2 Predicting the back of the tongue from the front: Multiple regression.
3.2.1 The covariance matrix.
3.2.2 More than one slope: the βi.
3.2.3 Selecting a model.
3.3 Tongue shape factors: Principal components analysis.
3.4 Exercises.
4. Psycholinguistics.
4.1 Analysis of Variance: One factor, more than two levels.
4.2 Two factors - interaction.
4.3 Repeated measures.
4.3.1 An example of repeated measures ANOVA.
4.3.2 Repeated measures ANOVA with a between-subjects factor.
4.4 The "language as fixed effect" fallacy.
4.5 Exercises.
5. Sociolinguistics.
5.1 When the data are counts - contingency tables.
5.1.2 Frequency in a contingency table.
5.2 Working with probabilities - the binomial distribution.
5.2.1 Bush or Kerry?
5.3 An aside about Maximum Likelihood Estimation.
5.4 Logistic regression.
5.5 An example from the [S]treets of Columbus.
5.5.1 On the relationship between χ² and G².
5.5.2 More than one predictor.
5.6 Logistic regression as regression: An ordinal effect - age.
5.7 Varbrul/R comparison.
5.8 Exercises.
6. Historical linguistics.
6.1 Cladistics: Where linguistics and evolutionary biology meet.
6.2 Clustering on the basis of shared vocabulary.
6.3 Cladistic analysis: Combining character-based subtrees.
6.4 Clustering on the basis of spelling similarity.
6.5 Multidimensional Scaling - a language similarity space.
6.6 Exercises.
7. Syntax.
7.1 Measuring sentence acceptability.
7.2 A psychogrammatical law?
7.3 Linear mixed effects in the syntactic expression of agents in English.
7.3.1 Linear regression - overall, and separately by verbs.
7.3.2 Fitting a linear mixed effects model - fixed and random effects.
7.3.3 Fitting five more mixed effects models - finding the best model.
7.4 Predicting the dative alternation - logistic modeling of syntactic corpora data.
7.4.1 Logistic model of dative alternation.
7.4.2 Evaluating the fit of the model.
7.4.3 Adding a random factor - mixed effects logistic regression.
7.5 Exercises.
Appendix 7.A.
References
Keith Johnson is a professor of linguistics at the University of California at Berkeley. He is the author of Acoustic and Auditory Phonetics (second edition, Blackwell, 2002), as well as numerous articles on phonetics and speech perception.
 Introduces the general strategies and methods of quantitative analysis for use in linguistic research
 Provides balanced treatment of the practical aspects of handling quantitative linguistic data
 Includes sample datasets contributed by researchers working in a variety of subdisciplines of linguistics
 Uses R, the statistical software package most commonly used by linguists, to discover patterns in quantitative data and to test linguistic hypotheses
 Features student-friendly end-of-chapter assignments and is accompanied by online resources
"As research in the language sciences becomes more interdisciplinary, students must become proficient in a wider range of data analysis methods. Johnson’s text is a comprehensive and detailed introduction to some of the most widely used statistical methods in language research. The book teaches by example, walking the reader through the analysis of data sets using the software package R, which provides concrete understanding of how to apply the methods, not just understand them conceptually. This is a good practical text, one that can serve as a handbook, and is appropriate for graduate students and advanced undergraduates who are doing research in the broad field of language." Mark A Pitt, Ohio State University
"Johnson's book is a catalyst for change in linguistics. Increasingly, the subjective, impressionistic data collection method is being replaced by objective, quantitative measurements. This book serves an important function in this process leading students step-by-step toward using statistical methods to analyze complex data." Chilin Shih, University of Illinois at Urbana-Champaign
"This rich and rewarding textbook is a must-read for all students and researchers who wish to follow the new wave of sophisticated empirical models and methods now sweeping the field of linguistics from phonetics to syntax and semantics." Joan Bresnan, Stanford University
Data Sets and Scripts

2. Patterns and Tests (.zip, 4.27 KB; .rar, 4.00 KB)
Files included in archive:
Script: Figure 2.1
Script: The central limit function from a uniform distribution (central.limit.unif)
Script: The central limit function from a skewed distribution (central.limit)
Script: The central limit function from a normal distribution (central.limit.norm)
Script: Figure 2.5
Script: Figure 2.6 (shade.tails)
Data: Male and female F1 frequency data (F1_data.txt)
Script: Explore the chi-square distribution (chisq)

3. Phonetics (.zip, 9.15 KB; .rar, 8.98 KB)
Files included in archive:
Data: Cherokee voice onset times (cherokeeVOT.txt)
Data: The tongue shape data (chaindata.txt)
Script: Commands to calculate and plot the first principal component of tongue shape (principal_components)
Script: Explore the F distribution (shade.tails.df)
Data: Made-up regression example (regression.txt)

4. Psycholinguistics (.zip, 16.10 KB; .rar, 9.26 KB)
Files included in archive:
Data: One observation of phonological priming per listener from Pitt & Shoaf's (2002) study
Data: One observation per listener from two groups (overlap versus no overlap) from Pitt & Shoaf's study
Data: Hypothetical data to illustrate repeated measures of analysis
Data: The full Pitt & Shoaf data set
Data: Reaction time data on perception of flap, /d/, and eth by Spanish-speaking and English-speaking listeners
Data: Luka & Barsalou (2005) "by subjects" data
Data: Luka & Barsalou (2005) "by items" data
Data: Boomershine's dialect identification data for exercise 5 (listed for the .zip archive only)

5. Sociolinguistics (.zip, 9.97 KB; .rar, 4.95 KB)
Files included in archive:
Data: Robin Dodsworth's preliminary data on /l/ vocalization in Worthington, Ohio
Data: Data from David Durian's rapid anonymous survey on /str/ in Columbus, Ohio
Data: Hope Dawson's Sanskrit data

6. Historical Linguistics (.zip, 139.78 KB; .rar, 132.64 KB)
Files included in archive:
Script: A script that draws Figure 6.1
Data: Dyen et al.'s (1984) distance matrix for 84 Indo-European languages based on the percentage of cognate words between languages
Data: A (rather arbitrary) subset of the Dyen et al. (1984) data coded as input to the Phylip program "pars"
Data: IElists.txt: A version of the Dyen et al. word lists that is readable in the scripts below
Script: make_dist: This Perl script tabulates all of the letters used in the Dyen et al. word lists
Script: get_IE_distance: This Perl script implements the "spelling distance" metric that was used to calculate distances between words in the Dyen et al. list
Script: make_matrix: Another Perl script. This one takes the output of get_IE_distance and writes it back out as a matrix that R can easily read
Data: A distance matrix produced from the spellings of words in the Dyen et al. (1984) dataset
Data: Distance matrix for eight Bantu languages from the Tanzanian Language Survey
Data: A phonetic distance matrix of Bantu languages from Ladefoged, Glick & Criper (1971)
Data: The TLS Bantu data arranged as input for phylogenetic parsimony analysis using the Phylip program pars

7. Syntax (.zip, 265.36 KB; .rar, 218.03 KB)
Files included in archive:
Data: Results from a magnitude estimation study
Data: Verb argument data from CoNLL-2005
Script: Cross-validation of linear mixed effects models
Data: Bresnan et al.'s dative alternation data