Textbook
Quantitative Methods in Linguistics
ISBN: 9781405144254
296 pages
March 2008, ©2008, Wiley-Blackwell

Description
 Provides balanced treatment of the practical aspects of handling quantitative linguistic data
 Includes sample datasets contributed by researchers working in a variety of subdisciplines of linguistics
 Uses R, the statistical software package most commonly used by linguists, to discover patterns in quantitative data and to test linguistic hypotheses
 Includes student-friendly end-of-chapter assignments and is accompanied by online resources, available in the 'Downloads' section below
Table of Contents
Design of the Book.
1. Fundamentals of Quantitative Analysis.
1.1 What We Accomplish in Quantitative Analysis.
1.2 How to Describe an Observation.
1.3 Frequency Distributions: A Fundamental Building Block of Quantitative Analysis.
1.4 Types of Distributions.
1.5 Is Normal Data, Well, Normal?
1.6 Measures of Central Tendency.
1.7 Measures of Dispersion.
1.8 Standard Deviation of the Normal Distribution.
Exercises.
2. Patterns and Tests.
2.1 Sampling.
2.2 Data.
2.3 Hypothesis Testing.
2.3.1 The Central Limit Theorem.
2.3.2 Score Keeping.
2.3.3 H0: µ = 100.
2.3.4 Type I and Type II Error.
2.4 Correlation.
2.4.1 Covariance and Correlation.
2.4.2 The Regression Line.
2.4.3 Amount of Variance Accounted For.
Exercises.
3. Phonetics.
3.1 Comparing Mean Values.
3.1.1 Cherokee Voice Onset Time: µ₁₉₇₁ = µ₂₀₀₁.
3.1.2 Samples Have Equal Variance.
3.1.3 If the Samples Do Not Have Equal Variance.
3.1.4 Paired t Test: Are Men Different from Women?
3.1.5 The Sign Test.
3.2 Predicting the Back of the Tongue from the Front: Multiple Regression.
3.2.1 The Covariance Matrix.
3.2.2 More than One Slope: The bᵢ.
3.2.3 Selecting a Model.
3.3 Tongue Shape Factors: Principal Components Analysis.
Exercises.
4. Psycholinguistics.
4.1 Analysis of Variance: One Factor, More than Two Levels.
4.2 Two Factors: Interaction.
4.3 Repeated Measures.
4.3.1 An Example of Repeated Measures ANOVA.
4.3.2 Repeated Measures ANOVA with a Between-Subjects Factor.
4.4 The “Language as Fixed Effect” Fallacy.
4.5 Exercises.
5. Sociolinguistics.
5.1 When the Data are Counts: Contingency Tables.
5.1.1 Frequency in a Contingency Table.
5.2 Working with Probabilities: The Binomial Distribution.
5.2.1 Bush or Kerry?
5.3 An Aside about Maximum Likelihood Estimation.
5.4 Logistic Regression.
5.5 An Example from the [ʃ]treets of Columbus.
5.5.1 On the Relationship between χ² and G².
5.5.2 More than One Predictor.
5.6 Logistic Regression as Regression: An Ordinal Effect – Age.
5.7 Varbrul/R Comparison.
Exercises.
6. Historical Linguistics.
6.1 Cladistics: Where Linguistics and Evolutionary Biology Meet.
6.2 Clustering on the Basis of Shared Vocabulary.
6.3 Cladistic Analysis: Combining Character-Based Subtrees.
6.4 Clustering on the Basis of Spelling Similarity.
6.5 Multidimensional Scaling: A Language Similarity Space.
Exercises.
7. Syntax.
7.1 Measuring Sentence Acceptability.
7.2 A Psychogrammatical Law?
7.3 Linear Mixed Effects in the Syntactic Expression of Agents in English.
7.3.1 Linear Regression: Overall, and Separately by Verbs.
7.3.2 Fitting a Linear Mixed-Effects Model: Fixed and Random Effects.
7.3.3 Fitting Five More Mixed-Effects Models: Finding the Best Model.
7.4 Predicting the Dative Alternation: Logistic Modeling of Syntactic Corpora Data.
7.4.1 Logistic Model of Dative Alternation.
7.4.2 Evaluating the Fit of the Model.
7.4.3 Adding a Random Factor: Mixed Effects Logistic Regression.
Exercises.
Appendix 7A.
References.
Index
The Wiley Advantage
 Introduces the general strategies and methods of quantitative analysis for use in linguistic research
 Features student-friendly end-of-chapter assignments
Reviews
"Johnson's book is a catalyst for change in linguistics.
Increasingly, the subjective, impressionistic data collection
method is being replaced by objective, quantitative measurements.
This book serves an important function in this process leading
students step-by-step toward using statistical methods to analyze
complex data." Chilin Shih, University of Illinois at Urbana-Champaign
"This rich and rewarding textbook is a must-read for all students and researchers who wish to follow the new wave of sophisticated empirical models and methods now sweeping the field of linguistics from phonetics to syntax and semantics." Joan Bresnan, Stanford University
Downloads

Data Sets and Scripts

2. Patterns and Tests (ZIP, 4.27 KB; RAR, 4.00 KB)
Script: Figure 2.1
Script: The central limit function from a uniform distribution (central.limit.unif)
Script: The central limit function from a skewed distribution (central.limit)
Script: The central limit function from a normal distribution (central.limit.norm)
Script: Figure 2.5
Script: Figure 2.6 (shade.tails)
Data: Male and female F1 frequency data (F1_data.txt)
Script: Explore the chi-square distribution (chisq)

3. Phonetics (ZIP, 9.15 KB; RAR, 8.98 KB)
Data: Cherokee voice onset times (cherokeeVOT.txt)
Data: The tongue shape data (chaindata.txt)
Script: Commands to calculate and plot the first principal component of tongue shape (principal_components)
Script: Explore the F distribution (shade.tails.df)
Data: Made-up regression example (regression.txt)

4. Psycholinguistics (ZIP, 16.10 KB; RAR, 9.26 KB)
Data: One observation of phonological priming per listener from Pitt & Shoaf's (2002) study
Data: One observation per listener from two groups (overlap versus no overlap) from Pitt & Shoaf's study
Data: Hypothetical data to illustrate repeated measures analysis
Data: The full Pitt & Shoaf data set
Data: Reaction time data on perception of flap, /d/, and eth by Spanish-speaking and English-speaking listeners
Data: Luka & Barsalou (2005) "by subjects" data
Data: Luka & Barsalou (2005) "by items" data
Data: Boomershine's dialect identification data for exercise 5 (ZIP archive only)

5. Sociolinguistics (ZIP, 9.97 KB; RAR, 4.95 KB)
Data: Robin Dodsworth's preliminary data on /l/ vocalization in Worthington, Ohio
Data: Data from David Durian's rapid anonymous survey on /str/ in Columbus, Ohio
Data: Hope Dawson's Sanskrit data

6. Historical Linguistics (ZIP, 139.78 KB; RAR, 132.64 KB)
Script: A script that draws Figure 6.1
Data: Dyen et al.'s (1984) distance matrix for 84 Indo-European languages, based on the percentage of cognate words between languages
Data: A (rather arbitrary) subset of the Dyen et al. (1984) data, coded as input to the Phylip program "pars"
Data: IElists.txt, a version of the Dyen et al. word lists that is readable by the scripts below
Script: make_dist, a Perl script that tabulates all of the letters used in the Dyen et al. word lists
Script: get_IE_distance, a Perl script that implements the "spelling distance" metric used to calculate distances between words in the Dyen et al. list
Script: make_matrix, a Perl script that takes the output of get_IE_distance and writes it back out as a matrix that R can easily read
Data: A distance matrix produced from the spellings of words in the Dyen et al. (1984) dataset
Data: Distance matrix for eight Bantu languages from the Tanzanian Language Survey (TLS)
Data: A phonetic distance matrix of Bantu languages from Ladefoged, Glick & Criper (1971)
Data: The TLS Bantu data arranged as input for phylogenetic parsimony analysis using the Phylip program pars

7. Syntax (ZIP, 265.36 KB; RAR, 218.03 KB)
Data: Results from a magnitude estimation study
Data: Verb argument data from CoNLL-2005
Script: Cross-validation of linear mixed-effects models
Data: Bresnan et al.'s dative alternation data