# Quantitative Methods In Linguistics

ISBN: 978-1-444-36043-1 | September 2011 | Wiley-Blackwell | 296 pages

## Description

*Quantitative Methods in Linguistics* offers a practical introduction to statistics and quantitative analysis with data sets drawn from the field and coverage of phonetics, psycholinguistics, sociolinguistics, historical linguistics, and syntax, as well as probability distribution and quantitative methods.

- Provides balanced treatment of the practical aspects of handling quantitative linguistic data
- Includes sample datasets contributed by researchers working in a variety of sub-disciplines of linguistics
- Uses R, the statistical software package most commonly used by linguists, to discover patterns in quantitative data and to test linguistic hypotheses
- Includes student-friendly end-of-chapter assignments and is accompanied by online resources available in the 'Downloads' section below

## Table of contents

Design of the Book.

**1. Fundamentals of Quantitative Analysis**.

1.1 What We Accomplish in Quantitative Analysis.

1.2 How to Describe an Observation.

1.3 Frequency Distributions: A Fundamental Building Block of Quantitative Analysis.

1.4 Types of Distributions.

1.5 Is Normal Data, Well, Normal?

1.6 Measures of Central Tendency.

1.7 Measures of Dispersion.

1.8 Standard Deviation of the Normal Distribution.

Exercises.

**2. Patterns and Tests**.

2.1 Sampling.

2.2 Data.

2.3 Hypothesis Testing.

2.3.1 The Central Limit Theorem.

2.3.2 Score Keeping.

2.3.3 H₀: µ = 100.

2.3.4 Type I and Type II Error.

2.4 Correlation.

2.4.1 Covariance and Correlation.

2.4.2 The Regression Line.

2.4.3 Amount of Variance Accounted For.

Exercises.

**3. Phonetics**.

3.1 Comparing Mean Values.

3.1.1 Cherokee Voice Onset Time: µ₁₉₇₁ = µ₂₀₀₁.

3.1.2 Samples Have Equal Variance.

3.1.3 If the Samples Do Not Have Equal Variance.

3.1.4 Paired t Test: Are Men Different from Women?

3.1.5 The Sign Test.

3.2 Predicting the Back of the Tongue from the Front: Multiple Regression.

3.2.1 The Covariance Matrix.

3.2.2 More than One Slope: The bᵢ.

3.2.3 Selecting a Model.

3.3 Tongue Shape Factors: Principal Components Analysis.

Exercises.

**4. Psycholinguistics**.

4.1 Analysis of Variance: One Factor, More than Two Levels.

4.2 Two Factors: Interaction.

4.3 Repeated Measures.

4.3.1 An Example of Repeated Measures ANOVA.

4.3.2 Repeated Measures ANOVA with a Between-Subjects Factor.

4.4 The “Language as Fixed Effect” Fallacy.

4.5 Exercises.

**5. Sociolinguistics**.

5.1 When the Data are Counts - Contingency Tables.

5.1.1 Frequency in a Contingency Table.

5.2 Working with Probabilities: The Binomial Distribution.

5.2.1 Bush or Kerry?

5.3 An Aside about Maximum Likelihood Estimation.

5.4 Logistic Regression.

5.5 An Example from the [ʃ]treets of Columbus.

5.5.1 On the Relationship between χ² and G².

5.5.2 More than One Predictor.

5.6 Logistic Regression as Regression: An Ordinal Effect - Age.

5.7 Varbrul/R Comparison.

Exercises.

**6. Historical Linguistics**.

6.1 Cladistics: Where Linguistics and Evolutionary Biology Meet.

6.2 Clustering on the Basis of Shared Vocabulary.

6.3 Cladistic Analysis: Combining Character-Based Subtrees.

6.4 Clustering on the Basis of Spelling Similarity.

6.5 Multidimensional Scaling: A Language Similarity Space.

Exercises.

**7. Syntax**.

7.1 Measuring Sentence Acceptability.

7.2 A Psychogrammatical Law?

7.3 Linear Mixed Effects in the Syntactic Expression of Agents in English.

7.3.1 Linear Regression: Overall, and Separately by Verbs.

7.3.2 Fitting a Linear Mixed-Effects Model: Fixed and Random Effects.

7.3.3 Fitting Five More Mixed-Effects Models: Finding the Best Model.

7.4 Predicting the Dative Alternation: Logistic Modeling of Syntactic Corpora Data.

7.4.1 Logistic Model of Dative Alternation.

7.4.2 Evaluating the Fit of the Model.

7.4.3 Adding a Random Factor: Mixed Effects Logistic Regression.

Exercises.

Appendix 7A.

References.

Index

## Reviews

*Mark A Pitt, Ohio State University*

"Johnson's book is a catalyst for change in linguistics. Increasingly, the subjective, impressionistic data collection method is being replaced by objective, quantitative measurements. This book serves an important function in this process leading students step-by-step toward using statistical methods to analyze complex data."

*Chilin Shih, University of Illinois at Urbana-Champaign*

"This rich and rewarding textbook is a must-read for all students and researchers who wish to follow the new wave of sophisticated empirical models and methods now sweeping the field of linguistics from phonetics to syntax and semantics."

*Joan Bresnan, Stanford University*

## Downloads

**Data Sets and Scripts**

This file is stored in a ZIP archive. If your computer is not capable of opening ZIP archives, you can download a trial version of WinZip at WinZip.com.

**2. Patterns and Tests (.zip)**

Files included in archive:

- Script: Figure 2.1
- Script: The central limit function from a uniform distribution (central.limit.unif)
- Script: The central limit function from a skewed distribution (central.limit)
- Script: The central limit function from a normal distribution (central.limit.norm)
- Script: Figure 2.5
- Script: Figure 2.6 (shade.tails)
- Data: Male and female F1 frequency data (F1_data.txt)
- Script: Explore the chi-square distribution (chisq)

**2. Patterns and Tests (.rar)**

Files included in archive:

- Script: Figure 2.1
- Script: The central limit function from a uniform distribution (central.limit.unif)
- Script: The central limit function from a skewed distribution (central.limit)
- Script: The central limit function from a normal distribution (central.limit.norm)
- Script: Figure 2.5
- Script: Figure 2.6 (shade.tails)
- Data: Male and female F1 frequency data (F1_data.txt)
- Script: Explore the chi-square distribution (chisq)

**3. Phonetics (.zip)**

Files included in archive:

- Data: Cherokee voice onset times (cherokeeVOT.txt)
- Data: The tongue shape data (chaindata.txt)
- Script: Commands to calculate and plot the first principal component of tongue shape (principal_components)
- Script: Explore the F distribution (shade.tails.df)
- Data: Made-up regression example (regression.txt)

**3. Phonetics (.rar)**

Files included in archive:

- Data: Cherokee voice onset times (cherokeeVOT.txt)
- Data: The tongue shape data (chaindata.txt)
- Script: Commands to calculate and plot the first principal component of tongue shape (principal_components)
- Script: Explore the F distribution (shade.tails.df)
- Data: Made-up regression example (regression.txt)

**4. Psycholinguistics (.zip)**

Files included in archive:

- Data: One observation of phonological priming per listener from Pitt & Shoaf's (2002) study
- Data: One observation per listener from two groups (overlap versus no overlap) from Pitt & Shoaf's study
- Data: Hypothetical data to illustrate repeated measures analysis
- Data: The full Pitt & Shoaf data set
- Data: Reaction time data on perception of flap, /d/, and eth by Spanish-speaking and English-speaking listeners
- Data: Luka & Barsalou (2005) "by subjects" data
- Data: Luka & Barsalou (2005) "by items" data
- Data: Boomershine's dialect identification data for exercise 5

**4. Psycholinguistics (.rar)**

Files included in archive:

- Data: One observation of phonological priming per listener from Pitt & Shoaf's (2002) study
- Data: One observation per listener from two groups (overlap versus no overlap) from Pitt & Shoaf's study
- Data: Hypothetical data to illustrate repeated measures analysis
- Data: The full Pitt & Shoaf data set
- Data: Reaction time data on perception of flap, /d/, and eth by Spanish-speaking and English-speaking listeners
- Data: Luka & Barsalou (2005) "by subjects" data
- Data: Luka & Barsalou (2005) "by items" data

**5. Sociolinguistics (.zip)**

Files included in archive:

- Data: Robin Dodsworth's preliminary data on /l/ vocalization in Worthington, Ohio
- Data: Data from David Durian's rapid anonymous survey on /str/ in Columbus, Ohio
- Data: Hope Dawson's Sanskrit data

**5. Sociolinguistics (.rar)**

Files included in archive:

- Data: Robin Dodsworth's preliminary data on /l/ vocalization in Worthington, Ohio
- Data: Data from David Durian's rapid anonymous survey on /str/ in Columbus, Ohio
- Data: Hope Dawson's Sanskrit data

**6. Historical Linguistics (.zip)**

Files included in archive:

- Script: A script that draws Figure 6.1
- Data: Dyen et al.'s (1984) distance matrix for 84 Indo-European languages based on the percentage of cognate words between languages
- Data: A (rather arbitrary) subset of the Dyen et al. (1984) data coded as input to the Phylip program "pars"
- Data: IE-lists.txt: A version of the Dyen et al. word lists that is readable in the scripts below
- Script: make_dist: This Perl script tabulates all of the letters used in the Dyen et al. word lists
- Script: get_IE_distance: This Perl script implements the "spelling distance" metric that was used to calculate distances between words in the Dyen et al. list
- Script: make_matrix: Another Perl script. This one takes the output of get_IE_distance and writes it back out as a matrix that R can easily read
- Data: A distance matrix produced from the spellings of words in the Dyen et al. (1984) dataset
- Data: Distance matrix for eight Bantu languages from the Tanzanian Language Survey
- Data: A phonetic distance matrix of Bantu languages from Ladefoged, Glick & Criper (1971)
- Data: The TLS Bantu data arranged as input for phylogenetic parsimony analysis using the Phylip program pars

**6. Historical Linguistics (.rar)**

Files included in archive:

- Script: A script that draws Figure 6.1
- Data: Dyen et al.'s (1984) distance matrix for 84 Indo-European languages based on the percentage of cognate words between languages
- Data: A (rather arbitrary) subset of the Dyen et al. (1984) data coded as input to the Phylip program "pars"
- Data: IE-lists.txt: A version of the Dyen et al. word lists that is readable in the scripts below
- Script: make_dist: This Perl script tabulates all of the letters used in the Dyen et al. word lists
- Script: get_IE_distance: This Perl script implements the "spelling distance" metric that was used to calculate distances between words in the Dyen et al. list
- Script: make_matrix: Another Perl script. This one takes the output of get_IE_distance and writes it back out as a matrix that R can easily read
- Data: A distance matrix produced from the spellings of words in the Dyen et al. (1984) dataset
- Data: Distance matrix for eight Bantu languages from the Tanzanian Language Survey
- Data: A phonetic distance matrix of Bantu languages from Ladefoged, Glick & Criper (1971)
- Data: The TLS Bantu data arranged as input for phylogenetic parsimony analysis using the Phylip program pars

**7. Syntax (.zip)**

Files included in archive:

- Data: Results from a magnitude estimation study
- Data: Verb argument data from CoNLL-2005
- Script: Cross-validation of linear mixed effects models
- Data: Bresnan et al.'s dative alternation data

**7. Syntax (.rar)**

Files included in archive:

- Data: Results from a magnitude estimation study
- Data: Verb argument data from CoNLL-2005
- Script: Cross-validation of linear mixed effects models
- Data: Bresnan et al.'s dative alternation data

## Features

- Introduces the general strategies and methods of quantitative analysis for use in linguistic research
- Provides balanced treatment of the practical aspects of handling quantitative linguistic data
- Includes sample datasets contributed by researchers working in a variety of sub-disciplines of linguistics
- Uses R, the statistical software package most commonly used by linguists, to discover patterns in quantitative data and to test linguistic hypotheses
- Features student-friendly end-of-chapter assignments