Spoken Language Understanding: Systems for Extracting Semantic Information from Speech

Gokhan Tur, Renato De Mori

ISBN: 978-1-119-99269-1, March 2011, 480 pages


Spoken language understanding (SLU) is an emerging field between speech processing and language processing. It investigates human/machine and human/human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract meaning from speech utterances, and their applications are vast, from voice search on mobile devices to meeting summarization, attracting interest from both the commercial and academic sectors.

Both human/machine and human/human communications can benefit from the application of SLU, using differing tasks and approaches to better understand and utilize such communications. This book covers the state-of-the-art approaches for the most popular SLU tasks with chapters written by well-known researchers in the respective fields. Key features include:

  • Presents a fully integrated view of the two distinct disciplines of speech processing and language processing for SLU tasks.
  • Defines what is possible today for SLU as an enabling technology for enterprise (e.g., customer care centers or company meetings) and consumer (e.g., entertainment, mobile, car, robot, or smart environments) applications, and outlines the key research areas.
  • Provides a unique source of distilled information on methods for computer modeling of semantic information in human/machine and human/human conversations.

This book can be used for graduate courses in electronics engineering, computer science or computational linguistics. Moreover, technologists interested in processing spoken communications will find it a useful source of collated information on the topic, drawn from the two distinct disciplines of speech processing and language processing under the new area of SLU.

List of Contributors.



1 Introduction (Gokhan Tur and Renato De Mori).

1.1 A Brief History of Spoken Language Understanding.

1.2 Organization of the Book.


2 History of Knowledge and Processes for Spoken Language Understanding (Renato De Mori).

2.1 Introduction.

2.2 Meaning Representation and Sentence Interpretation.

2.3 Knowledge Fragments and Semantic Composition.

2.4 Probabilistic Interpretation in SLU Systems.

2.5 Interpretation with Partial Syntactic Analysis.

2.6 Classification Models for Interpretation.

2.7 Advanced Methods and Resources for Semantic Modeling and Interpretation.

2.8 Recent Systems.

2.9 Conclusions.


3 Semantic Frame-based Spoken Language Understanding (Ye-Yi Wang, Li Deng and Alex Acero).

3.1 Background.

3.2 Knowledge-based Solutions.

3.3 Data-driven Approaches.

3.4 Summary.


4 Intent Determination and Spoken Utterance Classification (Gokhan Tur and Li Deng).

4.1 Background.

4.2 Task Description.

4.3 Technical Challenges.

4.4 Benchmark Data Sets.

4.5 Evaluation Metrics.

4.6 Technical Approaches.

4.7 Discussion and Conclusions.


5 Voice Search (Ye-Yi Wang, Dong Yu, Yun-Cheng Ju and Alex Acero).

5.1 Background.

5.2 Technology Review.

5.3 Summary.


6 Spoken Question Answering (Sophie Rosset, Olivier Galibert and Lori Lamel).

6.1 Introduction.

6.2 Specific Aspects of Handling Speech in QA Systems.

6.3 QA Evaluation Campaigns.

6.4 Question-answering Systems.

6.5 Projects Integrating Spoken Requests and Question Answering.

6.6 Conclusions.


7 SLU in Commercial and Research Spoken Dialogue Systems (David Suendermann and Roberto Pieraccini).

7.1 Why Spoken Dialogue Systems (Do Not) Have to Understand.

7.2 Approaches to SLU for Dialogue Systems.

7.3 From Call Flow to POMDP: How Dialogue Management Integrates with SLU.

7.4 Benchmark Projects and Data Sets.

7.5 Time is Money: The Relationship between SLU and Overall Dialogue System Performance.

7.6 Conclusion.


8 Active Learning (Dilek Hakkani-Tür and Giuseppe Riccardi).

8.1 Introduction.

8.2 Motivation.

8.3 Learning Architectures.

8.4 Active Learning Methods.

8.5 Combining Active Learning with Semi-supervised Learning.

8.6 Applications.

8.7 Evaluation of Active Learning Methods.

8.8 Discussion and Conclusions.



9 Human/Human Conversation Understanding (Gokhan Tur and Dilek Hakkani-Tür).

9.1 Background.

9.2 Human/Human Conversation Understanding Tasks.

9.3 Dialogue Act Segmentation and Tagging.

9.4 Action Item and Decision Detection.

9.5 Addressee Detection and Co-reference Resolution.

9.6 Hot Spot Detection.

9.7 Subjectivity, Sentiment, and Opinion Detection.

9.8 Speaker Role Detection.

9.9 Modeling Dominance.

9.10 Argument Diagramming.

9.11 Discussion and Conclusions.


10 Named Entity Recognition (Frédéric Béchet).

10.1 Task Description.

10.2 Challenges Using Speech Input.

10.3 Benchmark Data Sets, Applications.

10.4 Evaluation Metrics.

10.5 Main Approaches for Extracting NEs from Text.

10.6 Comparative Methods for NER from Speech.

10.7 New Trends in NER from Speech.

10.8 Conclusions.


11 Topic Segmentation (Matthew Purver).

11.1 Task Description.

11.2 Basic Approaches, and the Challenge of Speech.

11.3 Applications and Benchmark Datasets.

11.4 Evaluation Metrics.

11.5 Technical Approaches.

11.6 New Trends and Future Directions.


12 Topic Identification (Timothy J. Hazen).

12.1 Task Description.

12.2 Challenges Using Speech Input.

12.3 Applications and Benchmark Tasks.

12.4 Evaluation Metrics.

12.5 Technical Approaches.

12.6 New Trends and Future Directions.


13 Speech Summarization (Yang Liu and Dilek Hakkani-Tür).

13.1 Task Description.

13.2 Challenges when Using Speech Input.

13.3 Data Sets.

13.4 Evaluation Metrics.

13.5 General Approaches.

13.6 More Discussions on Speech versus Text Summarization.

13.7 Conclusions.


14 Speech Analytics (I. Dan Melamed and Mazin Gilbert).

14.1 Introduction.

14.2 System Architecture.

14.3 Speech Transcription.

14.4 Text Feature Extraction.

14.5 Acoustic Feature Extraction.

14.6 Relational Feature Extraction.

14.7 DBMS.

14.8 Media Server and Player.

14.9 Trend Analysis.

14.10 Alerting System.

14.11 Conclusion.


15 Speech Retrieval (Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran and Murat Saraçlar).

15.1 Task Description.

15.2 Applications.

15.3 Challenges Using Speech Input.

15.4 Evaluation Metrics.

15.5 Benchmark Data Sets.

15.6 Approaches.

15.7 New Trends.

15.8 Discussion and Conclusions.



“The book also contains references to existing datasets that can be used by researchers interested in the field; these, together with the presented baseline, equip one with the necessary tools to step into this very daring and fascinating domain.” (Zentralblatt MATH, 2012)