This book addresses the field of geographic information extraction and retrieval from textual documents. Geographic information retrieval is a rapidly emerging subject, a trend fostered by the growing power of the Internet and the emerging possibilities of data dissemination.
After positioning his work in this field in Chapter 1, the author presents his proposals in the following two chapters. Chapter 2 focuses on spatial and temporal information indexing and retrieval in corpora of textual documents, with proposals for both spatial and temporal information retrieval (IR). Chapter 3 tackles the use of generalized spatial and temporal indexes, derived from that work, in the framework of multicriteria IR. Geographic IR (GIR) is discussed at length, since it combines spatial, temporal and thematic search criteria.
The author provides a rich bibliographical study of current approaches to the modeling and retrieval of spatial and temporal information in textual documents, and of the similarity measures developed thus far in the literature.
The book concludes with a broad overview of the remaining scientific challenges. Several areas of research are discussed, such as the integration of a domain-based ontology, the modeling of spatial footprints from the interpretation of spatial relations, and the parsing of relations between features deemed relevant within a document resulting from a GIR process.
Foreword, Christophe Claramunt.
1. Access by Geographic Content to Textual Corpora: What Orientations?
2. Spatial and Temporal Information Retrieval in Textual Corpora.
3. Multicriteria Information Retrieval in Textual Corpora.
4. General Conclusion.
About the Authors
Christian Sallaberry is currently Assistant Professor at the Law, Economics and Management Faculty in Pau, France. His research interests lie in geographical information retrieval (GIR) in textual corpora: the recognition, analysis, indexing and retrieval of spatial, temporal and thematic information. He is particularly interested in combining spatial, temporal and thematic criteria within a GIR process.
CHAPTER 1. ACCESS BY GEOGRAPHIC CONTENT TO TEXTUAL CORPORA: WHAT ORIENTATIONS?
1.1. Introduction
1.2. Access by geographic content to textual corpora
1.2.1. Document retrieval and textual corpora
1.2.2. Textual corpora with “territorial” denotations
1.2.3. Access to textual content
1.3. Reinforcement of GIR by contributions from NLP, reasoning and multicriteria IR
1.4. Toward the construction of a multicriteria IR engine
1.4.1. Challenges, hypotheses and research objectives
1.4.2. Approach
1.4.3. Applications
CHAPTER 2. SPATIAL AND TEMPORAL INFORMATION RETRIEVAL IN TEXTUAL CORPORA
2.1. Introduction
2.2. Review of challenges, hypotheses and research objectives
2.3. Spatial and temporal information in textual documents: literature review
2.3.1. Geographic information in text and IR
2.3.2. Named entities
2.3.3. Modeling languages
2.3.4. Reasoning
2.3.5. Linguistic processing
2.3.6. GIR: systems and similarity measure models
2.3.7. Evaluation campaigns, corpora and resources
2.3.8. Summary
2.4. Proposition for spatial and temporal information indexing and retrieval in textual corpora
2.4.1. Reminder and focus on the notion of space and time in “heritage” corpora
2.4.2. Core spatial model and core temporal model
2.4.3. Spatial and temporal relations
2.4.4. Spatial and temporal indexing process flows: PIV prototype
2.4.5. Spatial and temporal IR: PIV prototype
2.4.6. Evaluation and discussion
2.5. Summary
2.5.1. Contributions
2.5.2. Perspectives
CHAPTER 3. MULTICRITERIA INFORMATION RETRIEVAL IN TEXTUAL CORPORA
3.1. Introduction
3.2. Review of challenges, hypotheses and research objectives
3.3. Standardization and combination of criteria: literature review
3.3.1. Criterion standardization
3.3.2. Combination of criteria
3.3.3. Summary and positioning of a partially compensatory GIR
3.4. Proposition for indexing by tiling and multicriteria IR in textual corpora
3.4.1. Standardization by tiling
3.4.2. Spatial and temporal IR applied to tiling: PIV2
3.4.3. Multicriteria IR applied to tiling: PIV3
3.5. Evaluation and discussion
3.5.1. Evaluation framework of geographic IRSs: proposal for a test collection and an experimental protocol
3.5.2. Evaluation of the spatial and temporal IR applied to tiling
3.5.3. Evaluation of the multicriteria IR applied to tiling
3.6. Summary
3.6.1. Contributions
3.6.2. Perspectives
CHAPTER 4. GENERAL CONCLUSION
4.1. Summary
4.1.1. Contributions to the access by geographic content to textual corpora
4.1.2. Spatial and temporal IR in texts
4.1.3. Multicriteria IR in texts
4.2. Perspectives
4.2.1. Intradimensional axis
4.2.2. Interdimensional axis
4.2.3. Expansion of the vocabulary for a qualitative representation of the geographic dimensions