
Web Content Mining With Java: Techniques for Exploiting the World Wide Web

Tony Loton

ISBN: 978-0-470-84311-6

Apr 2002

328 pages

Paperback

Unlock the potential of the world's biggest database.

This practical book shows you how to build portals, search engines and other knowledge-based applications that mine the information you need from the Web.

* Written by a developer for developers
* A practical, hands-on approach
* Illustrates how Java and associated technologies (XML, HTML) can be combined with database technology to display and manipulate Web-derived information more effectively.
* Demonstrates how to build a structure browser, a portal and a meta-search engine, and how to make 'Talking Pages'.

Preface xi

About the Author xix

Acknowledgements xxi

1 Surveying the Scene 1

2 Language of the Web 13

3 HTML and XML Parsing 33

4 Data Filters and Structured Queries 67

5 Building a Portal with Java 109

6 Building a Search Engine with Java 131

7 Mail Mining With Java 153

8 Introduction to Text Mining 177

9 Introduction to Data Mining 207

10 Loose Ends and Looking Ahead 231

Appendix A: Software Installation and Configuration 243

Appendix B: Javadoc Extracts 251

Appendix C: Earlier Versions of JAXP 271

Appendix D: License and Copyright Statements 275

Appendix E: Census 1891 Data XML 279

Appendix F: Share Price Cluster Data 287

Appendix G: Glossary of Acronyms 291

References 295

Further Reading 297

Index 299

"When I got this book, I couldn't put it down. A lot of computer books sit on the shelf or send me to sleep, but not this one. Not only is it both topical and useful, but it hits a just-about-ideal balance between code and food for thought. The author has a real knack for useful solutions to complex problems." (JavaRanch, 17 May 2002)