Description
Based around eleven international real-life case studies, and including contributions from leading experts in the field, this groundbreaking book explores the need for grid-enabling data mining applications and provides a comprehensive study of the technology, techniques and management skills necessary to create them. It serves simultaneously as a design blueprint, user guide and research agenda for current and future developments, and will appeal to a broad audience, from developers and users of data mining and grid technology to advanced undergraduate and postgraduate students interested in the field.
List of contributors.
1. Data mining meets grid computing: time to dance (Alberto Sánchez, Jesús Montes, Werner Dubitzky, Julio J. Valdés, María S. Pérez and Pedro de Miguel).
1.2 Data mining.
1.3 Grid computing.
1.4 Data mining grid - mining grid data.
1.6 Summary of chapters in this volume.
2. Data analysis services in the Knowledge Grid (Eugenio Cesario, Antonio Congiusta, Domenico Talia and Paolo Trunfio).
2.3 Knowledge Grid services.
2.4 Data analysis services.
2.5 Design of Knowledge Grid applications.
3. GridMiner: an advanced support for e-science analytics (Peter Brezany, Ivan Janciak and A. Min Tjoa).
3.2 Rationale behind the design and development of GridMiner.
3.3 Use case.
3.4 Knowledge discovery process and its support by GridMiner.
3.5 Graphical user interface.
3.6 Future developments.
4. ADaM services: scientific data mining in the service-oriented architecture paradigm (Rahul Ramachandran, Sara Graves, John Rushing, Ken Keiser, Manil Maskey, Hong Lin and Helen Conover).
4.2 ADaM system overview.
4.3 ADaM toolkit overview.
4.4 Mining in a service-oriented architecture.
4.5 Mining Web services.
4.6 Mining grid services.
5. Mining for misconfigured machines in grid systems (Noam Palatin, Arie Leizarowitz, Assaf Schuster and Ran Wolff).
5.2 Preliminaries and related work.
5.3 Acquiring, pre-processing and storing data.
5.4 Data analysis.
5.5 The GMS.
5.7 Conclusions and future work.
6. FAEHIM: Federated Analysis Environment for Heterogeneous Intelligent Mining (Ali Shaikh Ali and Omer F. Rana).
6.2 Requirements of a distributed knowledge discovery framework.
6.3 Workflow-based knowledge discovery.
6.4 Data mining toolkit.
6.5 Data mining service framework.
6.6 Distributed data mining services.
6.7 Data manipulation tools.
6.9 Empirical experiments.
7. Scalable and privacy preserving distributed data analysis over a service-oriented platform (William K. Cheung).
7.2 A service-oriented solution.
7.4 Model-based scalable, privacy preserving, distributed data analysis.
7.5 Modelling distributed data mining and workflow processes.
7.6 Lessons learned.
7.7 Further research directions.
8. Building and using analytical workflows in Discovery Net (Moustafa Ghanem, Vasa Curcin, Patrick Wendel and Yike Guo).
8.2 Discovery Net system.
8.3 Architecture for Discovery Net.
8.4 Data management.
8.5 Example of a workflow study.
8.6 Future directions.
9. Building workflows that traverse the bioinformatics data landscape (Robert Stevens, Paul Fisher, Jun Zhao, Carole Goble and Andy Brass).
9.2 The bioinformatics data landscape.
9.3 The bioinformatics experiment landscape.
9.4 Taverna for bioinformatics experiments.
9.5 Building workflows in Taverna.
9.6 Workflow case study.
10. Specification of distributed data mining workflows with DataMiningGrid (Dennis Wegener and Michael May).
10.2 DataMiningGrid environment.
10.3 Operations for workflow construction.
10.5 Case studies.
10.6 Discussion and related work.
10.7 Open issues.
11. Anteater: service-oriented data mining (Renato A. Ferreira, Dorgival O. Guedes and Wagner Meira).
11.2 The architecture.
11.3 Runtime framework.
11.4 Parallel algorithms for data mining.
11.5 Visual metaphors.
11.6 Case studies.
11.7 Future developments.
11.8 Conclusions and future work.
12. DMGA: a generic brokering-based data mining grid architecture (Alberto Sánchez, María S. Pérez, Pierre Gueant, José M. Peña and Pilar Herrero).
12.2 DMGA overview.
12.3 Horizontal composition.
12.4 Vertical composition.
12.5 The need for brokering.
12.6 Brokering-based data mining grid architecture.
12.7 Use cases: Apriori, ID3 and J4.8 algorithms.
12.8 Related work.
13. Grid-based data mining with the Environmental Scenario Search Engine (ESSE) (Mikhail Zhizhin, Alexey Poyda, Dmitry Mishin, Dmitry Medvedev, Eric Kihn and Vassily Lyutsarev).
13.1 Environmental data source: NCEP/NCAR reanalysis data set.
13.2 Fuzzy search engine.
13.3 Software architecture.
14. Data pre-processing using OGSA-DAI (Martin Swain and Neil P. Chue Hong).
14.2 Data pre-processing for grid-enabled data mining.
14.3 Using OGSA-DAI to support data mining applications.
14.4 Data pre-processing scenarios in data mining applications.
14.5 State of the art solutions for grid data management.
14.7 Open issues.