Wiley.com

Google BigQuery Analytics

ISBN: 978-1-118-82482-5
528 pages
June 2014

Description

How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets

Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques. It also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute Engine, AppEngine Datastore integration, and using GViz with Tableau to generate charts of query results. Beyond the mechanics of BigQuery, the book covers the architecture of the underlying Dremel query engine, giving readers a thorough understanding that leads to better query results.

  • Features a companion website that includes all code and data sets from the book
  • Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
  • Includes web application examples coded in Python

Table of Contents

Introduction xiii

Part I BigQuery Fundamentals

Chapter 1 The Story of Big Data at Google 3

Big Data Stack 1.0 4

Big Data Stack 2.0 (and Beyond) 5

Open Source Stack 7

Google Cloud Platform 8

Cloud Processing 9

Cloud Storage 9

Cloud Analytics 9

Problem Statement 10

What Is Big Data? 10

Why Big Data? 10

Why Do You Need New Ways to Process Big Data? 11

How Can You Read a Terabyte in a Second? 12

What about MapReduce? 12

How Can You Ask Questions of Your Big Data and Quickly Get Answers? 13

Summary 13

Chapter 2 BigQuery Fundamentals 15

What Is BigQuery? 15

SQL Queries over Big Data 16

Cloud Storage System 21

Distributed Cloud Computing 23

Analytics as a Service (AaaS?) 26

What BigQuery Isn’t 29

BigQuery Technology Stack 31

Google Cloud Platform 34

BigQuery Service History 37

BigQuery Sensors Application 39

Sensor Client Android App 40

BigQuery Sensors AppEngine App 41

Running Ad-Hoc Queries 42

Summary 43

Chapter 3 Getting Started with BigQuery 45

Creating a Project 45

Google APIs Console 46

Free Tier Limitations and Billing 49

Running Your First Query 51

Loading Data 54

Using the Command-Line Client 57

Install and Setup 58

Using the Client 60

Service Account Access 62

Setting Up Google Cloud Storage 64

Development Environment 66

Python Libraries 66

Java Libraries 67

Additional Tools 67

Summary 68

Chapter 4 Understanding the BigQuery Object Model 69

Projects 70

Project Names 70

Project Billing 72

Project Access Control 72

Projects and AppEngine 73

BigQuery Data 73

Naming in BigQuery 73

Schemas 75

Tables 76

Datasets 77

Jobs 78

Job Components 78

BigQuery Billing and Quotas 85

Storage Costs 85

Processing Costs 86

Query RPCs 87

TableData.insertAll() RPCs 87

Data Model for End-to-End Application 87

Project 87

Datasets 88

Tables 89

Summary 91

Part II Basic BigQuery 93

Chapter 5 Talking to the BigQuery API 95

Introduction to Google APIs 95

Authenticating API Access 96

RESTful Web Services for the SOAP-Less Masses 105

Discovering Google APIs 112

Common Operations 113

BigQuery REST Collections 122

Projects 123

Datasets 126

Tables 132

TableData 139

Jobs 144

BigQuery API Tour 151

Error Handling in BigQuery 154

Summary 158

Chapter 6 Loading Data 159

Bulk Loads 160

Moving Bytes 163

Destination Table 170

Data Formats 174

Errors 182

Limits and Quotas 186

Streaming Inserts 188

Summary 193

Chapter 7 Running Queries 195

BigQuery Query API 196

Query API Methods 196

Query API Features 208

Query Billing and Quotas 213

BigQuery Query Language 221

BigQuery SQL in Five Queries 222

Differences from Standard SQL 232

Summary 236

Chapter 8 Putting It Together 237

A Quick Tour 238

Mobile Client 242

Monitoring Service 243

Log Collection Service 252

Log Trampoline 253

Dashboard 260

Data Caching 261

Data Transformation 265

Web Client 269

Summary 272

Part III Advanced BigQuery 273

Chapter 9 Understanding Query Execution 275

Background 276

Storage Architecture 277

Colossus File System (CFS) 277

ColumnIO 278

Durability and Availability 281

Query Processing 282

Dremel Serving Trees 283

Architecture Comparisons 295

Relational Databases 295

MapReduce 298

Summary 303

Chapter 10 Advanced Queries 305

Advanced SQL 306

Subqueries 307

Combining Tables: Implicit UNION and JOIN 310

Analytic and Windowing Functions 315

BigQuery SQL Extensions 318

The EACH Keyword 318

Data Sampling 320

Repeated Fields 324

Query Errors 334

Result Too Large 334

Resources Exceeded 337

Recipes 338

Pivot 339

Cohort Analysis 340

Parallel Lists 343

Exact Count Distinct 344

Trailing Averages 346

Finding Concurrency 347

Summary 348

Chapter 11 Managing Data Stored in BigQuery 349

Query Caching 349

Result Caching 350

Table Snapshots 354

AppEngine Datastore Integration 358

Simple Kind 359

Mixing Types 366

Final Thoughts 368

Metatables and Table Sharding 368

Time Travel 368

Selecting Tables 374

Summary 378

Part IV BigQuery Applications 381

Chapter 12 External Data Processing 383

Getting Data Out of BigQuery 384

Extract Jobs 384

TableData.list() 396

AppEngine MapReduce 405

Sequential Solution 407

Basic AppEngine MapReduce 409

BigQuery Integration 412

Using BigQuery with Hadoop 418

Querying BigQuery from a Spreadsheet 419

BigQuery Queries in Google Spreadsheets (Apps Script) 419

BigQuery Queries in Microsoft Excel 429

Summary 433

Chapter 13 Using BigQuery from Third-Party Tools 435

BigQuery Adapters 436

Simba ODBC Connector 436

JDBC Connection Options 444

Client-Side Encryption with Encrypted BigQuery 445

Scientific Data Processing Tools in BigQuery 452

BigQuery from R 452

Python Pandas and BigQuery 461

Visualizing Data in BigQuery 467

Visualizing Your BigQuery Data with Tableau 467

Visualizing Your BigQuery Data with BIME 473

Other Data Visualization Options 477

Summary 478

Chapter 14 Querying Google Data Sources 479

Google Analytics 480

Setting Up BigQuery Access 480

Table Schema 481

Querying the Tables 483

Google AdSense 485

Table Structure 486

Leveraging BigQuery 490

Google Cloud Storage 491

Summary 494

Index 495


Author Information

The authors are founding members of the BigQuery team and have helped build and run the service. Jordan Tigani is an active participant in the BigQuery community on Stack Overflow. Siddartha Naidu has extensive experience helping customers integrate with BigQuery.


Downloads

Code Download
All source code is available at https://code.google.com/p/bigquery-e2e, which also includes an issue tracker for bugs. All code is open source. The site http://bigquery-sensors.appspot.com hosts the dashboard and the Android app download described in Chapter 8.
   
See More
