Introduction to Data Engineering on Google Cloud (IDEG) – Contenuti

Contenuti dettagliati del Corso

Module 1 - Data Engineering Tasks and Components

Topics:

  • The role of a data engineer
  • Data sources versus data sinks
  • Data formats
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Sharing datasets using Analytics Hub

Objectives:

  • Explain the role of a data engineer.
  • Understand the differences between a data source and a data sink.
  • Explain the different types of data formats.
  • Explain the storage solution options on Google Cloud.
  • Learn about the metadata management options on Google Cloud.
  • Understand how to share datasets with ease using Analytics Hub.
  • Understand how to load data into BigQuery using the Google Cloud console or the gcloud CLI.

Activities:

  • Lab: Loading Data into BigQuery
  • Quiz

Module 2 - Data Replication and Migration

Topics:

  • Replication and migration architecture
  • The gcloud command-line tool
  • Moving datasets
  • Datastream

Objectives:

  • Explain the baseline Google Cloud data replication and migration architecture.
  • Understand the options and use cases for the gcloud command-line tool.
  • Explain the functionality and use cases for Storage Transfer Service.
  • Explain the functionality and use cases for Transfer Appliance.
  • Understand the features and deployment of Datastream.

Activities:

  • Lab: Datastream: PostgreSQL Replication to BigQuery (optional for ILT)
  • Quiz

Module 3 - The Extract and Load Data Pipeline Pattern

Topics:

  • Extract and load architecture
  • The bq command-line tool
  • BigQuery Data Transfer Service
  • BigLake

Objectives:

  • Explain the baseline extract and load architecture diagram.
  • Understand the options of the bq command-line tool.
  • Explain the functionality and use cases for BigQuery Data Transfer Service.
  • Explain the functionality and use cases for BigLake as a non-extract-load pattern.

Activities:

  • Lab: BigLake: Qwik Start
  • Quiz

Module 4 - The Extract, Load, and Transform Data Pipeline Pattern

Topics:

  • Extract, load, and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform

Objectives:

  • Explain the baseline extract, load, and transform architecture diagram.
  • Understand a common ELT pipeline on Google Cloud.
  • Learn about BigQuery’s SQL scripting and scheduling capabilities.
  • Explain the functionality and use cases for Dataform.

Activities:

  • Lab: Create and Execute a SQL Workflow in Dataform
  • Quiz

Module 5 - The Extract, Transform, and Load Data Pipeline Pattern

Topics:

  • Extract, transform, and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines

Objectives:

  • Explain the baseline extract, transform, and load architecture diagram.
  • Learn about the GUI tools on Google Cloud used for ETL data pipelines.
  • Explain batch data processing using Dataproc.
  • Learn how to use Dataproc Serverless for Spark for ETL.
  • Explain streaming data processing options.
  • Explain the role Bigtable plays in data pipelines.

Activities:

  • Lab: Use Dataproc Serverless for Spark to Load BigQuery (optional for ILT)
  • Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow
  • Quiz

Module 6 - Automation Techniques

Topics:

  • Automation patterns and options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run Functions
  • Eventarc

Objectives:

  • Explain the automation patterns and options available for pipelines.
  • Learn about Cloud Scheduler and Workflows.
  • Learn about Cloud Composer.
  • Learn about Cloud Run functions.
  • Explain the functionality and automation use cases for Eventarc.

Activities:

  • Lab: Use Cloud Run Functions to Load BigQuery (optional for ILT)
  • Quiz