Detailed Course Contents
Module 1 - Data Engineering Tasks and Components
Topics:
- The role of a data engineer
- Data sources versus data sinks
- Data formats
- Storage solution options on Google Cloud
- Metadata management options on Google Cloud
- Sharing datasets using Analytics Hub
Objectives:
- Explain the role of a data engineer.
- Understand the differences between a data source and a data sink.
- Explain the different types of data formats.
- Explain the storage solution options on Google Cloud.
- Learn about the metadata management options on Google Cloud.
- Understand how to share datasets with ease using Analytics Hub.
- Understand how to load data into BigQuery using the Google Cloud console or the gcloud CLI.
Activities:
- Lab: Loading Data into BigQuery
- Quiz
Module 2 - Data Replication and Migration
Topics:
- Replication and migration architecture
- The gcloud command-line tool
- Moving datasets
- Datastream
Objectives:
- Explain the baseline Google Cloud data replication and migration architecture.
- Understand the options and use cases for the gcloud command-line tool.
- Explain the functionality and use cases for Storage Transfer Service.
- Explain the functionality and use cases for Transfer Appliance.
- Understand the features and deployment of Datastream.
Activities:
- Lab: Datastream: PostgreSQL Replication to BigQuery (optional for ILT)
- Quiz
Module 3 - The Extract and Load Data Pipeline Pattern
Topics:
- Extract and load architecture
- The bq command-line tool
- BigQuery Data Transfer Service
- BigLake
Objectives:
- Explain the baseline extract and load architecture diagram.
- Understand the options of the bq command-line tool.
- Explain the functionality and use cases for BigQuery Data Transfer Service.
- Explain the functionality and use cases for BigLake as a non-extract-load pattern.
Activities:
- Lab: BigLake: Qwik Start
- Quiz
Module 4 - The Extract, Load, and Transform Data Pipeline Pattern
Topics:
- Extract, load, and transform (ELT) architecture
- SQL scripting and scheduling with BigQuery
- Dataform
Objectives:
- Explain the baseline extract, load, and transform architecture diagram.
- Understand a common ELT pipeline on Google Cloud.
- Learn about BigQuery’s SQL scripting and scheduling capabilities.
- Explain the functionality and use cases for Dataform.
Activities:
- Lab: Create and Execute a SQL Workflow in Dataform
- Quiz
Module 5 - The Extract, Transform, and Load Data Pipeline Pattern
Topics:
- Extract, transform, and load (ETL) architecture
- Google Cloud GUI tools for ETL data pipelines
- Batch data processing using Dataproc
- Streaming data processing options
- Bigtable and data pipelines
Objectives:
- Explain the baseline extract, transform, and load architecture diagram.
- Learn about the GUI tools on Google Cloud used for ETL data pipelines.
- Explain batch data processing using Dataproc.
- Learn how to use Dataproc Serverless for Spark for ETL.
- Explain streaming data processing options.
- Explain the role Bigtable plays in data pipelines.
Activities:
- Lab: Use Dataproc Serverless for Spark to Load BigQuery (optional for ILT)
- Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow
- Quiz
Module 6 - Automation Techniques
Topics:
- Automation patterns and options for pipelines
- Cloud Scheduler and Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc
Objectives:
- Explain the automation patterns and options available for pipelines.
- Learn about Cloud Scheduler and Workflows.
- Learn about Cloud Composer.
- Learn about Cloud Run functions.
- Explain the functionality and automation use cases for Eventarc.
Activities:
- Lab: Use Cloud Run Functions to Load BigQuery (optional for ILT)
- Quiz