
Serverless Data Processing with Dataflow

CODE: WGAC-GGL-SDPF

Google

Description

Introduction
Module 1: Introduction
  • Introduce the course objectives.
  • Demonstrate how Apache Beam and Dataflow work together to fulfill your organization’s data processing needs.
Module 2: Beam Portability
  • Summarize the benefits of the Beam Portability Framework.
  • Customize the data processing environment of your pipeline using custom containers.
  • Review use cases for cross-language transformations.
  • Enable the Portability Framework for your Dataflow pipelines.
Module 3: Separating Compute and Storage with Dataflow
  • Enable Shuffle and Streaming Engine, for batch and streaming pipelines respectively, for maximum performance.
  • Enable Flexible Resource Scheduling for more cost-efficient performance.
Module 4: IAM, Quotas, and Permissions
  • Select the right combination of IAM permissions for your Dataflow job.
  • Determine your capacity needs by inspecting the relevant quotas for your Dataflow jobs.
Module 5: Security
  • Select your zonal data processing strategy using Dataflow, depending on your data locality needs.
  • Implement best practices for a secure data processing environment.
Module 6: Beam Concepts Review
  • Review the main Apache Beam concepts (Pipeline, PCollections, PTransforms, Runner, reading/writing data, utility PTransforms, side inputs), bundles, and the DoFn lifecycle, illustrated in the sketch below.
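
To make these concepts concrete, here is a minimal sketch in the Python SDK (an illustration added to this outline, not course material; the element values are made up): a Pipeline is executed by a Runner, Create produces a PCollection, and each subsequent step is a PTransform.

    import apache_beam as beam

    # Pipeline: the unit of work that a Runner (here the local DirectRunner) executes.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.Create(["dataflow", "beam", "dataflow"])  # a PCollection of strings
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))        # element-wise PTransform
            | "CountPerKey" >> beam.combiners.Count.PerKey()           # utility PTransform for aggregation
            | "Print" >> beam.Map(print)                               # stand-in for a real sink
        )
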
Module 7: Windows, Watermarks, Triggers
  • Implement logic to handle your late data.
  • Review different types of triggers.
  • Review core streaming concepts (unbounded PCollections, windows); a short sketch follows below.
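
As a hedged illustration of these concepts (an assumption-laden sketch, not lab code; the keys, values, and timestamps are invented), the Python snippet below applies fixed one-minute windows with an early/late trigger and two minutes of allowed lateness, so late data updates the per-key, per-window totals instead of being dropped.

    import apache_beam as beam
    from apache_beam.transforms.window import FixedWindows, TimestampedValue
    from apache_beam.transforms.trigger import (
        AccumulationMode, AfterCount, AfterProcessingTime, AfterWatermark)

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create([("user1", 1), ("user2", 1), ("user1", 1)])   # toy keyed events
            | beam.Map(lambda kv: TimestampedValue(kv, 10))             # assign event times
            | beam.WindowInto(
                FixedWindows(60),                                       # 60-second fixed windows
                trigger=AfterWatermark(
                    early=AfterProcessingTime(30),                      # early panes every 30s
                    late=AfterCount(1)),                                # one pane per late element
                allowed_lateness=120,                                   # keep state 2 min past the watermark
                accumulation_mode=AccumulationMode.ACCUMULATING)
            | beam.CombinePerKey(sum)                                   # per-key, per-window totals
            | beam.Map(print)
        )
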
Module 8: Sources and Sinks
  • Write the I/O of your choice for your Dataflow pipeline.
  • Tune your source/sink transformation for maximum performance.
  • Create custom sources and sinks using Splittable DoFn (SDF).
Module 9: Schemas
  • Introduce schemas, which give developers a way to express structured data in their Beam pipelines.
  • Use schemas to simplify your Beam code and improve the performance of your pipeline (see the sketch below).
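
A rough sketch of what schema-aware Python code can look like (illustrative only; the Purchase type and its field names are hypothetical): a NamedTuple declares the schema, and schema-aware transforms such as GroupBy then refer to fields by name.

    import typing

    import apache_beam as beam

    class Purchase(typing.NamedTuple):  # the schema: named, typed fields
        user_id: str
        amount: float

    beam.coders.registry.register_coder(Purchase, beam.coders.RowCoder)

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create([Purchase("alice", 9.99),
                           Purchase("bob", 5.00)]).with_output_types(Purchase)
            | beam.GroupBy("user_id")                                 # group by a schema field
                  .aggregate_field("amount", sum, "total_amount")     # aggregate another field by name
            | beam.Map(print)
        )
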
Module 10: State and Timers
  • Identify use cases for state and timer API implementations.
  • Select the right type of state and timers for your pipeline (sketched below).
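
One common pattern, sketched here under assumptions (the class name BufferThenFlush and the 60-second delay are hypothetical), is buffering values in per-key bag state and emitting an aggregate when an event-time timer fires.

    import apache_beam as beam
    from apache_beam.coders import VarIntCoder
    from apache_beam.transforms.timeutil import TimeDomain
    from apache_beam.transforms.userstate import BagStateSpec, TimerSpec, on_timer

    class BufferThenFlush(beam.DoFn):
        """Buffers values per key in state and flushes them when an event-time timer fires."""
        BUFFER = BagStateSpec("buffer", VarIntCoder())     # per-key bag of buffered values
        FLUSH = TimerSpec("flush", TimeDomain.WATERMARK)   # event-time (watermark) timer

        def process(self,
                    element,
                    timestamp=beam.DoFn.TimestampParam,
                    buffer=beam.DoFn.StateParam(BUFFER),
                    flush=beam.DoFn.TimerParam(FLUSH)):
            _, value = element            # expects keyed elements: (key, value)
            buffer.add(value)             # add to state instead of emitting immediately
            flush.set(timestamp + 60)     # fire 60 seconds after this element's timestamp

        @on_timer(FLUSH)
        def on_flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
            yield sum(buffer.read())      # emit one aggregated value per key
            buffer.clear()

Applied with beam.ParDo(BufferThenFlush()) to a keyed PCollection of (key, int) pairs, this trades a little latency for fewer, larger outputs downstream.
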
Module 11: Best Practices
  • Implement best practices for Dataflow pipelines.
Module 12: Dataflow SQL and DataFrames
  • Develop a Beam pipeline using SQL and DataFrames (a DataFrames example follows below).
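
The DataFrames half of this module can be sketched roughly as follows (an assumption-based example modelled on the Beam DataFrame API; the words and output path are made up): to_dataframe turns a schema'd PCollection into a deferred, Pandas-like DataFrame that executes as part of the pipeline.

    import apache_beam as beam
    from apache_beam.dataframe.convert import to_dataframe

    with beam.Pipeline() as pipeline:
        words = (
            pipeline
            | beam.Create(["beam", "dataflow", "beam"])
            | beam.Map(lambda word: beam.Row(word=word)))   # schema'd rows for the DataFrame API

        df = to_dataframe(words)            # deferred, Pandas-like DataFrame over the PCollection
        counts = df.groupby("word").size()  # familiar Pandas-style aggregation
        counts.to_csv("counts.csv")         # written out when the pipeline runs (illustrative path)
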
Module 13: Beam Notebooks
  • Prototype your pipeline in Python using Beam notebooks.
  • Use Beam magics to control the behavior of source recording in your notebook.
  • Launch a job to Dataflow from a notebook (see the sketch below).
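
In a notebook, prototyping typically looks something like the sketch below (assuming the interactive runner that backs Beam notebooks; the values are illustrative). The recorded pipeline can then be resubmitted with the DataflowRunner, which is the launch flow this module walks through.

    import apache_beam as beam
    import apache_beam.runners.interactive.interactive_beam as ib
    from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

    ib.options.recording_duration = '60s'          # bound how long unbounded sources are recorded

    pipeline = beam.Pipeline(InteractiveRunner())  # prototype against the interactive runner
    counts = (
        pipeline
        | beam.Create(["beam", "dataflow", "beam"])
        | beam.combiners.Count.PerElement())

    ib.show(counts)                                # materialize and display the PCollection inline
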
Module 14: Monitoring
  • Navigate the Dataflow Job Details UI.
  • Interpret Job Metrics charts to diagnose pipeline regressions.
  • Set alerts on Dataflow jobs using Cloud Monitoring.
Module 15: Logging and Error Reporting
  • Use the Dataflow logs and diagnostics widgets to troubleshoot pipeline issues.
Module 16: Troubleshooting and Debug
  • Use a structured approach to debug your Dataflow pipelines.
  • Examine common causes for pipeline failures.
Module 17: Performance
  • Understand performance considerations for pipelines.
  • Consider how the shape of your data can affect pipeline performance.
Module 18: Testing and CI/CD
  • Review testing approaches for your Dataflow pipeline.
  • Review frameworks and features available to streamline your CI/CD workflow for Dataflow pipelines.
Module 19: Reliability
  • Implement reliability best practices for your Dataflow pipelines.
Module 20: Flex Templates
  • Use Flex Templates to standardize and reuse Dataflow pipeline code.
Module 21: Summary
  • Recap the topics covered in the course.
Prerequisites & Audience

To get the most out of this course, participants should have completed the following courses:

  • Building Batch Data Pipelines
  • Building Resilient Streaming Analytics Systems
Course Benefits
  • Demonstrate how Apache Beam and Dataflow work together to fulfill your organization’s data processing needs.
  • Summarize the benefits of the Beam Portability Framework and enable it for your Dataflow pipelines.
  • Enable Shuffle and Streaming Engine, for batch and streaming pipelines respectively, for maximum performance.
  • Enable Flexible Resource Scheduling for more cost-efficient performance.
  • Select the right combination of IAM permissions for your Dataflow job.
  • Implement best practices for a secure data processing environment.
  • Select and tune the I/O of your choice for your Dataflow pipeline.
  • Use schemas to simplify your Beam code and improve the performance of your pipeline.
  • Develop a Beam pipeline using SQL and DataFrames.
  • Perform monitoring, troubleshooting, testing and CI/CD on Dataflow pipelines.

Google courses


Cloud Digital Leader
CODE: WGAC-GGL-CDL
Data Integration with Cloud Data Fusion
CODE: WGAC-GGL-DICDF
Preparing for Your Professional Cloud Network Engineer Journey
CODE: WGAC-GGL-PYPCNEJ
Deploying and Managing Windows Workloads on Google Cloud
CODE: WGAC-GGL-DMWWGC
Installing and Managing Google Cloud's Apigee API Platform for Private Cloud
CODE: WGAC-GGL-IMAPIPC
Customer Experiences with Contact Center AI - Dialogflow CX
CODE: WGAC-GGL-CCAIDCX
Customer Experiences with Contact Center AI - Dialogflow ES
CODE: WGAC-GGL-CCAIDES
Application Development with Cloud Run
CODE: WGAC-GGL-ADCR
Serverless Data Processing with Dataflow
CODE: WGAC-GGL-SDPF
Developing Data Models with LookML
CODE: WGAC-GGL-DDMLML
Analyzing and Visualizing Data with Looker
CODE: WGAC-GGL-AVDL
Machine Learning on Google Cloud
CODE: WGAC-GGL-MLGC
Developing APIs with Google Cloud's Apigee API platform
CODE: WGAC-GGL-T-APIENG-B
Managing Google Cloud's Apigee API Platform for Hybrid Cloud
CODE: WGAC-GGL-T-APIHYB-B
Logging, Monitoring, and Observability in Google Cloud
CODE: WGAC-GGL-LMOGC
Security in Google Cloud Platform
CODE: WGAC-GGL-SGCP-3D
Google Cloud Fundamentals for Azure Professionals
CODE: WGAC-GGL-GCPAZURE
Preparing for the Associate Cloud Engineer Examination
CODE: WGAC-GGL-PPACE
Architecting Hybrid Cloud Infrastructure with Anthos
CODE: WGAC-GGL-T-AHYBRID-I
Architecting with Google Kubernetes Engine
CODE: WGAC-GGL-AGKE
Architecting with Google Compute Engine
CODE: WGAC-GGL-AGCE
Preparing for the Professional Data Engineer Examination
CODE: WGAC-GGL-PPDEE
Networking in Google Cloud Platform
CODE: WGAC-GGL-NGCP
Preparing for the Professional Cloud Architect Examination
CODE: WGAC-GGL-PPCAE
Getting Started with Google Kubernetes Engine
CODE: WGAC-GGL-GCP-GSGKE
Google Cloud Platform Fundamentals for AWS Professionals
CODE: WGAC-GGL-GCP-FAP
Developing Applications with Google Cloud Platform
CODE: WGAC-GGL-DAGCP
From Data to Insights with Google Cloud Platform
CODE: WGAC-GGL-DIGCP
Data Engineering on Google Cloud Platform
CODE: WGAC-GGL-DEGCP
Google Cloud Fundamentals: Big Data and Machine Learning
CODE: WGAC-GGL-GCF-BDM
Architecting with Google Cloud Platform: Design and Process
CODE: WGAC-GGL-AGCP-DP
Google Cloud Fundamentals: Core Infrastructure
CODE: WGAC-GGL-GCF-CI