• Introduction
• What is data engineering?
• Important data engineering concepts
• Data engineering in Microsoft Azure
• Classify your data
• Determine operational needs
• Group multiple operations in a transaction
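Grouping multiple operations in a transaction, as the topic above covers, means they commit or roll back as one unit. A minimal sketch of that pattern using Python's built-in sqlite3 (the `accounts` table and the insufficient-funds rule are illustrative, not from the course):

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically: both UPDATEs
    commit together, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            # Simulated business rule: a balance may not go negative.
            row = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                               (src,)).fetchone()
            if row[0] < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # the rollback already undid both UPDATEs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
transfer(conn, "alice", "bob", 500)  # fails and rolls back: still 70 / 80
```

The failed second call leaves both rows untouched, which is the whole point of grouping the operations.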
• Choose a storage solution on Azure
• Create an Azure Storage account
• Create an Azure Storage account with the correct options for your business needs
• Decide how many storage accounts you need
• Choose your account settings
• Choose an account creation tool
• Create a storage account using the Azure portal
• Partitioning Tables
• Designing Data Distribution in Azure
• Designing for Scale in SQL DB
• Designing for Disaster Recovery and High Availability
• Design a Disaster Recovery Strategy
• Managing an Azure SQL Database
• Cosmos DB Essentials
• Implementing Consistency in Cosmos DB
• Partitioning and Horizontal Scaling in Cosmos DB
• Selecting and Implementing API in Cosmos DB
• Implementing Security in Cosmos DB
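The partitioning and horizontal-scaling topic above rests on one idea: Cosmos DB hashes each item's partition key to decide which physical partition stores it, so items sharing a key are co-located. A simplified stand-in for that routing (Cosmos uses its own internal hash and manages partitions itself, so this is illustrative only):

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative; Cosmos DB manages physical partitions itself

def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition via a stable hash, so every
    item with the same key always lands on the same partition."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Items sharing a partition key are co-located, which is what makes
# single-partition queries cheap and cross-partition queries expensive.
orders = [("customer-1", "order-a"), ("customer-2", "order-b"),
          ("customer-1", "order-c")]
placement = {order_id: partition_for(key) for key, order_id in orders}
```

This is also why choosing a high-cardinality partition key matters: a low-cardinality key funnels all traffic to a few partitions.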
• Intro to Using Azure Blob Storage
• Provisioning a Cosmos DB Instance in Azure
• Introduction to Blob storage
• Design a storage organization strategy
• Create Azure storage resources
• Configure and initialize the client library
• Blob uploads and downloads
• Explore Azure storage services
• Create an Azure storage account
• Interact with the Azure Storage APIs
• Connect to your Azure storage account
• Upload an image to your Azure Storage account
• Explore Azure Storage security features
• Understand storage account keys
• Understand shared access signatures
• Control network access to your storage account
• Understand Advanced Threat Protection for Azure Storage
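A shared access signature, covered above, is at its core an HMAC-SHA256 over a "string to sign" (permissions, expiry, resource) computed with the storage account key, so the service can verify a grant it never stored. A much-simplified sketch of that signing pattern — the field layout here is not the real Azure SAS format:

```python
import base64
import hashlib
import hmac

def sign_sas(account_key_b64: str, string_to_sign: str) -> str:
    """HMAC-SHA256 the string-to-sign with the (base64-encoded) account
    key, as SAS generation does; the grant format here is simplified."""
    key = base64.b64decode(account_key_b64)
    mac = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")

def verify_sas(account_key_b64: str, string_to_sign: str, signature: str) -> bool:
    """The service recomputes the HMAC and compares in constant time."""
    expected = sign_sas(account_key_b64, string_to_sign)
    return hmac.compare_digest(expected, signature)

account_key = base64.b64encode(b"demo-account-key-not-real").decode()
grant = "r\n2030-01-01T00:00:00Z\n/container/blob.txt"  # permissions, expiry, resource
token = sign_sas(account_key, grant)
```

Any tampering with the grant (say, upgrading read to read-write) invalidates the signature, which is why a SAS can be handed to clients without exposing the account key itself.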
• Explore Azure Data Lake Storage security features
• Key features and benefits of Azure Data Lake Storage Gen2
• Enable Azure Data Lake Storage Gen2 in an Azure Storage account
• Compare Azure Data Lake Storage Gen2 and Azure Blob storage
• Describe where Azure Data Lake Storage Gen2 fits in the stages of analytical processing
• Describe how Azure Data Lake Storage Gen2 is used in common analytical workloads
• Large-Scale Data Processing with Azure Data Lake Storage Gen2
• Understand Azure Data Factory
• Describe data integration patterns
• Explain the data factory process
• Understand Azure Data Factory components
• Azure Data Factory security
• Set up Azure Data Factory
• Create linked services
• Create datasets
• Create data factory activities and pipelines
• Manage integration runtimes
• Data integration with Azure Data Factory
• Code-free transformation at scale with Azure Data Factory
• Transform Data with Azure Data Factory
• Execute code-free transformations at scale with Azure Data Factory
• Create data pipeline to import poorly formatted CSV files
• Create Mapping Data Flows
• Code-free transformation at scale with Azure Data Factory
• Populate slowly changing dimensions in Azure Data Factory
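The slowly-changing-dimensions module above corresponds to the Type 2 pattern: rather than overwriting a changed dimension row, the pipeline expires the current version and inserts a new one, preserving history. The core merge logic, sketched in plain Python (column names are illustrative, not from the course):

```python
from datetime import date

def apply_scd2(dimension, updates, today):
    """Type 2 SCD: expire the current row for each changed business key
    and append a new current row, so history is never lost."""
    current = {row["key"]: row for row in dimension if row["is_current"]}
    for key, new_city in updates.items():
        old = current.get(key)
        if old is not None and old["city"] == new_city:
            continue  # attribute unchanged: nothing to do
        if old is not None:
            old["is_current"] = False   # close out the old version
            old["end_date"] = today
        dimension.append({"key": key, "city": new_city,
                          "start_date": today, "end_date": None,
                          "is_current": True})
    return dimension

dim = [{"key": "C1", "city": "Seattle",
        "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
apply_scd2(dim, {"C1": "Portland"}, date(2024, 6, 1))
```

In Mapping Data Flows the same decision is expressed declaratively with a lookup against the existing dimension plus an alter-row transformation; the effect on the table is the same.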
• Describe SQL Server integration services
• Understand the Azure SSIS Integration Runtime
• Set up the Azure SSIS Integration Runtime
• Migrate SSIS packages to Azure Data Factory
• Introduction to Azure Data Lake storage
• Describe Delta Lake architecture
• Explore compute and storage options for data engineering workloads
• Combine streaming and batch processing with a single pipeline
• Organize the data lake into levels of file transformation
• Index data lake storage for query and workload acceleration
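Organizing the lake "into levels of file transformation", as above, is commonly done with zone folders plus date partitions in the path. One such layout convention, sketched below (the zone names raw/cleansed/curated are a common choice, not prescribed by the course):

```python
from datetime import date
from pathlib import PurePosixPath

# Levels of transformation, least to most refined.
ZONES = ("raw", "cleansed", "curated")

def lake_path(zone: str, source: str, dataset: str, day: date) -> str:
    """Build a hierarchical path of the form zone/source/dataset/yyyy/mm/dd,
    so downstream queries can prune whole date folders."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return str(PurePosixPath(zone) / source / dataset /
               f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}")

p = lake_path("raw", "sales-db", "orders", date(2024, 6, 1))
```

A date-partitioned path like this is what lets a serverless SQL pool or Spark job read only `2024/06/*` instead of scanning the whole dataset.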
• Design a Modern Data Warehouse using Azure Synapse Analytics
• Secure a data warehouse in Azure Synapse Analytics
• Managing files in an Azure data lake
• Securing files stored in an Azure data lake
• Explore Azure Synapse serverless SQL pools capabilities
• Query data in the lake using Azure Synapse serverless SQL pools
• Create metadata objects in Azure Synapse serverless SQL pools
• Secure data and manage users in Azure Synapse serverless SQL pools
• Run interactive queries using serverless SQL pools
• Query Parquet data with serverless SQL pools
• Create external tables for Parquet and CSV files
• Create views with serverless SQL pools
• Secure access to data in a data lake when using serverless SQL pools
• Configure data lake security using Role-Based Access Control (RBAC) and Access Control Lists (ACLs)
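Data lake security as configured above layers coarse RBAC (role assignments at a scope) over fine-grained, POSIX-style ACLs on folders and files. A toy model of evaluating the two together (the role name matches Azure's built-in data-plane role; the users, paths, and short-circuit rule are simplified for illustration):

```python
# Coarse RBAC: data-plane role assignments at account scope.
role_assignments = {"alice": {"Storage Blob Data Reader"}, "bob": set()}

# Fine-grained, POSIX-style ACLs: which users may read which folder.
acls = {"/curated/sales": {"bob"}}

def can_read(user: str, path: str) -> bool:
    """Grant access if an RBAC data-plane role permits it (checked
    first), otherwise fall back to walking the folder ACLs."""
    if "Storage Blob Data Reader" in role_assignments.get(user, set()):
        return True
    # Walk up the folder hierarchy looking for an ACL entry.
    parts = path.strip("/").split("/")
    for i in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:i])
        if user in acls.get(prefix, set()):
            return True
    return False
```

The practical consequence mirrors ADLS Gen2: a broad RBAC role bypasses the ACLs entirely, so least-privilege setups grant narrow ACLs rather than account-wide roles.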
• Understand big data engineering with Apache Spark in Azure Synapse Analytics
• Ingest data with Apache Spark notebooks in Azure Synapse Analytics
• Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
• Integrate SQL and Apache Spark pools in Azure Synapse Analytics
• Explore, transform, and load data into the Data Warehouse using Apache Spark
• Perform Data Exploration in Synapse Studio
• Ingest data with Spark notebooks in Azure Synapse Analytics
• Transform data with DataFrames in Spark pools in Azure Synapse Analytics
• Integrate SQL and Spark pools in Azure Synapse Analytics
• Get started with Azure Databricks
• Identify Azure Databricks workloads
• Understand key concepts
• Explore Azure Databricks
• Create a Spark cluster
• Use Spark in notebooks
• Use Spark to work with data files
• Introduction
• Get Started with Delta Lake
• Create Delta Lake tables
• Create and query catalog tables
• Use Delta Lake for streaming data
• Use Delta Lake in Azure Databricks
• Get started with SQL Warehouses
• Create databases and tables
• Create queries and dashboards
• Use a SQL Warehouse in Azure Databricks
• Run Azure Databricks Notebooks with Azure Data Factory
• Understand Azure Databricks notebooks and pipelines
• Create a linked service for Azure Databricks
• Use a Notebook activity in a pipeline
• Use parameters in a notebook
• Run an Azure Databricks Notebook with Azure Data Factory
• Describe Azure Databricks
• Read and write data in Azure Databricks
• Work with DataFrames in Azure Databricks
• Work with DataFrames advanced methods in Azure Databricks
• Data Exploration and Transformation in Azure Databricks
• Use DataFrames in Azure Databricks to explore and filter data
• Cache a DataFrame for faster subsequent queries
• Remove duplicate data
• Manipulate date/time values
• Remove and rename DataFrame columns
• Aggregate data stored in a DataFrame
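The DataFrame operations listed above map to `dropDuplicates`, date functions, and `groupBy`/`agg` in Spark. Their underlying logic, shown here in plain Python rather than PySpark so it runs anywhere (the row shape is illustrative):

```python
from collections import defaultdict
from datetime import datetime

rows = [
    {"id": 1, "ts": "2024-06-01T10:00:00", "amount": 40},
    {"id": 1, "ts": "2024-06-01T10:00:00", "amount": 40},  # exact duplicate
    {"id": 2, "ts": "2024-06-02T12:30:00", "amount": 25},
]

# Remove exact duplicates (like DataFrame.dropDuplicates()).
seen, deduped = set(), []
for r in rows:
    key = (r["id"], r["ts"], r["amount"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Derive a date column from the timestamp string (like to_date on a column).
for r in deduped:
    r["day"] = datetime.fromisoformat(r["ts"]).date()

# Aggregate per day (like groupBy("day").agg(sum("amount"))).
totals = defaultdict(int)
for r in deduped:
    totals[r["day"]] += r["amount"]
```

In a Spark pool the same steps run partition-by-partition across the cluster; the per-row logic is identical.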
• Design hybrid transactional and analytical processing using Azure Synapse Analytics
• Configure Azure Synapse Link with Azure Cosmos DB
• Query Azure Cosmos DB with Apache Spark pools
• Query Azure Cosmos DB with serverless SQL pools
• Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
• Configure Azure Synapse Link with Azure Cosmos DB
• Query Azure Cosmos DB with Apache Spark for Synapse Analytics
• Query Azure Cosmos DB with serverless SQL pool for Azure Synapse Analytics
• Enable reliable messaging for Big Data applications using Azure Event Hubs
• Work with data streams by using Azure Stream Analytics
• Ingest data streams with Azure Stream Analytics
• Real-time Stream Processing with Stream Analytics
• Use Stream Analytics to process real-time data from Event Hubs
• Use Stream Analytics windowing functions to build aggregates and output to Synapse Analytics
• Scale the Azure Stream Analytics job to increase throughput through partitioning
• Repartition the stream input to optimize parallelization
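The windowing functions used above (tumbling, hopping, sliding, session) all bucket an unbounded stream into finite groups that can be aggregated. The simplest case, a tumbling window, sketched in plain Python (the event shape and counts are illustrative):

```python
from collections import defaultdict

def tumbling_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window: the rough
    equivalent of COUNT(*) ... GROUP BY TumblingWindow(second, N)
    in the Stream Analytics query language."""
    counts = defaultdict(int)
    for ts, _payload in events:
        # Integer division snaps each timestamp to its window start.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [(0, "a"), (3, "b"), (9, "c"), (10, "d"), (14, "e"), (21, "f")]
per_window = tumbling_counts(events, 10)  # windows [0,10), [10,20), [20,30)
```

A hopping window differs only in that windows overlap, so one event can contribute to several window starts.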
• Process streaming data with Azure Databricks structured streaming
• Create a Stream Processing Solution with Event Hubs and Azure Databricks
• Explore key features and uses of Structured Streaming
• Stream data from a file and write it out to a distributed file system
• Use sliding windows to aggregate over chunks of data rather than all data
• Apply watermarking to remove stale data
• Connect to Event Hubs to read and write streams
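Watermarking, listed above, bounds how late an event may arrive and still be counted: anything older than the latest event time seen so far, minus the watermark delay, is treated as stale. The core rule, as a minimal sketch (Structured Streaming's actual state management is richer than this):

```python
def filter_with_watermark(events, delay):
    """Keep an event only if its timestamp is within `delay` of the
    maximum event time seen so far; anything older is dropped as stale."""
    max_seen = float("-inf")
    kept, dropped = [], []
    for ts, payload in events:
        max_seen = max(max_seen, ts)
        if ts >= max_seen - delay:
            kept.append((ts, payload))
        else:
            dropped.append((ts, payload))
    return kept, dropped

# An event arriving 15s behind the stream, against a 10s watermark,
# is discarded rather than reopening an old window.
stream = [(100, "a"), (105, "b"), (90, "late"), (106, "c")]
kept, dropped = filter_with_watermark(stream, 10)
```

This is also what lets the engine garbage-collect old window state: once the watermark passes a window's end, that window can never receive another event.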