• Introduction
• What is data engineering?
• Important data engineering concepts
• Data engineering in Microsoft Azure
• Classify your data
• Determine operational needs
• Group multiple operations in a transaction
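Grouping multiple operations in a transaction, as the topic above covers, means they commit or roll back as one unit. A minimal sketch of that pattern using Python's built-in sqlite3 (the `accounts` table and the insufficient-funds rule are illustrative, not from the course):

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically: both UPDATEs
    commit together, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            # Simulated business rule: a balance may not go negative.
            row = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                               (src,)).fetchone()
            if row[0] < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # the rollback already undid both UPDATEs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
transfer(conn, "alice", "bob", 500)  # fails and rolls back: still 70 / 80
```

The failed second call leaves both rows untouched, which is the whole point of grouping the operations.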
• Choose a storage solution on Azure
• Create an Azure Storage account
• Create an Azure Storage account with the correct options for your business needs
• Decide how many storage accounts you need
• Choose your account settings
• Choose an account creation tool
• Create a storage account using the Azure portal
• Partitioning Tables
• Designing Data Distribution in Azure
• Designing for Scale in SQL DB
• Designing for Disaster Recovery and High Availability
• Design a Disaster Recovery Strategy
• Managing an Azure SQL Database
• Cosmos DB Essentials
• Implementing Consistency in Cosmos DB
• Partitioning and Horizontal Scaling in Cosmos DB
• Selecting and Implementing API in Cosmos DB
• Implementing Security in Cosmos DB
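The partitioning and horizontal-scaling topic above rests on one idea: Cosmos DB hashes each item's partition key to decide which physical partition stores it, so items sharing a key are co-located. A simplified stand-in for that routing (Cosmos uses its own internal hash and manages partitions itself, so this is illustrative only):

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative; Cosmos DB manages physical partitions itself

def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition via a stable hash, so every
    item with the same key always lands on the same partition."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Items sharing a partition key are co-located, which is what makes
# single-partition queries cheap and cross-partition queries expensive.
orders = [("customer-1", "order-a"), ("customer-2", "order-b"),
          ("customer-1", "order-c")]
placement = {order_id: partition_for(key) for key, order_id in orders}
```

This is also why choosing a high-cardinality partition key matters: a low-cardinality key funnels all traffic to a few partitions.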
• Intro to Using Azure Blob Storage
• Provisioning a Cosmos DB Instance in Azure
• Introduction to Blob storage
• Design a storage organization strategy
• Create Azure storage resources
• Configure and initialize the client library
• Blob uploads and downloads
• Explore Azure storage services
• Create an Azure storage account
• Interact with the Azure Storage APIs
• Connect to your Azure storage account
• Upload an image to your Azure Storage account
• Explore Azure Storage security features
• Understand storage account keys
• Understand shared access signatures
• Control network access to your storage account
• Understand Advanced Threat Protection for Azure Storage
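A shared access signature, covered above, is at its core an HMAC-SHA256 over a "string to sign" (permissions, expiry, resource) computed with the storage account key, so the service can verify a grant it never stored. A much-simplified sketch of that signing pattern — the field layout here is not the real Azure SAS format:

```python
import base64
import hashlib
import hmac

def sign_sas(account_key_b64: str, string_to_sign: str) -> str:
    """HMAC-SHA256 the string-to-sign with the (base64-encoded) account
    key, as SAS generation does; the grant format here is simplified."""
    key = base64.b64decode(account_key_b64)
    mac = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")

def verify_sas(account_key_b64: str, string_to_sign: str, signature: str) -> bool:
    """The service recomputes the HMAC and compares in constant time."""
    expected = sign_sas(account_key_b64, string_to_sign)
    return hmac.compare_digest(expected, signature)

account_key = base64.b64encode(b"demo-account-key-not-real").decode()
grant = "r\n2030-01-01T00:00:00Z\n/container/blob.txt"  # permissions, expiry, resource
token = sign_sas(account_key, grant)
```

Any tampering with the grant (say, upgrading read to read-write) invalidates the signature, which is why a SAS can be handed to clients without exposing the account key itself.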
• Explore Azure Data Lake Storage security features
• Key features and benefits of Azure Data Lake Storage Gen2
• Enable Azure Data Lake Storage Gen2 in an Azure Storage account
• Compare Azure Data Lake Storage Gen2 and Azure Blob storage
• Describe where Azure Data Lake Storage Gen2 fits in the stages of analytical processing
• Describe how Azure Data Lake Storage Gen2 is used in common analytical workloads
• Large-Scale Data Processing with Azure Data Lake Storage Gen2
• Understand Azure Data Factory
• Describe data integration patterns
• Explain the data factory process
• Understand Azure Data Factory components
• Azure Data Factory security
• Set up Azure Data Factory
• Create linked services
• Create datasets
• Create data factory activities and pipelines
• Manage integration runtimes
• Data integration with Azure Data Factory
• Code-free transformation at scale with Azure Data Factory
• Transform Data with Azure Data Factory
• Execute code-free transformations at scale with Azure Data Factory
• Create data pipeline to import poorly formatted CSV files
• Create Mapping Data Flows
• Code-free transformation at scale with Azure Data Factory
• Populate slowly changing dimensions in Azure Data Factory
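The slowly-changing-dimensions module above corresponds to the Type 2 pattern: rather than overwriting a changed dimension row, the pipeline expires the current version and inserts a new one, preserving history. The core merge logic, sketched in plain Python (column names are illustrative, not from the course):

```python
from datetime import date

def apply_scd2(dimension, updates, today):
    """Type 2 SCD: expire the current row for each changed business key
    and append a new current row, so history is never lost."""
    current = {row["key"]: row for row in dimension if row["is_current"]}
    for key, new_city in updates.items():
        old = current.get(key)
        if old is not None and old["city"] == new_city:
            continue  # attribute unchanged: nothing to do
        if old is not None:
            old["is_current"] = False   # close out the old version
            old["end_date"] = today
        dimension.append({"key": key, "city": new_city,
                          "start_date": today, "end_date": None,
                          "is_current": True})
    return dimension

dim = [{"key": "C1", "city": "Seattle",
        "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
apply_scd2(dim, {"C1": "Portland"}, date(2024, 6, 1))
```

In Mapping Data Flows the same decision is expressed declaratively with a lookup against the existing dimension plus an alter-row transformation; the effect on the table is the same.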
• Describe SQL Server integration services
• Understand the Azure SSIS Integration Runtime
• Set up the Azure SSIS Integration Runtime
• Migrate SSIS packages to Azure Data Factory
• Introduction to Azure Data Lake storage
• Describe Delta Lake architecture
• Explore compute and storage options for data engineering workloads
• Combine streaming and batch processing with a single pipeline
• Organize the data lake into levels of file transformation
• Index data lake storage for query and workload acceleration
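Organizing the lake "into levels of file transformation", as above, is commonly done with zone folders plus date partitions in the path. One such layout convention, sketched below (the zone names raw/cleansed/curated are a common choice, not prescribed by the course):

```python
from datetime import date
from pathlib import PurePosixPath

# Levels of transformation, least to most refined.
ZONES = ("raw", "cleansed", "curated")

def lake_path(zone: str, source: str, dataset: str, day: date) -> str:
    """Build a hierarchical path of the form zone/source/dataset/yyyy/mm/dd,
    so downstream queries can prune whole date folders."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return str(PurePosixPath(zone) / source / dataset /
               f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}")

p = lake_path("raw", "sales-db", "orders", date(2024, 6, 1))
```

A date-partitioned path like this is what lets a serverless SQL pool or Spark job read only `2024/06/*` instead of scanning the whole dataset.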
• Design a Modern Data Warehouse using Azure Synapse Analytics
• Secure a data warehouse in Azure Synapse Analytics
• Managing files in an Azure data lake
• Securing files stored in an Azure data lake
• Explore Azure Synapse serverless SQL pools capabilities
• Query data in the lake using Azure Synapse serverless SQL pools
• Create metadata objects in Azure Synapse serverless SQL pools
• Secure data and manage users in Azure Synapse serverless SQL pools
• Run interactive queries using serverless SQL pools
• Query Parquet data with serverless SQL pools
• Create external tables for Parquet and CSV files
• Create views with serverless SQL pools
• Secure access to data in a data lake when using serverless SQL pools
• Configure data lake security using Role-Based Access Control (RBAC) and Access Control Lists (ACLs)
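Data lake security as configured above layers coarse RBAC (role assignments at a scope) over fine-grained, POSIX-style ACLs on folders and files. A toy model of evaluating the two together (the role name matches Azure's built-in data-plane role; the users, paths, and short-circuit rule are simplified for illustration):

```python
# Coarse RBAC: data-plane role assignments at account scope.
role_assignments = {"alice": {"Storage Blob Data Reader"}, "bob": set()}

# Fine-grained, POSIX-style ACLs: which users may read which folder.
acls = {"/curated/sales": {"bob"}}

def can_read(user: str, path: str) -> bool:
    """Grant access if an RBAC data-plane role permits it (checked
    first), otherwise fall back to walking the folder ACLs."""
    if "Storage Blob Data Reader" in role_assignments.get(user, set()):
        return True
    # Walk up the folder hierarchy looking for an ACL entry.
    parts = path.strip("/").split("/")
    for i in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:i])
        if user in acls.get(prefix, set()):
            return True
    return False
```

The practical consequence mirrors ADLS Gen2: a broad RBAC role bypasses the ACLs entirely, so least-privilege setups grant narrow ACLs rather than account-wide roles.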
• Understand big data engineering with Apache Spark in Azure Synapse Analytics
• Ingest data with Apache Spark notebooks in Azure Synapse Analytics
• Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
• Integrate SQL and Apache Spark pools in Azure Synapse Analytics
• Explore, transform, and load data into the Data Warehouse using Apache Spark
• Perform Data Exploration in Synapse Studio
• Ingest data with Spark notebooks in Azure Synapse Analytics
• Transform data with DataFrames in Spark pools in Azure Synapse Analytics
• Integrate SQL and Spark pools in Azure Synapse Analytics
• Get started with Azure Databricks
• Identify Azure Databricks workloads
• Understand key concepts
• Explore Azure Databricks
• Create a Spark cluster
• Use Spark in notebooks
• Use Spark to work with data files
• Introduction
• Get Started with Delta Lake
• Create Delta Lake tables
• Create and query catalog tables
• Use Delta Lake for streaming data
• Use Delta Lake in Azure Databricks
• Get started with SQL Warehouses
• Create databases and tables
• Create queries and dashboards
• Use a SQL Warehouse in Azure Databricks
• Run Azure Databricks Notebooks with Azure Data Factory
• Understand Azure Databricks notebooks and pipelines
• Create a linked service for Azure Databricks
• Use a Notebook activity in a pipeline
• Use parameters in a notebook
• Run an Azure Databricks Notebook with Azure Data Factory
• Describe Azure Databricks
• Read and write data in Azure Databricks
• Work with DataFrames in Azure Databricks
• Work with DataFrames advanced methods in Azure Databricks
• Data Exploration and Transformation in Azure Databricks
• Use DataFrames in Azure Databricks to explore and filter data
• Cache a DataFrame for faster subsequent queries
• Remove duplicate data
• Manipulate date/time values
• Remove and rename DataFrame columns
• Aggregate data stored in a DataFrame
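The DataFrame operations listed above map to `dropDuplicates`, date functions, and `groupBy`/`agg` in Spark. Their underlying logic, shown here in plain Python rather than PySpark so it runs anywhere (the row shape is illustrative):

```python
from collections import defaultdict
from datetime import datetime

rows = [
    {"id": 1, "ts": "2024-06-01T10:00:00", "amount": 40},
    {"id": 1, "ts": "2024-06-01T10:00:00", "amount": 40},  # exact duplicate
    {"id": 2, "ts": "2024-06-02T12:30:00", "amount": 25},
]

# Remove exact duplicates (like DataFrame.dropDuplicates()).
seen, deduped = set(), []
for r in rows:
    key = (r["id"], r["ts"], r["amount"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Derive a date column from the timestamp string (like to_date on a column).
for r in deduped:
    r["day"] = datetime.fromisoformat(r["ts"]).date()

# Aggregate per day (like groupBy("day").agg(sum("amount"))).
totals = defaultdict(int)
for r in deduped:
    totals[r["day"]] += r["amount"]
```

In a Spark pool the same steps run partition-by-partition across the cluster; the per-row logic is identical.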
• Design hybrid transactional and analytical processing using Azure Synapse Analytics
• Configure Azure Synapse Link with Azure Cosmos DB
• Query Azure Cosmos DB with Apache Spark pools
• Query Azure Cosmos DB with serverless SQL pools
• Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
• Configure Azure Synapse Link with Azure Cosmos DB
• Query Azure Cosmos DB with Apache Spark for Synapse Analytics
• Query Azure Cosmos DB with serverless SQL pool for Azure Synapse Analytics
• Enable reliable messaging for Big Data applications using Azure Event Hubs
• Work with data streams by using Azure Stream Analytics
• Ingest data streams with Azure Stream Analytics
• Real-time Stream Processing with Stream Analytics
• Use Stream Analytics to process real-time data from Event Hubs
• Use Stream Analytics windowing functions to build aggregates and output to Synapse Analytics
• Scale the Azure Stream Analytics job to increase throughput through partitioning
• Repartition the stream input to optimize parallelization
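The windowing functions used above (tumbling, hopping, sliding, session) all bucket an unbounded stream into finite groups that can be aggregated. The simplest case, a tumbling window, sketched in plain Python (the event shape and counts are illustrative):

```python
from collections import defaultdict

def tumbling_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window: the rough
    equivalent of COUNT(*) ... GROUP BY TumblingWindow(second, N)
    in the Stream Analytics query language."""
    counts = defaultdict(int)
    for ts, _payload in events:
        # Integer division snaps each timestamp to its window start.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [(0, "a"), (3, "b"), (9, "c"), (10, "d"), (14, "e"), (21, "f")]
per_window = tumbling_counts(events, 10)  # windows [0,10), [10,20), [20,30)
```

A hopping window differs only in that windows overlap, so one event can contribute to several window starts.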
• Process streaming data with Azure Databricks structured streaming
• Create a Stream Processing Solution with Event Hubs and Azure Databricks
• Explore key features and uses of Structured Streaming
• Stream data from a file and write it out to a distributed file system
• Use sliding windows to aggregate over chunks of data rather than all data
• Apply watermarking to remove stale data
• Connect to Event Hubs to read and write streams
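Watermarking, listed above, bounds how late an event may arrive and still be counted: anything older than the latest event time seen so far, minus the watermark delay, is treated as stale. The core rule, as a minimal sketch (Structured Streaming's actual state management is richer than this):

```python
def filter_with_watermark(events, delay):
    """Keep an event only if its timestamp is within `delay` of the
    maximum event time seen so far; anything older is dropped as stale."""
    max_seen = float("-inf")
    kept, dropped = [], []
    for ts, payload in events:
        max_seen = max(max_seen, ts)
        if ts >= max_seen - delay:
            kept.append((ts, payload))
        else:
            dropped.append((ts, payload))
    return kept, dropped

# An event arriving 15s behind the stream, against a 10s watermark,
# is discarded rather than reopening an old window.
stream = [(100, "a"), (105, "b"), (90, "late"), (106, "c")]
kept, dropped = filter_with_watermark(stream, 10)
```

This is also what lets the engine garbage-collect old window state: once the watermark passes a window's end, that window can never receive another event.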