Chapter 1: Cloud Basics, Azure SQL DB
- Cloud Introduction and Azure Basics
- Azure Implementation: IaaS, PaaS, SaaS
- Benefits of Azure Cloud Environment
- Azure Data Engineer: Job Roles
- Azure Storage Components
- Azure ETL & Streaming Components
- Need for Azure Data Factory (ADF)
- Need for Azure Synapse Analytics
- Azure Resources and Resource Types
- Resource Groups in Azure Portal
- Azure SQL Server [Logical Server]
- Firewall Rules and Azure Services
- Connections with SSMS & ADS Tools
- Working with Azure Portal
- Resource Group Navigations, Options
|
Chapter 1: Azure Storage & Containers
- Storage Components in Microsoft Azure
- Azure Storage Services and Types - Uses
- High Availability, Durability & Scalability
- Blob: Binary Large Object Storage
- General Purpose Accounts: v1 & v2 Versions
- Blobs, File Share, Queues and Tables
- Data Lake Gen 2 Operations with Azure
- Azure Storage Account Creation
- Azure Storage Container: Usage
- Azure Data Explorer: Operations
- File Uploads, Edits and Access URLs
- Azure Storage Explorer Tool Usage
- Azure Account Options in Explorer
- Directory Creation, File Operations
- End User Access Options With Files
- Data Explorer Vs Storage Explorer Tool
|
Chapter 1: Azure Intro, Azure Databricks
- Azure Databricks : Purpose & Config
- Need for Azure Databricks (ADB)
- Azure Databricks Service Creation
- Azure Databricks Workspace & Usage
- Spark Cluster Configurations & Capacity
- Driver Nodes and Worker Nodes in Spark
- Master Node & Cluster Creation Process
- Cluster Types and Capacity Options
- Standard, High Concurrency Clusters
- Databricks Runtime Service & DBUs
- Databricks File System (DBFS) and Usage
- Azure Databricks Workspace Operations
- ETL and Data Storage Components
- Spark Concepts and Spark SQL
- Spark Context and Spark Session
- DataFrame, Dataset and Real-time Use
|
Chapter 2: Synapse SQL Pools (DWH)
- Dedicated SQL Pools in Azure
- Enterprise Data Warehouse with Synapse
- DWU: Data Warehouse Units, Resources
- Massively Parallel Processing (MPP)
- Control Nodes and Compute Nodes
- SQL Pool Access from SSMS Tool
- T-SQL Queries @ SQL Pools
- Start/Resume/Pause, Scaling Options
- Creating Tables in Azure SQL Pool
- Compression, MAXDOP & Indexes
- Distributions: Round Robin, Hash
- Distributions: Replicate and Usage
- Data Imports with COPY Table
- Dynamic Management Views (DMVs) with PDW
- Data Loads Monitoring, Resource Class
|
Chapter 2: Azure Migration, BLOB Imports
- SQL Server (On-Premise) to Azure Migration
- Source Database Scripts & Validations
- BACPAC File Generation From SSMS Tool
- Azure Data Lake Storage and SSMS Access
- Azure Storage Container, BACPAC Files
- Azure SQL Server Creation From Portal
- Azure SQL DB Imports, Storage SAS Keys
- Azure SQL Database Migrations, Verification
- BLOB Data Access from On-Premise
- Data Imports From Excel and CSV Files
- BLOB Data Imports using T-SQL Queries
- SAS - Shared Access Signature Generation
- CSV File - Uploads, Downloads, Edits, Keys
- Master Keys, Credentials, External Sources
- BULK INSERT Statement and Data Imports
- T-SQL Imports : Practical Limitations
|
Chapter 2: SQL Notebooks & Python
- Notebooks: Concept, Usage Options
- Creating SQL Notebooks in Databricks
- Using DBFS Tables in SQL Notebooks
- Data Access and Analytics Options
- SparkSQL Queries: SELECT, GROUP BY
- SparkSQL Queries: Aggregates, Conditions
- Notebook Operations: Download, Clone
- Notebook Operations: Upload, Reuse
- SQL Notebooks with Python Code
- Using DBFS Sample Data Sources (CSV)
- Dataframes: Creation and Real-time Use
- Pandas Dataframe, Virtual Table Creation
- Dataframe Data Access, Caching Options
- take() and display() Functions in PySpark
- Temporary View Creation and Access
- SparkSQL Queries, Analytics, Chart Reports
|
Chapter 3: Azure Data Factory Concepts
- Azure Data Factory (ADF) Concepts
- Hybrid Data Integration at Scale
- ADF Pipeline Components & Usage
- Configure ADF Resource in Azure
- Understanding ADF Portal and IR
- Linked Services and Connections
- Datasets and Tables / Files for ETL
- ADF Pipelines: Design, Publish & Trigger
- ADF Pipeline with Copy Data Tool
- Creating Azure Storage Account
- Storage Container, BLOB File Uploads
- Data Loads with Azure BLOB Files
- DIU Allocations and Concurrency
- Creating Linked Services, Datasets
- Pipeline Trigger, Author and Monitor
|
Chapter 3: Azure Tables, Shares
- Azure Tables - Real-time Usage
- Schema-less Design and Access Options
- Structured and Relational Data Storage
- Tables, Entities and Properties Concepts
- Azure Tables: Creation and Data Inserts
- Azure Tables in Portal - GUI and Data Types
- Azure Tables: Data Imports in Explorer
- Data Edits, Queries & Delete Operations
- Azure Files - SMB Protocol, Creation, Usage
- Shared Access, Fully Managed & Resiliency
- Performance, Size Requirements for Shares
- Azure Storage Explorer Tool for File Shares
- Azure Queues: Message Queues, Limitations
- Adding Messages, Queuing and De-Queuing
- Data Access & Clear Queue from Explorer
- End Points for Azure Message Queues
|
Chapter 3: Python Notebooks
- Azure SQL Server Configurations
- Azure SQL Database Creation
- Azure Firewall Rules and IP Address
- Allow Azure Services, Remote Access
- Connection Tests with SSMS Tool
- Python Notebooks with Azure Databricks
- Data Imports and Table Creations (Code)
- Parquet Files and Usage in Databricks
- Using Dataframes for Data Operations
- SparkSQL Queries with SELECT, TOP
- Establishing Connections to Azure SQL DB
- JDBC Connection Strings, DataframeWriter
- JDBC Properties, Port Settings & Options
- Data Extraction, SQLContext & Dataframes
- Pandas Data Frame for Big Data Analytics
- JDBC URL Options & PySparkSQL Modules
|
Chapter 4: ADF Pipelines, Polybase
- Copy Data Tool For ETL Operations
- Azure SQL DB to Synapse Data Loads
- Working with Multi Tables Data Loads
- Query Options for Source Datasets
- Transformations with Copy Data Tool
- Rename, Rearrange & Remove Options
- Pipeline Execution: DIU & DOCP
- ADF Pipeline Monitoring Options
- ADF Pipelines: Execution Settings
- ADF Logging Options, Consistency Check
- Compression Option, DOP and DOCP
- ETL Staging Advantages & Performance
- Staging with Storage Account, Container
- ADF Pipeline Triggers and Monitoring
- Polybase For Azure Synapse, Advantages
|
Chapter 4: Azure Storage Security, Admin
- Azure Data Lake Storage Security Options
- Shared Access Keys - Primary, Secondary Keys
- SAS Key Generation: Container, Tables, Files
- SAS Key Permissions, Validation Options
- Access Keys: Account Level Permissions
- Azure Active Directory (AAD): Users, Groups
- Azure AD Security: RBAC with IAM, ACLs
- Owner Role, Contributor and Reader Role
- ACL : Access Control Lists & Security
- Azure BLOB Storage Containers & ACLs
- Folder Level and File Level Security
- ACL Permissions: Read, Write & Execute
- Access Policy: Creation and Realtime Use
- Permissions: rwacdl; Azure Principals, CORS
- Comparing IAM and ACLs in Data Lake Store
|
Chapter 4: Open Data Sources, DeltaLakes
- Creating Python Notebooks with Databricks
- Spark Dataframes with Azure OpenDatasets
- Windows Azure Storage Blob [wasb] Sources
- Creating Dataframes & Temporary Views
- Using Print and Display Functions with ADB
- Big Data Analysis with BLOB Data & Charts
- Keys, Values, Aggregations, Display Type
- Databricks Notebooks, Jobs and Stages
- Azure DeltaLake Implementation
- ACID Properties and Upsert Advantages
- Delta Engine Optimizations & Uses
- Pipeline Creation with JSON Files in DBFS
- Delta Tables Creation, Data Loads
- Spark Cluster Settings: Auto Optimize
- Auto Compact and Delta Table Optimize
- Delta Locations; Data Retrieval, Versions
|
Chapter 5: OnPremise Data with ADF
- On-Premise Data Sources with Azure
- Self Hosted Integration Runtime (IR)
- Access Keys, Remote Linked Services
- Synapse SQL Pool (DW) with OnPremise
- Staged Data Copy and Performance
- Pipeline Executions and Monitoring
- Pipeline RunIDs and Audits / Tracing
- Incompatible Rows Skips, Fault Tolerance
- Incremental Loads with Files (BLOB)
- Pipeline Executions and Schedules
- Regular Schedules and Tumbling Window
- Execution Retry and Delay Options
- Binary Copy, Last Modified Date in Blob
- Automated Loops and Trigger Schedules
- Incremental Loads Verification Tests
|
Chapter 5: Azure Monitoring, Power BI
- Azure Monitor, Metrics & Logs
- Monitoring Azure Storage Namespaces
- Add KQL Metrics; Account, Blob and File
- Total Ingress and Egress Metrics: Charts
- Average Latency, Transaction Count
- Request Breakdowns, Signal Logic Options
- Azure Alerts and Conditions, Notifications
- Signal Logic Conditions and Emails
- Power BI Desktop Tool Installation
- Binary Data and Record Data Access
- Azure Data Lake Storage: Access Keys
- Azure Data Lake Storage with Power BI
- BLOB File Access with Power BI
- Azure Tables Creation and File Imports
- Azure Table Access with Power BI
|
Chapter 5: Databricks Security & Jobs
- Azure Databricks Security Operations
- Azure Active Directory (Azure AD)
- AD Users and RBAC with IAM
- Owner, Contributor & Reader Roles
- Workspace Admin Permissions
- Notebook Permissions and Share Options
- Shared Notebooks, User Access Options
- Notebook Operations: Clone & Export
- Databricks Jobs: Creation Options, Usage
- Job Limits, Workspace, Concurrency Limits
- Notebooks with and without Parameters
- Jobs with Default Parameters, Executions
- Interactive, Automated Clusters for Jobs
- Job Schedules and Manual Executions
- Active Jobs, Recently Run Jobs, Monitoring
- ADB Jobs with Azure OData Sources, BLOB
|
Chapter 6: ADF Data Flow - 1
- Limitations with Copy Data Tool
- Data Flow Task, Data Flow Activity
- Transformations with Data Flow
- Spark Cluster For Debugging
- Cluster Node Configurations
- Data Preview Options with DFT
- SELECT Transformation & Options
- JOIN Transformation and Usage
- Conditional Split Transformation
- Aggregate & Group By Transformations
- Synapse Sink Options with DFT
- DFT Optimization Techniques
- Pipeline Debug Runs and ETL Testing
- Spark Cluster For Pipeline Executions
- Pipeline Monitoring & Run IDs
|
Chapter 6: Azure Stream Analytics, IoT
- Azure Stream Analytics: Real-time Usage
- Real-time Data Processing, Event Tracking
- Ingest, Deliver and Analysis Operations
- Azure Stream Analytics Jobs Concept
- Understanding Input & Output Options
- SAQL Queries for Stream Analytics Jobs
- IoT: Internet Of Things For Real-time Data
- Need for IoT Hubs and Event Hubs
- Creating IoT Device for Data Inputs
- Creating Azure Stream Analytics Resource
- Stream Analytics Jobs for Historical Data
- Azure SQL Database Options for ASA Jobs
- SAQL: Query Formatting and Validation
- Historical Data Uploads, ASA Job Execution
- Stream Analytics Job Monitoring Options
|
Chapter 6: Databricks @ BLOB, Power BI
- BLOB Data Access with Databricks
- Accessing Storage Account, Container
- Generate, Use SAS: Shared Access Signature
- dbutils.fs.mount() with DBFS Store
- fs.azure.sas.container.storageaccount
- spark.read and DBFS Mounts
- Scala Transformations, Create Temp View
- Spark SQL Queries with Temp Views
- dataframe.write.jdbc() & JVM Properties
- spark.read.jdbc() with Azure SQL DB
- Power BI Integration with Databricks
- Server Host Name, Port and Http Path
- Cluster Configurations and JDBC
- User Access Token Generation, Usage
- Spark Cluster Access, Power BI Analytics
|
Chapter 7: ADF Data Flow - 2
- ADF Pipelines For ETL Operations
- Data Flow Tasks and Activities in Synapse
- Pivot Transformation For Normalization
- Generating Pivot Column, Aggregations
- Pivot Transformation and Pivot Settings
- Pivot Key Selection, Value and Nulls
- Pivoted Columns and Column Pattern
- Column Prefix, Help Graphic & Metadata
- Window Functions & Usage in Data Flow
- Rank / DenseRank / Row Number
- Over Clause and Input Options
- Derived Column Transformations
- Exists & Lookup Transformations
- Reusing Data Flow Tasks in Synapse
- Pipeline Validations & Executions
|
Chapter 7: IoT Hubs & Event Hubs
- Azure Stream Analytics For API Data
- IoT Hubs & IoT Devices, Connection Strings
- Raspberry Pi App Connections with IoT Hub
- Azure Storage Account and Container
- Creating Azure Stream Analytics Job
- Configuring Input Aliases with IoT Hub
- Configuring Output Alias with ADLS Gen 2
- SAQL Query and Job Executions; Monitoring
- Azure Event Hubs and Event Instances
- Event Hub Namespaces, Partition Counts
- Access Policies, Permissions & Defaults
- RootManageSharedAccessKey & Options
- Connection Strings & Event Service Bus
- Telco App Installation, Executions. LIVE Data
- On-Premise App Integration with ASA Jobs
|
Chapter 7: Databricks Integrations
- Azure Databricks with Data Lake Storage
- Handling Unstructured Data in Azure
- Data Preparation and Staging Operations
- Azure App (Service Principal) Registration
- Azure Key Vault Creation & Key Usage
- Service Principal Permissions @ Data Loads
- Tenants and Authorization Settings
- Client Credentials, Token Provider Options
- Spark Notebooks For Dynamic Connections
- Parameterized Options & Blob Access
- Data Preparation & Big Data Ingestion
- Data Extraction and ADLS Storage
- show(), transformations, wasbs Options
- Azure SQL Server & Synapse Creations
- Data Loads with Incremental Changes
|
Chapter 8: Azure Synapse Analytics
- Azure Synapse Analytics Resource
- Azure Synapse Analytics Workspace
- Managed Resource Group, SQL Account
- SQL Admin Account and its Purpose
- Operations with Synapse Workspace
- ADLS Gen 2 Storage Account, Container
- Synapse Studio (Synapse Portal)
- Dedicated SQL Pools & Spark Pools
- Creating Dedicated SQL Pools
- Synapse Tables, Data Loads with T-SQL
- COPY INTO Statements with T-SQL
- Clustered Column Store Indexes
- Row Terminator and Compressions
- T-SQL Queries and Aggregations
- Aggregation Data Loads in Synapse
|
Chapter 8: Azure Stream Analytics Security
- Azure Key Vaults & ADLS [Data Lake] Security
- Azure Passwords, Keys and Certificates
- Azure Key Vaults - Name and Vault URI
- Inbuilt Managed Key and Azure Key Vault
- Standard Type, Premium Type Azure Key Vaults
- Secret Page, Key Backups and Key Restores
- Adding Keys to Azure Vaults. Key Type, Size
- Using Azure Key Vaults to secure Resources
- Azure Storage: Replications and DR Options
- LRS: Locally Redundant Storage
- GRS: Geo-Redundant Storage
- ZRS: Zone Redundant Storage
- Replication Options and Advantages
- Replication Verification and Modifications
- Azure Storage Endpoints, Failover Partner
|
Real-Time Project
- ADF Integration, Real-time Project
- Azure Databricks Integrations with ADF
- Defining Scala Notebooks in ADB
- Using Notebooks in Azure Data Factory
- spark.conf.set & fs.azure.account.key
- spark.read.format, option() and head()
- Online Retail Database Data Source
- Azure Migrations and ETL Concepts
- Azure SQL Pool (Synapse DWH) Tables
- Apache Spark Pool : Databases, Tables
- Azure Data Lake Storage (ADLS Gen 2)
- Azure Stream Analytics Jobs with IoT
- Azure Databricks and DBFS, Notebooks
- Concept wise FAQs, Resume Guidance
- Project Requirement, Solution, FAQs
- DP-203 Certification Guidance
|
Chapter 9: Synapse Analytics with Spark
- Apache Spark Pool in Azure Synapse
- Spark Cluster Nodes: vCores, Memory
- Creating Spark Clusters @ Synapse Studio
- Python Notebooks For Remote Access
- Creating Databases in Apache Spark Pool
- Data Loads from Dedicated SQL Pools
- Table Creations, Aggregation Operations
- PySpark Code for Data Operations, Writes
- Serverless Pool in Azure Synapse
- Connections, Usage with Serverless Pool
- Using Azure OpenDatasets in Synapse
- OPENROWSET and BULK Data Loads
- Azure Storage Account : Data Analysis
- Working with Parquet Files in Synapse
- Python Notebooks (PySpark) in Synapse
|
Chapter 10: Incremental Loads @ Synapse
- Incremental Loads with Synapse Studio
- Multi Table Merge Operations
- On-Premise Data Sources & Timestamps
- Azure SQL DB Destinations, Watermarks
- Watermark Table Usage & Audits
- Stored Procedures for Timestamp Updates
- Table Data Type and Dynamic MERGE
- SQL Queries for Datasets and Fetch
- Lookup Activity and its Usage in Synapse
- Expressions in ADF Portal for Lookup
- Expressions in ADF Portal for Source
- Output Pipeline Expression, Data Window
- Concat Function, Run IDs Expressions
- JSON Parameters, Pipeline Scheduling
- Pipeline Validation, Trigger and Monitoring
|
Chapter 11: Optimizations, Power Query
- ADF ETL with GUI : Power Query
- Power Query Resource Creation, Use
- Source Data Configurations & Settings
- Rename, Remove, Pivot, Group By, Order
- Index, Filter, Remove Error Rows
- Using Power Query Activity, ADF Pipelines
- Spark Cluster Configurations for Pipelines
- Concurrency, Big Data Recommendations
- Storage Optimization Techniques
- ETL Optimization Techniques
- SQL Pool (Synapse) Optimizations
- Indexes, Partitions, Distributions, DOP
- Pipeline Optimization Techniques
- Partitions, DOCP, Compressions, DIU
- Staging, Polybase and Core Counts
|
Chapter 12: Pipeline Monitoring, Security
- Azure Monitor Resource and Usage
- Pipeline Monitoring Techniques
- ADF: Pipeline Monitoring and Alerts
- Synapse: Pipeline Monitoring and Alerts
- Synapse: Storage Monitoring and Alerts
- Conditions, Signal Rules and Metrics
- Email Notifications with Azure
- Concurrency, Big Data Recommendations
- Azure Active Directory (AAD) Users, Groups
- IAM: Identity & Access Management
- Synapse Workspace Security with RBAC
- ADF Security with RBAC: Owner, Contributor
- Azure Synapse SQL Pool Security: Logins
- Users, Roles and Resource Classes (RC)
- ADF V1 to V2 Migrations, Considerations
|