Anamika1-cpu/Netflix-Project-ADF-Databricks

Netflix-Project-ADF-Databricks (End-to-End DE Project)

Overview

This project is a complete end-to-end data engineering solution built on Azure services. It focuses on dynamic pipeline orchestration, incremental (near real-time) data ingestion, and integration with Power BI for interactive analytics.

Architecture

Highlights
  • Designed and implemented dynamic ingestion pipelines in Azure Data Factory and Databricks to automate raw file ingestion into ADLS Gen2, applying consistent file naming conventions and schema structures for scalable processing.
  • Configured external locations and storage credentials in Unity Catalog to ensure secure, governed access to data lakes across staging and production environments.
  • Used Databricks Auto Loader for efficient, incremental ingestion of newly arriving files, reducing latency and removing the need to track processed files manually.
  • Built a near real-time processing pipeline with Delta Live Tables (DLT) to handle continuous data loads with built-in quality checks, schema enforcement, and lineage tracking, in an architecture modeled on those used by large streaming platforms such as Netflix.
  • Developed conditional workflows using dynamic parameters and if-else logic to route data through specific transformation paths based on file metadata and business rules.
  • Enabled business reporting by integrating processed data into Power BI, creating dynamic dashboards and KPI visualizations for real-time insights.
  • Ensured end-to-end orchestration, monitoring, and troubleshooting using Azure Monitor, Log Analytics, and Databricks notebooks for efficient operational management.
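A minimal plain-Python sketch of the parameter-driven ingestion and if-else routing described above is shown here. It is not the project's ADF or Databricks Workflows code: `FILE_PARAMS` stands in for the kind of JSON array an ADF ForEach activity could iterate over, and the folder names, the `bronze` container, and every rule in the hypothetical `route` function are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical parameter array, shaped like the JSON list an ADF ForEach
# activity could iterate over. Folder and file names are assumptions.
FILE_PARAMS = [
    {"folder_name": "netflix_cast", "file_name": "netflix_cast.csv"},
    {"folder_name": "netflix_category", "file_name": "netflix_category.csv"},
    {"folder_name": "netflix_directors", "file_name": "netflix_directors.csv"},
]


def sink_path(container: str, params: dict) -> str:
    """Build a target path using one consistent naming convention (hypothetical layout)."""
    return f"{container}/{params['folder_name']}/{params['file_name']}"


def route(file_name: str, size_bytes: int, run_date: date) -> str:
    """If-else branching on file metadata; every rule here is illustrative."""
    if size_bytes == 0:
        return "skip"            # empty file: nothing to transform
    if not file_name.endswith(".csv"):
        return "quarantine"      # unexpected format is set aside for review
    if run_date.weekday() == 6:  # Sunday (Monday is 0)
        return "full_refresh"
    return "incremental"


if __name__ == "__main__":
    for params in FILE_PARAMS:
        print(sink_path("bronze", params))  # e.g. bronze/netflix_cast/netflix_cast.csv
    print(route("netflix_cast.csv", 1024, date(2024, 1, 7)))  # full_refresh (a Sunday)
```

In ADF the same branching would typically sit in an If Condition activity driven by pipeline parameters; keeping the rules in one small function mainly makes them easy to reason about and test.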

Tech Stack

  • Azure Data Lake Storage Gen2 (ADLS Gen2): Scalable, secure cloud storage for both structured and unstructured data.
  • Azure Data Factory: Enables orchestration of automated, metadata-driven ETL pipelines for seamless data movement and transformation.
  • Databricks Auto Loader: Incrementally ingests new files from cloud storage into Delta Lake for near real-time processing.
  • Workflow Logic (If-Else): Implements dynamic, rule-based branching in pipelines to automate and route data flows based on custom conditions.
  • Power BI: Empowers business users with real-time, interactive dashboards and data visualizations for actionable decision-making.
  • Delta Live Tables: Automates streaming and batch data transformations with built-in data quality checks, schema enforcement, and lineage tracking for real-time analytics.
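In Databricks, Auto Loader is invoked through `spark.readStream.format("cloudFiles")` and records which files it has already ingested in its checkpoint, while DLT expectations such as `@dlt.expect_or_drop` discard rows that fail a constraint. The sketch below imitates those two behaviors in plain Python so they can be run without a cluster. It is a toy model under stated assumptions, not the Databricks API: `IncrementalLoader`, its in-memory checkpoint, `expect_or_drop` as a plain function, and the `show_id` rule are all hypothetical.

```python
from typing import Callable, Iterable


class IncrementalLoader:
    """Toy model of checkpointed incremental ingestion (hypothetical; not Auto Loader's API)."""

    def __init__(self) -> None:
        # In-memory stand-in for a checkpoint: names of files already ingested.
        self.checkpoint: set[str] = set()

    def new_files(self, listing: Iterable[str]) -> list[str]:
        """Return files not seen in earlier runs, in sorted order, and record them."""
        fresh = sorted(set(listing) - self.checkpoint)
        self.checkpoint.update(fresh)
        return fresh


def expect_or_drop(rows: list[dict], rule: Callable[[dict], bool]) -> tuple[list[dict], int]:
    """Keep rows that pass `rule` and count the rest (mimics DLT's drop semantics)."""
    kept = [row for row in rows if rule(row)]
    return kept, len(rows) - len(kept)


if __name__ == "__main__":
    loader = IncrementalLoader()
    print(loader.new_files(["titles_1.csv", "titles_2.csv"]))  # ['titles_1.csv', 'titles_2.csv']
    print(loader.new_files(["titles_1.csv", "titles_2.csv", "titles_3.csv"]))  # ['titles_3.csv']

    # Hypothetical quality rule: every title row must carry a show_id.
    rows = [{"show_id": "s1", "title": "A"}, {"show_id": None, "title": "B"}]
    kept, dropped = expect_or_drop(rows, lambda row: row["show_id"] is not None)
    print(kept, dropped)
```

Re-running with an unchanged listing returns an empty list, which is the property that lets incremental ingestion skip manual bookkeeping of file states.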