
marttipraks/dataeng

 
 


Data Engineering

Repository for the Data Engineering Course (LTAT.02.007)


Teaching Assistants:

Acknowledgments

Special thanks to Emanuele Della Valle and Marco Brambilla from Politecnico di Milano for letting me "steal" some of their great slides.

Lectures

Date Title Material Mandatory Reads Extras
01/09 Course Intro Slides - pdf slide 45-109
03/09 Data Modeling Slides - pdf slide 1-44 Chp 4 p111-127, Chp 5 p151-156, Chp 6 p199-205 of [3]
10/09 DM for Relational Databases Slides - pdf slide 45-109 Chp 2, 6, and 7 (Normal Forms) of [1] Relational Model
10/09 DM for Data Warehouse Slides - pdf slide 109-118 pdf video Chp 2 of [2]
17/09 DM for Big Data Slides - pdf Chp 2 of [3], video paper
17/09 Key Value Stores Slides
Column Oriented Databases
Document Databases
Graph Databases
Data Engineering Pipelines Chp 1 of [3]
Keynote TBA
Streaming Data Chp 11 of [3]
Data Wrangling

Practices (videos will be available after the Group 2 session)

Date Title Material Reads Videos Assignment Notes
07-08/09 Docker Slides - Lab Branch Video GP1 Video GP2 QA GP2 only
14-15/09 Modeling and Querying Relational Data with Postgres Slides Chp 32 of [1] Video
21-22/09 Modeling and Querying Key Value Data with Redis Slides Video
28-29/09 Modeling and Querying Document Data with MongoDB
05-06/10 Modeling and Querying Graph Data with Neo4j
Data Ingestion with Apache Kafka
Apache Airflow Data Pipelines
Stream Processing with Kafka Streams
Stream Processing with KSQL
Data Cleansing
Augmentation

Extras

Contributing

  • Modeling and Querying RDF data: SPARQL
  • Domain Driven Design: a summary
  • Event Sourcing: a summary
  • Data Pipelines with Luigi
  • Data Pipelines with Apache NiFi
  • Data Processing with Apache Flink

Syllabus

  • What is (Big) Data?
  • The Role of Data Engineer
  • Data Modeling
    • Data Replication
    • Data Partitioning
    • Transactions
  • Relational Data
  • NoSQL
    • Document
    • Graph
  • Data Warehousing
    • Star and Snowflake schemas
  • Data Vault
  • (Big) Data Pipelines
    • Big Data Systems Architectures
    • ETL and Data Pipelines
      • Best Practices and Anti-Patterns
    • Batch vs Streaming Processing
  • Data Cleansing
  • Data Augmentation

Books

[[slides/Slides]]
