Overview of Data Engineering Ecosystem

The data engineering ecosystem includes the infrastructure, tools, and processes to

  • extract data from its sources
  • architect and manage pipelines and data repositories
  • optimize workflows and data flow
  • develop the applications needed along the way

Data can be categorized into

  • structured (follows a fixed format and fits in rows and columns)
  • semi-structured (has some consistent organization but no rigid schema)
  • unstructured (complex, mostly qualitative, no predefined format)
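
To make the three categories concrete, here is a minimal sketch using only the Python standard library. The sample values and the `field_names` helper are made up for illustration and are not part of the original notes.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured_csv = "id,name,amount\n1,Ada,10.5\n2,Linus,7.25\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: records share some keys but may nest or omit fields.
semi_structured_json = (
    '[{"id": 1, "tags": ["new"]}, {"id": 2, "address": {"city": "Oslo"}}]'
)
records = json.loads(semi_structured_json)

# Unstructured: free text with no schema at all.
unstructured_text = "Customer said the delivery arrived late but support was helpful."


def field_names(record_list):
    """Return the sorted union of keys across a list of dict records."""
    keys = set()
    for record in record_list:
        keys.update(record.keys())
    return sorted(keys)


print(field_names(rows))     # the same columns for every row
print(field_names(records))  # keys vary from record to record
print(len(unstructured_text.split()), "words, no fields to list")
```

The structured rows all expose the same columns, the JSON records only share `id`, and the free text has no fields to inspect.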

Types of data repositories

  • transactional (OLTP): store high volumes of day-to-day operational data, mostly relational
  • analytical (OLAP): relational or non-relational, e.g. data warehouses and data lakes
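
The difference between the two repository types shows up mostly in the workload. The sketch below uses one in-memory SQLite database for both, purely for illustration; in practice OLTP and OLAP usually run on separate systems. The `orders` table and the `place_order` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)


def place_order(customer, amount):
    """OLTP-style workload: one small write per transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )


for customer, amount in [("ada", 10.0), ("linus", 5.0), ("ada", 2.5)]:
    place_order(customer, amount)

# OLAP-style workload: a read-only aggregate over many rows.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The inserts touch one row at a time, while the aggregate scans the whole table to answer an analytical question.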

Data moves through these stages: Collated -> Processed -> Cleansed -> Integrated -> Made available to users

A data pipeline is a set of tools and processes that moves data from source to destination (ETL or ELT).
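
A minimal ETL sketch, assuming a small CSV string as the source and an in-memory SQLite table as the destination. The `RAW` data, the `sales` table, and the `extract`/`transform`/`load` function names are all illustrative choices, not a prescribed design.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: stray whitespace and one missing amount.
RAW = "name,amount\n Ada ,10.5\nLinus,\nGrace,3\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop rows without an amount, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["name"].strip(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows to the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
loaded = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(loaded)
```

An ELT variant would load the raw rows into the destination first and run the cleaning step there, typically as SQL.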

BI and reporting tools present the integrated data visually, often through drag-and-drop interfaces.