David Liu edited this page Dec 27, 2024 · 6 revisions

Welcome to the BI-collection wiki!

Challenges

  • As analytics engineers, how we prepare data for analysis determines how much the rest of the organisation trusts it. If your end users don’t trust the data, it does not matter how much work you put into the previous steps: they will still silo their work in Excel sheets and bury important business logic in one-off dashboard filters.

Domain QuerySet

HR data

  • Is there any relationship between who a person works for and their performance score?
  • What is the overall diversity profile of the organization?
  • What are our best recruiting sources if we want to ensure a diverse organization?
  • Can we predict who is going to terminate and who isn't? What level of accuracy can we achieve on this?
  • Are there areas of the company where pay is not equitable?

Best Practice

  • Cost optimization strategies
  • Centralized dimension datasets
  • Although many BI tools provide convenience functions for time transformations, we recommend persisting these transformations at the database level.
    • This provides a convenience layer for persisting time-transformed aggregates into summary/aggregate tables.
  • Tool selection
    • Multiple, smaller tools for specific jobs
    • Understand how much abstraction the tool provides to enable strong inheritance rules and template-driven report development.
      • Inheritance rules provide a consistent means of enforcing common definitions across calculations.
      • Template support is a key capability for rapid report development.
    • Self-service support
    • Connectivity
    • Robust security models within the reporting platform
      • Row-level security
      • Object-level access control (S3, object storage)
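
As a minimal sketch of the recommendation above to persist time transformations at the database level, the following uses Python's built-in `sqlite3` module. The `sales` and `sales_monthly` tables, their columns, and the sample rows are hypothetical and exist only for illustration; a real warehouse would use its own date functions and scheduling.

```python
import sqlite3

# Hypothetical fact table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2024-01-05", 100.0),
        ("2024-01-20", 50.0),
        ("2024-02-03", 75.0),
    ],
)

# Persist the month truncation and the aggregate in a summary table,
# instead of recomputing them inside each BI tool.
conn.execute(
    """
    CREATE TABLE sales_monthly AS
    SELECT strftime('%Y-%m', sold_on) AS sales_month,
           SUM(amount) AS total_amount
    FROM sales
    GROUP BY strftime('%Y-%m', sold_on)
    """
)

# Any reporting tool can now read the pre-aggregated rows directly.
monthly = conn.execute(
    "SELECT sales_month, total_amount FROM sales_monthly ORDER BY sales_month"
).fetchall()
print(monthly)
```

Because the monthly grain lives in the database, every downstream report reads the same figures regardless of which tool's date functions it would otherwise have used.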

Legacy/Traditional Practice

  • Monolithic approach to data modeling: each consumer of data rebuilds their own data transformations from raw source data.
  • One tool serves all reporting requests.
  • Expensive proposition
  • Embedding too much business logic in the tool itself.
  • Relying on the rich set of convenience functions provided by reporting environments to provision common business logic
    • This restricts reuse and requires duplicating the logic wherever the original tool can’t be used.
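
One possible way to avoid the duplication described above is to define the business rule once in the database, so every consumer reuses it. The sketch below uses Python's built-in `sqlite3` module; the `employees` table, the `active_employees` view, and the rule that "active" means no termination date are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical HR table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, terminated_on TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("A", "Sales", None),
        ("B", "Sales", "2024-03-01"),
        ("C", "IT", None),
    ],
)

# One shared definition of "active employee", stored in the database
# rather than in a tool-specific filter or calculated field.
conn.execute(
    "CREATE VIEW active_employees AS "
    "SELECT name, department FROM employees WHERE terminated_on IS NULL"
)

# Two hypothetical consumers reuse the same definition.
headcount = conn.execute("SELECT COUNT(*) FROM active_employees").fetchone()[0]
by_department = conn.execute(
    "SELECT department, COUNT(*) FROM active_employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(headcount, by_department)
```

If the definition of "active" changes, only the view needs updating, and both consumers pick up the change without edits in any reporting tool.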