Datawarehouse Schema

Types of Datawarehouse Schema

Fact and Dimension table in nutshell

Fact table

  • Fact table contains measurements, metrics, and facts about a business process
  • Fact table helps to store report label
  • Fact table does not contain a hierarchy

Dimension table

  • Companion to the fact table which contains descriptive attributes to be used as query constraining.
  • Dimension table contains detailed data
  • Dimension table contains hierarchies

Star Schema

Example

Advatages

  • Simple queries and less joins while retrieving the data
  • Query performance is faster due to denormalized form\

Disadvantages

  • Scaling issue might occur with new changes
  • Data redundancy is more as tables are not hierarchically divided.

Snowflake Schema

It is an extension of Star schema in such a way that the dimension table can be further normalized and split into main and secondary dictionary tables

Example

Advatages

  • Data redundancy is removed
  • Less storage space
  • Ease to update and maintain tables

Disadvantages

  • More joins , lookups and complex queries due to more dimension tables

Galaxy Schema or Fact Constellation Schema

A Galaxy Schema contains two fact table that share dimension tables between them. It is also called Fact Constellation Schema.

Example

Star schema vs Snowflake Schema

Which Schema is better Snowflake or Star Schema ?

Depends on the use-case.

If end users want to easily interact with table structure with simpler queries, then Star schema is preferred - but it comes with Storage cost, data redundancy ,etc.

In my opion, Snowflake schema can be preferred until or unless we are making the schema really complex. As building a complex Snowflake schema might help in reducing storage but will hamper the experience of end users (BI , ETL, Data/ML Engg or Data scientists)

We need to balance between Storage and Platform Experience of End Users