Handling Slowly Changing Dimensions (SCD) in Azure Data Warehouse

Introduction

In a data warehouse, dimensions are often subject to gradual changes over time. These changes might include changes to a customer’s name, an employee's role, or a product’s price, etc... To track and manage these changes effectively, data warehouses use Slowly Changing Dimensions (SCDs). Each type of SCD handles data changes in a distinct manner, depending on how you want to store and track the historical changes.

In this article, I will be explaining the five types of SCDs—Type 1, Type 2, Type 3, Type 4, and Type 6—explaining how data looks before and after each type is applied. This will help you understand the concept of SCDs before diving into the actual implementation in Azure Data Warehouse in the next article.

Types of Slowly Changing Dimensions (SCD)

There are various strategies for handling data changes, depending on the business requirements. The five most common types of SCDs are:

  1. SCD Type 1: Overwrite the old data.
  2. SCD Type 2: Add a new row for each change.
  3. SCD Type 3: Add a new column to track changes.
  4. SCD Type 4: Store historical data in a separate table.
  5. SCD Type 6: Hybrid approach combining Type 1, Type 2, and Type 3.

SCD Type 1. Overwrite the Old Data

SCD Type 1 is the simplest form. When a change occurs, the existing record is overwritten with the new value. This method doesn't keep any historical data; the previous value is simply discarded.

Example

Let’s say a customer, John Doe, changes his email address.

Overwrite the old data

In this case, the old email address ([email protected]) is overwritten with the new one ([email protected]). No history is preserved.

SCD Type 2. Add a New Row for Each Change

SCD Type 2 is used when you need to keep a full history of changes. Every time a change happens, a new row is created. This method allows you to track changes over time and see the previous values, making it ideal for tracking historical data.

Example

Let's use John Doe again. His address changes from 123 Main St to 456 Oak St.

Add a new row

In this example

  • The old address (123 Main St) is still available with EndDate set to 2023-06-01, and its CurrentFlag is set to 0 (indicating it’s not the current record).
  • The new address (456 Oak St) is added in a new row, with StartDate set to 2023-06-01 and CurrentFlag set to 1.

This way, the full history is preserved, and you can see when the address change happened.

SCD Type 3. Add a New Column to Track Changes

SCD Type 3 keeps track of only the most recent change. In this method, a new column is added to the table to store the previous value of the dimension. This is useful when you only need to track the current and previous value.

Example

Let’s take John Doe again. His address changes from 123 Main St to 456 Oak St.

Add a new column

Here

  • CurrentAddress stores the most recent address (456 Oak St).
  • PreviousAddress stores the previous address (123 Main St).
  • The table only keeps track of two states: the current and the previous state.

SCD Type 4. Store Historical Data in a Separate Table

SCD Type 4 involves storing historical data in a separate table. This approach keeps the main table focused on the current data while moving the historical records to a historical table. This is useful when the history isn’t frequently queried, but you still need to preserve it for archival purposes.

Example

Again, let’s take John Doe and his address change. We'll split the data into two tables.

Store historical data in separate table

In this case

  • The main table stores the current data (456 Oak St).
  • The historical table stores the old address (123 Main St), preserving the history separately.

SCD Type 6. Hybrid Approach (Combining Type 1, Type 2, and Type 3)

SCD Type 6 is a hybrid approach that combines elements of Type 1, Type 2, and Type 3. This allows you to:

  1. Overwrite values when necessary (Type 1).
  2. Track changes with a full history (Type 2).
  3. Keep track of the most recent and previous values (Type 3).

This approach is more complex and allows flexibility in managing data changes, depending on business requirements.

Example

Let’s again consider John Doe, whose address changes from 123 Main St to 456 Oak St.

Hybrid Approach

In Type 6

CurrentAddress always shows the latest value acts as a Type 1 SCD field and the HistoricalAddress shows the address that was valid between the StartDate and EndDate acts as a Type 3 SCD field. So for the same customer, every record would have the latest address populated in CurrentAddress while HistoricalAddress works exactly like the Address field in the Type 2 SCD example.

Conclusion

In this article, I’ve explained the various types of Slowly Changing Dimensions (SCD)—Types 1, 2, 3, 4, and 6—demonstrating how each type handles data changes differently. Understanding these concepts is critical for managing historical data in a data warehouse.

In the next article, we will dive deeper into how to implement these SCDs in Azure Data Warehouse, including practical examples using SQL and Azure Data Factory.

Up Next
    Ebook Download
    View all
    Learn
    View all