Introduction
In a data warehouse, dimensions are often subject to gradual changes over time. These changes might include changes to a customer’s name, an employee's role, or a product’s price, etc... To track and manage these changes effectively, data warehouses use Slowly Changing Dimensions (SCDs). Each type of SCD handles data changes in a distinct manner, depending on how you want to store and track the historical changes.
In this article, I will be explaining the five types of SCDs—Type 1, Type 2, Type 3, Type 4, and Type 6—explaining how data looks before and after each type is applied. This will help you understand the concept of SCDs before diving into the actual implementation in Azure Data Warehouse in the next article.
Types of Slowly Changing Dimensions (SCD)
There are various strategies for handling data changes, depending on the business requirements. The five most common types of SCDs are:
- SCD Type 1: Overwrite the old data.
- SCD Type 2: Add a new row for each change.
- SCD Type 3: Add a new column to track changes.
- SCD Type 4: Store historical data in a separate table.
- SCD Type 6: Hybrid approach combining Type 1, Type 2, and Type 3.
SCD Type 1. Overwrite the Old Data
SCD Type 1 is the simplest form. When a change occurs, the existing record is overwritten with the new value. This method doesn't keep any historical data; the previous value is simply discarded.
Example
Let’s say a customer, John Doe, changes his email address.
![Overwrite the old data]()
In this case, the old email address ([email protected]) is overwritten with the new one ([email protected]). No history is preserved.
SCD Type 2. Add a New Row for Each Change
SCD Type 2 is used when you need to keep a full history of changes. Every time a change happens, a new row is created. This method allows you to track changes over time and see the previous values, making it ideal for tracking historical data.
Example
Let's use John Doe again. His address changes from 123 Main St to 456 Oak St.
![Add a new row]()
In this example
- The old address (123 Main St) is still available with EndDate set to 2023-06-01, and its CurrentFlag is set to 0 (indicating it’s not the current record).
- The new address (456 Oak St) is added in a new row, with StartDate set to 2023-06-01 and CurrentFlag set to 1.
This way, the full history is preserved, and you can see when the address change happened.
SCD Type 3. Add a New Column to Track Changes
SCD Type 3 keeps track of only the most recent change. In this method, a new column is added to the table to store the previous value of the dimension. This is useful when you only need to track the current and previous value.
Example
Let’s take John Doe again. His address changes from 123 Main St to 456 Oak St.
![Add a new column]()
Here
- CurrentAddress stores the most recent address (456 Oak St).
- PreviousAddress stores the previous address (123 Main St).
- The table only keeps track of two states: the current and the previous state.
SCD Type 4. Store Historical Data in a Separate Table
SCD Type 4 involves storing historical data in a separate table. This approach keeps the main table focused on the current data while moving the historical records to a historical table. This is useful when the history isn’t frequently queried, but you still need to preserve it for archival purposes.
Example
Again, let’s take John Doe and his address change. We'll split the data into two tables.
![Store historical data in separate table]()
In this case
- The main table stores the current data (456 Oak St).
- The historical table stores the old address (123 Main St), preserving the history separately.
SCD Type 6. Hybrid Approach (Combining Type 1, Type 2, and Type 3)
SCD Type 6 is a hybrid approach that combines elements of Type 1, Type 2, and Type 3. This allows you to:
- Overwrite values when necessary (Type 1).
- Track changes with a full history (Type 2).
- Keep track of the most recent and previous values (Type 3).
This approach is more complex and allows flexibility in managing data changes, depending on business requirements.
Example
Let’s again consider John Doe, whose address changes from 123 Main St to 456 Oak St.
![Hybrid Approach]()
In Type 6
CurrentAddress always shows the latest value acts as a Type 1 SCD field and the HistoricalAddress shows the address that was valid between the StartDate and EndDate acts as a Type 3 SCD field. So for the same customer, every record would have the latest address populated in CurrentAddress while HistoricalAddress works exactly like the Address field in the Type 2 SCD example.
Conclusion
In this article, I’ve explained the various types of Slowly Changing Dimensions (SCD)—Types 1, 2, 3, 4, and 6—demonstrating how each type handles data changes differently. Understanding these concepts is critical for managing historical data in a data warehouse.
In the next article, we will dive deeper into how to implement these SCDs in Azure Data Warehouse, including practical examples using SQL and Azure Data Factory.