Insights into Slowly Changing Dimensions (SCD) and Implementing SCD

Last Update: November 13, 2024
Slowly Changing Dimensions (SCD) and Implementing SCD
Table of Contents
Contributors
Picture of Vivasoft Data Engineering Team
Vivasoft Data Engineering Team
Tech Stack
0 +
Want to accelerate your software development company?

It has become a prerequisite for companies to develop custom software products to stay competitive.

In today’s world, data is everything. It’s a vital asset across all aspects of life. Historical data especially, is incredibly important. Managing and preserving this historical data is fundamental in effective data warehousing.
Data warehouse is the goldmine of information, keeping it relevant is a big challenge. There will always be some entities whose data will change slowly but often. This is where Slowly Changing Dimensions (SCD) comes in.

Understanding Slowly Changing Dimensions

SCD is a set of techniques to manage historical data over time. It keeps track of the new data and historical data. This enables us to observe and stay informed about changes to an entity’s history. For instance, consider Product entities. The price of a product can fluctuate frequently. If we simply overwrite the product table with each new price, we lose sight of its historical pricing data.That will be invalid not to see the history of it. This is where SCD becomes invaluable. Through SCD strategies, we maintain the relevance of product histories, enabling us to discern when price changes occurred.

Types of SCD

There are multiple types of SCD. We can follow any type depending on our requirements over entities. 

  • Type 1 SCD
  • Type 2 SCD
  • Type 3 SCD
  • Type 4 SCD

Type 1 SCD

The simplest approach among Slowly Changing Dimensions (SCD) is SCD Type 1, where data is overwritten. When a change occurs, the existing record in the dimension is replaced with the new data, erasing any trace of historical information.

This method is suitable when there’s no requirement for historical data. For an example, consider customer addresses.

While delivering goods, the system prioritizes the current address for customer convenience, without the need to retain past addresses in the database.

Type 2 SCD

SCD Type 2 is widely recognized as the most popular and effective among all types. It’s the optimal way to maintain historical data. When a change occurs, a new row is inserted into the target table, while the old row is marked as inactive. If new data is introduced, a new row is added accordingly. This approach ensures comprehensive tracking of entity changes.

Consider products as an example: product prices can fluctuate frequently. In this scenario, whenever a new product is introduced, a new row is appended to the product table. Likewise, when an existing product undergoes a change, the previous row is deactivated, and a new row is added with the updated information.

Type 3 SCD

Type 3 SCD is more common when historical data retention is not that much mandatory. In this type, a new column is introduced for keeping the immediate previous data. This approach is suitable for scenarios where tracking just the most recent change is sufficient. For instance, consider a situation where the product price is updated occasionally, and the system only needs to retain the previous price for comparison purposes.

Type 4 SCD

We will explore the implementation of different types of SCD here. We’ll start with Type 1, Type 2, and Type 3 SCD implementations.

Implementation of Slowly Changing Dimensions

We will explore the implementation of different types of SCD here. We’ll start with Type 1, Type 2, and Type 3 SCD implementations.

Implementing Type 1 SCD

Type 1 Slowly Changing Dimension (SCD) is often preferred when historical data retention isn’t necessary. It revolves around updating the primary table directly, altering its fields accordingly. However, utilizing a stage table can enhance this process.

Let us create an Employee stage table and Employee table.

				
					-- Create the Employee table

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Department VARCHAR(50),
    Salary DECIMAL(10, 2)
);



-- Create the staging table for Employee updates
CREATE TABLE Employee_Staging (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Department VARCHAR(50),
    Salary DECIMAL(10, 2)
);

--  Adding some data to employee table 
INSERT INTO Employee (EmployeeID, FirstName, LastName, Department, Salary)
VALUES 
    (47121, 'Nurul', 'Islam', 'IT', 62000.00),
    (47122, 'Mohammad', 'Nabi', 'HR', 55000.00),
    (47123, 'Jewel', 'Islam', 'Finance', 65000.00);

-- Adding data that we will change in stage table
INSERT INTO Employee_Staging (EmployeeID, FirstName, LastName, Department, Salary)
VALUES 
    (47121, 'Nurul', 'Islam', 'IT', 72000.00)
				
			

After populating our Employee table with dummy data, we’re set to update the salary of the employee with ID 47121.
Now let’s look at our SCD type 1 Query

				
					MERGE INTO Employee AS dst
USING Employee_Staging AS src
ON (dst.EmployeeID = src.EmployeeID)
WHEN MATCHED THEN
    UPDATE SET
        dst.FirstName = src.FirstName,
        dst.LastName = src.LastName,
        dst.Department = src.Department,
        dst.Salary = src.Salary
WHEN NOT MATCHED BY target THEN
    INSERT (EmployeeID, FirstName, LastName, Department, Salary)
    VALUES (src.EmployeeID, src.FirstName, src.LastName, src.Department, src.Salary);
				
			

Now if we run a select query on our Employee table we will see that we have a change in salary for an employee whose ID is 47121.

Type 1 SCD example

Fig: Type 1 SCD example

Implementing Type 2 SCD

Type 2 SCD is one of the most used approaches for keeping historical data.It relies on two essential tables: a stage table and a primary table. Initially, data resides in the stage table as the primary repository. Then, records from the stage table are transferred to the main table for long-term storage.

For this we will use two table

  • Product_Stage   – this will be our stage table.
  • Product  – this will be our primary table.


To implement Type 2 Slowly Changing Dimensions (SCD), we will set up two tables; create a product table and a product stage table. We will also add some dummy data in both of them to test.

				
					-- create product table
CREATE TABLE Product (
    ProductCode VARCHAR(50),
    Name VARCHAR(255),
    Price DECIMAL(10, 2),
    StartedFrom DATE,
    IsActive BIT DEFAULT 1
);

-- create product stage table

CREATE TABLE Product_Stage (
    ProductCode VARCHAR(50),
    Name VARCHAR(255),
    Price DECIMAL(10, 2)
);

-- adding some dummy data

INSERT INTO Product (ProductCode, Name, Price, StartedFrom) VALUES
('MD101', 'iPhone 12', 999.99, '2024-01-01'),
('MD102', 'Samsung Galaxy S21', 899.99, '2024-01-01'),
('MD103', 'Google Pixel 6', 799.99, '2024-01-01'),
('MD104', 'OnePlus 10', 699.99, '2024-01-01'),
('MD105', 'Xiaomi Redmi Note 11', 599.99, '2024-01-01');


-- adding data that will be changed or added in product table
INSERT INTO Product_Stage (ProductCode, Name, Price) VALUES
('MD104', 'OnePlus 10', 599.99),
('MD105', 'Xiaomi Redmi Note 11', 399.99),
('MD108', 'OnePlus 10 PRO', 799.99),
('MD109', 'iPhone 14 Pro Max', 1699.99);

Select * From Product
				
			

This will give us below data in our product table. Currently we have 5 products here.

Type 2 SCD example

Fig : Type 2 SCD example

Now for adding new products and making changes on any existing products we will strictly follow the stage table. We will always add in the stage table first and then we will run SCD to update that in our main table.

In the stage table, we have added two new products;product code MD108 and MD109. Also we have changed the price of existing products;product code MD104 and MD105.

Now lets look at our below Type 2 SCD query

				
					INSERT INTO Product(ProductCode,Name,Price,StartedFrom,IsActive
)
SELECT ProductCode,Name,Price,StartedFrom,IsActive
FROM(
MERGE INTO Product AS Dst
    USING
    (SELECT DISTINCT ProductCode, Name, Price FROM Product_Stage) AS Src
    ON Dst.ProductCode = Src.ProductCode
       And Dst.IsActive = 1
    WHEN MATCHED and dst.Name != Src.Name OR Dst.Price != Src.Price  THEN
        UPDATE SET Dst.IsActive = 0
    WHEN NOT MATCHED BY TARGET THEN
        INSERT
        (
            ProductCode,
            Name,
            Price,
            StartedFrom,
            IsActive
        )
        VALUES
        (Src.ProductCode, Src.Name, Src.Price, GETDATE(), 1)
    OUTPUT $ACTION,
           Src.ProductCode,
           Src.Name,
           Src.Price,
           GETDATE(),
           1) AS CHANGES([ACTION],ProductCode, Name, Price, StartedFrom, IsActive)
WHERE [ACTION] = 'UPDATE';
				
			

Now if we make a select statement we can see the below output. We can see we have added two new products here; MD108 and MD109. Also we have set inactive to the products; MD104 and MD105 first and then we have added them as new.

Type 2 SCD result

Fig:Type 2 SCD result

Implementing Type 3 SCD

For Type 3 Slowly Changing Dimension (SCD), we’ll focus on product prices, assuming they don’t fluctuate frequently and there’s no necessity to track every price change. We’ll set up a main table along with a stage table to facilitate this process.

				
					-- create product table
CREATE TABLE Product (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(100),
    Price DECIMAL(10,2),
    PriceStartedDate DATETIME,
    PreviousPrice DECIMAL(10,2),
    PreviousPriceDate DATETIME

);

-- create product stage table
CREATE TABLE Product_Stage (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(100),
    Price DECIMAL(10,2)
);

-- Add dummy data to the Product table
INSERT INTO Product (ProductID, ProductName, Price, PriceStartedDate, PreviousPrice, PreviousPriceDate)
VALUES
    (2311, 'Laptop', 1500, '2024-05-01', 0, NULL),
    (2312, 'Smartphone', 1000, '2024-05-01', 0, NULL),
    (2313, 'Tablet', 500, '2024-05-01', 0, NULL);

-- Add dummy data to the Product_Stage table
INSERT INTO Product_Stage (ProductID, ProductName, Price)
VALUES
    (2312, 'Smartphone', 899)
				
			

We have added some dummy data in both of the tables. We have kept two separated columns. Price of the product and Previous Price of the product.
Let’s take a look on our Type 3 SCD query

				
					MERGE INTO Product AS dst
USING Product_Stage AS src
ON dst.ProductID = src.ProductID
WHEN MATCHED THEN
    UPDATE SET
        dst.ProductName = src.ProductName,
        dst.PreviousPrice = dst.Price,
        dst.PreviousPriceDate = dst.PriceStartedDate,
        dst.Price = src.Price,
        dst.PriceStartedDate = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, ProductName, Price, PriceStartedDate, PreviousPrice, PreviousPriceDate)
    VALUES (src.ProductID, src.ProductName, src.Price, GETDATE(), 0, NULL);
				
			

Now if we run the select statement on the product table, we will see we have update the smartphone price and we have updated the previous price of that product.

Type 3 SCD example

Fig: Type 3 SCD example

Implementing Type 4 SCD

For Type 4 SCD we will add an extra table to keep history of the changes.

				
					-- create product table
CREATE TABLE Product (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(100),
    ProductPrice DECIMAL(10, 2),
    EffectiveDate DATETIME DEFAULT GETDATE(),
    ExpiryDate DATETIME NULL
);


-- create product history table
CREATE TABLE Product_History (
    ProductID INT,
    ProductName NVARCHAR(100),
    ProductPrice DECIMAL(10, 2),
    EffectiveDate DATETIME,
    ExpiryDate DATETIME
);

-- create product stage table
CREATE TABLE Product_Stage (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(100),
    ProductPrice DECIMAL(10, 2)
);


-- adding dummy data to product table
INSERT INTO Product (ProductID, ProductName, ProductPrice, EffectiveDate)
VALUES 
    (1, 'iPhone 13', 999.99, '2024-01-01'),
    (2, 'Samsung Galaxy S21', 799.99, '2024-01-01'),
    (3, 'Google Pixel 6', 599.99, '2024-01-01');

-- adding dummy data to product stage table
INSERT INTO Product_Stage (ProductID, ProductName, ProductPrice)
VALUES 
    (1, 'iPhone 13', 949.99), 
    (4, 'OnePlus 9', 729.99); 
				
			

So as we have created the tables and also we have added dummy data in the tables, we will now run the SCD query. Currently we have no data in the history table. But as soon as we run our SCD query there will be data in our history table.

Now let’s look at our SCD query

				
					INSERT INTO Product_History([ProductID], [ProductName], [ProductPrice], [EffectiveDate])
SELECT [ProductID], [ProductName], [ProductPrice], [EffectiveDate]
FROM(

MERGE INTO Product AS Dst
    USING
    (SELECT DISTINCT [ProductID], [ProductName], [ProductPrice] FROM Product_Stage) AS Src
    ON Dst.[ProductID] = Src.[ProductID]
    WHEN MATCHED and dst.[ProductName] != Src.[ProductName] OR Dst.[ProductPrice] != Src.[ProductPrice]  THEN
        UPDATE SET 
		Dst.ProductName = Src.ProductName,
        Dst.ProductPrice = Src.ProductPrice
    WHEN NOT MATCHED BY TARGET THEN
        INSERT
        (
		[ProductID], [ProductName], [ProductPrice], [EffectiveDate]
        )
        VALUES
        (Src.[ProductID], Src.[ProductName], Src.[ProductPrice], GETDATE())
    OUTPUT $ACTION,
           Src.[ProductID],
           Src.[ProductName],
           Src.[ProductPrice],
           GETDATE()
         ) AS CHANGES([ACTION],[ProductID], [ProductName], [ProductPrice], [EffectiveDate])
WHERE [ACTION] = 'UPDATE'  OR [ACTION] = 'INSERT';
				
			

This query will first update or insert in the product table. If there is a change on the existing product table then we will update the product otherwise we will insert into the product table. After that we will insert into the history table according to the action type.

Query Insights

Merge Into, this specifies the target table where the merge operation will be performed. It allows us to combine the update and insert operations into a single statement, enhancing readability and performance.
When Matched, by this we compare the source values with destinations. If there are any changes, then we update the changes. 

When Not Matched, we are simply adding new values to our destinations.

End Note

Slowly Changing dimensions (SCD) offers a great way to manage evolving data over time. By implementing needed SCD techniques organizations can ensure data accuracy and relevance, empowering informed decision-making for sustained success in a dynamic world.

The choice of which type of SCD to implement depends on the specific requirements of the organization.

Potential Developer
Tech Stack
0 +
Accelerate Your Software Development Potential with Us
With our innovative solutions and dedicated expertise, success is a guaranteed outcome. Let's accelerate together towards your goals and beyond.
Blogs You May Love

Don’t let understaffing hold you back. Maximize your team’s performance and reach your business goals with the best IT Staff Augmentation