SCD Using MERGE and Table Data Types in SSIS - 1 of 2

Download Source Files: Here

In the first of what hope to be many posts. I’ve taken my first step into really utilising some of the many enhancements delivered in the most recent release of Microsoft SQL Server.

Two of the most eagerly awaited enhancements to come out from Microsoft have been the advent of table data types and the new MERGE syntax.

Table data types now allow SQL developers to pass table variables directly into stored procedures as a parameter, in the past ingenious way had been devised to do this with the use of temp tables or with XML, but table data types now make these process redundant and offer great performance improvements.

For the data warehouse developers out there the MERGE statement now allows INSERT, UPDATE and DELETE statements to be applied to a table in a single pass, simplify the SCD (Slowly Changing Dimension) process.

The Challenge

The challenge I set my self was to populate a table variable with data representing new dimensional records, pass through the variable into a stored procedure that would then use the table variable to compare against the current records in a dimension using the new MERGE syntax and apply the appropriate upserts (updates and inserts). Going beyond that I wanted to be able to call this stored procedure directly within an SSIS package without writing the new records out to a temporary staging table.

Creating the sample database environment

The Customers dimension for a data warehouse is a typical candidate for a type 2 slowly changing dimension. For this example I will use the Customers table in the Northwind database as my source data set. I’ve created my own sample database where I’ve created a DimCustomer table that will be the destination for our data, I will also create the table data type and stored proc on this database.

Lets get started, first create the sample database:

IF DB_ID('Sample') IS NULL

CREATE DATABASE Sample;

GO


Now create the DimCustomer dimension table, this will be the target for our records.



IF OBJECT_ID('DimCustomer',N'U') IS NOT NULL

DROP TABLE DimCustomer;



CREATE TABLE DimCustomer

(

  
CustomerID VARCHAR(5),

  
ContactName VARCHAR(40),

  
ContactTitle VARCHAR(30),

  
Address VARCHAR(60),

  
City VARCHAR(15),

  
Region VARCHAR(15),

  
PostalCode VARCHAR(10),

  
Country VARCHAR(15),

  
Phone VARCHAR(24),

  
Fax VARCHAR(24),

  
CurrentRecord BIT,

  
ValidFromDate DATE,

  
ValidToDate DATE

);

GO


You will notice that this table has a valid from and to date that will be used to track the period that the data is valid, there is also a flag CurrentRecord that will indicate the current record for each customer.



 



Now we need to create the table data type that will be used to pass into our stored procedure



IF TYPE_ID('DimCustomer') IS NOT NULL

DROP TYPE dbo.DimCustomer



CREATE TYPE dbo.DimCustomer AS TABLE

(

  
CustomerID VARCHAR(5),

  
ContactName VARCHAR(40),

  
ContactTitle VARCHAR(30),

  
Address VARCHAR(60),

  
City VARCHAR(15),

  
Region VARCHAR(15),

  
PostalCode VARCHAR(10),

  
Country VARCHAR(15),

  
Phone VARCHAR(24),

  
Fax VARCHAR(24),

  
ValidFromDate DATE

);

GO


Notice that I have left out the CurrentRecord, ValidToDate attributes from the table data type, these fields will be managed within the stored procedure itself.



Now to create the actual stored procedure, and thanks to Warren Thornwaite of the Kimball Group who posted a really great article on using the MERGE statement for SCD, which can be found here:



http://www.kimballgroup.com/html/08dt/KU107_UsingSQL_MERGESlowlyChangingDimension.pdf



IF OBJECT_ID('sp_SCDCustomer',N'P') IS NOT NULL

DROP PROC sp_SCDCustomer

GO



CREATE PROC dbo.sp_SCDCustomer(@NewCustomer AS dbo.DimCustomer READONLY)

AS

BEGIN

INSERT INTO
DimCustomer  --These are the new updated records

  
(       
      
CustomerID,

      
ContactName,

      
ContactTitle,

      
Address,

      
City,

      
Region,

      
PostalCode,

      
Country,

      
Phone,

      
Fax,

      
CurrentRecord,

      
ValidFromDate,

      
ValidToDate

  
)

SELECT

      
CustomerID,

      
ContactName,

      
ContactTitle,

      
Address,

      
City,

      
Region,

      
PostalCode,

      
Country,

      
Phone,

      
Fax,

      
CurrentRecord,

      
ValidFromDate,

      
ValidToDate

FROM

(

  
MERGE DimCustomer AS ExistingCustomer

   USING
@NewCustomer AS NewCustomer

  
ON ExistingCustomer.CustomerID = NewCustomer.CustomerID

  
WHEN NOT MATCHED 
  
THEN --Brand New Customers

      
INSERT 
      
(       
          
CustomerID,

          
ContactName,

          
ContactTitle,

          
Address,

          
City,

          
Region,

          
PostalCode,

          
Country,

          
Phone,

          
Fax,

          
CurrentRecord,

          
ValidFromDate,

          
ValidToDate

      
)

      
VALUES

      
(

          
NewCustomer.CustomerID,

          
NewCustomer.ContactName,

          
NewCustomer.ContactTitle,

          
NewCustomer.Address,

          
NewCustomer.City,

          
NewCustomer.Region,

          
NewCustomer.PostalCode,

          
NewCustomer.Country,

          
NewCustomer.Phone,

          
NewCustomer.Fax,

          
1,                          -- CurentRecord

          
NewCustomer.ValidFromDate-- ValidFromDate

          
'2050-01-01'                -- ValidToDate

      
)

  
WHEN MATCHED -- Expired records

      
AND ExistingCustomer.CurrentRecord =
      
AND ExistingCustomer.CustomerID = NewCustomer.CustomerID

      
AND(    ExistingCustomer.ContactName != NewCustomer.ContactName

          
OR  ExistingCustomer.ContactTitle != NewCustomer.ContactTitle

          
OR  ExistingCustomer.Address != NewCustomer.Address

          
OR  ExistingCustomer.City != NewCustomer.City

          
OR  ExistingCustomer.Region != NewCustomer.Region

          
OR  ExistingCustomer.PostalCode != NewCustomer.PostalCode

          
OR  ExistingCustomer.Country != NewCustomer.Country

          
OR  ExistingCustomer.Phone != NewCustomer.Phone

          
OR  ExistingCustomer.Fax != NewCustomer.Fax   
          

  
THEN

       UPDATE

       SET
CurrentRecord = 0,

          
ValidToDate = CAST(DATEADD(d,-1,NewCustomer.ValidFromDate) AS DATE)

  
OUTPUT --Newly updated records, used by the outer insert statement

      
$Action Action_Out,

      
NewCustomer.CustomerID,

      
NewCustomer.ContactName,

      
NewCustomer.ContactTitle,

      
NewCustomer.Address,

      
NewCustomer.City,

      
NewCustomer.Region,

      
NewCustomer.PostalCode,

      
NewCustomer.Country,

      
NewCustomer.Phone,

      
NewCustomer.Fax,

      
1 AS CurrentRecord,

      
NewCustomer.ValidFromDate AS ValidFromDate,

      
CAST('2050-01-01' AS DATE) AS ValidToDate

) AS MERGE_OUT

WHERE MERGE_OUT.Action_Out = 'UPDATE';

END


Now you have created all the plumbing on the SQL Server, we can turn our attention to building the SSIS package that will call this.



Creating the SSIS Package



First we are going to have to create to connections, one to the northwind database called Northwind.OLEDB and the other to the Sample database that we just created called Sample.NET. Its important you use .Net Provider for this second connection otherwise the script task will fail later.



clip_image002



Now that we’ve created the connection we need to declare two package level variable



1) objTable an OBJECT data type that will store our transformed record from the Northwind database,



2) dtDate of DATETIME data type, which wil lstore a data that represent the run date of the EL process.  This will be used as the validFromDate on our transformed records.



clip_image004



Drag a dataflow component and a script component onto the design surface.



clip_image006



In the Data flow we are going to add an OLE DB Source that will pull the data from the Northwind database. A derived column that will use the dtDate variable to assign a ValidFromDate and a Recordset Destination that will use the ojbTable variable to hold all the data to be passed into the stored procedure.



clip_image008



In the OLE DB Source use the following SQL to select data from the Northwind database:



SELECT  
   Customers.CustomerID,

  
Customers.ContactName,

  
Customers.ContactTitle,

  
Customers.Address,

  
Customers.City,

  
Customers.Region,

  
Customers.PostalCode,

  
Customers.Country,

  
Customers.Phone,

  
Customers.Fax

FROM  
  
Customers WITH (NOLOCK)


clip_image010



In the Derived Column component drag the dtDate variable into the expression column, ensure to rename the column ValidFromDate



clip_image012



Finally in the RecordSet Destination select all the columns that are available in the pipeline.



clip_image014



Now go back to the Control flow and the Script Task, ensure that you map the objTable variable as a ReadOnlyVaraible. Now enter into the script editor.



clip_image016



In the sub main past the following code. Having done this the package is now ready to run.



Public Sub Main()  


  
Dim conn As New SqlConnection 
  
Dim cmd As New SqlCommand 
  
Dim dt As New DataTable 
  
Dim ad As New Data.OleDb.OleDbDataAdapter 
   
  
'Ensure this is an ADO.NET connection 


  
conn = DirectCast(Dts.Connections("Sample.NET").AcquireConnection(Dts.Transaction), SqlClient.SqlConnection
  
ad.Fill(dt, Dts.Variables("User::objTable").Value)



  
cmd = conn.CreateCommand() 
  
cmd.CommandType = System.Data.CommandType.StoredProcedure 
   cmd.CommandText
= "dbo.sp_SCDCustomer" 
  
cmd.Parameters.AddWithValue("@NewCustomer", dt
  
cmd.ExecuteNonQuery()



  
Dts.TaskResult = ScriptResults.Success 


End Sub


Execute the package, you should now see all the data populated in the DimCustomer table.



Make some changes to the data back in the Northwind database and increment the value in the dtDate variable back in the SSIS package, and now re-run the package. You should find that the changed records are inserted into the DimCustomer table and the old records ValidToDate and CurrentRecord flag have been updated.



In a future post I’m going to show you how to do the upsert directly in the data flow, without using the recordset object.

Comments

Post a Comment

Popular posts from this blog

SSIS - Google Spreadsheets Data Source

Exam 70-443 – Overview