SCD Using MERGE and Table Data Types in SSIS - 2 of 2

Download Source Files: Here

As promised in my first post, I’ve now put together an new example of how to use the the SQL MERGE statement, along with new TABLE Data Type to upsert a dimension table in a data warehouse completely within data flow.  In the previous example I used the control flow along with and recordset object to illustrate how to pass the data into a stored proc, however the call to the stored procedure was in the control flow. In this post I will use a script destination component to call the stored proc.

Creating the sample database environment

The Customers dimension for a data warehouse is a typical candidate for a type 2 slowly changing dimension. For this example I will use the Customers table in the Northwind database as my source data set. I’ve created my own sample database where I’ve created a DimCustomer table that will be the destination for our data, I will also create the table data type and stored proc on this database.

Lets get started, first create the sample database:

IF DB_ID('Sample') IS NULL

CREATE DATABASE Sample;

GO


Now create the DimCustomer dimension table, this will be the target for our records.



IF OBJECT_ID('DimCustomer',N'U') IS NOT NULL

DROP TABLE DimCustomer;



CREATE TABLE DimCustomer

(

  
CustomerID VARCHAR(5),

  
ContactName VARCHAR(40),

  
ContactTitle VARCHAR(30),

  
Address VARCHAR(60),

  
City VARCHAR(15),

  
Region VARCHAR(15),

  
PostalCode VARCHAR(10),

  
Country VARCHAR(15),

  
Phone VARCHAR(24),

  
Fax VARCHAR(24),

  
CurrentRecord BIT,

  
ValidFromDate DATE,

  
ValidToDate DATE

);

GO


You will notice that this table has a valid from and to date that will be used to track the period that the data is valid, there is also a flag CurrentRecord that will indicate the current record for each customer.



 



Now we need to create the table data type that will be used to pass into our stored procedure



IF TYPE_ID('DimCustomer') IS NOT NULL

DROP TYPE dbo.DimCustomer



CREATE TYPE dbo.DimCustomer AS TABLE

(

  
CustomerID VARCHAR(5),

  
ContactName VARCHAR(40),

  
ContactTitle VARCHAR(30),

  
Address VARCHAR(60),

  
City VARCHAR(15),

  
Region VARCHAR(15),

  
PostalCode VARCHAR(10),

  
Country VARCHAR(15),

  
Phone VARCHAR(24),

  
Fax VARCHAR(24),

  
ValidFromDate DATE

);

GO


Notice that I have left out the CurrentRecord, ValidToDate attributes from the table data type, these fields will be managed within the stored procedure itself.



Now to create the actual stored procedure, and thanks to Warren Thornwaite of the Kimball Group who posted a really great article on using the MERGE statement for SCD, which can be found here:



http://www.kimballgroup.com/html/08dt/KU107_UsingSQL_MERGESlowlyChangingDimension.pdf



IF OBJECT_ID('sp_SCDCustomer',N'P') IS NOT NULL

DROP PROC sp_SCDCustomer

GO



CREATE PROC dbo.sp_SCDCustomer(@NewCustomer AS dbo.DimCustomer READONLY)

AS

BEGIN

INSERT INTO
DimCustomer  --These are the new updated records

  
(       
      
CustomerID,

      
ContactName,

      
ContactTitle,

      
Address,

      
City,

      
Region,

      
PostalCode,

      
Country,

      
Phone,

      
Fax,

      
CurrentRecord,

      
ValidFromDate,

      
ValidToDate

  
)

SELECT

      
CustomerID,

      
ContactName,

      
ContactTitle,

      
Address,

      
City,

      
Region,

      
PostalCode,

      
Country,

      
Phone,

      
Fax,

      
CurrentRecord,

      
ValidFromDate,

      
ValidToDate

FROM

(

  
MERGE DimCustomer AS ExistingCustomer

   USING
@NewCustomer AS NewCustomer

  
ON ExistingCustomer.CustomerID = NewCustomer.CustomerID

  
WHEN NOT MATCHED 
  
THEN --Brand New Customers

      
INSERT 
      
(       
          
CustomerID,

          
ContactName,

          
ContactTitle,

          
Address,

          
City,

          
Region,

          
PostalCode,

          
Country,

          
Phone,

          
Fax,

          
CurrentRecord,

          
ValidFromDate,

          
ValidToDate

      
)

      
VALUES

      
(

          
NewCustomer.CustomerID,

          
NewCustomer.ContactName,

          
NewCustomer.ContactTitle,

          
NewCustomer.Address,

          
NewCustomer.City,

          
NewCustomer.Region,

          
NewCustomer.PostalCode,

          
NewCustomer.Country,

          
NewCustomer.Phone,

          
NewCustomer.Fax,

          
1,                          -- CurentRecord

          
NewCustomer.ValidFromDate-- ValidFromDate

          
'2050-01-01'                -- ValidToDate

      
)

  
WHEN MATCHED -- Expired records

      
AND ExistingCustomer.CurrentRecord =
      
AND ExistingCustomer.CustomerID = NewCustomer.CustomerID

      
AND(    ExistingCustomer.ContactName != NewCustomer.ContactName

          
OR  ExistingCustomer.ContactTitle != NewCustomer.ContactTitle

          
OR  ExistingCustomer.Address != NewCustomer.Address

          
OR  ExistingCustomer.City != NewCustomer.City

          
OR  ExistingCustomer.Region != NewCustomer.Region

          
OR  ExistingCustomer.PostalCode != NewCustomer.PostalCode

          
OR  ExistingCustomer.Country != NewCustomer.Country

          
OR  ExistingCustomer.Phone != NewCustomer.Phone

          
OR  ExistingCustomer.Fax != NewCustomer.Fax   
          

  
THEN

       UPDATE

       SET
CurrentRecord = 0,

          
ValidToDate = CAST(DATEADD(d,-1,NewCustomer.ValidFromDate) AS DATE)

  
OUTPUT --Newly updated records, used by the outer insert statement

      
$Action Action_Out,

      
NewCustomer.CustomerID,

      
NewCustomer.ContactName,

      
NewCustomer.ContactTitle,

      
NewCustomer.Address,

      
NewCustomer.City,

      
NewCustomer.Region,

      
NewCustomer.PostalCode,

      
NewCustomer.Country,

      
NewCustomer.Phone,

      
NewCustomer.Fax,

      
1 AS CurrentRecord,

      
NewCustomer.ValidFromDate AS ValidFromDate,

      
CAST('2050-01-01' AS DATE) AS ValidToDate

) AS MERGE_OUT

WHERE MERGE_OUT.Action_Out = 'UPDATE';

END


Now you have created all the plumbing on the SQL Server, we can turn our attention to building the SSIS package that will call this.



Creating the SSIS Package



First we are going to have to create two connections, one to the Northwind database called Northwind.OLEDB and the other to the Sample database that we just created called Sample.NET.   You need to ensure that the Sample.NET connection string uses the .Net Providers\SqlClient Data Provider, as we will be using this connection string inside a customer script component.



image



image



Now that we’ve created the connections we need to declare one package level variable dtDate of DATETIME data type, which will store a date that represent the run date of the ETL process.  This will be used as the validFromDate on our transformed records.



image



Drag a dataflow component onto the design surface.  Rename this DFT – Populate sp_SCDCustomer



image



In the Data flow we are going to add an OLE DB Source that will pull the data from the Northwind database. A derived column that will use the dtDate variable to assign a ValidFromDate and a Script Component Destination which will call the stored procedure with the help of some custom scripting.



image



In the OLE DB Source use the following SQL to select data from the Northwind database:



SELECT  
   Customers.CustomerID,

  
Customers.ContactName,

  
Customers.ContactTitle,

  
Customers.Address,

  
Customers.City,

  
Customers.Region,

  
Customers.PostalCode,

  
Customers.Country,

  
Customers.Phone,

  
Customers.Fax

FROM  
  
Customers WITH (NOLOCK)


image



In the Derived Column component drag the dtDate variable into the expression column, ensure to rename the column ValidFromDate



image



Finally add the Script Component, select Destination.



image



Change the scripting language to VB.



image



Select all the available input columns



image



Add a connection manager to the component, call the connection “Sample” and associate it with the Sample.NET connection that you created earlier.



image



Go back to the Script tab, select Edit Script.  Ensure that you have chosen VB as your scripting language.



image



Paste the following code into the script editing window:



Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

   
Dim dt As New DataTable

   
Public Overrides Sub PreExecute()
       
MyBase.PreExecute()

       
With dt
            .Columns.Add
("CustomerID", Type.GetType("System.String"))
           
.Columns.Add("ContactName", Type.GetType("System.String"))
           
.Columns.Add("ContactTitle", Type.GetType("System.String"))
           
.Columns.Add("Address", Type.GetType("System.String"))
           
.Columns.Add("City", Type.GetType("System.String"))
           
.Columns.Add("Region", Type.GetType("System.String"))
           
.Columns.Add("PostalCode", Type.GetType("System.String"))
           
.Columns.Add("Country", Type.GetType("System.String"))
           
.Columns.Add("Phone", Type.GetType("System.String"))
           
.Columns.Add("Fax", Type.GetType("System.String"))
           
.Columns.Add("ValidFromDate", Type.GetType("System.DateTime"))
       
End With

    End Sub

    Public
Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

       
Dim xRow As DataRow

       
Do
            With
dt
                xRow
= .NewRow
                xRow
("Address") = Row.Address
                xRow
("City") = Row.City
                xRow
("ContactName") = Row.ContactName
                xRow
("ContactTitle") = Row.ContactTitle
                xRow
("Country") = Row.Country
                xRow
("CustomerID") = Row.CustomerID
                xRow
("Fax") = Row.Fax
                xRow
("PostalCode") = Row.PostalCode
                xRow
("Region") = Row.Region
                xRow
("ValidFromDate") = Row.ValidFromDate
                .Rows.Add
(xRow)
           
End With
        Loop While
(Row.NextRow = True)

   
End Sub

    Public
Overrides Sub PostExecute()
       
MyBase.PostExecute()

       
Dim con As New SqlConnection
       
Dim cmd As New SqlCommand
       
Dim ad As New Data.OleDb.OleDbDataAdapter

       
'Ensure this is an ADO.NET connection
       
con = DirectCast(Connections.Sample.AcquireConnection(Nothing), SqlClient.SqlConnection)
       
cmd = con.CreateCommand()
       
cmd.CommandType = System.Data.CommandType.StoredProcedure
        cmd.CommandText
= "dbo.sp_SCDCustomer"
       
cmd.Parameters.AddWithValue("@NewCustomer", dt)
       
cmd.ExecuteNonQuery()

   
End Sub
End Class


Some important point to note about the code




  1. Adds references to System.Data.SqlClient


  2. In the pre execute event we create a Data Table variable that has the same schema as the SQL Table Data Type.  Its important that the columns are declared in the same order as those in the SQL Data Type and that the columns have the correct data types.


  3. In the  Input0_ProcessInputRow event, for each row in the buffer we copy the data from the buffer to a DataRow object and add this to the Data Table variable.


  4. After all the rows in all the buffers have been copied the PostExecute event will fire.  In this event we use the sample connection to create a SqlConnection  object and call the sp_SCDCustomer stored proc passing in the Data Table as the table data type.



You should now be able to run the package.



I hope you found this and the previous article useful. Please drop me a line if you have any suggestions \ improvements.

Comments

  1. Thanx for the great posts, I have never used the Script Component Destination before, is there a great performance benefit from using this instead of the Recordset Destination? The Script Component looks like more mapping.

    thnx again, and great post.

    ReplyDelete
  2. Error 1 'Sample' is not a member of 'SC_3bb4f72cd9c540b399ee23e2bfe697d0.vbproj.Connections'. C:\Users\rc203b\AppData\Local\Temp\SSIS\7f2105d2aa4542bcb750393f26b10489\main.vb 69 26 sc_3bb4f72cd9c540b399ee23e2bfe697d0

    I'ver tried this in two environments!

    Thanks for this script by the way. I can really use this in my SSIS packges if I can get the connection working.

    ReplyDelete
  3. Just FYI, I did use an ADO.NET connector!

    ReplyDelete
  4. I am not sure about whether anybody here is interested for a third-party product but I genuinely think you should look for SSIS Upsert destination which is used to merge/synchronize data from source to target.

    ReplyDelete
  5. Thank you so much for providing information about SSIS and other such aspects of IT which helps in solving many complex IT problems.
    SSIS Upsert

    ReplyDelete

Post a Comment

Popular posts from this blog

SSIS - Google Spreadsheets Data Source

Exam 70-443 – Overview