Quick Tip 002 – High Performance Running Totals

Download Source Files: Here

In this post I will illustrate a high performance method for calculating running total by traversing a table index.

I’ve seen many example of people using sub queries and CROSS JOIN’s in order to accomplish the same thing, the only problem with these is the overhead of joining so many rows together.  In this example I’ll illustrate how to traverse an index and create a running total at the same time.

In this example we are going to use the AdventureWorksDW2008.  I am going to create a table the summaries the sales from the FctInternestSales table and then apply thee running total

First create the table that we will use to apply the running total, note that I've created a clustered index on this table, this is very important as later when we apply the running total we’ll us the index to ensure that the running total is created in the correct order.

USE AdventureWorksDW2008

GO



IF OBJECT_ID('RunningTotal',N'U') IS NOT NULL

DROP TABLE RunningTotal;



CREATE TABLE RunningTotal

(

  
CustomerKey INT NOT NULL,

  
OrderDateKey INT NOT NULL,

  
SalesAmount money NOT NULL,

  
RunningTotal money NULL

)

 
CREATE CLUSTERED INDEX idx1 ON RunningTotal (CustomerKey,OrderDateKey);

GO


Next populate the table with summaries data from the FactInternestSales table.  Because a customer can have multiple orders on a day with multiple product we need to aggregate the data up to a single amount for the day.



INSERT INTO RunningTotal

(  
  
CustomerKey,

  
OrderDateKey,

  
SalesAmount

)

SELECT

  
CustomerKey,

  
OrderDateKey,

  
SUM(SalesAmount) AS SalesAmount

FROM

  
FactInternetSales

GROUP BY

  
CustomerKey,

  
OrderDateKey


Now for the cool bit.  T-SQL supports a method by which you can assign a value to a field and update a variable at the same time.



The calculation of the equation is calculated from right to left, so first we identify in the case statement if the customerKey on this record is the same as the customerKey on the previous record.  If they are the same then we take the current @RunningTotal value and add the SalesAmount to this.  This is then updated into the RunningTotal field on the table and then this intern updated the @RunningTotal variable with the new total, ready to be used in the next itertio of the update.  If the CustomerKeys are not the same then we know this is a new customers RunningTotal and so the first entry will be equal to the SalesAmount on that day.



Notice that we also update @PrevCustomerKey which will be used for comparison in the next iteration of the update.



The real trick here is to force the query to use the index, this way you ensure that the records are read and updated in the correct sequence.



DECLARE @RunningTotal AS money

DECLARE @PrevCustomerKey AS INT



SET
@RunningTotal = 0

SET @PrevCustomerKey =


UPDATE RunningTotal 
SET @RunningTotal = RunningTotal = CASE 
                                  
WHEN (CustomerKey = @PrevCustomerKey
                                  
THEN @RunningTotal + SalesAmount

                                  
ELSE SalesAmount

                                  
END
   
@PrevCustomerKey = CustomerKey

FROM 
  
RunningTotal WITH (INDEX(idx1),TABLOCKX);      


SELECT * FROM RunningTotal WITH (INDEX(idx1),TABLOCKX)

Comments

Popular posts from this blog

SSIS - Google Spreadsheets Data Source

Exam 70-443 – Overview