Quick Tip 002 – High Performance Running Totals
Download Source Files: Here
In this post I will illustrate a high performance method for calculating running total by traversing a table index.
I’ve seen many example of people using sub queries and CROSS JOIN’s in order to accomplish the same thing, the only problem with these is the overhead of joining so many rows together. In this example I’ll illustrate how to traverse an index and create a running total at the same time.
In this example we are going to use the AdventureWorksDW2008. I am going to create a table the summaries the sales from the FctInternestSales table and then apply thee running total
First create the table that we will use to apply the running total, note that I've created a clustered index on this table, this is very important as later when we apply the running total we’ll us the index to ensure that the running total is created in the correct order.
USE AdventureWorksDW2008
GO
IF OBJECT_ID('RunningTotal',N'U') IS NOT NULL
DROP TABLE RunningTotal;
CREATE TABLE RunningTotal
(
CustomerKey INT NOT NULL,
OrderDateKey INT NOT NULL,
SalesAmount money NOT NULL,
RunningTotal money NULL
)
CREATE CLUSTERED INDEX idx1 ON RunningTotal (CustomerKey,OrderDateKey);
GO
Next populate the table with summaries data from the FactInternestSales table. Because a customer can have multiple orders on a day with multiple product we need to aggregate the data up to a single amount for the day.
INSERT INTO RunningTotal
(
CustomerKey,
OrderDateKey,
SalesAmount
)
SELECT
CustomerKey,
OrderDateKey,
SUM(SalesAmount) AS SalesAmount
FROM
FactInternetSales
GROUP BY
CustomerKey,
OrderDateKey
Now for the cool bit. T-SQL supports a method by which you can assign a value to a field and update a variable at the same time.
The calculation of the equation is calculated from right to left, so first we identify in the case statement if the customerKey on this record is the same as the customerKey on the previous record. If they are the same then we take the current @RunningTotal value and add the SalesAmount to this. This is then updated into the RunningTotal field on the table and then this intern updated the @RunningTotal variable with the new total, ready to be used in the next itertio of the update. If the CustomerKeys are not the same then we know this is a new customers RunningTotal and so the first entry will be equal to the SalesAmount on that day.
Notice that we also update @PrevCustomerKey which will be used for comparison in the next iteration of the update.
The real trick here is to force the query to use the index, this way you ensure that the records are read and updated in the correct sequence.
DECLARE @RunningTotal AS money
DECLARE @PrevCustomerKey AS INT
SET @RunningTotal = 0
SET @PrevCustomerKey = 0
UPDATE RunningTotal
SET @RunningTotal = RunningTotal = CASE
WHEN (CustomerKey = @PrevCustomerKey)
THEN @RunningTotal + SalesAmount
ELSE SalesAmount
END,
@PrevCustomerKey = CustomerKey
FROM
RunningTotal WITH (INDEX(idx1),TABLOCKX);
SELECT * FROM RunningTotal WITH (INDEX(idx1),TABLOCKX)
Comments
Post a Comment