SSIS - Google Spreadsheets Data Source
Download Source Files: Here
In this blog entry I’m going to show you how to access data from a Google spreadsheet in SSIS using the Google Data API.
I’m really into Google Spreadsheets, especially the newest release which allows for a lot of collaboration. One of the great things about Google spreadsheets is the ability to create forms that you can Email to other or even embed in your blog.
However for me there is one limitation, how to get your data out of those spreadsheets and into other applications. Well in true Google fashion they’ve built an API that lets you access and manipulate Google Spreadsheets.
Installing the Google API
Firstly you need to download and install the Google Data API SDK
http://code.google.com/p/google-gdata/downloads/list
Once downloaded, run the Google Data API Setup(1.4.0.2).msi, this will install the dll’s in C:\Program Files\Google\Google Data API SDK\Redist you need to register the following dll’s in the GAC (C:\Windows\assembly).
Google.GData.Client
Google.GData.Extensions
Google.GData.Spreadsheets
You can just drag and drop them from :\Program Files\Google\Google Data API SDK\Redist into C:\Windows\assembly. However I had a few issues when I tried to do this. I’m using windows vista and had User Control Access switched on, so I switched this off, restarted, and after that had no problems. There is an excellent article about this here:
Now that we’ve installed and registered everything, we can get on with accessing our data.
Creating the spreadsheet
For this example I created a simple Google spreadsheet from a form. There is an excellent tutorial on YouTube. The final spreadsheet I created had a TimeStamp, FirstName, LastName and Age field.
An important piece of information that you need to remember is the spreadsheet key, you can get this from the URL, you’ll need this in order to access the spreadsheet later.
Once the spreadsheet is created and populated with some data then its time for us to build a custom source component to access this data.
Building the source component
Open BIDS and start a new project, rename the package Google.dtsx. and add a Data Flow Task.
Add the following variables to your package:
strUserName | Your Google account user name |
strPassword | Your Google account password |
strKey | The spreadsheet key |
Inside the Data Flow Task add a script component, choose Source as the component type.
Inside the script component we are going to setup the columns that are going to be output. Go to the Inputs and Outputs tab, Select the Output node in the treeview control and click Add Column.
Add the following columns to the output:
Field | DataType |
TimeStamp | DT_DBTIMESTAMP |
FirstName | DT_STR(50) |
LastName | DT_STR(50) |
Age | DT_14 |
Now you can go back to the Script tab Change the Script Language to Microsoft Visual Basic 2008, On the ReadOnlyVariables property select the variable that we just setup.
Your script component should now look something like this:
With all this in place we are now ready to edit the script component. Click Edit Script, and paste the following code:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports Google.GData.Client
Imports Google.GData.Extensions
Imports Google.GData.Spreadsheets
<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
Inherits UserComponent
Dim objListFeed As ListFeed
Public Overrides Sub PreExecute()
MyBase.PreExecute()
Dim objService As SpreadsheetsService
Dim objWorkSheetQuery As WorksheetQuery
Dim objWorkSheetFeed As WorksheetFeed
Dim objWorkSheet As WorksheetEntry
Dim objListFeedLink As AtomLink
Dim objListQuery As ListQuery
Dim bt(0) As Byte
'Create a connection to the google account
objService = New SpreadsheetsService("exampleCo-exampleApp-1")
Me.Log(Variables.strPassword, 0, bt)
Me.Log(Variables.strUserName, 0, bt)
objService.setUserCredentials(Variables.strUserName, Variables.strPassword)
Me.Log("Service: " + Variables.strUserName, 0, bt)
'Connect to a specific spreadsheet
objWorkSheetQuery = New WorksheetQuery(Variables.strKey, "private", "full")
objWorkSheetFeed = objService.Query(objWorkSheetQuery)
objWorkSheet = objWorkSheetFeed.Entries(0)
Me.Log("Spreadsheet: " + objWorkSheet.Title.Text.ToString, 0, bt)
'Get a list feed of all the rows in the spreadsheet
objListFeedLink = objWorkSheet.Links.FindService(GDataSpreadsheetsNameTable.ListRel, Nothing)
objListQuery = New ListQuery(objListFeedLink.HRef.ToString()) objListFeed = objService.Query(objListQuery)
Me.Log("ListFeed: " + objListFeed.Feed.ToString, 0, bt)
End Sub
Public Overrides Sub PostExecute()
MyBase.PostExecute()
End Sub
Public Overrides Sub CreateNewOutputRows()
Dim objRow As ListEntry
For Each objRow In objListFeed.Entries
With Output0Buffer
.AddRow()
.TimeStamp = objRow.Elements.Item(0).Value
.FirstName = objRow.Elements.Item(1).Value
.LastName = objRow.Elements.Item(2).Value
.Age = objRow.Elements.Item(3).Value
End With
Next
Output0Buffer.EndOfRowset()
End Sub
End Class
Code Exaplined:
The mainstay of the work is done in the PreExecute method.
First I created a service object that uses the username and password to connect to a Google account.
Then we connect to a specific spreadsheet, using the spreadsheet key
Once connected to the spreadsheet, we create a list feed of all the rows in the spreadsheet.
The list feed then gets used in the CreateNewOutPutRows method, we iterate across all the rows in the feed and pass each row out as a new buffer from the component.
For more information about the Google API you can go to the following place:
http://code.google.com/apis/spreadsheets/data/2.0/developers_guide_dotnet.html#RetrievingListFeeds
This post was a MASSIVE help. Karma points in any way possible to you! :)
ReplyDeleteThanks for posting this! Any idea about updating the Google Spreadsheet with results?
ReplyDeleteThanks for the article. I'm getting NameSpace warnings with:
ReplyDeleteImports Google.GData.Client
Imports Google.GData.Extensions
Imports Google.GData.Spreadsheets
and then undefined errors.
Oops, never mind, I forgot to reference the dlls. working great now, thanks.
ReplyDeleteWonderful article, thanks!
ReplyDeleteWhen interactively testing my SSIS package, it seems to stick in the processing phase (Data Flow task yellow) and never finishes.
Is there anything special i have to do on the Google spreadsheet end to ensure access to the doc besides sharing it (can edit) with the user ID I'm using in the SSIS package?
D
Super helpful but i need to be able to connect to specific spreadsheet in the Google doc. How would i do that?
ReplyDeleteThanks for this - nicely succinct and useful article.
ReplyDelete@Allan - for reference, the snippet you'd need to work with is around the objWorkSheetFeed object - Entries(0) will be the first tab, Entries(1) will be the second etc. Check the documentation link for details (specifically - "Working with worksheets")
Hi, I have used this and it helped me alot, but I have come a cross problem. My script component is only reading 100 records. How do I change that?
ReplyDeleteThank you
Have tried this and process hans on Pre-Execute -50 percent complete
ReplyDeleteCould someone please advise as the Data Flow Task shows status as Pre-Execute -50 percent complete.
ReplyDelete[Script Component [23]] Error: Google.GData.Client.GDataRequestException: Execution of authentication request returned unexpected result: 404
ReplyDeleteat Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PreExecute()
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)
am also getting same error can you pls hekp us
ReplyDelete[Script Component [23]] Error: Google.GData.Client.GDataRequestException: Execution of authentication request returned unexpected result: 404
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PreExecute()
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)