SSIS - Google Spreadsheets Data Source

Download Source Files: Here

In this blog entry I’m going to show you how to access data from a Google spreadsheet in SSIS using the Google Data API.

I’m really into  Google Spreadsheets, especially the newest release which allows for a lot of collaboration.  One of the great things about Google spreadsheets is the ability to create forms that you can Email to other or even embed in your blog.

However for me there is one limitation, how to get your data out of those spreadsheets and into other applications.  Well in true Google fashion they’ve built an API that lets you access and manipulate Google Spreadsheets. 

Installing the Google API

Firstly you need to download and install the Google Data API SDK

http://code.google.com/p/google-gdata/downloads/list

Once downloaded, run the Google Data API Setup(1.4.0.2).msi, this will install the dll’s  in C:\Program Files\Google\Google Data API SDK\Redist you need to register the following dll’s in the GAC (C:\Windows\assembly).

Google.GData.Client

Google.GData.Extensions

Google.GData.Spreadsheets

You can just drag and drop them from :\Program Files\Google\Google Data API SDK\Redist  into C:\Windows\assembly.  However I had a few issues when I tried to do this.  I’m using windows vista and had User Control Access switched on, so I switched this off, restarted, and after that had no problems.  There is an excellent article about this here:

Now that we’ve installed and registered everything, we can get on with accessing our data.

Creating the spreadsheet

For this example I created a simple Google spreadsheet from a form.  There is an excellent tutorial on YouTube.  The final spreadsheet I created had a TimeStamp, FirstName, LastName and Age field.

image

An important piece of information that you need to remember is the spreadsheet key, you can get this from the URL, you’ll need this in order to access the spreadsheet later.

image

Once the spreadsheet is created and populated with some data then its time for us to build a custom source component to access this data.

Building the source component

Open BIDS and start a new project, rename the package Google.dtsx. and add a Data Flow Task. 

image

Add the following variables to your package:

strUserName

Your Google account user name

strPassword

Your Google account password

strKey

The spreadsheet key

image 

Inside the Data Flow Task add a script component, choose Source as the component type.

image

Inside the script component we are going to setup the columns that are going to be output.  Go to the Inputs and Outputs tab, Select the Output node in the treeview control and click Add Column.

image 

Add the following columns to the output:

Field

DataType

TimeStamp

DT_DBTIMESTAMP

FirstName

DT_STR(50)

LastName

DT_STR(50)

Age

DT_14

Now you can go back to the Script tab Change the Script Language to Microsoft Visual Basic 2008, On the ReadOnlyVariables property select the variable that we just setup.

image 

Your script component should now look something like this:

image

With all this in place we are now ready to edit the script component.  Click Edit Script, and paste the following code:

Imports System 
Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports Google.GData.Client

Imports Google.GData.Extensions

Imports Google.GData.Spreadsheets

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _

<CLSCompliant(False)> _

Public Class ScriptMain

    Inherits UserComponent

   
Dim objListFeed As ListFeed

   
Public Overrides Sub PreExecute()

       
MyBase.PreExecute()


       
Dim objService As SpreadsheetsService
       
Dim objWorkSheetQuery As WorksheetQuery

       
Dim objWorkSheetFeed As WorksheetFeed
       
Dim objWorkSheet As WorksheetEntry

       
Dim objListFeedLink As AtomLink
       
Dim objListQuery As ListQuery

       
Dim bt(0) As Byte

       
'Create a connection to the google account

       
objService = New SpreadsheetsService("exampleCo-exampleApp-1")

       
Me.Log(Variables.strPassword, 0, bt)

       
Me.Log(Variables.strUserName, 0, bt)

       
objService.setUserCredentials(Variables.strUserName, Variables.strPassword)

       
Me.Log("Service: " + Variables.strUserName, 0, bt)


       
'Connect to a specific spreadsheet

       
objWorkSheetQuery = New WorksheetQuery(Variables.strKey, "private", "full")

       
objWorkSheetFeed = objService.Query(objWorkSheetQuery)

       
objWorkSheet = objWorkSheetFeed.Entries(0)

       
Me.Log("Spreadsheet: " + objWorkSheet.Title.Text.ToString, 0, bt)

       
'Get a list feed of all the rows in the spreadsheet

       
objListFeedLink = objWorkSheet.Links.FindService(GDataSpreadsheetsNameTable.ListRel, Nothing)

       
objListQuery = New ListQuery(objListFeedLink.HRef.ToString())     objListFeed = objService.Query(objListQuery)

       
Me.Log("ListFeed: " + objListFeed.Feed.ToString, 0, bt)

   
End Sub

    Public
Overrides Sub PostExecute()

       
MyBase.PostExecute() 

   
End Sub


    Public
Overrides Sub CreateNewOutputRows() 

       
Dim objRow As ListEntry

       
For Each objRow In objListFeed.Entries

           
With Output0Buffer

                .AddRow
()

               
.TimeStamp = objRow.Elements.Item(0).Value

                .FirstName
= objRow.Elements.Item(1).Value

                .LastName
= objRow.Elements.Item(2).Value

                .Age
= objRow.Elements.Item(3).Value

           
End With

        Next


       
Output0Buffer.EndOfRowset()

   
End Sub

End
Class


Code Exaplined:



The mainstay of the work is done in the PreExecute method.



First I created a service object that uses the username and password to connect to a Google account.



Then we connect to a specific spreadsheet, using the spreadsheet key



Once connected to the spreadsheet, we create a list feed of all the rows in the spreadsheet.



The list feed then gets used in the CreateNewOutPutRows method, we iterate across all the rows in the feed and pass each row out as a new buffer from the component.



For more information about the Google API you can go to the following place:



http://code.google.com/apis/spreadsheets/data/2.0/developers_guide_dotnet.html#RetrievingListFeeds

Comments

  1. This post was a MASSIVE help. Karma points in any way possible to you! :)

    ReplyDelete
  2. Thanks for posting this! Any idea about updating the Google Spreadsheet with results?

    ReplyDelete
  3. Thanks for the article. I'm getting NameSpace warnings with:
    Imports Google.GData.Client
    Imports Google.GData.Extensions
    Imports Google.GData.Spreadsheets

    and then undefined errors.

    ReplyDelete
  4. Oops, never mind, I forgot to reference the dlls. working great now, thanks.

    ReplyDelete
  5. Wonderful article, thanks!

    When interactively testing my SSIS package, it seems to stick in the processing phase (Data Flow task yellow) and never finishes.

    Is there anything special i have to do on the Google spreadsheet end to ensure access to the doc besides sharing it (can edit) with the user ID I'm using in the SSIS package?

    D

    ReplyDelete
  6. Super helpful but i need to be able to connect to specific spreadsheet in the Google doc. How would i do that?

    ReplyDelete
  7. Thanks for this - nicely succinct and useful article.

    @Allan - for reference, the snippet you'd need to work with is around the objWorkSheetFeed object - Entries(0) will be the first tab, Entries(1) will be the second etc. Check the documentation link for details (specifically - "Working with worksheets")

    ReplyDelete
  8. Hi, I have used this and it helped me alot, but I have come a cross problem. My script component is only reading 100 records. How do I change that?

    Thank you

    ReplyDelete
  9. Have tried this and process hans on Pre-Execute -50 percent complete

    ReplyDelete
  10. Could someone please advise as the Data Flow Task shows status as Pre-Execute -50 percent complete.

    ReplyDelete
  11. [Script Component [23]] Error: Google.GData.Client.GDataRequestException: Execution of authentication request returned unexpected result: 404
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PreExecute()
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)

    ReplyDelete
  12. am also getting same error can you pls hekp us

    [Script Component [23]] Error: Google.GData.Client.GDataRequestException: Execution of authentication request returned unexpected result: 404
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PreExecute()
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)

    ReplyDelete

Post a Comment

Popular posts from this blog

Exam 70-443 – Overview