Using Graphviz, Drupal and Google Analytics to Display Keyword Relationships

Recently, I started experimenting with the Google Analytics API. Using it, you can extract data from Google Analytics for whatever processing you might want to do. My first attempt was to access Google Analytics to see who is viewing the most pages on my site when they come from an EntreCard inbox. Yesterday, I went a bit further and used PHP and Graphviz in Drupal to create a graph of the relationships among the keywords used to reach a site.

Here is a graph of the relationship of the most frequently used keywords for Orient Lodge over the past few days:

Graphviz Keywords, originally uploaded by Aldon.

For the geeky details, read on.

User’s Guide

To use the Graphviz Google Analytics Keywords page, you need to have access to some Google Analytics data. When you first enter the page, it asks you to authorize it. This gives my program temporary read access to your Google Analytics data. If you aren’t already logged into Google Analytics, it will ask you to log in. Then it will ask if you want to give my page permission to access your data.

Once authorized, the page first retrieves the list of websites for which you have Google Analytics access and displays them as links. When you click on a link, you are sent to the display page for that website. The link carries a special security token that is only good for a limited time. The display page takes the top ten search phrases from the past seven days and graphs how the keywords in those phrases connect to each other.

Advanced Features

Once you have this graphic, and for as long as the special security token is valid, you can change two of the parameters used to create it. The ‘days’ parameter in the URL tells Google how many days to go back in the search data. The current default is ‘7’. The ‘lines’ parameter sets how many lines of search data (one search phrase per line) to get from Google. The default is ‘10’. The larger this number, the more complicated the graph gets, until eventually you reach a breaking point.
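As a rough illustration of how those two URL parameters might be read, here is a minimal sketch in Python (the page itself is PHP). The function names, the fallback behavior for bad values, and the idea that ‘days’ turns into a start and end date are my assumptions for the example, not a copy of the real code.

```python
from datetime import date, timedelta
from urllib.parse import urlparse, parse_qs

DEFAULT_DAYS = 7    # default stated in the post
DEFAULT_LINES = 10  # default stated in the post


def read_params(url):
    """Return (days, lines) from the URL's query string, using the defaults
    when a parameter is missing, not a number, or not positive."""
    query = parse_qs(urlparse(url).query)

    def positive_int(name, default):
        try:
            value = int(query[name][0])
        except (KeyError, ValueError):
            return default
        return value if value > 0 else default

    return positive_int("days", DEFAULT_DAYS), positive_int("lines", DEFAULT_LINES)


def date_range(days, today=None):
    """Assumed mapping: 'days' means 'from this many days ago until today'."""
    today = today or date.today()
    return today - timedelta(days=days), today
```

With a hypothetical address such as http://example.com/keywords?days=30&lines=25, read_params would return 30 and 25, and leaving both parameters off would fall back to 7 and 10.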

The Guts

So, what is really going on? The PHP code in the Drupal page parses the data that comes back from Google. For each search phrase, it goes through the words and creates a connected pair of nodes for every two adjacent words. As an example, the search string
Using Graphviz Drupal and Google Analytics
would produce five pairs:

Using -- Graphviz
Graphviz -- Drupal
Drupal -- and
and -- Google
Google -- Analytics

This allows other strings to connect on matching words. Right now, I’m stripping out double-quotes. However, I’m leaving in the most common search string, “(not set)”, and I’m not doing anything special for other symbols. I put each word in double-quotes in case the word has something in it that would break Graphviz.
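The pairing and quoting steps described above can be sketched in a few lines. This is a Python illustration rather than the PHP that runs on the page, and the function names and the graph name ‘keywords’ are made up for the example.

```python
def adjacent_pairs(phrase):
    """Split a search phrase into words and pair each word with the next one.
    Double-quotes are stripped first, as described in the post."""
    words = phrase.replace('"', "").split()
    return list(zip(words, words[1:]))


def to_dot(phrases):
    """Build an undirected Graphviz graph with one edge per adjacent word pair.
    Every word is wrapped in double-quotes so that symbols such as the
    parentheses in "(not set)" do not break the DOT syntax."""
    lines = ["graph keywords {"]
    for phrase in phrases:
        for left, right in adjacent_pairs(phrase):
            lines.append(f'  "{left}" -- "{right}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(["Using Graphviz Drupal and Google Analytics", "(not set)"]))
```

Because Graphviz treats identical quoted names as the same node, two phrases that share a word end up joined at that word without any extra bookkeeping. In this sketch a one-word search produces no pairs and so adds nothing to the graph.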

I’ve kicked around using the number of page views to set the width of the links, and I’ve thought about trying other Graphviz layout programs instead of ‘dot’, which is what I’m using right now.
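I haven’t built the page-view idea, but one way it might look is sketched below in Python. It assumes each row from Google is a search phrase plus a page-view count, adds up the views for every word pair, and scales Graphviz’s ‘penwidth’ edge attribute between 1 and a maximum. The function names and the maximum of 6 are choices made up for the illustration.

```python
from collections import Counter


def weighted_edges(rows):
    """rows is an iterable of (phrase, pageviews) tuples (an assumed shape).
    Returns a Counter mapping each adjacent word pair to its total page views."""
    weights = Counter()
    for phrase, pageviews in rows:
        words = phrase.replace('"', "").split()
        for pair in zip(words, words[1:]):
            weights[pair] += pageviews
    return weights


def to_weighted_dot(rows, max_width=6.0):
    """Emit DOT where the busiest edge gets max_width and the rest scale down."""
    weights = weighted_edges(rows)
    heaviest = max(weights.values(), default=0) or 1  # avoid dividing by zero
    lines = ["graph keywords {"]
    for (left, right), views in weights.items():
        width = 1.0 + (max_width - 1.0) * views / heaviest
        lines.append(f'  "{left}" -- "{right}" [penwidth={width:.1f}];')
    lines.append("}")
    return "\n".join(lines)
```

The same DOT text can be handed to the other layout programs that ship with Graphviz, such as ‘neato’, ‘fdp’, ‘circo’ or ‘twopi’, so trying a different layout would not require changing how the graph is generated.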

To pull this all together, I use the Graphviz filter module (graphviz_filter) in Drupal. It takes the Graphviz data that PHP generates on the fly from the Google Analytics data and processes it into a PNG file that gets embedded into the web page.
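Outside of Drupal, the same last step can be approximated by piping the generated graph text through the ‘dot’ command with its ‘-Tpng’ option. The sketch below is in Python and is not how the Drupal filter itself is written; the function names are made up, and it assumes Graphviz is installed and on the PATH.

```python
import shutil
import subprocess


def dot_command(engine="dot", fmt="png"):
    """Command line that reads a graph on stdin and writes the image to stdout."""
    return [engine, f"-T{fmt}"]


def render_png(dot_source, engine="dot"):
    """Return PNG bytes for the given DOT text by running Graphviz."""
    if shutil.which(engine) is None:
        raise RuntimeError(f"{engine} not found on PATH; is Graphviz installed?")
    result = subprocess.run(
        dot_command(engine),
        input=dot_source.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise if Graphviz rejects the graph
    )
    return result.stdout


if __name__ == "__main__":
    png = render_png('graph keywords { "Google" -- "Analytics"; }')
    with open("keywords.png", "wb") as handle:
        handle.write(png)
```

Passing a different engine name, for example ‘neato’, swaps the layout program without touching the graph text.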

If you want a copy of the source code or have other ideas, let me know. I’ll gladly share the code with anyone interested and will consider enhancements if they aren’t too complicated and I have time.