Blog Log Analysis
I've been keeping a blog for about two months now, and I thought it would be an interesting
exercise to do some analysis of the logs. The blogging application that this site
uses (BlogX)
records the daily hits each blog gets into a tab-delimited file, so I used Data Transformation
Services to clean the data up a bit and import it into SQL Server, and then finally
used Analysis Services to create a multidimensional cube that I could manipulate with
Excel. This process worked very smoothly, and avoided the need to purchase a specialised
web reporting tool. I'll document this process more fully at a later stage, but the
information gleaned from the analysis was quite revealing about the current status
of the blogging world:
- At the moment my blog averages around 40,000 hits per month. I've no idea how
that compares to other blogs out there, but knowing that your blog is read is definitely
a motivating factor when writing new entries! I suspect that most people stumble across
this blog because it's posted on the main GotDotNet
blogs page; I'm certainly under no illusions that it's to do with any personal
fame. Like any other website, one of the biggest challenges of a blog is capturing
and maintaining traffic to the site. For bloggers without the inherent advantage of
working for Microsoft, aggregation sites such as PDC
Bloggers are probably one of the best ways to spread the word.
- I'm amused and amazed at how many people have wound up at the blog by means of a Google
search. Unsurprisingly, searching for "Tim Sneath" brings the blog more or less to
the top of the results, but I've had hits that have come from such bizarre search
terms as "lossless wma", "Sitar music that you can listen to on the net", and "Frank
Zappa AND Albanian Music"! Approximately 5% of browser hits to the site come via Google;
other search engines might as well not exist for the traffic they bring.
- There's an astonishing variety of blog aggregators and browsing tools in use: I counted
over 500 distinct user agent strings. Of the aggregators, variants of SharpReader are
the most popular, with a 46% share; Newsgator comes
next with 23%; NewzCrawler has a 5% share,
and many others have a smaller share. (Incidentally, 8% of visitors have an empty
user agent string, a surprisingly high number.) I'm a SharpReader user myself; although
I've never done an exhaustive survey of aggregation tools, I've certainly heard good
things about Newsgator. What's NewzCrawler like (I've not come across it before)?
- The most popular blog entries have been ADO.NET Tips and Tricks, Mind Mapping and
New C# Features in Whidbey. The last of the three can be explained by a link from Robert
Scoble's immensely popular blog, but the other two were a little more unexpected.
I'll write more on ADO.NET shortly.
- Traffic drops by about 20% at the weekend. I was expecting that to be higher, but
I guess many people leave their computers on permanently, so the aggregators continue
to poll for new content.
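
For anyone without Analysis Services to hand, the user agent shares and the weekend drop quoted in the list above can be approximated with a few lines of script. This is only a rough sketch in Python, not the DTS and cube process I actually used. It assumes a hypothetical three-column tab-delimited layout (ISO date, user agent, hit count); the real BlogX log layout may well differ, and the function names are my own invention for illustration.

```python
import csv
from collections import Counter, defaultdict
from datetime import date


def load_hits(lines):
    """Parse tab-delimited rows of (ISO date, user agent, hit count).

    The three-column layout is an assumption made for this sketch.
    """
    reader = csv.reader(lines, delimiter="\t")
    return [(date.fromisoformat(d), agent, int(hits)) for d, agent, hits in reader]


def agent_share(rows):
    """Return each user agent's percentage share of total hits."""
    totals = Counter()
    for _, agent, hits in rows:
        # Group blank user agent strings under one explicit label.
        totals[agent or "(empty)"] += hits
    grand_total = sum(totals.values())
    return {agent: 100.0 * n / grand_total for agent, n in totals.items()}


def weekend_drop(rows):
    """Percentage by which average daily hits fall at the weekend."""
    per_day = defaultdict(int)
    for day, _, hits in rows:
        per_day[day] += hits
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]
    weekday_avg = sum(weekday) / len(weekday)
    weekend_avg = sum(weekend) / len(weekend)
    return 100.0 * (weekday_avg - weekend_avg) / weekday_avg
```

Passing an open log file (or any iterable of lines) to `load_hits` and feeding the result to the other two functions gives roughly the same figures as the cube, at the cost of losing the ability to slice the data interactively in Excel. Note that `weekend_drop` needs at least one weekday and one weekend day in the data.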
Overall it's been an intriguing experiment. I look forward to repeating it in a couple
of months to see whether there have been any noticeable changes of trend as weblogging
continues to mature.
Comments
- Anonymous
September 27, 2003
Hi! Is there a general place I could go to work up a solution like yours? I'm trying to analyse my blog, and I like your solution, but I know nothing about datacubes :)
- Anonymous
November 24, 2003
Addendum: Have now written up the "HOWTO" for creating the Analysis Services cube at the following location: http://blogs.gotdotnet.com/tims/PermaLink.aspx/334ec9fe-7abb-4291-91a2-a9a5d600a5fd
- Anonymous
July 16, 2004
Here's another google search that put your blog on the 1st page - "most popular blogs". That should keep you writing. If you find yourself ever at a loss for something to comment on you can go to the Random Thoughts section of www.fastbreed.com for a collection of rarely, if ever, voiced ideas.