How Do You Create a Popular Blog?

There are plenty of sources of advice for new bloggers. However, there is precious little quantitative data about which factors matter most for building readership. Among the most obvious quantitative factors (posting frequency, number of external links, blogroll size, and simple longevity), which are the most important? More generally, are there statistical relationships between any of these factors and readership?

I set about attempting to answer this question myself. In order to create a representative sample of blogs I decided to use the Truth Laid Bear (TLB) traffic rankings, which list the 5,000 most popular blogs on the web. This website compiled the traffic data from bloggers who had added the free traffic tracking service Sitemeter. Although my sample would be limited to bloggers who use the Sitemeter tool (almost entirely from the English-language blogosphere), as far as I could tell the TLB system was relatively free of non-blog websites and contained a wide variety of blogs. The traffic data would also be affected by the site design choices of individual bloggers, because meter readings can depend on the complexity and size of the site and on where the meter is placed within the page code.

In order to identify my pool I generated 50 numbers between 1 and 5,000 using this random sequence generator. I looked up the blogs corresponding to these 50 rankings on the TLB rankings, recorded the average number of visitors per day the TLB database contained, and visited each site. For each blog I recorded several pieces of data. First, I recorded the approximate founding date, usually the earliest entry found through chronological archive pages. Although in some cases the first post was clearly labeled as such, in others the oldest posts just seemed to start right in, making it possible that still earlier posts were contained in older blogging systems. (In order to check this data I plan on running each URL through the Internet Archive.) Second, in order to get a measure of posting frequency I recorded the number of posts published between September 24th and 30th, 2006. This period is the last week of September of this year, free of any major holidays I am aware of. Lastly, I looked for the Sitemeter icon and recorded the “Average Visits Per Day” available from the summary page. In the course of the data collection I discovered the TLB data was at least six months out of date, which introduces a chronological bias into my study by excluding newer sites.
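For anyone who wants to repeat the draw, here is a minimal sketch in Python. It is not the tool I used; it assumes the ranks are drawn without repeats, and the function name and the seed value are made up for illustration.

```python
import random

def draw_rank_sample(n_ranks=5000, sample_size=50, seed=None):
    """Draw distinct ranks between 1 and n_ranks, inclusive.

    Assumption: sampling without replacement, so no rank appears twice.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, n_ranks + 1), sample_size))

# Hypothetical seed, chosen only so the example is reproducible.
ranks = draw_rank_sample(seed=2006)
print(len(ranks), "ranks drawn; lowest:", ranks[0], "highest:", ranks[-1])
```

Each rank would then be looked up by hand in the rankings, as described above.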

Of the original 50, 9 sites were either not blogs or inaccessible (domains expired, site blank, etc.), and 10 had no posts for the week in question. This group of 19 excluded sites included two with announcements that they had been shut down by their owners, and several that were short-term projects long since abandoned. I decided to exclude this entire group because I wanted to study how active blogs function, meaning those updated at least once per week, which left 31 blogs.

I then created two scatterplots for this n=31 data: one displaying the observed number of posts against the TLB/Sitemeter Average Daily Visits, and the other plotting the age (in days) against the daily visits. I then ran a simple linear regression analysis for each. In short, at this point age seems to be a better predictor for traffic than posting frequency.

Posts vs. Traffic

Age vs. Traffic
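The regressions described above can be sketched in a few lines of Python. This is a rough illustration only: the numbers below are invented placeholders, not my data, and comparing the two fits by R-squared is an assumption about what "better predictor" means.

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance explained
    return slope, intercept, r_squared

# Placeholder values for illustration only (NOT the study's data).
posts_per_week = [2, 9, 4, 12, 6, 15]
age_in_days = [200, 400, 650, 900, 1500, 2100]
daily_visits = [30, 60, 80, 150, 210, 400]

for label, xs in [("posts", posts_per_week), ("age", age_in_days)]:
    slope, intercept, r2 = simple_linear_regression(xs, daily_visits)
    print(f"{label}: slope={slope:.3f} intercept={intercept:.1f} R^2={r2:.3f}")
```

With real data, the predictor whose fit has the higher R-squared explains more of the variation in daily visits, though with only 31 points neither result should be read as firm.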

Before I take the statistical analysis any further, I’ll need to collect more data. Up next, I’ll be collecting data from Technorati on the number of blogs linking to each blog in my sample, and adding to the sample set so that it contains 50 active blogs. I also think I will explore obtaining “snapshot” traffic data, available from the summary page of sites with public meters. Lastly, I need to clean up the age data as best I can using the Internet Archive. Then I’ll run some multivariate analysis to see whether, holding age constant, posting more really results in more traffic, and whether there are other statistical connections. I’m avoiding diving into the question of blogrolls because of the tedium involved, although it seems like another relevant variable.
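To make "holding age constant" concrete, here is a sketch of a two-predictor regression in plain Python. It is one possible approach, not a finished analysis: the function names are hypothetical and the numbers are invented placeholders rather than my data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_two_predictor_ols(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares.

    Solves the normal equations (X^T X) beta = X^T y.
    Returns [b0, b1, b2].
    """
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Placeholder values for illustration only (NOT the study's data).
age_in_days = [200, 400, 650, 900, 1500, 2100]
posts_per_week = [2, 9, 4, 12, 6, 15]
daily_visits = [30, 60, 80, 150, 210, 400]

b0, b_age, b_posts = fit_two_predictor_ols(age_in_days, posts_per_week, daily_visits)
print(f"intercept={b0:.2f} per-day-of-age={b_age:.4f} per-weekly-post={b_posts:.2f}")
```

The coefficient on posts per week is the estimated change in daily visits for one extra weekly post between blogs of the same age, which is the comparison the simple one-variable fits cannot make.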

What do you think?

Author: Rob Goodspeed


  1. Thank you for the link, “hint.” Is that image from a public document? Also, I wonder if anyone from Metblogs has analyzed the age of all their sites and traffic to see what role the age had in producing the same result. Since the older sites would also have larger staffs and more readership the two variables are probably both important.
