What Government Data Should Be Transparent?

At an event I attended in March, Massachusetts’ Chief Information Officer Anne Margulies raised a simple yet profound issue. Although the Commonwealth was committed to open data, it had yet to figure out which datasets to post online through its new data portal, mass.gov/data.

Plenty of transparency advocates would say the answer should be “all of it.” However, I think this answer is unsatisfactory for a couple of reasons. First, Massachusetts faces very real resource constraints. Administrative data is managed by hundreds of legacy systems across more than 100 independent agencies. Many of these systems contain personal or otherwise sensitive data, which precludes simply throwing open the doors and means it takes time to create public reporting scripts. Second, the “free it all” position overlooks the government’s role as data collector. Plenty of information is collected and released purely as a public service: environmental data, population statistics, etc. Instead of focusing only on making paper records digital, we should discuss the larger issue: what types of information should governments make available?

I think there are several basic categories of data that government should release. Each has its own logic, and a review of the categories highlights the multiple purposes of transparency.

1. Data “About the World” To Inform Research and Policy Debate
For a variety of reasons, governments often collect some of the most accurate and up-to-date descriptive data about communities. This includes a vast array of geographic data, school and testing data, demographic data, employment and economic statistics, and more. It should be released primarily because it enhances our ability to create good policy and, more generally, our collective understanding.

2. Data Released to Improve Service Delivery
Some data should be released because it improves access to government services. This includes cases where the data itself is the service (e.g., research reports), but also more technical forms such as transit system data, government facility locations, and service details.

3. Data to Help Hold Government Accountable
A host of budget, voting, and performance data should be released to hold government accountable. However, metrics produced internally as part of stat-type programs introduce the problem of mixed motives. Why would governments want to release data that can be used against them? This problem can be partially avoided by separating the data function from operations within the government organization. This concern also raises the important issue of presenting information accurately and including metadata about definitions and collection methodology.

4. Data to Change Private Decisions to Achieve Policy Goals
In their book Full Disclosure, Archon Fung, Mary Graham, and David Weil argue that many transparency policies fall into the new category of “targeted transparency.” These efforts, which include mortgage reporting requirements, nutrition labels, and automobile crash ratings, make information available with the deliberate intention of achieving a public objective by influencing private decisions. These policies succeed when they provide people with facts they want in the “times, places, and ways that enable them to act.” The authors stress that these aren’t limited to policies seeking economic changes, but also include campaign finance reporting laws, which work through political channels. Although these policies are implemented with the intention of reaching end users, how easily citizens can access the data varies widely. Some data are readily available, but governments rely heavily on intermediaries to analyze and present more complex (and politically charged) data like the toxics release inventory or mortgage lending data from banks.

5. Data Posted to Improve Access Within or Across Government
Although it’s rarely discussed, I think an important use of available data is to help break down barriers within and between government agencies. This will be an unintended use so long as our governments are separated into layers and silos. This purpose explains why so much of the data on the HUDUser website are specific to certain policies or programs: the intended users are state and local governments and nonprofits, not the general public.

What do you think? Are these the right categories, or have I omitted something important?

Author: Rob Goodspeed


  1. Great list, Rob.

    I’d propose a sixth: “Data to Keep Citizens Informed,” which would include crime, health inspections, new business licenses and the types of frequently updated, “newsy” government information we collect at EveryBlock. I realize your list is slightly skewed toward incentives for *governments* to release their data, but I’d argue an informed public helps the government do its job better (e.g., in community policing).

  2. Adrian, thanks for the comment. It makes me think data could be seen as part of the traditional role of public media, similar to some of the stuff in the Knight Commission’s report on “information needs in a democracy.”

  3. Great post. I’m not a transparency advocate; I’m concerned more with informative ability. Maybe transparency is a byproduct of good information, not raw data distribution? Here is a comment I submitted over at thescoop to a post that references yours. Perhaps I should have posted it here first:

    I think what people are realizing is that there is a difference between data and information. I recently created the Center for Digital Information http://digitalinfo.org, focused not just on government research/data but on policy research generally, including think tanks, agencies, foundations, nonprofits, etc. The goal is to start making these important distinctions between “research,” “data,” and “information,” terms that are often used synonymously. To qualify as “information,” I maintain it needs to be effectively *communicated* (in digital media). Perhaps that’s where data distribution such as this falls short of information?

