(Editor’s Note: This article originally appeared online at StateScoop. It is reprinted here with permission. Any errors in misprinting should be assumed to be ours. Thank you for contacting us with any errors. The image accompanying this version of the article appeared in the original.)
It can be hard to notice when something is missing, particularly in an age of such abundance. But amid the internet’s vast and churning piles of information, there are pieces missing from the open government puzzle.
Since the National Oceanic and Atmospheric Administration began releasing weather data in the 1970s, government and its surrounding community of technologists have increasingly demonstrated both the ability and inclination to publish information not with the attitude that it was granting the public some royal benignity, but that those data sets and knowledge should have been available from the start.
Although the volume of data published has steadily increased, only recently has it taken the shape of something substantial and publicly visible. As teams from around the world nurture their projects to maturity, with several nearing their mainstream debuts, the picture of the future of truly open government is fuller but still far from complete.
What’s the most crucial missing piece? Everyone has a different answer — a sense of direction, privacy controls, personalized user engagement and overall data quality and reporting consistency are just a few.
One of the most insistent voices in open government work, the Sunlight Foundation, says a more “tactical” approach is needed. Last week, the nonprofit published a Guide to Tactical Data Engagement — a step-by-step roadmap for governments looking to outgrow a lackadaisical build-it-and-they-will-come mentality and replace it with a focused strategy infused with all the diligence of a Hollywood bank heist.
The whole point of publishing new data is that it might contain some reserve of potential energy. Sunlight has been studying this space for years and working intensely with cities funded through the Bloomberg Philanthropies What Works Cities initiative, especially during the past year. The group says it’s getting a good idea of how to arrive at “data that makes a difference” — and it wants to share it.
Sunlight’s formula is conceptually simple: engage with the community and find a narrow use case, collaborate with users on designing a plan and then do it. The group’s expertise is born out of experience working with cities that no longer find open data mysterious or intimidating.
Kevin Merritt, founder and CEO of open data firm Socrata, echoed that point, and said open data “has gone mainstream” and is no longer associated only with notions of accountability.
“The most successful open data programs are the ones in which open data is a byproduct of governments breaking down data silos and making financial, operational and transactional data broadly available internally, to government workers,” Merritt said. “And most importantly, agencies are recognizing that open data can be leveraged internally to improve operational efficiency and program outcomes.”
OpenGov’s Cameron Galbraith told StateScoop this observation is validating open data’s original promise. Open data isn’t just for citizen engagement anymore, he said — it’s becoming government infrastructure itself.
“The ultimate goal of the government is to serve their communities and improve outcomes and the quality of life,” Galbraith said. “Tying that back to what can be done with the data was a little more tenuous in that first generation, but now the next generation of open data, that is where things are being more closely tied to specific strategic priorities and getting the right data together to measure progress against those and inform the decisions that governments are making.”
Open data’s diminishing adolescence is in part a product of necessity. Many cities are now preparing their policies for a new generation of devices like self-driving cars and public sensors like those in the Array of Things that will generate potentially unmanageable troves of data and introduce conflicting notions of data ownership and privacy.
The City of Seattle and the Future of Privacy Forum (FPF) are now in the midst of a risk assessment to evaluate the city’s “open by preference” data policy. In early 2016, the city made a firm commitment to including every department in a thorough and sustainable process for publishing as much data as possible.
It’s “extremely important” to be mindful of the potential for harming or accidentally identifying residents as a city grows its open data efforts, said David Doyle, Seattle’s open data program manager.
“We want to make sure that when we are releasing data to the public, we are taking into account all of those factors and ensuring no personally identifiable information is leaked or other indicators that could be used to reconstruct identity,” Doyle said.
Departments frequently review their data in isolation, Doyle said, but once that data is released it is no longer isolated, to the detriment of privacy.
“How do we think holistically about when we bring those data sets together, and how do we avoid creating that ‘mosaic effect’ where data can be joined and reconstructing various things we may not want people to do?”
The mosaic effect shows how innocuous and anonymous data points can be combined to identify users with startling accuracy. One 2013 study that looked at 15 months of mobile phone usage data for a half-million people notes that “human mobility traces are highly unique” and that “four spatio-temporal points are enough to uniquely identify 95 percent of the individuals” studied.
The report explains one way users can be uncovered: “A simply anonymized dataset does not contain name, home address, phone number or other obvious identifier. Yet, if individual’s [calling] patterns are unique enough, outside information can be used to link the data back to an individual.”
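The re-identification logic described above can be illustrated with a minimal Python sketch. It is not the study's method or code. The toy traces, the pseudonymous IDs and the function names `matching_users` and `unicity` are all hypothetical, chosen only to show how a handful of known place-and-time points can narrow an "anonymized" data set down to one person.

```python
import random

# Hypothetical toy data: pseudonymous ID -> set of (antenna, hour) points.
# No names or phone numbers appear anywhere in the data set.
TRACES = {
    "u1": {("A", 8), ("B", 9), ("C", 18), ("D", 22)},
    "u2": {("A", 8), ("B", 9), ("E", 18), ("D", 22)},
    "u3": {("A", 8), ("F", 12), ("C", 18), ("G", 23)},
}


def matching_users(traces, known_points):
    """Return the pseudonymous IDs whose trace contains every known point."""
    known = set(known_points)
    return sorted(uid for uid, points in traces.items() if known <= points)


def unicity(traces, k, seed=0):
    """Fraction of users singled out by k points sampled from their own trace."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    unique = 0
    for uid, points in traces.items():
        sample = rng.sample(sorted(points), min(k, len(points)))
        if matching_users(traces, sample) == [uid]:
            unique += 1
    return unique / len(traces)


# One outside observation matches everyone; three observations match one person.
print(matching_users(TRACES, [("A", 8)]))
print(matching_users(TRACES, [("A", 8), ("C", 18), ("B", 9)]))
print(unicity(TRACES, 4))
```

In this toy example, each additional point an outsider learns (from a social media post or a second published data set, say) shrinks the list of candidates, which is the same joining risk Doyle describes when separate data sets are brought together.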
The city is also motivated in its study by a desire to maintain “data quality” and “fairness,” according to the draft report.
Though Seattle is a leader in privacy efforts for open data, Doyle said the city is still in the early stages of integrating its research. After the public comment period for the risk assessment ends on Oct. 2, FPF will then draft a final report and provide a framework called the Model Open Data Benefit Risk Analysis (MODBRA). That advanced framework will then need to be “operationalized” by the city, Doyle said, and its chief privacy officer will write a strategic plan for dealing with privacy across the city — open data efforts included.
A lot of this work is in establishing the infrastructure for publishing data, Doyle said, and one of the things the city hopes to eventually produce is some kind of tool or source code that can help other cities publish data in a sustainable and privacy-safe way. Seattle’s meticulous and time-consuming work doesn’t necessarily need to be replicated in each new city, he said.
Where open data is directed at the public, it’s more friendly than it ever was before. If open data isn’t mainstream now, it’s certainly on its way.
Each new data project to come out of Boston puts user experience and narrative context front and center. Chicago’s open data portal, which launched in April, was designed specifically with non-experts in mind, and the city’s geospatial open data tool, OpenGrid, puts government data in a context everyone is familiar with by plotting it on a map.
Kansas City built a chatbot for Facebook Messenger to help people fetch information from government data stores in a more natural way after its mayor reportedly noted that “open data is hard to use.”
“I’m KCData Bot! I can help you find data from Kansas City, Missouri’s Open Data Portal! What type of data can I help you find today?” the experimental bot writes to users.
When New York City relaunched its open data portal in March, the new front page showcased its inclusive maxim: “Open Data for All New Yorkers.” NYC Chief Technology Officer Miguel Gamiño called it “a welcome mat” for the city’s public data.
Becoming more approachable and easier to use is the natural evolution of all technologies, but New York City’s user research takes its outreach yet another step further.
In May, international consulting firm Reboot published a report commissioned by the city’s open data teams that is now allowing the city to treat its users as more specific people, rather than an indiscriminate uniform mass.
“Open data has been in the spotlight for so long as this really great thing, but we see it as only one component of a larger open government strategy and mission,” Reboot’s Emily Herrick told StateScoop. “We were hired to help them understand and align their priorities around who they should be creating service improvements for.”
Researchers spent five weeks surveying not only who was using open data portals but also, more unusually, how that use was operating “within a larger open data ecosystem,” according to the report. Researchers studied people ranging from those who had never heard of open data to developers with extensive experience, all with the goal of showing the city how it could be engaging these people more effectively.
“We wanted to be more conscious of the fact that not everyone who’s going to use open data is going to be a data scientist or someone who knows R or who has a strong background in analytics,” said Adrienne Schmoeker, director of civic engagement & strategy at the Mayor’s Office of Data Analytics. “It might be someone who just wants to find more information.”
In its report, Reboot includes six user personas with names like the “Community Champion” or the “Influential Interpreter,” each an archetype of the potential and existing users the city hopes to develop relationships with.
“What we’ve really done over the last six to seven months since we’ve had this research in hand is it’s given our team a new vocabulary,” Schmoeker said. “So now we can say, ‘OK, this will be a really good event for the Meticulous Mapper.’ Or, ‘What can we do to engage for Busy Bystander?’”
Messaging and engagement in the open data world are following a trajectory similar to that of American marketing and popular media in general. Both advertising and media in the U.S. are historically rooted in the concept of mass appeal: more than 80 percent of American households tuned in to watch Elvis Presley on “The Ed Sullivan Show” on Sept. 9, 1956, but today there are YouTube channels with content, tone and presentation styles that cater to every sensibility imaginable. Open data is its own kind of media content, and NYC officials have picked up on the idea that there might be a more efficient strategy for outreach.
“What I’ve seen in the last year of supporting this initiative is that there’s a lot of excitement around having more structure in how we’re providing support to city agencies to help them publish their data, a lot of excitement from the public when we have events or workshops,” Schmoeker said. “If we do open data right and we grow this, it could, in a very idealistic sense, change the way we do democracy.”
Some are opening data with the specific idea of changing how the country does democracy.
Last year, the Sunlight Foundation relaunched an index of open data policies that contains links to government documents dictating how dozens of cities and counties are running their open data programs. Putting all the policies in one place makes it easier to compare them and “democratize the process of policy making,” Sunlight Open Cities Director Stephen Larrick told StateScoop.
A project called OpenElections is gathering and sharing precinct-level election data dating back to 2000. Project Lead Derek Willis told StateScoop the data, much of which has never been widely available before, is expected to help journalists, app developers, political scientists and the nation at large to better understand how and why people vote the way they do.
And the Open Law Library, a nonprofit started about two years ago, is taking a stab at building the infrastructure to automatically publish the entire bulk of the nation’s laws as they’re updated in real time.
“A lot of people don’t realize their laws are not actually accessible,” said David Greisen, co-founder and CEO of Open Law Library.
Government websites that host legislative data are frequently slow to update and are full of errors, Greisen said. And the third-party websites that host legislative data typically do not allow its use for commercial, nonprofit or educational purposes without payment. There’s a lot missing here, he said.
Before taking on the entire country’s legal data, the Open Law Library has started by partnering with the Council of the District of Columbia on a pilot project, and Greisen says the body has been “a fantastic partner.”
“D.C. has more code changing ordinances in a year than many states five, six, seven times larger,” Greisen said. “And yet we’ve been able to — through automation — get the time to codify a law down to the point where a single attorney is able to do the entire process that before took substantial resources of a for-profit publisher.”
Using machine learning and natural-language processing, Open Law Library is hitting government in its sweet spot: turnkey technology that can generate untold savings in operational efficiency.
“We’re getting really close to the point where it takes less time for the District personnel to use our system once than it is to be checking all the manual changes made by the codifier and having to fix many errors and go through multiple revisions,” Greisen said. “And we’re reducing the amount of manpower the D.C. Council has to expend on creating the code.”
Despite its progress, the Open Law Library is still in its infancy compared to what people are familiar with in the open data world, Greisen said.
But the country’s legal materials, including its laws, policies and procedures, aren’t just a guide to how the country is supposed to work. They are also indicative of what its priorities are, and the public should have free and easy access to all of that, he said. And thanks to the Open Law Library’s work, it soon may.
For Daniel Castro, director of the Center for Data Innovation, the missing piece in open government is consistency and quality. Each place is different, he said — the types of data that are available vary from place to place, the laws permitting data availability vary, and the quality of the data can vary wildly even within a single municipality, let alone the country.
“The last few years, we’ve seen a lot of basic education and a lot of governments have said they’re doing open data,” Castro said. “The question is: can we do it well?”