Volunteers hack, scrape and crawl government websites to save data

Volunteers Rich McMartin and Mariquita Anderson work on saving research data from the federal government during a recent Data Rescue event at the University of Minnesota.
Max Nesterak | MPR News

Fear that environmental data could disappear under the Trump administration has spurred hundreds of volunteer researchers, hackers and archivists around the country to start saving federal data on secure, non-government servers.

In Minneapolis, some 150 volunteers brought their laptops to the basement of the Humphrey School of Public Affairs at the University of Minnesota last weekend. They worked on digging through various not-yet-archived pockets of the massive .gov system.

The effort is organized by the group DataRefuge, which gives universities instructions on how to host these Data Rescue events. The group provides a place to save all the data, a system for documenting what's been saved, and a clear division of labor.

At the University of Minnesota, "seeders," those with basic computer skills, sent EPA webpages on pesticide risks to the Internet Archive. Volunteer "harvesters," those with hacking skills, worked on writing code to download employee directories for agencies like the Environmental Protection Agency and the National Oceanic and Atmospheric Administration. At the end of the production line were "baggers and checkers," who made sure the information would make sense to future researchers and then pushed it to a public repository.

Events like these have also happened in Los Angeles, New York, Philadelphia and Berkeley, Calif.

While concern over what President Trump might do has galvanized the effort, this problem existed long before Trump's campaign, says Alicia Kubas, a University of Minnesota librarian who helped organize the event.

"We don't want to see this necessarily as a political thing. This is something libraries have been thinking about for a while — how we archive these materials that are really important, not just to researchers but also the public," Kubas said.

That's because the government doesn't really archive all the data it collects. For example, data the Department of Energy collects on hybrid fuel emissions or the data the Department of Agriculture keeps on crop yields could disappear. And access to this data is vulnerable to more than just changes in administration.

Kubas says that during the 2013 government shutdown, researchers couldn't access data they rely on.

"It came back when it was back on, but for those few weeks we had quite a few researchers who said, 'what am I supposed to do, I don't have access to this data that I use all the time,' " Kubas said. "And if it's not there it's not there; there's nothing you can do."