Political events in the United States have shed new light on the fragility of publicly administered data. In just the first few weeks of the Trump administration and 115th Congress, the Environmental Protection Agency was allegedly ordered to remove climate change information from its website, the USDA removed animal welfare data from its website, and the House passed H.Res.5, specifically excluding changes to the Affordable Care Act from mandatory long-term cost data analysis. The Senate and House of Representatives have both received proposed bills (S.103 and H.R.482) prohibiting funding from being used "to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing." While researchers, archivists, librarians, and watchdog groups work hard to create and preserve open data, there's little guarantee that information under federal control will always survive changes to federal agencies.
Threats to open data aren't new, and archivists, librarians, and researchers have a long history of working to foster and preserve unfettered access to information. Events like Sunshine Week and Open Access Week highlight similar issues for the scholarly community and the press. The End of Term Web Archive project has operated since 2008 to "capture and save US Government websites at the end of presidential administrations" as a collaboration between the Internet Archive, California Digital Library, University of North Texas Libraries, and Library of Congress, covering sites from all three branches of the federal government. However, since the November election, the urgency to address endangered datasets has been felt more deeply and by a larger community. The most visible effort (and an Endangered Data Week partner) focuses on environmental data: #DataRescue/DataRefuge, a program spearheaded by the Penn Program in the Environmental Humanities Lab, University of Pennsylvania Libraries, and Project_ARCC. Contributors to this project are scrambling to ensure that crucial datasets on climate change and related issues are preserved for researchers now and into the future. Datasets are being added to ICPSR's DataLumos and DataRefuge as a supplement to federal agency servers. Meanwhile, censorship fears are driving the Internet Archive to pursue backup strategies outside the United States of America.
Our public data will not be saved through a one-time mass backup, nor by distributed, uncoordinated small acts of heroism. And, as the research data management community well knows, privately administered data is also under threat, often from benign neglect. We see Endangered Data Week as a service to projects like those listed above, and to the broader community of people who care about access to information. Together, we must: work for strong federal, state, and local open data policies; increase data skills and competencies among students and colleagues; and continue to shed light, year after year, on threats to data collections from all sources. An annual series of events, coordinated across campuses, nonprofits, libraries, citizen science initiatives, and cultural heritage institutions and spanning disciplines and types of datasets, can shed light on public information that is in danger of being deleted, repressed, mishandled, or lost. Through this project, we hope to: raise awareness of different types of threats to publicly available data; engage with the power dynamics involved in data creation, sharing, and retention; foster concrete skills and collaborative projects; and highlight work to make endangered data more secure and accessible.
Spearheaded by Brandon Locke and co-founded by Jason A. Heppler, Sarah Melton, and Rachel Mattson, in collaboration with Bethany Nowviskie and Wayne Graham, and inspired by events like Banned Books Week and Open Access Week, this project quickly gained the attention and support of the Digital Library Federation's interest group on Government Records Transparency/Accountability. Additional contributors include Purdom Lindblad, Kristen Mapes, Anna Kijas, and (for DLF) Katherine Kim and Becca Quon. DataRefuge, Mozilla Science Lab, the NDSA, and CLIR join the DLF as project sponsors.