For nearly 50 years, data accumulated by scientists at UC Berkeley’s Central Sierra Snow Laboratory languished in notebooks, and later in spreadsheets, at the mountain outpost, located just 3 miles west of Donner Pass. The valuable information — measurements of temperature, precipitation quantity, snowfall, and snowpack characteristics from 1970 to 2019 — held untapped potential for researchers studying California’s changing climate. But the dataset was accessible only by email request.
That changed in 2021, when Andrew Schwartz, the manager and lead scientist of the lab, reached out to the UC Berkeley Library Data Services Program. Schwartz wanted help making sure that people could accurately cite the lab’s data in their research.
“We had an initial conversation about citations,” recalls Anna Sackmann, the Library’s data services librarian. “But that snowballed into a broader discussion on where the data lived, how people found it, and how we might make it accessible to more people by publishing it online.”
Schwartz, who was new in his position at the time, was excited by the prospect. He saw the data as a treasure trove for researchers, with information that could provide a deeper understanding of climate change and, with luck, nudge us closer to solutions.
“Coming into the job, I knew there was a lot of desire in the community to have this data online,” he says. “It’s a wonderful, long-standing dataset, one of the longest in the world, and I needed to find the right repository to make it available to researchers and the public.”
Sackmann ultimately recommended Dryad, a research data repository that makes data easily discoverable and downloadable online. The system also assigns each dataset a DOI and attaches a license to it, making the data convenient for users to cite and for owners to track citations.
“I love talking about publishing data, because the technical side of it is relatively easy,” Sackmann says. “It’s a pretty simple process, but it solves so many problems.”
Schwartz was quickly convinced of Dryad’s value and adopted the system shortly after that initial conversation. In the first year, the uploaded data was viewed more than 1,000 times and downloaded more than 280 times.
He says the built-in analytics are especially helpful for understanding how often and when the dataset is being used. He has also saved a lot of time now that researchers can access the data online instead of requesting it by email.
The lab, founded in 1946, is the only continuously staffed snow research facility in the western United States. The 1970–2019 dataset comes from measurements taken at the lab at the same time and location each day, ensuring consistency over five decades. That consistency makes the data particularly valuable to researchers, according to Sackmann.
“Data collection is very time- and resource-intensive,” she says. “So it’s more efficient for the researcher if they can use data that’s already been collected.
“If we have all the researchers in the entire Mountain West using one dataset, which has been consistent, with rigorous measurement methods applied, that leads to better science, clearer, more reliable research, and time saved.”
Partners in research
For Salwa Ismail, the associate university librarian for digital initiatives and information technology, the snowpack data project is an example of the Library Data Services Program’s vision in action.
“The program supports faculty, staff, and student researchers through the entire data lifecycle, either working with them directly or by giving them excellent referrals to other resources on campus,” she says. “We are here to help them find, preserve, and share data, so their work can be discovered and used by others.”
Ismail cites two other recent examples. The team is helping a faculty member develop and analyze a dataset for research on how racism is discussed in medical literature. Team members also assisted an undergraduate student in locating census data for Argentinian municipalities.
Ismail says scholars can reach out to the team no matter where they are in their research data journey to get the support they need.
The support Schwartz received from the program turned out to be a steppingstone to getting even more data out into the world. Later this year, 20 new data collection instruments will be installed at the lab. He expects their data to be available in real time and says it will be uploaded to Dryad as well. The lab also holds 10 file cabinets filled with materials such as old photos; their contents will be scanned and uploaded online.
That means more resources for the scientists, scholars, and reporters studying California’s Sierra Nevada snowpack (which accounts for about one-third of the state’s water supply), and for others around the world who simply want to see the numbers with their own eyes.