Abstract


The University of Melbourne is currently investing in several projects to improve long-term data curation and implement digital preservation processes. There are many imperatives to improve processes for long-term digital preservation of research data, including the need to keep data accessible for reproducibility, and to realise the economic and societal benefits that come from re-using data. However, digital preservation is a complex field that requires input from a range of stakeholder perspectives, as well as ongoing, sustainable funding.

Given there are many different elements required for data curation, of which digital preservation is just one important element, it can be a challenge to get started, and the issues involved can seem overwhelming. We describe how we are examining the current state of digital preservation awareness and activity at the University of Melbourne, how the findings from this are helping to define an ideal future state for long-term data curation improvements, and how we are developing actions to achieve this. We describe our experience using a capability development model, and show how we simplified this model, with potential benefits for other organisations seeking to get started with data curation improvements.

Improvements for data curation need to be based on strategic objectives, and an evaluation of current activities is useful for planning an ideal future state that meets the requirements of different stakeholders. To complete this evaluation comprehensively, the use of the CESSDA-SAW Capability Development Model (CESSDA-CDM) is being explored. The CESSDA-CDM was developed by the Consortium of European Social Science Data Archives. Capability Development Models help plan for transforming ad hoc processes into more effective, well-defined steps.

Although the focus of the CESSDA-CDM is on social science research data, it has wider application, and is especially useful for a large research organisation seeking structured elements with which to describe and assess effective data curation processes. It can be used to “set process improvement goals and priorities, provide guidance for quality processes and activities, and provide a benchmark for assessing and appraising current practices”.

Capability development models can be complex and high level and, as a result, can be difficult for time- and resource-poor staff to implement. We describe our process of creating a simplified, practitioner-focussed working version of the CESSDA-CDM, and explain how we can disseminate this via open platforms such as GitHub and figshare for use by others starting out in digital preservation improvement initiatives. This work will be of interest to other research institutes and cultural heritage organisations seeking ways to improve long-term data curation. We also explore how best to ensure that our outputs are sustainable for the long term, in contrast to some projects (even those with substantial funding and a digital preservation focus) that have neither the time nor the resources to invest, at the beginning of the project, in long-term thinking about the dissemination of research outputs.



Rough content for full paper
1.     Background to UoM, the need, and the identified plan to improve: operationalising the big picture
2.     Data curation vs digital preservation: useful to define what we’re talking about (https://blog.cosector.com/meanings-of-data-curation)
3.     Overview of CMMs, why they’re useful, how using these models can help plan for organisational improvements in data curation (https://connect.library.utoronto.ca/plugins/servlet/mobile?contentId=37686407#content/view/37686407)
4.     Our problems and how they map to strategic objectives
5.     Why we need a simplified practitioner model
6.     What’s the end goal: what’s the action? The models are missing this: the action, and the pathways to take from assessment to action
7.     How we made the practitioner version of the CESSDA-CDM, our process etc.
8.     Describing how we share and disseminate the model: GitHub to figshare.Melbourne, CSV, other outputs generated from the model, open, FAIR principles, etc.
9.     Possible testing process: testing the CESSDA-CDM and our simplified version on a small group of people, comparing results
10.    Sustainability issues: lots of big, brilliant thinking goes into these large funded projects, but the outputs are often published only as web publications or PDFs (where data goes to die). Perhaps time and effort need to be set aside at the start of these projects to think about how the outputs could be disseminated for wider testing, use and take-up. Something we want to explore is how to ensure we’re creating outputs that are sustainable for the long term, in contrast to some other large projects, even those with substantial funding and a digital preservation focus.


Possibly to do: 
Investigate the ISO 16363 audit criteria for trustworthy digital repositories in more detail: what does CESSDA offer that's not in this?

Process