Research Data Management and Publishing
Data repositories
Research data repositories are storage facilities that store and curate data sets, ensuring their long-term preservation and accessibility.
When planning their data management, researchers should decide on the repository where they will store their data.
Repositories are registered in re3data.
There are four main types of repositories:
Universal repositories
Repositories of this type accept data in all formats. However, for very specific formats it is advisable to use subject repositories.
- Zenodo is a research data repository financed from EU funds, to which all researchers may upload their data (up to 50 GB).
- Figshare is also a universal repository, allowing users to upload data sets, articles, videos, posters, code, etc.
- Open Science Framework is a repository and data management platform that can interoperate with Dropbox, GitHub and Zotero.
Additional reading: General Repository Comparison is a tool that helps researchers select the best general repository for their data or other digital objects.
Disciplinary repositories
Repositories of this type are essential to researchers because they support more specific formats and subject metadata standards. The number of such repositories is relatively large; here are a few examples:
- DataONE, PANGAEA – environmental data and ecology
- OpenTrials – medicine
- GESIS, ICPSR – social science data archives
- tDAR – archaeology
- COD, QsarDB – chemistry
Sometimes research funders publish a list of accepted repositories; the list of biomedical repositories is one example. The list includes information on how to submit and access the data.
National repositories
- UK Data Service – collects and preserves public open data in the United Kingdom (population census, health records, data from long-term studies, social and economic data)
- ANDS – Australian national repository of research data
Institutional repositories
These repositories are provided by universities.
A recent survey shows that researchers generally like to use their university repository, but are just as likely to use universal repositories (the international and general-purpose categories in the survey's graph).
Source: European Commission, Directorate-General for Research and Innovation, European Research Data Landscape – Final report, Publications Office of the European Union, 2022, https://data.europa.eu/doi/10.2777/3648
The University of Tartu repository DataDOI is a DSpace-based platform for permanent storage of research data. If researchers cannot find a subject-based environment that could ensure the preservation of their data, they can upload the data to DataDOI; the data will also be assigned a DOI and the metadata will be registered via DataCite.
The preservation and accessibility of data are ensured by the University of Tartu Library, which is the administrator of the repository. The administrator also guarantees that the repository software is kept up to date, that it complies with standards, that data exchange functions properly, etc. The target groups of this repository are individual researchers and subunits or work groups, at the UT and elsewhere, that handle so-called long-tail data, whose volume is not very large. UT researchers can use DataDOI free of charge.
Other data repositories that are members of the DataCite Estonia Consortium are listed on the DataCite Estonia web page. All these repositories can use the same services as DataDOI.
If you do not use the data centres listed above, you should indicate the trustworthiness of the repository you have chosen.
Quality repositories can be searched for with the help of Repository Finder, which returns a list of repositories that preserve FAIR data.
Several journals also recommend data repositories that meet the quality requirements of the journal, such as Nature Scientific Data Recommended Data Repositories.
Another quality indicator of a repository is the CoreTrustSeal certificate. The certificate takes into account the organisational and technical structure of the repository, the level of data curation, etc.
Data Preparation
In order to decide which data should be stored in a repository, you should carefully consider the objective and the period of preservation of your data.
The objectives of storing the data can include the requirements of the grant giver or of the journal where you plan to publish, the need for validation, the use of the data in teaching, etc.
In general, repositories promise to store and curate the data for ten years. During this time, they ensure the accessibility, interoperability and reusability of the data. Naturally, data can be stored for a much longer period, but in that case not all functions may continue to work.
Special attention should be paid to the protection of personal data, to make sure that the data have been anonymised.
Data can be deleted, and sometimes they even have to be deleted, but the reasons for the deletion should be explained.
Additional reading for those who are about to preserve their data in a repository:
Lahtinen, T., Mela, M., Mäkelä, M., Nurmi, N., & Kuusniemi, M. E. (2023). How to become a data preserver: The official University of Helsinki guide to the responsible preservation of research data (2.0). Zenodo. https://doi.org/10.5281/zenodo.10424017
Services of Data Centres
Large repositories often offer software for processing more common formats. New formats may pose a problem and make data curation more complicated.
In some cases, the choice of data format can be negotiated between the researcher and the repository so that the main objective, data preservation and access, is achieved and the researcher’s needs are met.
As a rule, data processing will still take place after downloading the data.
Repositories can also offer bibliometric data and statistics on the downloads and views of a data set.
Repositories can help researchers in selecting the most appropriate licences.
In general, repositories use a certain type of deposit licence, specifying the rights that the owner of the data transfers to the repository; this ensures the quality of data curation. Uploading data to a repository does not change the ownership of the data.
Licensing is one of the most important services offered by repositories. It saves researchers considerable time, both as depositors and as users of data (compared with asking for the consent of every single author when data are shared and obtained by some other method).
To use the repository, the owner of the data should create an account, register, and choose the most suitable type of licence.
It is also necessary to make sure that the data set and its related documents are licensed consistently.
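The consistency check mentioned above can be sketched in a few lines of Python. This is only an illustration: the file names are hypothetical, the licence labels follow the SPDX short-identifier convention as an assumption, and a real repository may record licences differently.

```python
from collections import defaultdict


def group_by_licence(files: dict[str, str]) -> dict[str, list[str]]:
    """Group file names by the licence recorded for each file."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, licence in files.items():
        groups[licence].append(name)
    return dict(groups)


# Hypothetical deposit: a data file plus its documentation.
deposit = {
    "survey_data.csv": "CC-BY-4.0",
    "codebook.pdf": "CC-BY-4.0",
    "README.txt": "CC0-1.0",
}

groups = group_by_licence(deposit)
if len(groups) > 1:
    # More than one licence in the deposit: list them for review.
    for licence, names in groups.items():
        print(licence, "->", ", ".join(names))
```

Mixed licences are not necessarily an error, but listing them makes it easier to decide whether the difference is intended.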
The web pages of repositories include conditions of use and instructions for data uploaders, which researchers should familiarise themselves with even before starting to collect their data.
The last phase is the preparation of data for long-term preservation.
Margaret Levenstein, director of ICPSR (Inter-university Consortium for Political and Social Research), spoke about the importance of machine-readable DMPs and PIDs for enhancing the research practices of graduate students and faculty, as well as their usefulness for planning repository services.
OpenAIRE
OpenAIRE is a European Union project that supports Open Access and Open Data and harmonises Open Science policies in Europe by building the e-infrastructure and the European Open Science Cloud (EOSC).
All projects, publications and data financed by the European Commission, together with their funders, have to be visible and searchable on the OpenAIRE portal and interlinked with each other. For storing the results, OpenAIRE offers the services of the repository Zenodo, but the data can be stored in any repository that is able to assign DOIs.
Digital Object Identifier (DOI)
DOI (Digital Object Identifier) is a string of digits, letters and symbols that persistently and unchangeably identifies an article, document, dataset, e-book, etc. and refers to it on the web. For example, the dataset “E-raamatute eeltöödeldud ja lemmatiseeritud failid” has the DOI http://dx.doi.org/10.15155/re-46, which refers to the web address http://datadoi.ee/handle/33/76. If, for some reason, this web address changes (for example, the data repository is moved to a new domain), the DOI stays unchanged and guarantees that the dataset can always be found and accessed at the right web address.
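To illustrate how a DOI separates the identifier from the web address, here is a minimal Python sketch. It assumes that a DOI consists of a prefix starting with "10." and a suffix, separated by a slash, and that https://doi.org/ is used as the resolver; the function names are hypothetical, and the sketch does not handle every character a DOI suffix may contain.

```python
from urllib.parse import urlparse

DOI_RESOLVER = "https://doi.org/"  # assumed resolver address


def extract_doi(value: str) -> str:
    """Return the bare DOI (prefix/suffix) from a DOI or a resolver URL."""
    value = value.strip()
    if value.lower().startswith(("http://", "https://")):
        # In a resolver URL the DOI is the path without its leading slash.
        value = urlparse(value).path.lstrip("/")
    elif value.lower().startswith("doi:"):
        value = value[4:].strip()
    prefix, sep, suffix = value.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {value!r}")
    return value


def resolver_url(value: str) -> str:
    """Build a resolver link that stays valid if the landing page moves."""
    return DOI_RESOLVER + extract_doi(value)


print(extract_doi("http://dx.doi.org/10.15155/re-46"))
print(resolver_url("doi:10.15155/re-46"))
```

Citing the resolver link rather than the repository address (here http://datadoi.ee/handle/33/76) is what keeps a reference working after the repository moves.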
Estonian data repositories can assign DOIs to data sets via the DataCite Estonia Consortium. Have a look at the data centres that have the right to assign DOIs and, if possible, use these centres: DataCite Estonia data repositories.
How to get a DOI for your research data?
In Estonia, research data are registered by the DataCite Estonia member universities. The DataCite services are free of charge for the researchers of these universities.
Researchers of other institutions can get DOIs according to the price list.
Researchers should take the following steps:
- Organise their data
- Provide metadata for their dataset
- Find a suitable data centre (subject-based or institutional)
- Upload their data following the instructions
- In case of questions or problems, contact the manager of the data centre
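The metadata step in the list above can be sketched as a simple completeness check before upload. The field list is an assumption modelled on the mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type); the key names and the example record are hypothetical, and the repository's own instructions take precedence.

```python
# Assumed required fields, modelled on DataCite's mandatory properties.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical draft record prepared before upload.
draft = {
    "title": "Example interview transcripts",
    "creators": ["Researcher, Example"],
    "publisher": "Example University",
    "publication_year": 2024,
    "resource_type": "Dataset",
}

print(missing_fields(draft))  # ['identifier']
```

The identifier is missing here because the DOI is typically assigned by the repository when the data are deposited, so this gap is expected in a draft.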
Additional information and special training are offered by specialists at member university libraries or by the UT Library as a DataCite member.