Research Data Management and Publishing
Open Data Search
Data re-use is the final stage of the research data lifecycle and often the first stage of a new project. It means that data already published by someone else must be found, accessed and understood before it can be used in secondary research. Once machine-readable FAIR data has been found, it can be integrated with your own systems.
Where and how to search for open data depends largely on the user's discipline and information needs. The main search fields in data registries are author, keywords and data type.
For example, an astronomer needs data from long-term observations that is transferred directly from instruments to a disciplinary repository; such data is dynamic and large-scale.
Developers of artificial intelligence need big data for machine learning.
In medicine, for example, diagnostic imaging files and 3D images are needed, not to mention patients’ health data.
In archaeology, fieldwork diaries, photographs and artifacts are of interest; in the social sciences, questionnaires, survey data, interviews and video material.
In the humanities, research is often based on previous publications and manuscripts.
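The three search fields named above (author, keywords and data type) can be sketched as a simple filter over catalogue records. This is only an illustration: the `DatasetRecord` structure, the `search` function and the sample entries are invented for this example and do not correspond to any real registry's schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """A hypothetical, simplified catalogue entry (not a real registry schema)."""
    title: str
    authors: List[str]
    keywords: List[str]
    data_type: str


def search(records: List[DatasetRecord],
           author: Optional[str] = None,
           keyword: Optional[str] = None,
           data_type: Optional[str] = None) -> List[DatasetRecord]:
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        # Author: case-insensitive substring match against any listed author.
        if author and not any(author.lower() in a.lower() for a in rec.authors):
            continue
        # Keyword: case-insensitive exact match against the keyword list.
        if keyword and keyword.lower() not in (k.lower() for k in rec.keywords):
            continue
        # Data type: case-insensitive exact match.
        if data_type and rec.data_type.lower() != data_type.lower():
            continue
        hits.append(rec)
    return hits


# Invented sample records, for illustration only.
CATALOGUE = [
    DatasetRecord("Sky survey light curves", ["Tamm, A."],
                  ["astronomy", "time series"], "observational"),
    DatasetRecord("Household survey 2015", ["Kask, M."],
                  ["social science", "survey"], "survey"),
    DatasetRecord("Chest CT scans", ["Kask, M.", "Saar, P."],
                  ["medicine", "imaging"], "image"),
]

if __name__ == "__main__":
    for rec in search(CATALOGUE, author="kask", data_type="image"):
        print(rec.title)
```

Combining fields narrows the result set, which is why it helps to decide in advance which of them matter for your discipline.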
First of all, think about what to search for and how, and plan a search strategy.
1. The easiest way is to check whether the data has been published with, or linked from, the article, and then go to the repository to download the data if necessary.
Data, methods and code can be found in the article as supplemental material or supporting information.
2. However, publishing data is a relatively recent practice. For older articles, the data may not be available, or the underlying data may not be mentioned at all, with the article describing only the analyzed data and results.
In this case, you can use researcher networks or contact the authors directly to request the data. Several studies have shown that researchers have long been willing to share their data, but do not want to publish it because they know it is not properly managed.
3. Data repositories. In the previous study material, we considered repositories as archives where researchers can store research data for the long term.
Here we treat repositories as sources of open data.
Effective data search
If the search has led to datasets of interest, these must be examined thoroughly and the quality and reusability of the data assessed.
The README.txt file and all the metadata are helpful here. If you delve into them, you will find many good and bad examples from which to learn how to publish your own data well.
Metadata should provide enough information that you do not need to download the data until you are sure you want to explore or use it.
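One way to apply this before downloading anything is to check a dataset's metadata record against a short checklist. The sketch below is a minimal illustration: the field names in `REQUIRED_FIELDS` and the sample record are assumptions made for this example, not a metadata standard or any particular repository's format.

```python
# Hypothetical checklist of fields a reader might want before downloading.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "description",
    "license",
    "identifier",
    "file_formats",
)


def missing_metadata(record: dict) -> list:
    """List the checklist fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def worth_downloading(record: dict) -> bool:
    """True only when every checklist field is present and non-empty."""
    return not missing_metadata(record)


# Invented example record: the license is empty and no file formats are listed.
sample_record = {
    "title": "Household survey 2015",
    "creators": ["Kask, M."],
    "description": "Anonymised responses to a household questionnaire.",
    "license": "",
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
}

if __name__ == "__main__":
    print("Missing fields:", missing_metadata(sample_record))
    print("Worth downloading:", worth_downloading(sample_record))
```

A record that fails the check is not necessarily unusable, but the gaps show what to ask the authors or the repository about before investing time in the data.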
The following article provides some tips for effective data retrieval; you can delve into each point when you open the article:
Gregory K, Khalsa SJ, Michener WK, Psomopoulos FE, de Waard A, Wu M (2018) Eleven quick tips for finding research data. PLoS Comput Biol 14(4): e1006038. https://doi.org/10.1371/journal.pcbi.1006038
- Tip 1: Think about the data you need and why you need them.
- Tip 2: Select the most appropriate resource.
- Tip 3: Construct your query strategically.
- Tip 4: Make the repository work for you.
- Tip 5: Refine your search.
- Tip 6: Assess data relevance and fitness-for-use.
- Tip 7: Save your search and data-source details.
- Tip 8: Look for data services, not just data.
- Tip 9: Monitor the latest data.
- Tip 10: Treat sensitive data responsibly.
- Tip 11: Give back (cite and share data).
EOSC
Searching for open research data is currently a rather long and inconvenient process. To reduce this burden in the future, the European Commission is developing the European Open Science Cloud.
The idea behind the European Open Science Cloud (EOSC) is to provide a free and open virtual environment for storing, managing and reusing research data.
EOSC would consolidate and connect existing e-infrastructures for scientific data and create equal opportunities for all researchers and disciplines by providing federated search of FAIR data.
The University of Tartu Library and High Performance Computing Centre are currently actively involved in the EOSC-Nordic project.
Additional reading:
In April 2020, the UT Library hosted a virtual seminar, “Where, How and Why – Essential Connections of Research Data”.
Andreas Veispak, an Estonian, is the director of eInfrastructure and Science Cloud (Unit C.1) at the European Commission.
His slides about open science, FAIR data and EOSC: a_veispak_eosc.pptx
Video recording of the presentation: rda_seminar_06_04_2020_estonia.mp4