{"id":6,"date":"2024-04-04T07:10:46","date_gmt":"2024-04-04T04:10:46","guid":{"rendered":"https:\/\/sisu.ut.ee\/datasearch\/data-search-0\/"},"modified":"2024-04-04T07:10:54","modified_gmt":"2024-04-04T04:10:54","slug":"data-search-0","status":"publish","type":"page","link":"https:\/\/sisu.ut.ee\/datasearch\/data-search-0\/","title":{"rendered":"Data Search"},"content":{"rendered":"<p>\n\t<strong>Where and how to search open data depends very much on the users, their research areas and information needs. Data search is based on the standard metadata published by the researcher.<\/strong>\n<\/p>\n<p>\n\tSimilar to searching in article databases, data registers can be searched by the author, title and keywords.\n<\/p>\n<p>\n\tBesides these bibliographical metadata, the type of the data is also very important:\n<\/p>\n<p>\n\t<strong>The astronomer<\/strong> needs long-term observation data, which is stored directly from the instruments in a disciplinary repository; the <strong>data is dynamic and large-scale<\/strong>.<br>Developers of <strong>artificial intelligence<\/strong> need <strong>big data<\/strong> for machine learning.<br>In <strong>medicine<\/strong>, for example, <strong>medical imaging\u00a0files and 3D images<\/strong> are needed, not to mention the patients\u2019 <strong>health data<\/strong>.<br>In <strong>archeology, field\u00a0diaries, photographs, artifacts<\/strong> are of interest.<br><strong>Social scientists<\/strong> are interested in <strong>questionnaires, survey data, interviews and video materials<\/strong>.<br>In the <strong>humanities<\/strong>, research is often based on previously published <strong>publications and manuscripts<\/strong>.\n<\/p>\n<p>\n\tUsually, several different types of data are collected in one and the same research project. 
For example, when studying hurricanes, the data types include videos, images, location data, tables with measurement results, etc.\n<\/p>\n<h2>\n\tMetadata<br>\n<\/h2>\n<blockquote>\n<p>\n\t\tMetadata is data about data.<br>Metadata provides context and provenance for research data.\n\t<\/p>\n<\/blockquote>\n<p>\n\tThere are different\u00a0types of metadata, but for search, the <strong>descriptive<\/strong>\u00a0<strong>bibliographical metadata<\/strong> mentioned above proves to be the most important:\n<\/p>\n<ul>\n<li>\n\t\tauthor\n\t<\/li>\n<li>\n\t\ttitle\n\t<\/li>\n<li>\n\t\tkeywords\n\t<\/li>\n<li>\n\t\tyear of publication\n\t<\/li>\n<\/ul>\n<p>\n\tOnce a database of interest has been identified on the basis of these characteristics, <strong>technical metadata<\/strong> should be considered:\n<\/p>\n<ul>\n<li>\n\t\tdata types\n\t<\/li>\n<li>\n\t\tfile sizes\n\t<\/li>\n<li>\n\t\thow the files are organized\n\t<\/li>\n<li>\n\t\twhether there are encrypted files\n\t<\/li>\n<li>\n\t\twhat software has been used\n\t<\/li>\n<\/ul>\n<p>\n\t<strong>Administrative metadata<\/strong> provides information on how the database can be reused:\n<\/p>\n<ul>\n<li>\n\t\tproject and responsible executors\n\t<\/li>\n<li>\n\t\towner of the data\n\t<\/li>\n<li>\n\t\tlicenses\n\t<\/li>\n<li>\n\t\taccess restrictions\n\t<\/li>\n<li>\n\t\tembargo period\n\t<\/li>\n<li>\n\t\tcontacts\n\t<\/li>\n<\/ul>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t<strong>Each database is accompanied by a text file, README.txt, which describes the database in\u00a0natural language.<\/strong> In many ways, this file repeats the metadata, but goes deeper into the data descriptions with the aim of making the\u00a0database understandable to other researchers. 
It may explain the file-naming principles, the file structure, encodings, and special file formats.<br>The README.txt file also describes the research methods, the hardware and software, and the instruments used and their specifications, so that the research can be reproduced.<br>Long-term storage and data sharing\u00a0are described in more detail, especially if the data cannot be shared for some reason or access has been restricted.<br>The file should list all the standards used (data standards, metadata standards, security standards, etc.).\n<\/p>\n<p>\n\tMetadata can be used to determine whether FAIR data is human-readable and machine-readable at the same time.<br>With this information, you can decide whether the database is likely to be useful and only then start downloading the data.\n<\/p>\n<p>\n\t<strong>Metadata standards<\/strong>\n<\/p>\n<p>\n\t<strong>Metadata is structured, machine-readable information; such information is easy to standardize and process on a computer, which is the basis of how a search engine works. The more thoroughly metadata describes a dataset, the easier the dataset is to find and understand.<\/strong>\n<\/p>\n<p>\n\tBecause data from different research fields\u00a0differ greatly, different characteristics are needed to describe them.<br>Take phonetic research, for example. The data include audio recordings of a speaker of a specific language, which can later be studied from many angles. 
In addition to subject metadata (language, dialect), the metadata could also include:\n<\/p>\n<ul>\n<li>\n\t\tinformation about the speaker (gender, age, place of residence, origin, social status, state of health)\n\t<\/li>\n<li>\n\t\tinformation about recording conditions (weather, background noise, distractions)\n\t<\/li>\n<li>\n\t\ttechnical information (storage devices, software, quality indicators)\n\t<\/li>\n<\/ul>\n<p>\n\tBased on this metadata, for example, an ethnologist can decide that this data is also useful in ethnological research.\n<\/p>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\tSuch domain-specific features are collected and structured in professional metadata standards.\n<\/p>\n<blockquote>\n<p>\n\t\tA\u00a0metadata standard\u00a0is a requirement which is intended to establish a common understanding of the data.\n\t<\/p>\n<\/blockquote>\n<p>\n\tMany registers allow you to limit search results to a metadata standard, so it\u2019s a good idea to be aware of the metadata standards in your field.<br>Some examples of metadata standards:\n<\/p>\n<p>\n\t<a data-url=\"http:\/\/www.ddialliance.org\/\" href=\"http:\/\/www.ddialliance.org\/\" target=\"_blank\" title=\"\" rel=\"noopener\">DDI \u2013 Data Documentation Initiative<\/a>: standard for social sciences and economics<br><a data-url=\"http:\/\/www.spase-group.org\/data\/\" href=\"http:\/\/www.spase-group.org\/data\/\" target=\"_blank\" title=\"\" rel=\"noopener\">SPASE Data Model<\/a>: heliophysics (space physics)<br><a data-url=\"http:\/\/fged.org\/projects\/miame\/\" href=\"http:\/\/fged.org\/projects\/miame\/\" target=\"_blank\" title=\"\" rel=\"noopener\">MIAME\u00a0standard<\/a>: DNA microarray experiments<br><a data-url=\"https:\/\/www.historicengland.org.uk\/images-books\/publications\/midas-heritage\/\" href=\"https:\/\/www.historicengland.org.uk\/images-books\/publications\/midas-heritage\/\" target=\"_blank\" title=\"\" rel=\"noopener\">MIDAS-Heritage<\/a>:\u00a0standard for cultural heritage objects (buildings, sites, 
shipwrecks, parks, gardens, artifacts).\n<\/p>\n<p>\n\tIn addition to subject-specific\u00a0standards, more general standards have been developed to meet the needs of a very large number of users.<br>Probably the best known of these is the <a data-url=\"http:\/\/dublincore.org\/groups\/tools\/\" href=\"http:\/\/dublincore.org\/groups\/tools\/\" target=\"_blank\" title=\"\" rel=\"noopener\">Dublin Core standard<\/a>, which is easy to understand and implement in information systems. The Dublin Core standard is also used by the data repository DataDOI managed by the UT library; for example, see the metadata of a dataset: <a href=\"http:\/\/dx.doi.org\/10.15155\/re-34\">http:\/\/dx.doi.org\/10.15155\/re-34<\/a>\n<\/p>\n<p style=\"text-align: center\">\n\t<img loading=\"lazy\" decoding=\"async\" width=\"686\" height=\"840\" class=\"alignnone wp-image-12\" src=\"https:\/\/sisu.ut.ee\/wp-content\/uploads\/sites\/487\/capture19.png\" title=\"capture19.png\" alt=\"DataDOI_eng\" srcset=\"https:\/\/sisu.ut.ee\/wp-content\/uploads\/sites\/487\/capture19.png 686w, https:\/\/sisu.ut.ee\/wp-content\/uploads\/sites\/487\/capture19-245x300.png 245w\" sizes=\"auto, (max-width: 686px) 100vw, 686px\">\n<\/p>\n<h2>\n\tWhere to find data<br>\n<\/h2>\n<p>\n\t<strong>First of all, you should think about where and how to look for data, and plan a strategy.<\/strong> There are several ways to access research data, and you need to be able to recognize and use these possibilities. In general, data is stored in data repositories, which we look at in the next section. 
Besides data repositories and data registers, information about data availability\u00a0can also be found in academic\u00a0journals.\n<\/p>\n<p>\n\t<strong>Information about data in an article<\/strong>\n<\/p>\n<p>\n\tMany research funders and\u00a0publishers require that the data underlying an article be published together with the article, so the easiest approach is to <strong>find out whether the article and data are linked.<\/strong> Persistent identifiers for the article and the data are used for linking, and the data identifier leads directly to the data.<br>The data, methods and code can be found in the article as <em>supplemental material<\/em> or <em>supporting information<\/em>, or explicitly in the <em>Data and code availability<\/em> section.<br>Some academic publishers, such as Taylor &amp; Francis, require a <em>Data Availability Statement<\/em>\u00a0(DAS) with the article: a data availability statement (also sometimes called a \u2018data access statement\u2019) specifies the conditions under which the data associated with a paper can be accessed. 
Such statements also include links (where applicable) to the dataset.\n<\/p>\n<p>\n\tAn example of linked data from PLOS ONE:\u00a0<a href=\"https:\/\/doi.org\/10.1371\/journal.pone.0230416\" target=\"_blank\" title=\"\" rel=\"noopener\">https:\/\/doi.org\/10.1371\/journal.pone.0230416<\/a>\n<\/p>\n<table class=\"table table-hover\" align=\"center\" border=\"1\" cellpadding=\"1\" cellspacing=\"1\">\n<tbody>\n<tr>\n<td>\n\t\t\t\t<img loading=\"lazy\" decoding=\"async\" alt=\"PLOS ONE\" height=\"350\" src=\"https:\/\/sisu.ut.ee\/sites\/default\/files\/andmeotsing\/files\/capture4.png\" title=\"\" width=\"478\">\n\t\t\t<\/td>\n<td>\n<p>\n\t\t\t\t\t\u00a0\n\t\t\t\t<\/p>\n<p>\n\t\t\t\t\t\u00a0\n\t\t\t\t<\/p>\n<p>\n\t\t\t\t\t\u00a0\n\t\t\t\t<\/p>\n<p>\n\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" alt=\"citation\" height=\"294\" src=\"https:\/\/sisu.ut.ee\/sites\/default\/files\/andmeotsing\/files\/capture5.png\" title=\"\" width=\"539\">\u00a0\n\t\t\t\t<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t<strong>Data journals<\/strong>\n<\/p>\n<p>\n\t<strong>Data journals publish peer-reviewed data articles<\/strong>, i.e. articles that describe data rather than the results of data analysis. This type of article gives researchers the opportunity to describe their dataset\u00a0in more detail, for example to explain the methods of data collection. 
The data article is certainly of great benefit to researchers who would like to reuse the data, but also to the researcher who published it, as it can increase the number of citations.<br>There are several disciplinary data journals, such as:\n<\/p>\n<p>\n\t<a href=\"https:\/\/www.nature.com\/sdata\/\" target=\"_blank\" title=\"\" rel=\"noopener\">Nature Scientific Data<\/a><br><a href=\"https:\/\/bdj.pensoft.net\/\" target=\"_blank\" title=\"\" rel=\"noopener\">Biodiversity Data Journal<\/a><br><a href=\"https:\/\/brill.com\/view\/journals\/rdj\/rdj-overview.xml\" target=\"_blank\" title=\"\" rel=\"noopener\">Research Data Journal for the Humanities and Social Sciences<\/a><br><a href=\"https:\/\/openarchaeologydata.metajnl.com\/\" target=\"_blank\" title=\"\" rel=\"noopener\">Journal of Open Archaeology Data (JOAD)<\/a><br><a href=\"https:\/\/openhealthdata.metajnl.com\/\" target=\"_blank\" title=\"\" rel=\"noopener\">Journal of Open Health Data<\/a>\n<\/p>\n<p>\n\t<strong>Data repositories and data registers: see next sections<\/strong>\n<\/p>\n<h2>\n\tSuccessful data search<br>\n<\/h2>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t<strong>If the data search has led to datasets of interest, these must be thoroughly studied and their quality and reusability assessed.<\/strong><br>The README.txt file and all metadata offer much help. Delving into them reveals many good examples, but also bad ones.<br>Metadata should provide enough information for you to be sure, before downloading a dataset, that you want to explore or reuse it.\n<\/p>\n<p>\n\tThe following article provides some tips for effective data retrieval:\n<\/p>\n<p>\n\tGregory K, Khalsa SJ, Michener WK, Psomopoulos FE, de Waard A, Wu M (2018) Eleven quick tips for finding research data. 
PLoS Comput Biol 14(4): e1006038.\u00a0<a href=\"https:\/\/doi.org\/10.1371\/journal.pcbi.1006038\" target=\"_blank\" title=\"\" rel=\"noopener\">https:\/\/doi.org\/10.1371\/journal.pcbi.1006038<\/a>\n<\/p>\n<ul>\n<li>\n\t\tTip 1: Think about the data you need and why you need them.\n\t<\/li>\n<li>\n\t\tTip 2: Select the most appropriate resource.\n\t<\/li>\n<li>\n\t\tTip 3: Construct your query strategically.\n\t<\/li>\n<li>\n\t\tTip 4: Make the repository work for you.\n\t<\/li>\n<li>\n\t\tTip 5: Refine your search.\n\t<\/li>\n<li>\n\t\tTip 6: Assess data relevance and fitness-for-use.\n\t<\/li>\n<li>\n\t\tTip 7: Save your search and data-source details.\n\t<\/li>\n<li>\n\t\tTip 8: Look for data services, not just data.\n\t<\/li>\n<li>\n\t\tTip 9: Monitor the latest data.\n\t<\/li>\n<li>\n\t\tTip 10: Treat sensitive data responsibly.\n\t<\/li>\n<li>\n\t\tTip 11: Give back (cite and share data).\n\t<\/li>\n<\/ul>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t\u00a0\n<\/p>\n<p>\n\t\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Where and how to search open data depends very much on the users, their research areas and information needs. Data search is based on the standard metadata published by the researcher. 
Similar to searching in article databases, data registers can &#8230;<\/p>\n","protected":false},"author":78,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"class_list":["post-6","page","type-page","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/pages\/6","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/comments?post=6"}],"version-history":[{"count":1,"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/pages\/6\/revisions"}],"predecessor-version":[{"id":35,"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/pages\/6\/revisions\/35"}],"wp:attachment":[{"href":"https:\/\/sisu.ut.ee\/datasearch\/wp-json\/wp\/v2\/media?parent=6"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}