Citing Data

Proper citation should be a significant feature of any publication that references work created by someone else, and it should include references to any primary or secondary data sources used.

Making a data collection citable and encouraging users to cite it:

  • acknowledges the resources the author has invested
  • promotes the reproduction of research results
  • makes data easier to find
  • allows the impact of data to be tracked
  • provides a structure that recognizes and can reward data creators

There is no strict standard for citing data yet. Some publishers have started to provide guidance on citing datasets, so authors have to check each publisher's requirements when they publish.

However, there are some mandatory fields:

  • author or creator of the dataset
  • title of the data collection
  • year of publication of the dataset
  • edition or version
  • publisher (data center)
  • access information or a persistent identifier, such as a DOI

For example:

Moosus, M.; Maran, U.; (2014): Moosus, M.; Maran, U. Quantitative structure-activity relationship analysis of acute toxicity of diverse chemicals to Daphnia magna with whole molecule descriptors. SAR and QSAR in Environmental Research 2011, 22, 7-8, 757–774.; QsarDB repository (

Evans, Helen F; Channell, James ET; Sager, William W; (2005): (Tables T2, T3) Magnetostratigraphic and astronomically tuned age models for ODP Leg 198 sites; PANGAEA - Data Publisher for Earth & Environmental Science.

Lakens, D.; (2013): Heart rate changes during relived happiness and anger measured with a smartphone app relying on photoplethysmography; Technische Universiteit Eindhoven.
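As an illustration, the fields listed above can be assembled into a citation string mechanically. This is a minimal sketch assuming a simple "Creators (Year): Title. Version. Publisher. Identifier" pattern loosely modelled on the examples above; `DatasetRecord` and `format_data_citation` are hypothetical names, not part of any library, and each publisher's required style may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical container for the citation fields listed above."""
    creators: List[str]               # author(s) or creator(s) of the dataset
    title: str                        # title of the data collection
    year: int                         # year of publication of the dataset
    publisher: str                    # publisher, e.g. the data center
    version: Optional[str] = None     # edition or version, if known
    identifier: Optional[str] = None  # persistent identifier such as a DOI


def format_data_citation(record: DatasetRecord) -> str:
    """Build 'Creators (Year): Title. Version. Publisher. Identifier' (assumed pattern)."""
    # Refuse to format a citation when one of the core fields is empty.
    missing = [name for name, value in (
        ("creators", record.creators),
        ("title", record.title),
        ("year", record.year),
        ("publisher", record.publisher),
    ) if not value]
    if missing:
        raise ValueError(f"missing citation fields: {', '.join(missing)}")

    parts = [f"{'; '.join(record.creators)} ({record.year}): {record.title}."]
    if record.version:
        parts.append(f"Version {record.version}.")
    parts.append(f"{record.publisher}.")
    if record.identifier:
        ident = record.identifier
        # Render a bare DOI (prefix "10.") as a resolvable https://doi.org/ link.
        if ident.startswith("10."):
            ident = f"https://doi.org/{ident}"
        parts.append(ident)
    return " ".join(parts)


if __name__ == "__main__":
    # Metadata taken from the PANGAEA example above (no version or DOI given there).
    record = DatasetRecord(
        creators=["Evans, Helen F", "Channell, James ET", "Sager, William W"],
        title="(Tables T2, T3) Magnetostratigraphic and astronomically tuned "
              "age models for ODP Leg 198 sites",
        year=2005,
        publisher="PANGAEA - Data Publisher for Earth & Environmental Science",
    )
    print(format_data_citation(record))
```

Raising an error for missing core fields, rather than silently printing a partial reference, mirrors the idea that these fields are mandatory; version and identifier are left optional only because some of the examples above omit them.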

The DataCite registry offers citations in two ways: the most common formats (APA, Harvard, MLA, Vancouver, Chicago, IEEE, BibTeX, RIS) are available directly alongside each dataset, or the Citation Formatter can generate citations in more than 5,000 additional styles.
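A related mechanism is DOI content negotiation: requesting a DOI from https://doi.org with a specific `Accept` header returns a formatted citation instead of the dataset's landing page. The sketch below assumes Python's standard `urllib`; the function names are hypothetical, the `style` value is assumed to be a CSL style identifier such as "apa", and any DOI passed in must be a real one (the tests only build requests and never contact the network).

```python
import urllib.parse
import urllib.request

# Accept headers used for DOI content negotiation.
BIBLIOGRAPHY = "text/x-bibliography"  # formatted reference; style chosen via 'style='
BIBTEX = "application/x-bibtex"       # BibTeX entry


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare (but do not send) a request for a formatted citation of `doi`."""
    # Percent-encode unusual characters in the DOI while keeping the '/' separator.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    if style == "bibtex":
        accept = BIBTEX
    else:
        accept = f"{BIBLIOGRAPHY}; style={style}"
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text.

    doi.org redirects to the registration agency; urlopen follows the
    redirect and keeps the Accept header.
    """
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For a real dataset DOI, `fetch_citation(doi, style="ieee")` would request an IEEE-style reference and `fetch_citation(doi, style="bibtex")` a BibTeX entry; keeping request construction separate from sending makes the header logic easy to check offline.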

See also DataCite and DOI.