Research Data Management

Research data management (RDM) can help you keep your data organized, well-documented, and secure so that you can easily find, understand, share, and reuse it at any time. This guide provides a brief introduction to research data, RDM practices (for efficient data organization, documentation, storing, sharing, and RDM planning), and commonly accepted FAIR Principles. It includes recommendations for creating a data management plan and sharing data using repositories. Links in this guide will navigate you to additional information, tools, support, and resources to maximize the efficiency and quality of your research process.

Research data is any information or material that has been collected, used, or generated during the research process. Research data is needed to produce, support, or validate research findings, and it provides the evidence for published results.

Research data can take many different forms (both digital and physical), including numerical data, images, text documents, software code, audio recordings, videos, surveys, protocols, samples, and many more. Forms and specifications for data can vary across fields and disciplines (e.g., natural sciences, life sciences, social sciences, arts and humanities).

Why Manage Research Data?

Research data is a valuable resource that typically requires a lot of work, time, money, and effort to produce. Therefore, it is important to manage your data properly to keep it secure and organized. Well-managed data is easy to find, access, understand, use, or reproduce, even over time and by others. Research data management (RDM) can make your research process more efficient and it is often required or recommended by institutions, publishers, or research funders.

If you need assistance with your RDM or have any other questions, please contact us for an individual consultation. We also regularly offer a webinar “Introduction to Research Data Management”. For more information about upcoming webinars, please see our webinar schedule.

Key benefits of RDM include:

  • Organized, secure, smooth, and efficient research process, including:
    • Ability to locate, identify, and understand data quickly
    • Efficient and secure sharing of data with colleagues and collaborators
    • Improved data security (e.g., less risk of data loss, leaks, or unwanted disclosures)
    • Saving time and resources (e.g., by eliminating inefficient file searching or recollecting lost data)
    • Maintaining research integrity
    • Continuity of research in the long run for cases where the transfer of data between researchers is expected
  • Support of public data sharing (in accordance with Open Science and FAIR Principles) and related benefits:
    • Potential new uses of data (e.g., for new analyses or as a resource for education)
    • Improved replicability and reproducibility of research results
    • New opportunities for collaboration
    • Enhanced visibility and impact of research results
    • Increased transparency that helps to build trust in research findings
    • Improved validation of research results

Research Data Lifecycle

Research data passes through different stages during the research process. These stages can be described and visualized using a data lifecycle model (e.g., RDMkit, UK Data Service). Such a model is often used as a tool to help researchers map individual data stages, including data collection, processing, analysis, preservation, publishing, and reuse.

Deeper insight into each stage of the research process can reveal specific data-related requirements and identify appropriate data management practices to ensure efficient data organization, documentation, storage, and sharing. Deciding which practices and strategies to implement, when and how, should be done during the planning phase.

In reality, the research process is not necessarily so strictly ordered: individual stages may occur simultaneously, and some may be absent, depending on the nature of the data, project requirements, experiments performed, or standards in your discipline.

 
 

The FAIR Principles refer to a set of four fundamental guiding principles for research data (Findability, Accessibility, Interoperability, and Reusability) further described in fifteen detailed elements. The principles define what characteristics the data, metadata, tools, and infrastructures should have to improve data discovery and reuse. The principles are sufficiently general to be applied to a wide range of research outputs in all disciplines and do not prescribe any specific tools or technologies.

Adapted from Wilkinson, M.D. et al., 2016

Research funders increasingly require publicly funded data to be openly available (“as open as possible, as closed as necessary”) and in compliance with the FAIR Principles. Open data refers to data that is freely accessible and available for use for any purpose by anyone. There may be legitimate situations where data access needs to be restricted for legal, ethical, or security reasons (e.g., personal data protection, confidentiality, or intellectual property rights). The FAIR Principles do not require any data to be strictly open or fully available. However, when access needs to be restricted, this should be clearly specified.

There are several tools available online to help you make data “more FAIR” (e.g., FAIRification workflow, FAIRification framework, FAIRification process). You can also validate “how FAIR” your data is using various tools (e.g., F-UJI, FAIR DataSet Maturity assessment tool, FAIR data self-assessment tool, FAIR-Checker). The Data Stewardship Wizard tool shows FAIR metrics during the data management plan creation.

More information on the FAIR Principles can be found in, e.g., How to FAIR, GO FAIR, FAIRsFAIR, or FAIR Cookbook.

 
 

Research data management (RDM) is a set of practices, strategies, activities, tools, and techniques that ensure the proper organization, documentation, storage, and sharing of data during the research process. RDM helps keep your data secure and makes it easier for you and others to find, access, understand, use, or reproduce your data. RDM should cover the entire data lifecycle and is also associated with responsible data management planning.

Please note that RDM is an evolving field, and new recommendations and policies are emerging at the level of individual universities, research institutions, funders, and publishers. In addition, RDM is discipline-specific, and there may be standards in your field that need to be followed (for more information, see, e.g., FAIRsharing).

At first, RDM may seem like a lot of extra work; however, in the long run, it can save a lot of time and provide many benefits. There is no single right way to manage data; rather, you can incorporate a series of small routine practices into your work to improve the efficiency and quality of your research process. Once you choose the practices that suit you, it is essential to use them consistently.

Some RDM practices and strategies are described in the following sections.

Data Management Planning

Data management planning should start in the early stages of the research project as part of the project design. During planning, all data-related activities should be considered in detail, such as solutions for data storage and documentation, plans for data sharing, publication, and long-term preservation as well as potential legal and ethical issues. Creating a data management plan (DMP) can help address these aspects. Moreover, a DMP may be required by research funders when applying for funding.


Data Organization

Data organization includes using a logical folder structure and a consistent system for file naming and versioning to help locate and identify data easily.

For example, it is recommended that file names be short but meaningful, without spaces and special characters. Also, using version numbers or dates (always in the same format) in file names allows you to keep track of file modifications and to sort files accordingly (there are tools and software for automatic version control; e.g., Git).
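As an illustration of these naming recommendations, the following Python sketch builds a file name from a description, a date, and a version number. The function name build_file_name and the date_description_vNN pattern are assumptions made for this example, not a convention prescribed by this guide; adapt them to the practice in your team or discipline.

```python
import re
from datetime import date


def build_file_name(description: str, day: date, version: int, extension: str) -> str:
    """Return a file name such as 2024-11-28_soil-ph-readings_v02.csv."""
    # Lowercase the description, then collapse every run of characters that
    # are not a-z or 0-9 (spaces, punctuation, special characters) into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # ISO 8601 dates (YYYY-MM-DD) and zero-padded version numbers keep
    # files in chronological order when they are sorted by name.
    return f"{day.isoformat()}_{slug}_v{version:02d}.{extension.lstrip('.')}"


print(build_file_name("Soil pH readings", date(2024, 11, 28), 2, "csv"))
# prints 2024-11-28_soil-ph-readings_v02.csv
```

Note that this sketch also replaces letters with diacritics by hyphens, which may or may not be what you want.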

More tips and suggestions about file naming, versioning, and data organization are provided by, e.g., the University of Ottawa, the University of Edinburgh, RDMkit, and Mendel University.


Data Documentation

Data documentation should provide clear and complete information to ensure that the data can be understood, reused, reproduced, or replicated by you (or others, when sharing).

During a research project it is important to record all details about data collection, processing, and analysis (e.g., samples, materials, experimental methods and procedures, and instruments and software used), usually using reporting protocols and paper or electronic lab notebooks (ELNs). ELNs (e.g., Kadi4Mat, openBIS, Chemotion, eLabFTW, Jupyter Notebook, or NOMAD) are software tools that help to document, organize, store, and share the data, notes, and protocols more efficiently.

For each dataset it is a good practice to create a README file providing all relevant information about the dataset (e.g., list of files and description of their contents) and store it along with the dataset (for further details see, e.g., Cornell’s guide to writing README, Harvard Medical School’s guide, the MIT README sample and template, or the Great Learning Blog).
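A README can be written by hand, but a first draft of the file list can also be generated. The sketch below is one possible approach in Python, not a required format: the function name write_readme, the README.txt file name, and the layout of the output are assumptions made for this example. It lists every file in a dataset folder and marks files that still lack a description.

```python
import tempfile
from pathlib import Path


def write_readme(dataset_dir: Path, title: str, creator: str,
                 file_notes: dict[str, str]) -> Path:
    """Write README.txt into dataset_dir, listing every file with a short note."""
    lines = [f"Title: {title}", f"Creator: {creator}", "", "Files:"]
    for path in sorted(dataset_dir.rglob("*")):
        # Skip folders and any README.txt (for example, one left by an earlier run).
        if path.is_file() and path.name != "README.txt":
            relative = path.relative_to(dataset_dir).as_posix()
            note = file_notes.get(relative, "TODO: describe this file")
            lines.append(f"  {relative}: {note}")
    readme = dataset_dir / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Demonstration on a temporary folder with one hypothetical data file.
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp)
        (dataset / "measurements.csv").write_text("sample_id,ph\nS001,6.8\n", encoding="utf-8")
        readme = write_readme(dataset, "Example dataset", "A. Researcher",
                              {"measurements.csv": "pH value per sample"})
        print(readme.read_text(encoding="utf-8"))
```

The generated list is only a starting point; the guides and templates mentioned above describe what else a README should contain.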

Explanations of abbreviations, codes, symbols, variable names, or units of measurement used during the project can be embedded directly within data files or kept separately in a codebook or data dictionary (for guides and examples see, e.g., McGill’s Codebook cookbook or OSF guide on How to make a data dictionary).
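To make the idea of a data dictionary concrete, here is a minimal Python sketch that stores one as a CSV table and checks a data file's header against it. The column names (variable, description, unit), the example variables, and both function names are hypothetical choices for this illustration; the guides above describe fuller formats.

```python
import csv
import io

# Hypothetical entries: one row per variable used in the dataset.
DATA_DICTIONARY = [
    {"variable": "sample_id", "description": "Unique sample identifier", "unit": ""},
    {"variable": "temp_c", "description": "Sample temperature at measurement", "unit": "°C"},
]


def dictionary_to_csv(dictionary: list[dict[str, str]]) -> str:
    """Serialize the data dictionary as CSV text that can be stored with the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["variable", "description", "unit"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


def undocumented_columns(data_csv: str, dictionary: list[dict[str, str]]) -> list[str]:
    """Return the columns of a CSV data file that the dictionary does not describe."""
    header = next(csv.reader(io.StringIO(data_csv)))
    documented = {entry["variable"] for entry in dictionary}
    return [name for name in header if name not in documented]


data = "sample_id,temp_c,ph\nS001,21.5,6.8\n"
print(dictionary_to_csv(DATA_DICTIONARY))
print(undocumented_columns(data, DATA_DICTIONARY))  # prints ['ph']
```

A check like this can be run before sharing a dataset to catch variables that were added late and never documented.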

A metadata description provides information about data (including, e.g., dataset title, creator, description, and keywords), typically in a structured and defined format that makes the data findable when it is deposited in a public repository.
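The following Python sketch shows what a small structured metadata record might look like and how to check it for empty fields before deposition. The field names mirror the examples given above (title, creator, description, keywords); the record contents and the function name missing_fields are invented for illustration. Real repositories define their own metadata schema, so treat this only as a rough model.

```python
import json

# Assumed minimal set of fields, taken from the examples in the text above.
REQUIRED_FIELDS = ("title", "creator", "description", "keywords")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record for an example dataset.
record = {
    "title": "Soil pH measurements, example site",
    "creator": "A. Researcher",
    "description": "pH values measured in soil samples collected during the project.",
    "keywords": ["soil", "pH", "field measurements"],
}

print(missing_fields(record))  # prints []
print(json.dumps(record, indent=2, ensure_ascii=False))
```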


Data Storage and Security

Reliable data storage and backup systems and strategies should be in place to ensure data security and to protect data from potential loss, damage, unauthorized access, or unwanted disclosures.

It is important to distinguish between the storage of frequently accessed data that is in constant use during the active phases of a research project (data collection, processing, and analysis), and the long-term preservation of data, where further modifications are not expected (e.g., deposition in a data repository).

Data should be backed up regularly: additional copies should be stored in various locations separate from your working files and accessed only to restore the original data in case of loss or damage. One commonly used backup strategy is the 3-2-1 rule (i.e., keep 3 copies, on 2 different types of storage devices, with 1 copy off-site).
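The 3-2-1 rule can be expressed as three simple checks. The Python sketch below is an illustration only: the DataCopy class, the check_3_2_1 function, and the storage labels are hypothetical, and the sketch describes a backup plan rather than performing any backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str    # hypothetical label, e.g. "office workstation"
    media_type: str  # type of storage device, e.g. "internal disk"
    off_site: bool   # True if the copy is kept away from the main workplace


def check_3_2_1(copies: list[DataCopy]) -> list[str]:
    """Return a list of problems; an empty list means the plan meets the 3-2-1 rule."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    if len({copy.media_type for copy in copies}) < 2:
        problems.append("fewer than 2 different types of storage devices")
    if not any(copy.off_site for copy in copies):
        problems.append("no off-site copy")
    return problems


plan = [
    DataCopy("office workstation", "internal disk", off_site=False),
    DataCopy("departmental server", "network storage", off_site=False),
    DataCopy("remote institutional storage", "network storage", off_site=True),
]
print(check_3_2_1(plan))  # prints []
```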

While data is in storage, you can increase its security with appropriate access controls or data encryption.

There are several general guides that provide tips on data storage and security (e.g., the “Data storage and security” book chapter by C. Lewis) and on deciding what data to keep, for how long, and where (e.g., the “Five steps to decide what data to keep” section of the DCC guide, the “Preserving” section of RDMkit, or the Data Repositories tab).

Most universities and research institutions have their own policies or methodological guidelines on how to store and secure data (e.g., the University of Chemistry and Technology Prague, Charles University, Mendel University, Masaryk University). These often categorize data by its level of sensitivity, by special regulatory or protection requirements (e.g., legal or contractual), and by the level of potential harm caused by data disclosure. They also provide an overview of data storage options and related recommendations for individual storage systems.

For academic researchers and students at research institutions in the Czech Republic, the CESNET Association offers data storage services for research purposes, provided that users comply with its Terms of Service. Services include a storage environment for data backup, archiving, and sharing, as well as other services such as Object storage, FileSender, or ownCloud.


Data Sharing

During the research process, data is commonly shared with colleagues or collaborators working on the same project. It is also good practice to publicly share data to support research findings, ensure verification of results, enable data reusability and reproducibility, and otherwise benefit the scientific community.

In addition, data sharing is increasingly being required by research funders and individual funding programs (e.g., European Union funding programs, the Ministry of Education, Youth and Sports, the Czech Science Foundation, the Technology Agency of the Czech Republic) and by journals and publishers (e.g., Springer Nature, Wiley, PLoS). It is important to adhere to their Open Science and data sharing policies and guidelines.

Journals and publishers often require authors to deposit the data underlying a publication in an appropriate public data repository (usually when the associated manuscript is accepted for publication). An accepted publication should include a data availability statement (for examples, see Taylor & Francis or Cambridge University Press) with information on where and how to access data or an explanation of any access restrictions, if applicable.

Before sharing and publishing your data, it is important to make sure that you are allowed to do so. Any legal, ethical, contractual, or other potential issues related to your data should be considered (concerning, e.g., intellectual property rights, personal data protection, confidentiality, and security of data). There may be legitimate reasons why some data cannot be shared or can only be shared under certain conditions (e.g., with data anonymization, data sharing consent, regulated use, or restricted access).

For each item of research data, appropriate access rights should be selected that specify who can access the data and under what conditions (summarized in, e.g., CESSDA, COAR).

When sharing data publicly, an appropriate license should be selected (using, e.g., the EUDAT License Selector or the Creative Commons License Chooser) and assigned to the data. The license defines the terms of data use (summarized, e.g., by the Digital Curation Centre). In addition, when using data created by others, it is important to respect their terms of use.

To share data effectively, detailed documentation should be provided to ensure that the data can be understood and used by others.

 
 

A data management plan (DMP) is a document that summarizes all the details of research data management for an individual research project. Before starting a new project, it is important to consider all data-related aspects to ensure efficiency of the research process, avoid or minimize problems, and anticipate how potential problems might be addressed. Creation of a DMP can help ensure that data is managed properly during all stages of the project and in accordance with FAIR Principles. In addition, when applying for funding, a DMP is increasingly required as a formal part of a grant proposal or at later stages of the project.

DMP Tools and Templates

Specific DMP templates may be required or recommended by individual research funders and funding programs (e.g., Horizon Europe) or some research institutions (e.g., the J. Heyrovský Institute of Physical Chemistry).

There are several online tools (e.g., DMPonline, Argos, or Data Stewardship Wizard) that can help you prepare a DMP for any project. In addition, these tools allow you to save and edit a DMP, share it with collaborators and export a final version into a required template. For instructions on how to use these tools, you can view one of the online tutorials (e.g., EOSC CZ webinar or DSW tutorials).

RDM planning is an active process that evolves over time and can change with new research findings; accordingly, a DMP should be updated regularly.

Topics Often Included in a DMP

The structure and content of a DMP may vary according to the requirements of individual research funders or institutions. In general, a DMP should include details covering (but not limited to) the following:

  • General information
    • Project title
    • Funding information
    • Short project description (abstract)
    • Description of the research team (names, affiliations)
  • Data description
    • Origin of data (collection of new data or reuse of existing data)
    • Expected data types, file formats, and sizes
    • Purpose of new data generated and its intended use
  • Data documentation
    • File naming and versioning conventions
    • Methods, instruments, and software used to collect, process, and/or analyze data
    • Type of documentation used (e.g., README files, protocols, codebooks, lab notebooks)
    • Data quality control procedures (e.g., calibrations, repeated measurements)
  • Data storing and archiving
    • Plans for data storage and backup (including storage procedures and facilities)
    • Expected storage capacity requirements and related expenses
    • Data security and protection plan (e.g., off-site backup, recovery in case of accidents)
  • Data sharing and publication
    • Access rights (e.g., open access, restricted access, time embargo)
    • Data deposition in a repository, metadata descriptions
    • Use of persistent identifiers
    • Data licensing
  • Legal and ethical aspects of RDM
    • Potential legal and ethical issues (e.g., research involving personal, confidential, sensitive, or third-party data)
    • Relevant legal and ethical requirements (e.g., data anonymization, pseudonymization, encryption, restricted or controlled access, time embargos, collaboration agreements, ethical approval, consent of participants)
    • Compliance with laws, regulations, policies, and ethical guidelines
  • RDM roles, responsibilities, and resources
    • Expenses dedicated to RDM (e.g., costs related to storage, archiving, security, staff time and salary)
    • Roles and responsibilities assigned for RDM/data stewardship activities
    • Adherence to FAIR Principles
    • Regular DMP updates

For more details, see, e.g., the Science Europe Practical guide or the Horizon Europe DMP template.

 
 

Data Repositories

Data repositories are storage locations for long-term preservation of research data (and publications). Repositories can also facilitate data sharing and publishing by providing access to data, if applicable. Data repositories often have a predefined structure and their own rules and standards for depositing, storing, and sharing data (see also the Data Sharing section).

Data Repository Selection

Most research funders and publishers have data sharing policies. These often require the deposition of research data in an appropriate data repository to make it publicly available. Sometimes a list of recommended repositories is provided (e.g., by Springer Nature, F1000Research, PLoS). Depending on the data type and disciplinary specifications, a suitable repository can be selected by searching one of the available registries or directories (e.g., re3data or FAIRsharing).

General recommendations for choosing a suitable repository are:

Data Deposition

Before data deposition, it is important to consider several things to make the process easier. Requirements for data deposition vary across repositories, so it is advisable to get familiar with the standards and deposition process used by your chosen repository. Follow the instructions and guides and prepare all files, information about your dataset, and any related documentation required for submission to the repository.

Some considerations and common recommendations for data deposition (to enhance compliance with the FAIR Principles):

Please note that for some repositories, there may be size limits for file uploads or fees for storing a large volume of data (or other services).
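Before uploading, it can help to check a dataset folder against the limits of your chosen repository. The Python sketch below reports the total size of a folder and the files that exceed a per-file limit. The function names and the limit used in the demonstration are placeholders chosen for this example; look up the actual limits in your repository's documentation.

```python
import tempfile
from pathlib import Path


def total_size(dataset_dir: Path) -> int:
    """Return the combined size in bytes of all files under dataset_dir."""
    return sum(path.stat().st_size for path in dataset_dir.rglob("*") if path.is_file())


def files_over_limit(dataset_dir: Path, limit_bytes: int) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for each file larger than limit_bytes."""
    oversized = []
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((path.relative_to(dataset_dir).as_posix(), size))
    return oversized


if __name__ == "__main__":
    # Demonstration on a temporary folder; the 50-byte limit is a placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp)
        (dataset / "notes.txt").write_bytes(b"x" * 10)
        (dataset / "images.bin").write_bytes(b"x" * 100)
        print(total_size(dataset))            # prints 110
        print(files_over_limit(dataset, 50))  # prints [('images.bin', 100)]
```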

 
 

If you need help with RDM, there are a number of ways to get support. Some suggestions are provided below.

Home Institution Support

Most universities and research institutes provide RDM support through their own Open Science center or portal (e.g., the Czech Technical University in Prague, the University of Chemistry and Technology Prague, Charles University, Mendel University).

There may be a data steward responsible for RDM support at the institutional or research team level. Advice may also be available from librarians, lawyers, or members of the Technology Transfer Department, the Project Office/Grant Office/Project Center, or the IT Department.

NTK Support

NTK offers the following RDM support and services:

If you have any questions about RDM or related issues, please contact us for an individual consultation.

 
 

Books in the NTK Collection

  • Data stewardship for open science: implementing FAIR principles
  • Scientific data management: challenges, technology, and deployment
  • Managing research data

Your contact

Karolina Podloucká

 771 269 628

Editor: Karolina Podloucká
Last modified: 28.11.2024 11:11