Research Data Management (RDM) is primarily about organising, storing and preserving for the long term the data generated during a research project.
Although effective data management can be challenging, it brings many benefits not only to the authors of the data themselves, but also to the wider community and is an example of good scientific practice. Effective research data management thus:
- demonstrates research integrity, strengthens a researcher's reputation for honesty and care, and can lead to higher citation impact;
- makes research robust and replicable;
- helps to anticipate potential issues that may arise during the research process;
- makes writing and revising papers easier;
- makes data searchable;
- reduces the risk of having to retract an article due to data mix-up or mislabelling;
- reduces the risk of data loss;
- helps with defending the research results if they are challenged (the data can be relied upon to defend the results, or at least prove that the results were presented in good faith);
- ensures continuity of long-term projects and consistency of projects involving multiple researchers;
- ensures that the research project meets all the conditions set by funders and publishers;
- enables progress in global research through the possibility of data re-use.
Data management in a research project
Data handling needs to be addressed before the project itself begins. In the research project planning phase, researchers should think about what data they will need for their research, how they will acquire it (will they create their own data or can they use existing data?), where the data will be stored, who will take care of it, etc. A Data Management Plan (DMP) should also be developed before the project starts or in its early stages.
Once data collection begins, whether new data is generated or existing data is used, attention must be paid to data processing, storage and security. It is important that the data is carefully and correctly described (e.g. how the data was generated, what each piece of data means, clear versioning). Attention should also be paid to where the data are stored and backed up during the research and whether this storage is sufficiently secure, especially if sensitive data are involved.
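As an illustration of the description, versioning and backup points above, the following is a minimal sketch in Python, not a prescribed method. It records a short description, a version label and a SHA-256 checksum for every file in a data folder, and then checks a backup copy against that record. The function names (sha256_of, build_manifest, write_manifest, verify_copy) and the manifest layout are assumptions made for this example only; they do not come from any particular RDM tool or standard.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, description: str, version: str) -> dict:
    """Describe a data folder: what it holds, which version, one checksum per file."""
    files = {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    return {"description": description, "version": version, "files": files}


def write_manifest(manifest: dict, target: Path) -> None:
    """Save the manifest as human-readable JSON (hypothetical file layout)."""
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def verify_copy(manifest: dict, copy_dir: Path) -> list:
    """Return the relative paths that are missing or altered in a backup copy."""
    problems = []
    for rel_path, expected in manifest["files"].items():
        candidate = copy_dir / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

In use, one might call build_manifest on the working data folder whenever a new version is finalised, save the result with write_manifest, and run verify_copy against each backup location; an empty list means every recorded file is present and unchanged in the copy. The manifest should be stored outside the data folder so that it is not included in its own checksums. A manifest like this complements, but does not replace, fuller documentation of how the data was generated and what each variable means.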
As the end of a project approaches, it should be clear what will happen to the data afterwards. It is important to consider which data can be deleted and which should be preserved for the long term (we recommend reading the Five Steps for Selecting Data for Long-Term Preservation created by the Charles University Open Science Support Centre), and to consider the possibility of sharing the data. If the decision is made to make the data available, care should be taken to ensure that it is FAIR*, that privacy rules are respected and that the data is anonymised where necessary. For example, the Amnesia tool available on the OpenAIRE website can be used for anonymisation. If data is published openly**, it is advisable to license it (preferably under a Creative Commons licence) so that users know how they may use it. Whether or not the data is shared, consideration should be given to storing it in a data repository to ensure its long-term preservation.
* FAIR data is data that is managed in accordance with the FAIR principles, i.e. Findable, Accessible, Interoperable and Reusable. FAIR data may or may not be open - restricting access to data may be consistent with FAIR principles under certain conditions. Ideally, data should be open while still meeting the FAIR principles.
** Open data refers to data that is freely available online and can be reused, combined with other datasets and redistributed. Open data should be managed according to FAIR principles so that potential users can easily understand it, but even data that is not managed in this way is considered open if access to it is not restricted.
Useful Resources
- CESSDA: Data Management Expert Guide (focused on social sciences)
- OpenAIRE: A Research Data Management Handbook
- The University of Edinburgh. MANTRA: Research Data Management training (freely available online course)
- MARKOWETZ, Florian. Five selfish reasons to work reproducibly. Genome Biology [online]. 2015, 16(1) [cit. 2022-12-07]. ISSN 1474-760X. Available from: doi:10.1186/s13059-015-0850-7.
- WILKINSON, M. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data [online]. 2016, 3(160018). Available from: doi:10.1038/sdata.2016.18.
The pages were prepared using information retrieved from the website of the Charles University Open Science Support Centre.
Photo: "Research Data Management" by jannekestaaks is licensed under CC BY-NC 2.0.