Data management is a challenge, but when done well can provide great rewards. This primer is about open and FAIR data. In it, we look at some of the principles designed to promote good data management and we explain how we’re improving our support at Wiley for researchers and authors when they want (or need) to share the new data they create.
As in much of the research world, the use of data has been transformed by the revolution in computing power and technology, which has allowed the creation, calculation and sharing of data in myriad ways that were not possible before. By some calculations, the rate of scientific data output is growing at around 30%; other calculations suggest that 90% of the data in the world has been generated in the past 2 years. The possibilities for research, if this new data can be shared effectively, seem endless. They range from the simplest achievement of making redundant work less likely, to enabling studies that would otherwise be impossible. A great example of this is data from a longitudinal study in Australia, which has now contributed to 765 published papers.
However, these opportunities can only be realized if data is shared in a way that ensures its preservation. It must also be easy to find and use, something that is sadly rare in a world where data mismanagement is typical. Studies have found that 54% of the data used in published material could not be identified. Moreover, the older data becomes, the less likely it is to be available, with 80% of datasets aged over 20 years thought to have been lost. I have personal experience of this: when I wanted to use a database deposited in the UK Data Service in the year 2000, I discovered it was in an old and no longer usable form of Microsoft Access. For data to remain useful to the research community, things need to change.
What Is Being Done to Help Promote Better Data Management?
Over the past five years, researchers, publishers and other stakeholders have developed principles that, if followed, can support better data management.
Chief among these are the FAIR principles, developed in 2014. These state that, to ensure data is searchable by humans and machines, data must be:
- Findable - that data should be stored in a database that provides unique identifiers, such as DOIs, so that it is easily discoverable and can be linked to other sources.
- Accessible - that the level of accessibility or openness is clearly defined. Ideally, data should be stored open access, but this is not always possible; where it is not, how the data can be accessed should be easy to understand.
- Interoperable - that data is stored ‘interoperably’ using widely adopted standards and technical languages that allow it to be easily integrated, structured and exchanged with other data and applications. This seeks to promote the easier interaction of different datasets and overcome potential difficulties found when using datasets from different sources about the same subject, or even datasets about the same subject that might be stored in different formats with slightly different labeling.
- Reusable - that it is stored in a way that makes it easy to reuse, test, attempt to replicate and use in studies other than the one it was intended for. Ideally, this is under an open access or Creative Commons license, but at least under a license that is clear, and with evident provenance.
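To make the four headings more concrete, here is a minimal Python sketch of how a dataset's metadata might be screened against them. It is only an illustration under stated assumptions: the `DatasetRecord` fields, the `COMMON_FORMATS` list and the `fair_checklist` function are hypothetical names invented for this example, the DOI is a placeholder, and the yes/no tests are far cruder than any real FAIR assessment.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""         # unique identifier, e.g. a DOI
    access_conditions: str = ""  # statement of how the data can be accessed
    file_format: str = ""        # format the data is stored in
    license: str = ""            # license governing reuse
    provenance: str = ""         # where the data came from


# Assumed examples of widely adopted formats, not an authoritative list.
COMMON_FORMATS = {"csv", "json", "xml"}


def fair_checklist(record):
    """Return a rough yes/no answer for each of the four FAIR headings."""
    return {
        # Findable: the record carries some identifier.
        "findable": bool(record.identifier.strip()),
        # Accessible: the access conditions are stated, open or not.
        "accessible": bool(record.access_conditions.strip()),
        # Interoperable: the format is on the assumed list above.
        "interoperable": record.file_format.strip().lower() in COMMON_FORMATS,
        # Reusable: both a license and a provenance note are present.
        "reusable": bool(record.license.strip()) and bool(record.provenance.strip()),
    }


# Placeholder values; "mdb" stands for a legacy Microsoft Access file.
example = DatasetRecord(
    identifier="10.1234/example",
    access_conditions="Open access",
    file_format="mdb",
    license="CC BY 4.0",
    provenance="",
)
print(fair_checklist(example))
```

In this sketch the example record passes the findable and accessible checks but fails the other two, because it is stored in a legacy format and has no provenance note.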
Beyond FAIR, another important set of principles in data management is the FORCE11 Data Citation Principles, which provide specific recommendations intended to make the citation of data as important and normalized as citing peer-reviewed articles. The recommendations state that contributors to the data should be given credit and attribution similar to that given to co-authors, that data should be cited whenever an argument made within an article relies upon it, and that data citation should be clear enough to be understood beyond an immediate scientific community but specific enough to ensure interoperability.
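As a rough illustration of what a consistent data citation could look like in practice, the short Python sketch below assembles one from its parts. The creator, year, title, repository and identifier layout is an assumption made for this example, not a format prescribed by the principles themselves, and `format_data_citation`, the names and the DOI are all hypothetical placeholders.

```python
def format_data_citation(creators, year, title, repository, doi):
    """Build a plain-text data citation from its parts (assumed layout)."""
    # Credit every contributor, in the order given.
    names = "; ".join(creators)
    # Express the DOI as a resolvable link so readers and machines can follow it.
    return f"{names} ({year}). {title} [Data set]. {repository}. https://doi.org/{doi}"


# Placeholder values for illustration only.
citation = format_data_citation(
    creators=["Smith, J.", "Lee, K."],
    year=2019,
    title="Example survey data",
    repository="Example Repository",
    doi="10.1234/example",
)
print(citation)
```

Listing every creator reflects the recommendation to give data contributors credit comparable to co-authors, while the identifier keeps the citation specific enough to be resolved by software.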
What Is Wiley Doing to Help Promote Best Practice? How Can We Ensure That These Guidelines Are Put into Action?
The FAIR guidelines are useful, and at Wiley we are helping researchers benefit from adopting them in a variety of ways.
Our new data sharing policies (launched in November 2018) include four levels of data sharing:
- Mandates and Peer Reviews
These data policies are comparable with the Center for Open Science tiered data sharing policies from the Transparency and Openness Promotion Guidelines.
We also encourage authors to use templated standard data availability statements which promote good practice, such as the use of DOIs.
By April 2019, 90 journals had adopted the “Expects Data” policy, and a further 70 journals had gone beyond this to take on “Mandates Data”.
We’re also working closely with researchers and other stakeholders (including publishers) to create and promote practical ways to support good data management. For example, in November 2018, colleagues from the FORCE11 Data Citation Implementation Pilot (DCIP) group published a data citation roadmap. This offers practical advice to academic publishers on how to cite data effectively at all points of an article’s life cycle and aims to help make open research easier for researchers. A second example is a draft set of criteria, recently released as a preprint, to aid the selection of appropriate data repositories, which is useful for publishers who want to give advice to authors and researchers. It sets out “essential” (as well as “desirable”) qualities for data repositories, which include making sure that a data repository has clear access and reuse conditions. In the meantime, we partner with repositories like Figshare and Dryad to help make it more straightforward for authors to share data.
Each of these steps moves us closer to supporting researchers whose goal is to turn open data into FAIR data, and to explore the potential benefits of open research.
About the Author: Vanessa Moir