Open Data is the gateway to machine learning

Data-driven decisions are among the key drivers of high-impact research. The most expensive and time-consuming part of a research study is usually designing the study and collecting the data. Once the data is collected, quality checks and data analysis are the stepping stones towards data-driven research decisions.

Researchers are generally good at designing data collection, gathering high-quality data, and performing advanced analysis that gives their research a high impact. What they are often less good at is ensuring that the data can be reused. Many researchers who share their data find that making it FAIR (Findable, Accessible, Interoperable and Reusable) is a major challenge.

Because research data is often not available in open access, many new projects repeat similar data generation activities and collect similar data sets over time. The problem arises not only between organizations but also within a single organization. Even when researchers are willing to share data, it is often not findable or accessible, so fellow researchers cannot locate it. This happens most often when a researcher moves to another organization and the data stays buried on a hard disk at the previous one. These hurdles can be overcome by depositing the data in an institutional data repository and assigning it a persistent identifier such as a Digital Object Identifier (DOI).
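
To illustrate why a persistent identifier keeps a data set findable: a DOI can be turned into a link through the public doi.org resolver, no matter where the data is hosted or who currently maintains it. The Python sketch below is only an illustration under stated assumptions. The function name `doi_to_url` is hypothetical, and the DOI in the usage note is a made-up placeholder, not a real ICRISAT data set.

```python
from urllib.parse import quote

# Public DOI resolver; any registered DOI appended to it redirects
# to the current landing page of the identified object.
DOI_RESOLVER = "https://doi.org/"

# Common ways people write a DOI in emails, papers and spreadsheets.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI written in any common form.

    Hypothetical helper for illustration only.
    """
    cleaned = doi.strip()
    lowered = cleaned.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" separators.
    return DOI_RESOLVER + quote(cleaned, safe="/")
```

For example, `doi_to_url("doi:10.5555/EXAMPLE-DATASET")` returns `https://doi.org/10.5555/EXAMPLE-DATASET`. The link stays the same even if the repository behind it is reorganized, which is what makes the data set citable and locatable after its original owner has moved on.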

Treating this as a top priority, ICRISAT adopted a Data Management Policy in 2014 to close data availability gaps and promote open data, establishing Open Access at ICRISAT. Later, ICRISAT’s Statistics, Bioinformatics and Data Management (SBDM) theme took on the task of promoting good data practices across ICRISAT and advising researchers on data management workflows and open access.

SBDM provides complete data management support to researchers and projects from data design, curation and analysis to sharing.

After ICRISAT’s Data Management Policy took effect, SBDM established several data repositories for different kinds of data, including the ICRISAT Open Access Data Repository (http://dataverse.icrisat.org).

The infographic below highlights the large number of data sets that ICRISAT has stored in open access in a short span of time.

As mentioned earlier, the availability of FAIR data ensures that future research builds on past research efforts, so that young researchers do not have to start from scratch and can replicate earlier results and learn from them. Now is the time for the international community to move forward rapidly and bring more data into open access.

To raise awareness of the benefits of open data, the SBDM theme is organizing “Open Data Day 2020” on 10 March 2020. On this day, we will conduct several interactive hands-on sessions on crop ontology, FAIR principles, digital breeding data management, genomic data management and various other aspects of open data. The sessions will be broadcast online to ICRISAT staff across all locations. Data clinics will be organized in East and Southern Africa (ESA) and West and Central Africa (WCA) at a later date.


About the author:

Dr Abhishek Rathore
Theme Leader – Statistics, Bioinformatics & Data Management
EiB Module 5 – Biometrics & Bioinformatics CoP Coordinator

