Data collection, curation and quality maintenance are the stepping stones towards data-driven solutions. The availability of datasets improves decision making and guides future research in science and agriculture. With this in view, a data ‘hackathon’ was organized at ICRISAT – where substantial research activity generates enormous datasets – to strengthen the ‘data awareness’ of its researchers.
A data lifecycle has several stages – generation of raw data, transformation, curation, annotation with proper metadata, and storage in an efficient retrieval system. The data hackathon served to speed up the steps that follow raw data generation. Participants submitted 42 high-quality datasets to the ICRISAT open data repository.
The workshop elaborated on the difference between open data and data repositories, and on the steps needed to achieve FAIR (Findable, Accessible, Interoperable and Reusable) standards for datasets. It included interactive hands-on sessions in which participants standardized their datasets, created supporting metadata and annotated the datasets with relevant terms from Crop Ontology and AGROVOC. They learnt about CGIAR core metadata standards and how to prepare quality datasets that adhere to FAIR standards. They also saw how curated data are uploaded and how DOIs are generated. Special emphasis was placed on the importance of ORCID, which uniquely links researchers to their work, including datasets.
The hackathon, attended by 20 participants, was organized by Dr Abhishek Rathore, Theme Leader, Statistics, Bioinformatics and Data Management, on 4–5 September 2019 at ICRISAT, Hyderabad, under the umbrella of the CGIAR Platform for Big Data in Agriculture.