Principal Investigator: Prof. Y. Narahari
Co-Investigators: Prof. Yogesh Simmhan and Arun Kumar
Contributors: Chirag Jain
Role of the Institution in the GenomeIndia Project: Developing novel algorithms based on big data analytics for compression and decompression of Whole Genome Sequence (WGS) datasets for efficient data storage and transfer.
We have developed pipelines based on advanced bioinformatics algorithms for seamless, guaranteed lossless compression and decompression of GenomeIndia uBAM (unaligned BAM) datasets. The pipelines use parallel optimizations to achieve a 5x reduction in size (from ~50GB to ~5GB per sequence), which reduces storage and transfer costs, with a parallelized run time of 120 minutes.
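As a rough illustration of what "guaranteed lossless" with parallel processing can mean in practice, the sketch below compresses a byte buffer in independent chunks and checks that the restored data matches the original by checksum. It is not the GenomeIndia pipeline: general-purpose zlib stands in for the project's specialized algorithms, and the chunk size, worker count, and all function names are assumptions made for this example.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: 1 MiB chunks, chosen arbitrarily for illustration.
CHUNK_SIZE = 1 << 20


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the input into fixed-size pieces that can be processed independently."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_parallel(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress each chunk on a worker thread; pool.map preserves chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, split_chunks(data)))


def decompress_parallel(chunks: list[bytes], workers: int = 4) -> bytes:
    """Decompress the chunks and join them back in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def roundtrip_is_lossless(data: bytes) -> bool:
    """True when compress-then-decompress reproduces the input byte for byte."""
    restored = decompress_parallel(compress_parallel(data))
    return sha256(restored) == sha256(data)


if __name__ == "__main__":
    # Synthetic stand-in for sequence data, not a real uBAM file.
    sample = b"ACGT" * 500_000 + b"N" * 1234
    packed = compress_parallel(sample)
    print("lossless:", roundtrip_is_lossless(sample))
    print("original bytes:", len(sample))
    print("compressed bytes:", sum(len(c) for c in packed))
```

Comparing checksums before and after the round trip is one common way to back a losslessness claim. A production pipeline would stream from disk instead of holding a whole file in memory.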