Principal Investigator: Dr. Pankaj Yadav
Co-Investigators: Dr. Sushmita Jha
Contributors: Vaishnavi Jangale, Rajveer Singh Shekhawat, Soham Biswas, Samarpita Saha
Role of the Institution in the GenomeIndia Project: Develop a variant prioritization pipeline based on machine learning algorithms for genome-wide association studies (GWAS)
We developed a comprehensive and robust variant prioritization pipeline that performs data quality control and feature selection, followed by association analysis using machine learning methods such as Support Vector Regression (SVR). We tested the pipeline on both simulated and real datasets. The pipeline identifies the top-ranked SNPs using permutation scores. It then uses a variety of tools to evaluate the biological importance of the identified SNPs. These include GRASP for p-values reported in the literature, GTEx for analyzing the effect of the SNPs on gene expression, DisGeNET for gene-disease associations, and the PANTHER classification system for biological process, molecular function, and cellular component annotation. Combining these resources gives a broad examination of each associated SNP, highlighting its possible importance and contributing to a more detailed understanding of its biological implications for a given trait.
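The core steps (quality control, feature selection, SVR association, permutation-based ranking) can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes scikit-learn's `SVR` with a linear kernel and `sklearn.inspection.permutation_importance` as the "permutation scores." It also assumes illustrative QC thresholds (call rate 0.95, MAF 0.05) and uses a univariate correlation filter as a stand-in for the pipeline's feature selection step. The function names `qc_mask`, `prioritize_snps`, and `simulate` are hypothetical, and the toy data are simulated rather than drawn from any real cohort.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def qc_mask(genotypes, maf_min=0.05, call_rate_min=0.95):
    """Boolean mask of SNP columns passing call-rate and minor-allele-frequency QC.

    genotypes: (n_samples, n_snps) allele dosages coded 0/1/2, NaN = missing call.
    Thresholds are illustrative assumptions, not the project's settings.
    """
    call_rate = 1.0 - np.isnan(genotypes).mean(axis=0)
    alt_freq = np.nanmean(genotypes, axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return (call_rate >= call_rate_min) & (maf >= maf_min)


def prioritize_snps(genotypes, phenotype, k=10, n_repeats=20, seed=0):
    """Hypothetical helper: [(snp_column, importance), ...], most important first."""
    keep = np.flatnonzero(qc_mask(genotypes))
    X = genotypes[:, keep]
    # Impute remaining missing calls with the per-SNP mean dosage (phenotype-free step).
    X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    y = np.asarray(phenotype, dtype=float)

    # Hold out samples so importance is measured on data the model never saw.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

    # Feature selection stand-in (assumption): keep the k SNPs with the largest
    # absolute Pearson correlation with the phenotype, computed on the training split only.
    Xc = X_tr - X_tr.mean(axis=0)
    yc = y_tr - y_tr.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    top = np.argsort(-np.abs(corr))[:k]

    # Association model: linear-kernel Support Vector Regression.
    model = SVR(kernel="linear", C=1.0).fit(X_tr[:, top], y_tr)

    # Permutation score: mean drop in held-out R^2 when one SNP's values are
    # shuffled, which breaks that SNP's link to the phenotype.
    result = permutation_importance(
        model, X_te[:, top], y_te, n_repeats=n_repeats, random_state=seed
    )
    order = np.argsort(-result.importances_mean)
    snp_cols = keep[top][order]
    return [(int(c), float(s)) for c, s in zip(snp_cols, result.importances_mean[order])]


def simulate(n_samples=300, n_snps=50, causal=(3, 10), seed=1):
    """Toy data: two causal SNPs, one monomorphic SNP (column 0), ~1% missing calls."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, size=n_snps)
    G = rng.binomial(2, freqs, size=(n_samples, n_snps)).astype(float)
    G[:, 0] = 0.0  # monomorphic SNP: the MAF filter should remove it
    y = G[:, list(causal)].sum(axis=1) + 0.5 * rng.standard_normal(n_samples)
    G[rng.random(G.shape) < 0.01] = np.nan  # sprinkle missing genotype calls
    return G, y


if __name__ == "__main__":
    G, y = simulate()
    for snp, score in prioritize_snps(G, y)[:5]:
        print(f"SNP column {snp}: permutation importance {score:.3f}")
```

Scoring importance on a held-out split, and choosing features on the training split only, keeps the permutation scores from rewarding SNPs that were picked because they fit the same samples they are scored on. In a real analysis the ranked SNPs would then be passed to the annotation resources listed above.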