Insights from Preliminary Analysis

The consortium has completed whole genome sequencing of 10,074 samples. Of these, 9,871 samples representing 99 ethnicities passed thorough quality checks and were retained: 3,151 from CBR, 2,020 from CCMB, 2,245 from IGIB, and 2,455 from NIBMG. Joint genotyping of these 9,871 samples has been completed, and analysis of the resulting data is ongoing. Previously, in Phase 1, the team performed joint variant calling on 5,750 samples representing 69 distinct ethnicities from population groups across India. This joint genotyping was carried out independently at both NIBMG and CBR. The analyses from this exercise have shed new light on the genetic makeup of the Indian population, underscoring the importance of a population genomics project like GenomeIndia. Below, we summarize the results of our analysis.

Huge Number of Common Variants and Rare Variants

Phase 1 analysis has identified hundreds of millions of genetic variations. Many of these variants are rare in, or absent from, global genetic variant databases, highlighting the novelty of this project and of the samples being sequenced.

The genetic variants identified as common in Indian populations are candidates for future association studies aimed at finding genetic risk factors for common complex diseases. They will also be used to design and optimize new gene chips or genome-wide chips that include genetic variants relevant to Indian populations. This will be a significant improvement over most existing chips, which survey genomic locations that do not commonly vary in Indian populations.

Importantly, our dataset of millions of variants will be an essential tool for rare-disease genetic studies that aim to discover the variants underlying rare diseases. Our analysis has created a large set of 'benign' variants carried by healthy individuals. Because a variant that is relatively common in healthy individuals is unlikely to cause a rare disease, such variants can be eliminated from a list of candidate variants. Preliminary analysis of the sequence data also reveals insights into population history not detected in any previous study, made possible by the inclusion of populations that were absent from previous sequencing efforts.
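The elimination step described above, dropping variants that are relatively common in healthy individuals from a candidate list, can be illustrated with a minimal sketch. This is not the consortium's pipeline: the variant identifiers, the allele frequencies, and the 1% cutoff are all hypothetical values chosen for illustration.

```python
# Minimal sketch of frequency-based candidate filtering.
# All variant IDs, frequencies, and the cutoff below are hypothetical.

def filter_candidates(candidates, healthy_af, max_af=0.01):
    """Keep candidates whose allele frequency in healthy individuals is below max_af.

    candidates: list of variant identifiers found in a patient
    healthy_af: dict mapping variant identifier -> allele frequency in healthy individuals
    max_af:     assumed cutoff above which a variant is treated as likely benign
    """
    kept = []
    for variant in candidates:
        # A variant missing from the reference set is treated as unobserved (frequency 0).
        af = healthy_af.get(variant, 0.0)
        if af < max_af:
            kept.append(variant)
    return kept


patient_candidates = ["chr1:1000:A>G", "chr2:2000:C>T", "chr3:3000:G>A"]
healthy_allele_freq = {
    "chr1:1000:A>G": 0.12,    # common in healthy individuals, so eliminated
    "chr3:3000:G>A": 0.0004,  # rare, so retained
}

remaining = filter_candidates(patient_candidates, healthy_allele_freq)
print(remaining)
```

In practice the appropriate cutoff depends on the disease's prevalence and inheritance pattern, so the threshold here should be read as a placeholder rather than a recommendation.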

Medical Relevance of the Identified Variants

After joint calling of 5,750 samples, we annotated all variants in the dataset with various metrics. These annotations let us estimate the functions of these variants and predict their impact on disease or complex phenotypes. We get a glimpse of medically relevant findings in genes implicated in familiar hereditary conditions such as cholesterolemia and heart disease: we find novel, likely functional variants that may increase disease risk, pointing to important functional consequences for the Indian population.


Our genes affect our inherent ability to metabolize drugs and modulate drug response. Pharmacogenomics is the discipline that studies how genetic variations affect an individual's biological response to drugs. Previous studies have identified and catalogued a set of such genes and variants in the PharmGKB database. Of these, 118 variants, categorized as Level 1A and 1B, most adversely impact an individual's response to drugs.

A high frequency of any one of these variants in a population would imply that certain drugs are ineffective for many individuals in that population, which makes it a public health issue. We observe that many populations in India carry a substantial proportion of these variants, which will reduce the efficiency and efficacy of anticoagulant, anti-retroviral, and anti-viral drugs.
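To make the notion of a variant's frequency in a population concrete, the sketch below computes an alternate-allele frequency from genotype counts for a diploid site and flags populations above a cutoff. The population names, genotype counts, and the 20% cutoff are hypothetical and are not taken from the GenomeIndia data.

```python
# Minimal sketch: per-population allele frequency from genotype counts.
# Population names, counts, and the cutoff are hypothetical.

def allele_frequency(hom_ref, het, hom_alt):
    """Alternate-allele frequency at a diploid site.

    Each individual carries two alleles; heterozygotes contribute one
    alternate allele and alternate homozygotes contribute two.
    """
    total_alleles = 2 * (hom_ref + het + hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    return (het + 2 * hom_alt) / total_alleles


# (hom_ref, het, hom_alt) counts for one pharmacogenomic variant
genotype_counts = {
    "population_A": (80, 18, 2),
    "population_B": (45, 40, 15),
}

freqs = {pop: allele_frequency(*counts) for pop, counts in genotype_counts.items()}

HIGH_FREQ_CUTOFF = 0.20  # assumed threshold for "high frequency"
flagged = [pop for pop, af in freqs.items() if af >= HIGH_FREQ_CUTOFF]

for pop, af in freqs.items():
    print(f"{pop}: {af:.2f}")
print("flagged:", flagged)
```

Comparing such frequencies across populations is one simple way to see where a drug-response variant is common enough to matter for prescribing at the population level.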