Insights from Preliminary Analysis

In Phase 1, the team performed joint variant calling on 5,750 samples: 2,587 from CBR, 1,055 from CCMB, 572 from IGIB, and 1,536 from NIBMG. These 5,750 individuals represent 69 distinct ethnicities within population groups across India. Joint genotyping was performed independently at both NIBMG and CBR. The analyses from this exercise have shed new light on the genetic makeup of the Indian population, underscoring the importance of a population genomics project like GenomeIndia. Below, we summarize the results of our analysis.

Huge Number of Common Variants and Rare Variants

Phase 1 analysis has identified hundreds of millions of genetic variations. Many of these variants are rare in, or absent from, global genetic variant databases, highlighting the novelty of both this project and the samples being sequenced.

The genetic variants identified as common in Indian populations are candidates for future association studies that seek genetic risk factors for common complex diseases. They will also be used to design and optimize new gene chips, or genome-wide chips, that include genetic variants relevant to Indian populations. This will be a significant improvement over most existing chips, which survey genomic locations that do not commonly vary in Indian populations.

Importantly, our dataset of millions of variants will be an essential tool for rare-disease studies that aim to discover the genetic variants underlying rare diseases. Our analysis has created a large set of 'benign' variants carried by healthy individuals. Because a variant that is relatively common in healthy individuals is unlikely to cause a rare disease, such variants can be eliminated from a list of candidate variants. Preliminary analysis of the sequence data also reveals insights into population history that were not detected in any previous study, enabled by the inclusion of populations that were absent from previous sequencing efforts.
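The elimination step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's pipeline: the function name filter_candidates, the 1% frequency cutoff, and the example variants and frequencies are all hypothetical and chosen purely to show the idea.

```python
from typing import Dict, List, Tuple

# A variant is keyed as (chromosome, position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]


def filter_candidates(
    candidates: List[VariantKey],
    cohort_af: Dict[VariantKey, float],
    max_af: float = 0.01,  # assumed cutoff, not a value from the project
) -> List[VariantKey]:
    """Drop candidates that are relatively common in the healthy cohort.

    A variant missing from cohort_af is treated as having frequency 0.0,
    so it stays on the candidate list.
    """
    kept = []
    for variant in candidates:
        if cohort_af.get(variant, 0.0) < max_af:
            kept.append(variant)
    return kept


# Made-up allele frequencies in a healthy reference cohort (illustration only).
cohort_af = {
    ("1", 1000, "A", "G"): 0.15,    # common in healthy individuals
    ("2", 2000, "C", "T"): 0.0004,  # rare in healthy individuals
}

# Made-up candidate variants from a hypothetical rare-disease patient.
patient_candidates = [
    ("1", 1000, "A", "G"),
    ("2", 2000, "C", "T"),
    ("3", 3000, "G", "A"),  # never seen in the cohort
]

remaining = filter_candidates(patient_candidates, cohort_af)
print(remaining)
```

In this toy input, the variant that is common in the healthy cohort is removed, while the rare and the previously unseen variants remain for follow-up. A suitable cutoff in practice would depend on the disease and its inheritance model.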

Medical Relevance of the Identified Variants

After joint calling of the 5,750 samples, we annotated all variants in the dataset with various metrics. These annotations let us estimate the functions of the variants and predict their impact on disease or complex phenotypes. We get a glimpse of medically relevant findings in genes implicated in familiar hereditary conditions such as hypercholesterolemia and heart disease: novel, likely functional variants that potentially increase disease risk, pointing to important functional consequences for the Indian population.

Pharmacogenomics

Our genes affect our inherent ability to metabolize drugs and modulate drug response. Pharmacogenomics is the discipline that studies how genetic variations affect an individual's biological responses to drugs. Previous studies have identified and catalogued a set of such genes and variants in the PharmGKB database. Of these, 118 variants, categorized as Level 1A and 1B (PharmGKB's highest levels of evidence), have the best-supported impact on an individual's response to drugs.

A high frequency of one of these variants in a population would imply that certain drugs are less effective for many individuals in that population, which makes it a public health issue. We observe that many populations in India carry a substantial proportion of these variants, which can reduce the efficiency and efficacy of anticoagulant, anti-retroviral, and anti-viral drugs.
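To make "high frequency in a population" concrete, the sketch below computes the alternate-allele frequency of a single variant from genotype counts in each population and flags populations above a cutoff. It is an illustration only: the population labels, the genotype counts, and the 5% cutoff are made up, and the function names are hypothetical rather than part of any project tool.

```python
from typing import Dict, List, Tuple


def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Alternate-allele frequency at a diploid site.

    Each individual carries two alleles: heterozygotes contribute one
    alternate allele and alternate homozygotes contribute two.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    return (n_het + 2 * n_hom_alt) / total_alleles


def flag_populations(
    counts: Dict[str, Tuple[int, int, int]],
    min_af: float = 0.05,  # assumed cutoff for "high frequency"
) -> List[str]:
    """Return the populations whose allele frequency is at least min_af."""
    return [
        population
        for population, (hom_ref, het, hom_alt) in counts.items()
        if allele_frequency(hom_ref, het, hom_alt) >= min_af
    ]


# Made-up genotype counts (hom-ref, het, hom-alt) for one variant.
genotype_counts = {
    "Pop_A": (80, 18, 2),  # (18 + 2*2) / 200 alleles
    "Pop_B": (95, 5, 0),   # 5 / 200 alleles
}

flagged = flag_populations(genotype_counts)
print(flagged)
```

With these toy counts, only Pop_A exceeds the assumed cutoff. A real analysis would also need to account for sample size and relatedness within each population before drawing conclusions.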