Human Genomics Research Has a Diversity Problem

Jonathan Lambert, NPR, March 21, 2019


But the studies that link genetic markers with disease focus largely on white European populations and neglect other races and ethnicities, according to an analysis published in the journal Cell on Thursday. The researchers argue this lack of diversity in genomic studies harms our scientific understanding of the genetic underpinnings of disease in all populations and exacerbates health care inequities.

The analysis reports that 78 percent of all individuals included in genomic studies of disease up to 2018 were of European descent, 10 percent Asian, 2 percent African, 1 percent Hispanic, and less than 1 percent for all other groups.


Ignoring genomic diversity can mean missing out on information that could benefit all. For example, the authors of the study point to PCSK9, a gene important for regulating cholesterol. Studying mutations that occurred in West African populations provided extra insight into the underlying biology and led to a new class of drugs that benefit people of all races.


Polygenic diseases vastly outnumber Mendelian diseases, making them a top research priority. But for a researcher, trying to identify the genes involved in a polygenic disease is like looking for an unknown number of needles in an enormous haystack.

Imagine our genome as a long line of about 3 billion base pairs, the letters that make up our genetic code. A researcher can use genetic markers, present in most people, to orient herself. These markers pop up at somewhat regular intervals across the whole line of letters.

Our researcher can then conduct what’s known as a genome-wide association study or GWAS, where she sequences these genetic markers in thousands of people, some portion of whom have a given disease. To home in on disease-causing genes, she looks for markers that keep popping up in people with the disease. If a marker is strongly associated with presence of the disease, the researcher infers that a disease gene must be nearby.
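To make the marker-by-marker comparison concrete, here is a minimal sketch in Python of the kind of test a GWAS repeats for every marker. It is an illustration under stated assumptions, not the method used in the Cell analysis: the marker names and counts are invented, and the test is a plain Pearson chi-square on a two-by-two table of cases and controls. Real studies use far larger samples, correct for ancestry and other confounders, and demand much stricter significance thresholds because millions of markers are tested at once.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
    [[a, b], [c, d]]: rows are cases/controls, columns are
    carries-the-marker-variant / does-not."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # a degenerate table carries no association signal
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical toy data, invented for illustration:
# marker -> (cases with variant, cases without,
#            controls with variant, controls without)
marker_counts = {
    "marker_A": (30, 70, 28, 72),
    "marker_B": (60, 40, 25, 75),
    "marker_C": (45, 55, 50, 50),
}

# Score every marker, then pick the one most strongly tied to disease status.
scores = {name: chi_square_2x2(*counts) for name, counts in marker_counts.items()}
top_marker = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: chi-square = {score:.2f}")
print("Strongest association:", top_marker)
```

In this toy data only marker_B stands out, since it is much more common in people with the disease than in people without it. That is the signal that would send a researcher looking for a disease gene nearby.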

This conclusion is possible because letters that are close together tend to be linked and inherited as a block that is passed down the generations. The blocks can vary in size, but in general if a marker is associated with a disease, geneticists assume the disease-causing gene is in the same block.

But the authors of this analysis argue that inference can be faulty when comparing markers across different ethnic populations for two reasons. One is that the genes themselves may have changed, either through selection or random chance, in different populations.

For example, Tishkoff cites a gene that’s strongly associated with non-diabetic kidney disease. This condition is rare among Europeans, but more common among West Africans. Researchers pinpointed two mutations in a gene that seems to be associated with this disease, and further research suggested that these mutations appear at higher frequency in West African populations because they confer some protection against sleeping sickness. Tishkoff says that if we’d only considered European variation, we’d have missed this example of how disease-causing genes can also be beneficial in some environments.


Populations with more diversity tend to have smaller blocks of the genome that are linked together, according to Tishkoff. But that blocking pattern can change during a migration event.


These different patterns of linkage can spell trouble for comparing across populations, as the markers associated with a disease-causing gene in European populations might exist in a totally different part of the genome in African or Hispanic populations, according to Tishkoff. A marker that accurately tagged a gene that increased risk of heart disease in Europeans might be miles away, genomically speaking, from that same gene in other populations, rendering the marker meaningless.
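One common way to quantify how well a marker "tags" a nearby variant is the squared correlation between the two, usually written r². The Python sketch below computes it for two invented populations. The haplotype counts and population labels are hypothetical and do not come from the study; they only illustrate how the same marker can track a disease variant closely in one group and barely at all in another.

```python
def r_squared(haplotypes):
    """Squared correlation (r^2) between a marker allele and a causal allele.

    haplotypes: list of (marker_allele, causal_allele) pairs, each 0 or 1,
    one pair per chromosome sampled.
    """
    n = len(haplotypes)
    p_marker = sum(m for m, _ in haplotypes) / n
    p_causal = sum(c for _, c in haplotypes) / n
    p_both = sum(1 for m, c in haplotypes if m == 1 and c == 1) / n
    d = p_both - p_marker * p_causal  # the linkage disequilibrium coefficient D
    denom = p_marker * (1 - p_marker) * p_causal * (1 - p_causal)
    return d * d / denom if denom > 0 else 0.0


# Hypothetical toy populations, 100 sampled chromosomes each.
# In population_1 the marker and the causal variant almost always travel together.
population_1 = [(1, 1)] * 40 + [(0, 0)] * 55 + [(1, 0)] * 3 + [(0, 1)] * 2
# In population_2 the linkage has been broken up, so the marker says little.
population_2 = [(1, 1)] * 12 + [(0, 0)] * 38 + [(1, 0)] * 30 + [(0, 1)] * 20

print(f"population_1 r^2 = {r_squared(population_1):.3f}")
print(f"population_2 r^2 = {r_squared(population_2):.3f}")
```

An r² near 1 means the marker is a reliable stand-in for the causal variant; an r² near 0 means that knowing the marker tells you almost nothing about it.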

Tishkoff stresses that ignoring genomic diversity means that right now, genetically informed health care is worse, in some cases, for populations of non-European descent. Polygenic risk scores for diseases, which are calibrated using genome-wide association studies and can be used to inform treatment, can be less accurate when applied to other populations, producing false positives or underestimating the risk of certain diseases.
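In its simplest form, a polygenic risk score is a weighted sum: for each marker, the number of risk alleles a person carries (0, 1 or 2) is multiplied by an effect size estimated in a GWAS, and the products are added up. The Python sketch below shows that arithmetic. The marker names, effect sizes and genotype are all invented for illustration, and real scores involve many more markers plus further calibration steps.

```python
def polygenic_risk_score(genotype, effect_sizes):
    """Weighted sum of risk-allele counts.

    genotype: marker -> number of risk alleles carried (0, 1 or 2)
    effect_sizes: marker -> per-allele weight estimated in a GWAS
    Markers missing from the genotype are counted as 0 risk alleles.
    """
    return sum(weight * genotype.get(marker, 0)
               for marker, weight in effect_sizes.items())


# Hypothetical weights, standing in for estimates from one study population.
effect_sizes = {"marker_A": 0.20, "marker_B": 0.05, "marker_C": -0.10}

# Hypothetical individual: two risk alleles at A, one at B, none at C.
genotype = {"marker_A": 2, "marker_B": 1, "marker_C": 0}

score = polygenic_risk_score(genotype, effect_sizes)
print(f"Polygenic risk score: {score:.2f}")
```

The weights are only as good as the study population they came from. If a marker does not track the same disease-causing variant in another population, its weight no longer means the same thing, and the resulting score can overstate or understate a person's risk.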


Popejoy agrees, though she emphasizes that the genetics of health disparities is only a small part of the problem. People “shouldn’t get the impression that health disparities are driven by differences in genetic structure between ethnic groups,” she says. “Environment matters and widespread systemic and structural racism that exacerbates environmental effects are more important.”


“Funding agencies need to financially encourage studying ethnically diverse populations,” Tishkoff says. “We’re already seeing the needle shifting, with initiatives like NIH’s All of Us.” That research initiative seeks to collect genomic data from diverse populations while making an effort to provide participants with their results.

