If AI is going to be the world's doctor, it needs better textbooks

A 2016 meta-analysis of 2,511 studies from around the world found that 81% of participants in genome-mapping studies were of European descent. This skew has serious real-world consequences: Researchers who download publicly available data to study disease are far more likely to use the genomic data of people of European descent than that of people of African, Asian, Hispanic, or Middle Eastern descent.

Shockingly, that 81% is actually an improvement. Alice Popejoy, now a postdoc at Stanford University, was a co-author of the 2016 study. She started the analysis after repeatedly hearing lectures citing a 2009 study that had found that 96% of participants in genome-mapping studies were of European descent. “It’s not just an ethical or moral problem, it’s really a scientific problem,” says Popejoy.

That’s because efforts to mine these flawed datasets for use in clinical settings are proliferating. Deep Genomics, for example, is developing new treatments for Mendelian disorders like Huntington’s disease and cystic fibrosis. Sophia Genetics is integrating with hospitals to analyze patient genomes and give on-site diagnoses. IBM Watson has touted genomics as a kind of silver bullet against cancer, allowing physicians (in theory) to personalize treatment like never before.

But the demographic gaps in genetic datasets could scuttle these potential AI-based tools by rendering them little better than guesswork. “The big takeaway is that we don’t know what we don’t know,” says Popejoy. Without knowing how genomes vary between populations, she says, we can’t really say how those variations affect treatment.

