Mutations can change their frequency in a population because of selection or luck, and peering back in time to figure out why specific polymorphisms persisted has proven to be a particularly difficult scientific challenge. Now, research published September 13 in Cell Reports describes a tool that will likely make it easier for scientists, especially those studying the genomic roots of adaptation and disease, to do just that.

The tool, a deep learning algorithm called DeepFavored, simultaneously runs several statistical tests on existing genome-wide association study (GWAS) datasets to distinguish favored mutations (those that were the result of selection) from hitchhiking mutations that weren’t selected for but occurred alongside the favored ones. In applying the tool to three separate human populations, the researchers behind the paper, who are based at Southern Medical University in China, say they’ve identified genomic tradeoffs: mutations that were adaptive in specific environments but also made people more susceptible to certain diseases, or that were accompanied by hitchhiker mutations that did so.

The experts who spoke to The Scientist all found the paper’s tradeoff-specific findings to be a bit of a stretch. Identifying these so-called favored mutations is very hard, they say, and the tradeoffs between adaptation and disease susceptibility are unlikely to be as tidy or straightforward as the paper suggests. Still, they were all intrigued by the algorithm and its potential to help scientists better explore such research questions.

“I think it’s a nice hypothesis-generating paper,” says Claudia Gonzaga-Jauregui, a geneticist at the National Autonomous University of Mexico who didn’t work on the study. “If you can run these algorithms in your genomic data for your population, then maybe you can identify some loci that you might want to explore further.”

Humans: a model for studying local adaptation

The study employed existing GWAS data to examine mutations in the genomes of people whose ancestors settled in Europe, western Africa, or eastern Asia. The researchers focused on alleles in genes related to diet (such as those involved in metabolism or taste perception) and to immunity, assuming that the three geographically separated groups would have needed to adapt to different pressures from pathogens and food availability.

“Humans are a single species with populations across the globe, living in, and adapting to, every possible latitude, altitude, climate, and ecosystem. We are a great case study in adapting to varying environments,” George Washington University evolutionary genomicist Brenda Bradley, who was not involved in the study, tells The Scientist over email.

To distinguish adaptive evolution from coincidental mutations, DeepFavored simultaneously performs seven existing statistical tests that researchers already use to determine which mutations at sites of interest in GWAS data were the result of selection, which were hitchhikers, and which were unrelated and coincidental. These tests were designed to detect what are called hard sweeps: moments in evolutionary history when a haplotype (a group of genetic variants inherited together) quickly grew in prevalence shortly after it first emerged. Typically, this happens when a haplotype offers a drastic benefit to those who carried it, explains University of Chicago immunogeneticist Luis Barreiro, who didn’t work on the study. The idea is that by combining multiple tests, DeepFavored may also be able to detect soft sweeps: haplotypes that were already in circulation but became more beneficial, and more prevalent, after the local environment changed in ways that favored them. These gentler shifts in frequency, Barreiro adds, “are very, very hard to identify.”

To validate the algorithm, the researchers compared its performance with that of two other algorithms currently used to identify favored and hitchhiker mutations. Because they ran the comparison on simulated GWAS data, in which the truly favored mutations are known in advance, they could obtain a reliable measure of accuracy; DeepFavored consistently outperformed both tools.

The researchers then tested DeepFavored and the other two tools on real-world GWAS data. Each tool flagged 700 to 1,200 candidates on its own, but only 55 putatively favored mutations were identified by all three, which Barreiro says makes those 55 somewhat more credible than those spotted by just one or two of the techniques. They included the well-established hard sweep of metabolic mutations that helped people survive by extracting more from scarce food sources but that are now associated with metabolic diseases such as diabetes in areas where food is abundant. They also included soft sweeps of mutations that facilitate glucose digestion but are linked to an increased risk of melanoma and carcinoma.

“I actually think that the instrument that they’ve made is pretty cool. . . . The other tools, they’re not using as many tests, they’re not doing them at the same time,” says Jessica Brinkworth, an evolutionary immunology and genomics researcher at the University of Illinois, Urbana-Champaign, who didn’t work on the study. She adds that she appreciated how DeepFavored seems more robust than the other tools because it runs several tests simultaneously.

Indeed, error rates are high for such techniques, with both false positives and false negatives blurring findings, Barreiro cautions. Even a (hypothetical) tenfold improvement in detecting favored mutations, he adds, may not mean much in terms of the absolute number of mutations detected, given the vastness and complexity of the genome.

More data needed

Barreiro says the paper’s broad conclusions are reasonable, but he, Brinkworth, and Gonzaga-Jauregui all say they’d be more confident in the specific tradeoffs DeepFavored identified if they saw biological data from functional experiments.

“I think that most of the time when people are talking about [clear-cut] tradeoffs, it’s not necessarily real,” says Brinkworth, citing the fact that many phenotypic traits result from several gene variants acting together. “There are clearly genetic tradeoffs; protection for this, susceptibility to that. They’re very, very hard to find.”

Still, data from in vitro and, eventually, in vivo experiments could help tease out the evolutionary history of sweeps, for instance by double-checking the physiological effects of the mutations the algorithms identified. So could sequencing the genetic material immediately upstream and downstream of the locus in question, to better evaluate which variants may be the result of selection and which are hitchhiking along. Barreiro also suggests that sequencing ancient DNA samples and looking for the mutations in question could reveal which emerged when, how quickly, and in response to what environmental pressures.

“It’s not an out-of-reality hypothesis and study, but I think it lacks some additional data to really show that what they are showing is really meaningful,” says Gonzaga-Jauregui.

“The putative adaptative and hitchhiker sites identified should now be examined in more detail,” Bradley writes. “What are these genetic changes and how are they impacting the functioning of the relevant protein? And how do these vary across a wider range of populations (beyond the three included here)?”

In response to the comments about a need for functional data, study coauthor and Southern Medical University researcher Hao Zhu writes in an email to The Scientist that such experimental studies would take excessive time and money, “and the worse is, it is often infeasible to experimentally examine adaptive human evolution. . . . I think the more likely way to obtain more supporting evidence is to perform GWAS studies, especially GWAS studies of Africans . . . so we can be more sure and better measure the trade-offs of adaptive evolution and disease susceptibility for Eurasians.” Unfortunately, he notes, the field is hampered by a lack of financial support for GWAS research in African countries.