An AI Lab Partner Helps Sift Through Transcriptomics Data

Big omics datasets can be overwhelming for researchers with limited programming skills, but texting with a new AI chatbot could help them wade through their results.

Written by Kamal Nahas, PhD
A UMAP projection of a large transcriptomics dataset.

Technological advances in sequencing have fueled the “omics revolution,” making big data a staple of biological research. However, many researchers feel ill-equipped to wrangle and analyze these massive datasets and turn to bioinformaticians for help. Now, advances in artificial intelligence (AI) could make analysis less of an impediment.

Reporting in a bioRxiv preprint that has yet to undergo peer review, researchers described an AI chatbot called CellWhisperer that analyzes transcriptomics data and reports its findings in plain English.1 With it, researchers with limited computational chops can probe their dense datasets using non-technical queries, such as “What are these selected cells?” or “Describe the sample concisely.”

Last year, AI algorithms called large language models spooked the world with their ability to respond to prompts in articulate English, but some researchers have looked past the shock value and put them to work streamlining data analysis. Biologists have begun training these models on literature repositories to quickly retrieve information from publications. GeneGPT, for example, can answer questions about genes by consulting genomics databases.2 Moritz Schaefer, a bioinformatician at the Medical University of Vienna and study coauthor, wanted to harness AI to simplify the analysis of transcriptomics data. “Right now, biologists need to learn programming languages,” he noted. “We wanted to turn this around and said, ‘the computer should learn English.’”

When a bioinformatician analyzes transcriptomics data, they draw on past research for context about patterns of gene expression. For example, they check whether a group of genes is typically expressed together by comparing against historical datasets. An AI model needs access to the same resources, so Schaefer and his colleagues trained their algorithm on existing transcriptomics data: 20,000 studies from the Gene Expression Omnibus and nearly 400,000 human transcriptomes from the CELLxGENE Census.3,4 Together, these repositories gave the AI tool the training material it needs to recognize a cell type or a disease state from its gene expression patterns.
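To make the idea of cross-comparing a sample against historical datasets concrete, here is a minimal Python sketch of correlation-based reference matching. It is a much simpler, classical technique than the trained AI model described in this story and is not CellWhisperer's method. The gene list, the reference profiles, and the query sample are invented toy values; real analyses work with thousands of genes, normalized counts, and far larger reference collections.

```python
import math

# Toy reference profiles: mean expression of a few marker genes per cell type.
# All values are hypothetical and chosen only to illustrate the idea.
GENES = ["INS", "GCG", "SFTPC", "KRT14", "ADIPOQ"]
REFERENCE = {
    "pancreatic beta cell": [9.0, 1.0, 0.1, 0.2, 0.1],
    "pancreatic alpha cell": [1.5, 8.5, 0.1, 0.2, 0.1],
    "lung alveolar type II cell": [0.1, 0.1, 9.5, 0.3, 0.2],
    "skin keratinocyte": [0.1, 0.1, 0.2, 9.0, 0.1],
    "adipocyte": [0.1, 0.1, 0.2, 0.3, 8.0],
}


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def annotate(profile, reference=REFERENCE):
    """Rank reference cell types from most to least correlated with a query profile."""
    scores = {label: pearson(profile, ref) for label, ref in reference.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# A hypothetical unlabeled sample, with values in the same gene order as GENES.
query = [8.2, 2.0, 0.3, 0.1, 0.2]
ranking = annotate(query)
best_label, best_score = ranking[0]
print(f"best match: {best_label} (r = {best_score:.2f})")
```

Running it prints the best-matching reference label and its correlation coefficient. A fixed lookup like this only knows the profiles it is handed, which is part of why models trained on large public repositories are appealing for the same task.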

To make their tool even more user-friendly, they paired their trained model with an AI chatbot that could respond to prompts written in English. They turned to Mistral 7B, an open-source large language model that can be fine-tuned, and customized it using more than 100,000 examples of conversational questions and answers about transcriptomics data.5 Simple queries included “Give a brief description of these cells,” whereas more complex prompts asked the model to list the most prominently expressed genes or the most active cellular pathways. The result was an AI chatbot adept at discussing transcriptomics, which the team made publicly available in October of this year.
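The story does not describe how the 100,000-plus training conversations were formatted. As a hedged illustration only, the Python sketch below shows one common way to store question-and-answer pairs for fine-tuning a chat model: each example holds alternating user and assistant turns and is written out one per line as JSON Lines. The class names, the `sample_id` link to a transcriptome, and the answer text are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    content: str


@dataclass
class Conversation:
    sample_id: str  # hypothetical ID pointing at one transcriptome sample
    turns: list = field(default_factory=list)


def make_qa(sample_id: str, question: str, answer: str) -> Conversation:
    """Build a single-turn question-and-answer example about one sample."""
    return Conversation(sample_id, [Turn("user", question), Turn("assistant", answer)])


def is_well_formed(conv: Conversation) -> bool:
    """Check that turns alternate user/assistant, starting with the user."""
    expected = ["user", "assistant"] * (len(conv.turns) // 2 + 1)
    return bool(conv.turns) and [t.role for t in conv.turns] == expected[: len(conv.turns)]


def to_jsonl(convs: list) -> str:
    """Serialize conversations as JSON Lines, one example per line."""
    return "\n".join(json.dumps(asdict(c)) for c in convs)


examples = [
    # A simple query modeled on the prompts quoted in the story; the answer is invented.
    make_qa("sample_0001", "Give a brief description of these cells.",
            "Hypothetical answer: these look like pancreatic islet cells."),
    # A more complex query asking for the most prominently expressed genes.
    make_qa("sample_0002", "List the most prominently expressed genes.",
            "Hypothetical answer: GENE_A, GENE_B, GENE_C."),
]

assert all(is_well_formed(c) for c in examples)
print(to_jsonl(examples))
```

This sketch covers only the text side of the training data; a working system would also need some way to pass the transcriptome itself to the model, which is left out here.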

To take CellWhisperer for a test run, Schaefer and his team queried the model about transcriptomics studies that they had excluded from the training data. Starting with an easy task, they confirmed that the model usually identified distinct cell types from multiple organs correctly, including fat, muscle, lung, and skin.6 It had slightly more trouble distinguishing between similar cell types, namely those in the pancreas.7 The model struggled more with some transcriptomic samples from diseased cells, suggesting that the training data lacked sufficient information about these conditions. Schaefer said CellWhisperer works well with some conditions, such as certain liver cancers, but struggles more with others, such as skin melanoma.

Although CellWhisperer made correct predictions most of the time, Schaefer said that users should be aware that AI tools occasionally make errors. “It’s important to keep in mind that this AI tool is especially helpful for explorative analysis and brainstorming, and all its responses need to be cross-checked with other experiments,” he noted.

Maxim Nosenko, an immunologist at Trinity College Dublin who was not involved with the work, said, “Anyone can analyze the sequencing data using CellWhisperer, so that’s certainly a big advantage.” He added, “This tool is really timely now when there is a huge amount of sequencing data.” However, he noted that in its present form, CellWhisperer is limited to data on human cells because the researchers excluded animal data. “It is not applicable, for now at least, to mouse studies,” said Nosenko, who uses mice as a model species.

Schaefer plans to build on CellWhisperer’s capabilities. “We want to develop this further to become a semi-autonomous research assistant,” he said. Currently, CellWhisperer responds to single queries one at a time, but Schaefer hopes the tool will eventually carry out a comprehensive analysis on its own without the need for small talk.

Meet the Author

  • Kamal Nahas

    Kamal is a freelance science journalist based in the UK with a PhD in virology from the University of Cambridge. He enjoys writing about the quirky side of biology, like the remarkable extent to which we depend on our gut bacteria, as well as technological breakthroughs, including how artificial intelligence can be leveraged to design proteins. His work has also appeared in Live Science, Nature, New Scientist, Science, Scientific American, and other places. Find him at www.kamalnahas.com or on X @KLNahas.