Cedars-Sinai investigators create AI tool to analyze medical data for specific conditions like Alzheimer’s disease

July 16, 2024
AI tool’s software is free, publicly available.

A machine learning tool developed by Cedars-Sinai investigators can answer questions about genes, drugs, and biochemical pathways associated with Alzheimer’s disease and other health conditions.

Their findings were published in the journal Bioinformatics.

The study detailed how the tool, a free and publicly available software platform, analyzes and compiles data and information—including new peer-reviewed studies—to answer researchers’ queries. The key to the tool’s success is a new type of large language model, said Jason H. Moore, PhD, professor and chair of the Department of Computational Biomedicine at Cedars-Sinai and senior and corresponding author of the study.

Large language models are a specific type of AI program that can distill large amounts of data, such as medical studies, books, articles and interviews, and use that data to create new content.

“The large language model approach we developed uses knowledge stored in a special database, called a knowledge graph, that specializes in capturing the relationships between entities such as drugs and genes,” Moore said.
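To illustrate the idea of a knowledge graph, here is a minimal Python sketch that stores relationships as (subject, relation, object) triples and looks them up. It is not KRAGEN's database or code: the `KnowledgeGraph` class and every entity name (`DRUG_A`, `GENE_X`, `DISEASE_1`) are hypothetical placeholders, not real biomedical facts.

```python
from collections import defaultdict

# Hypothetical placeholder triples (subject, relation, object).
# These are illustrative only and do not describe real drugs or genes.
TRIPLES = [
    ("DRUG_A", "binds", "GENE_X"),
    ("DRUG_B", "binds", "GENE_X"),
    ("DRUG_B", "binds", "GENE_Y"),
    ("GENE_X", "associated_with", "DISEASE_1"),
    ("GENE_Y", "associated_with", "DISEASE_1"),
]


class KnowledgeGraph:
    """Toy in-memory graph indexed by (relation, object)."""

    def __init__(self, triples):
        self._by_relation_object = defaultdict(set)
        for subject, relation, obj in triples:
            self._by_relation_object[(relation, obj)].add(subject)

    def subjects(self, relation, obj):
        """Return every subject linked to `obj` by `relation`, sorted."""
        return sorted(self._by_relation_object.get((relation, obj), set()))


kg = KnowledgeGraph(TRIPLES)
drugs_for_gene_x = kg.subjects("binds", "GENE_X")
genes_for_disease = kg.subjects("associated_with", "DISEASE_1")
print(drugs_for_gene_x)
print(genes_for_disease)
```

Storing relationships explicitly is what allows a question such as "which drugs bind this protein?" to be answered by a lookup rather than by free-form text generation.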

Historically, the main challenge in using large language models to generate content has been ensuring the quality, accuracy and reliability of the generated responses.

The Cedars-Sinai approach, however, moved past this challenge by using graph-of-thoughts, a framework that allows investigators to break down a problem into subproblems and turn the information generated by the large language model into a visual graph.
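The sketch below shows one way the "break a problem into subproblems" idea can be expressed in Python: each subproblem is a node, and a final node combines the earlier answers. It is a simplified illustration under stated assumptions, not the KRAGEN implementation. The `Thought` class, `run_graph` function, and placeholder protein names are hypothetical, and `stub_model` merely echoes its prompt in place of a real large language model.

```python
from dataclasses import dataclass, field


@dataclass
class Thought:
    """One node in the graph: a prompt plus the nodes it depends on."""
    name: str
    prompt: str
    depends_on: list = field(default_factory=list)
    result: str = ""


def stub_model(prompt: str) -> str:
    """Stand-in for a large language model call; it only echoes the prompt."""
    return f"answer({prompt})"


def run_graph(thoughts, model):
    """Resolve nodes in dependency order, feeding earlier results forward."""
    done = {}
    pending = list(thoughts)
    while pending:
        ready = [t for t in pending if all(d in done for d in t.depends_on)]
        if not ready:
            raise ValueError("cycle or missing dependency in thought graph")
        for t in ready:
            context = "; ".join(done[d].result for d in t.depends_on)
            full_prompt = t.prompt if not context else f"{t.prompt} | given: {context}"
            t.result = model(full_prompt)
            done[t.name] = t
            pending.remove(t)
    return done


# A two-part question split into subproblems, then merged.
thoughts = [
    Thought("merge", "Combine the partial answers", depends_on=["sub1", "sub2"]),
    Thought("sub1", "What binds PROTEIN_1?"),
    Thought("sub2", "What binds PROTEIN_2?"),
]
results = run_graph(thoughts, stub_model)
final_answer = results["merge"].result
print(final_answer)
```

Because each subproblem is answered separately, an intermediate result can be inspected or checked before it is merged into the final response.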

The Cedars-Sinai tool also incorporates retrieval-augmented generation, or RAG, which augments large language models with external data sources that provide relevant facts and context. Together, these techniques let the tool efficiently surface accurate data and information about a range of conditions and diseases, including Alzheimer’s disease, which was the focus of the study published in Bioinformatics.
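As a rough illustration of retrieval-augmented generation, the Python sketch below picks the stored facts most relevant to a question and places them in the prompt that would be sent to a language model. It assumes a simple word-overlap score as a stand-in for whatever retrieval method a real system uses. The `retrieve` and `build_prompt` functions and all facts are hypothetical placeholders, not KRAGEN's code or data.

```python
import re

# Hypothetical placeholder facts standing in for an external data source.
FACTS = [
    "DRUG_A binds PROTEIN_1.",
    "DRUG_B binds PROTEIN_2.",
    "GENE_X is associated with DISEASE_1.",
]


def tokens(text):
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question, facts, k=2):
    """Return up to k facts sharing at least one word with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(f)), f) for f in facts]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [f for score, f in ranked[:k] if score > 0]


def build_prompt(question, facts):
    """Put the retrieved facts ahead of the question as context."""
    context = "\n".join(f"- {f}" for f in facts)
    return f"Use only these facts:\n{context}\n\nQuestion: {question}"


question = "Which drugs bind PROTEIN_1?"
retrieved = retrieve(question, FACTS)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Grounding the prompt in retrieved facts is what gives the model relevant context to draw on, instead of relying only on what it absorbed during training.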

The open-source software, called Knowledge Retrieval Augmented Generation ENgine—or KRAGEN—is publicly available on GitHub, a cloud-based platform that helps developers collaborate and manage code. To date, the software has received more than 400 endorsements from users.

To demonstrate the usability of the tool, Moore and team used KRAGEN to generate data on Alzheimer’s disease, including data on genes, drugs and other aspects related to the condition. Investigators asked the tool questions like “What drugs bind to the proteins APOE and PTAU?” and “Which are genes associated with Alzheimer’s disease?”

Instead of receiving a list of data points in answer to each question, investigators received a synthesized summary of the information.

Cedars-Sinai release on Newswise