UW–Madison researchers find persistent problems with AI-assisted genomic studies

Nov. 6, 2024

University of Wisconsin–Madison researchers are warning that artificial intelligence tools gaining popularity in the fields of genetics and medicine can lead to flawed conclusions about the connection between genes and physical characteristics, including risk factors for diseases like diabetes.

The faulty predictions are linked to researchers’ use of AI to assist genome-wide association studies. Such studies scan through hundreds of thousands of genetic variations across many people to hunt for links between genes and physical traits. Of particular interest are possible connections between genetic variations and certain diseases.
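
At its core, such a scan amounts to running a simple association test at every measured variant and keeping the ones that clear a very stringent significance threshold. A minimal sketch of that per-variant loop, using simulated data rather than anything from the study, might look like this:

```python
# Minimal sketch of the per-variant scan behind a genome-wide association
# study, on simulated data (illustrative only; not the study's analysis).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_people, n_variants = 10_000, 1_000

# Genotypes coded as 0/1/2 copies of the minor allele; only the first variant
# truly influences the trait in this simulation.
genotypes = rng.binomial(2, 0.3, size=(n_people, n_variants))
trait = 0.3 * genotypes[:, 0] + rng.normal(size=n_people)

# Test each variant for association with the measured trait.
pvals = np.array([stats.pearsonr(genotypes[:, j], trait)[1]
                  for j in range(n_variants)])

# Real studies apply a genome-wide threshold (commonly 5e-8) to account for
# the enormous number of tests.
print("variants passing 5e-8:", np.flatnonzero(pvals < 5e-8))
```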

“It has become very popular in recent years to leverage advances in machine learning, so we now have these advanced machine-learning AI models that researchers use to predict complex traits and disease risks with even limited data,” says Qiongshi Lu, an associate professor in the UW–Madison Department of Biostatistics and Medical Informatics and an expert on genome-wide association studies.

Now, Lu and his colleagues have demonstrated the peril of relying on these models without also guarding against biases they may introduce. The team describes the problem in a paper recently published in the journal Nature Genetics. In it, Lu and his colleagues show that a common type of machine learning algorithm employed in genome-wide association studies can mistakenly link several genetic variations with an individual’s risk for developing Type 2 diabetes.

“The problem is if you trust the machine learning-predicted diabetes risk as the actual risk, you would think all those genetic variations are correlated with actual diabetes even though they aren’t,” says Lu.

These “false positives” are not limited to these specific variations and diabetes risk, Lu adds, but stem from a pervasive bias in AI-assisted studies.
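
The mechanism behind such false positives is easy to reproduce in a toy simulation. The sketch below is illustrative only, not the study’s analysis: it assumes a trained predictor has absorbed a small spurious weight on a variant (here “SNP2”) that does not influence the measured trait, as readily happens when models are fit to limited data. Testing that variant against the predicted trait then flags it, while testing against the measured trait does not.

```python
# Illustrative only (not the study's analysis): a predictor with a small
# spurious weight on a non-causal variant makes that variant look strongly
# associated with the *predicted* trait.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000  # a large cohort in which only the predicted trait is "available"

# Two independent variants; only SNP1 affects the measured trait.
snp1 = rng.binomial(2, 0.3, n)
snp2 = rng.binomial(2, 0.3, n)
trait = 0.4 * snp1 + rng.normal(size=n)

# Hypothetical trained predictor: mostly driven by SNP1, plus a small spurious
# weight on the non-causal SNP2 (e.g., picked up from a modest training set).
predicted_trait = 0.35 * snp1 + 0.05 * snp2

def assoc_pvalue(genotype, phenotype):
    """Single-variant association test: correlation p-value."""
    _, p = stats.pearsonr(genotype, phenotype)
    return p

print(f"SNP2 vs. measured trait:  p = {assoc_pvalue(snp2, trait):.3g}")
print(f"SNP2 vs. predicted trait: p = {assoc_pvalue(snp2, predicted_trait):.3g}")
# The first p-value is typically unremarkable; the second is vanishingly small,
# because the prediction itself was built partly from SNP2 -- a false positive.
```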

In addition to identifying the problem with overreliance on AI tools, Lu and his colleagues propose a statistical method that researchers can use to guarantee the reliability of their AI-assisted genome-wide association studies. The method helps remove bias that machine learning algorithms can introduce when they’re making inferences based on incomplete information.
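
The paper’s exact procedure isn’t spelled out here, but the general debiasing idea can be sketched. The example below is an illustration in the spirit of post-prediction (prediction-powered) inference, not necessarily the authors’ method: a smaller group of people with both the measured and the predicted trait is used to estimate how much the prediction distorts an association estimate, and that distortion is then subtracted from the estimate computed in the large, prediction-only cohort.

```python
# Illustrative debiasing sketch (not necessarily the paper's method), reusing
# the simulated setup above: SNP2 has no true effect, but the predictor leaks
# a spurious SNP2 signal into the predicted trait.
import numpy as np

rng = np.random.default_rng(1)

def effect(genotype, phenotype):
    """Per-allele effect estimate from simple linear regression."""
    g = genotype - genotype.mean()
    return g @ (phenotype - phenotype.mean()) / (g @ g)

def simulate(n):
    snp1 = rng.binomial(2, 0.3, n)
    snp2 = rng.binomial(2, 0.3, n)
    trait = 0.4 * snp1 + rng.normal(size=n)
    predicted = 0.35 * snp1 + 0.05 * snp2   # spurious SNP2 weight, as above
    return snp2, trait, predicted

g_big, _, pred_big = simulate(200_000)         # large cohort: predictions only
g_lab, trait_lab, pred_lab = simulate(20_000)  # labeled subset: both available

naive = effect(g_big, pred_big)              # biased toward a false positive
bias = effect(g_lab, pred_lab) - effect(g_lab, trait_lab)  # estimated on labeled data
corrected = naive - bias                     # debiased association estimate

print(f"naive: {naive:.3f}   corrected: {corrected:.3f}   (true effect: 0)")
# The naive estimate sits near the spurious weight (about 0.05); the corrected
# estimate is typically close to the true value of zero.
```

In practice, methods of this kind also have to deliver valid standard errors alongside the debiased estimates, since it is the downstream significance tests that determine which variants get reported.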

University of Wisconsin–Madison release on Newswise