Dana-Farber AI model predicts primary source of cancer using gene sequencing data

Aug. 8, 2023
The tool could help guide treatment for patients in cases where traditional diagnostic methods cannot identify a primary source of the disease.

Researchers at Dana-Farber Cancer Institute have created an AI-based tool that uses tumor gene sequencing data to predict the primary source of a patient’s cancer. The study, published in Nature Medicine, suggests that this predictive tool, called OncoNPC, could help guide cancer treatment and improve outcomes in difficult-to-diagnose cases.

The primary source of cancer is traditionally identified through a standardized diagnostic work-up that includes radiology and pathology assessments, the latter based on slides of cells taken from a tumor biopsy. In 3% to 5% of cancer cases, the original source of the tumor cannot be determined.

In these cases, patients are diagnosed with cancers of unknown primary (CUP) and have few treatment options because most treatments are approved for a specific type of cancer.

The team found that the AI model’s predictions could have value for these patients. A retrospective analysis suggested that this additional piece of diagnostic information about the primary source of the tumor could help doctors select treatments that improve survival.

To build the model, the researchers trained and validated a machine learning classifier using the medical records of 36,445 patients with known primary tumors from three major cancer centers, including Dana-Farber. The records contained tumor genetic sequencing data and clinical information for each patient.

OncoNPC, short for Oncology NGS-based Primary cancer type Classifier, was tested on a subset of cases that had not been used as training data. It correctly predicted the origin of about 80% of tumors with known types, including metastatic tumors. The model made high-confidence predictions for 65% of the tumors, meaning it assessed its prediction as having a high probability of being correct. Those high-confidence predictions were 95% accurate.
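The three figures above (overall accuracy, the share of high-confidence predictions, and accuracy within that share) can be illustrated with a minimal Python sketch. It is not the researchers' code: the `summarize` function, the toy records, and the 0.8 confidence cutoff are all hypothetical and chosen only to show how such numbers are typically computed.

```python
from typing import Dict, List, Tuple

# Each record is (predicted_type, true_type, confidence). All values are made up.
Prediction = Tuple[str, str, float]


def summarize(preds: List[Prediction], threshold: float) -> Dict[str, float]:
    """Return overall accuracy, share of high-confidence calls, and accuracy among them."""
    total = len(preds)
    correct = sum(1 for pred, true, _ in preds if pred == true)
    # Keep only predictions whose confidence meets the (hypothetical) cutoff.
    confident = [(pred, true) for pred, true, conf in preds if conf >= threshold]
    confident_correct = sum(1 for pred, true in confident if pred == true)
    return {
        "accuracy": correct / total,
        "high_conf_share": len(confident) / total,
        "high_conf_accuracy": (
            confident_correct / len(confident) if confident else float("nan")
        ),
    }


# Toy held-out cases, not data from the study.
toy = [
    ("lung", "lung", 0.97),
    ("breast", "breast", 0.91),
    ("colorectal", "colorectal", 0.88),
    ("lung", "pancreatic", 0.55),
    ("breast", "breast", 0.62),
]

summary = summarize(toy, threshold=0.8)
print(summary)
```

In this toy set, restricting attention to high-confidence predictions raises accuracy at the cost of covering fewer cases, which is the same trade-off the reported 65% and 95% figures describe.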

They then applied OncoNPC to a separate database of 971 CUP tumors from patients seen at Dana-Farber, where a team of experts had already made a substantial effort to identify the primary source of the tumor. OncoNPC was able to predict the tumor’s origin with high confidence for 400 of the 971 cases (41.2%).
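The reported percentage follows directly from the two counts. As a quick arithmetic check in Python (the variable names are illustrative only):

```python
high_confidence = 400  # CUP tumors with a high-confidence prediction
total_cup = 971        # all CUP tumors in the Dana-Farber set

share = high_confidence / total_cup
print(f"{share:.1%}")  # prints 41.2%
```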

To validate these predictions, the team looked at inherited germline risks of cancer among these patients and found that the risks lined up with the predictions. Further, they reviewed specific cases closely to determine whether the data, including pathology results, patient history, and genetic mutations, supported the prediction.

To determine whether an OncoNPC prediction might have value to patients, the team examined the outcomes of a subset of the patients with CUP. Patients who received treatments that matched the predicted primary tumor site survived longer than those who received treatments that did not match the predictions. In addition, the team found that the OncoNPC predictions would enable approximately 2.2 times as many CUP patients to be matched to approved targeted medicines.

The tool has so far been studied using retrospective data only. To determine whether it could improve outcomes for patients, it would need to be tested in a clinical trial.

Dana-Farber Cancer Institute release on Newswise