AI traces the origin of metastatic cancer better than humans

AI-powered diagnostics based on tumor genomics help fight cancer of unknown origin


Most of the time, when oncologists face cancer, they know the origin of the tumor. The origin is usually easy to trace because solid tumors grow gradually from a single malignant cell in a particular tissue. Blood cancer is different because the tissue it originates from is liquid, but it is still obvious which cells initiated the process.

However, 3–5% of all cancers are designated as CUP: Cancer of Unknown Primary. Such cancers are usually diagnosed at late metastatic stages, when treatment is mostly ineffective and a fatal outcome is all but inevitable. Moreover, it is not even clear which treatment to prescribe, because the origin of the cancer is not known. CUP patients usually die within one year of diagnosis, in stark contrast to most other cancers, which usually allow for much better survival after proper diagnosis and treatment.

In the case of CUP, there is no primary tumor that could be spotted and eradicated early. A few malignant cells appear somewhere in the body and quickly evolve to an aggressive metastatic stage before ever becoming visible as a primary tumor mass.

Metastatic cancer cells usually have little in common with their benign ancestors, so it is very hard to determine the origin of CUP by means of histology or biochemistry. Advanced immunohistochemical tests provide 60–70% accuracy, which is still too low for prescribing an effective treatment. Gene expression analysis also appears to be ineffective in this respect because humans are unable to infer subtle correlations from noisy expression data.

Recently, a machine learning approach was proposed for determining the origin of CUP. The motivation was clear: ML excels at revealing hidden correlations and patterns in heterogeneous data that remain unnoticed by humans.

The researchers performed DNA sequencing of 34,352 cancers. An additional 23,137 cases underwent both genomic and transcriptomic analysis. The data were used to train the AI, which predicts the cancer origin from the genomic and gene expression data of the tumor.

This tumor classification algorithm was called “MI GPSai”. It is based on the Caris Deliberation Analytics (DEAN) framework, which comprises over 300 well-established machine learning algorithms, including random forest, support vector machine, logistic regression, K-nearest neighbor, artificial neural network, naïve Bayes, quadratic discriminant analysis, and Gaussian process models.

The MI GPSai training and validation scheme.
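The article does not say how the many models inside DEAN are combined, so the following is only a minimal sketch of one common way to pool several classifiers: soft voting, where per-class probabilities are averaged across models. The function name `soft_vote`, the tumor labels, and all numbers are illustrative assumptions, not part of MI GPSai.

```python
from collections import defaultdict


def soft_vote(model_probabilities):
    """Average per-class probabilities from several models and rank the classes.

    model_probabilities: list of dicts mapping class label -> probability,
    one dict per model (hypothetical format chosen for this sketch).
    """
    totals = defaultdict(float)
    for probs in model_probabilities:
        for label, p in probs.items():
            totals[label] += p
    n_models = len(model_probabilities)
    averaged = {label: total / n_models for label, total in totals.items()}
    # Sort classes from most to least probable.
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Made-up outputs of three models for a single tumor sample.
predictions = [
    {"urothelial": 0.6, "squamous": 0.3, "colorectal": 0.1},
    {"urothelial": 0.5, "squamous": 0.4, "colorectal": 0.1},
    {"urothelial": 0.2, "squamous": 0.7, "colorectal": 0.1},
]
ranking = soft_vote(predictions)
best_label, best_probability = ranking[0]
print(best_label, round(best_probability, 3))
```

Averaging probabilities rather than counting votes lets one very confident model outweigh two weakly confident ones, which is why the third model decides the outcome in this toy example.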

The method was validated extensively on 19,555 real cancer cases of known origin that were not included in the training set. The results were impressive: MI GPSai predicted the tumor type with an accuracy of over 94% while discriminating between 21 possible categories of cancer. If the second most probable prediction is also considered, the accuracy rises to 97%.
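The two accuracy figures correspond to what is usually called top-1 and top-2 accuracy. A minimal sketch of how such numbers are computed is given below; the helper `top_k_accuracy` and the tiny data set are hypothetical and only illustrate the metric, not the study's data.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=1):
    """Fraction of samples whose true label is among the k top-ranked predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Made-up ranked predictions (most probable first) for four samples.
ranked = [
    ["breast", "ovarian", "lung"],
    ["lung", "colorectal", "breast"],
    ["colorectal", "gastric", "lung"],
    ["ovarian", "breast", "lung"],
]
truth = ["breast", "colorectal", "colorectal", "lung"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
print(top1, top2)
```

Top-2 accuracy can never be lower than top-1 accuracy, because every sample counted as a hit for k=1 is also a hit for k=2.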

The algorithm was able to provide a confident prediction for 71.7% of clinical CUP cases. The AI predictions were compared with those obtained by other techniques. In many cases the predictions of the AI and of the human experts contradicted each other and required additional evaluation by a pathologist. In 41.3% of such cases the final diagnosis was changed to the one provided by the AI! Needless to say, this might have saved quite a few lives by enabling the correct treatment to be prescribed.
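A "confident prediction" implies that the classifier may decline to answer when it is unsure. The article does not describe how MI GPSai decides this, so the sketch below simply assumes a probability threshold; the threshold value, the function names, and the sample probabilities are all hypothetical.

```python
def confident_calls(samples, threshold=0.9):
    """Return the top label per sample, or None when its probability is below threshold."""
    calls = []
    for probs in samples:
        label, p = max(probs.items(), key=lambda item: item[1])
        calls.append(label if p >= threshold else None)
    return calls


def call_rate(calls):
    """Fraction of samples for which a confident call was made."""
    return sum(call is not None for call in calls) / len(calls)


# Made-up class probabilities for four CUP samples.
samples = [
    {"urothelial": 0.95, "squamous": 0.05},
    {"urothelial": 0.55, "squamous": 0.45},
    {"lung": 0.92, "breast": 0.08},
    {"colorectal": 0.97, "gastric": 0.03},
]
calls = confident_calls(samples, threshold=0.9)
rate = call_rate(calls)
print(calls, rate)
```

Raising the threshold lowers the call rate but makes the remaining calls more trustworthy, which is the trade-off any such cutoff has to balance.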

A clinical example of a case in which the pathological diagnosis was changed from “squamous cell carcinoma” to “urothelial carcinoma” based on MI GPSai predictions.

AI systems similar to MI GPSai may be used not only in cancer diagnostics but also in quality control of anatomical pathology laboratories.

This study shows that AI-based analysis of different omics data could be used in various areas, ranging from classical drug discovery to cancer diagnostics and therapy.

We at Receptor.AI are also incorporating omics analysis into our products. Our target identification and drug-target interaction platforms use omics data to train the AI, which is then used to select high-quality leads from the output of our molecular generation module. The oncologists, data scientists and bioinformaticians at our company are developing an approach to identifying novel oncological targets based on the analysis of omics data.

In addition, omics analysis has been implemented in our Drug Repurposing module, which allows us to find prospective anti-cancer compounds among FDA-approved drugs and substances undergoing phase 3 clinical trials.

Every product that Receptor.AI creates aims to help scientists and pharma companies around the world make discoveries that could save lives.

Innovative solutions for computer-aided drug discovery based on machine learning and artificial intelligence.