Debarka Sengupta

Making clinical oncology grand challenges solvable by Artificial Intelligence

As a computer science student, I always lived in denial about the underlying complexity of biological processes. I was curious about complex systems with many parameters that give rise to seemingly unsolvable problems. My impression of biology was overturned when I was unexpectedly thrown into a computational biology project during my doctoral studies. As an outsider, I was struck by the beauty of the well-coordinated, tiny molecular machines, i.e., cells. When a loved one fell prey to cancer, I took a special interest in the disease and realized that cancer cells feature aberrant phenotypes and molecular-information singularity. After returning to India as a young principal investigator from my only postdoc stint in Singapore, I steered my group’s focus toward understanding the molecular mechanisms that govern the genesis and progression of cancer and have direct clinical relevance. We spent most of our time framing mathematical problem statements around these mechanisms and building efficient algorithms to solve them. In the past few years, this approach has helped us contribute to at least two of the grand challenges in clinical oncology: blood-based detection of cancers and patient-specific prediction of the success of cancer therapies.

Figure 1: (a) Schematic representation of the identification of a set of 11 transcripts from Tumor Educated Platelets (TEPs) and its adoption for a qRT-PCR-based blood test for affordable cancer detection. An artificial intelligence-based model is used to infer the existence of cancer. (b) Schematic representation of a chemo-transcriptomics framework that integrates cancer gene expression data, drug descriptors, and drug response annotations for AI-based modeling of cancer therapy success in cancer cell-lines, mouse xenografts and patient tumors.

According to a 2020 survey report by the World Health Organization (WHO), cancer accounts for nearly one in six deaths. The projected cancer burden in India for 2021 was 26.7 million DALYs (Disability-Adjusted Life Years) (Kulothungan et al. 2022). The world has been racing against time to develop blood-based methods for early cancer detection. To date, most commercial products (e.g., Galleri™ by GRAIL) are based on tumor-derived biomarkers such as Circulating Tumor DNA (ctDNA) and Circulating Tumor Cells (CTCs). Because these biomarkers are rare in the bloodstream, the sensitivity of such tests is poor (Liu et al. 2020). Moreover, most of these diagnostic tests require sequencing infrastructure and a highly trained workforce, impeding their mass adoption in a developing economy. To circumvent this, we built on the seminal findings of Best and colleagues, who showed that the platelet transcriptome undergoes substantial alteration in cancer patients. The authors termed platelets with such altered phenotypes Tumor Educated Platelets (TEPs) and managed to distinguish 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy (Best et al. 2015). It struck us that if we could narrow the gene list down to a number that can be screened on a single qRT-PCR plate, the test could be used by thousands of pathlabs. We therefore reduced cancer liquid biopsy to a classical feature selection problem of the kind regularly dealt with in machine learning. Using an ensemble feature selection method, we zeroed in on a set of 11 transcripts that adequately captures the differences between the platelet transcriptomes of cancer patients and healthy individuals (Figure 1a). This small set of transcripts retained a classification accuracy of ~94%.
We initially validated this panel in a prospective cohort comprising 10 lung cancer patients (7 treatment-naive and 3 receiving first-line chemotherapy) and 7 healthy controls, and obtained near-perfect predictions of disease status (Goswami et al. 2020).
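
To make the feature selection framing concrete, here is a minimal sketch in Python. It is not the method used in Goswami et al. 2020: the two scoring functions, the bootstrap voting scheme, the function names, and the synthetic data (40 "transcripts", of which the first 3 differ between classes) are all assumptions chosen purely for illustration. The idea it shows is that several simple rankers vote across resampled cohorts, and the features that keep winning form the small panel.

```python
import random
from statistics import mean, pstdev


def t_score(values, labels):
    """Absolute mean difference between classes, scaled by their spread."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(pos) + pstdev(neg) + 1e-9  # avoid division by zero
    return abs(mean(pos) - mean(neg)) / spread


def auc_score(values, labels):
    """Distance of the single-feature AUC from 0.5 (rank-based separation)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return abs(wins / (len(pos) * len(neg)) - 0.5)


def ensemble_select(X, y, k, n_rounds=30, seed=0):
    """Each scorer votes for its top-k features on each bootstrap resample."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    votes = [0] * n_features
    for _ in range(n_rounds):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        labels = [y[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # skip resamples that lost one of the classes
        for scorer in (t_score, auc_score):
            scores = [scorer([X[i][j] for i in idx], labels)
                      for j in range(n_features)]
            top = sorted(range(n_features),
                         key=lambda j: scores[j], reverse=True)[:k]
            for j in top:
                votes[j] += 1
    # Keep the k features with the most votes across scorers and resamples.
    return sorted(range(n_features), key=lambda j: votes[j], reverse=True)[:k]


# Synthetic cohort (hypothetical): 30 "patients" (label 1), 30 "controls" (label 0).
data_rng = random.Random(42)
y = [1] * 30 + [0] * 30
X = [[data_rng.gauss(3.0 if (label == 1 and j < 3) else 0.0, 1.0)
      for j in range(40)]
     for label in y]

selected = ensemble_select(X, y, k=3)
print(sorted(selected))
```

In a real setting the rows would be platelet expression profiles and the panel size would be set by what fits on a single qRT-PCR plate; the selected panel would then be validated on held-out samples rather than on the data used for selection.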

With such encouraging results, we went on to file a patent and established a startup (CareOnco Healthcare Pvt. Ltd.) for further validation and commercialization of the test. Since the inception of the company, we have processed hundreds of samples across about ten prevalent cancer types. Our TEP-based diagnostic test ticks several boxes for a practical solution to blood-based early detection of cancer: (1) the cost of the test is low; (2) it can be performed in basic pathlabs equipped with qRT-PCR; (3) it is a pan-cancer test; (4) it detects early onset of the disease; (5) the turnaround time for results is a few hours.

While there are visible breakthroughs in cancer detection, very little is known about why cancer therapies work in some patients and not in others. Failure of targeted therapies can be attributed to the dynamic nature of cancer genomes and the astronomically large number of ill-characterized ways in which mutations can co-occur. Therapies that target one or two specific mutations of growth factors, receptors, or enzymes offer only a 1–3% chance of therapeutic success (Maeda and Khatami 2018). We concluded that focusing exclusively on target molecules while ignoring the genomic context is bound to produce unpredictable treatment outcomes.

We used gene expression levels in cancer cells/tissues to model therapeutic response. We soon found that training machine learning models for individual drugs wasn’t feasible because publicly available transcriptomics data paired with treatment data are insufficient. Gaurav Ahuja, a departmental colleague and an expert in chemo-genomics, advised us to use drug descriptors, which are numeric vector representations of molecules. Now we had all the pieces of the puzzle in place. We formulated Response ~ gene expression levels + drug descriptor as a regression or classification task, depending on how drug response is annotated (IC50 in the case of cancer cell lines and partial/complete/no response in humans) (Figure 1b). We collaborated with Colleen Nelson's group at QUT, Brisbane, to demonstrate that the trained models could predict treatment responses in LNCaP cell-line-derived xenografts in a prostate cancer progression study (Chawla et al. 2022). Because they rely on numeric drug descriptors (as opposed to compound names), our models could make predictions even for compounds not used in model training. Our work attracted a cross-continental team of technopreneurs, scientists, and doctors to co-found GenterpretR Inc. with the mission of building an AI-backed precision oncology platform.
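
The formulation Response ~ gene expression levels + drug descriptor can be sketched as follows. This is only an illustration under stated assumptions, not the model from Chawla et al. 2022: the linear ridge regressor, the helper names (`featurize`, `fit_ridge`, `predict`), the 3-gene and 2-descriptor dimensions, and the noise-free synthetic "IC50-like" responses are all hypothetical. What it shows is that each training row concatenates a sample's expression vector with a drug's descriptor vector, so one model is shared across drugs and can score a compound it has never seen.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def featurize(expression, descriptor):
    """One model input: intercept + gene expression + drug descriptor."""
    return [1.0] + list(expression) + list(descriptor)


def fit_ridge(rows, targets, lam=1e-8):
    """Weights minimizing squared error plus a small L2 penalty."""
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(gram, rhs)


def predict(w, expression, descriptor):
    return sum(wi * xi for wi, xi in zip(w, featurize(expression, descriptor)))


# Synthetic data (hypothetical): 8 cell lines x 6 drugs, linear ground truth.
rng = random.Random(0)
true_w = [0.5, 1.0, -2.0, 0.7, 1.5, -0.8]
cells = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
drugs = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(6)]

rows = [featurize(c, d) for c in cells for d in drugs]
targets = [sum(wt * x for wt, x in zip(true_w, r)) for r in rows]
w = fit_ridge(rows, targets)

# A compound absent from training is still scorable via its descriptor.
unseen_drug = [0.3, -1.2]
pred = predict(w, cells[0], unseen_drug)
truth = sum(wt * x for wt, x in zip(true_w, featurize(cells[0], unseen_drug)))
print(round(pred, 3), round(truth, 3))
```

For categorical annotations such as partial/complete/no response, the same feature rows would feed a classifier instead of a regressor. A purely additive linear model cannot capture interactions between a drug and its genomic context, so a practical system would likely need a more expressive learner.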

In the last six years as an independent group, we have overcome our fear of lacking traditional molecular biology training and experienced the power of ‘naivety,’ which spurs us to seek interdisciplinary collaboration. We realized that reducing a grand challenge to a mathematical problem statement speeds up its solution, because progress can then be quantified. Our work reinforces that, with apt computing strategies, gene expression patterns can be tracked and leveraged for clinical decision-making, and can complement the widespread reductionist approaches that focus exclusively on specific target molecules to motivate therapeutic decisions.

