Data Science

Figure 1. Cancer cells employ drug-metabolism and drug-efflux enzymes to defend themselves against chemotherapies

A recent editorial in Nature Reviews Clinical Oncology called for more basic research on the current library of FDA-approved drugs. The editorial argued that optimizing the application of current therapies would have a greater clinical impact than developing new drugs (Nat. Rev. Clin. Oncol. 2018, 15, 193). The key idea implicit in this editorial is that we do not yet fully understand why chemotherapies work in some cancers but not in others.

Fortunately, we do know that a cancer’s tissue-of-origin is one of the best predictors of drug response (Figure 1 top). For example, on the clinical side, colon cancer is known to be generally sensitive to 5-fluorouracil (5FU), while testicular cancers can be selectively cured with platinum-based therapies. In addition, recent machine-learning work on laboratory data has shown that a cell line’s tissue-of-origin predicts drug sensitivity better than most genomic data sets (Cell 2016, 166, 740).

While this tissue-of-origin paradigm has been successful in the clinic, it has done little for cancers that differ from the population average. In other words, not all colon cancers behave like the “average colon cancer,” and one of the primary goals of precision medicine is to identify drugs for the under-served tails of a cancer-population curve (Figure 1 bottom). One recent example is the repurposing of melanoma immunotherapies for mismatch-repair (MMR) deficient colon cancers, which are intrinsically resistant to 5FU (Science 2017, 357, 409).

The primary goal of the Douglass lab’s data-science program is to bootstrap these tissue-of-origin population trends into diagnostics that match drugs to individual patients. Our strategy is to:

  1. Explain Tissue-of-Origin Trends: by reconciling clinical data with pathway information from the experimental literature (Figure 2 below)
  2. Mathematically Model Population Drug-Sensitivity: using differential-equation-based models of the pathways underlying drug sensitivity
  3. Fit #2’s Models to Individual Patient Data: to identify, for example, breast cancers that “look like” the average colon cancer and therefore might respond to 5FU
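Steps 2 and 3 can be illustrated with a deliberately small sketch. It is not the lab's actual model: it assumes a single intracellular drug level C governed by dC/dt = k_in*c_ext - (k_efflux + k_met)*C, loosely echoing the efflux and metabolism defenses in Figure 1. The function names, the rate constants, and the per-tissue "population average" efflux rates are all hypothetical, and the "patient" measurements are synthetic. The sketch fits one parameter to the patient's data by grid search and then asks which population average the fitted value most resembles.

```python
# Toy sketch only: a one-compartment ODE with hypothetical rate constants,
# not the lab's actual pathway model.

DT = 0.01  # integration step (arbitrary time units)


def simulate_drug_level(k_in, k_efflux, k_met, c_ext=1.0, t_end=10.0):
    """Integrate dC/dt = k_in*c_ext - (k_efflux + k_met)*C from C(0) = 0.

    Uses classical fourth-order Runge-Kutta and returns the list of C values
    at t = 0, DT, 2*DT, ..., t_end.
    """
    def rhs(c):
        return k_in * c_ext - (k_efflux + k_met) * c

    levels = [0.0]
    c = 0.0
    for _ in range(int(round(t_end / DT))):
        k1 = rhs(c)
        k2 = rhs(c + 0.5 * DT * k1)
        k3 = rhs(c + 0.5 * DT * k2)
        k4 = rhs(c + DT * k3)
        c += DT * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        levels.append(c)
    return levels


def fit_efflux_rate(times, observed, k_in, k_met, candidates):
    """Grid-search the efflux rate that minimizes squared error to one patient's data."""
    best_rate, best_error = None, float("inf")
    for k_efflux in candidates:
        levels = simulate_drug_level(k_in, k_efflux, k_met, t_end=max(times))
        error = sum(
            (levels[int(round(t / DT))] - y) ** 2 for t, y in zip(times, observed)
        )
        if error < best_error:
            best_rate, best_error = k_efflux, error
    return best_rate


def closest_population(rate, population_rates):
    """Return the tissue whose average rate is nearest to the fitted rate."""
    return min(population_rates, key=lambda name: abs(population_rates[name] - rate))


# Step 2 (hypothetical values): population-average efflux rates by tissue-of-origin.
POPULATION_EFFLUX = {"colon": 0.3, "breast": 0.9}

# Step 3: synthetic measurements for one tumor, generated from the model itself.
times = [1.0, 2.0, 5.0, 10.0]
true_levels = simulate_drug_level(k_in=1.0, k_efflux=0.3, k_met=0.2)
patient = [true_levels[int(round(t / DT))] for t in times]

candidates = [0.05 * i for i in range(1, 41)]
fitted = fit_efflux_rate(times, patient, k_in=1.0, k_met=0.2, candidates=candidates)
print(fitted, closest_population(fitted, POPULATION_EFFLUX))
```

A grid search is used here only because it is easy to read. A realistic pathway model would have many coupled equations and would more plausibly be fit with a dedicated optimizer.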

Tactically, this work involves reconciling the standard methods of experimental science and data science, along with the “big data” of experimental results in the cancer literature, as detailed below (Figure 2).

Figure 2. Barriers between computational & experimental “big data” have prevented scientific synergy

Given the large number of genes that affect drug efficacy, neither the tools of statistical learning nor those of experimental/deterministic modeling are fully equipped to model drug efficacy. Statistical learning approaches are common with large clinical databases, which give us measurements of 20,000 genes across 10,000 tumors (Figure 2 left). The experimental approach to modeling drug efficacy, on the other hand, often involves ordinary differential equation (ODE) models of “blueprints” of the biochemical pathways affecting drug sensitivity (Figure 2 right). While statistical learning approaches often yield clinically relevant associations, these are correlative and at best provide a “parts list,” not a “blueprint.” Conversely, while experimental modeling approaches provide causal insight, this information often reflects artificial laboratory conditions and is sometimes clinically irrelevant. Our approach to modeling drug efficacy combines the advantages of both: we use statistical learning on clinical data sets to identify the clinically relevant parameters within experimental science’s “blueprints.”
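The combination described above can be sketched in a few lines, under heavy simplifying assumptions. Here "statistical learning" is reduced to a Pearson correlation between each gene's expression and drug response, and the "blueprint" is reduced to a lookup table from genes to ODE parameters. The gene names (GENE_A, GENE_B, GENE_C), the parameter names, the threshold, and all numbers are made up for illustration and carry no biological meaning.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_blueprint_parameters(expression, response, blueprint, threshold=0.5):
    """Keep blueprint genes whose expression tracks drug response in clinical data.

    expression: gene name -> expression values, one per tumor
    response:   drug response, one value per tumor (same order)
    blueprint:  gene name -> name of the ODE parameter that gene controls
    Returns (parameter, gene, r) tuples with |r| >= threshold, strongest first.
    """
    hits = []
    for gene, parameter in blueprint.items():
        r = pearson(expression[gene], response)
        if abs(r) >= threshold:
            hits.append((parameter, gene, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Synthetic toy data: six tumors, three genes (all values are invented).
response = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
expression = {
    "GENE_A": [1, 2, 3, 5, 6, 8],              # tracks response, in the blueprint
    "GENE_B": [4.0, 4.1, 3.9, 4.0, 4.2, 3.8],  # in the blueprint, weak association
    "GENE_C": [8, 7, 6, 4, 3, 1],              # tracks response, not in the blueprint
}
# Hypothetical blueprint: which ODE rate constant each pathway gene controls.
blueprint = {"GENE_A": "k_efflux", "GENE_B": "k_met"}

relevant = rank_blueprint_parameters(expression, response, blueprint)
print(relevant)
```

In this toy data, GENE_C correlates strongly with response but has no place in the blueprint, so it stays a "parts list" entry. GENE_B is in the blueprint but shows little clinical association, so its parameter would not be prioritized for patient-specific fitting.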