Analysis of free text in electronic health records for identification of cancer patient trajectories.

Author information

  1. Norwegian Centre for E-Health Research, University Hospital of North Norway, Norway.
  2. Department of Statistics, University of Warwick, United Kingdom.
  3. Department of Signal Theory and Communications, University Rey Juan Carlos, Spain.
  4. Department of Mathematics and Statistics, UiT The Arctic University of Norway, Norway.
  5. Department of Gastrointestinal Surgery, University Hospital of North Norway, Norway.
  6. Department of Systems Biology, Technical University of Denmark, Denmark.
  7. Department of Mathematics, Imperial College London, Exhibition Road, London, United Kingdom.
  8. The Alan Turing Institute, British Library, 96 Euston Road, United Kingdom.
  9. Department of Gastrointestinal Surgery, Akershus University Hospital, Oslo, Norway.


With an aging patient population and increasingly complex disease trajectories, physicians often face complex patient histories from which clinical decisions must be made. Due to the increasing rate of adverse events and hospitals facing financial penalties for readmission, there has never been a greater need to enforce evidence-led medical decision-making using available health care data. In the present work, we studied a cohort of 7,741 patients, of whom 4,080 were diagnosed with cancer, surgically treated at a university hospital in the years 2004-2012. We have developed a methodology that allows the disease trajectories of the cancer patients to be estimated from free text in electronic health records (EHRs). Using these disease trajectories, we predict 80% of patient events ahead of time. By controlling for confounders from 8,326 quantified events, we identified 557 events that constitute high subsequent risks (risk > 20%), including six events for cancer and seven events for metastasis. We believe that the presented methodology and findings could be used to improve clinical decision support and personalize trajectories, thereby decreasing adverse events and optimizing cancer treatment.

PMID: 28387314
PMCID: PMC5384191
DOI: 10.1038/srep46226
[Indexed for MEDLINE]