DeepPhe-CR (DeepPhe tool for Cancer Registries) is a Natural Language Processing (NLP) software service designed to improve the efficiency and accuracy of cancer registry data abstraction. Developed by an academic collaboration supported by the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program, DeepPhe-CR automates the extraction of key cancer details from patient clinical notes, a task that is otherwise manual and resource-intensive.
Built upon the base DeepPhe platform, the system combines NLP (based on the Apache cTAKES framework), machine learning, visual analytics, and a rich ontology to extract and summarize longitudinal histories of cancer patients. DeepPhe-CR is specifically engineered as a web-based NLP service that exposes REST APIs for integration into existing cancer registry data abstraction tools (such as SEER*DMS).
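As a rough illustration of what integration over REST could look like from the side of an abstraction tool, the sketch below builds and submits a clinical note as a JSON POST request. The base URL, route, and payload field are hypothetical placeholders chosen for this example; they are not taken from the DeepPhe-CR documentation, and the actual routes and request format should be looked up there.

```python
import json
from urllib import parse, request

# Hypothetical values for illustration only: the real DeepPhe-CR host,
# routes, and payload fields are not given in this overview and may differ.
BASE_URL = "http://localhost:8080"
DOC_ROUTE = "/example/patients/{patient_id}/documents/{doc_id}"


def build_document_request(patient_id, doc_id, note_text):
    """Build (but do not send) a POST request carrying one clinical note."""
    path = DOC_ROUTE.format(
        patient_id=parse.quote(patient_id, safe=""),
        doc_id=parse.quote(doc_id, safe=""),
    )
    body = json.dumps({"text": note_text}).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def submit_document(req, timeout=30):
    """Send the request and decode a JSON response (assumed response format)."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the client easy to test without a running service; a caller would pass the result of `build_document_request(...)` to `submit_document(...)` once the real endpoint is known.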
Key Capabilities:
- Automated Data Extraction: Extracts critical cancer attributes, including topography, histology, behavior, laterality, and grade, with F1 scores of 0.79 to 1.00 across common and rare cancer types (e.g., breast, prostate, lung, colorectal, ovarian, and pediatric brain).
- Computer-Assisted Abstraction: Supports registrars by providing suggested, extracted items and highlighting the corresponding text spans in the source document for quick validation and one-click copying.
- Scalable Architecture: Distributed as a suite of Docker containers for ease of installation and operation, with a REST router and a Neo4j graph database that stores and manages results.
- Cross-Document Summarization: Supports summarization of cases across one or more documents to build a comprehensive patient history.
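The computer-assisted abstraction and cross-document summarization capabilities above can be pictured with a small sketch. It assumes each extracted item carries a document ID, an attribute name, a value, and character offsets into the source note; this record layout, the `[[ ]]` highlight markers, and the function names are hypothetical. The frequency-based merge is a naive stand-in used only for illustration and is not DeepPhe-CR's summarization method.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedItem:
    """Hypothetical record for one extracted attribute and its text span."""
    doc_id: str
    attribute: str
    value: str
    start: int  # inclusive character offset in the source note
    end: int    # exclusive character offset in the source note


def highlight(text, item, open_mark="[[", close_mark="]]"):
    """Wrap the supporting text span in markers so a registrar can verify it."""
    return (
        text[:item.start]
        + open_mark + text[item.start:item.end] + close_mark
        + text[item.end:]
    )


def summarize_case(items):
    """Naive cross-document merge: keep the most frequent value per attribute."""
    votes = defaultdict(Counter)
    for item in items:
        votes[item.attribute][item.value] += 1
    return {attr: counts.most_common(1)[0][0] for attr, counts in votes.items()}


note = "Invasive ductal carcinoma of the left breast."
start = note.index("left")
item = ExtractedItem("doc1", "laterality", "left", start, start + len("left"))
marked = highlight(note, item)
```

Storing offsets alongside each value is what lets an interface jump to and highlight the evidence for a suggestion, so the registrar validates the span rather than rereading the whole note.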
DeepPhe-CR supports cancer surveillance efforts by streamlining the overall abstraction workflow and by allowing registries to expand their data collection to include additional information such as genomic biomarkers.