MedCAT

by NHS

OVERVIEW

Open-source Medical Concept Annotation Tool (MedCAT) for Named Entity Recognition and Linking (NER+L) in clinical text using biomedical ontologies.

MedCAT (Medical Concept Annotation Tool) is an open-source Named Entity Recognition and Linking (NER+L) toolkit developed as part of the CogStack ecosystem, primarily by researchers associated with the NHS. It is designed to extract and structure information from unstructured biomedical documents, such as Electronic Health Records (EHRs). The tool links identified clinical concepts to major biomedical ontologies like SNOMED-CT and UMLS.

MedCAT uses a self-supervised machine learning approach for concept extraction and disambiguation, and it has demonstrated strong transferability between different hospitals and datasets. It is lightweight and fast enough to handle large-scale entity extraction. The software is distributed under the Elastic License 2.0.

Key Capabilities:

  • NER+L: Named Entity Recognition and Linking to millions of biomedical concepts.
  • MetaCAT: A component for detecting the status of a concept (e.g., affirmed, negated, or hypothetical).
  • MedCATtrainer: An accompanying open-source web interface (with a REST API) that allows clinicians and annotators to inspect, improve, and customize MedCAT models through supervised training and active learning.
  • Scalability: Supports multiprocessing for handling large datasets (100M+ documents).
  • Technology: The core library is Python-based, and the newer MedCAT v2 utilizes transformer-based models for improved contextual information and robustness.
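
To make the NER+L and status-detection ideas above concrete, here is a minimal, self-contained sketch in plain Python. It is not MedCAT's API or algorithm: the `VOCAB` dictionary, the `DEMO-…` concept IDs, the `annotate` function, and the three-word negation window are all hypothetical stand-ins chosen for illustration. MedCAT itself learns disambiguation from context and links to real SNOMED-CT or UMLS identifiers.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mini-vocabulary: surface form -> made-up concept ID.
# A real deployment would link to SNOMED-CT or UMLS identifiers instead.
VOCAB = {
    "diabetes": "DEMO-001",
    "hypertension": "DEMO-002",
    "chest pain": "DEMO-003",
}

# Toy cue words that mark a following mention as negated (not how MetaCAT works).
NEGATION_CUES = ("no", "denies", "without")


@dataclass
class Annotation:
    text: str        # the mention as it appears in the document
    concept_id: str  # the linked (hypothetical) concept ID
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    status: str      # "affirmed" or "negated"


def annotate(document: str) -> List[Annotation]:
    """Find vocabulary terms in the text and link each to its concept ID."""
    lowered = document.lower()
    found = []
    for term, concept_id in VOCAB.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # Check the three words before the mention for a negation cue.
            preceding = lowered[:match.start()].split()[-3:]
            negated = any(w.strip(".,;:") in NEGATION_CUES for w in preceding)
            found.append(Annotation(
                text=document[match.start():match.end()],
                concept_id=concept_id,
                start=match.start(),
                end=match.end(),
                status="negated" if negated else "affirmed",
            ))
    # Return mentions in document order.
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    note = "Patient has hypertension. Denies chest pain."
    for ann in annotate(note):
        print(ann.text, ann.concept_id, ann.status)
```

Each annotation carries the matched text, the linked concept ID, the character offsets, and a status flag, which is roughly the kind of structured output an NER+L tool produces from free text. A dictionary lookup like this cannot resolve ambiguous abbreviations or spelling variants, which is the gap that learned, context-aware linking is meant to close.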

RATING & STATS

Customers: 100+
Founded: 2019

KEY FEATURES

  • Named Entity Recognition and Linking (NER+L)
  • Biomedical Ontology Mapping (SNOMED-CT, UMLS)
  • Negation and Hypothetical Status Detection (MetaCAT)
  • Self-supervised Machine Learning for Concept Extraction
  • Web-based Annotation and Customization Tool (MedCATtrainer)
  • Transformer-based Models (MedCAT v2)
  • Scalable Multiprocessing for large datasets

PRICING

Model: free
Open-source software available on GitHub under the Elastic License 2.0. The software itself is free, but downloading pre-trained models based on SNOMED-CT and UMLS requires a valid NIH profile/UMLS license.
FREE TRIAL
FREE TIER

TECHNICAL DETAILS

Deployment: on_premise, cloud
Platforms: web, linux, windows, mac
🔌 API Available
⚡ Open Source

USE CASES

  • Extracting information from Electronic Health Records (EHRs)
  • Clinical research and analysis
  • Customizing and training NER+L models
  • Large-scale entity extraction

INTEGRATIONS

  • SNOMED-CT
  • UMLS
  • spaCy
  • scispaCy
  • Neo4j
  • CogStack Ecosystem

COMPLIANCE & SECURITY

Security Features:
  • 🔒 De-identification Mode (via MedCATservice)
  • 🔒 API Access Control (implied by REST API deployment)
  • 🔒 Role-based Access (implied by MedCATtrainer for annotators/admins)

SUPPORT & IMPLEMENTATION

Support: forum, documentation
Target Company Size: medium, enterprise
TRAINING AVAILABLE

PROS & CONS

✓ Pros:
  • Open-source, free to use, and highly customizable
  • State-of-the-art performance in Named Entity Recognition and Linking (NER+L)
  • Strong transferability across different clinical datasets and hospitals
  • Includes a dedicated annotation tool (MedCATtrainer) for fine-tuning models
  • Scalable for large datasets (100M+ documents)
✗ Cons:
  • Requires technical expertise (Python, Docker, NLP knowledge) for deployment and use
  • Requires a separate UMLS license for access to some pre-trained models
  • MedCAT v2 may require GPU compute resources for optimal performance
