Data extraction from SmPCs into structured data for ISO IDMP compliance

Asphalion


Published: July 10, 2017


This video explores automated data extraction from Summaries of Product Characteristics (SmPCs) into structured data, primarily to meet the data requirements of ISO IDMP (Identification of Medicinal Products). The presentation, featuring experts from Asphalion and Unterpharma, highlights the inefficiency of manual data handling, particularly as experienced during earlier xEVMPD (Extended EudraVigilance Medicinal Product Dictionary) implementations, and introduces a technological solution to streamline this critical regulatory process. Its core message is that although ISO IDMP implementation has been delayed, the standard remains strategically important for the pharmaceutical industry and requires a shift towards structured data management.

The discussion opens by framing the ISO IDMP challenge: the standard demands extensive data collection from traditionally unstructured documents such as SmPCs. Jan Voskuil from TechniQ/Unterpharma introduces their "extractor" tool, which is built on semantic technologies and linked data and is intended to improve the flow of information in the pharma domain. The tool is presented as the flagship product of a larger suite that also covers vocabulary management and text verification, with the aim of turning text fragments into structured, annotated data. Remco Romijn from Asphalion then gives a live demonstration of the extractor, showing it processing SmPCs in Dutch, Spanish, and English and automatically identifying and extracting key data elements such as product names, strengths, dosage forms, ATC codes, registration numbers, indications, and adverse effects. The demo also shows the web-based interface in which users verify and edit the results, and the export of structured data to formats such as Excel, CSV, and XML.
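The extractor itself relies on semantic technologies; the sketch below is a much simpler, rule-based illustration (an assumption, not the tool's actual method) of turning an English SmPC laid out with the numbered QRD template headings into a flat record and exporting it as CSV. The heading and ATC regular expressions, the field names, and the sample text (including its ATC code and authorisation number) are all illustrative.

```python
import csv
import io
import re

# Numbered headings as in the QRD SmPC template, e.g.
# "1. NAME OF THE MEDICINAL PRODUCT" or "5.1 Pharmacodynamic properties".
# Requiring a capital letter after the number is a heuristic that skips
# body lines such as "10 mg ..."; real documents need sturdier rules.
HEADING = re.compile(r"^(\d{1,2}(?:\.\d{1,2})?)\.?\s+([A-Z].*)$", re.MULTILINE)

# ATC codes: letter, two digits, two letters, two digits (e.g. C09AA02).
ATC = re.compile(r"\b[A-Z]\d{2}[A-Z]{2}\d{2}\b")


def split_sections(text: str) -> dict[str, str]:
    """Map each section number ('1', '5.1', ...) to the text beneath its heading."""
    matches = list(HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections


def extract_fields(text: str) -> dict[str, str]:
    """Pull a few example fields out of an English SmPC."""
    sections = split_sections(text)
    atc = ATC.search(sections.get("5.1", ""))
    return {
        "name": sections.get("1", ""),
        "dosage_form": sections.get("3", ""),
        "atc_code": atc.group(0) if atc else "",
        "authorisation_number": sections.get("8", ""),
    }


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise extracted records as CSV text (header taken from the first row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample text; the ATC code and authorisation number are illustrative.
SAMPLE = """1. NAME OF THE MEDICINAL PRODUCT
Asphalina 10 mg hard capsules
3. PHARMACEUTICAL FORM
Capsule, hard.
5. PHARMACOLOGICAL PROPERTIES
5.1 Pharmacodynamic properties
Pharmacotherapeutic group: ACE inhibitors, plain, ATC code: C09AA02
8. MARKETING AUTHORISATION NUMBER(S)
RVG 12345
"""

if __name__ == "__main__":
    print(to_csv([extract_fields(SAMPLE)]))
```

Fixed rules like these break easily on layout and language variation, which is one reason the verification and editing step in the interface matters.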

A significant portion of the webinar focuses on integrating the extracted data into a Regulatory Information Management (RIM) solution, specifically EXTEDO's MPDmanager. The aim is a central product database for managing regulatory activities, generating reports, and tracking acknowledgments in a GxP-compliant environment. The vision extends to generating electronic SmPCs (eSmPCs) and Patient Information Leaflets (ePILs) from this structured data, which would allow faster updates and direct communication with patients, replacing today's slow, paper-based processes. The presentation concludes with an update on the EMA's ISO IDMP status, acknowledging delays caused by Brexit and organizational challenges while confirming that the project still has a budget allocated and remains strategically necessary. It urges the industry to view IDMP as an opportunity for optimization and to prepare for this "marathon" by adopting advanced data extraction technologies.
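The hand-off from the extractor to a RIM database can be pictured as an XML round trip: one side serialises an extracted record, the other reads it back. The sketch below assumes a flat record and uses invented element names (`medicinalProduct`, `atcCode`, and so on); it does not reproduce the xEVMPD schema or the import format of any real RIM product.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict[str, str]) -> str:
    """Serialise one extracted product record as a small XML document.

    The root and child element names are hypothetical; an actual RIM
    import would follow the receiving system's own schema.
    """
    root = ET.Element("medicinalProduct")
    for field, value in record.items():
        # Each field becomes a child element; ElementTree escapes &, < and >.
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> dict[str, str]:
    """Read the same XML back, as an importing system might."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in root}


if __name__ == "__main__":
    record = {
        "name": "Asphalina 10 mg hard capsules",
        "atcCode": "C09AA02",
        "marketingAuthorisationHolder": "Example Pharma B.V. & Co",
    }
    xml_text = record_to_xml(record)
    print(xml_text)
    print(xml_to_record(xml_text) == record)
```

Keeping the record flat makes the round trip lossless here; real product data (multiple strengths, packages, substances) would need nested elements.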

Key Takeaways:

  • ISO IDMP is a Strategic Imperative: Despite implementation delays, ISO IDMP remains a critical regulatory standard for the pharmaceutical industry, requiring a fundamental shift towards structured data management for medicinal products. It is a long-term "marathon" that demands proactive preparation.
  • Manual Data Extraction is Inefficient: Past experiences with xEVMPD demonstrated that manual data extraction from SmPCs is highly time-consuming, prone to errors, and leads to data inconsistencies across various internal systems, with many companies still relying on Excel as a primary "RIM solution."
  • Automated Extraction Drastically Improves Efficiency: Tools like Unterpharma's "extractor" can automate the process of converting unstructured SmPC text into structured data, reducing extraction time from hours to minutes or even seconds, significantly enhancing efficiency and accuracy.
  • Multi-Language and Intelligent Processing: The demonstrated tool supports data extraction from documents in multiple languages (e.g., Dutch, Spanish, English) and uses semantic technologies to intelligently identify and categorize data elements like invented names, strengths, dosage forms, and ATC codes.
  • User-Friendly Verification and Editing: The web-based interface allows users to easily verify, edit, and confirm extracted data, providing a semi-automated verification process that reduces manual effort while maintaining data quality.
  • Integration with RIM Solutions is Key: The extracted structured data can be exported (e.g., to Excel, CSV, or XML) and imported into Regulatory Information Management (RIM) systems such as EXTEDO's MPDmanager, creating a central product database for comprehensive regulatory activities and GxP-compliant tracking.
  • Vision for Electronic Labeling: The ultimate goal is to use structured data to generate electronic SmPCs and Patient Information Leaflets (eSmPCs/ePILs), enabling rapid updates of critical product information and direct, real-time notification of patients, in place of today's slow processes.
  • xEVMPD Remains Relevant: Due to IDMP delays, xEVMPD will remain mandatory for at least another 3-5 years, underscoring the ongoing need for robust data management solutions for existing regulatory requirements.
  • Beyond SmPCs: Although the focus is on SmPCs, the underlying technology can be adapted to extract data from other unstructured documents, such as Patient Information Leaflets and Module 3 documents, with broader applications in data comparison and regulatory compliance.
  • Opportunity for Industry Optimization: IDMP should be viewed as an opportunity to achieve a "single source of truth" for product information, streamline regulatory processes, and enhance overall operational efficiency, rather than solely as a compliance burden.
  • Preparedness is Crucial: Companies need to assess their current data processes, IT infrastructure, and organizational readiness for IDMP, including establishing specific objectives, providing training, and staying updated on regulatory developments.
  • Advanced Data Classification: The tool can also extract and classify adverse effects from SmPCs, including their frequencies, demonstrating its capability to handle complex and nuanced data elements beyond basic product characteristics.
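Adverse-effect frequencies in section 4.8 of EU SmPCs follow a fixed convention: very common (≥1/10), common (≥1/100 to <1/10), uncommon (≥1/1,000 to <1/100), rare (≥1/10,000 to <1/1,000), very rare (<1/10,000), and not known (cannot be estimated from the available data). The sketch below, an assumption rather than the extractor's actual logic, parses a hypothetical tabulated "Frequency: term, term" layout and maps an observed incidence onto that convention; the layout, function names, and sample terms are illustrative.

```python
import re
from fractions import Fraction

# EU SmPC frequency convention for section 4.8, highest band first.
# Each entry is (inclusive lower bound on incidence, label).
BANDS = [
    (Fraction(1, 10), "very common"),
    (Fraction(1, 100), "common"),
    (Fraction(1, 1000), "uncommon"),
    (Fraction(1, 10000), "rare"),
]

# Hypothetical tabulated layout, one band per line: "Common: dizziness, nausea".
ROW = re.compile(
    r"^(very common|very rare|common|uncommon|rare|not known)\s*:\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def frequency_category(cases: int, exposed: int) -> str:
    """Map an observed incidence (cases among exposed patients) to a band."""
    if exposed <= 0:
        return "not known"  # no denominator, so frequency cannot be estimated
    if cases <= 0:
        raise ValueError("expected at least one observed case")
    rate = Fraction(cases, exposed)  # exact arithmetic avoids float edge cases
    for lower_bound, label in BANDS:
        if rate >= lower_bound:
            return label
    return "very rare"  # below 1/10,000


def parse_undesirable_effects(text: str) -> dict[str, str]:
    """Return {adverse effect term: frequency label} from 'Label: a, b' rows."""
    effects = {}
    for m in ROW.finditer(text):
        label = m.group(1).lower()
        for term in m.group(2).split(","):
            term = term.strip().rstrip(".")
            if term:
                effects[term] = label
    return effects


if __name__ == "__main__":
    section_4_8 = "Very common: headache\nCommon: dizziness, nausea.\nRare: angioedema\n"
    print(parse_undesirable_effects(section_4_8))
    print(frequency_category(5, 1000))
```

Using `Fraction` keeps boundary cases such as exactly 1/100 in the correct band.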

Tools/Resources Mentioned:

  • Unterpharma's "extractor": A flagship tool for automated data extraction from SmPCs.
  • TechniQ: Consultancy and parent company of Unterpharma, specializing in linked data and semantic technologies.
  • Asphalion: International regulatory and scientific consultancy, partner in the webinar.
  • EXTEDO: German software provider, specifically its "MPDmanager" product database for RIM solutions.
  • Rockabiary Connect: Unterpharma's platform for managing controlled vocabularies.
  • Text Verification Tool: Used for cross-checks in the larger data management picture.
  • EMA (European Medicines Agency): Regulatory authority mentioned in the context of IDMP and xEVMPD.
  • FDA (US Food and Drug Administration): US regulatory authority, mentioned in relation to SPL (Structured Product Labeling) and IDMP.
  • SPL (Structured Product Labeling): FDA's standard for product labeling, mentioned as a structured data source in the US.
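The cross-checks attributed to the Text Verification Tool, and the comparison of SmPC data with other documents such as leaflets, could look roughly like the following. This is a hypothetical sketch, not the tool's behaviour: it assumes both documents have already been reduced to flat field/value records, and the field names and normalisation rule (case-folding and collapsing whitespace) are illustrative choices.

```python
import re


def normalise(value: str) -> str:
    """Case-fold and collapse runs of whitespace, ignoring cosmetic differences."""
    return re.sub(r"\s+", " ", value).strip().casefold()


def cross_check(smpc: dict[str, str], leaflet: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Report fields whose values differ between two extracted records.

    A field present in only one record is compared against an empty string,
    so it is reported as a mismatch too.
    """
    mismatches = {}
    for field in sorted(smpc.keys() | leaflet.keys()):
        left, right = smpc.get(field, ""), leaflet.get(field, "")
        if normalise(left) != normalise(right):
            mismatches[field] = (left, right)
    return mismatches


if __name__ == "__main__":
    smpc = {"name": "Asphalina 10 mg hard capsules", "strength": "10 mg"}
    leaflet = {"name": "Asphalina  10 mg Hard Capsules", "strength": "20 mg"}
    print(cross_check(smpc, leaflet))  # only 'strength' differs
```

Where the normalisation line is drawn is a policy decision: ignoring case suits product names, but a real check would likely treat units and decimal separators more carefully.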

Key Concepts:

  • ISO IDMP (Identification of Medicinal Products): A set of international standards for the unique identification and structured data management of medicinal products, crucial for global regulatory harmonization.
  • SmPC (Summary of Product Characteristics): A comprehensive document providing essential information about a medicinal product, primarily for healthcare professionals.
  • xEVMPD (Extended EudraVigilance Medicinal Product Dictionary): The EMA's system for collecting and managing data on authorized medicinal products, a precursor to IDMP.
  • Linked Data: A method of publishing structured data on the web so that it can be interlinked with other data, making it more useful through semantic queries.
  • Semantic Technologies: Technologies that enable machines to understand the meaning (semantics) of data, facilitating more intelligent data processing and interpretation.
  • RIM (Regulatory Information Management) Solution: Software systems used by pharmaceutical companies to manage and track regulatory submissions, product registrations, and compliance data throughout the product lifecycle.
  • eSmPC/ePIL (Electronic SmPC/Patient Information Leaflet): The future vision for digital, structured versions of these documents, designed for faster updates, easier access, and direct patient communication.
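The linked-data idea can be illustrated without any RDF library: each statement is a (subject, predicate, object) triple, and records link to each other by reusing identifiers. The sketch below is a plain-Python stand-in for what RDF stores and SPARQL queries do at scale; the `example.org` URIs and the `hasDosageForm`, `hasAtcCode`, and `label` predicates are invented for illustration and are not a real pharmaceutical vocabulary.

```python
from typing import Optional

# Each statement is a (subject, predicate, object) triple.
Triple = tuple[str, str, str]

EX = "http://example.org/"  # reserved example namespace, not a real vocabulary

GRAPH: set[Triple] = {
    (EX + "product/asphalina-10mg", EX + "vocab/hasDosageForm", EX + "form/capsule-hard"),
    (EX + "product/asphalina-10mg", EX + "vocab/hasAtcCode", EX + "atc/C09AA02"),
    (EX + "form/capsule-hard", EX + "vocab/label", "Capsule, hard"),
}


def match(
    graph: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t
        for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def dosage_form_label(graph: set[Triple], product: str) -> Optional[str]:
    """Follow the product -> dosage form link, then read the form's label."""
    for _, _, form in match(graph, s=product, p=EX + "vocab/hasDosageForm"):
        for _, _, label in match(graph, s=form, p=EX + "vocab/label"):
            return label
    return None


if __name__ == "__main__":
    print(dosage_form_label(GRAPH, EX + "product/asphalina-10mg"))
```

Because the dosage form is a shared identifier rather than copied text, correcting its label once updates every product that links to it.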

Examples/Case Studies:

  • Live Demo of SmPC Extraction: The webinar included a live demonstration of the "extractor" tool processing SmPCs in Dutch, Spanish, and English.
  • Specific Data Elements: Examples of extracted data included product names (e.g., "Asphalina 10 milligram"), strengths, dosage forms (e.g., "capsule, hard"), ATC codes, registration numbers, indications, and adverse effects.
  • xEVMPD Manual Pain Points: The speakers referenced the challenges faced by companies during xEVMPD implementation, where significant manual effort was required to copy-paste data from SmPCs into Excel or other basic systems.
  • Integration with EXTEDO's MPDmanager: Exporting structured data from the extraction tool and importing it into EXTEDO's MPDmanager was presented as a solution for centralized product data management and regulatory activities.
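A product name such as "Asphalina 10 milligram" combines elements that IDMP models as separate name parts (for example the invented name, strength, and pharmaceutical dosage form parts). The rule-based sketch below splits such a string and maps the form wording to a controlled term. The regular expression, the small unit list, the two-entry `FORM_TERMS` table, and the full example name are illustrative assumptions, not the extractor's logic or an excerpt of an official vocabulary.

```python
import re

# Lazily take words as the invented name until a strength such as "10 mg"
# or "10 milligram" appears; the rest is treated as dosage form wording.
# The unit list is a small illustrative subset.
NAME = re.compile(
    r"^(?P<invented>.+?)\s+"
    r"(?P<strength>\d+(?:[.,]\d+)?\s*(?:mg|milligram|g|microgram|ml))\b\s*"
    r"(?P<form>.*)$"
)

# Hypothetical two-entry lookup from label wording to a controlled term.
FORM_TERMS = {"hard capsule": "Capsule, hard", "hard capsules": "Capsule, hard"}


def split_name(full_name: str) -> dict[str, str]:
    """Split a product name into invented name, strength, and dosage form parts."""
    text = full_name.strip()
    m = NAME.match(text)
    if m is None:
        # No recognisable strength: keep the whole string as the invented name.
        return {"invented_name": text, "strength": "", "dosage_form": "", "form_term": ""}
    form = m["form"].strip()
    return {
        "invented_name": m["invented"],
        "strength": m["strength"],
        "dosage_form": form,
        "form_term": FORM_TERMS.get(form.lower(), ""),
    }


if __name__ == "__main__":
    print(split_name("Asphalina 10 milligram hard capsules"))
```

An empty `form_term` flags wording the lookup does not know, which is exactly the kind of value a user would confirm or correct in the verification interface.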