Introduction to the Identification of Medicinal Products (IDMP) Ontology Project
Object Management Group
Published: July 25, 2022
Insights
This video provides an in-depth introduction to the Identification of Medicinal Products (IDMP) Ontology Project, spearheaded by the Pistoia Alliance. Presented by Heiner Oberkampf, Co-Founder & CEO of Accurids, the session outlines the critical role of ISO IDMP standards in uniquely identifying and describing medicinal products globally, driven primarily by regulatory mandates from bodies like the FDA and EMA. The core problem addressed is the current fragmentation and inconsistency in IDMP implementations across the pharmaceutical industry, largely due to the standards being published as non-machine-actionable PDF documents, leading to varied interpretations and significant data integrity risks.
The project's central solution is the development of an IDMP Ontology, designed to augment existing standardization efforts by enabling deep, semantic interoperability based on the FAIR principles (Findability, Accessibility, Interoperability, Reusability). This semantic layer aims to counteract diverse implementations, facilitate automation of data processes, enhance patient safety, and foster seamless collaboration among regulators, pharma companies, suppliers, and healthcare providers. The EMA's impending enforcement of IDMP compliance from 2023 onwards underscores the urgency and importance of this initiative, which has garnered support from major pharmaceutical companies including Bayer, Novartis, Merck, Boehringer Ingelheim, Roche, GSK, J&J, Amgen, and AstraZeneca, with active collaboration with the FDA.
Oberkampf details the project's modular ontology development approach, which takes the ISO standards as a starting point but restricts its scope to address specific "competency questions" derived from real-world business cases and regulatory needs. The ontology leverages semantic web standards (RDF, OWL, SKOS) and reuses foundational "common ontologies" from the EDM Council for non-pharma-specific concepts like data types, identifiers, and quantities, allowing the team to focus on complex chemical and pharmaceutical aspects. A key modeling pattern discussed is the concept of "contextualized roles," which allows for precise definitions of how a substance plays a specific role (e.g., active moiety) within another object under a given context (e.g., regulatory vs. scientific). The project uses Accurids for use case implementation and applying the ontology on actual data, and the EDM Council provides the ontology development governance and hosting environment.
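The "contextualized roles" pattern can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's actual OWL model: the class name `ContextualizedRole`, the helper `players`, and every identifier in the sample data are hypothetical placeholders, and the different answers per context are invented to show the mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualizedRole:
    """One statement: `player` plays `role` within `within`, under `context`."""
    player: str   # the substance playing the role
    role: str     # e.g. "active moiety"
    within: str   # the object in which the role is played
    context: str  # e.g. "regulatory" or "scientific"


def players(roles, role, within, context):
    """Return the substances that play `role` within `within` under `context`."""
    return sorted(
        r.player
        for r in roles
        if (r.role, r.within, r.context) == (role, within, context)
    )


# Hypothetical statements; the identifiers are placeholders, not real records.
ROLES = [
    ContextualizedRole("substance:A-free-base", "active moiety",
                       "substance:A-hydrochloride", "regulatory"),
    ContextualizedRole("substance:A-cation", "active moiety",
                       "substance:A-hydrochloride", "scientific"),
]

regulatory_view = players(ROLES, "active moiety",
                          "substance:A-hydrochloride", "regulatory")
scientific_view = players(ROLES, "active moiety",
                          "substance:A-hydrochloride", "scientific")
```

Keeping the context as part of the statement, rather than as a property of the substance, lets the same question return different answers for different views without the two views contradicting each other.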
The presentation highlights several critical use cases, including enhancing patient safety through unambiguous product identification for pharmacovigilance, and efficiently answering complex regulatory questions such as identifying the clinical trials in which a specific substance was administered in a particular region. Oberkampf demonstrates how the ontology can be used to query public data (such as the FDA's Global Substance Registration System, GSRS, and the ChEBI ontology) to identify substances with shared active moieties or specific registered identifiers. He emphasizes the distinction between the ontology (a few hundred conceptual definitions) and the much larger data graph (millions of instances). The data graph is what the ontology is tested against, and it is where the ontology demonstrates its value in answering real-world questions over both public and private pharma data.
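The split between ontology and data graph, and the shared-active-moiety question, can be illustrated with a toy example. This is a minimal sketch under stated assumptions: plain Python tuples stand in for RDF triples, the `ex:` terms (including `ex:hasActiveMoiety`) are hypothetical and not taken from the IDMP Ontology, and the instance data is invented.

```python
from collections import defaultdict

# Ontology-level statements: a handful of concept and property definitions.
# The ex: terms are hypothetical stand-ins, not actual IDMP Ontology terms.
ONTOLOGY = {
    ("ex:Substance", "rdf:type", "owl:Class"),
    ("ex:hasActiveMoiety", "rdf:type", "owl:ObjectProperty"),
    ("ex:hasActiveMoiety", "rdfs:domain", "ex:Substance"),
    ("ex:hasActiveMoiety", "rdfs:range", "ex:Substance"),
}

# Data-graph statements: instances described with the ontology's terms.
# A real data graph would hold millions of these; the values are invented.
DATA = {
    ("ex:s1", "rdf:type", "ex:Substance"),
    ("ex:s2", "rdf:type", "ex:Substance"),
    ("ex:s3", "rdf:type", "ex:Substance"),
    ("ex:s1", "ex:hasActiveMoiety", "ex:m1"),
    ("ex:s2", "ex:hasActiveMoiety", "ex:m1"),
    ("ex:s3", "ex:hasActiveMoiety", "ex:m2"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def substances_sharing_moiety(graph):
    """Group substances by active moiety; keep moieties shared by two or more."""
    by_moiety = defaultdict(set)
    for subject, _, moiety in match(graph, p="ex:hasActiveMoiety"):
        by_moiety[moiety].add(subject)
    return {m: sorted(subs) for m, subs in by_moiety.items() if len(subs) > 1}


shared = substances_sharing_moiety(DATA)
```

The query only makes sense because every data source uses the same property for the same relationship, which is the kind of agreement a shared ontology is meant to supply.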
Key Takeaways:
- Regulatory Imperative for IDMP: The ISO standards for Identification of Medicinal Products (IDMP) are driven by regulatory requirements, with the EMA enforcing compliance from 2023, making robust IDMP implementation a critical industry need.
- Addressing Data Inconsistency: The IDMP standards are currently published as PDF documents, which leads to varied interpretations and inconsistent implementations across the industry. This poses significant risks to data integrity, hinders the automation of regulatory submissions, and can compromise patient safety.
- Semantic Interoperability through Ontology: The IDMP Ontology project aims to provide deep, semantic interoperability for medicinal product data, enabling machine-actionable definitions and consistent data interpretation across diverse stakeholders.
- Leveraging FAIR Principles: The ontology development is grounded in FAIR principles, particularly focusing on "Interoperability" to bridge disparate data systems and facilitate data exchange within and between organizations.
- Bridging Internal and External Silos: The IDMP Ontology acts as a crucial translator, connecting simplified departmental data views within pharma companies to a common semantic definition, and enabling cross-organizational collaboration (e.g., between pharma and regulators, or between partners like Pfizer and BioNTech).
- Real-world Use Cases: The project is driven by specific "competency questions" from pharma companies and authorities, such as identifying clinical trials for a specific substance, finding all substances with a common active moiety, or retrieving specific regulatory codes.
- Modular and Reusable Ontology Design: The ontology is built modularly, reusing established "common ontologies" (e.g., from EDM Council) for generic concepts like identifiers, quantities, and registries, allowing the project to focus on domain-specific pharmaceutical complexities.
- Contextualized Roles for Precision: A key modeling pattern allows for defining how a substance plays a specific role (e.g., active moiety) within another object, distinguishing between different contexts like regulatory versus scientific views.
- Importance of Persistent Identifiers: All data objects are assigned persistent identifiers to ensure unique and unambiguous referencing, crucial for data linking and interoperability.
- Deep Chemical Linking: Accurate matching and linking of substances rely not just on names (which can be misleading) but also on deep chemical information, specifically molecular structures, to ensure precise identification.
- Collaborative Industry Effort: The project is a broad, open initiative involving numerous major pharmaceutical companies, regulatory bodies (FDA, EMA), and expert consultants, fostering industry-wide adoption and impact.
- Distinction Between Ontology and Data Graph: The ontology provides the conceptual structure (hundreds of concepts), while the data graph represents the actual data instances (millions of objects), which is used to test the ontology's effectiveness in answering competency questions.
- Demonstrated Value and Future Expansion: The project has demonstrated the ontology's value on both public data (FDA GSRS) and private pharma data, with plans for parallel implementations across more companies and future phases to expand coverage beyond small molecules to biological products.
Tools/Resources Mentioned:
- Accurids: A tool used for implementing use cases, applying the IDMP ontology on actual data, and exploring data.
- EDM Council (Enterprise Data Management Council): Provides the ontology development governance and hosting environment, including the common ontologies. Its FIBO (Financial Industry Business Ontology) is developed under the same governance principles.
- Pistoia Alliance: The organization under which the IDMP Ontology project is run.
- FDA Global Substance Registration System (GSRS): A public data source used for testing the ontology.
- Chemical Entities of Biological Interest (ChEBI) Ontology: A reference ontology for chemical entities, used for alignment and deeper chemical classification.
- ISO Standards (IDMP): The foundational Identification of Medicinal Products standards (e.g., ISO 11238 on substance, 11240 on units, 11615 on medicinal products).
Key Concepts:
- IDMP (Identification of Medicinal Products): A set of five ISO standards providing an international framework to uniquely identify and describe medicinal products, driven by regulatory requirements.
- Ontology: A formal, explicit specification of a shared conceptualization, used here to provide a machine-actionable, semantically rich representation of IDMP standards.
- FAIR Principles: A set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. The IDMP Ontology specifically targets the "Interoperability" aspect.
- Semantic Interoperability: The ability of computer systems to exchange data with unambiguous, shared meaning, crucial for automating processes and ensuring data consistency.
- Active Moiety: The part of a substance that is responsible for its biological or pharmacological activity.
- Contextualized Roles: A modeling pattern that allows for defining how a substance plays a specific role within another object, with the role being dependent on a particular context (e.g., regulatory, scientific).
- Persistent Identifiers (PIDs): Unique, long-lasting identifiers assigned to data objects, ensuring they can be reliably referenced over time and across different systems.
- Molecular Graph: A representation of a molecule's structure that captures atoms and their bonds, providing a highly accurate basis for chemical identification and matching, superior to string-based representations.
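The point about molecular graphs can be made concrete with a small sketch. It assumes a deliberately simplified representation (heavy atoms only, implicit hydrogens, no bond orders) and uses a brute-force canonical form that is only practical for a handful of atoms; production cheminformatics toolkits use far more efficient algorithms. The function name `canonical_form` is hypothetical.

```python
from itertools import permutations


def canonical_form(atoms, bonds):
    """Brute-force canonical form of a small labelled graph.

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j) index pairs.
    Tries every renumbering of the atoms and keeps the smallest
    (labels, edges) pair, so the result is independent of the input numbering.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new index assigned to the original atom `old`.
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = tuple(sorted(
            tuple(sorted((perm[a], perm[b]))) for a, b in bonds
        ))
        candidate = (tuple(labels), edges)
        if best is None or candidate < best:
            best = candidate
    return best


# Ethanol (C-C-O) written with two different atom numberings.
ethanol_a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])

# Dimethyl ether (C-O-C): same formula C2H6O, different connectivity.
dimethyl_ether = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])

same_molecule = ethanol_a == ethanol_b
same_as_isomer = ethanol_a == dimethyl_ether
```

Ethanol and dimethyl ether share the formula C2H6O, so a comparison on the formula string alone would treat them as identical, while the graph comparison tells them apart and still recognizes the two numberings of ethanol as one structure.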
Examples/Case Studies:
- Patient Safety/Pharmacovigilance: The "blue pill" scenario, where a patient reports adverse symptoms after taking a medication, requiring unambiguous identification of the exact product, its ingredients, and related substances for safety assessment.
- Clinical Trial Query: A health authority asking, "In which clinical trials were substance X with ingredient product Y registered in the EMA region administered to patients?" This question, which currently takes weeks and often yields incorrect results, is a primary driver for the ontology project.
- Shared Active Moiety: Identifying all substances that share a common active moiety, which is critical for understanding drug families and potential cross-reactivity or safety signals.
- Specific Substance Codes: Retrieving specific codes (e.g., EMA's EVMPD codes, FDA codes, WHO codes, internal codes) for a given substance, highlighting the complexity of identifier management.
- Cross-Organizational Collaboration: The example of Pfizer and BioNTech collaborating on the COVID vaccine, where a standardized, semantically aligned representation of substance and manufactured item information would greatly facilitate data sharing and accelerate development.
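The "Specific Substance Codes" example above amounts to mapping one persistent identifier to the codes that different registries assign to the same substance. The following is a minimal sketch of that idea, not the project's implementation: the `CODES` table, the `pid:` identifiers, and every code value are hypothetical placeholders rather than real registry entries.

```python
# Hypothetical registry: persistent identifier -> {code system: code}.
# All identifiers and code values are placeholders, not real registry data.
CODES = {
    "pid:substance/0001": {
        "EMA": "EMA-0001",
        "FDA": "FDA-0001",
        "WHO": "WHO-0001",
        "internal": "CMP-42",
    },
    "pid:substance/0002": {
        "EMA": "EMA-0002",
        "internal": "CMP-77",
    },
}


def codes_for(pid, systems=None):
    """Return the codes known for `pid`, optionally limited to some systems."""
    codes = CODES.get(pid, {})
    if systems is None:
        return dict(codes)
    return {s: codes[s] for s in systems if s in codes}


def resolve(system, code):
    """Return the persistent identifier that carries `code` in `system`."""
    for pid, codes in CODES.items():
        if codes.get(system) == code:
            return pid
    return None


ema_and_fda = codes_for("pid:substance/0002", ["EMA", "FDA"])
owner = resolve("internal", "CMP-42")
```

Anchoring every code to a single persistent identifier means a lookup can start from any registry's code and reach all the others, and a missing code (here, no FDA entry for the second substance) shows up as an absence rather than a mismatch.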