AI in eTMF - Auto-Classification and Metadata: Efficiency and Compliance
Flex Databases
/@Flexdatabases
Published: March 29, 2024
Insights
This webinar explores how Artificial Intelligence (AI) can be applied to electronic Trial Master Files (eTMF) to improve efficiency while maintaining GxP compliance in clinical research. Presented by Flex Databases with contributions from Deloitte, the session covers three topics: how AI can automate document filing and metadata assignment within an eTMF system, how AI validation differs from traditional software validation, and how a customer can assure GxP compliance for embedded AI applications. The core message is that AI should act as an "assistant" that improves both the speed and accuracy of TMF management, reducing costs and freeing clinical staff for more critical tasks, while regulatory requirements are still addressed in full.
The presentation begins with an overview of Flex Databases' eClinical suite, highlighting its Trial Master File and document management system, which already includes TMF reference model support, audit trails, and quality review functions. The speakers then introduce the AI functionality, demonstrating how documents (whether emailed, dragged and dropped, or even photographed) can be automatically classified and assigned metadata with high confidence levels. A key feature, the "copilot," lets users set confidence thresholds for AI classification: documents that fall below the chosen probability go to manual review and confirmation, which keeps a human in the loop and feeds continuous learning for the AI model. The discussion also addresses data privacy and security, emphasizing that, unlike public AI models, Flex Databases uses local, private servers and separate instances for each client.
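The confidence-threshold behaviour of the copilot can be illustrated with a minimal sketch. This is not Flex Databases' implementation: the `Prediction` structure, the `route` function, the document IDs, and the 0.90 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One AI classification result (hypothetical structure)."""
    document_id: str
    predicted_class: str  # illustrative document type label
    confidence: float     # model probability between 0.0 and 1.0


def route(prediction: Prediction, threshold: float) -> str:
    """Return 'auto_file' at or above the threshold, else 'manual_review'."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if prediction.confidence >= threshold:
        return "auto_file"
    return "manual_review"


# Illustrative predictions; the values are made up for the example.
predictions = [
    Prediction("doc-001", "Protocol", 0.98),
    Prediction("doc-002", "Investigator CV", 0.71),
]

# Assumed threshold of 0.90; in practice the user chooses this value.
decisions = {p.document_id: route(p, threshold=0.90) for p in predictions}
print(decisions)
```

Raising the threshold sends more documents to manual review and lowers the share filed without a human check, so the threshold is effectively a dial between workload and residual risk.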
Following the practical demonstration, the webinar delves into the complexities of AI validation. Malik Bilgin from Flex Databases explains the fundamental difference between traditional, deterministic software validation and the probabilistic nature of AI, where outputs can vary due to factors like data quality and model training. He outlines a validation approach aligned with GAMP 5 life cycle phases, focusing on concept definition, risk assessment, and operational monitoring, with particular emphasis on data life cycle management and the capture of AI-specific details like algorithms and hyperparameters. Dr. Nico Erdmann from Deloitte then provides the customer's perspective, emphasizing the need for an integrated validation approach that considers both the vendor's pre-trained model and the customer's refined model. He highlights the importance of robust risk management, defining acceptable error rates, and establishing comprehensive governance for the operational phase, especially given the varying levels of AI autonomy.
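Defining an acceptable error rate and monitoring against it in operation can be sketched as follows. This is only an illustration under stated assumptions: the function names, the sample of reviewed documents, and the 5% acceptance limit are hypothetical and do not come from the webinar.

```python
def observed_error_rate(reviewed: list[tuple[str, str]]) -> float:
    """Fraction of reviewed documents where the AI class differs
    from the class confirmed by the human reviewer.

    Each item is a pair: (ai_class, human_confirmed_class).
    """
    if not reviewed:
        raise ValueError("no reviewed documents in sample")
    errors = sum(1 for ai_class, human_class in reviewed if ai_class != human_class)
    return errors / len(reviewed)


def within_acceptance(reviewed: list[tuple[str, str]], acceptable_error_rate: float) -> bool:
    """True when the observed error rate does not exceed the agreed limit."""
    return observed_error_rate(reviewed) <= acceptable_error_rate


# Hypothetical quality-review sample; labels are illustrative only.
sample = [
    ("Protocol", "Protocol"),
    ("Investigator CV", "Investigator CV"),
    ("Monitoring Visit Report", "Site Visit Log"),  # AI and reviewer disagree
    ("Informed Consent Form", "Informed Consent Form"),
]

rate = observed_error_rate(sample)  # one mismatch out of four documents
print(rate, within_acceptance(sample, acceptable_error_rate=0.05))
```

A real monitoring process would use a much larger sample and would likely account for sampling uncertainty; the point here is only that the acceptance limit is fixed up front and the observed rate is checked against it.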
Key Takeaways:
- AI for eTMF Efficiency: AI can significantly enhance the efficiency of eTMF management by automating document classification and metadata assignment, leading to faster processing, reduced manual effort, and cost savings, particularly for large-scale studies.
- Improved Compliance through Accuracy: AI is trained to classify documents with high precision, reducing errors in filing and metadata application, which directly contributes to maintaining compliance with regulatory requirements like GxP.
- Real-time Processing and Scalability: AI operates 24/7, enabling real-time document processing and metadata assignment. It can also scale efficiently to handle increased document volumes without requiring additional human staff, allowing clinical teams to focus on core activities.
- "Copilot" for Human Oversight: The system incorporates a "copilot" feature that allows users to set confidence thresholds for AI classifications. Documents falling below this threshold must be confirmed or declined by a human, ensuring that AI acts as an assistant rather than a fully autonomous system and continuously learns from user feedback.
- Data Security and Privacy: Unlike public AI models, the eTMF AI solution uses local, private servers with protected API channels and encrypted connections. Each client has its own dedicated AI processing instance to prevent data mixing and ensure compliance with data governance standards.
- AI Validation Challenges: AI validation differs from traditional software validation due to its probabilistic nature. Key risk factors include data quality, the amount and relevance of training data, and the potential for emergent situations or bugs, necessitating a continuous validation approach.
- GAMP 5 Aligned Validation: The validation process for AI in eTMF should follow GAMP 5 life cycle phases, including concept definition (what AI will do, expected results, error margin), specification (data usage, algorithms, architecture), risk assessment (identifying data-related and operational risks), and ongoing operational monitoring.
- Integrated Vendor-Customer Validation: Customers must consider a split validation approach, qualifying the vendor's pre-trained AI model and then validating the refined model based on their specific data. This requires close collaboration and clear delineation of responsibilities between vendor and client.
- Importance of Data Quality and Training: The accuracy and reliability of AI classification heavily depend on the quality, quantity, and diversity of the training data. The system learns from user confirmations and declines, continuously improving its recognition of specific study documents and non-standard files.
- Defining Risk Appetite and Review Processes: Before implementation, organizations must define their risk appetite regarding AI accuracy (e.g., 95% vs. 99% confidence). This informs the need for subsequent human review processes or acceptance of residual risks, which should be integrated into existing quality management systems.
- Comprehensive Validation Documentation: Tech providers should supply extensive validation support documents, including validation certification, User Requirement Specifications (URS), Installation Qualification (IQ), Traceability Matrix, Operational Qualification (OQ), maintenance plans, User Acceptance Testing (UAT) scenarios, training certificates, and 21 CFR Part 11 assessments.
- Handling Handwritten and Complex Data: While the system uses OCR to recognize text in pictures, handwritten information remains a challenge for AI. For documents with multiple dates, the system can extract and add each date separately to metadata, but it does not compare or determine "final" dates based on complex contractual logic.
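The multi-date behaviour described in the last takeaway (each date is extracted and stored separately, with no attempt to decide which one is "final") can be sketched as below. This is an assumption-laden illustration, not the product's extraction logic: it handles only ISO-style `YYYY-MM-DD` dates in already-recognized text, and `extract_dates` and the `metadata` layout are hypothetical.

```python
import re
from datetime import date

# Assumption: dates appear as YYYY-MM-DD in the OCR text.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text: str) -> list[date]:
    """Return every valid date found in the text, in order of appearance."""
    found = []
    for year, month, day in ISO_DATE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # skip impossible dates such as 2024-13-40
    return found


text = "Signed 2024-01-15 by the site; countersigned 2024-02-01 by the sponsor."

# Each date is stored as its own metadata value; none is marked as "final".
metadata = {"dates": [d.isoformat() for d in extract_dates(text)]}
print(metadata)
```

Deciding which of the extracted dates governs the document would require contractual or procedural logic, which is the part the takeaway says the system leaves to the human reviewer.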
Tools/Resources Mentioned:
- Flex Databases eTMF: An electronic Trial Master File system with integrated AI capabilities.
- GAMP 5 Second Edition: A guide for validating computerized systems in regulated environments, referenced for AI validation methodology.
- 21 CFR Part 11: FDA regulations concerning electronic records and electronic signatures, mentioned as a compliance requirement.
Key Concepts:
- eTMF (Electronic Trial Master File): A system for managing essential documents of a clinical trial in an electronic format, crucial for regulatory compliance.
- Auto-Classification: The AI-driven process of automatically categorizing and filing documents into the correct folders within the eTMF.
- Metadata Assignment: The automatic extraction and application of relevant descriptive data (e.g., document date, site, country) to documents.
- GxP Compliance: A set of good practice guidelines (e.g., Good Clinical Practice, Good Manufacturing Practice) ensuring quality and integrity in life sciences.
- Computer System Validation (CSV): The process of ensuring that a computerized system meets its intended use and regulatory requirements.
- Deterministic vs. Probabilistic Systems: Traditional software is deterministic (same input, same output), while AI is probabilistic (outputs can vary due to learning and data factors).
- Confidence Level/Probability: A measure of how sure the AI is about its classification or metadata extraction, used for human oversight.
- Hyperparameters: Configuration variables used to control the learning process of an AI model.
- Audit Trail: A chronological record of all actions taken within a system, essential for compliance.
- OCR (Optical Character Recognition): Technology that enables the system to recognize text within images or scanned documents.
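Capturing AI-specific details such as the algorithm and its hyperparameters for validation records could look like the following minimal sketch. The `ModelRecord` class, its fields, and every value shown are hypothetical; the webinar does not describe a specific record format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical record of one trained model version."""
    model_name: str
    algorithm: str
    training_data_version: str
    hyperparameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        """SHA-256 hash of the record, stable for identical contents."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative values only.
record = ModelRecord(
    model_name="etmf-classifier",
    algorithm="example-text-classifier",
    training_data_version="2024-03",
    hyperparameters={"learning_rate": 0.001, "epochs": 10},
)
print(record.fingerprint())
```

Storing the fingerprint alongside each classification in the audit trail would make it possible to tell which model version and configuration produced a given result, and any change to a hyperparameter yields a different fingerprint.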