Season 2 Episode 2: Debunking the Myths of AI

Veeva Systems Inc

/@VeevaSystems

Published: September 8, 2023

Insights

This video provides an in-depth exploration of applying Artificial Intelligence (AI) and Machine Learning (ML) to enhance efficiency and quality in clinical data management. Hosted by Veeva Systems, the discussion features Andy Cooper, CEO of CluePoints, an expert in risk-based quality management systems. The primary goal is to move beyond the AI "buzzword" and identify concrete, high-value applications within the clinical trial lifecycle, emphasizing the necessity of combining deep industry context with advanced algorithmic techniques.

The conversation begins by defining AI as a broad, often overused term, contrasting it with Machine Learning (ML), which involves applying algorithms to programmatically understand data, find patterns, and inform decisions. A critical distinction is made between supervised learning, which requires data labeling (e.g., classifying film reviews as positive or negative) and achieves high accuracy quickly, and unsupervised learning, where the algorithm finds patterns in unlabeled data. Unsupervised learning is highlighted as particularly powerful for data anomaly detection in clinical trials, helping to reduce the "noise" generated by unnecessary or erroneous edit checks and focusing resources on critical data issues. The speakers stress that generic AI tools like ChatGPT are ineffective with clinical data because they lack the necessary clinical context, necessitating customized, self-supervised ML approaches that learn the nuances of the pharmaceutical world.
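
The idea of finding anomalies without labels can be sketched with a very small statistical stand-in. The example below is not CluePoints' method and is not a trained model; it assumes hypothetical per-site mean blood pressure values and flags sites that sit far from the others using a robust z-score (median and median absolute deviation). The site names, values, and the 3.5 threshold are all illustrative assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalous_sites(site_values, threshold=3.5):
    """Return the sites whose value is unusually far from the other sites."""
    sites = list(site_values)
    scores = robust_z_scores([site_values[s] for s in sites])
    return [s for s, z in zip(sites, scores) if abs(z) > threshold]


# Hypothetical mean systolic blood pressure per site (mmHg); no labels are used.
mean_systolic_by_site = {
    "site_01": 128.4, "site_02": 131.0, "site_03": 126.9,
    "site_04": 129.7, "site_05": 130.2, "site_06": 112.3,
    "site_07": 127.8,
}

flagged = flag_anomalous_sites(mean_systolic_by_site)
print(flagged)  # ['site_06']
```

No one told the code which site was "bad"; the flag comes only from how the values compare with each other, which is the property the speakers attribute to unsupervised approaches.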

A major theme is the industry's shift toward risk-based monitoring (RBM) and away from the outdated paradigm of 100% Source Data Verification (SDV). The guest notes that while reducing SDV saves significant time and money, the core objective must be increasing data quality. CluePoints' work focuses on using statistical methodologies and ML to eliminate manual review processes, enabling predictive capabilities that flag potential site issues early in a trial. This predictive approach is vital because mistakes made early in a study tend to persist, making late correction impossible. The ultimate ambition discussed is achieving "submission ready" status in hours or days, rather than weeks or months, a goal that requires fundamental change management and process centralization, not just minor technological tweaks. The discussion concludes by addressing the challenge of regulatory trust in the "black box" nature of ML, asserting that trust must be built over time by consistently demonstrating accurate and reliable results. The FDA's adoption of CluePoints' engine for detection work on submitted clinical trial data is cited as an example.

Key Takeaways:

  • AI vs. ML Definition: AI is a broad, often overused term; Machine Learning (ML) is the practical application of algorithms to find patterns in data. If a solution is written in PowerPoint, it's often called AI; if it's written in Python, it's likely ML.
  • Clinical Context is Essential: Standard, off-the-shelf AI models (like general LLMs) are ineffective for clinical data because they do not understand the specific context, requiring customized, clinically trained ML approaches.
  • Unsupervised Learning for Anomaly Detection: Unsupervised ML is highly effective for data anomaly detection in clinical trials, allowing algorithms to find patterns and flag potential issues that traditional, rule-based edit checks might miss, thereby reducing noise and focusing data management efforts.
  • Reducing SDV and Manual Effort: The industry must continue shifting away from 100% Source Data Verification (SDV) toward risk-based approaches (RBM). Eliminating manual processes like excessive edit checks frees up resources and improves data quality simultaneously.
  • Predictive Analytics for Early Intervention: ML enables predictive capabilities that identify emerging patterns of oversight or potential data fraud early in a trial, allowing sponsors and CROs to intervene and course-correct before errors become systemic.
  • Accelerating Submission Timelines: The goal should be moving from last patient last visit to submission readiness in hours or days, not weeks. Achieving this requires a massive shift in working processes, prioritizing constant data review and centralization of oversight.
  • Successful Technology Partnerships: Effective technology partnerships require alignment on core values and a shared vision, often involving customer input, to ensure that the combined solutions address genuine industry pain points.
  • Case Study: Automated Medical Coding: A successful partnership between Veeva and CluePoints demonstrated the power of ML by achieving over 99% accuracy in MedDRA and WHODrug coding, significantly surpassing the 65-70% accuracy rate of traditional synonym-list-based coding.
  • The "Black Box" Challenge: The inherent lack of transparency in ML algorithms (the "black box") is a major concern for a technical and regulated industry. Trust must be built by consistently demonstrating the accuracy and reliability of the results, as seen with regulatory bodies like the FDA adopting ML engines for data review.
  • Need for Data Consolidation: A major inhibitor to agility is distributed data across multiple systems. The industry needs a common environment or ecosystem where all information flows into one place, improving data currency and simplifying data movement for all stakeholders.
  • Eliminating Redundant Assessments: A major efficiency drain is the constant redoing of the same patient assessments, instruments, and translations across different studies, a human-controlled requirement that, if eliminated, would save substantial time and cost.

Tools/Resources Mentioned:

  • Veeva Vault CDMS
  • CluePoints
  • ChatGPT (used as a comparison for general AI)
  • Python (mentioned as the language often used for ML)

Key Concepts:

  • Supervised Learning: Machine learning approach where the input data is labeled, allowing the algorithm to learn quickly from examples.
  • Unsupervised Learning: Machine learning approach where the input data is unlabeled, requiring the algorithm to find hidden patterns and structure on its own, ideal for anomaly detection.
  • Self-Supervised Learning: A form of ML where the algorithm teaches itself by generating labels from the input data (e.g., predictive text).
  • Risk-Based Monitoring (RBM): A quality management approach emphasizing the identification and mitigation of risks to critical data and processes, guided by ICH E6(R3).
  • Source Data Verification (SDV): The traditional, resource-intensive process of checking reported data points against the source records; under 100% SDV, every data point is checked.
  • Data Anomaly Detection: Using ML to identify unusual or potentially fraudulent data patterns that deviate from expected norms.
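
The self-supervised idea above, where labels come from the data itself, can be sketched with a toy predictive-text model. This is only an illustration under stated assumptions: the corpus is made up, the function names are hypothetical, and a simple word-pair counter stands in for the far larger models used in practice.

```python
from collections import Counter, defaultdict


def make_training_pairs(text):
    """Turn raw text into (word, label) pairs; the label is simply the next word."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def train_next_word_model(pairs):
    """Count which word follows which; no human-provided labels are involved."""
    counts = defaultdict(Counter)
    for current, following in pairs:
        counts[current][following] += 1
    return counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up free-text snippets standing in for unlabeled input data.
corpus = (
    "patient reported mild headache "
    "patient reported mild nausea "
    "patient reported severe headache"
)

pairs = make_training_pairs(corpus)
model = train_next_word_model(pairs)

print(pairs[:3])  # [('patient', 'reported'), ('reported', 'mild'), ('mild', 'headache')]
print(predict_next(model, "reported"))  # mild
```

The training pairs are generated directly from the text, so nobody has to label anything by hand; that is the contrast with supervised learning, where each example arrives with a human-assigned label.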

Examples/Case Studies:

  • Veeva/CluePoints Coding Partnership: Achieved over 99% accuracy in medical coding (MedDRA/WHODrug) using machine learning, replacing inefficient synonym lists that typically yield only 65-70% accuracy.
  • Regulatory Adoption: The FDA has adopted CluePoints' engine to perform detection work on submitted clinical trial data, validating the use of advanced statistical and ML methodologies for ensuring data integrity.
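
To show why synonym-list coding leaves gaps, here is a minimal sketch of the traditional approach only; it does not reproduce the Veeva/CluePoints ML coder. The synonym list, the verbatim terms, and the target terms are hypothetical placeholders rather than real MedDRA or WHODrug entries, and the resulting rate reflects nothing beyond this toy data.

```python
def normalize(verbatim):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(verbatim.lower().split())


# Hypothetical synonym list: normalized verbatim text -> placeholder dictionary term.
SYNONYMS = {
    "headache": "Headache",
    "head ache": "Headache",
    "nausea": "Nausea",
    "feeling sick": "Nausea",
}


def auto_code(verbatims, synonyms):
    """Code each verbatim by exact lookup; anything unmatched needs manual review."""
    coded, uncoded = {}, []
    for verbatim in verbatims:
        term = synonyms.get(normalize(verbatim))
        if term is None:
            uncoded.append(verbatim)
        else:
            coded[verbatim] = term
    return coded, uncoded


# Made-up verbatim terms as a site might enter them.
reported = [
    "Headache",
    "head  ache",
    "bad headache since Monday",
    "felt nauseous",
    "feeling sick",
]

coded, uncoded = auto_code(reported, SYNONYMS)
hit_rate = len(coded) / len(reported)

print(f"auto-coded {len(coded)} of {len(reported)}")  # auto-coded 3 of 5
print(uncoded)  # ['bad headache since Monday', 'felt nauseous']
```

Exact lookup only succeeds when someone has already anticipated the wording, so free-text variations fall through to manual coding. A learned model that generalizes across phrasings is the kind of approach the partnership described above relies on.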