Season 3 Episode 3: Does Data Science Require Data Perfection?

Veeva Systems Inc

@VeevaSystems

Published: October 9, 2024

Insights

This video provides an in-depth exploration of the evolution of data science, the practical application of artificial intelligence, and the concept of "data perfection" within the pharmaceutical and life sciences industries. Hosted by Richard Young, VP of Clinical Data Strategy at Veeva, the episode features Demetris Zambas, Global Head of Data Monitoring and Management at Pfizer. Zambas shares his extensive 33-year journey in the industry, from laboratory roles to pioneering clinical data management, and discusses how his early experiences shaped his focus on organized, outcome-driven data. The conversation emphasizes the critical role of data in proving hypotheses and supporting regulatory submissions, highlighting that the entire investment in a clinical trial hinges on generating "fit for purpose" data.

A significant portion of the discussion centers on the transformation of data management into data science. Zambas argues that a good data manager has always been a data scientist, characterized by critical thinking and the ability to ensure data is trustworthy and adequate for its intended purpose, rather than merely adhering to checklists. He recounts the period when data management was commoditized, focusing on output metrics like "queries per day," which obscured its true value. The video also delves into the strategic use of AI, likening it to a specialized tool rather than a universal solution. Zambas advocates for AI as an "assistant" or "co-pilot" to data scientists, capable of automating routine checks and flagging critical insights, thereby accelerating processes and allowing human experts to focus on higher-value tasks.

The conversation further explores the importance of industry collaboration and regulatory engagement. Zambas details his contributions to the Society for Clinical Data Management (SCDM), including making the Good Clinical Data Management Practices (GCDMP) publicly accessible so that regulators can reference it and the wider industry can benefit. He stresses the ethical imperative for companies to collaborate on non-competitive issues like fraud and anomaly detection, citing the unprecedented cross-company data management calls during COVID-19 vaccine development as a prime example. The video concludes by addressing the convergence of central monitoring and data science roles, the growing recognition of data management's value in areas like Real-World Evidence (RWE), and Zambas's personal aspirations for overcoming resistance to change and improving global access to medicines.

Key Takeaways:

  • Evolution of Data Management to Data Science: The role of a data manager has evolved from a commoditized function focused on output metrics (e.g., queries per day) to a critical data science discipline requiring deep critical thinking and an outcome-oriented approach to ensure data fitness for regulatory consumption.
  • "Fit for Purpose" Data Over Absolute Perfection: The standard for data quality should be "fit for purpose" – adequate for proving a hypothesis and supporting regulatory submissions – rather than striving for an unattainable "perfection." Effort should be tiered, with significant focus on endpoints and safety data.
  • AI as a Strategic Assistant/Co-pilot: AI should be viewed as a specialized tool, not a "Swiss army knife." Its most impactful application in data science is as an assistant or co-pilot, automating routine validation checks and highlighting critical data patterns for human data scientists, thereby accelerating insights and efficiency.
  • Outcome-Driven Focus: It is crucial to focus on meaningful outcomes (e.g., earlier market access for therapies) rather than solely on output metrics. Data professionals must articulate how seemingly "boring" operational improvements, potentially driven by AI, can lead to significant strategic advantages.
  • Importance of Industry Collaboration: For non-competitive areas like fraud detection, anomaly detection, and navigating shared challenges (e.g., regulatory communication during a pandemic), industry-wide collaboration among data management leaders is vital for collective success and patient benefit.
  • SCDM's Role in Discipline Advancement: Organizations like SCDM are crucial for enabling data professionals to impact their discipline, establish best practices (e.g., GCDMP), and engage directly with regulators (FDA, EMA, PMDA) to shape industry standards.
  • Public Accessibility of Guidelines: Making industry guidelines, such as the GCDMP, publicly available is essential for regulators to reference them and for fostering broader adoption and understanding across the community.
  • Convergence of Central Monitoring and Data Science: The roles, technologies, and processes of central monitoring and data science are increasingly converging, suggesting a future where data management and monitoring plans are integrated to drive more detailed, signal-driven data dives.
  • Increased Recognition of Data Management's Value: The discipline has transitioned from being considered non-core and outsourced to being recognized as a critical function, now actively invited to contribute to new areas like Real-World Evidence (RWE) data structuring and management.
  • Overcoming Resistance to Change: A significant impediment to progress is resistance to change. Professionals are encouraged to at least "try" new approaches in a controlled manner, even if not fully convinced, to foster innovation and efficiency.
  • Data Quality for Regulatory Trust: The ultimate goal of clinical data management is to deliver data that is robust enough to convince regulators and stakeholders that it is trustworthy for proving hypotheses and validating endpoints, thereby justifying the significant investment in clinical trials.
  • Global Access to Medicines: Beyond the technical aspects, the broader mission of data management and clinical trials is to facilitate timely and equitable access to life-saving medicines for patients worldwide, a deeply motivating factor for industry professionals.
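
To make the "fit for purpose" and "AI as co-pilot" takeaways above more concrete, here is a minimal sketch of tiered, automated validation checks. It is not a method described in the episode: the field names (systolic_bp, weight_kg), the ranges, and the two tiers are all hypothetical, chosen only to illustrate the idea that routine checks can be automated while findings on endpoint and safety data are surfaced to a human reviewer first.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    field: str                                # hypothetical field name
    tier: str                                 # "critical" (endpoint/safety) or "routine"
    rule: Callable[[Optional[float]], bool]   # returns True when the value passes
    message: str


# Illustrative checks only; real studies define these in their own plans.
CHECKS = [
    Check("systolic_bp", "critical",
          lambda v: v is not None and 60 <= v <= 250,
          "systolic BP missing or out of range"),
    Check("weight_kg", "routine",
          lambda v: v is None or 20 <= v <= 300,
          "weight out of range"),
]


def run_checks(records, checks=CHECKS):
    """Return findings as (tier, subject_id, field, message), critical first."""
    findings = []
    for rec in records:
        for check in checks:
            if not check.rule(rec.get(check.field)):
                findings.append(
                    (check.tier, rec["subject_id"], check.field, check.message)
                )
    # Stable sort: critical findings are listed before routine ones.
    findings.sort(key=lambda f: 0 if f[0] == "critical" else 1)
    return findings


records = [
    {"subject_id": "S1", "systolic_bp": 120, "weight_kg": 500},
    {"subject_id": "S2", "systolic_bp": None, "weight_kg": 70},
]
for finding in run_checks(records):
    print(finding)
```

In this sketch the missing safety value for S2 is listed ahead of the implausible weight for S1, so the reviewer's attention goes to the higher-tier data first.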

Tools/Resources Mentioned:

  • Veeva: The host, Richard Young, is VP of Clinical Data Strategy at Veeva.
  • Phase Forward: An older EDC (Electronic Data Capture) vendor mentioned in the context of a tech transfer.
  • SCDM (Society for Clinical Data Management): A key industry organization discussed for its role in advancing the data management discipline.
  • GCDMP (Good Clinical Data Management Practices): A set of guidelines developed by SCDM, made public to aid regulatory referencing and industry best practices.
  • Python/R: Mentioned as utilities that a data scientist would use, rather than being the definition of data science itself.

Key Concepts:

  • Fit for Purpose: The primary definition of quality in clinical data, meaning the data is adequate and trustworthy for its intended use, particularly for regulatory submissions and proving hypotheses.
  • Data Commoditization: A historical period in which data management was viewed as a low-value, outsourceable function, often measured by simple output metrics like "dollars per page" or "queries per day."
  • Central Monitoring: The process of remotely reviewing aggregated data to identify potential risks, trends, or issues across clinical trial sites, distinct from traditional on-site field monitoring.
  • Risk-Based Monitoring (RBM): An approach to clinical trial oversight that focuses monitoring activities on the most critical data and processes, based on identified risks, to ensure patient safety and data quality.
  • Real-World Evidence (RWE): Evidence about the usage and potential benefits or risks of a medical product, derived from analysis of data collected in real-world settings (e.g., electronic health records, claims data).
  • AI as Co-pilot/Assistant: A concept where AI systems augment human capabilities by automating routine tasks, analyzing large datasets, and flagging critical information, allowing human experts to focus on complex problem-solving and decision-making.
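
As an illustration of the central monitoring concept above, the following sketch flags sites whose aggregated metric sits far from the other sites. It is an assumption-laden example, not something described in the episode: the metric (adverse events reported per subject), the site labels, and the 3.5 cutoff on a median/MAD-based score are all hypothetical choices. A flag here is only a prompt for a human to look closer, not a conclusion about the site.

```python
from statistics import median


def flag_outlier_sites(rates, threshold=3.5):
    """Return site IDs whose rate deviates strongly from the cross-site median.

    Uses a robust score based on the median absolute deviation (MAD), so one
    extreme site does not distort the baseline it is compared against.
    """
    values = list(rates.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread across sites, so nothing can be scored in this sketch.
        return []
    return sorted(
        site
        for site, value in rates.items()
        if 0.6745 * abs(value - med) / mad > threshold
    )


# Hypothetical adverse events per subject, aggregated by site.
ae_per_subject = {"A": 0.50, "B": 0.52, "C": 0.48, "D": 0.51, "E": 0.05}
print(flag_outlier_sites(ae_per_subject))
```

With these made-up numbers only site E is flagged, since its rate is far below the others, which is the kind of signal that might prompt a targeted review for under-reporting.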

Examples/Case Studies:

  • Pfizer's COVID Vaccine Development: Demetris Zambas described regular, senior-leader-blessed calls between heads of data management across competing companies (e.g., J&J, AstraZeneca) during the COVID vaccine studies to share challenges and best practices, demonstrating industry collaboration for public good.
  • CAR T-cell Therapy Success: Zambas recounted the story of the first young girl cured of leukemia using CAR T-cell therapy, highlighting how a data manager at UPenn identified a critical pattern of increasing cytokine levels, prompting medical intervention and saving the patient's life. This example underscores the direct patient impact of meticulous data management.
  • Challenges with RWE Data: The discussion touched on how Real-World Evidence data, when initially received, is often unstructured and "a mess" compared to carefully designed clinical trial data, leading to invitations for data management experts to help structure, control, and manage it.