Episode 4: How Clean is Your Data?
Veeva Systems Inc
/@VeevaSystems
Published: October 10, 2022
Insights
This video provides an in-depth exploration of the evolving landscape of clinical data management, featuring a conversation between Richard Young, Vice President of Vault CDMS Strategy at Veeva, and Trevor Griffiths, Senior Director of Clinical Data Management at Syneos Health. The discussion centers on the increasing complexity of clinical trials due to diverse data sources and the transformative role of technology, particularly AI and machine learning, in streamlining data cleaning and management processes. Both speakers, with over two decades of experience in data management, reflect on the industry's shift from traditional paper-based methods to a highly digital and integrated approach.
The conversation highlights a significant change in the CRO industry, where clinical trials now commonly involve 10 to 12 different data sources, moving beyond traditional EDC and central labs to include wearables, bedside monitors, and handheld devices. This diversification necessitates a robust decentralized clinical trial (DCT) strategy. A key revelation is Syneos Health's success in eliminating over 3,000 hours of manual data review in a single large trial through the application of machine learning. This efficiency gain underscores a fundamental shift in the data manager's role, away from repetitive, manual tasks and towards more strategic functions, with data managers evolving into "data scientists" who are pivotal to the early stages of trial design and overall data strategy.
The speakers also delve into the capabilities of platforms like Veeva CDB, which facilitates the collation of data from multiple sources, enables centralized cleaning, and allows for automated query generation back into EDC systems or direct notification to vendors. This integrated approach is crucial for managing the increased number of stakeholders and data streams, especially under growing pressure to shorten database lock timelines. The discussion concludes with a forward-looking perspective, envisioning a future where manual data review is entirely eliminated, data formats are standardized across the industry, and true data lake strategies enable seamless data capture without complex integrations, ultimately driving higher quality and efficiency in clinical trials.
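The collate, clean, and query flow described above can be illustrated with a minimal Python sketch. It is not Veeva CDB's API or data model: the record fields (`subject_id`, `visit`, `sample_collected`), the `Query` class, and the `reconcile` function are hypothetical names chosen for illustration, and the sample data is invented. The sketch assumes two sources, an EDC export and a central-lab transfer, and routes each discrepancy either back to the site as an EDC query or to the vendor as a notification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A data discrepancy to route back to its source (hypothetical structure)."""
    subject_id: str
    visit: str
    target: str   # "EDC" = query to the site, "VENDOR" = notification to the lab
    message: str


def reconcile(edc_rows, lab_rows):
    """Cross-check EDC sample-collection records against central-lab results."""
    # Subject/visit pairs where the site recorded that a sample was collected.
    edc_keys = {(r["subject_id"], r["visit"]) for r in edc_rows if r["sample_collected"]}
    # Subject/visit pairs for which the lab delivered a result.
    lab_keys = {(r["subject_id"], r["visit"]) for r in lab_rows}

    queries = []
    # Sample recorded at the site but no result from the lab: notify the vendor.
    for subject_id, visit in sorted(edc_keys - lab_keys):
        queries.append(Query(subject_id, visit, "VENDOR",
                             "Sample recorded in EDC but no lab result received."))
    # Result from the lab but no collection recorded at the site: query the site.
    for subject_id, visit in sorted(lab_keys - edc_keys):
        queries.append(Query(subject_id, visit, "EDC",
                             "Lab result received but no sample collection recorded in EDC."))
    return queries


# Invented sample data for illustration only.
edc_rows = [
    {"subject_id": "1001", "visit": "V1", "sample_collected": True},
    {"subject_id": "1002", "visit": "V1", "sample_collected": True},
    {"subject_id": "1003", "visit": "V1", "sample_collected": False},
]
lab_rows = [
    {"subject_id": "1001", "visit": "V1", "result": 5.4},
    {"subject_id": "1003", "visit": "V1", "result": 6.1},
]

queries = reconcile(edc_rows, lab_rows)
for q in queries:
    print(f"[{q.target}] subject {q.subject_id}, {q.visit}: {q.message}")
```

With this sample data the sketch raises one vendor notification (subject 1002) and one EDC query (subject 1003). A real trial with 10 to 12 sources would need many such checks, which is where the centralized platform and automation discussed in the episode come in.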
Key Takeaways:
- Evolution of Clinical Data Management: The industry has transitioned from primarily paper-based CRFs to EDC, and now faces a new era of highly diverse digital data sources, with 10 to 12 different sources (e.g., EDC, central labs, IVRS, wearables, bedside monitors) now common in a single clinical trial.
- Transformative Impact of AI and Machine Learning: AI and machine learning are proving instrumental in cleaning clinical data, particularly in automating manual review processes. Syneos Health successfully reduced manual cleaning effort by over 3,000 hours on a single large trial using these technologies.
- Shifting Role of the Data Manager: The data manager's role is evolving from a task-oriented position focused on manual cleaning to a more strategic "data scientist" role. They are becoming pivotal team members involved earlier in trial design, focusing on data strategy, analytics, and coordination rather than minutiae.
- Investment in Data Science Skills: Companies like Syneos Health are actively investing in "data scientist districts" and providing training to upskill existing data managers, equipping them with the necessary skills to handle diverse data types and leverage advanced analytical tools.
- Centralized Data Platforms are Crucial: Platforms like Veeva CDB are essential for collating data from multiple sources into a central location, enabling comprehensive review, automated query generation back to EDC, and efficient notification of data issues to vendors.
- Pressure on Database Lock Timelines: Despite the increased complexity and number of data sources, there is growing pressure to shorten database lock timelines. This is being achieved through the strategic use of AI/ML and robust project and stakeholder management.
- Importance of Early CRO Involvement: CROs bring extensive experience with diverse DCT vendors and technology solutions. Their early involvement in protocol design can provide thoughtful input, ensuring optimal data collection methods and vendor selection from the outset.
- Desire for No-Code Listing Creation: A significant pain point is the need for programming complex listings. The industry desires systems where data managers can create listings by clicking and dragging or using simple, function-based code, reducing the reliance on specialized programming.
- Advocacy for Industry-Wide Data Standardization: The lack of universal data format standardization (beyond CDASH and SDTM) across the industry is a major inefficiency. There is a strong call for a common, standardized format to enhance efficiency and simplify data journeys.
- Vision for a True Data Lake Strategy: The ideal future state includes a system capable of capturing all different data formats without any integration effort, implying an intelligent data lake that can ingest and interpret diverse data automatically.
- Elimination of Manual Data Review: The ultimate goal for data cleaning is to remove all forms of manual review, with cleaning processes being entirely electronic or handled by AI/machine learning to drive increased efficiency and quality.
- Potential Evolution of the CRA Role: While CRAs will always be needed, some of their responsibilities, particularly those related to remote monitoring and source data verification, could shift as data availability and central monitoring capabilities grow.
- Interactive Patient Profiles: A desired capability is an interactive patient profile that allows for a holistic, manual "sanity check" of patient data, capturing cumulative review insights to inform and predefine additional edit checks and rules.
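The "simple, function-based" listing creation mentioned in the takeaways could look something like the following Python sketch. This is an assumption about what such an interface might resemble, not the syntax of Veeva CDB or any other product: the `listing` function, its parameters, and the vitals records are all hypothetical.

```python
def listing(rows, columns, where=lambda row: True):
    """Return the selected columns for every row that satisfies the condition."""
    return [{c: row.get(c) for c in columns} for row in rows if where(row)]


# Invented vitals records for illustration only.
vitals = [
    {"subject_id": "1001", "visit": "V1", "systolic": 128, "diastolic": 82},
    {"subject_id": "1002", "visit": "V2", "systolic": 192, "diastolic": 101},
    {"subject_id": "1003", "visit": "V1", "systolic": 117, "diastolic": 76},
]

# A data manager defines a listing with one call instead of a custom program.
high_bp = listing(
    vitals,
    columns=["subject_id", "visit", "systolic"],
    where=lambda row: row["systolic"] > 180,
)
print(high_bp)
```

The design point is that the listing is declared by naming columns and a condition. The same definition could be produced by a click-and-drag interface, and a condition that proves useful during review could later be promoted to a predefined edit check.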
Tools/Resources Mentioned:
- Veeva CDB: Veeva’s clinical data platform for complete and concurrent data, designed to collate data from multiple sources and facilitate cleaning.
- EDC (Electronic Data Capture): Standard system for capturing clinical trial data.
- IVRS (Interactive Voice Response System): Used for randomization and drug supply management.
Key Concepts:
- Clinical Data Management (CDM): The process of collecting, managing, and ensuring the quality of data for clinical trials.
- Decentralized Clinical Trials (DCTs): Clinical trials where some or all trial-related activities occur at participants' homes or local sites, often leveraging digital technologies and diverse data sources.
- FSP (Functional Service Provider): A model where a pharmaceutical company outsources specific functions (like data management) to a CRO.
- Data Scientist District: An internal initiative by Syneos Health to train and evolve their data managers into data scientists, equipping them with advanced analytical and data handling skills.
- Data Lake Strategy: An approach to data storage that involves storing large amounts of raw data in its native format until it's needed, with the ability to capture diverse data types without prior integration definitions.
- CDASH (Clinical Data Acquisition Standards Harmonization) & SDTM (Study Data Tabulation Model): Standards developed by CDISC (Clinical Data Interchange Standards Consortium) for collecting and submitting clinical trial data.
Examples/Case Studies:
- Syneos Health's Machine Learning Implementation: Syneos Health eliminated more than 3,000 hours of manual data review on a single large clinical trial by applying machine learning to data cleaning.