AstraZeneca: CTMS Data Quality and Migration Approach
Veeva Systems Inc
Published: February 22, 2021
Insights
This video analyzes AstraZeneca’s strategy for ensuring data quality and managing a complex migration during a Clinical Trial Management System (CTMS) implementation, specifically the move away from a two-decade-old legacy system. The discussion centers on overcoming the "Fear Factor" associated with migrating vast amounts of historical data while ensuring its quality meets the standards required by a modern platform. The core challenges addressed are defining the scope of migration, executing extensive data cleaning, and choosing a deployment strategy that minimizes disruption to end-users.
AstraZeneca's initial step involved rigorously defining the scope of the migration. Given the 20-year history of the legacy CTMS, the team had to decide which data sets were essential to transfer versus which could remain in an accessible archive. The criteria established for migration included any studies with an existing record in the Trial Master File (TMF), all currently ongoing studies, and historical studies dating back to 2009. This scoping exercise resulted in a substantial workload of approximately 1,300 studies requiring migration. The speaker emphasized that underestimating the complexity of data quality and cleaning activities is a critical mistake in such large-scale projects.
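The three scoping criteria can be read as a simple "any of" rule. The sketch below is only an illustration of that rule, not AstraZeneca's actual logic: the `Study` fields, the sample records, and the use of a start year to interpret "dating back to 2009" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Cutoff taken from the "back to 2009" criterion; applying it to the
# study start year is an assumption, the source does not say which date is used.
HISTORICAL_CUTOFF_YEAR = 2009


@dataclass
class Study:
    study_id: str          # hypothetical identifier
    start_year: int        # assumed proxy for the age of the study
    is_ongoing: bool
    has_tmf_record: bool


def in_migration_scope(study: Study) -> bool:
    """Return True if the study meets at least one scoping criterion."""
    return (
        study.has_tmf_record
        or study.is_ongoing
        or study.start_year >= HISTORICAL_CUTOFF_YEAR
    )


def split_scope(studies):
    """Partition studies into (migrate, archive) lists."""
    migrate = [s for s in studies if in_migration_scope(s)]
    archive = [s for s in studies if not in_migration_scope(s)]
    return migrate, archive


# Made-up sample records, for illustration only.
studies = [
    Study("S-001", 2003, is_ongoing=False, has_tmf_record=False),
    Study("S-002", 2005, is_ongoing=False, has_tmf_record=True),
    Study("S-003", 2012, is_ongoing=False, has_tmf_record=False),
    Study("S-004", 2019, is_ongoing=True, has_tmf_record=True),
]
migrate, archive = split_scope(studies)
print("migrate:", [s.study_id for s in migrate])
print("archive:", [s.study_id for s in archive])
```

Writing the rule down this way makes the scope auditable: every study left in the archive is there because it failed all three criteria, which is easy to review with the business before any cleaning starts.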
The migration process required extensive engagement with the vendor (Veeva) and their partners to conduct multiple dry runs. A major learning point highlighted was the necessity of cleaning not only the operational data (study metrics, site details) but also the underlying reference data, such as the Global Directory. The legacy system lacked data mastering capabilities, necessitating a massive upfront cleaning exercise to standardize and harmonize data elements before transfer. This cleaning effort was crucial because data quality issues, particularly in older CTMS platforms lacking automated workflows and roll-up calculations, can often be hidden until the migration process exposes them.
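To make the reference-data cleaning step more concrete, here is a minimal sketch of one common harmonization technique: building a normalized comparison key so that spelling variants of the same directory entry can be grouped for review. The video does not describe the actual cleaning rules used, so the entry fields (`id`, `name`, `country`) and the sample directory are hypothetical, and a real mastering exercise would likely add human review and fuzzier matching.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(raw: str) -> str:
    """Build a comparison key: drop accents, case, punctuation, extra spaces."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_duplicate_groups(entries):
    """Group entry ids whose normalized name and country code match."""
    groups = defaultdict(list)
    for entry in entries:
        key = (normalize_name(entry["name"]), entry["country"].strip().upper())
        groups[key].append(entry["id"])
    # Only keys shared by more than one entry are candidate duplicates.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up directory entries, for illustration only.
directory = [
    {"id": "D1", "name": "Royal Infirmary, Leeds", "country": "gb"},
    {"id": "D2", "name": "royal infirmary  leeds", "country": "GB "},
    {"id": "D3", "name": "Hôpital Saint-Louis", "country": "FR"},
    {"id": "D4", "name": "Hopital Saint Louis", "country": "fr"},
    {"id": "D5", "name": "Charité Berlin", "country": "DE"},
]
duplicates = find_duplicate_groups(directory)
for key, ids in duplicates.items():
    print(key, ids)
```

The sketch only proposes candidate groups; deciding which record becomes the master entry is a data stewardship decision that code like this cannot make on its own.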
Crucially, AstraZeneca chose an "all-in" migration approach over a phased implementation. Initially, they considered phasing the rollout—starting with new studies, then moving newer historical studies, and finally the oldest ones. However, this phased approach was rejected because it would have resulted in end-users operating under different ways of working simultaneously, creating confusion and inefficiency. By opting for the "all-in" strategy, the company committed to cleaning all 1,300 studies upfront, ensuring that when the new system went live, all users would immediately transition to standardized, clean data and unified workflows, thereby alleviating downstream issues on the new platform.
Key Takeaways:
- Data Migration is the Primary Barrier: The biggest initial fear factor and operational barrier in a new CTMS implementation is the process of migrating data and ensuring its quality meets the standards of the new system.
- Rigorous Scope Definition is Essential: Organizations must clearly define the criteria for data transfer. AstraZeneca decided to migrate only ongoing studies, studies linked to their TMF, and historical studies back to 2009, resulting in a scope of roughly 1,300 studies.
- Do Not Underestimate Data Cleaning: Data quality and cleaning activities require significant resources. This effort must cover both operational data (e.g., study status, site information) and underlying reference data (e.g., investigator names, global directories).
- Mastering Reference Data is Critical: Legacy systems often lack proper data mastering. A major cleaning exercise is required to standardize and harmonize reference data, such as global directories, which are key components of CTMS functionality.
- Hidden Data Quality Issues: In older CTMS platforms that lack automated workflows and auto-calculation features, data quality deficiencies can be masked. Migration dry runs are essential for exposing these hidden issues before the final go-live.
- Phased vs. All-In Implementation: While a phased approach (starting with new studies, then moving older ones) might seem less risky, it creates disparate ways of working for end-users. AstraZeneca chose the "all-in" approach to ensure immediate standardization and unified workflows across the organization.
- Upfront Cleaning Alleviates Future Issues: The decision to clean all 1,300 studies upfront, though resource-intensive, was deemed necessary to ensure the data was as clean as possible before hitting the new platform, thereby reducing post-implementation data remediation efforts.
- Leverage Vendor Partnerships: Engaging the system vendor (Veeva) and their implementation partners to conduct multiple dry runs is a best practice for testing the migration process and identifying technical and data-related issues early.
- Resource Allocation for Migration: Data migration is not a purely technical task; it requires a large, dedicated team to support the cleaning, validation, and execution activities across the defined scope.
Tools/Resources Mentioned:
- Veeva (Implied CTMS platform)
- Legacy CTMS (20 years old)
- TMF (Trial Master File)
Key Concepts:
- CTMS (Clinical Trial Management System): Enterprise software used in the life sciences industry to manage, plan, track, and report on clinical trials.
- Data Migration: The process of moving data from one system (the legacy CTMS) to another (the new CTMS), often involving transformation and cleaning.
- Reference Data: Static data sets used consistently across the system (e.g., site names, country codes, investigator lists, Global Directory). Ensuring the quality and mastering of this data is crucial for system integrity.
- Dry Runs: Practice runs of the migration process used to test data transformation scripts, identify errors, and calculate the time and resources required for the final migration.
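A dry run as defined above can be sketched as a harness that applies validation checks to extracted records without loading anything, then reports the issues found and the time taken. This is an assumption-laden illustration rather than the process used in the video: the record fields, the two checks, and the sample data are hypothetical. The roll-up check shows how a total that was typed in by hand in a legacy system can disagree with the site-level figures once something actually recomputes it.

```python
import time


def check_required_fields(record):
    """Flag empty values in fields assumed mandatory in the target system."""
    required = ("study_id", "status", "planned_enrollment")
    return [f"missing {f}" for f in required if record.get(f) in (None, "")]


def check_enrollment_rollup(record):
    """Compare the stored study total with the sum of site-level counts."""
    site_total = sum(record.get("site_enrollment", []))
    stored = record.get("actual_enrollment")
    if stored is not None and stored != site_total:
        return [f"enrollment mismatch: stored {stored}, sites sum to {site_total}"]
    return []


def dry_run(records, checks):
    """Run every check on every record; nothing is written to a target system."""
    started = time.perf_counter()
    issues = {}
    for record in records:
        found = [msg for check in checks for msg in check(record)]
        if found:
            issues[record.get("study_id") or "<unknown>"] = found
    return {
        "records": len(records),
        "failed": len(issues),
        "issues": issues,
        "seconds": time.perf_counter() - started,
    }


# Made-up extract from a legacy system, for illustration only.
sample = [
    {"study_id": "S-002", "status": "Closed", "planned_enrollment": 120,
     "actual_enrollment": 118, "site_enrollment": [60, 58]},
    {"study_id": "S-003", "status": "", "planned_enrollment": 40,
     "actual_enrollment": 45, "site_enrollment": [20, 21]},
]
report = dry_run(sample, [check_required_fields, check_enrollment_rollup])
print(report["failed"], "of", report["records"], "records have issues")
for study_id, messages in report["issues"].items():
    print(study_id, messages)
```

Repeating the same harness across successive dry runs gives a simple trend line: the failed count should fall as cleaning progresses, and the elapsed time helps estimate the cutover window.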
Examples/Case Studies:
- AstraZeneca Case Study: The entire transcript serves as a case study detailing AstraZeneca’s migration from a 20-year-old CTMS, involving approximately 1,300 studies, and their strategic decision to adopt an "all-in" implementation approach after extensive upfront data cleaning.