Superset Meetup: How Veeva Nitro is using Superset to power LifeSciences Analytics
Preset
Published: November 11, 2021
Insights
This video explores how Veeva Systems, a leading cloud software provider for the life sciences industry, uses Apache Superset to power its Veeva Nitro analytics platform. Nitesh Baranwal, Director of Product for Veeva Nitro, and Chatham Reed, from Veeva's product management team, detail Veeva's vision for an "industry cloud" for life sciences that covers both the development and commercial sides of the pharmaceutical business. The presentation focuses on Nitro, their data science and analytics platform, which is geared specifically towards commercial operations. They explain how Nitro integrates proprietary Veeva data (CRM, CLM, Engage) with industry data (sales, claims, formulary) through data pipelines, making it available for analytics and data science.
The speakers elaborate on the three key personas targeted by Veeva Nitro: system administrators who execute data pipelines, data analysts and scientists who build transformations and perform advanced analytics, and business users who consume reports and dashboards. A significant portion of the discussion centers on "Nitro Explorer," Veeva's bundled and packaged version of Apache Superset, which serves as their modern BI tool for HQ reporting, aiming to replace traditional tools like Tableau, Qlik, and Power BI. They highlight the strategic decision to adopt Superset, emphasizing its open-source nature, cloud-native compatibility, significant cost and time-to-market savings, and the strength of its community.
Veeva has made several enhancements to the base Superset product to create Nitro Explorer, including direct connectivity to their data warehouse, S3-based workspaces for blending local files, integration with Nitro's existing authentication and authorization, and a unique multi-instance deployment model to support different customer groups or regions within a multi-tenant architecture. They also discuss challenges encountered, such as managing user expectations around SQL proficiency, handling large data volumes, and dashboard personalization. The presentation concludes with a showcase of various internal and external dashboards built using Nitro Explorer, demonstrating its versatility for use cases ranging from business benchmarking and digital event tracking to ETL job monitoring, sales performance, distribution analysis, medical insights, and digital engagement.
Key Takeaways:
- Veeva's "Industry Cloud" Vision: Veeva aims to provide an end-to-end cloud solution for the life sciences industry, supporting both the drug development (clinical operations, research) and commercialization (sales, marketing, safety) phases with software, data, and consulting services.
- Veeva Nitro's Role in Commercial Operations: Nitro is positioned as an essential data science and analytics platform, integrating diverse data sources (Veeva products, industry data like sales and claims) through intelligent pipelines to power commercial analytics and strategic partnerships.
- Target Personas for Nitro: The platform caters to system administrators for pipeline execution, data analysts/scientists for custom transformations and advanced modeling (forecasting, clustering), and business users for reporting and dashboarding, providing a comprehensive solution.
- Benefits of Veeva Nitro: It offers accelerated time to value (weeks instead of months for data pipeline setup), delivers actionable insights through both field-facing (Veeva CRM MyInsights) and HQ-level reporting (Nitro Explorer), and is built as a flexible, cloud-native system on AWS leveraging a big data stack.
- Strategic Adoption of Apache Superset: Veeva chose Superset (branded as Nitro Explorer) as its primary HQ BI tool due to its open-source nature, self-management capabilities, cloud-native integration, significant time-to-market advantage (saving months of development), and an active, supportive community.
- Commitment to Open Source: Nitro Explorer represents Veeva's first user-facing open-source project, signifying a deep commitment to Superset, as it's not easily replaceable once integrated into customer workflows. Veeva plans to stay current with Superset versions and contribute back to the community.
- Multi-Tenant Deployment of Superset: Veeva uniquely deploys Superset in a pseudo multi-tenant fashion using Kubernetes, allowing them to manage a single code line across all customers, which is crucial for their "product as a service" model in life sciences.
- Veeva's Enhancements to Superset (Nitro Explorer): Key additions include direct connectivity to Veeva's data warehouse, S3-based "workspaces" for users to blend local files with enterprise data, integration with Nitro's SSO/authentication, and the ability to deploy multiple Explorer instances for regional or brand segregation.
- "Explorer for Explorer" for Usage Analytics: Veeva built a usage analytics dashboard within Nitro Explorer itself by tapping into Superset's internal logs. It tracks top actions, most active users, and popular dashboards, which helps shape future analytics roadmaps.
- Challenges and Best Practices: Users coming from traditional BI tools require a mindset shift towards SQL proficiency for effective data set creation in Superset. Best practices for large datasets involve optimizing SQL at the data set level (filtering, projections, aggregations) to minimize data transferred to the dashboard.
- Internal Use Cases at Veeva: Nitro Explorer is used internally by Veeva's Business Consulting team for benchmarking (migrating from Tableau), by the Digital Events team for rapid report creation (migrating from legacy .NET/SSRS), and by service teams for ETL job monitoring.
- External Customer Use Cases: Pharma customers utilize Nitro Explorer for diverse analytics, including sales performance, distribution center analysis (using arc maps), specialty sales tracking, medical insights (e.g., word clouds for trending topics), and digital engagement measurement.
- Rapid Dashboard Development: The agility of Nitro Explorer allows teams to quickly build and deploy dashboards once data is onboarded, reducing reliance on traditional BI development cycles and empowering business users and analysts.
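The large-dataset best practice in the takeaways above (filter, project, and aggregate in the dataset's SQL so that less data reaches the dashboard) can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module rather than Nitro's data warehouse, and the `sales` table, its columns, and the sample rows are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical sales table; the table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(region TEXT, product TEXT, sale_date TEXT, units INTEGER, notes TEXT)"
)
rows = [
    ("East", "A", "2021-01-05", 10, "n/a"),
    ("East", "A", "2021-01-20", 5, "n/a"),
    ("East", "B", "2021-02-11", 7, "n/a"),
    ("West", "A", "2021-02-15", 3, "n/a"),
    ("West", "A", "2020-12-30", 9, "n/a"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# Unoptimized dataset: every row and every column is returned.
raw = conn.execute("SELECT * FROM sales").fetchall()

# Optimized dataset definition:
#   filtering   -> WHERE keeps only 2021 rows
#   projection  -> only the columns the chart needs
#   aggregation -> one row per (region, product)
dataset_sql = """
    SELECT region, product, SUM(units) AS total_units
    FROM sales
    WHERE sale_date >= '2021-01-01'
    GROUP BY region, product
    ORDER BY region, product
"""
optimized = conn.execute(dataset_sql).fetchall()

print(len(raw), "raw rows vs", len(optimized), "aggregated rows")
print(optimized)
```

In Superset, a query like `dataset_sql` would typically be saved as a virtual dataset, so charts built on it only ever see the reduced result.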
Tools/Resources Mentioned:
- Veeva Products: Veeva CRM, Veeva Nitro, Veeva CLM, Veeva Engage, MyInsights
- BI/Analytics Tools: Apache Superset (Nitro Explorer), Tableau, Qlik, Power BI, MicroStrategy, ThoughtSpot
- Cloud/Big Data Technologies: Amazon Web Services (AWS), Amazon S3, Amazon EMR, Hadoop, Spark, Kafka, Kubernetes, Redshift (used as an MPP data warehouse)
- Legacy Technologies: .NET, SSAS (SQL Server Analysis Services), SSRS (SQL Server Reporting Services)
- Charting Libraries: Apache ECharts (mentioned as the library underlying Superset charts)
Key Concepts:
- Industry Cloud: A specialized cloud platform tailored for a specific industry, offering integrated software, data, and services.
- Public Benefit Corporation (PBC): A type of for-profit corporate entity that includes positive impact on society, workers, the community, and the environment in addition to profit as its legally defined goals.
- Multi-tenant Architecture: A single instance of a software application serves multiple customers (tenants), where each tenant's data is isolated and remains invisible to other tenants.
- Time Machine: A concept within Veeva Nitro for aggregating and rolling up metrics by different cycles (e.g., quarterly, monthly) based on client needs.
- Field Reporting: Analytics and reports designed for sales representatives and field-based teams, often accessed on mobile devices.
- HQ Reporting: Analytics and dashboards for headquarters staff, typically used for strategic analysis and decision-making.
- Data Pipelines: Automated workflows for moving and transforming data from various sources to a target destination for analysis.
- ETL (Extract, Transform, Load): A data integration process that extracts data from sources, transforms it to fit business needs, and loads it into a data warehouse or other system.
- SSO (Single Sign-On): An authentication scheme that allows a user to log in with a single ID and password to gain access to multiple connected systems without being prompted for credentials again.
- AuthN/AuthZ (Authentication/Authorization): Authentication verifies user identity; Authorization determines what an authenticated user is allowed to do.
- Data API: An interface that allows programmatic access to data, enabling other applications or services to retrieve or manipulate it.
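The "Time Machine" concept defined above, rolling metrics up by monthly or quarterly cycles, can be sketched generically. The video does not describe how Nitro implements it, so this is only an illustration of the idea; the function names `cycle_key` and `roll_up` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def cycle_key(d, cycle):
    """Map a date to the label of the reporting cycle it falls in."""
    if cycle == "monthly":
        return f"{d.year}-{d.month:02d}"
    if cycle == "quarterly":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    raise ValueError(f"unsupported cycle: {cycle}")


def roll_up(records, cycle):
    """Sum (date, value) records per cycle, returned in label order."""
    totals = defaultdict(int)
    for d, value in records:
        totals[cycle_key(d, cycle)] += value
    return dict(sorted(totals.items()))


# Hypothetical metric values keyed by activity date.
records = [
    (date(2021, 1, 15), 100),
    (date(2021, 2, 3), 50),
    (date(2021, 4, 20), 70),
    (date(2021, 6, 30), 30),
]

monthly = roll_up(records, "monthly")
quarterly = roll_up(records, "quarterly")
print(monthly)
print(quarterly)
```

The same records yield different aggregates depending on the cycle a client asks for, which is the behavior the definition describes.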
Examples/Case Studies:
- Pfizer and Moderna: Mentioned as customers that Veeva helped through their vaccine development during the pandemic, highlighting the essential nature of the life sciences industry.
- Business Consulting Team Hackathon: A 2-day workshop in which Veeva's business consultants converted 30-40% of their existing Tableau dashboards to Nitro Explorer, demonstrating the tool's rapid development capabilities. The team is now close to 100% conversion.
- Digital Events Team Migration: An internal Veeva team migrating their reporting from a legacy .NET application using SSRS to Nitro Explorer, significantly reducing the time needed to create new reports.
- ETL Job Monitoring Dashboards: Internal operational dashboards built by Veeva's service teams using Nitro Explorer to track daily/weekly job metrics, durations, and errors across client instances and connectors.
- Sales Performance Dashboards: Examples of traditional sales performance dashboards, distribution analysis (using arc maps to show product spread), and specialty sales tracking, all replicated and enhanced in Nitro Explorer.
- Medical Insights Dashboard: An example featuring a word cloud to visualize trending medical topics, top medical insights, and inquiries, helping reps prepare for doctor meetings.
- Digital Engagement Tracking: A dashboard showing the correlation between increased digital engagement (e.g., virtual doctor contacts) and sales activity, particularly relevant during the pandemic.