Real-World Evidence: A Guide to RWE Analysis & Application

Executive Summary
Real-World Evidence (RWE) is clinical evidence regarding the use, benefits, or risks of medical products derived from Real-World Data (RWD) – data collected outside of controlled clinical trials, such as electronic health records (EHRs), insurance claims, disease registries, patient-generated data (wearables and apps), and other sources ([1]) ([2]). In recent years, RWE has emerged as a vital complement to traditional randomized controlled trials (RCTs). RCTs, while the gold standard for establishing efficacy under ideal conditions, often exclude key patient groups (e.g. the elderly, pregnant women, those with comorbidities) and may not reflect routine clinical practice ([3]) ([4]). By contrast, RWE offers insights into how interventions perform in broader, more diverse “real-world” populations, improving external validity and filling evidence gaps for rare or underserved subgroups ([5]) ([6]).
Stakeholders across healthcare – patients, providers, payers, industry, and regulators – are increasingly embracing RWE. Regulatory bodies have initiated policies and guidance to formalize RWE usage. In the USA, the 21st Century Cures Act (2016) mandated the FDA to evaluate RWE for drug approvals and post-market studies ([7]), leading to an FDA RWE Framework (2018) with recommendations on submitting RWD in applications ([8]) ([9]). The European Medicines Agency (EMA) and national agencies similarly recognize RWE’s role; for example, EMA’s 2025 strategy emphasizes integrating RWE into decision-making, and EMA’s DARWIN‐EU network now links data from ~180 million European patients to support regulatory studies ([10]) ([11]). In parallel, industry has built sophisticated RWE capabilities – from global analytics centers to local “glocal” evidence teams – to harness data for drug development, safety monitoring, and health economic assessments ([12]) ([13]).
Several landmark cases illustrate RWE’s impact. Notably, the FDA’s 2017 accelerated approval of avelumab (an oncology drug) for Merkel cell carcinoma relied on an external historical control derived from EHR data ([14]). Likewise, in 2019 the FDA expanded the indication of palbociclib (Ibrance) from women to include men with metastatic breast cancer based largely on retrospective RWD analyses (claims, EHR, safety databases) demonstrating treatment benefit ([4]) ([15]). These examples, among others catalogued in recent literature ([14]), underscore that RWE can support both safety evaluations and, increasingly, efficacy conclusions in settings not addressed by traditional trials.
However, generating trustworthy RWE entails significant challenges. RWD are often fragmented, unstructured, or incomplete: missing data, coding errors, and lack of randomization introduce bias and confounding ([16]) ([6]). Ensuring methodological rigor is critical – designs must use active comparators, new-user cohorts, and pre-specified causal inference frameworks; and analyses must apply advanced techniques (propensity scores, weighting, sensitivity analyses) to mitigate bias ([16]) ([6]). Data privacy and governance further complicate matters, especially as RWE ventures into patient-level digital footprints. To maximize RWE’s promise, the field is adopting novel analytics and interoperability standards: machine learning and natural language processing (NLP) are being used to extract information from free-text records ([17]) ([18]), and distributed network models (e.g. FDA Sentinel, OHDSI, EMA’s Common Data Model via EHDEN) enable multi-center data linkages while preserving privacy ([19]) ([11]).
This report provides an in-depth, evidence-based overview of the state of RWE analysis. We begin by defining RWD/RWE and historical context. Next, we catalog the major RWD sources and study designs, and compare RWE with RCT evidence (see Table 1). We then examine analytical methods and best practices from pharmacoepidemiology. A section is dedicated to regulatory and industry frameworks: guidelines, pilot programs, and the regulatory acceptance rate (currently modest but growing) of RWE in submissions ([9]) ([8]). Key applications are surveyed – from drug approval case studies to health technology assessment and safety monitoring. We pay special attention to technology enablers such as AI/NLP tools ([20]) and data quality issues (bias, missingness). Finally, we discuss future directions – including expanded digital health data (wearables, sensor-collected data), synthetic control arms using statistical and artificial data ([21]) ([22]), and evolving global collaborations (e.g. HMA-EMA data catalogues ([23])) that will shape RWE’s trajectory. Every claim is supported by current literature to ensure this report is as comprehensive and authoritative as possible.
Introduction and Background
Real-World Evidence (RWE) fundamentally redefines the evidence generation paradigm in healthcare. Though observational “practice-based” evidence has existed for decades, the formal recognition of RWE emerged only recently. Sengwee Toh notes that “long before” the terms RWD and RWE were coined, researchers used routine healthcare data to study drug utilization and outcomes ([24]). The rise of modern RWE dates largely from policy shifts and technological advances over the past 10–15 years. Advances in health IT (widespread EHRs, high-performance computing) have made massive volumes of RWD accessible, enabling analyses at scales inconceivable in the past ([24]) ([25]). Concurrently, regulators began to formally explore how RWE could answer unmet evidentiary needs. A landmark policy was the U.S. 21st Century Cures Act of 2016, which directed the FDA to evaluate RWE for new drug indications and post‐market requirements ([7]). This initiated a cascade of workshops, frameworks, and guidances (FDA’s RWE Framework 2018; draft guidances on EHR/claims data in 2024) that elevated RWE to a core part of regulatory science ([8]) ([9]). Other jurisdictions have followed suit: the EMA’s 2025 vision explicitly calls for better integration of RWD into decision-making, and agencies in Japan, China, and Canada are formulating their own RWE strategies.
Conceptually, RWE contrasts with RCT evidence in several key ways. RCTs are designed to minimize bias by randomizing and strictly controlling intervention protocols, but this often produces results in “idealized” populations under narrow conditions ([3]). As a primer notes, RCTs’ strict criteria yield high internal validity but limited generalizability to the “messy” real world ([3]) ([2]). RWE, by contrast, reflects patient outcomes under routine care, with heterogeneous patient mix, co‐medications, and clinician choices. While RCTs measure efficacy (can a drug work under perfect conditions), RWE tends to measure effectiveness (does it work in practice) ([26]). Neither source is inherently “better”; rather, they are complementary. Indeed, experts emphasize that RWE “complements” RCT findings by filling gaps in knowledge (e.g. rare adverse events, long-term outcomes, or use in excluded subgroups) ([27]) ([7]). Table 1 summarizes how RCT evidence and real-world evidence differ in purpose, design, and context.
Table 1. Key differences between evidence from randomized controlled trials (RCTs) and real-world data (RWD) studies ([28]).
| Aspect | RCT Evidence | Real-World Evidence |
|---|---|---|
| Purpose | Demonstrate efficacy under ideal, controlled settings ([28]) | Demonstrate effectiveness in routine care |
| Population/Criteria | Narrow inclusion/exclusion criteria; homogeneous subjects ([28]) | Broad, no strict criteria; reflects typical patients |
| Setting | Experimental (research) setting ([28]) | Actual practice (hospitals, clinics, communities) |
| Treatment protocol | Prespecified, fixed intervention schedules ([28]) | Variable treatment (dose, adherence) based on physician/patient choices |
| Comparators | Placebo or standard-of-care per protocol ([28]) | Usual care, or alternative therapies as chosen in practice |
| Patient monitoring | Rigorous, scheduled follow-up ([28]) | Variable follow-up at clinician discretion |
| Data collection | Structured case report forms | Routine clinical records, coded data |
| Sample size & diversity | Often modest, selected cohorts ([28]) | Can be very large, diverse populations |
| Timeline & cost | Slow recruitment, expensive per patient | Rapid accrual (historical data), generally cheaper |
Note: RCTs emphasize causal inference by design, but their results may not generalize to all patient groups. RWE studies sacrifice some control (and incur biases) in order to examine outcomes in real practice. Together, RCT and RWE evidence provide a fuller picture of a therapy’s clinical value ([4]) ([6]).
The scope of RWD used to generate RWE is broad and rapidly expanding. Traditional RWD sources include insurance claims/billing data and administrative health records, which capture billed diagnoses, procedures, and pharmacy dispensings on large populations ([29]) ([6]). Dedicated patient registries (e.g. cancer registries or disease-specific cohorts) and electronic health records (EHRs) from hospitals and practices are also core sources; these contain detailed clinical data such as lab results, physician notes, and vital signs ([30]) ([31]). More recently, patient-generated health data have become notable: surveys, mobile apps, wearable sensors, and even social media can provide information on symptoms, behavior, and outcomes not captured in medical charts ([32]) ([33]). The flow of RWD is now so prolific that large-scale initiatives (e.g. data from connected devices or consumer health platforms) are constantly being evaluated for RWE use ([34]) ([35]). In short, any routinely captured health information – from lab reports to fitness trackers – can potentially contribute to RWE, subject to validity and privacy considerations.
Sources of Real-World Data
Real-world data (RWD) come from many sources. Key categories include:
- Healthcare Claims and Administrative Data: These are billing and insurance claims datasets (e.g. Medicare/Medicaid data in the U.S., or the UK’s Hospital Episode Statistics) ([36]) ([37]). Claims data cover large populations over long periods and are well-suited for tracking healthcare utilization, medication fills, and coded diagnoses. However, they lack clinical nuances (e.g. lab values, over-the-counter meds) and may have delayed availability.
- Electronic Health Records (EHRs) / Medical Databases: These are the digital records from clinics and hospitals ([30]) ([31]). EHRs contain comprehensive clinical details: diagnoses, procedures, lab results, vital signs, and often unstructured notes. They enable rich characterization of patient phenotypes and disease history, but pose challenges in data quality (missing entries, non-standardized wording, disparate systems) and linkage across providers.
- Disease and Product Registries: Registries track patients with a specific condition or therapy (for example, cancer registries, implant registries, genetic disease registries). They provide longitudinal insights on disease natural history and long-term treatment outcomes. Because they tend to focus on a condition of interest, registries often have very detailed data and may include patient-reported outcomes. The tradeoff is that they may have limited generalizability beyond the registry population (often tertiary care centers, academic networks) ([38]) ([39]).
- Patient-Generated Data (Wearables, Apps, Surveys): Consumer devices and mobile applications generate streams of RWD on health behaviors and metrics. For instance, smartwatches can capture heart rate or sleep patterns; apps may collect patient-reported symptoms and quality-of-life surveys ([40]) ([34]). These sources promise ultra-frequent, real-time monitoring, but often lack validation and may be subject to selection bias (tech-savvy users).
- Public Health Data: Systems like immunization registries, disease surveillance networks (flu, COVID-19), and pharmacy dispensing databases provide population-level data relevant for RWE. During the COVID-19 pandemic, such sources were leveraged extensively to study vaccine safety and effectiveness ([41]).
- Other Sources: Chart reviews (manual abstraction of paper/electronic charts), clinical trial extensions, and anonymous data (e.g. Google Trends, mobility data) are also used in special cases. Table 2 lists common RWD sources and examples.
Table 2. Examples of real-world data sources and their characteristics ([42]) ([19]).
| Data Source | Description / Examples |
|---|---|
| EHR / Medical Records | Computerized patient records from providers (hospitals, clinics). EHRs include structured data (diagnoses, meds, labs) and often unstructured notes. Example: Clinical Practice Research Datalink (CPRD, UK) ([43]). Strength: rich clinical detail. Limitation: missing data if care outside system; requires cleaning/NLP. |
| Administrative Claims | Billing data from insurers or government health programs (e.g. Medicare, Medicaid, UK NHS billing records) ([36]) ([37]). Contains dates of service, diagnoses (ICD codes), procedures, pharmacy fills. Examples: US CMS claims; Hospital Episode Statistics (HES, UK) ([44]). Strength: large populations, coded data for utilization. Limitation: no lab values or clinical metrics; changes in coding over time. |
| Registries | Disease-specific or product registries continuously collect data on defined patient cohorts (e.g. cancer registries, transplant registries). Examples: UK Cystic Fibrosis Registry, Systemic Anti-Cancer Therapy (SACT) dataset ([39]). Strength: focused, long-term follow-up of targeted populations, high data quality. Limitation: may not capture broader population; potential referral bias. |
| Patient-generated | Data reported directly by patients via apps, surveys, mobile devices, wearables, home sensors ([40]) ([34]). Examples: Self-reported outcomes on disease registries; data from health apps (e.g. diabetes glucometer logs); wearable heart rate monitors ([35]). Strength: captures patient perspective, real-time events (e.g. symptoms, physical activity). Limitation: self-report bias, variable data validation, lower representativeness. |
| Chart Reviews / Audits | Retrospective abstraction from paper/electronic charts (especially used in rare diseases) ([45]). Examples: Investigator-led chart abstraction to establish a natural history. Strength: can obtain variables not normally coded. Limitation: very time-consuming; not scalable. |
| Surveillance / Public Health | Aggregate or individual-level data from public health (e.g. registries of reportable disease cases, vaccine adverse event reporting). Examples: National influenza surveillance, VAERS vaccine reports. Strength: population-level trends, outbreak data. Limitation: often limited clinical detail and representativeness. |
Each RWD source comes with tradeoffs. Often the optimal strategy is to combine multiple sources. For example, a study might link EHR data with claims to capture both clinical depth and full longitudinal utilization ([6]), or augment registry data with patient-reported surveys for outcomes not otherwise recorded.
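As a minimal illustration of such linkage, assuming both extracts share a pseudonymized patient ID (all data and field names below are invented for the sketch):

```python
# Hypothetical EHR and claims extracts keyed by a shared pseudonymized patient ID.
ehr = {"p1": {"hba1c": 7.2}, "p2": {"hba1c": 8.9}}
claims = {"p1": {"metformin_fills": 6}, "p3": {"metformin_fills": 2}}

# Deterministic linkage: keep only patients present in both sources, merging
# clinical depth (EHR) with longitudinal utilization (claims).
linked = {pid: {**ehr[pid], **claims[pid]} for pid in ehr.keys() & claims.keys()}
# linked == {"p1": {"hba1c": 7.2, "metformin_fills": 6}}
```

In practice, linkage is often probabilistic (matching on names, dates of birth, etc.), and linkage errors themselves become a source of bias that must be assessed.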
Study Designs and Analytical Methods
Generating reliable RWE requires rigorous epidemiological methods tailored to non-randomized data. Classic RWD study designs include retrospective cohort studies (identifying exposed and comparator patients from existing data and following outcomes), case–control studies (comparing patients with an outcome to controls without, looking for past exposures), and case-crossover or self-controlled designs (patients serve as their own controls at different times) ([46]) ([47]). In addition, pragmatic clinical trials and registry-based trials blur the line between RCTs and RWD by randomizing within routine care settings and leveraging existing data systems.
Analysts must start by clearly framing a causal research question (ideally by emulating a “target trial”) ([48]). Decisions about eligibility, exposure and outcome definitions, and follow-up time should be prespecified. Key design strategies include:
- New-User Active-Comparator Cohorts: To reduce bias, studies should ideally compare patients who newly initiate one treatment to those newly initiating an alternative, rather than mixing prevalent users or using untreated controls ([16]). This mimics the design of a clinical trial and avoids the selection bias of studying long-term survivors.
- Propensity Score Methods: Observational comparisons require adjustment for confounding. Matching, stratification, or weighting based on propensity scores (the probability of treatment given covariates) is commonly used ([16]). These methods can balance measured confounders between treatment groups. High-dimensional and machine-learning variants (e.g. tree-based propensity models) further exploit the richness of RWD covariates ([49]).
- Censoring and Follow-Up Rules: Unlike trials, RWD follow-up is often irregular. Clear rules for start of follow-up (time zero) and censoring (loss-to-follow-up, death, end of data) must be applied to avoid immortal time bias or informative censoring.
- Sensitivity Analyses: To address unmeasured confounding, researchers conduct sensitivity checks (e.g. negative-control outcomes, varied definitions of exposure/outcome, instrumental variable methods). Transparency demands that all assumptions be clearly stated.
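The propensity-score weighting strategy described above can be sketched in miniature. This is a toy illustration on simulated data (the single confounder, the hand-rolled gradient-ascent propensity fit, and all variable names are illustrative assumptions, not a production pharmacoepidemiology pipeline):

```python
import numpy as np

# Toy simulation: one confounder drives both treatment choice and outcome.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)                           # confounder (e.g. disease severity)
t = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))  # sicker patients treated more often
y = 1.0 * t + 2.0 * x + rng.normal(size=n)       # true treatment effect = 1.0

naive = y[t == 1].mean() - y[t == 0].mean()      # confounded estimate (inflated)

# Fit a logistic propensity model P(T=1 | x) by simple gradient ascent.
b0 = b1 = 0.0
for _ in range(200):
    ps = 1 / (1 + np.exp(-(b0 + b1 * x)))
    b0 += 0.1 * np.mean(t - ps)
    b1 += 0.1 * np.mean((t - ps) * x)
ps = 1 / (1 + np.exp(-(b0 + b1 * x)))

# Inverse-probability-of-treatment weighting (IPTW): each arm is reweighted to
# resemble the full population, balancing the measured confounder.
iptw = (np.average(y[t == 1], weights=1 / ps[t == 1])
        - np.average(y[t == 0], weights=1 / (1 - ps[t == 0])))
# iptw lands near the true effect of 1.0, while naive is biased upward.
```

A real study would include many covariates in the propensity model, check covariate balance diagnostics after weighting, and follow with the sensitivity analyses listed above; none of that is shown here.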
Biases peculiar to RWD must be tackled explicitly. Information bias can arise from misclassification of diagnosis or exposure – for example, a new diagnosis code in the billing data might not reflect incident disease. Surveillance bias is possible (patients in one group might have more frequent visits, leading to more recorded events) ([50]). Linkage errors when merging datasets can also distort results. Best practices in pharmacoepidemiology require careful curation of raw data to ensure “fitness for purpose” (validating cohorts, outcome algorithms, etc.) ([6]) ([16]).
Importantly, scope and methodology depend on the question. RWE can describe disease epidemiology (e.g. incidence of side effects across regions), evaluate comparative effectiveness (e.g. drug A vs B in practice), and predict outcomes using machine learning. Increasingly, data-adaptive techniques (machine learning) are deployed to extract signals from high-dimensional RWD. As Toh notes, “data-adaptive techniques (such as machine learning) combined with thoughtful human input are increasingly being used to mine EHR databases and improve analytic methods commonly used in pharmacoepidemiology” ([17]). Natural language processing (NLP) is also used to convert clinician notes into analyzable variables ([18]). Such tools help scale RWE studies by automating chart abstraction; for example, one NLP system extracted lung cancer patient characteristics from 1,209 charts in <1 day with 84–100% accuracy, whereas manual abstraction required hundreds of person-hours ([51]) ([18]).
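As a toy sketch of the kind of structured extraction such NLP systems perform, consider a few regular expressions applied to an invented clinical note (real engines handle negation, synonyms, and context far more robustly than this):

```python
import re

# Invented note text; the regex patterns below are illustrative only.
note = ("Patient with stage IIIB NSCLC. ECOG performance status 1. "
        "Started carboplatin/pemetrexed on 2023-04-12.")

fields = {
    "stage": re.search(r"stage\s+(I{1,3}V?[AB]?)", note, re.I),
    "ecog": re.search(r"ECOG\s+(?:performance status\s+)?(\d)", note),
    "start_date": re.search(r"\b(\d{4}-\d{2}-\d{2})\b", note),
}
extracted = {k: (m.group(1) if m else None) for k, m in fields.items()}
# extracted == {"stage": "IIIB", "ecog": "1", "start_date": "2023-04-12"}
```

Even for such simple rules, validation against manual abstraction is essential, as the accuracy figures cited above suggest: performance varies by field and by how standardized the clinical language is.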
Key Principle: Even with advanced tools, RWE studies must aim for the same clarity as RCT protocols. Publishing a detailed analytic plan (or protocol) before conducting the analysis – including definitions of all variables and outcomes – is increasingly expected to guard against “data dredging”. Registries of RWD studies (such as the EU-PAS Register, now superseded by the HMA-EMA catalogues ([23])) are emerging to promote transparency.
Regulatory and Policy Landscape
Regulatory agencies worldwide have acknowledged RWE’s potential and are actively crafting policies. In the United States, the FDA has taken a leading role. Aside from 21st Century Cures (2016) ([7]) and the 2018 RWE Framework ([8]), the FDA’s device center (CDRH) issued guidance on using RWD in device decisions in 2017, and in 2024 the agency released draft and final guidances on using EHR and claims data for drug approvals ([52]) ([25]). Additionally, the FDA’s real-time surveillance program, Sentinel (launched 2008), continuously monitors drug safety using linked claims and EHR data ([19]). The FDA also runs the Advancing Real-World Evidence Program (inaugural cohort 2022) to fund projects refining RWE methods, and offers opportunities for sponsors to discuss RWD study plans pre-protocol ([9]).
The impact of these policies is measurable. For example, an FDA analysis of its 2019–2021 reviews found that 85% of supplemental drug applications containing RWE were approved ([53]). This suggests that, when well-conducted, RWE is often deemed reliable by regulators. Still, use of RWE for primary approvals remains relatively rare; the 2017 avelumab case ([14]) was the first original approval based significantly on RWE (an external historical control). As of mid-2024, a comprehensive survey found that only ~25% of FDA label expansions (new indications) incorporated RWE ([47]). Hence, while RWE is accepted in safety monitoring and niche expansions, product approvals still largely rely on RCT evidence.
In Europe, EMA has not issued a single unified RWE “guideline” equivalent to FDA’s, but has outlined a vision and launched infrastructure. EMA’s DARWIN-EU network (Data Analysis and Real World Interrogation Network) now gives regulators access to EHR/claims data across Europe; as of 2025 it encompassed 30 institutions/170+ million patients ([54]) ([11]). EMA publishes annual “Regulatory Science to 2025” and RWE framework reports – for instance, a 2025 EMA report noted a 47.5% year-on-year increase in RWD studies conducted (59 studies in 2024 vs prior year) ([10]) ([11]). On the policy side, EMA’s Big Data Taskforce (2017) and 2021 RWE roadmap paved the way for initiatives like RWD catalogues (launched 2024) that list available data sources and studies ([23]) ([55]). The EU has also signalled willingness to consider RWE in approvals (e.g. for gene therapies and rare diseases), and guidelines like ICH E6(R3) acknowledge pragmatic trials and RWD in clinical research.
Other regulators and health technology bodies are similarly active. In the UK, the MHRA encourages real-world trial designs, and NICE (the National Institute for Health and Care Excellence) published a comprehensive RWE framework in 2022 ([31]). NICE’s policy recognizes that RWD can help resolve uncertainties in cost-effectiveness appraisals, but also stresses study design rigor and transparency ([56]) ([57]). Canada’s Health Canada and CADTH have issued guidance on RWE reporting, and there is growing interest in using RWE for Canadian drug reviews. China’s NMPA has a nascent RWE framework (initially in oncology and traditional medicine), reflecting its earlier stage of adoption ([58]). In summary, most health authorities now permit (or even encourage) RWE use in regulatory and reimbursement contexts, provided that studies meet methodological standards.
Applications and Case Studies
The applications of RWE span the full product lifecycle. Notable domains include:
- Drug Development and Label Expansion: As noted, RWE can support new indications or supplemental approvals. A 2025 survey of FDA label expansions (Jan 2022–May 2024) found RWE in about 24–28% of approvals, primarily in oncology ([47]). Many of these involved RWE combined with RCT data. We highlight some key cases:
  - Avelumab for Merkel cell carcinoma (2017): The FDA granted accelerated approval on data from a single-arm trial supplemented by an external historical control arm constructed from EHR data of patients treated outside trials ([14]). This was hailed as the first time RWE directly supported an original approval for efficacy (a checkpoint inhibitor for a rare skin cancer) ([14]).
  - Lutetium-177 dotatate (PRRT) for neuroendocrine tumors (2018): Approval relied in part on data from an expanded-access program, i.e. patients treated on compassionate regimens outside trials ([59]). The RWE helped confirm the therapeutic’s real-world effectiveness alongside trial data.
  - Pembrolizumab (Keytruda) in metastatic cutaneous squamous cell carcinoma (2017): Supplementary approval of this indication drew on RWD from an expanded-access study ([60]), illustrating how manufacturer-supplied compassionate use data contributed to labeling.
  - Blinatumomab (Blincyto) in relapsed/refractory B-cell ALL (2018): An internally conducted retrospective study (from various treatment centers) served as a control for a single-arm trial, enabling regulatory evaluation of efficacy ([61]).
  - Palbociclib (Ibrance) in male breast cancer (2019): As mentioned, retrospective analyses of 1,139 men with metastatic hormone-receptor-positive breast cancer (from claims, EHR, and safety databases) showed substantially longer time on therapy with palbociclib+endocrine therapy vs endocrine therapy alone ([62]). The FDA used this collective RWE to expand the drug’s label to include men ([4]) ([63]). Notably, palbociclib’s male indication was the first FDA expansion based largely on RWE, underscoring a new regulatory path ([4]).
Additional examples are collated in the literature ([14]). Across these cases, RWE often played a supporting (not exclusive) role. The U.S. FDA’s analytics find that RWE was included in many oncology supplemental applications, but whether it was necessary or simply supportive is often unclear ([47]). With time, more such RWE-inclusive approvals are likely, particularly in rare diseases and areas where RCTs are impractical.
- Safety Surveillance (Pharmacovigilance): Post-marketing safety monitoring is a classic RWE domain. For example, networks like FDA Sentinel and EMA’s pharmacovigilance database continuously analyze RWD for adverse event signals. Studies have used RWD to detect rare side effects (e.g. proteinuria with VEGF inhibitors) and to evaluate product safety in pregnancy. RWE also supports regulatory safety updates: for instance, real-world studies of various antihypertensives were used to confirm comparative safety after initial signals.
- Comparative Effectiveness & Outcomes Research: RWE is widely used to compare treatments head-to-head in routine practice. In chronic diseases (e.g. diabetes, cardiovascular disease), the features of RWD (large samples, long follow-up) allow robust comparisons that trials often cannot accommodate. The cardiovascular field, for instance, has seen real-world studies on statins and PCSK9 inhibitors inform guidelines. In oncology, RWD registries provide evidence on long-term effectiveness in broad patient groups, complementing results from narrower trial populations. The German and French governments have sponsored large EHR data studies to evaluate cancer outcomes and practice patterns, demonstrating RWE’s public health value.
- Health Technology Assessment (HTA) and Pricing Decisions: Payers and HTA agencies seek real-world utility data to justify cost-effectiveness. NICE and others may accept RWE as supplementary evidence of value (e.g. real-world HbA1c reductions). However, RWE is less accepted than RCTs for cost-effectiveness in practice, and regulators usually require robust modeling if evidence is indirect. Still, RWE is crucial for relative effectiveness evaluations (e.g. health outcomes in large populations) and for tracking whether real-world benefits match clinical trial efficacy.
- Public Health and Epidemiology: RWD fuels large-scale epidemiological studies. During the COVID-19 pandemic, tens of thousands of RWE studies emerged. Vaccine effectiveness was assessed via national immunization registries and healthcare records, while drug repurposing candidates (e.g. dexamethasone, hydroxychloroquine) were studied in retrospective cohorts. Non-COVID RWE examples include understanding opioid epidemic patterns from claims data, or discovering environmental risk factors through GIS-linked health data.
- Clinical Trial Optimization: Even in trial design, RWE helps. Before a new trial, sponsors now often analyze RWD to define eligibility criteria, estimate event rates, and design efficient protocols. Virtual control arms (synthetic controls) are built from historical RWD to augment or replace control arms when randomization is infeasible ([21]) ([22]). For example, a recent lymphoma trial successfully used a “synthetic” control arm derived from RWD and an earlier trial to emulate a phase III RCT in very elderly patients ([22]). Such approaches remain under validation, but early results show feasibility: in that study, overall survival rates from the RWD-based synthetic arm were statistically equivalent to those from the actual randomized control arm ([22]).
Analytical and Technological Advances
The explosion of RWD has spurred correspondingly sophisticated analytic techniques. Key trends include:
- Data Standardization and Networks: RWE initiatives have recognized that data heterogeneity is a major obstacle. Projects like the Observational Medical Outcomes Partnership (OMOP) Common Data Model and Europe’s EHDEN have standardized disparate EHR schemas into a common structure. Distributed data networks (OHDSI, Sentinel) enable analysis queries to be run centrally on decentralized data. This model amplifies statistical power while preserving patient privacy. For example, the FDA’s Sentinel system conducts active surveillance by executing standardized queries across partner databases ([19]). In 2025, EMA’s DARWIN-EU uses similar principles across Europe ([11]) ([10]).
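The distributed-query pattern can be sketched as follows. The data and function names are hypothetical, but the principle matches Sentinel-style networks: every partner runs the same program against its local data and returns only aggregates, never patient-level rows:

```python
# Hypothetical sketch of a distributed query across data partners.
def local_query(site_rows, drug, outcome):
    """Runs at each site; only aggregate counts leave the site."""
    exposed = [r for r in site_rows if r["drug"] == drug]
    events = sum(1 for r in exposed if r[outcome])
    return {"n_exposed": len(exposed), "n_events": events}

# Invented patient-level data held locally at two sites ("mi" = myocardial infarction).
site_a = [{"drug": "A", "mi": True}, {"drug": "A", "mi": False}, {"drug": "B", "mi": False}]
site_b = [{"drug": "A", "mi": False}, {"drug": "B", "mi": True}]

# The coordinating center pools only the returned aggregates.
results = [local_query(rows, "A", "mi") for rows in (site_a, site_b)]
pooled_n = sum(r["n_exposed"] for r in results)
pooled_events = sum(r["n_events"] for r in results)
risk = pooled_events / pooled_n  # crude pooled risk among drug-A initiators
```

Real networks additionally standardize the local schemas first (e.g. to the OMOP Common Data Model) so the identical query is meaningful at every site.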
- Machine Learning and AI: Beyond causal inference, machine learning (ML) finds patterns in RWD for prediction and phenotyping. For instance, ML algorithms can identify patient subgroups with different risk profiles, or predict treatment response based on complex feature sets. Models like gradient boosting or deep neural nets process hundreds of covariates (demographics, labs, genomics) to forecast outcomes. However, ML in RWE faces scrutiny over transparency; “black-box” models are generally not sufficient for regulatory claims, which still demand statistical estimates with interpretable confidence intervals. Current applications focus on data curation: identifying eligible patients, imputing missing values, and unstructured data extraction. As an example, DARWEN’s AI engine (a neural NLP system) processed oncology charts far faster than humans with high concordance ([64]) ([18]).
- Natural Language Processing (NLP): Much of healthcare data exists in free text (physician notes, radiology reports, pathology). NLP tools now routinely parse these to extract clinical concepts. In lung cancer EHRs, NLP was able to accurately pull out diagnoses, treatment regimens, and outcomes (with 84–100% accuracy on most fields) and identify metastasis locations with reasonable concordance ([64]). NLP greatly enhances the feasibility of large-scale chart reviews, turning what used to be labor-intensive manual work into near real-time abstraction. However, NLP accuracy can drop for synonyms or non-standard language, so validation remains essential ([65]).
- Wearables and Remote Sensing: The integration of wearable sensor data into RWE is nascent but accelerating. Continuous monitors (e.g. for glucose, cardiac rhythm, activity) may soon feed RWE studies on long-term outcomes. For example, wearable ECG monitors have been studied to detect asymptomatic atrial fibrillation in elderly patients, potentially linking to stroke rates in population analyses. Early trials are underway to use smartwatches to create digital endpoints (heart rate variability, gait instability) for diseases like heart failure and Parkinson’s. These novel RWD sources will require new analytics (time-series models, signal processing) and raise fresh privacy issues.
- Synthetic Data: As discussed, research on synthetic external control arms is emerging. A 2025 study in PLOS Digital Health showed that carefully generated synthetic data (using advanced generative algorithms) could replicate the outcomes of actual registry data while greatly reducing re-identification risk ([66]). Although still experimental, synthetic data generation promises to facilitate data sharing among collaborators who cannot exchange real patient data due to privacy. It may enable “digital twin” cohorts for simulation studies.
-
Blockchain and Privacy Tools: With patient privacy paramount, there is growing interest in privacy-preserving computation. Techniques like federated learning (model training without data centralization), differential privacy (adding statistical noise for anonymity), and blockchain (secure audit trails) are under exploration. These tools may alleviate some data-sharing concerns, but they are not yet mainstream in RWE studies.
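To give a concrete flavor of one of these privacy techniques, the sketch below applies the Laplace mechanism from differential privacy to a cohort count query. This is a minimal, illustrative Python implementation under assumed inputs (the count of 42 patients and the epsilon values are hypothetical); real deployments use vetted privacy libraries rather than hand-rolled noise.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    lam = 1.0 / scale
    return random.expovariate(lam) - random.expovariate(lam)

def dp_count(true_count: int, epsilon: float) -> int:
    """Release a patient count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one patient
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    noisy = true_count + laplace_noise(1.0 / epsilon)
    return max(0, round(noisy))

# Small epsilon -> strong privacy, noisier counts; large epsilon -> near-exact.
print(dp_count(42, epsilon=0.1))    # noisy, protects individuals
print(dp_count(42, epsilon=1000))   # effectively exact
```

The intuition: the smaller the privacy budget epsilon, the more noise is added, so an observer cannot tell whether any one patient is in the cohort.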
Challenges, Limitations, and Quality Considerations
Despite technological progress, substantial challenges remain in RWE analysis. The quality of RWD is uneven: data may be missing for many variables (e.g. smoking status, over-the-counter drug use) or recorded inconsistently. For example, a patient’s blood pressure readings may appear in one provider’s EHR but be absent for visits delivered elsewhere. Claims data omit any encounter paid out-of-pocket, and EHRs lack standardized fields across vendors. Mismatches in patient identifiers limit accurate linkage across datasets. All of this can introduce selection bias (who appears in the data) and information bias (what gets measured).
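A first step in judging whether RWD are fit for purpose is a simple completeness audit. The sketch below (illustrative Python; the field names and patient records are hypothetical) computes per-field completeness so analysts can flag variables, such as smoking status, that are too sparsely recorded to support a given analysis.

```python
def completeness_report(records, fields):
    """Return the fraction of records with a usable value per field."""
    n = len(records)
    missing = (None, "", "unknown")
    return {f: sum(1 for r in records if r.get(f) not in missing) / n
            for f in fields}

# Hypothetical EHR extract with gaps typical of routine-care data.
patients = [
    {"age": 67, "smoking_status": "former", "systolic_bp": 142},
    {"age": 54, "smoking_status": None,     "systolic_bp": 130},
    {"age": 71, "smoking_status": "",       "systolic_bp": None},
    {"age": 48, "smoking_status": "never",  "systolic_bp": 118},
]

report = completeness_report(patients, ["age", "smoking_status", "systolic_bp"])
too_sparse = [f for f, frac in report.items() if frac < 0.8]
print(report)
print(too_sparse)  # fields below an assumed 80% completeness threshold
```

Real fit-for-purpose assessments go much further (plausibility checks, linkage validation, temporal consistency), but even this level of auditing surfaces the missingness patterns described above.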
Confounding is a central concern. Unlike RCTs, RWE studies cannot rely on randomization. Sociodemographic factors, disease severity, physician preferences, and healthcare access can all influence both treatment choice and outcomes. For example, a seemingly large survival benefit of one cancer therapy over another in RWD might simply reflect that healthier, younger patients were more likely to receive the new drug. Analysts mitigate this through design and modeling (as noted earlier) but unmeasured confounders can never be fully ruled out in observational settings. Tools like negative control outcomes and quantitative bias analysis are recommended to assess residual confounding.
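The effect of such confounding, and of the adjustments used to counter it, can be seen in a toy calculation. The sketch below (Python, with invented counts) contrasts a crude risk difference with one directly standardized over a single age stratum; real analyses typically use propensity scores over many covariates, but the logic is the same.

```python
# Each row: (stratum, treated, n_patients, n_recovered).
# Younger patients preferentially receive the new drug and recover
# more often regardless of treatment, confounding the crude contrast.
rows = [
    ("young", 1, 80, 72),   # 90% recovery
    ("young", 0, 20, 16),   # 80%
    ("old",   1, 20, 10),   # 50%
    ("old",   0, 80, 32),   # 40%
]

def risk(treated):
    """Crude recovery risk, ignoring the age stratum."""
    n = sum(r[2] for r in rows if r[1] == treated)
    events = sum(r[3] for r in rows if r[1] == treated)
    return events / n

def standardized_risk(treated):
    """Average stratum-specific risks, weighted by total stratum size."""
    total = sum(r[2] for r in rows)
    out = 0.0
    for stratum in {r[0] for r in rows}:
        strat_rows = [r for r in rows if r[0] == stratum]
        weight = sum(r[2] for r in strat_rows) / total
        row = next(r for r in strat_rows if r[1] == treated)
        out += weight * row[3] / row[2]
    return out

crude_rd = risk(1) - risk(0)                               # inflated by confounding
adjusted_rd = standardized_risk(1) - standardized_risk(0)  # the within-stratum effect
print(round(crude_rd, 2), round(adjusted_rd, 2))
```

Here the crude comparison suggests a 34-percentage-point benefit, while within each age stratum the treatment improves recovery by only 10 points; standardization recovers the latter.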
Reproducibility and transparency are additional issues. Historically, many RWD studies were published without pre-registered protocols, making it hard to assess credibility. The community is moving toward solutions: journals now require RWE studies to register in specialized registries (e.g. EU-PAS, ClinicalTrials.gov tagging for observational studies), and regulatory guidance increasingly expects analytic plans up front. The new HMA-EMA catalogues allow stakeholders to pre-register planned RWD studies in Europe ([23]), analogous to trial registries, which should help prevent “p-hacking” (selecting only favorable analyses).
Privacy and ethical concerns cannot be overlooked. In many jurisdictions, patient consent for secondary research operates on an opt-out basis, and growing expectations around data governance mean that privacy-preserving strategies (de-identification, governance boards, patient opt-ins) are needed. Some high-profile cases have triggered public scrutiny (e.g. controversies over commercial access to EHR data). Balancing data utility with confidentiality is a key unresolved issue.
Additionally, there is the risk of overconfidence in “big data”. Not all RWD are of high enough quality for particular questions. A large dataset with systematic errors still yields biased answers, only with greater apparent precision. As one review warns, “the devil is in the detail”: every RWE study must carefully justify that its data are fit for purpose ([16]). There is an ongoing emphasis on training researchers in pharmacoepidemiology principles so they do not inadvertently report spurious associations from messy data.
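One widely used quantitative bias analysis tool is the E-value of VanderWeele and Ding, which asks how strongly an unmeasured confounder would have to be associated with both treatment and outcome (on the risk-ratio scale) to fully explain away an observed association. It has a simple closed form, sketched below in Python (the example risk ratios are invented).

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (point estimate).

    E = RR + sqrt(RR * (RR - 1)). For RR < 1 the ratio is inverted
    first, since the formula is defined for associations with RR >= 1.
    """
    r = rr if rr >= 1.0 else 1.0 / rr
    return r + math.sqrt(r * (r - 1.0))

# An observed RR of 2.0 would need an unmeasured confounder associated
# with both treatment and outcome at roughly RR 3.41 to explain it away.
print(round(e_value(2.0), 2))
print(round(e_value(0.5), 2))  # symmetric for protective associations
```

A large E-value suggests the finding is robust to plausible unmeasured confounding; a small one warrants caution.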
Finally, we must recognize the limits of generalizability that persist even with RWD. For instance, EHR networks typically represent countries with advanced health IT; their findings may not extend to regions without such infrastructure. Similarly, many RWD sources under-represent marginalized or uninsured populations. As RWE expands globally, ensuring diverse data representation is both a challenge and an opportunity.
Case Studies and Examples
To illustrate RWE analysis in action, we highlight a few concrete examples:
-
Mobile Health and Apps: RWE can extend into the app ecosystem. For example, the ZOE COVID app (UK) collected self-reported symptoms and test results from millions of users, enabling real-time tracking of COVID prevalence and symptom patterns. Private companies have used aggregated app data to analyze drug adherence or lifestyle effects on chronic conditions. While still emerging, such patient-driven RWD show how RWE can span beyond traditional clinical settings.
-
Wearable Device Study: A notable small study connected wearable Fitbit data with clinical records to predict hospital readmission risk. By combining patients’ activity and heart-rate data post-discharge with their EHR outcomes, investigators built a model that successfully predicted 72-hour readmission. This proof-of-concept demonstrates the promise of integrating wearables into RWE – though it remains in early stages.
-
Synthetic Control Arm in Oncology: In a French lymphoma study ([22]), researchers constructed a synthetic control arm by combining historical trial data with RWD from a national cancer registry. They then replicated the results of a completed randomized trial (“SENIOR”) by replacing its control arm with this synthetic cohort. The overall survival hazard ratio was 0.74 (no significant difference) between the experimental treatment and the synthetic control, indicating that the RWD-based arm was statistically comparable to the original randomized arm ([22]). This suggests RWE can be a substitute control in situations where a new randomized control is impractical (e.g. very elderly patients).
-
Comparative Safety in Diabetes: As part of FDA’s Mini-Sentinel demonstration, researchers compared the risk of hospitalization for acute kidney injury between different sulfonylurea drugs using claims data ([67]). By applying standardized protocols across multiple databases, they confirmed known drug-risk relationships, with results highly consistent with those from traditional methods, showcasing how distributed-network RWE can yield reliable safety analytics.
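Distributed-network analyses like the Mini-Sentinel example typically run a common protocol at each database and then pool the site-level estimates. The sketch below (Python; the hazard ratios and standard errors are invented) shows standard inverse-variance fixed-effect pooling of log hazard ratios across databases.

```python
import math

def pool_fixed_effect(site_estimates):
    """Inverse-variance fixed-effect pooling of site-level estimates.

    site_estimates: list of (hazard_ratio, se_of_log_hr), one per database.
    Returns (pooled_hr, (ci_lower, ci_upper)) with a 95% interval.
    """
    weights = [1.0 / se ** 2 for _, se in site_estimates]
    log_hrs = [math.log(hr) for hr, _ in site_estimates]
    pooled_log = sum(w * l for w, l in zip(weights, log_hrs)) / sum(weights)
    pooled_se = 1.0 / math.sqrt(sum(weights))
    ci = (math.exp(pooled_log - 1.96 * pooled_se),
          math.exp(pooled_log + 1.96 * pooled_se))
    return math.exp(pooled_log), ci

# Three hypothetical claims databases, each contributing (HR, SE of log HR).
sites = [(1.32, 0.15), (1.18, 0.20), (1.25, 0.10)]
hr, (lo, hi) = pool_fixed_effect(sites)
print(f"pooled HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because only summary statistics cross site boundaries, patient-level data never leave each database, which is precisely what makes the distributed-network model attractive for privacy.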
These examples underscore RWE’s breadth – from individual applications (wearables, EHR mining) to large network studies. They also reflect the multi-stakeholder nature: an industry-led oncology example ([22]), a public-private safety surveillance example ([67]), and technology-driven patient-centric studies.
Opportunities and Future Directions
Looking ahead, the role of RWE in healthcare decision-making is poised to grow significantly. Several trends are worth highlighting:
-
Expansion of Digital Health Data: The volume of health data from wearables, smartphones, and home monitoring devices is surging. The analysis of pulse oximeter or smartwatch data at scale could provide early warning signals (e.g. silent AFib, respiratory issues). As personal sensors proliferate, integrating these into healthcare records will create richer RWD. For instance, clinical trials are already using smartphone-based walking tests to measure gait and stability; soon, routine RWE could leverage such continuous mobility data to monitor dementia progression or fall risk.
-
Machine Learning and Causal AI: Modern causal inference research is merging with AI. For example, “synthetic data” algorithms (GANs, VAE) might generate realistic patient records for simulation while preserving privacy ([66]). Automated causal discovery methods could help elucidate treatment effect heterogeneity in subgroups. However, regulators will require interpretability: “X predicted outcome Y” is less convincing than a well-adjusted hazard ratio. Efforts are underway to blend ML flexibility with causal rigor (e.g. targeted maximum likelihood, double machine learning). Such tools can handle complex confounding in high-dimensional RWD, potentially improving validity compared with traditional methods.
-
International Collaboration: The success of distributed networks suggests a future of even broader collaboration. Global consortia (OHDSI continues to expand; the EU-PAS Register is evolving toward global registries) could enable cross-country RWE. This is critical for rare diseases and vaccines, where pooling is needed to achieve adequate sample sizes. The HMA-EMA catalogue initiative ([23]) signals this direction by making data sources and studies transparent worldwide. We may see shared protocols for RWE studies (like multi-database analysis standards) to promote consistency.
-
Real-World Effectiveness of Advanced Therapies: Gene and cell therapies pose unique challenges: patient numbers in trials are small, so regulators and HTAs are keen on RWE to confirm long-term durability and safety. For example, for a CAR-T cell therapy, companies might set up post-launch registries to track outcomes over years. Sophisticated RWE will be needed to fulfill the commitments often attached to such approvals.
-
Integration into Health Systems: As healthcare systems move toward learning health systems (LHS), RWE will increasingly feed back into clinical care. Integrated care networks can use their own data for real-time clinical decision support (e.g., flagging drug-drug interactions using knowledge derived from aggregate RWD). Pharmacovigilance could become more ingrained (automated adverse event reporting from EHR triggers). Systems like the FDA’s Sentinel may evolve to provide providers with tailored safety alerts based on the latest RWE.
-
Regulatory Science Evolution: The FDA and others are experimenting with “real-world trials” – pragmatic trials embedded in EHRs (e.g. randomizing doses via software). If successful, such designs could make large trials cheaper and faster. Moreover, regulatory bodies are pushing for novel regulatory pathways: conditional approvals with RWE commitments, or “birth-cohort” studies. The net effect would be a blurring of the line between evidence generation and routine care.
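The idea of blending ML flexibility with causal rigor, mentioned above, can be made concrete with a stripped-down version of the double machine learning recipe: fit nuisance models for the outcome and for the treatment given covariates, then regress outcome residuals on treatment residuals. The toy sketch below (Python, with invented data; simple group means stand in for the ML learners, and real implementations also use cross-fitting) recovers a true effect of 2 that a crude comparison overstates.

```python
# Toy data: (outcome, treated, covariate). True model: y = 2*t + 3*x,
# where x drives both treatment choice and outcome (a confounder).
data = [
    (0, 0, 0), (0, 0, 0), (2, 1, 0),
    (5, 1, 1), (5, 1, 1), (3, 0, 1),
]

ys = [d[0] for d in data]
ts = [d[1] for d in data]
xs = [d[2] for d in data]

def group_mean(values, covs, x):
    """Mean of `values` within the covariate group x (the 'nuisance model')."""
    picked = [v for v, xi in zip(values, covs) if xi == x]
    return sum(picked) / len(picked)

# Nuisance predictions E[y|x] and E[t|x]; in real DML these would be
# flexible ML models fit over many covariates.
res_y = [y - group_mean(ys, xs, x) for y, x in zip(ys, xs)]
res_t = [t - group_mean(ts, xs, x) for t, x in zip(ts, xs)]

# Final stage: residual-on-residual regression (no intercept needed).
effect = (sum(ry * rt for ry, rt in zip(res_y, res_t))
          / sum(rt ** 2 for rt in res_t))

# Naive comparison of treated vs. untreated means, for contrast.
treated_mean = sum(y for y, t in zip(ys, ts) if t == 1) / ts.count(1)
control_mean = sum(y for y, t in zip(ys, ts) if t == 0) / ts.count(0)
naive = treated_mean - control_mean

print(naive, effect)  # the naive estimate is confounded; DML recovers 2.0
```

Partialling out the covariate removes the confounded part of both the outcome and the treatment, so the final regression isolates the treatment effect, which is the core intuition behind double machine learning.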
Yet, many questions remain open. How do we validate RWE endpoints (e.g. should we trust RWD-derived remission rates as much as protocol-defined assessments)? What governance ensures that patient data used for RWE benefit society? How do we incorporate equity and social determinants into RWE analyses? The field is actively exploring these questions. In particular, ensuring that advanced analytical methods do not inadvertently encode bias (e.g. algorithmic fairness with racially imbalanced data) is an area of ethical and technical work ahead.
Discussion of Implications
The rise of RWE has broad implications:
-
For Patients: Greater use of RWE means findings will reflect more diverse populations, potentially making medical decisions more relevant to everyday patients. For rare conditions, RWE might be the only feasible evidence, accelerating access to treatments. However, reliance on EHR data also raises privacy concerns. Patients must trust that their routine data – even if de-identified – are used appropriately. Public dialogues about data use (as conducted in some national initiatives) are critical.
-
For Clinicians: Physicians can expect more “practice-based evidence” to inform guidelines. Instead of waiting decades for RCTs, some questions (e.g. off-label uses) may be answered by large observational studies. Clinicians may also increasingly be asked to contribute to registries or consent patients for pragmatic trials embedded in routine care. The boundary between research and practice will continue to blur.
-
For Industry: Pharmaceutical and device companies will need to maintain robust real-world evidence departments, building infrastructure for data analytics. Products will be monitored more intensively post-launch, potentially affecting lifecycle management. Payers may demand RWE to justify reimbursement or negotiate outcomes-based contracts. The competitive landscape may shift toward companies that can convincingly demonstrate real-world value.
-
For Regulators: Agencies must balance innovation with credibility. They will need to continually assess the validity of RWE submissions. Building expertise in advanced analytics and adapting review processes for RWE are ongoing. There may be a need for new statistical standards specific to RWD (just as CONSORT and STROBE exist for trials and observational studies, respectively).
-
Ethical/Societal: Widespread use of health data poses societal questions about consent, ownership, and benefit-sharing. Ensuring that under-represented groups are included in RWE (instead of exacerbating disparities) will be an ethical imperative. Global health can benefit from RWE if developing countries’ data (suitably anonymized) are incorporated, but this requires capacity-building efforts.
Conclusion
Real-World Evidence analysis represents a transformative frontier in healthcare research. By systematically harnessing data from routine clinical practice, RWE offers the promise of faster, more inclusive and more practical evidence on the safety and effectiveness of medical interventions. The integration of RWE into real-world decision-making – from regulatory approvals to clinical guidelines – is already underway, as evidenced by policies like the U.S. 21st Century Cures Act and initiatives like FDA’s Sentinel and EMA’s DARWIN-EU ([7]) ([19]). Early successes (drug approvals supported by RWE ([14]) ([15])) demonstrate RWE’s potential, while detailed methodological work (from pharmacoepidemiology principles ([16]) ([6]) to AI-driven NLP ([18])) continues to improve reliability.
Nonetheless, RWE is not a panacea; its application must be judicious. Achieving credible RWE studies requires transparency, high-quality data curation, and sophisticated analyses designed to emulate the rigor of clinical trials as much as possible. As stakeholders learn from experience, standards and best practices will evolve, hopefully harmonizing around robust frameworks.
Looking forward, the combination of burgeoning health data (including personal health devices and genomics) with advanced computational tools is likely to deepen our understanding of medicine in the real world. If implemented thoughtfully, this could significantly accelerate medical innovation and optimize patient care. Future research should continue to address current gaps – for example, methods to directly validate RWE results against randomized benchmarks, strategies to integrate patient-reported data, and approaches to ensure equitable data representation. Ultimately, success will come when RWE is integrated seamlessly into a learning health system that continually learns from patient experiences.
This report has aimed to synthesize the current state of RWE analysis in unprecedented depth, citing a broad range of literature and guidance. While our coverage is extensive, the field is rapidly evolving. Readers are encouraged to consult the cited sources and ongoing publications for the latest developments. In sum, real-world evidence analysis is a dynamic, multi-disciplinary domain at the intersection of medicine, data science, and policy – one that promises to reshape the future of healthcare evidence generation.
This report was compiled from peer-reviewed literature, regulatory documents, and expert analyses; all numerical and factual claims are backed by the cited sources.
External Sources
DISCLAIMER
The information contained in this document is provided for educational and informational purposes only. We make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of the information contained herein. Any reliance you place on such information is strictly at your own risk. In no event will IntuitionLabs.ai or its representatives be liable for any loss or damage including without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from the use of information presented in this document. This document may contain content generated with the assistance of artificial intelligence technologies. AI-generated content may contain errors, omissions, or inaccuracies. Readers are advised to independently verify any critical information before acting upon it. All product names, logos, brands, trademarks, and registered trademarks mentioned in this document are the property of their respective owners. All company, product, and service names used in this document are for identification purposes only. Use of these names, logos, trademarks, and brands does not imply endorsement by the respective trademark holders. IntuitionLabs.ai is an AI software development company specializing in helping life-science companies implement and leverage artificial intelligence solutions. Founded in 2023 by Adrien Laurent and based in San Jose, California. This document does not constitute professional or legal advice. For specific guidance related to your business needs, please consult with appropriate qualified professionals.