IntuitionLabs
By Adrien Laurent

FDA Draft Guidance on AI in Drug Development Explained

Executive Summary

In January 2025, the U.S. Food and Drug Administration (FDA) issued its first draft guidance on the use of artificial intelligence (AI) in the development of drugs and biological products ([1]). Titled “Considerations for the Use of Artificial Intelligence To Support Regulatory Decision-Making for Drug and Biological Products” (FDA-2024-D-4689), the guidance establishes a risk-based, 7-step credibility framework for sponsors to assess and document the trustworthiness of AI models in a specific context of use (COU) ([2]) ([3]). Critically, the guidance expressly excludes certain AI applications: specifically, (1) AI used in drug discovery, and (2) AI used to streamline internal operations (for example, drafting a regulatory submission) when such use “does not impact patient safety, drug quality, or the reliability of results” ([4]) ([3]). In effect, the guidance focuses only on AI tools that generate data or insights directly informing regulatory decisions about a drug’s safety, effectiveness, or quality.

This comprehensive report analyzes the FDA draft guidance in detail. We discuss the historical context of AI in drug development and regulatory submissions, outline the scope and key provisions of the guidance (including in-scope and out-of-scope applications), and unpack the 7-step credibility framework that sponsors must follow. We draw on FDA materials, industry analyses, and expert commentary to present multiple perspectives. The report includes case scenarios (both hypothetical and real-world) to illustrate how AI is being used today in pharmaceutical R&D and manufacturing, and how sponsors should prepare to meet the new FDA expectations. We also examine the implications of the guidance for industry innovation, regulatory strategy, and patient safety, and consider future directions as AI technologies advance. All claims are substantiated with citations to the guidance, FDA announcements, regulatory analyses, and relevant literature.

Introduction

Artificial intelligence and machine learning (AI/ML) technologies are transforming many aspects of healthcare and drug development. Over the past decade, sponsors have increasingly incorporated AI tools to analyze complex datasets, optimize clinical trials, and predict patient outcomes. These tools can accelerate drug discovery, improve manufacturing processes, and generate real-world evidence from health records. However, AI’s rapid rise also raises new regulatory challenges. Chief among these is credibility: regulators must be able to trust that an AI model’s output is accurate, reliable, and appropriate for its intended regulatory role. Without clear guidance, companies have faced uncertainty about what AI information the FDA will accept, and what documentation will be required in submissions.

The FDA has recognized the urgent need for clarity. In January 2025, FDA Commissioner Robert Califf announced a draft guidance to address “the use of artificial intelligence (AI) intended to support a regulatory decision about a drug or biological product’s safety, effectiveness or quality” ([1]). This was the agency’s first guidance on AI in drug development. The guidance provides non-binding recommendations on how to establish and demonstrate an AI model’s credibility for regulatory decision-making. It builds on FDA’s prior experience: the agency has received over 500 drug or biologic submissions with AI components since 2016, particularly in areas like oncology and neurology ([5]). FDA has described this growth in AI use in submissions as exponential ([6]), and has collaborated across its centers (CDER, CBER, CDRH, CVM, OCP, and OII) to craft a unified framework.

Key points of the FDA draft guidance include:

  • Scope (Included Uses): AI applications across the drug lifecycle are covered if their outputs produce data or information intended to support regulatory decisions about a product’s safety, efficacy, or quality ([4]). This encompasses uses in nonclinical research, clinical trials, manufacturing, and post-market surveillance. For example, using AI to select patients for trials, optimize dosage based on model-informed pharmacology, detect safety signals in real-world data, or control manufacturing quality would fall under the guidance ([7]) ([8]).

  • Scope (Excluded Uses): The guidance does not apply to certain AI uses that do not directly affect a regulated output. Specifically, it excludes AI used for drug discovery and AI used to streamline internal operations (such as drafting regulatory documents, managing workflows, or allocating resources) when those tasks do not impact safety, drug quality, or study results ([4]) ([3]). In practical terms, routine use of AI for productivity (e.g. using a language model to write a report or an AI to schedule lab experiments) remains outside FDA regulation. Only AI whose outputs directly feed into a regulatory decision (for instance, recommending which patient subgroup has therapeutic benefit) is covered.

  • Risk-Based Credibility Framework: The guidance introduces a 7-step framework for establishing the credibility of an AI model in its specific context of use ([9]) ([10]). “Context of use” (COU) refers to the precise role of the model – what question it addresses and how its output will be used. The steps range from defining the question and COU, to assessing model risk, developing and executing a credibility assessment plan, and finally determining if the model is adequate for its use ([11]). The framework requires sponsors to systematically document data sources, training/testing procedures, validation activities, and any post-deployment monitoring, all commensurate with the model’s risk.

  • Regulatory Interaction: The draft guidance encourages early and ongoing engagement with FDA. Sponsors are urged to meet with the agency before using a high-risk AI model in a critical decision, to align on expectations for the credibility plan ([12]) ([13]). The FDA does not necessarily require submission of all credibility documentation, but sponsors must decide whether and when to share the plan and evidence with the agency. The agency will consider how the AI was validated when reviewing applications (INDs, NDAs/BLAs, etc.) containing AI-derived data, and may request further justification if needed.

This guidance is situated in a broader policy context. The European Medicines Agency (EMA) issued its own Reflection Paper on AI in drug development in September 2024 ([14]). While FDA focuses explicitly on credibility and risk in decision-making, EMA’s approach also addresses principles and good practices for AI use across the lifecycle ([14]) ([15]). Industry experts note that “regulatory clarity is one of the top three barriers” to wider AI adoption in pharma ([16]), so the FDA guidance is seen as a step toward removing uncertainty. Meanwhile, AI-enabled medical devices have been governed under separate FDA policies; indeed FDA issued a device AI draft guidance on the same day (Jan 7, 2025) ([17]). Taken together, these developments signal that the FDA is integrating AI into its regulatory oversight, treating it as an increasingly important part of the “regulated stack” for drug products.

This report provides an in-depth analysis of the FDA’s AI-in-drug-development draft guidance. We begin by detailing the historical and regulatory background of AI in life sciences. We then dissect the guidance’s contents: scope, 7-step framework, risk and credibility requirements, and affected stakeholders. We intersperse discussion with case examples (both hypothetical and actual) to illustrate how the guidance applies. We compare the FDA’s approach with international perspectives and discuss emerging trends. Throughout, we cite extensive evidence from FDA documents, legal and industry commentaries, and relevant scientific literature to support our examination. The goal is to give readers a complete, authoritative understanding of this guidance and its implications for the future of AI in drug development.

Historical Background and Context

The pharmaceutical industry has long grappled with high costs and long timelines for bringing new drugs to market. Recently, AI and machine learning have offered tools to mitigate these challenges. Machine learning algorithms can sift through complex datasets – genomic data, clinical trial results, electronic health records, literature – to find patterns humans might miss. For example, AI models have been used to predict patient responses to therapy, identify potential new drug targets, optimize chemical synthesis routes, and detect adverse event signals from post-market data ([18]) ([19]). One study noted that AI is being applied across the drug lifecycle to reduce reliance on animal studies, build predictive clinical trial models, and integrate heterogeneous data sources ([19]). In oncology and rare diseases, sponsors are experimenting with AI-driven patient stratification to enrich clinical trials or adapt dosing dynamically.

Despite these innovations, integrating AI into regulated drug development has been cautious. Regulators and industry have raised concerns about data quality, model bias, interpretability, and robustness over time. AI models can inherit biases in their training data (e.g. underrepresentation of certain populations) or degrade when deployed under new conditions (known as model drift). The black-box nature of some AI, especially complex neural networks, also challenges validation: it may be hard to trace why an AI made a certain decision. As Ismail Amin notes, key challenges include “variability in data quality, potential biases, difficulty in understanding AI models, and changes in model performance over time” ([20]). In short, regulators worry about trusting AI outputs without adequate evidence.

Historically, companies have often incorporated AI-based analyses under existing regulatory frameworks without specific AI rules. For instance, model-informed drug development (MIDD) tools – such as pharmacometric models to predict a safe starting dose – are not new. The FDA has accepted computational models for certain bridging data. But AI’s ability to “learn” from data dynamically is a new variable. Until recently, most FDA policy addressing AI focused on medical devices (e.g. the 2021 action plan for AI/ML-based software as a medical device) or on general medical software. There was no dedicated FDA guidance on AI in drug development. This left ambiguity: if a sponsor used an advanced AI algorithm to analyze trial data or optimize manufacturing, it was unclear what explanation or validation the FDA would expect. Indeed, industry leaders have cited lack of regulatory clarity as a top barrier to AI adoption in drug R&D ([16]).

The pace of AI innovation accelerated with the rise of big data in healthcare and the availability of powerful computational resources. Concurrently, regulators around the world began taking notice. In 2024, the European Medicines Agency (EMA) published a Reflection Paper on AI in the medicinal product lifecycle ([14]) ([15]). EMA’s document (finalized in September 2024) discusses principles for AI use from discovery to post-authorization, advising developers on good practices and noting potential benefits and risks. The EMA reflection paper emphasizes the need for transparency, validation, and human oversight in AI-based systems. Similarly, in the UK the Medicines and Healthcare Products Regulatory Agency (MHRA) has released guidance on AI use in medical devices and health products. However, until the FDA guidance, no U.S. agency had provided specific recommendations for AI in drug development.

In the U.S., the FDA had been receiving increasing numbers of submissions containing AI. As of late 2024, the FDA reports over 500 submissions with AI components since 2016 ([5]). These submissions span numerous therapeutic areas, reflecting a broad interest in leveraging AI (especially in oncology, neurology, and gastroenterology) to improve trial efficiency or data analysis. FDA officials observed an “exponential” rise in AI usage ([6]). This trend caught the agency’s attention: Commissioner Califf stated that with “appropriate safeguards in place, artificial intelligence has transformative potential to advance clinical research and accelerate medical product development to improve patient care” ([21]). But that potential must be balanced with standard regulatory priorities. The FDA’s mission to protect public health demands confidence that decisions driven by AI are valid and reproducible.

Thus, in January 2025 the FDA proactively issued draft guidance to begin filling the void. This draft guidance – its first for drug-related AI – formally acknowledges the growing role of AI and defines how FDA intends to evaluate it. It provides a structured, science-based approach so that sponsors can prepare appropriate documentation. As Fahimeh Mirakhori (Ph.D.) summarizes: the framework is “structured, risk-based,” focusing on COU and associated risks, and guiding both regulators and developers on when AI models can be “trusted to support critical decisions” ([22]). By laying out the expectations now, the FDA aims to encourage innovation while ensuring rigorous standards are met.

Regulatory Framework and Historical Precedents

Before this guidance, the FDA’s oversight of AI in drug development fell under existing regulations for data and analysis. For example, any computer software that meets the definition of a “medical device” (under 21 U.S.C. §321(h)) must follow device premarket requirements. Some AI tools (like an algorithm diagnosing disease from medical images) are regulated as devices, whereas tools purely for research might not be. In the drug context, if an algorithm outputs data that ends up in a New Drug Application (NDA) or Biologics License Application (BLA), the agency would review that as part of the evidence of safety/effectiveness, but historically with no tailored guidance on how to validate the algorithm.

The closest precedent is the ASME V&V 40 standard for medical devices, which the FDA has acknowledged. V&V 40 provides a framework for computational model verification and validation (V&V) for medical devices. It uses a risk-based approach: models impacting critical decisions must have stronger validation. The FDA’s new drug guidance largely echoes this paradigm but applies it to AI models for drugs. In effect, it says: treat AI models like any other scientific tool – they need evidence proportional to their impact.

Another precedent is FDA’s own guidance on model-informed drug development (2019) and on synthetic control arms (2018). These guidance documents encouraged development of new methods, including statistical and computational models, but emphasized justification and evidence for novel approaches. The new AI guidance can be seen as an extension: just as a statistical model predicting outcomes must be validated, so must an AI/ML model. However, unlike a typical statistical model, AI models may adapt and re-train, requiring lifecycle oversight. The draft guidance addresses this by including steps for maintaining credibility over time (see later sections).

The law also plays a role. Under the 21st Century Cures Act (2016), certain medical device software functions that do not “drive” clinical decisions or treat patients are exempted from device regulation. FDA’s guidance on clinical decision support software delineates when such software is or is not a medical device. Similarly, in drug development, if an AI tool merely assists internally (no patient safety impact), it falls outside. The draft guidance basically codifies these exemptions: it explicitly carves out “internal workflows and operational efficiencies” from its scope ([3]) ([4]), in line with not regulating purely administrative tools.

Finally, on the machine learning side, FDA had in April 2019 released a discussion paper on “Proposed Regulatory Framework for Modifications to AI/ML-Based Software as a Medical Device,” focusing on how to handle continuously learning algorithms. While that was device-oriented, it established the idea that regulators need a means to handle evolving AI. The drug guidance similarly assumes an AI model’s performance can change over time and thus requires a lifecycle plan (Step 6, see below).

In summary, the FDA’s 2025 draft guidance builds on these precedents by explicitly addressing AI/ML models used for regulatory support in drugs. It sets expectations that sponsors define clear contexts of use for their models, assess risks, and compile credibility evidence. The guidance is careful to distinguish AI uses that should be part of regulatory review versus those that are outside. The remainder of this report delves into the language and implications of the guidance, step by step.

Scope and Applicability of the Guidance

The draft guidance clearly delineates which AI uses are in-scope and which are out-of-scope. Understanding this delineation is crucial for sponsors to know whether their AI activities will attract FDA scrutiny under this guidance.

In-Scope Uses of AI

The guidance applies to any AI model used during the lifecycle of a drug or biological product where the AI’s output is intended to support a regulatory decision about the product’s safety, effectiveness, or quality ([4]) ([2]). This is a broad inclusion of virtually all drug R&D stages where AI-generated data can influence outcomes. Specifically, the guidance mentions use during:

  • Nonclinical (preclinical) development: For example, an AI model that predicts toxic risk from in vitro data or an animal study design. If the AI’s output informs the safety profile used in an Investigational New Drug (IND) application, it’s in scope.

  • Clinical development: For example, an AI model that recommends patient inclusion/exclusion criteria, or one that predicts efficacy endpoints from trial data. An AI algorithm that classifies patient risk and thus determines dosing or monitoring during a trial is in scope. FDA’s Goodwin analysis gives a hypothetical: an AI model categorizing patients by risk of life-threatening side effects, used to decide outpatient vs. inpatient monitoring. Because an AI error could lead to a fatal outcome, this high-stakes use clearly requires credibility assessment ([23]).

  • Manufacturing and Chemistry, Manufacturing, and Controls (CMC): AI tools that optimize manufacturing parameters, detect out-of-specification batches, or ensure product consistency would be in scope if their outputs affect product quality. For instance, an AI system that checks quality control (QC) assay data and flags contamination risks for action is within the guidance. A low-risk example from DLA Piper: an AI identifies manufacturing batches out-of-spec, then a human reviews and executes a corrective action plan ([24]). Even though human oversight exists, because AI influenced QC, the guidance would apply (with an appropriately low-risk credibility plan).

  • Postmarketing surveillance (pharmacovigilance and real-world evidence): If AI analyzes electronic health records or adverse event data to generate insights used in regulatory filings (e.g. a periodic safety update report), this is in scope. The FDA explicitly mentions pharmacovigilance contexts and real-world data uses ([25]).

  • Product lifecycle analysis: The guidance’s title and summary confirm it covers all phases (preclinical through postmarket) as long as the AI influence is on regulated attributes ([26]).

In all these cases, the AI’s role is to produce information or data used for decision-making about a regulated product. In short, if a sponsor is relying on AI-derived results as part of evidence (e.g. a model-predicted endpoint, simulation, or biomarker analysis) that will appear in an IND, NDA, BLA, or other regulatory submission, the AI is likely in scope.

Out-of-Scope Uses of AI

Crucially, the guidance is explicit that many AI uses are not covered. It excludes AI used in purely research or operational contexts unless the AI output directly affects a regulatory decision. Specifically, the guidance “does not address” AI when used in:

  1. Drug discovery: This includes AI for target identification, lead optimization, virtual screening, and other pre-development activities. For example, if an AI algorithm helps identify a new molecular scaffold before any preclinical tests, that use is out-of-scope. The guidance’s language is unequivocal: it “does not address the use of AI models … (1) in drug discovery” ([4]). (However, as discussed later, if an AI in “discovery” actually generates data influencing safety/efficacy decisions, say by predicting toxic metabolites, that narrow aspect might tangentially be in scope. But general discovery activities are excluded.)

  2. Operational efficiencies/internal workflows: This covers any AI that helps with administrative or operational tasks if those tasks do not impact safety/quality. The guidance cites examples such as resource allocation and drafting or writing a regulatory submission ([4]) ([3]). In plain terms, using AI to improve efficiency (e.g. automating document editing, generating an initial draft of a study report, scheduling clinical appointments, managing supply chain logistics) is not regulated by this guidance. As a Pfizer executive noted, the use of AI in medical writing and transcript analyses falls under “internal workflows” and is excluded ([27]). In that category also falls AI for general data management or digital charting that never feeds into a specific regulatory claim.

  3. “Low-impact” safety/quality tasks: If an AI influences processes but the end result does not impact patient safety, drug quality, or study reliability, it is out of scope. For example, an AI that automates labeling artwork for a non-critical change might be exempt, unless the labeling content itself becomes a regulatory issue. The Federal Register notice ([29]) explicitly defines the exclusion as internal uses that “do not impact patient safety, drug quality, or the reliability of results from a nonclinical or clinical study” ([4]).

These exclusions mean the guidance is deliberately narrow: it focuses on AI that has material regulatory effect. This is consistent with policy goals. FDA is not trying to police whether your company uses ChatGPT to write emails or to optimize office staffing; it’s only concerned when the AI’s output could influence a decision about whether a drug is safe and efficacious or of acceptable quality. As one summary put it, the guidance is limited to models that impact core regulatory considerations ([28]). In practice, sponsors can continue to leverage AI for internal work (often an enormous efficiency gain) without needing to go through FDA’s credibility process, as long as those tools are segregated from the regulated evidence.

Table: In-Scope vs Out-of-Scope AI Uses

| Application Area | In Scope? | Example |
| --- | --- | --- |
| Drug discovery (target ID, lead optimization) | No: AI used only to identify potential drug targets or optimize lead compounds before preclinical development. | AI algorithm suggests a novel compound binding a cancer target (no direct safety/efficacy data yet) – out-of-scope. |
| Nonclinical modeling (toxicity, PK/PD) | Yes: If AI models generate safety/efficacy data that inform Investigational New Drug (IND) applications. Requires credibility. | AI predicts organ toxicity from molecular structure, informing IND risk assessment – in scope. |
| Clinical trial design & analysis | Yes: AI used to set inclusion criteria, predict endpoints, or analyze trial data affecting study outcomes. | AI classifies patients by biomarker, selecting trial participants; or AI method predicts efficacy endpoint based on interim data – in scope. |
| Manufacturing and quality control | Yes: AI tools used for process control, batch release decisions, or QC data analysis that affect product quality. | AI-driven monitoring flags a batch outside specifications and suggests corrective action before market release – in scope. |
| Pharmacovigilance and real-world evidence | Yes: AI systems mining safety databases or EHRs to generate signals or evidence used in regulatory submissions. | AI identifies an unexpected adverse event pattern from real-world data; findings included in a risk-management plan – in scope. |
| Regulatory writing and review | No: Using AI (e.g. LLMs) to draft submission text, or tools for editing/review, provided human experts verify content. Does not impact safety/quality directly. | An LLM drafts the text of a Clinical Study Report; experts then revise and validate it – out-of-scope, assuming no safety data generation ([4]) ([3]). |
| Internal administrative tools | No: AI for scheduling, resource allocation, document formatting, etc., where outputs do not influence clinical or quality decisions (excluded). | AI schedules clinic appointments or optimizes lab workflow, but does not affect any patient safety or product quality decisions – out-of-scope. |
| Use of AI without human oversight in high-risk areas | Yes: If AI makes autonomous decisions on safety/efficacy without human review, it is higher risk and definitely in scope. | AI independently classifies an MRI scan to include a patient in a trial (no human review) – in scope and high risk. |
| Low-risk oversight (human-in-loop) | Yes (but lower risk): If AI provides a recommendation that is reviewed by a human for final decision, it is still in scope (though credibility requirements may be lighter). | AI suggests a batch is off-spec; a human pharmacist reviews and confirms before rejecting the batch – in scope, with moderate risk ([24]). |

Table 1: Examples of AI applications in drug development and whether they fall under the FDA draft guidance. In-scope uses involve AI outputs that directly feed into regulatory decisions about safety, efficacy, or quality. Out-of-scope uses are for discovery or internal tasks without direct regulatory impact ([4]) ([3]).

The table illustrates the line drawn by FDA: AI in discovery and back-office is out, AI used as part of validated development or review processes is in. It is important to note that in-scope vs out-of-scope is judged on the impact of the AI’s output, not the technology itself. For example, the same natural language AI could draft a private memo (out-of-scope) or assist in writing a patient consent form that is used in an IND (in-scope if content affects risk assessment). Similarly, an AI used in initial lead discovery (out-of-scope) might later be repurposed in a model predicting clinical response (in-scope). The guidance instructs sponsors to define their context of use precisely and assess whether the model will affect regulatory conclusions ([2]) ([10]). Sponsors uncertain about whether an AI application is within scope are advised to engage FDA early for clarification ([4]) ([29]).

The FDA’s 7-Step Credibility Framework

At the heart of the draft guidance is a 7-step, risk-based credibility framework for AI models. The framework is meant to be a planning and evaluation process: it guides sponsors on how to build and maintain evidence that their AI model can be trusted (“credible”) for its intended use. Each step corresponds to a key element of demonstration. These steps reflect model risk management principles (similar to those in software validation or quantitative model validation) tailored to AI/ML.

FDA groups these steps into three broad phases: defining the question of interest and the context of use, assessing model risk, and then developing, executing, and documenting a credibility assessment plan. The steps are:

  1. Define the Question of Interest (QI) – the problem or decision addressed by the AI model.
  2. Define the Context of Use (COU) – the precise role and scope of the AI model in answering the QI.
  3. Assess Model Risk – evaluate the risk that the AI model’s use will lead to incorrect decision or outcome.
  4. Plan Credibility Assessment Activities – develop a systematic plan of tests/analyses to establish model credibility for its COU.
  5. Execute the Plan – carry out the validation and evaluation activities.
  6. Document Results – compile a credibility assessment report, including results, conclusions, and any deviations from the plan.
  7. Determine Adequacy – conclude whether the evidence supports using the model for the COU, or if further work is needed.

Each step is described in detail in the draft and accompanying analysis (see Table 2). These steps must be proportional to risk: high-risk applications (where AI model influence on a critical decision is large) demand more thorough evidence. The FDA emphasizes early planning: sponsors should incorporate these steps into their development of AI tools, and engage with FDA to align on expectations at steps 4–5. Although not every step must be submitted to FDA automatically, this framework documents a complete lifecycle understanding.

The 7-step framework, as summarized by Goodwin Law, is:

| Step | Action | Key Focus |
| --- | --- | --- |
| 1 | Define the Question of Interest. | Identify the specific decision or parameter the AI will address ([11]). Example: “Classify patient as high/low risk for toxicity.” |
| 2 | Define the Context of Use (COU). | Specify how the model will be used and combined with other information ([30]) ([31]). E.g., “Model output will determine hospital monitoring level.” |
| 3 | Assess Model Risk. | Evaluate risk by (a) model influence (extent model affects decision) and (b) decision consequence (impact if wrong) ([32]). Example: high influence + high consequence = high risk. |
| 4 | Develop Credibility Plan – document the planned activities to demonstrate reliability. | Outline evidence needed: model development details, data sources, validation datasets, performance criteria ([33]). E.g., “Plan to test model on independent trial data.” |
| 5 | Execute Plan. | Perform the validation activities (e.g. testing, cross-validation, sensitivity analyses). Engage FDA to confirm approach as needed ([13]). |
| 6 | Document Credibility Results. | Prepare a report with methodology, test results, consistency checks, and any deviations. The report is meant to “provide information that establishes the credibility of the AI model for the COU” ([34]). |
| 7 | Conclude Adequacy. | Decide whether the model is fit for the COU; if evidence is insufficient, take steps (e.g. collect more data, reduce model influence) to meet requirements ([35]). |

Table 2: FDA’s 7-Step Credibility Assessment Framework for AI Models (from draft guidance). Steps involve defining the question and context, assessing risk, planning and executing validation, and concluding adequacy. (Source: FDA Goodwin summary ([11]) and DLA Piper analysis ([10]) ([32]).)

Below we walk through these steps, elaborating on their intent and components. References in brackets cite FDA’s guidance or expert analysis of the guidance.

Step 1: Define the Question of Interest

Each AI application must start with a clear problem statement. The guidance calls this the “question of interest” ([11]). This is not the same as the COU; it is the specific goal the model aims to achieve. For example, a question of interest might be: “Can this AI model predict which patients will experience life-threatening side effects from our drug candidate?” or “Can the model determine if a manufacturing batch meets quality specifications?”. Defining the question ensures the model development is purpose-driven, and it helps identify what data and metrics are relevant. The question might come from any point in the lifecycle: in a clinical trial, it could be “should we dose-adjust a patient based on their predicted risk?”; in postmarket, it could be “does the model detect previously unreported adverse events in EHR data?”.

Sponsors should document this question explicitly; it sets the stage for all later steps. A poorly defined question leads to a mismatch between the model output and the regulatory need. The guidance uses examples like patient selection criteria or risk stratification in trials ([36]), but any AI-driven decision point ought to begin with Step 1. As DLA Piper notes, in Step 1 “the proposal should define the specific question, decision, or concern being addressed by the AI model” ([31]).

Step 2: Define the Context of Use (COU)

After formulating the question, the next step is to specify the context of use (COU) of the AI model ([30]) ([31]). Context of use is a term the FDA has borrowed from medical device regulation (and from modeling standards). It means: What role will the model play, and under what conditions? The COU includes details such as:

  • Scope of the model: What exactly is being modeled (e.g. patient survival probability given biomarker data)?
  • Use in workflow: How will the output be used (as a primary decision-maker, a decision support, an alert, etc.)? Will human experts verify it, or act autonomously? (For example, is it fully automated or part of a human-in-the-loop process ([32]).)
  • Data inputs and boundaries: What inputs does the model use (clinical trial data, lab results, imaging, etc.) and from what population? Are there other information sources that will supplement the model’s output (e.g. will clinicians still require an MRI or lab test in addition to the AI output)?
  • Operating parameters: Will the model ask new queries over time, or is it a one-time analysis? Is it deployed in a closed setting (e.g. within a single trial), or broadly across different studies?

The COU essentially describes the environment and assumptions for the AI model. It is critical because an AI model’s credibility cannot be assessed without knowing how it is supposed to be applied. The draft guidance stresses that different COUs can lead to different validation needs: an AI in one trial may be high-risk, while the same model in a different, less-critical setting might be lower risk ([37]) ([31]). Defining COU also aligns with how FDA reviewers think: they evaluate a model’s output only within the stated scope. If sponsors later want to expand the COU (e.g. apply the model to a different patient population), they would likely need to repeat some steps.

In practice, sponsors should answer questions like: “What exactly is the AI predicting? How will this prediction be used by the trial or manufacturing process? In what patient population or manufacturing context? Is the model being used standalone or together with other data?” For instance, if an AI model predicts patient response, will doctors automatically adjust doses based on it, or will they see the AI’s suggestion and then decide? These details belong in Step 2. A well-defined COU sets the risk level and guides later steps.
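
As a practical aid, these COU details can be captured as a structured record that travels with the model documentation. The following is a minimal, hypothetical sketch in Python; the field names and example values are illustrative and are not prescribed by the guidance.

```python
from dataclasses import dataclass, field

@dataclass
class ContextOfUse:
    """Illustrative record of Step 1-2 outputs; field names are hypothetical."""
    question_of_interest: str        # Step 1: the decision the model addresses
    model_role: str                  # e.g., "decision support" vs. "autonomous"
    human_oversight: bool            # is a human reviewer in the loop?
    inputs: list[str]                # data the model consumes
    population: str                  # patient or manufacturing context
    other_evidence: list[str] = field(default_factory=list)  # corroborating sources

cou = ContextOfUse(
    question_of_interest="Which trial participants require inpatient monitoring?",
    model_role="decision support",
    human_oversight=True,
    inputs=["baseline labs", "genomic markers"],
    population="adults in a Phase 2 oncology trial",
    other_evidence=["investigator clinical assessment"],
)
```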

Step 3: Assess the AI Model Risk

With the scenario framed, Step 3 is to assess the risk posed by the AI model in that context. The FDA guidance recommends evaluating two factors: model influence and decision consequence ([32]).

  • Model Influence: How much does the AI model influence the decision? If the AI’s output is used as a final determinant, influence is high. If it’s one of several inputs, a secondary check, or is overridden by human check, influence is lower. For example, an algorithm that automatically determines patient dosage without review is high influence. An algorithm that suggests dosing but requires a doctor’s sign-off is moderate influence.

  • Decision Consequence: What is the potential harm (or benefit) if the model is wrong? This considers the severity of a wrong answer. A decision consequence is high if a mistake could lead to serious patient harm or product failure. For example, misclassifying a high-risk patient as low-risk (preventing necessary care) has high consequence. Misclassifying an expected mild side effect has lower consequence.

Combining these two dimensions yields overall model risk. DLA Piper explains: “AI models and use-cases in which AI is used to make a final determination … without human intervention will be considered as higher risk,” especially when safety is involved ([32]). Conversely, models that only support minor decisions with human oversight may be lower risk.

As a practical guide, sponsors can create a simple risk matrix or narrative. If either influence or consequence is high, risk is high. Lower the risk by reducing influence (e.g. adding human checks) or mitigating consequences (e.g. ensuring backup safety measures). The draft guidance suggests this assessment should determine how rigorous the subsequent credibility activities must be ([32]).
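
A minimal sketch of such a matrix is shown below, assuming a conservative rule in which overall risk takes the higher of the two factors; the guidance does not prescribe any particular scoring scheme.

```python
LEVELS = ["low", "medium", "high"]

def model_risk(influence: str, consequence: str) -> str:
    """Combine model influence and decision consequence into a qualitative risk tier.

    Hypothetical, conservative rule: overall risk equals the higher of the two factors.
    """
    return LEVELS[max(LEVELS.index(influence), LEVELS.index(consequence))]

# An AI that autonomously assigns inpatient vs. outpatient monitoring:
print(model_risk(influence="high", consequence="high"))      # -> "high"
# An AI that flags out-of-spec batches for human review:
print(model_risk(influence="medium", consequence="medium"))  # -> "medium"
```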

For illustration, consider two scenarios:

  • High Risk: An AI model that autonomously classifies whether a patient with a new therapy needs life-saving intensive care. Here model influence is effectively 100% (the AI makes the call), and consequence is high (a wrong call could be fatal). This demands thorough validation of the AI’s performance, as any error has grave implications.

  • Lower Risk: An AI tool that identifies potential outliers in manufacturing data, but only flags them for human review. Influence is moderate (a human ultimately decides action) and consequence is moderate. Such a scenario still requires validation, but the bar may be somewhat lower. FDA notes that requiring human review after the model output generally reduces risk ([32]).

In Step 3, sponsors should document their reasoning on model risk. This includes describing the worst-case impact and how much they are relying on the AI. Some experts suggest using frameworks like NASA’s model credibility assessment or cross-industry risk assessments to quantify this. The key is that risk informs the stringency of Steps 4–6: higher-risk models need more data, more testing, and possibly external studies, whereas low-risk models might need only basic internal validation.

Step 4: Develop a Credibility Assessment Plan

Step 4 is to plan the credibility activities that will establish the AI model’s trustworthiness for its context of use. Think of this as creating an audit plan for the model. The plan should list the specific tests, analyses, and documentation that will be done (or have been done) to verify the model.

Key elements of the plan include ([33]):

  • Model Description: Detail of the AI model’s structure (algorithm type, architecture, hyperparameters). For a trained ML model, include how it was built.
  • Data Sources: Comprehensive description of the data used for training, tuning, and testing. This includes data provenance, preprocessing steps, and any limitations (e.g. missing data, imbalances). For example, if training data came from a particular demographic or region, note that.
  • Performance Metrics: Identify the metrics and acceptance criteria to be evaluated. This could be accuracy, sensitivity, specificity, area under the curve (AUC), or other domain-specific measures (e.g. positive predictive value for a safety marker). The plan should state performance targets that are acceptable for the COU.
  • Validation Approach: Specify how the model will be tested. This might involve cross-validation, use of an independent test set, simulation studies, or prospective validation. It should also include robustness checks (e.g. testing on edge cases, stress testing with noisy or unexpected inputs).
  • Comparators: If applicable, describe any baseline or comparator models. FDA will want to know how the AI compares to existing methods or to standard-of-care.
  • Provisions for Updates: If the model will be updated (adaptive learning), include how future data will be incorporated and how changes will be tested. (See Step 7 discussion).
  • Data Integrity and Security: Outline how data was handled ethically and securely, and how the model’s code is controlled. (Although not emphasized in the draft, good practice is to maintain version control and audit trails for the model.)
  • Bias and Fairness Checks: Since bias is a known issue, the plan should consider how fairness will be assessed. For example, analyzing performance across subgroups. The guidance itself doesn’t enumerate this, but many commentaries advise it.

The plan is essentially a protocol for model validation. It should be tailored to the COU and commensurate with the model risk. For a high-risk model, Step 4’s plan would be extensive: numerous evaluation datasets, rigorous performance thresholds, and possibly third-party review. For a low-risk model, the plan might be simpler.
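
One way to make the plan concrete is to pre-specify acceptance criteria and validation activities in machine-readable form before any testing begins, so that later results are judged against fixed targets. The sketch below is purely illustrative; the metric names, thresholds, and dataset descriptions are hypothetical and would be chosen to fit the COU and risk level.

```python
# Hypothetical pre-specified acceptance criteria (fixed before Step 5 testing).
ACCEPTANCE_CRITERIA = {
    "sensitivity": 0.90,   # minimum acceptable on the independent test set
    "specificity": 0.85,
    "roc_auc": 0.90,
}

# Hypothetical outline of planned validation activities.
VALIDATION_PLAN = {
    "test_data": "independent dataset from Trial X, never used for training or tuning",
    "subgroup_checks": ["age >= 65", "sex", "baseline renal impairment"],
    "robustness_checks": ["missing-input handling", "out-of-distribution inputs"],
    "comparator": "current standard risk score",
}
```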

From a regulatory standpoint, the plan provides transparency: it shows FDA how the sponsor intends to assure quality of the AI. Importantly, the draft guidance indicates that sponsors should decide whether and when to submit the plan to FDA. If the model is critical and/or novel, sponsors might include the plan with an IND or BLA as part of their submission package or discuss it in a pre-sub meeting. If the model is lower risk, they may keep it internally but still share summaries if requested. The FDA states it will “set expectations regarding appropriate credibility assessment activities” during early engagement ([13]).

Step 5: Execute the Credibility Plan

Having a plan, Step 5 is execution. This means carrying out all the tests and evaluations as described. The deliverables of this step are the raw results: the model outputs, error rates, statistical analyses, etc. Essentially, this is where the sponsor gathers the evidence. Key activities might include:

  • Testing on Independent Data: Use held-out data or prospective data (if available) to test how the model performs outside its training set.
  • Sensitivity and Specificity Analysis: Evaluate how the model’s performance metrics behave under different scenarios.
  • Stress Testing: Challenge the model with edge cases, noisy inputs, missing values, or out-of-distribution samples to see how stable it is.
  • Comparative Benchmarks: If relevant, run the model alongside current best practices or other algorithms to show its added value.
  • Iterative Tuning: If initial tests reveal shortcomings, retrain or fine-tune the model and retest. Document those iterations.

Execution must be thorough enough to populate the credibility report. Sponsors should log everything meticulously, because later steps require justification of any decisions (see Step 6). The draft guidance emphasizes that FDA engagement is important during execution ([13]). This means sponsors may present interim results to FDA in meetings, get feedback on their plan, and adjust if necessary. For example, if FDA review staff thinks a planned validation dataset is too small, they might request a larger study.

From a project management view, Step 5 is often the longest and most resource-intensive phase, especially for complex models. It may involve statisticians, data scientists, and subject matter experts. It is critical that execution follows the plan; if it diverges (e.g. unforeseen issues force a changed test method), those changes must be documented as plan deviations in the next step.
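
To illustrate the held-out testing described above, the sketch below scores a binary classifier on an independent test set and checks the results against pre-specified acceptance criteria (such as those from the Step 4 example). It assumes scikit-learn and NumPy; the data and thresholds are illustrative only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5, criteria=None):
    """Score held-out predictions and check them against pre-specified criteria."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    results = {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }
    if criteria:
        results["meets_criteria"] = all(results[k] >= v for k, v in criteria.items())
    return results

# Illustrative hold-out labels and model probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.8, 0.6, 0.3, 0.1, 0.7, 0.4])
print(evaluate(y_true, y_prob,
               criteria={"sensitivity": 0.90, "specificity": 0.85, "roc_auc": 0.90}))
```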

Step 6: Document the Results

Once the plan is executed, Step 6 is to document all findings in a credibility assessment report. This report should compile the evidence, explaining how it was gathered and what it shows about the model’s reliability. Main contents of the report include ([38]):

  • Description of Activities: A record of what was done (e.g. “We tested the model on 200 independent subjects’ data from Trial X”).
  • Results Summary: Tables/figures of performance metrics, error analyses, confusion matrices, ROC curves, etc.
  • Bias and Subgroup Analysis: If performed, results of checks for consistent performance across demographic or disease subgroups.
  • Deviations from Plan: Any differences from the original plan (Step 4) should be explained here. For example, “due to insufficient data, the test dataset was smaller than planned.” Sponsors must justify any deviations and discuss how they affect confidence in the model.
  • Regulatory Considerations: If discussions were had with FDA, summarize any agreements about data sufficiency or additional tests.
  • Limitations: Note any known limitations of the validation. For instance, “the model has not been tested in patients over age 75.” Transparency about limitations helps FDA understand applicability.

The credibility report is meant to provide “information that establishes the credibility of the AI model for the COU” ([38]). It should be clear and comprehensive: an FDA reviewer unfamiliar with the model should be able to conclude how well it works. Crucially, this is not a marketing document: it may contain negative findings if tests failed, but any such issues must be addressed. For example, if a model’s accuracy dips to 80% vs the target 90%, the sponsor should note this and possibly explain how the model will still be managed (maybe requiring additional confirmatory tests in practice).

Sponsors should consider whether to submit the credibility report to FDA. The draft guidance suggests discussing this with FDA early on. High-risk models with borderline metrics might need FDA pre-approval of the plan and results, whereas low-risk models might keep the report for internal assurance and only provide a summary in the application. Either way, having a well-documented report is crucial; it can be requested during an inspection or review.

Step 7: Determine Adequacy of the Model for the COU

The final step is drawing conclusions on whether the model’s performance and evidence meet the needs of the COU. In other words, is the model credible enough to be used as intended? This determination should weigh the evidence in the credibility report against the risk profile and the original question.

Possible outcomes of Step 7 include:

  • Adequate Credibility: If the model met its performance objectives and no significant issues arose, the sponsor concludes the model is adequate. They may then proceed to rely on it in regulatory submissions or operational decisions, documenting this conclusion in their filings. They should also continue to monitor the model (passing data from actual use back into the model lifecycle plan).

  • Borderline/Insufficient: If the evidence is insufficient or reveals problems (e.g. high error in a subgroup or unexpected drift), the draft guidance suggests fallback options ([35]). The sponsor might: (1) gather more data or adjust the credibility plan; (2) restrict the model’s COU (e.g. narrow the patient population); (3) reduce the model’s influence on decisions (e.g. add a human gatekeeper); (4) continue to collect evidence post-deployment and monitor closely; or (5) abandon the model if it cannot be trusted at all. The draft guidance specifically provides five possible approaches for adjustment if credibility is not established ([35]). This flexibility is important: sponsors are not expected to pass every model the first time, but to iteratively improve it if needed.

Whatever conclusion is reached, it should be justified by data. Sponsors should frame Step 7 findings in the context of risk: a high-risk model with marginal performance may not be acceptable, whereas a moderate-risk model with similar performance might be. The decision also sets the stage for regulatory communication: if there are uncertainties, sponsors need to explain how they’ll manage them, either via mitigations (like extra patient monitoring) or follow-up studies.
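
A hypothetical sketch of this decision logic is shown below; the mapping from evidence and risk tier to action is illustrative and not prescribed by FDA.

```python
def adequacy_decision(meets_criteria: bool, risk: str) -> str:
    """Illustrative Step 7 logic: weigh evidence against pre-set criteria and model risk."""
    if meets_criteria:
        return "adequate for the COU; continue lifecycle monitoring"
    if risk == "high":
        return ("not adequate as-is; options include collecting more evidence, "
                "narrowing the COU, adding human oversight to reduce model influence, "
                "or abandoning the model")
    return "not yet adequate; strengthen the credibility plan or monitor post-deployment"

print(adequacy_decision(meets_criteria=False, risk="high"))
```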

Overall, the 7-step process is designed to be transparent and iterative. It parallels good practices in engineering (“verification and validation”) and in biostatistics. It encourages scientific rigor: define the hypothesis (question), plan the experiment (credibility plan), run it, and then see if the hypothesis holds. The novelty is applying this rigor to AI models in regulated contexts. Several industry analyses confirm that if sponsors follow these steps, the FDA expects a predictable review process ([11]) ([10]). It is worth noting that this approach is non-prescriptive in the sense that FDA does not mandate specific thresholds or tests; rather, it leaves flexibility so that plans can be tailored. For example, it does not say “learning rate must be <0.1” or “use exactly 10,000 data points,” since optimal methods vary by model and domain. Instead, it provides a structured framework so that sponsors and FDA can have a common language about AI credibility.

Contextual Considerations

Defining Credibility in AI Context

The concept of credibility is central to the guidance. FDA defines an AI model’s credibility as “trust in the performance of an AI model for a particular context of use” ([39]). This echoes ideas from computational science (like the NASA credibility framework) and medical device V&V. In essence, credibility means that the model does what it is expected to do, and does it consistently, for the specific question at hand.

Building credibility involves showing that the model is valid (it accurately solves the intended problem) and reliable (it does so repeatedly under the specified conditions) ([22]) ([34]). Sponsors can think of credibility as akin to analytical or clinical validation in other contexts. For example, an assay or lab test must be validated for accuracy and precision before being trusted; similarly, an AI model must be validated through evidence. FDA does not prescribe a specific “credibility score,” but expects that the truth of the model’s performance is demonstrated through data in the credibility report.

Several expert commentaries underline that credibility is not static. One article describes credibility as a continuum requiring continuous oversight with model updates and real-world monitoring ([22]). FDA’s inclusion of a step on documenting results (and deviations), together with its emphasis on the model lifecycle, suggests that credibility must be maintained, not just achieved once. For example, after the AI model is used in manufacturing or postmarket, any significant retraining or drift would trigger a re-assessment or updating of the credibility documentation.

Thus, expected practices to ensure credibility include: robust cross-validation with separate data; transparency in algorithm design (to the extent possible); testing under varied conditions; monitoring outputs for anomalies; and updating models with new data responsibly. While the draft guidance does not mandate interpretability or explainability methods, the spirit of credibility implies that sponsors should at least understand their model’s key drivers (e.g. using model explainability tools) to verify it aligns with scientific understanding.

AI Model Risk is Context-Specific

It bears repeating that risk is context-specific. The same AI algorithm might be used for very different ends. For example, a deep neural network segmenting medical images could be part of a device system (excluded from this guidance as it would be regulated as a device) or be used to quantify tumor changes for a drug trial primary endpoint (included here). Each use-case would have its own COU and risk profile.

Sponsors should categorize their AI use-cases along the lines of the earlier table: high-risk (e.g. life/safety decisions, final determinations) versus low-risk (e.g. supplementary analyses). DLA Piper notes that final decisions without human intervention are considered high risk ([32]). Another way to think of it: whenever patient welfare or product integrity could be compromised by model error, thorough scrutiny is needed. When an AI model is one part of a decision-making chain (with human oversight and fallback), the team can justify a lighter validation, but must still make sure the chain as a whole is safe.

Assessments of AI model risk should also consider the maturity of the technology. Cutting-edge novel models may inherently carry more unknowns and thus be treated as higher risk. Conversely, a well-established statistical method (even if it qualifies as AI) might be seen as lower risk. That said, even a transparent model (like logistic regression) used in a high-stakes context requires careful validation, while a complex neural network used in a low-stakes context could be overkill.

Relation to Other Regulations

This guidance does not exist in isolation. For AI-enabled medical devices, the FDA has separate guidance (finalizing a framework for AI/ML-driven software as a medical device) on how to handle software modifications and learning loops. The principles overlap (both emphasize risk, validation, and transparency) but the device guidance is tailored to software’s role in diagnosis/treatment. Drug sponsors should be aware of any overlaps: if an AI model for drug use also relates to a medical device (e.g. a companion diagnostic), the corresponding device guidance may apply.

Importantly, the FDA clarifies in the draft that it is issuing non-binding recommendations. This means that legally, companies are not compelled to follow every suggestion or method. However, deviating from the guidance without justification is risky: since it represents FDA’s thinking, ignoring it could trigger questions at review. If sponsors use alternative approaches (e.g. a different risk framework than “7-step”), they should clearly justify why their method is equally robust. The FDA’s stated intent is to harmonize expectations, not to constrain innovation. Indeed, the agency solicits comments on whether more detailed guidance is needed for post-market AI use, indicating the process may evolve.

Data, Model Validation, and Evidence Considerations

A credible AI model depends on data quality and rigorous validation. Sponsors should ensure that training and testing datasets are appropriate and representative of the COU. For example, if an AI predicts outcomes for a certain patient population, the training data should cover that population’s demographics and disease variations. If there are known subgroups (e.g. different age groups or genetic variants), the data should ideally include them, or the model’s limitations in each subgroup should be documented.

FDA expects that sponsors will apply systematic verification and validation (V&V) methods to AI models, akin to other regulated tools. Verification means checking the model was implemented correctly (code review, unit tests, etc.), and validation means checking it yields clinically or scientifically correct output. While the guidance does not explicitly use the term “validation,” Steps 4–6 are essentially a validation process: testing the model to confirm that it works as claimed.

Some specific evidence considerations:

  • Training/Tuning Data vs. Test Data: Sponsors should clearly separate datasets. Data used to train the model (including any used for hyperparameter tuning) must be distinct from data used for evaluation. Overfitting is a known risk: a model might perform perfectly on training data but fail on new data. Thus, performance metrics should come from an unbiased test set. If proprietary or limited data means test sets are small, sponsors should discuss this limitation.

  • Real-World Data (RWD): If the model uses RWD (electronic health records, insurance claims, etc.), the data’s provenance and quality (coding accuracy, completeness) should be assessed. The guidance notes AI can analyze large real-world datasets ([18]), but sponsors must be mindful of biases in such data (e.g. undercoding of certain populations) and adjust accordingly.

  • Model Updates and Continual Learning: If the AI model is planned to be updated during its use (such as retrained on new data), sponsors must describe how updates will be managed. Continuous learning can improve performance but also risk “drift” if not carefully monitored. While the guidance does not lay out specific update procedures, it implies that any significant model change should involve a repeat of the credibility assessment cycle (at least Steps 4–6 and possibly engagement with FDA).

  • Reproducibility and Audit Trails: In a regulated environment, sponsors should maintain audit trails for AI development. This includes version control of code, records of parameter settings, and logs of when models were trained or retrained. This information helps if questions arise during review or inspection. The guidance expects transparency: in Step 4, the model description should enable an auditor to understand what was done.

  • Statistical Calibration: Especially for predictive models, calibration (agreement between predicted probabilities and observed frequencies) is important. For example, if an AI model predicts a safety event with 10% probability, but the event actually occurs 30% of the time in test data, the model is miscalibrated. Calibration assessment is typically part of validation in clinical models; a simple check is sketched after this list. While not explicitly mentioned in the draft, sponsors should ensure their AI models are well-calibrated for decision use or adjust probability thresholds accordingly.

  • Generalizability: FDA recognizes that the real world can differ from controlled studies. Sponsors should think about how well the model will generalize to different settings. This might involve testing the model on external data (if available) or simulating broader conditions. For instance, if a model was trained on clinical trial data from one country, will it work equally well for another population? Any known limitations in generalizability should be disclosed.
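
To make the train/tune/test separation point above concrete, the following minimal sketch (a generic illustration using scikit-learn, not a procedure prescribed by the guidance; the synthetic data and model choice are placeholders) holds the test set out before any tuning and reports performance only on that held-out data:

```python
# Minimal sketch of a disciplined train/tune/test split (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# The test set is carved out first and never touched during tuning.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hyperparameter tuning uses cross-validation within the development set only.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, None]},
    scoring="roc_auc", cv=5)
search.fit(X_dev, y_dev)

# Reported performance comes solely from the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"Held-out test AUC: {test_auc:.3f}")
```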
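For the model-update and drift concern above, one common (though not FDA-mandated) monitoring statistic is the population stability index, which compares the distribution of model scores at validation time with the distribution seen after deployment. The sketch below is illustrative; the 0.2 rule-of-thumb threshold and the simulated score distributions are assumptions, not guidance criteria:

```python
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """Population Stability Index between a baseline score distribution
    and a current one; values above roughly 0.2 are often treated as a
    signal of meaningful drift (rule of thumb, not an FDA criterion)."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    base_frac = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Small floor avoids division by zero / log(0) in sparse bins.
    base_frac = np.clip(base_frac, 1e-6, None)
    curr_frac = np.clip(curr_frac, 1e-6, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))

rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 5, size=5000)      # scores at validation time
current_scores = rng.beta(2.6, 4.4, size=1000)   # scores after deployment
print(f"PSI: {population_stability_index(baseline_scores, current_scores):.3f}")
```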
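The calibration point above can be checked with a simple reliability analysis: bin predictions by predicted probability and compare against observed event rates, mirroring the 10%-predicted-versus-30%-observed example. The sketch below uses scikit-learn's calibration_curve on simulated data purely for illustration:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
# Hypothetical predicted probabilities for a safety event and observed outcomes.
y_prob = rng.uniform(0, 1, size=5000)
# Simulate a miscalibrated model: events occur more often than predicted.
y_true = (rng.uniform(0, 1, size=5000) < np.clip(y_prob * 1.5, 0, 1)).astype(int)

# Observed event rate versus mean predicted probability in each bin.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
for observed, predicted in zip(prob_true, prob_pred):
    gap = observed - predicted
    print(f"predicted {predicted:.2f} -> observed {observed:.2f} (gap {gap:+.2f})")
```

Large, systematic gaps would indicate the model needs recalibration or adjusted decision thresholds before its outputs are used for regulatory decision-making.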

In anticipation of these concerns, many commenters on AI in healthcare have stressed explainability and fairness. The draft guidance does not explicitly require explainable AI, but the notion of credibility implicitly demands that sponsors understand their models. Some sponsors may choose to apply explainable AI techniques (like feature importance or SHAP values) to illustrate that the model’s drivers are logical. In regulated contexts, being able to explain a model’s reasoning can increase trust, even if technically not mandated.
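
As one hedged illustration of this kind of analysis, permutation importance (a model-agnostic technique available in scikit-learn; the synthetic data and model below are placeholders) measures how much held-out performance degrades when each feature is shuffled, giving a rough picture of which inputs drive the model's predictions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance on held-out data: how much does shuffling each
# feature degrade performance?
result = permutation_importance(model, X_test, y_test, n_repeats=20,
                                random_state=0, scoring="roc_auc")
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```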

Existing regulatory requirements also intersect with AI development: whenever AI models use patient data, FDA’s Good Clinical Practice (GCP) and data privacy regulations still apply. For example, data used for AI training might need to be de-identified or meet HIPAA standards. These requirements are outside the draft’s scope, but they should not be neglected when building an AI system.

Finally, sponsors should be aware that AI model failure modes can be subtle. Issues like data leakage (where training inadvertently includes future information), label errors (incorrect classification in training data), or subtle shifts in data distribution (e.g. a new diagnostic test with different calibration) can undermine performance. The credibility plan and report should demonstrate that such pitfalls were considered and mitigated.

Case Studies and Example Scenarios

To illustrate how the draft guidance applies, we consider several example use-cases drawing on published analyses and hypothetical scenarios. These examples span different stages of development and risk levels.

  1. High-Risk Clinical Decision AI (Hypothetical): A biotech develops an AI algorithm to classify patients with a novel oncology drug into “high-risk” or “low-risk” categories based on genetic markers and early lab results. This classification directly determines the treatment arm: high-risk patients get an intensive combination therapy, low-risk patients get standard therapy. This AI model is in scope because it impacts efficacy and safety decisions. (Step 1: QI = which arm to assign; Step 2: COU = stratify patients for risk-adapted therapy.) Because the AI’s decision could be life-altering, both influence and consequence are high (the AI has final say without human override). Model risk is therefore high. Under the guidance, the sponsor would need an extremely robust credibility plan: large, well-annotated training data from previous trials, multiple independent validation sets, and rigorous bias analysis. They might use cross-validation on past patient datasets and prospective validation in a pilot trial. Any failure (e.g. if the AI misclassifies a patient who then relapses) would be unacceptable. The sponsor would likely engage FDA early (perhaps at pre-IND) to agree on the plan ([13]). After execution, if the model beats pre-set benchmarks (say >95% accuracy on held-out test patients and balanced performance across subgroups), Step 6 documentation would detail these results. Step 7 would likely conclude adequacy, given the severity of risk was mitigated by strong performance. In practice, the sponsor may incorporate additional safeguards: for example, “all predictions must also be confirmed by a second method.” The developer might risk-reduce by not letting the AI fully automate the decision (thus reducing influence). Throughout, detailed documentation and early regulatory communication are key.
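
A sketch of the kind of subgroup performance check this scenario implies is shown below; the labels, subgroup names, and the 95% acceptance threshold are hypothetical, echoing the benchmark described above rather than any FDA-specified criterion:

```python
import numpy as np

def subgroup_performance(y_true, y_pred, subgroups, min_accuracy=0.95):
    """Per-subgroup accuracy against a pre-specified acceptance threshold
    (the 95% figure mirrors the hypothetical benchmark in this scenario)."""
    y_true, y_pred, subgroups = map(np.asarray, (y_true, y_pred, subgroups))
    results = {}
    for group in np.unique(subgroups):
        mask = subgroups == group
        acc = float((y_true[mask] == y_pred[mask]).mean())
        results[group] = {"n": int(mask.sum()), "accuracy": round(acc, 3),
                          "meets_threshold": acc >= min_accuracy}
    return results

# Hypothetical held-out predictions from the risk-stratification model,
# split by a (made-up) mutation-status subgroup.
y_true    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
y_pred    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
subgroups = ["mut+"] * 6 + ["mut-"] * 6
print(subgroup_performance(y_true, y_pred, subgroups))
```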

  2. Manufacturing Quality Control AI (Industry Example): A pharmaceutical CMC team implements an AI model to predict tablet dissolution failures before release, based on real-time sensor data during production. The output flags batches likely to be out-of-spec. Human operators then run confirmatory laboratory tests before discarding or reworking batches. Here the AI is in-scope (it affects product quality decisions), but risk is moderate: human review and retesting mitigate consequences of a false alert. In Step 3 terms, model influence is moderate (AI informs but does not decide alone) and decision consequence is moderate (a false negative could release a bad batch; a false positive wastes a batch).

The risk-based plan in Step 4 might thus be less onerous than for a clinical AI: fewer performance thresholds or smaller test sets might be acceptable. The sponsor could outline testing the model on historical batch data (Step 4), then applying it prospectively on a subset of production (Step 5). For example, in a retrospective test with thousands of batches, the model correctly predicted 92% of true failures and raised only 5% false alarms (accuracy metrics above pre-specified minima). This would be documented ([34]). In Step 6, the validation report would include a confusion matrix and perhaps analyses of feature importance. If results were satisfactory, Step 7 would likely find the model adequate for use with human oversight, and the sponsor could implement continuous monitoring (e.g. after each production lot, compare predicted vs. actual dissolution results to watch for drift).
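
The following sketch illustrates how such acceptance metrics might be computed from retrospective batch data; the counts are contrived to reproduce the hypothetical 92% sensitivity and roughly 5% false-alarm figures above, and the acceptance thresholds are illustrative rather than FDA-prescribed:

```python
from sklearn.metrics import confusion_matrix

def batch_screening_metrics(y_true, y_pred, min_sensitivity=0.90,
                            max_false_alarm=0.10):
    """Sensitivity (true failures caught) and false-alarm rate (good batches
    flagged), checked against pre-specified acceptance criteria.
    Thresholds here are illustrative, not FDA-prescribed."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)
    false_alarm = fp / (fp + tn)
    return {
        "sensitivity": round(sensitivity, 3),
        "false_alarm_rate": round(false_alarm, 3),
        "acceptable": sensitivity >= min_sensitivity
                      and false_alarm <= max_false_alarm,
    }

# 1 = batch fails dissolution, 0 = acceptable batch (hypothetical history).
y_true = [1] * 50 + [0] * 950
# Hypothetical model flags: 46 of 50 failures caught, 48 false alarms.
y_pred = [1] * 46 + [0] * 4 + [1] * 48 + [0] * 902
print(batch_screening_metrics(y_true, y_pred))
```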

  3. Post-market Pharmacovigilance AI (Emerging Trend): Some pharmaceutical companies are exploring AI to automatically analyze large-scale safety data (e.g. electronic health records, social media reports) to detect adverse event patterns quicker than traditional methods. Suppose a company has an AI system that scans EHR databases and flags statistically significant spikes in certain side effects. This system could be used to generate alerts for new safety signals. Since patient safety is at stake, such an AI would be considered in-scope.

For example, if the AI catches a pattern suggesting a rare cardiac issue, regulators would want strong evidence the pattern is real (not an artifact of data). In Step 3, model influence is medium (the AI alerts but human investigators verify), and consequence is high (missed safety signals can harm patients). So overall risk is medium-high. The credibility plan might include back-testing the AI on known historical safety events (Step 4), showing it would have caught previous signals in retrospective data. It would also include a plan to continue to validate with real-time data (Step 5). Step 6 documentation could present metrics like sensitivity in detecting known signals. Step 7 would involve agreeing on how to act on AI alerts (will the company submit them as signals to FDA, or is this for internal monitoring?).

This scenario highlights an important nuance: AI in postmarketing is a developing area. The FDA guidance explicitly asks for comment on whether more detail is needed for AI in pharmacovigilance ([40]). That suggests regulators see this as an evolving field. For now, sponsors should treat it as in-scope but can expect FDA comments. Real-world examples are limited, but one can imagine regulators expecting high transparency: e.g., how the AI handles reporting biases, how cases are adjudicated after flagging, etc.
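
A minimal sketch of the back-testing idea in this scenario appears below; the drug-event pairs and counts are hypothetical, and a real signal-detection system would layer statistical disproportionality methods on top of this simple recall check:

```python
def backtest_signal_detection(known_signals, flagged, total_monitored_pairs):
    """Back-test sensitivity: fraction of known historical drug-event signals
    the model flags, plus the overall flag rate as a rough workload proxy.
    Signal names below are hypothetical."""
    known = set(known_signals)
    flagged = set(flagged)
    detected = known & flagged
    return {
        "sensitivity": round(len(detected) / len(known), 3),
        "missed_signals": sorted(known - flagged),
        "flag_rate": round(len(flagged) / total_monitored_pairs, 4),
    }

known_signals = {("drug_A", "QT prolongation"), ("drug_B", "hepatotoxicity"),
                 ("drug_C", "anaphylaxis")}
flagged = {("drug_A", "QT prolongation"), ("drug_C", "anaphylaxis"),
           ("drug_D", "rash")}
print(backtest_signal_detection(known_signals, flagged,
                                total_monitored_pairs=12000))
```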

  4. Medical Writing Support AI (Excluded Use): A company uses a generative language model to draft sections of their Clinical Study Report and Investigator’s Brochure. Scientists provide bullet-point data, and the AI produces narrative text. Humans then edit the text. According to the guidance, this use is out-of-scope ([4]) ([3]) because it’s about internal workflow/streamlining rather than generating new safety/efficacy data. The company does not need to justify the AI’s performance to FDA for this task. However, the company should still ensure the final documents are accurate (regulations require truthful submissions regardless of how drafted).

This example aligns with the perspective in the medical writing community. Pravin Lakkaraju (Merative) wrote that the FDA explicitly excludes “internal workflows and operational efficiencies” ([27]). He noted that automating CSR drafting, improving text quality control, and harmonizing content are all internal uses. So companies can freely use an LLM for these tasks. The only caveat is that any data inserted by the AI (e.g. if the model extracted numbers from uploaded tables) still needs to be verified by humans. The guidance does not regulate the AI tool, but the usual rules on accuracy and data integrity still apply. Thus, no formal “credibility assessment plan” is needed for a writing AI, though best practice would be to track edits and ensure QA.

  5. Drug Discovery AI (Excluded vs. Included Nuance): Consider an AI model that proposes novel molecular structures with predicted drug-like properties. This model is used before any preclinical studies. By the guidance’s language, this is explicitly excluded ([4]). The rationale is that at the discovery stage, outputs are not directly evidence for safety or efficacy; one still does lab tests afterwards.

However, if an AI in discovery also provides output that later feeds into a regulated decision, nuances arise. For example, if the AI also predicts a toxic metabolite that will influence later safety testing, one could argue that this component of the AI’s output has a regulatory impact. The FDA’s wording implies that basic discovery is out-of-scope, but sponsors should be cautious: any AI output that eventually affects the safety profile determination (like predicting toxicology study results) should be included under the 7-step process for that context. In practice, most discovery-stage AI falls outside the scope, but once the project crosses into preclinical candidate selection, any AI-derived evidence should be handled per the guidance.

As an illustrative nuance, the Foley/NatLawReview commentary noted that “the guidance emphasizes that it is limited to AI models (including for drug discovery) that impact patient safety, drug quality, or reliability of results” ([28]). This suggests that some aspects of discovery-driven AI could be in scope if they directly address safety/quality questions. Still, the safe interpretation is: AI suggestions for target chemistry (pure research use) are outside; model analyses of preclinical data (regulatory use) are inside.

These examples show how to apply the guidance’s criteria. In summary: if the AI’s output goes into a decision about a drug’s properties relevant to FDA review, it is in scope and must follow the 7-step framework. If it is purely an internal productivity tool or for non-regulated discovery activities, it is out of scope. This clear delineation lets sponsors classify their AI tools early in development.

Perspectives and Implications

The draft guidance has stirred discussion among various stakeholders. Here we summarize the perspectives of industry sponsors, legal advisors, and regulators, and consider the broader implications.

Industry Sponsors and Developers

For pharmaceutical companies and biotech firms, the guidance provides much-needed clarity. Previously, sponsors had to guess what documentation FDA might require when using AI. Now they have a formal roadmap. Many life-sciences companies are excited about AI’s potential but were held back by regulatory uncertainty. An industry executive noted that regulatory clarity was among the “top three barriers” to implementing AI in drug development ([16]). By codifying expectations, the FDA lowers that barrier.

Companies will need to adjust their development practices. AI model development must now include QA-like processes akin to what is done for clinical assays. Biostatistics and engineering teams will likely take on more prominent roles in proof-of-concept studies. Development plans should incorporate FDA’s 7 steps: e.g., IND planning should specify if an AI model will be used and include a credibility plan outline. For companies already using AI, retrospective documentation of credibility activities may need to be compiled. Smaller companies and startups might face challenges due to limited resources for such thorough validation, making it harder to include AI early. However, many sponsors may view this as worthwhile to gain regulatory confidence.

On the technical side, some sponsors fear that the guidance’s non-prescriptive nature could be interpreted inconsistently. For example, what constitutes “adequate” validation for a given model? The FDA says plans must be “commensurate with risk,” but does not give numeric criteria. As a result, companies may run mock audits internally or seek pre-IND meetings to gauge FDA staff expectations. Over time, as case precedents (or further FDA clarification) emerge, standards will likely crystallize. Some sponsors may greatly exceed minimal requirements to avoid any question, adopting best practices like third-party audits of model development, extensive bias assessments, or even industry consensus standards for AI (if those develop).

Of note, the guidance’s explicit carve-out for administrative AI is widely seen as positive by sponsors. Many companies have already integrated generative AI for report drafting, data visualization, and workflow optimization. Having official recognition that these uses are outside FDA’s purview relieves anxiety. For instance, R&D teams can continue to use large language models for literature summarization or first-draft writing without fearing regulatory penalties. Nevertheless, sponsors must still ensure final submissions are accurate – but legally they are responsible for content regardless of how it was drafted. Some company policies may evolve to distinguish “decision-impacting AI” from “productivity AI” and to govern each category differently.

Startups and technology vendors that develop AI tools for pharma have new obligations under this guidance. They will likely need to provide documentation and validation support for their products if those products are used in regulated settings. For example, an AI company selling a patient stratification tool may need to share training methodology and validation data with the sponsor. This raises questions about proprietary algorithms and data privacy. Companies will need to balance transparency with IP protection. It is probable that service agreements will include clauses for “supporting regulatory review” of AI performance.

Regulatory and Policy Perspective

From FDA’s viewpoint, this guidance embodies the agency’s intention to be agile and risk-based. Commissioner Califf and others have emphasized that they want to “promote innovation and ensure the agency’s robust scientific standards are met” ([21]). The 7-step framework reflects a collaboration between multiple FDA offices, as noted in the Federal Register preamble ([41]). By engaging stakeholders early (even while the draft was being written, presumably), FDA aimed to produce guidance that is implementable rather than idealistic.

One notable aspect is the invitation for public comment and the openness to feedback on specific issues (e.g., alignment with industry experience, post-market AI use) ([40]). This suggests the guidance may evolve. FDA explicitly asked if the 7-step approach aligns with sponsors’ real-world experience, and whether further guidance is needed for life-cycle maintenance or pharmacovigilance. The comment period (through April 2025) could result in changes. For instance, sponsors may request numeric thresholds or examples of adequate evidence, while patient advocacy groups may push for stronger emphasis on fairness and transparency.

On the policy front, the broader federal context shifted almost immediately after publication. The Goodwin alert notes that President Donald Trump, upon taking office in January 2025, rescinded President Biden’s 2023 Executive Order on AI ([42]). The draft guidance itself was issued in the final weeks of the Biden Administration, consistent with that administration’s prior commitments to AI oversight in healthcare (for example, the 2022 Blueprint for an AI Bill of Rights and the 2023 Executive Order on Safe, Secure, and Trustworthy AI). Future policy could impact this guidance: if federal AI policy changes further, FDA may revise its approach. But as of April 2026, this draft remains the primary document for drug- and biologic-related AI.

Internationally, the FDA guidance may influence other regulators. Firms often approach global submissions with an eye toward regulatory harmonization. While the FDA and EMA documents are not identical (EMA took a broader principles approach in its reflection paper ([15])), both emphasize risk and evaluation. Sponsors developing global strategies should ensure their credibility frameworks can satisfy multiple agencies. For example, if testing in Europe as well, companies may cite FDA’s guidance for their US discussions and EMA’s paper for EU bodies.

Expert and Industry Commentary

Early analyses of the FDA draft from law firms and consultancies have generally noted both its innovativeness and challenges. The American Bar Association’s Health Law section observed the “potential seven-step, risk-based framework” and noted the carve-outs for discovery and efficiencies ([43]). They viewed the draft as a cautious step to balance innovation and protection. The Foley & Lardner commentary highlighted that sponsors may have to prepare detailed AI documentation (data and governance) in submissions ([44]).

A DLA Piper white paper spelled out key takeaways and explicitly quoted FDA’s carve-outs ([3]). It also enumerated some practical strategies: for instance, it notes that sponsors should clarify during early FDA consultations whether and how the Step 6 credibility report should be submitted.

Regulatory affairs experts generally agree that the guidance is a “good start” but is still just a draft. Some commentaries (e.g. from Bioprocess Online ([22])) stress that AI model maintenance and updating deserve more emphasis. Others suggest that terms like “internal workflows” should be defined more clearly, so companies do not misinterpret borderline cases. The draft’s focus on context-specific risk is welcomed as science-based, but some have called for illustrative examples or case studies to be included for clarity.

Impact on Patients and Public Health

Ultimately, the goal of regulating AI in drug development is to protect patients. If AI tools churn out flawed analyses, patients could be harmed by unsafe drugs or denied beneficial therapies. The credibility framework aims to minimize this risk by raising the evidentiary bar. By requiring sponsors to thoroughly vet AI models, the FDA is reinforcing the principle that AI does not bypass safety and efficacy standards.

On the flip side, if properly implemented, this guidance could accelerate safe innovation. Patients could benefit sooner from new treatments because AI streamlines research (as long as there is confidence in the results). For example, AI could help identify rare responders to a therapy, enabling smaller, faster trials. The draft guidance encourages such uses by providing a clear pathway: sponsors know that if they follow the 7 steps, the regulatory review of their AI-augmented data should proceed smoothly.

However, there is a potential risk that the added burden might slow down some AI applications in the short term. Especially for smaller companies, conducting all the validation activities might delay product development. Over time, though, as best practices emerge and AI becomes more standardized, these hurdles may lessen. The FDA’s intention seems to be to codify requirements rather than stifle development.

An important patient-centric implication is that AI use in submissions will be documented and scrutinized. In cases where an AI model plays a major role, patients (and healthcare providers) can be assured that there is disclosure and analysis behind it. For example, if a drug labelling says efficacy was determined using an AI model “validated per FDA guidance,” this transparency is good for trust. Patient advocacy groups may push for consumer-friendly explanations of how AI was used in regulatory decisions. The framework sets the stage for this kind of accountability.

Future Directions

Several avenues could shape the future of AI regulation in drug development:

  • Finalization of Guidance: The FDA will likely issue a final guidance after reviewing comments from industry and other stakeholders. Comments may refine the document’s language, perhaps adding clarifications, examples, or even tightening definitions (e.g., explicitly defining “operational efficiency”). The final guidance might also address some of the open questions – for instance, how to handle AI in post-market safety monitoring, or whether to expect periodic updates to the credibility plan.

  • Emerging AI Technologies: Since January 2025, new AI paradigms (like advanced large language models, multimodal systems, etc.) have emerged. Generative AI (ChatGPT, DALL-E, etc.) is becoming more capable. The guidance excludes content creation, but sponsors might wonder about edge cases (e.g. if a generative model designs a new molecule). Regulatory policy will likely evolve to address novel technologies. For example, if an LLM were used as a decision support in evaluating preclinical data, would that be in scope? Over time, FDA guidance or Q&A may interpret the draft’s principles in light of new tech.

  • Standards and Best Practices: Expect the development of industry standards for AI validation. Groups like the International Council for Harmonisation (ICH) or ISO may work on guidance for AI models. If uniform standards emerge (e.g. a standard set of bias metrics or validation criteria for clinical AI), FDA and other regulators might adopt them. Being early with a credible framework could position FDA to influence such standards globally.

  • Integration with Digital Health Initiatives: The line between drugs and digital therapeutics may blur. If a drug’s effectiveness is partly mediated by an AI app (e.g. an AI that personalizes a diet or exercise regimen complementary to a drug), regulators will need to consider AI software in the regulatory mix. The current guidance might not fully cover combined drug-digital products, so that could be an area for future rules.

  • Post-market Surveillance of AI: As AI tools begin to be used, regulators will want to monitor their real-world performance. The guidance hints at life-cycle maintenance, but doesn’t detail post-market AI oversight. In the future, FDA may request periodic updates on deployed models, similar to post-market reporting for drugs (like annual reports), especially if the AI is adaptive. There is also the question of how FDA might audit AI systems (e.g. conducting software inspections or requiring submission of logs).

  • Global Regulatory Harmonization: The FDA guidance may encourage harmonization with other agencies. Future ICH guidelines on AI in drug development could emerge, given the global nature of pharma. Sponsors engaging with FDA’s 7-step could also cite analogous OECD or WHO AI principles for cohesion. Harmonization would benefit multinational trials and global submissions.

  • Innovation Incentives: Recognizing AI’s potential value, regulators and policymakers might introduce incentives for responsible AI. For example, there could be programs that provide “AI validation grants” for smaller companies, or expedited review pathways for therapies using rigorously validated AI. Alternatively, if AI proves essential to modern drug research, FDA might integrate AI assessment into existing expedited programs (like Breakthrough Therapy designation) to streamline AI oversight.

In sum, the January 2025 draft guidance is a landmark but not an endpoint. Its publication signals an ongoing evolution. The FDA and industry will continue to iterate on how best to benefit from AI while guarding patients. As one commentator noted, this guidance incorporates “comprehensive recommendations for the design, development, documentation, and maintenance of these AI models” ([40]), foreshadowing that the principles laid out today will form the backbone of AI regulation in life sciences for years to come.

Conclusion

The FDA’s January 2025 draft guidance on AI in drug development marks a pivotal moment in regulatory science. It is the first formal blueprint for how the FDA expects sponsors to handle AI tools whose outputs inform safety, efficacy, or quality decisions. By introducing a risk-based, 7-step credibility framework, the guidance provides a clear roadmap: define the question and context, assess risk, plan and conduct validation, and document outcomes. Importantly, the guidance also explicitly limits its scope to ensure that only AI with potential patient/quality impact is covered, while exempting purely internal or discovery uses ([4]) ([3]).

Throughout this report we have examined the details of the guidance and surrounding analysis. We have highlighted how it builds on regulatory precedents, how companies can implement its requirements, and what it means for future innovation. We showed that sponsors must distinguish between AI for evidence generation (in scope) and AI for operational efficiency (out of scope), and we have described the evidence and documentation needed to satisfy FDA that an AI model is “credible” for its purpose. With extensive citations, we underscored the significance of each element of the guidance.

Case examples illustrate that the guidance’s application can be intuitive: high-risk AI in trials needs robust validation, while advisory AI in manufacturing, operating under human oversight, can be supported with proportionately lighter evidence. We also noted areas needing attention, such as AI in pharmacovigilance, where FDA is actively seeking input.

Looking forward, this guidance is likely to shape industry practices for the foreseeable future. It invites sponsors to proactively integrate AI evaluation into their development pipelines, offers regulators a consistent way to review AI-backed data, and aims to reassure all stakeholders that innovation does not come at the cost of safety or quality. As AI technology continues to advance, both FDA and the biopharma industry will watch closely how this framework is received, refined through public comment, and implemented in real-world submissions.

In conclusion, the FDA’s draft guidance is a comprehensive attempt to bridge the gap between cutting-edge AI possibilities and well-established regulatory standards. By codifying the “6 Cs” of credibility – Context, Question, Risk, Credibility Plan, Results, Continuity (and step 7 as Conclusion) – FDA has set a high bar, but also a clear path. Sponsors who align with this framework will help ensure that AI-driven drug development remains safe, effective, and scientifically sound.
