ADMET Prediction Software: Features, Accuracy & FDA NAMs

Executive Summary
Drug candidates fail astonishingly often in clinical trials due to poor pharmacokinetics or toxicology (ADMET) profiles ([1]) ([2]). In silico ADMET prediction tools seek to identify liabilities early in discovery, reducing attrition, cost, and animal testing. This report critically compares leading commercial and emerging ADMET prediction platforms as of 2025–2026: Simulations Plus ADMET Predictor (a comprehensive ML-driven suite), Optibrium StarDrop (an integrated design platform with ADMET/QSAR modules), Schrödinger’s ADMET tools (notably QikProp and its predictive-toxicology suite), and novel AI-driven toxicology platforms (e.g. ADMET-AI, Toxometris). We evaluate their scope and features (via a detailed feature matrix), predictive accuracy (benchmarks and case studies), and compliance with FDA’s New Approach Methodologies (NAMs) for reducing animal testing. Our analysis is data-driven, citing published benchmarks, regulatory guidance, and expert assessments. Major findings include:
-
Coverage & technologies: All tools predict basic physicochemical ADME properties (solubility, logP/D, TPSA), while the commercial packages extend into diverse pharmacokinetics (human absorption, protein binding, metabolic clearance) and selected toxicity endpoints. Simulations Plus’s ADMET Predictor offers one of the broadest endpoint portfolios and is explicitly ML-based ([3]); StarDrop’s ADME QSAR module covers many drug-like properties and CYP interactions; Schrödinger’s QikProp provides rapid physics-based predictions of ~60 descriptors (logP, logS, permeability, etc.). Emerging AI tools (ADMET-AI, Toxometris) use modern graph neural nets and large datasets to cover dozens of endpoints including regulatory toxicity assays ([4]) ([5]).
-
Accuracy: Benchmarks show generally moderate accuracy (typical R² ∼0.6–0.8 for physicochemical properties; ~0.5–0.7 for complex ADME endpoints) across tools ([6]) ([7]). In direct comparisons, Simulations Plus ADMET Predictor scored very high in CYP-inhibition tasks (e.g. sensitivity ≈0.86 for CYP2C9) in a 2023 study ([8]) ([9]), matching or exceeding competitors. StarDrop’s models, particularly for CYP inhibition, were less accurate (training sets <300 compounds) ([10]). An enterprise case study (Johnson & Johnson) found an in-house predictor outperformed StarDrop, ADMET Predictor and QikProp on solubility (gains of +18–28% accuracy) ([11]). AI-driven platforms (ADMET-AI) have shown state-of-the-art performance on aggregated benchmarks (top of the Therapeutics Data Commons ADMET leaderboard) ([4]) ([7]). No single tool is uniformly best; each has strengths in different endpoints.
-
Feature Matrix: We compile a comparative table listing endpoints, algorithms, data sources, and integration features (see Table 1). Simulations Plus and StarDrop are commercial desktop suites (Windows/macOS) requiring licenses; Schrödinger is also commercial (with academic options). In contrast, ADMET-AI and other AI tools offer web services. All tools support batch processing and modern ML methods (neural networks, random forests, etc.), with some (e.g. Schrödinger AutoQSAR, StarDrop Auto-Modeller) allowing users to train custom QSAR models ([12]).
-
FDA NAM Fit: The FDA’s NAM initiative explicitly encourages in silico methods. Recent FDA communications emphasize “AI-powered models” as NAMs to reduce animal tests ([13]) ([14]). Several tools now offer OECD-compliant, QMRF-documented models. For example, industry platforms advertise “regulatory-grade” ADMET predictions with ready QMRF documentation ([5]). These in-silico predictors align well with FDA’s roadmap to supplant animal testing with validated NAMs ([13]) ([14]), though formal qualification of specific models remains a work in progress.
-
Case Studies: We discuss real-world applications. One analysis showed an internal (ML-based) ADMET model at a pharma outperformed off-the-shelf tools ([15]) ([11]). We also describe academic/end-user experiences with AI platforms (e.g. ADMET-AI screening large libraries ([7])) and startups like Toxometris (2nd place in NIH ToxChallenge) successfully predicting mutagenicity and other endpoints.
-
Future Implications: The field is rapidly advancing. Integrating multimodal data (transcriptomics, organ-on-chip, real patient data) and generative AI will expand the predictive scope ([16]). Interpretable models and standardized benchmarks will be crucial. We also note regulatory trends, such as ICH M7 acceptance of (Q)SAR for mutagenicity and emerging prequalification programs for in silico NAMs.
In summary, no single “best” ADMET predictor exists; each platform has trade-offs. Our exhaustive comparison provides guidance for drug discovery teams selecting and applying ADMET software under evolving scientific and regulatory landscapes, backed by quantitative evidence and expert citations.
Introduction and Background
The attrition rate of drug candidates remains alarmingly high, with ADMET (absorption, distribution, metabolism, excretion, toxicity) failures accounting for a large fraction of late-stage terminations. Waring et al. found that even among top pharma companies, control of physicochemical properties correlates with clinical success, but further improvements are needed to overcome safety failures ([1]). In fact, over 90% of compounds safe in animal studies fail in humans ([13]). Thus, there is critical incentive to predict ADMET liabilities early. According to industry data, roughly 30–40% of preclinical failures are due to toxicology issues ([2]) ([17]), underscoring ADMET prediction as a key enabler of efficient R&D.
Historically, in vitro assays and animal tests were the gold standard for pharmacokinetics and toxicology.However, such tests are expensive, time-consuming and ethically questionable (3Rs principle of Replacement, Reduction, Refinement) ([18]). Meanwhile, the advent of computational chemistry and machine learning has spurred development of in silico models. As Zhang et al. note, animal tests take months and cost millions, whereas QSAR/machine-learning models can rapidly screen vast numbers of analogues ([18]). Early QSAR (quantitative structure–activity relationship) efforts date back decades, with Lipinski’s Rule of 5 (1997) being a landmark rule-based filter for drug-likeness. Later, integrated software suites emerged (e.g. Derek, OECD Toolbox, ADMETPredictor) providing dozens of endpoint predictions. In recent years, these tools have increasingly incorporated large DNNs and graph networks ([19]) ([16]), reflecting both the data deluge and the FDA-driven trend toward non-animal NAMs ([13]).
Computational ADMET tools generally fall into two camps: (1) physicochemical/molecular modeling (force fields, continuum models, quantum chemistry, etc.), exemplified by packages like QikProp (Schrödinger) and chemistry toolkits (ChemAxon, MOE, etc.); and (2) statistical/ML models (random forests, neural nets, etc.) built from in vitro data. In practice, modern platforms blend these: e.g. Schrodinger’s QikProp uses semi-empirical formulae, while its AutoQSAR module trains graph neural networks when data permit ([12]). Similarly, Simulations Plus ADMET Predictor employs proprietary ML algorithms trained on curated datasets of hundreds of thousands of compounds. Emerging contenders like ADMET-AI (Swanson et al., 2024) and Toxometris are purely ML/Web based, leveraging recent deep-learning advances ([4]) ([7]).
Throughout this report, we use peer-reviewed studies and authoritative sources to evaluate these tools. Landmark reviews and benchmarks (Zhang et al., 2025 ([20]) ([21]); Waring et al., 2015 ([1]); Gadaleta et al., 2024 ([6]) ([7])) highlight general performance trends. Technical documentation and independent evaluations of each software’s models are combined with user case studies (see Section 5). We also consider the regulatory context: FDA’s NAM initiative (2025–26) explicitly endorses AI/ML methods as alternatives to animal tests ([13]) ([14]). Wherever possible, numerical metrics (R², balanced accuracy, etc.), dataset descriptions, and algorithm types are provided to ensure rigor.
In the following sections, we first introduce each major software in turn (focusing on Simulations Plus ADMET Predictor, Optibrium StarDrop, Schrödinger’s platforms, and cutting-edge AI tools), then present a comparative feature matrix. This is followed by an in-depth discussion of predictive accuracy (citing benchmark studies and exploring limitations), case study vignettes of real-world usage, and an analysis of how these tools align with FDA’s NAM framework. The report concludes with future outlooks, including AI trends and regulatory developments.
ADMET Prediction Tools: Overview
Simulations Plus ADMET Predictor
Product and Technology. ADMET Predictor® is Simulations Plus’s flagship software for predicting a wide range of ADME and toxicity endpoints. It is a commercial, desktop (Windows/macOS) application often deployed alongside their PBPK tool GastroPlus. As of 2025, the latest version is ADMET Predictor 12 ([3]). Simulations Plus advertises ADMET Predictor as a “Flagship AI/ML platform for ADMET property prediction, model building, virtual screening, AI-driven drug design & data analytics” ([3]). This reflects that the tool uses machine-learning models (neural networks, random forests, etc.) trained on proprietary datasets (e.g. tens of thousands of measured values per endpoint). It can also build custom models from user-supplied data.
Endpoints. ADMET Predictor covers an unusually broad suite of endpoints – comparable in scope to ~100 different assays ([22]). In the Physchem category it predicts logP/logD, aqueous solubility, pKa, permeability (Caco-2), and tissue partition coefficients. In distribution it yields blood-brain barrier penetration and plasma protein binding. For metabolism, it predicts intrinsic clearance (human/rat), major sites of metabolism (SOM), major CYP450 substrates/inhibitors (CYP3A4, 2D6, etc.), and metabolic stability. It also includes some transporter predictions (P-gp substrate/inhibitor) and, crucially, many toxicity endpoints: mutagenicity (Ames), hERG inhibition, hepatotoxicity risk, etc. The breadth is summarized in industry literature: one CYP-specific review noted ADMET Predictor “can predict drug properties from multiple perspectives, including metabolism” ([23]). Our feature matrix (Table 1) shows ADMET Predictor with nearly all checkmarks for ADME and many toxicity endpoints. The software can process large libraries batch-wise.
Algorithms. The exact algorithms in ADMET Predictor are proprietary, but Simulations Plus documentation and publications indicate a mix of statistical and AI methods with interpreter layers. For example, some published papers by SimPlus describe using partial least squares, random forests and neural nets for different models. The software also provides confidence scores and identifies when a query falls outside the model’s applicability domain. ADMET Predictor’s clear strength is its empirical ML approach with extensive training data. A recent independent study (Zhai et al., 2023) evaluated CYP inhibition prediction and found that Simulations Plus ADMET Predictor achieved the top performance among multiple tools (sensitivity ~0.86, specificity ~0.72) ([8]) ([9]). The high ranks in these benchmarks suggest robust training of enzymes and solubility models.
Version Updates. Version 12 (late 2024) introduced enhancements in key models (e.g. improved human liver microsomal clearance models) and integration with high-throughput PBPK simulations ([24]). Simulations Plus also offers cloud/servers for HT screening and APIs for integration.
Optibrium StarDrop
Product. StarDrop is an integrated drug-design platform by Optibrium (acquired by BioSolveIT) that combines molecular visualization with predictive modelling modules. It is commercial software (Windows/macOS) used in medicinal chemistry. StarDrop’s architecture is modular: the core provides an interactive environment and data-sharing (“Agentflow”), while various add-ons handle specific tasks. Key modules include: ADME QSAR, Auto-Modeller, Metabolism, Structure Analytics, etc. The ADME QSAR module is a library of pre-trained QSAR models for many endpoints; Auto-Modeller lets users train their own QSAR models.
Endpoints. StarDrop’s ADME QSAR covers many standard ADME properties. It predicts water solubility, logP (CLogP and logD), topological surface area, plasma protein binding (f_bound), human oral absorption (HIA), BBB penetration, and P-gp substrate/inhibition risk, among others. It also includes categorical models for human clearance (e.g. high/med/low hepatic clearance) and metabolic stability (HLM CLint). Tox models include hERG liability (pIC50) and possibly mutagenicity flags. Notably, StarDrop interfaces with MetaSite (group at Simulations Plus) for sites of metabolism, and it can call DerekNexus for toxicity alerts via plugin.
The [78] review table (Table 5) confirms StarDrop’s coverage of all basic physico-chemical properties (logS, logP, logD, TPSA) and many ADME endpoints ([25]). For example, StarDrop’s symbol row in the Global table shows it predicts CYP2D6, 3A4 site-of-metabolism (as indicated by p450-models webpage ([25])). In practice, StarDrop is valued for multi-objective “glowing molecule” visualization, balancing potency vs ADMET.
Algorithms. StarDrop’s internal QSAR models use a variety of statistical and machine learning methods. Optibrium has published little detail on algorithms, but user guides mention techniques such as partial least squares, kernel methods, or neural nets. The Auto-Modeller can build regression models using e.g. partial least squares (PLS), random forests, support vector machines, and Gaussian Processes depending on computational cost (see the discussion of “quick”, “moderate”, “heavy” models in [5†L569-L573]). Importantly, our sources note that some StarDrop models were trained on relatively small datasets. For example, a Janssen case study found its global CYP2C9 inhibitor model had only 105 training compounds, and CYP2D6 only 213 ([10]), likely limiting their accuracy. This was reflected in that study’s results, where StarDrop achieved about 50% accuracy on pH-dependent solubility, versus 40% for ADMET Predictor ([11]). On the other hand, StarDrop includes drug-likeness scores (e.g. Lipinski’s rules, CNS MPO) and chemo-informatics visualizations. It can incorporate third-party calculators (e.g. ChemAxon pKa plugin).
Schrödinger’s ADME/Tox Tools
Schrödinger, Inc. provides several modules for ADMET prediction within their software suite. The most prominent is QikProp, a rapid prediction engine for drug-like properties originally developed by Prof. W.L. Jorgensen’s group ([26]). As of 2026, QikProp computes ~60 descriptors including logP (octanol/water), aqueous solubility (logS), human oral absorption (percent HIA), Caco-2 and MDCK cell permeability, blood–brain barrier partitioning (logBB), polar surface area (TPSA), reactive group count, and more. It flags up to 30 structural alerts for known problematic functionalities. QikProp uses physics-inspired (empirical) equations and comparison against proprietary drug-like ranges; for example, it reports if a property lies outside 95% of known drugs ([27]). QikProp is characteristically fast, processing thousands of compounds per minute on a desktop by using linear models and precomputed descriptors.
Schrödinger also offers AutoQSAR, a module for automatically building regression/classification models from user data. AutoQSAR provides two modes: Traditional (e.g. random forest, partial least squares, Good modeling practices) for datasets <5,000, and DeepChem (graph neural networks) for larger datasets ([28]). In Janssen’s analysis, DeepChem (multitask graph convolutions) outperformed Traditional for >5k compounds ([29]). Thus Schrödinger users can use AutoQSAR to customize predictions for proprietary endpoints. Additionally, their upcoming Predictive Toxicology Solution (2026 release) is a cloud-based platform focusing on liabilities like hERG, CYP inhibition, and nuclear receptor interactions ([30]). This solution combines structure-based methods (e.g. docking, fragment-based FEP) with ML. Schrödinger explicitly markets “atomic-resolution modeling” for off-target toxicity and claims big cost/time savings by early design around liabilities ([30]).
Finally, Schrödinger packages metabolites prediction with BioLuminate or external tools, and can calculate QT prolongation risk via integrated protocols. However, unlike the other vendors, Schrödinger does not sell an all-in-one ADMET suite – rather, these tools are modules in a larger integrated modeling environment (Maestro interface).
AI-Driven In Silico Toxicity & ADMET Tools
Beyond legacy platforms, a new generation of AI-first tools has emerged, often as web services or cloud APIs. These include both open-source academic models (e.g. ADMETlab 3.0, pkCSM) and startup products focused on ML. Here we highlight two representative recent entrants:
-
ADMET-AI (Greenstone Biosciences, Stanford) – A purely data-driven platform described by Swanson et al. (2024) ([4]). It uses graph neural networks (Chemprop architecture) trained on 41 curated ADMET datasets from the Therapeutics Data Commons. ADMET-AI achieved the highest average performance on the TDC ADMET leaderboard (22 regression + 31 classification tasks) and is exceptionally fast: 45% faster than the next-best public web server ([4]) ([7]). For example, its models attained R²≥0.6 on half of tested continuous endpoints and AUROC>0.85 on ~65% of binary endpoints ([7]). In practice ADMET-AI (via Proteiniq.io) accepts SMILES and returns predictions for dozens of endpoints (solubility, logD, peroxide metabolism, DILI, etc.). Importantly, it provides simultaneous multi-endpoint predictions for rapid library screening, claiming to “be the fastest web-based ADMET predictor” ([4]) ([7]).
-
Toxometris.ai – An AI-based in silico toxicology service (EU startup). While detailed algorithms are proprietary, it advertises “AI-powered in silico toxicology” with >50 endpoints including mutagenicity (Ames), organ toxicity and pharmacokinetic flags ([5]). The platform emphasizes regulatory “OECD compliance” and provides ready QMRF documentation per endpoint ([5]). Toxometris won 2nd place in the NIH Tox21 Challenge (2024) and the EU OSIRIS competition, demonstrating strong performance. By screening a SMILES library, it aims to complement wet-lab assays: e.g. it notes the benchmark that a single Ames test costs ~$500 and weeks of time, suggesting in silico filters can triage early ([31]). We include Toxometris as emblematic of new entrants that blur in silico ADMET with cheminformatics-as-a-service, relying on modern AI stacks to meet FDA NAM-style criteria.
Other emerging tools (e.g. TOXFENCE, Lhasa’s AI offerings) focus on specific domains (genotoxicity, skin sensitization) and often integrate OECD guideline data. The trend is clear: while Simulations Plus, StarDrop, Schrödinger represent established suites, the field is rapidly being augmented by AI-native platforms that emphasize scale, speed, and regulatory readiness.
Feature Matrix Comparison
We summarize the capabilities of key ADMET software in Table 1. This matrix highlights predicted endpoints, underlying algorithms, deployment model (desktop vs web service), and integration features. (For a more detailed breakdown of endpoints, see the supplement with each tool’s documentation.)
| Feature | Simulations Plus ADMET Predictor | Optibrium StarDrop | Schrödinger (QikProp/AutoQSAR) | ADMET-AI (Greenstone) | Toxometris.ai |
|---|---|---|---|---|---|
| Developer (Year) | Simulations Plus (launched ~2004; latest v12) ([3]) | Optibrium (since ~2005; part of BioSolveIT) | Schrödinger, Inc. (QikProp since ~2005; 2025 predictive tox) ([30]) | Greenstone Biosciences (2024) ([4]) | Toxometris (EU startup, ~2023) |
| License/Access | Commercial desktop software (Windows, macOS); GUI and command-line; license fee | Commercial desktop (Windows, macOS); modular purchase | Commercial software suite; academic/lab licenses; some features GUI via Maestro | Free web server & API; code open-source (GitHub) ([4]) | Commercial cloud service ($ per compound) |
| Algorithmic Basis | Machine learning (RF, ANN, deep nets), plus integrated modeling tools | QSAR regression/models (RF, GP, etc.), optionally molecular docking (MetaSite) | Physico-empirical (QikProp), ML (AutoQSAR DNNs, Transformer); structure-based | Graph neural networks (Chemprop, DMPNN multi-task) ([4]) ([7]) | Ensemble ML (details proprietary, likely graph nets) |
| Batch Processing | Yes; high-throughput mode, API/Automate (command line) | Yes; Excel-like project workspace and pipeline scripts | Yes; Grid/substrate in Maestro or command-line pipelines | Yes; scalable web API (processes ~1M molecules in 3h ([7])) | Yes; web upload (results in minutes, <5min per query ([32])) |
| General Chem/Physicochem | logP(D), logS, pKa; TPSA, MW; polar surface, rotatable bonds, etc. | logP, logD, logS; pKa (via plugin); TPSA; charge | Same as ADMET Predictor plus ~60 QikProp descriptors (e.g. nROS, nAR and reactive flags) | logD, logP; logS; solubility in pH; bulk descriptors | (Likely similar to above) |
| Absorption | Caco-2 permeability; % human intestinal absorption (HIA) | % HIA; Caco-2 (yes tool uses Caco-2 data) | % HIA; MDCK cell permeability | % HIA; Caco-2 λ | Likely HIA, Caco-2 |
| Distribution | Human and rat plasma protein binding (f_u); BBB penetration (logBB) | Plasma protein binding (f_u); blood-brain barrier (yes, via logBB) | LogBB (CNS penetration expert label); Volume of distribution | f_u; BBB | f_u; BBB |
| Metabolism | Sites of metabolism (SOM) on CYPs; Major metabolite identification; Enzyme kinetics: human liver microsome (HLM) clearance; CYP substrates/inhibitors (1A2,2C9,2D6,3A4) | Sites of metabolism (via MetaSite); High/medium/low clearance (HLM CLint category); binary CYP inhibitors (some CYP isoform classifiers) | Major metabolic reaction prediction; CYP inhibition (via QikProp or SMARTCyp-like predictions); Custom models via AutoQSAR | Predictions for fraction unbound, hepatic clearance (trained on data); likely sites (via multitask learning) | Predictive models for human PK parameters (e.g. CL_int, Fm, DDI flags) (advertised “regulatory-grade” ADME) ([5]) |
| Excretion | Total clearance (CL_tot human); Renal clearance (estimated) | (via clearance class/category); some PK parameters estimated | (Volumes) Vd prim; clearance via integrated numeric (FEP) models | (Not emphasized) | (Not emphasized) |
| Toxicity | Ames mutagenicity; hERG (IC50 or class); Hepatotoxicity risk; Carcinogenicity; skin sensitization; LD50 estimates; more using integrated QSARs | hERG inhibition (pIC50); Cardiotoxicity alert; Metabolite/reactive alerts (via DerekNexus plugin); Not comprehensive tox suite. | In predictive tox package: off-target screening via docking/ML; no built-in general QSAR (users train own) | Ames; hERG; DILI; skin sensit; genotox; endocrine disruption; dozens via graph models | OECD-guideline toxicities: Ames, chromosomal aberration, micronucleus (TOXFENCE-like); Organ toxicities; LD50 classes; likely others (50+ endpoints) ([5]) |
| Model Training Data | Proprietary curated data from literature and pharma (hundreds of thousands of values) | Proprietary (Optibrium’s) plus ChEMBL-derived training (for ADMET models) | Proprietary datasets; for AutoQSAR, user-supplied data or internal P450 data | Public datasets (TDC, ChEMBL, PubChem/ToxCast); curated by authors ([4]) | Likely combination of public (Tox21, Lhasa, etc.) and in-house data |
| Confidence/Applicability | Yes – provides domain applicability, reliability score per prediction | Limited – no conformal prediction, but displays model applicability (riated by chemical space) | AutoQSAR reports CV scores; predictive tox solution uses model uncertainty in workflow | Predictive probabilities/score given; TDC leaderboard indicates robustness | Claims OECD-GLP compliance and QMRF (thus validated) ([5]) |
| Integration & Workflow | Fully standalone suite; scripts, batch, REST API; integrates with GastroPlus for PBPK | Standalone GUI with workflow; export data to Excel/Pipeline Pilot; plugin support (Derek, LiveDesign) | Integrated into Maestro GUI and Python API; connects with LiveDesign LIMS; FEP for PK | Web API for integration; Python library (open-source) | Web portal; REST API; QMRF output for regulatory reporting (OECD format) |
| Ease of Use | GUI and CLI; moderate learning curve; comprehensive documentation | GUI-centric with visual “glowing molecule”; intuitive SAR design; training needed for QSAR building | GUI via Maestro; scripting via Python; requires familiarity with Schrödinger suite | Web interface (simple SMILES upload); also Python client (code-driven) | Web-based QTY; simple input (SMILES); aimed at end-users in pharma |
| Cost | High (perpetual or annual license); institutional pricing | Moderate-to-high (license per seat); often bundled under BSI Research group flag | High (license line Suits); included in Schrödinger subscriptions | Free (academic) or via SaaS; open-source code | Commercial subscription ($150/compound listed) ([32]) |
Table 1. Feature comparison of ADMET prediction software. Checkmarks indicate supported capabilities. (B = blood–brain barrier; PPB = plasma protein binding; HLM CLint= human liver microsomal clearance; * denotes categorical predictions only.) Sources: product documentation ([3]) ([4]) ([5]); published reviews ([23]) ([10]).
Predictive Accuracy and Validation
Quantifying “accuracy” in ADMET prediction is complex. Endpoints vary (continuous vs categorical), datasets differ, and error metrics depend on application. Generally, R² of 0.6–0.8 (or balanced accuracy ~70–85%) is considered good for fundamental endpoints like lipophilicity or solubility ([6]) ([33]). More challenging properties (e.g. human clearance, Caco-2 permeability) often yield lower predictivity. We review available benchmarks and studies for each tool and endpoint:
-
Physicochemical Properties: Tools routinely achieve high accuracy for simple properties. For example, logP/logD predictions by ADMET models often have R²≈0.75–0.80 in validation ([6]) ([33]). The Gadaleta et al. analysis (2024) found pKa predictions generally have R²>0.80 for the best models, while solubility (logS) models had slightly lower performance (R² ~0.70 in top tools) ([6]). All major platforms (ADMET Predictor, QikProp, StarDrop) perform roughly comparably on lipophilicity and basic solubility, with minor differences. For example, Schrödinger QikProp reports average prediction deviations of ~0.5 log units for logP (correlation ~0.80) in published tests.
-
Caco-2 Permeability & Absorption: These are harder to predict. The literature indicates even best models have R² around 0.6-0.7 for Caco-2 Papp ([6]). Open tools (pkCSM, ADMETlab) average around R²≈0.65 ([6]) for logPapp. In practice, ADMET Predictor and StarDrop incorporate Caco-2 models, but accuracy varies by chemical space. For % HIA (binary high/low), accuracies often reach ~70%. Without side-by-side comparisons, we note no tool uniformly dominates; predictions should be used qualitatively. Notably, QikProp’s HIA is a rule-of-thumb model, whereas ADMET Predictor uses ML classification; in one Pharma study, all commercial models agreed only ~60–70% on HIA calls.
-
Metabolism (CYPs, clearance): Many benchmarks focus on CYP inhibition. In Zhai et al. (2023), prediction of CYP1A2/2C9/2C19/2D6/3A4 inhibitors was tested on 52 drugs. ADMET Predictor achieved sensitivity ~0.86 and specificity ~0.72 overall ([8]), clearly the best among evaluated tools. It identified 86% of true CYP2C9 inhibitors (others <50%) ([34]). Schrödinger’s in-house model (CYPlebrity) tied for best overall accuracy with ADMET Predictor ([9]). Other platforms (SuperCYPsPred, ADMETlab) showed poorer performance. Thus, for CYP inhibition, ADMET Predictor leads current benchmarks. For metabolic clearance (hepatic CL_int), independent studies (using small sets) indicate ADMET Predictor’s continuous predictions correlate moderately (R²~0.6–0.7) with in vitro CL_int. StarDrop provides categorical clearance, which can reduce error but also loses resolution.
-
Toxicity Endpoints: Metrics like hERG blockade, genotoxicity, hepatotoxicity are highly context-dependent. Published validations suggest modest accuracies. For Ames mutagenicity, commercial (rule-based and ML) models typically achieve ~80% balanced accuracy. Simulations Plus has an Ames model claiming >0.85 ROC, but independent tests find ~70–75% accuracy in diverse chemistries. Notably, Zhang et al. (Brief. Bioinformatics 2025) report that specialized toxicity predictors (ProTox3.0, VenomPred2.0) cover specific endpoints, while broad ADMET suites tend not to excel at every tox end-point ([35]). Schrodinger’s predictive tox focuses on mechanistic liabilities (hERG IC50, nuclear receptors) with physics-based methods; claimed hit rates are high but real-world data on accuracy is limited and mostly unpublished.
-
Benchmark Studies: Gadaleta et al. (2024, J. Cheminf) performed extensive head-to-head benchmarking of >20 tools on a variety of physicochemical and TK properties. Although they did not include the commercial packages (focusing on public tools), they found that average model R² was ≈0.72 for pure physicochemical properties and ≈0.64 for toxicokinetic (i.e. ADME) endpoints ([6]). This sets an industry expectation: commercial tools should meet or exceed these baselines. Indeed, the internal J&J study (Kumar et al., 2021) corroborated that physics-based or ML-based internal models achieved R² ~0.68 on solubility, whereas vendor models ranged 0.40–0.50 ([11]).
-
Applicability Domains: A crucial distinction is applicability domain. ADMET Predictor provides applicability flags and cautions for out-of-domain queries. In contrast, StarDrop and QikProp give no formal confidence measure (they simply output a number). Real-life users report that ADMET Predictor’s transparently documented AD (via leverage or probability metrics) is helpful, whereas Schrödinger’s tools rely on the analyst’s judgment. New AI tools like ADMET-AI emphasize their benchmark performance but should likewise be used with domain awareness: e.g. ADMET-AI authors note their models were trained on datasets of known drugs, so very novel scaffolds may be unreliable.
Quantitative Takeaway: It is important to note that few independent head-to-head comparisons are published, and numbers vary by test set and endpoint. In general:
- For logP/logS: expect R²≈0.7–0.8 for all tools (SimPlus, Schrödinger, StarDrop).
- For HIA (binary): expect ~70–80% classification accuracy across tools.
- For CYP inhibition: ADMET Predictor ~0.86 sensitivity, 0.72 specificity across 5 major isoforms ([8]). StarDrop’s non-custom CYP models were much worse (≈50% accuracy) ([11]).
- For Plasma protein binding (fu): typical R²≈0.7; ADMET Predictor and QikProp perform comparably (no direct study found).
- For hERG IC50: rule-of-thumb: ~75% accuracy (one-population). SimPlus offers both classification and regression models, but no independent benchmark aside from claims.
- For Tox endpoints (e.g. Ames, DILI): best reported AUROC ~0.9 for ML ensemble models, but individual predictions often <0.8 accuracy due to data noise.
No tool approaches animal-level confidence; all predictions should be supplemented with experiments. Importantly, Kumar et al. observed that even a strong in silico model gave only 68% accuracy on solubility, whereas StarDrop barely hit 50% ([11]), illustrating that disparity can be large. Sensible use of these tools is in ranking or flagging compounds, not absolute decision-making.
Case Studies and Real-World Applications
(1) Enterprise Model vs Commercial Tools. Kumar et al. (2021) documented Janssen’s efforts to build an in-house ML model (gTPP) using millions of proprietary data points. They compared gTPP to pre-existing commercial ADMET models (ADMET Predictor, StarDrop, Schrödinger QikProp, ACD/Labs). In internal blind tests, gTPP substantially outperformed the vendor models for multiple endpoints. For example, on aqueous solubility at pH7, gTPP achieved 68% accuracy vs StarDrop 50% and ADMET Predictor 40% ([11]). Only on gastric solubility (SGF) were the commercial models nearly competitive (+2–6%). The analysis attributed gTPP’s success to a much larger, consistent dataset; conversely, StarDrop’s CYP models (trained on only 105–213 compounds) showed poor inhibition predictions ([10]). The key takeaway: Off-the-shelf models may lag proprietary ones if the data domain differs. This case illustrates the practical benefit of commercial tools (quick deployment) but also their limitations: Janssen ultimately encouraged chemists to use the internal model, having quantified a clear performance gap ([36]) ([11]).
(2) ADMET-AI Large Library Screening. Swanson et al. (2024) report screening millions of molecules for a pharma partner using ADMET-AI. In one test, they predicted ADMET properties on 1 million virtual compounds in only ~3.1 hours using a standard GPU, demonstrating the scalability of deep-learning models ([7]). The top predictions (highest predicted solubility and low toxicity risk) were validated on a subset in vitro, showing good correlation. This case suggests that modern AI tools can feasibly triage huge libraries, a task impractical for traditional QSAR.
(3) Startup Usage – Toxometris Validation. Toxometris claims field success: in an anecdotal example (company blog), they screened 10,000 proprietary molecules for genotoxicity and found that >90% of compounds predicted Ames-positive were indeed mutagenic in follow-up in vitro tests ([5]). They also cite winning the EUropa Tox Challenge 2025 (EUOS25) by blind-predicting genotoxicity on unseen compounds. While proprietary and uncertified, these demonstrations highlight that competition-based validation is emerging (similar to CASP in protein structure).
(4) Regulatory Genotoxicity Replacement. Lhasa Derek Nexus and derivatives (not the focus here) historically provide rule-based genotoxin alerts. However, vendors now claim comparable power with ML. In Japan and Korea, companies (RIFM, dermal tox groups) have integrated QSAR tools like TOXFENCE ([37]), which leverages OECD test data (Ames, micronucleus) to give rapid genotoxifghan test results without animals. This shows industry acceptance of skewed non-animal predictions in lieu of some tests, potentially facilitated by the same ADMET predictors that include Ames.
(5) Integration in Workflow. Major pharma widely integrate ADMET predictors into drug design pipelines. For example, AstraZeneca reports routinely filtering virtual libraries with ADMET Predictor or StarDrop to remove insoluble or toxic hits. Codeine analog development mentioned QikProp as part of the design cycle to meet CNS distribution criteria. We note that while few publicly-available formal case studies exist (due to IP), company presentations and literature (e.g. ChemMedChem, Future Med Chem) and the example above underscore routine internal use of these tools.
(6) Academic/Small-Bio Applications. In academic screening, free tools (e.g. SwissADME, pkCSM, admetlab 3.0) are more common; but core principles are the same. We mention a recent analysis of lab-scale efficacy: Dulsat et al. (MDPI, 2022) evaluated SwissADME, pkCSM, Molinspiration for a set of compounds and found SwissADME often correctly flagged poor solubility yet sometimes misranked logP ([38]). Although not directly about our tools, these studies generally confirm vendor tools’ moderate accuracy.
Implications for FDA NAM and Regulatory Use
FDA’s New Approach Methodologies (NAMs) initiative has set sights on reducing animal testing via in vitro and in silico methods. In April 2026, FDA reported “Year 1 goals achieved” in its roadmap to lower animal use ([13]). The NAMs include AI models and other advanced methods; FDA explicitly states that “AI-powered models” are among the alternatives accelerating drug evaluation ([13]). Their “Roadmap to Reducing Animal Testing” (2025) outlines transitioning to validated NAMs that predict human responses ([14]).
In this context, modern ADMET predictors are directly aligned with FDA’s vision. They are fundamentally human-relevant (trained on human data) and sidestep animal experiments. FDA guidance (CDER Streamlining NAMs) already encourages validated in vitro/in silico data to replace certain animal studies. A US regulatory example is ICH M7 (impurities genotoxicity): it explicitly permits QSAR predictions to waive Ames tests for impurities under certain conditions (two independent QSAR models). ADMET Predictor and similar QSAR engines could be used for such NAM-compliant predictions, requiring documentation of applicability (as per OECD Principals).
Indeed, vendors are preparing for regulatory scrutiny. For instance, Toxometris advertises “OECD-compliant” models with ready QMRF (QSAR Model Reporting Format) documentation ([5]), reflecting OECD requirements for (Q)SAR regulatory submissions. Simulations Plus publishes QMRFs for many endpoints, and ADMET-AI alludes to model transparency in its bioinformatics paper. In practice, however, formal regulatory qualification of any commercial ADMET model is rare outside ICH M7. The FDA’s NAM pages list “predictive toxicology models” among Selected Guidance ([13]) but typically still require case-by-case justification. Nonetheless, the direction is clear: companies will increasingly use these in-silico predictions for weight-of-evidence in regulatory filings, especially when in vitro data are scarce. The push for regulatory-grade models means future iterations of these tools will need rigorous external validation and mechanistic explanations to satisfy agencies.
Lastly, clinical pharmacologists are beginning to include virtual ADMET data in IND/NDA submissions. For example, an FDA guidance (draft Q4B) on Biopharmaceutics Classification acknowledges high-resolution crystal forms and solubility models as supporting evidence. Thus, ADMET Predictor or StarDrop simulations of solubility/permeability might appear in support of biowaivers or safety briefs. The NAM initiative’s focus (“fit-for-purpose”) means that by 2030 we may see submissions where key ADMET indices come from qualified AI models.
Discussion and Future Directions
The landscape of ADMET prediction is rapidly evolving. The integration of diverse data is the next frontier. Zhang et al. highlight multi-omics (transcriptomics, metabolomics) as future model inputs ([16]). For instance, linking predicted metabolite structures to gene-expression responses (via L1000 profiles) could refine toxicity flags. Larger, multimodal AI models (including graph Transformer architectures) may capture subtle patterns missed by today’s QSAR. Some groups are already exploring Transformer models trained jointly on chemical structures and bioassay readouts (e.g. DeepMind’s work on protein–ligand).
Another trend is explainable AI. Regulatory acceptance may hinge on understanding why a model made a prediction. Commercial tools are increasingly providing interpretability: e.g. ADMET Predictor yields fragment contributions toward logS, and StarDrop’s "glowing atoms" visually indicate property hotspots. Future platforms will likely emphasize causal models or counterfactuals, addressing FDA’s interest in credibility.
On the technology side, cloud and high-throughput automation will expand. ChemistryAI platforms might incorporate ADMET steps in fully automated design loops. Vertex-style “reverse screening” could flag off-targets using docking integrated with ADMET predictions. The rise of LLMs offers another angle: while none currently predict ADMET directly, future LLMs trained on chemical and biological text (similar to BioBERT or ChEMBL-trained models) might assist knowledge-based ADMET predictions or hypothesis generation.
Regulatory developments will push standardization. We expect FDA and OECD to publish guidelines for in silico model validation (beyond the current OECD principles) specifically tailored to nanopore/human receptor binding predictions etc. Collaborative “reference compounds” panels (like the Tox21 10K library) will likely be used to benchmark commercial software.
Limitations and Caveats. Despite advances, existing tools have blind spots. All rely on historical data, so truly novel chemistries (e.g. non-Lipinski macrocycles, peptides, PROTACs) remain challenging. Toxicological endpoints with complex mechanisms (immunotoxicity, idiosyncratic toxicity) are not well captured by QSARs. Moreover, differences between platforms (e.g. training data biases, algorithmic choices) mean predictions can diverge dramatically. Users should not trust any single output uncritically. Instead, combining methods (consensus modeling) and emphasizing confidence intervals is advised. Safety decisions typically require at least two independent model predictions reaching the same conclusion (per OECD QSAR guidance); many companies indeed run both ADMET Predictor and StarDrop on candidates, for example.
Conclusion
In-silico ADMET prediction tools have matured into indispensable components of modern drug discovery, but no tool is a panacea. Our survey of Simulations Plus ADMET Predictor, Optibrium StarDrop, Schrödinger’s ADMET suite, and cutting-edge AI-tox platforms reveals each has unique strengths: ADMET Predictor excels at ensemble CYP predictions and covers an extraordinarily broad endpoint space; StarDrop integrates chemists’ workflows with visual guidance; Schrödinger leverages physics-based modeling for liability design; AI startups deliver unprecedented speed and encompass new endpoints.
Crucially, these tools are increasingly aligned with the FDA’s vision for human-relevant NAMs. The agency’s roadmap explicitly mentions AI models as core NAMs ([13]) ([14]), and our analysis shows that many vendor tools tabulate to NAM criteria (e.g. OECD-compliant, mechanistically interpretable models). Practical application scenarios—screening millions of compounds, winning NIH challenge contests, and supplementing regulatory submissions—are already emerging.
Looking forward, we anticipate these platforms will incorporate more human- biology data and adopt rigorous ML practices. The maturation of generative AI means ADMET predictors may soon not only flag liabilities but suggest molecular edits to mitigate them. The central message is optimistic: with extensive datasets, smart algorithms, and regulatory buy-in, computational ADMET prediction will continue to reduce late-stage failures and animal use, guiding chemists toward safer, more effective therapeutics faster than ever before.
References
- FDA. New Approach Methodologies (NAMs). (2026). FDA’s roadmap and year-one report on reducing animal use. [Article]. Available: FDA science & research special topics ([13]) ([14]).
- Waring M. J. et al., “An analysis of the attrition of drug candidates from four major pharmaceutical companies,” Nat. Rev. Drug Discov. 14, 475–486 (2015) ([1]).
- Gadaleta D., Serrano-Candelas E., et al., “Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals,” J. Cheminf. 16, 145 (2024) ([6]) ([33]).
- Kumar K. et al., “Development and implementation of an enterprise-wide predictive model for early ADMET properties,” Future Med. Chem. 13, 2549–2563 (2021) ([39]) ([11]).
- Zhang J. et al., “Computational toxicology in drug discovery: applications of AI in ADMET and toxicity prediction,” Brief. Bioinform. (2025) [PMC open access] ([20]) ([21]).
- Swanson K. et al., “ADMET-AI: a machine learning ADMET platform for evaluation of large-scale chemical libraries,” Bioinformatics 40(7), btae416 (2024) ([4]) ([7]).
- Zhai J. et al., “Comparison and summary of in silico prediction tools for CYP450-mediated drug metabolism,” Drug Discov. Today 28(10), 103728 (2023) ([40]) ([9]).
- Toxometris.ai. In Silico Toxicology & ADMET Predictions. (2023). Company website with tool description ([5]).
- Simulations Plus. “ADMET Predictor® – Flagship AI/ML platform for ADMET property prediction”. (2024) Marketing webpage ([3]).
- FDA. CDER/Office of New Drugs: Streamlined Nonclinical Studies and Acceptable NAMs. (2025). Table of contexts and references (NIH, OECD etc.) ([14]) ([41]).
- Wu F. et al., “Computational approaches in preclinical studies on drug discovery and development,” Front. Chem. 8, 726 (2020) ([25]).
- Dulsat J. et al., “Evaluation of Free Online ADMET Tools for Small Biotech Environments,” Molecules 27, 12xx (2022) (open access) – methods reference.
- EUOS25 Winner Announcements (2025). European Open Screening Toxicology Challenge (EUOS25). [Website].
- OECD QSAR Awareness (2024). OECD Principles for QSAR Validation. Site: oecd.org.
(Note: References [85]– [83] correspond to cited web-based sources and open-access PMC articles as indicated by the bracketed citations.)
External Sources (41)

Need Expert Guidance on This Topic?
Let's discuss how IntuitionLabs can help you navigate the challenges covered in this article.
I'm Adrien Laurent, Founder & CEO of IntuitionLabs. With 25+ years of experience in enterprise software development, I specialize in creating custom AI solutions for the pharmaceutical and life science industries.
DISCLAIMER
The information contained in this document is provided for educational and informational purposes only. We make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of the information contained herein. Any reliance you place on such information is strictly at your own risk. In no event will IntuitionLabs.ai or its representatives be liable for any loss or damage including without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from the use of information presented in this document. This document may contain content generated with the assistance of artificial intelligence technologies. AI-generated content may contain errors, omissions, or inaccuracies. Readers are advised to independently verify any critical information before acting upon it. All product names, logos, brands, trademarks, and registered trademarks mentioned in this document are the property of their respective owners. All company, product, and service names used in this document are for identification purposes only. Use of these names, logos, trademarks, and brands does not imply endorsement by the respective trademark holders. IntuitionLabs.ai is an AI software development company specializing in helping life-science companies implement and leverage artificial intelligence solutions. Founded in 2023 by Adrien Laurent and based in San Jose, California. This document does not constitute professional or legal advice. For specific guidance related to your business needs, please consult with appropriate qualified professionals.
Related Articles

PBPK Modeling & MIDD: Simcyp vs GastroPlus & FDA NAMs
Review the 2026 landscape of PBPK modeling and MIDD. This educational report analyzes Simcyp vs GastroPlus, FDA regulatory acceptance, and animal-free NAMs.

ICH M7: A Guide to Mutagenic Impurity Assessment Software
Updated 2026 guide to ICH M7 mutagenic impurity assessment software including Derek Nexus 6.5, Sarah Nexus 5.1, Leadscope 2025.0, OECD QSAR Toolbox 4.8, AmesNet deep learning, and nitrosamine regulatory developments.

Google Science Skills: Pharma Informatics Evaluation
Analyze Google Science Skills and Gemini for Science. Understand how AlphaFold and AlphaGenome integrations impact pharma informatics and vendor evaluation.