IntuitionLabs
By Adrien Laurent

Dotmatics Natural Language Query Capabilities Explained

Executive Summary

Dotmatics is a leading scientific R&D software platform used by millions of researchers worldwide ([1]). It provides an integrated suite of data management and analysis tools (such as electronic lab notebooks, chemical registries, bioinformatics, and ELN/LIMS integration) for life-science and chemical research. In recent years Dotmatics has accelerated its focus on artificial intelligence (AI) and natural language capabilities (notably via its new Luma platform built on Databricks). Although its traditional search interface relies on structured and chemical queries, Dotmatics is actively developing AI-powered query tools that allow users to retrieve insights with minimal manual effort. This report provides a comprehensive analysis of Dotmatics’ search and query capabilities, particularly its emerging natural-language query features, set in the broader context of AI-driven data science. It covers the history of basic and natural-language search in scientific informatics, Dotmatics’ current federated search and data discovery tools, its AI/LLM initiatives (especially via Luma and Databricks collaboration), and how these compare with trends in industry and research. Key findings include:

  • Dotmatics’ Existing Search: Dotmatics has traditionally offered a federated search engine that lets scientists combine structured queries, chemical-structure search, and database filters across distributed research data ([2]) ([3]). These tools have been powerful for expert users but require building forms and filters rather than plain-English inputs.

  • Demand for Natural Language Queries: Researchers increasingly seek “Google-like” query interfaces for complex data. Studies have shown that single-field natural-language search greatly improves correctness and speed of finding data ([4]). Market analysts (Gartner, IT industry) predict that AI conversational agents will replace a significant portion of traditional search by 2026 ([5]). In response, many data platforms (e.g. Microsoft Fabric, ThoughtSpot, and niche lab informatics) are adding natural-language query features ([6]) ([5]).

  • Dotmatics’ AI Roadmap: Dotmatics has publicized plans to incorporate generative AI and LLMs into its platform. The new Dotmatics Luma Scientific Intelligence Platform (launched late 2023) is purpose-built on Databricks and is explicitly designed to support AI queries and analytics ([7]) ([8]). Dotmatics’ CPO has stated that Luma will include “generative AI query-building options” and that users will be able to embed AI model predictions directly into their queries ([7]) ([9]). In practice, this means a future interface where scientists can pose high-level questions (in plain language) and the system composes the appropriate data queries under the hood. Early Luma prototypes already use neural networks and public compound databases to suggest structures, calculate properties, and predict activities based on the customer’s data ([10]) ([7]).

  • Competitor Approaches: Emerging lab informatics platforms (e.g. Scispot, Benchling, Uncountable, L7 Informatics) are also experimenting with conversational AI assistants. For instance, Scispot’s “Scibot AI” lets users “interact with experiments and data through simple conversational prompts” ([11]). Business-intelligence tools (ThoughtSpot, Microsoft Fabric’s Copilot integration) allow natural-language questions over business data ([6]) ([5]). Dotmatics’ approach is distinguished by its deep scientific context: Dotmatics and partners (e.g. SciBite) focus on chemical/biological ontologies and “feature engineering” to ensure answers are scientifically valid ([10]) ([12]).

  • Technical Challenges and Solutions: Natural-language querying in drug discovery is an active research area. Recent studies highlight significant hurdles: out-of-the-box LLMs can hallucinate or misinterpret queries at high rates, especially with complex scientific schemas ([13]) ([14]). The state of the art uses multiple LLM agents and retrieval-augmented generation (RAG) with knowledge graphs to maximize accuracy ([13]) ([15]). Dotmatics is aware of these challenges: its Databricks-based strategy enables embedding of deterministic model calls and scientific functions into SQL queries ([9]) ([16]), effectively combining free-text questioning with grounded, rule-based data access. Ongoing benchmarks and best practices (like specialized prompt engineering and schema-driven query building) will be needed to make Dotmatics’ AI queries reliable.

  • Case Studies and Perspectives: Although no public cases of Dotmatics using full natural-language querying exist yet, analogous scenarios show the value. For example, a major pharmaceutical firm using Dotmatics Luma reported that centralizing diverse instrument data and analytics dramatically improved their R&D throughput (see case study excerpt) ([17]). In a typical future scenario, a scientist could simply ask Luma, “Show all assays with compound ALX-01 where yield exceeded 80% in the last month,” and get answers without writing SQL or rebuilding forms. This scenario mirrors findings from user studies in clinical research, where a “Google-like” single-field search significantly outperformed complex multi-field queries in accuracy and speed ([4]).

  • Implications and Future Directions: The integration of natural-language queries into Dotmatics promises to lower barriers to data access. It could democratize analytics so that chemists and biologists—not just informaticians—can rapidly interrogate enterprise data. However, success will depend on maintaining data quality, provenance, and security. Dotmatics stresses that customer-owned data and models remain private ([18]) ([19]), which will be crucial when adopting ML/LLM technologies. The impending Siemens acquisition of Dotmatics (2025) underscores that industrial R&D will see AI/LLM capabilities embedded into digital lab environments, potentially linking Dotmatics to Siemens’ Xcelerator platform and broader digital thread. As AI co-scientist tools evolve, Dotmatics’ alignment with strategic partners (Databricks, SciBite, etc.) positions it to deliver “science-aware” NLP that goes beyond generic query translation.

In summary, while Dotmatics’ current systems do not yet support free-text NL queries out of the box, the company is clearly building toward that goal. By leveraging Databricks, ontology systems, and domain-specific feature engineering, Dotmatics aims to add conversational search and AI-driven query assistants to its platform ([7]) ([9]). Successful implementation would allow researchers to pose questions in natural language and receive precise, data-backed answers, thereby accelerating scientific discovery. This report will analyze all facets of Dotmatics’ existing capabilities, the technical foundations of natural-language data querying, competitor strategies, and the potential impacts on laboratory R&D workflows.

Introduction and Background

Dotmatics Platform Overview

Dotmatics (founded 2005) is a global leader in scientific R&D informatics, providing cloud-based and on-prem software for integrated data management. Key components include an Electronic Lab Notebook (ELN), a chemical/material registry, LIMS/sample management, data visualization, and analytics tools ([20]) ([2]). In 2021 Dotmatics was merged with Insightful Science (owner of GraphPad, Geneious, SnapGene) to form a comprehensive life-sciences software company ([21]) ([22]). By 2024 Dotmatics reported supporting over 2 million scientists and 10,000 customers worldwide ([1]). Its user base spans pharmaceuticals, biotech, chemicals, and academic labs, making it a critical platform for managing experiments, compounds, assays, and results.

Central to Dotmatics’ strategy is the Scientific Intelligence Platform (SIP), a unified environment for capturing raw data, metadata, and research findings. The SIP emphasizes FAIR (findable, accessible, interoperable, reusable) data principles and federated search ([23]) ([2]). For traditional queries, Dotmatics offers flexible query forms and filters: users can drag and drop fields (biological assay outcomes, chemical properties, workflow variables, etc.) to build complex queries ([2]) ([3]). Notably, the system supports advanced scientific query types such as chemical structure search (exact match, substructure, similarity) and spectra search, leveraging proprietary toolkits ([24]). Dotmatics also provides an Export & API layer, so query results can feed into external analysis tools (CSV, REST, GraphQL) ([25]). These capabilities allow data analysts to retrieve information efficiently once they know how to encode the query in the Dotmatics interface.

However, the existing query interface is primarily form-driven, not conversational. Users select fields and operators rather than typing questions. In practice this means that while expert informaticians can construct highly specific searches, bench scientists may spend extra effort navigating menus and understanding field definitions. A natural-language interface would let researchers simply ask a question (e.g. “list all compounds tested at pH 7 with activity > 5 nM”) and let the system translate it into the appropriate Dotmatics search. Such an interface could dramatically reduce the learning curve, consistent with broader trends in analytics tools (see next section).
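To make the gap concrete, the sketch below shows the kind of translation an NLQ layer must perform: pulling structured filters out of the plain-English question quoted above. This is only an illustration with invented field names; a production system would use an LLM plus schema knowledge rather than hand-written regular expressions.

```python
import re

def parse_query(question: str) -> dict:
    """Toy NLQ translation: extract numeric filters from a plain-English
    question. Handles only patterns like 'pH 7' and 'activity > 5 nM'."""
    filters = {}
    ph = re.search(r"pH\s*(\d+(?:\.\d+)?)", question)
    if ph:
        filters["pH"] = float(ph.group(1))
    act = re.search(r"activity\s*([<>]=?)\s*(\d+(?:\.\d+)?)\s*nM", question)
    if act:
        # Store the comparison operator alongside the threshold value.
        filters["activity"] = (act.group(1), float(act.group(2)))
    return filters

q = "list all compounds tested at pH 7 with activity > 5 nM"
print(parse_query(q))  # {'pH': 7.0, 'activity': ('>', 5.0)}
```

Even this trivial example shows why real NLQ needs more than pattern matching: synonyms, units, and schema-specific field names all require semantic interpretation.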

Natural Language Queries: Concept and Evolution

Natural language query (NLQ) refers to the ability to express data queries in everyday language instead of code or rigid forms. For example, asking “Which assays show >90% yield at room temperature?” and getting results without writing SQL or adjusting multiple GUI filters. This approach has matured alongside AI. In the early 2000s, rule-based expert systems and basic NLP enabled limited “question-answer” database interfaces. In the 2010s, advances in machine learning and word embeddings led to more flexible NL-to-SQL translation systems. Only recently have large pretrained language models (LLMs) like GPT and PaLM triggered a qualitative leap: their ability to interpret context and semantics vastly improves NLQ capabilities.

Use of natural language for search has long been recognized as user-friendly. For instance, Jay et al. (2016) compared a Google-like single-field search box to a complex multi-field interface for health-data archives. They found that users of the simple interface answered tasks more accurately and quickly (e.g. F1,19=37.3, p<.001 for correctness, F1,19=18.0, p<.001 for speed) ([4]). In other words, scientists preferred a simple NL-style interface to find variables in data archives. This supports the notion that modern R&D platforms should adopt conversational search to improve usability and reduce IT overhead.

On the technology side, LLMs have transformed NLQ. Rather than laboriously crafting parsing rules, many systems now rely on LLM-based translation: the user’s question is fed into a model which outputs a structured query (SQL, SPARQL, Cypher, etc.). Amugongo et al. (2026) note that “it is attractive” to query bio-knowledge bases in plain language, but this requires careful translation of NL to queries ([26]). Pure LLMs are very flexible, but they can hallucinate or misinterpret domain-specific terms (especially in specialized sciences). To overcome this, state-of-the-art NLQ systems often combine LLMs with deterministic checks or knowledge graphs. Techniques include retrieval-augmented generation (RAG), chain-of-thought prompting, and multi-agent frameworks where one model proposes a solution and another verifies it ([13]) ([15]). These hybrid methods can achieve >90% accuracy on certain tasks ([15]), but still fall short of the perfection of hand-written queries. Thus, the field continues to develop benchmarks and standards for scientific NLQ.
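A minimal illustration of the propose-and-verify pattern described above: one component drafts a query (here a stub standing in for an LLM call) and a deterministic verifier rejects queries that reference tables or columns outside a known schema. The schema, query, and regexes are invented for the example.

```python
import re

# Hypothetical schema: one table with a whitelist of column names.
SCHEMA = {"assays": {"assay_id", "yield_pct", "temperature_c"}}

def propose_sql(question: str) -> str:
    # Placeholder for an LLM translation step.
    return "SELECT assay_id FROM assays WHERE yield_pct > 90 AND temperature_c = 22"

def verify_sql(sql: str) -> bool:
    """Deterministic check: every referenced table and column must exist."""
    table = re.search(r"FROM\s+(\w+)", sql).group(1)
    if table not in SCHEMA:
        return False
    # Columns appear before comparison operators or after SELECT (toy parsing).
    cols = set(re.findall(r"\b([a-z_]+)\s*[=<>]", sql))
    cols |= set(re.findall(r"SELECT\s+(\w+)", sql))
    return cols <= SCHEMA[table]

sql = propose_sql("Which assays show >90% yield at room temperature?")
print(verify_sql(sql))  # True
```

Real verifiers parse SQL properly and also check value domains, but the division of labor is the same: the LLM proposes, deterministic logic disposes.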

In practice today, natural-language interfaces are rapidly being adopted. General data tools like Google’s BigQuery have added a “Chat mode” (using Gemini) for querying data in plain English. Business intelligence platforms (e.g. Tableau, Power BI) now include “Ask Data” features powered by underlying LLMs. Notably, Microsoft’s new Fabric platform allows developers to use GitHub Copilot as a natural-language agent – e.g. using chat or voice prompts to generate queries across Azure data ([6]). A Gartner forecast even predicts that by 2026, 25% of all search-engine traffic will shift to AI chatbots and virtual agents ([5]), reflecting a paradigm shift in how enterprises retrieve information.

For life sciences specifically, the stakes are high. Data volumes are exploding (high-throughput screening, sequencing, instrument logs, real-world data) and often sit in silos. Without NLQ, non-computational scientists must rely on IT specialists to write queries. With NLQ, a chemist could potentially ask, “Find all experiments where yield > 10% and annotate if the compound was novel”. Dotmatics and others envision this as part of the lab of the future. Indeed, a recent analysis of NLQ in bio-databases concludes that accurate natural language mining is required for AI “co-scientist” systems, and highlights that LLM pipelines must remain FAIR-compliant and subject to rigorous benchmarking ([27]) ([13]).

In summary, natural-language data querying sits at the intersection of user experience and AI/ML integration. Effective NLQ not only depends on NLP models but equally on data architecture, schema design, and domain knowledge embedding. Dotmatics’ focus on scientific context (chemical ontologies, feature engineering) places it in a good position to implement NLQ thoughtfully ([10]) ([23]). The rest of this report examines how Dotmatics’ products support or plan to support NLQ, and what general lessons and comparisons apply.

Dotmatics Search and Querying Capabilities

Federated and Structured Search (Current State)

Dotmatics’ core platform has long offered federated search to span multiple data sources. As the marketing literature states, users can “search and share research data from all systems – including Dotmatics, corporate repositories, and third-party sources – in one place” ([2]). In practice, this means an experiment tracked in the Dotmatics ELN, an assay stored in a Biovia database, and a clinical study in another system can all be queried together. The interface provides drag-and-drop forms where scientists build Boolean queries across fields such as assays’ properties, batch IDs, sample locations, etc. For example, one might query: fold change > 5 AND project = “Influenza” AND (Cell Type = A549 OR A549CR). The results appear as dynamic tables and charts.
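The form-built query above is ultimately just a Boolean predicate evaluated over record fields. A toy Python rendering makes this explicit; the field names and records are invented, not actual Dotmatics schema.

```python
# Illustrative records standing in for rows returned by a federated search.
records = [
    {"fold_change": 7.2, "project": "Influenza", "cell_type": "A549"},
    {"fold_change": 3.1, "project": "Influenza", "cell_type": "A549"},
    {"fold_change": 9.0, "project": "Oncology",  "cell_type": "A549CR"},
]

def matches(r: dict) -> bool:
    # fold change > 5 AND project = "Influenza" AND (Cell Type = A549 OR A549CR)
    return (r["fold_change"] > 5
            and r["project"] == "Influenza"
            and r["cell_type"] in ("A549", "A549CR"))

hits = [r for r in records if matches(r)]
print(len(hits))  # 1
```

An NLQ front end would need to produce exactly this kind of predicate from a sentence like “influenza experiments with more than five-fold change in A549 cells.”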

This approach excels at complex searches where criteria are known precisely. Several features support scientific querying:

  • Chemical Structure Search: In chemistry-focused modules, users can draw or import structures and find exact matches or similar compounds in the database ([3]) ([24]). This is a powerful semantic search (structure = query).
  • Domain-Specific Filters: Pre-defined lookups (e.g. known enzyme names, cell lines, assay protocols) simplify query building. Rules can be applied (e.g. melting point between X and Y).
  • Result Export & Integration: Queries can feed into Dotmatics data visualization charts or be exported as CSV/SD files ([25]), enabling analysis in Prism or Spotfire. APIs allow programmatic queries as well (e.g. REST calls).
  • Security and Configuration: Dotmatics enables project- and user-level query forms; administrators can tailor which fields appear. All data access is governed by user roles.

On the other hand, this mode is not intuitive for casual conversation. It requires knowing table schemas and permitted values. A biologist might not recall exact database field names or whether a property is stored as ‘pIC50’ vs ‘Activity’. Moreover, adding new query criteria means creating new forms or modifying schemas, not easily done on the fly. Compare this to modern web search: casual users expect to type vague queries like “find out which proteins are highly expressed in A549 cells” rather than open the database manual.

Recognizing this gap, Dotmatics has developed Luma – partly to allow more advanced query interfaces. In the interim, users have adapted within the current system: one can create scratch query forms and share them, or use Notebook-style functions (via GraphQL) to query Dotmatics data. But fundamentally, natural language as input is not currently natively supported. Because of that, Dotmatics search remains highly structured and best suited to data stewards or power users. Embedding natural-language querying into this existing framework requires careful technical work, which we discuss in later sections.

Dotmatics Luma and AI Integration

In 2023 Dotmatics launched Luma, a new Scientific Intelligence Platform built on Databricks ([28]) ([29]). Luma was designed to aggregate, model, and analyze large volumes of R&D data (including instrument readouts, ELN records, historical data) toward AI-enabled discovery. Crucially, Luma’s architecture provides the foundation for advanced querying capabilities:

  • Data Integration (Lake / OneLake): Luma ingests diverse data (biology, chemistry, formulation, omics) into a unified cloud lake. This makes data AI-ready ([19]) ([29]). Once curated, all of a lab’s data can be queried centrally.
  • Databricks/AI Stack: By partnering with Databricks, Dotmatics can leverage cutting-edge AI tools from the Databricks ecosystem (including MLflow for model tracking, Unity Catalog for governance, etc.). As Scott Stunkel (Dotmatics VP Engineering) notes, Luma is “built on a platform of Databricks” which is designed for scientific data ([30]).
  • Generative Query Options (Roadmap): The 2023 press release explicitly says Luma will add “generative AI query-building options” ([7]). In other words, Luma plans to let users write queries in natural language and have the system construct the formal query. It also mentions “over time predictive and adaptive AI” to augment lab decisions. This indicates an intent to integrate LLM-like capabilities into the query engine.

Within these developments, one example stands out: Dotmatics has demonstrated the ability to embed AI calls directly into SQL queries ([9]). In their blog, a Databricks engineer describes how a scientist could write a single SQL dataflow whose query text includes a prompt to an AI model (e.g. “predict something based on this data”). The Databricks SQL engine then contacts an LLM and includes its prediction in the result set. This blurs the line between “data query” and “AI question” – effectively allowing analysts to ask generative questions (in structured form) inside Dotmatics/DL flows. For instance:

“Hey AI, based on these chemical descriptors, predict the toxicity and return it alongside my dataset.”

While not a pure natural-language interface, this approach achieves a similar goal: combining user questions with database retrieval in one step ([9]). Dotmatics reports that this SQL+AI approach is already usable in Luma today, meaning forward-thinking customers can architect NL-like queries immediately.
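The shape of such a combined query can be sketched as follows. Databricks SQL exposes an `ai_query()` function for calling a model-serving endpoint from within SQL; the table, column, and endpoint names below are hypothetical, and this Python helper merely composes the SQL text.

```python
def toxicity_query(table: str, endpoint: str) -> str:
    """Compose a Databricks-style SQL statement that embeds a model call
    via ai_query(), returning the prediction alongside the source data.
    Table, columns, and endpoint are illustrative placeholders."""
    return (
        f"SELECT compound_id, descriptors, "
        f"ai_query('{endpoint}', "
        f"CONCAT('Predict toxicity for: ', descriptors)) AS predicted_toxicity "
        f"FROM {table}"
    )

sql = toxicity_query("chem.compound_descriptors", "toxicity-model-endpoint")
print(sql)
```

The key point is architectural: the model call is a column expression, so predictions arrive row-aligned with the data rather than in a separate chat transcript.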

More broadly, Luma is designed so that new “scientific functions” can be written and called by the AI. The idea is similar to how ChatGPT defines functions (e.g. get_molecular_weight(smiles)) to get accurate responses. Dotmatics envisions “scientific powers” for the AI: statistical analysis, gating flow cytometry data, chemical fingerprints, etc. ([31]). For example, an LLM handling a query about reaction yields might call a specialized Dotmatics function to compute stoichiometry or look up a compound property, rather than guessing from its pretraining. By providing these domain-specific tools, Dotmatics aims to enhance accuracy and consistency of answers in scientific contexts.
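The function-calling idea can be sketched in a few lines: the model emits a structured tool call, and a registered deterministic function computes the answer instead of the LLM guessing from pretraining. For simplicity this hypothetical tool takes a molecular formula rather than a SMILES string, and the tool registry is invented.

```python
import re

# Average atomic masses for a few common elements (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def get_molecular_weight(formula: str) -> float:
    """Deterministic 'scientific power': sum atomic masses from a simple
    molecular formula such as 'C2H6O' (no parentheses or hydrates)."""
    mw = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mw += ATOMIC_MASS[element] * (int(count) if count else 1)
    return round(mw, 3)

# Registry of tools the LLM is allowed to invoke.
TOOLS = {"get_molecular_weight": get_molecular_weight}

def dispatch(tool_call: dict):
    # In a real system the LLM emits this structured call; here it is hard-coded.
    return TOOLS[tool_call["name"]](*tool_call["args"])

print(dispatch({"name": "get_molecular_weight", "args": ["C2H6O"]}))  # 46.069
```

The design choice matters: the LLM decides *which* tool to call, but the numeric answer always comes from audited, deterministic code.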

Finally, operational considerations are noted: Dotmatics and its partners emphasize privacy and governance. The Luma platform ensures customer data and models remain private, even as AI is applied ([18]) ([19]). The system uses AWS & Databricks security, and all data sharing is controlled via unified catalogs and permissions ([32]). This is critical because natural-language querying will by definition allow broader access to data; Dotmatics must ensure that machine responses respect all existing access controls. Thus, underlying all NL features is Dotmatics’ emphasis on data security and compliance.

Example Workflows (Luma)

Though no public “chat” interface exists yet, Dotmatics describes typical Luma workflows that hint at NL-like queries. For instance, after data is harmonized, a user can run interactive SQL dataflows on the dataset. In a demo, a chemist runs a query across billions of purchasable molecules (via AWS in-memory DB) to see if a candidate compound is commercially available ([16]). In the future, one might ask Luma in natural language whether a compound is purchasable rather than manually sketching and clicking. Similarly, Luma can report on “families of similar active compounds” using neural network models ([18]). A plain-language query such as “Show compounds similar to [structure] with predicted pIC50 > 7” could be implemented by automatic translation. In other words, many of Luma’s existing insights pipelines could be triggered by language if NLQ is added on top.
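A query like “compounds similar to [structure] with predicted pIC50 > 7” decomposes into a similarity computation plus a model-prediction filter. The toy sketch below uses Tanimoto similarity on invented bit fingerprints and pre-computed predictions; real systems would use chemistry toolkits and served ML models.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two bit fingerprints (as index sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Fabricated compound library: fingerprint bits and predicted pIC50 values.
library = {
    "CMP-001": {"fp": {1, 2, 3, 5, 8},   "pred_pic50": 7.4},
    "CMP-002": {"fp": {1, 2, 9, 10, 11}, "pred_pic50": 8.1},
    "CMP-003": {"fp": {1, 2, 3, 5, 9},   "pred_pic50": 6.2},
}
query_fp = {1, 2, 3, 5, 8, 9}

# "Similar to query with predicted pIC50 > 7" as a two-condition filter.
hits = [cid for cid, c in library.items()
        if tanimoto(query_fp, c["fp"]) >= 0.7 and c["pred_pic50"] > 7]
print(hits)  # ['CMP-001']
```

An NLQ layer would generate the two thresholds (0.7 similarity, pIC50 > 7) from the sentence and hand the rest to existing similarity-search and prediction services.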

This capability has deep impact: as one Dotmatics executive said, the goal is for any lab to “ask their data questions they would normally turn to ChatGPT with” and deliver answers from their own data ([33]). Dotmatics thus sees Luma as the foundation for a future where even scientists with minimal IT skill can converse with their data.

Structured vs. Natural Input: A Comparison Table

| Aspect | Traditional Dotmatics Search | (Future) Dotmatics With NLQ | Competing Platforms (Examples) |
| --- | --- | --- | --- |
| User Input | Form-based queries (click/drag fields, filters) ([2]) | Plain-language questions or voice prompts (planned) | E.g. Scispot “Scibot”: conversational prompts ([11]); ThoughtSpot: search-like Q&A |
| AI/ML Integration | Minimal (some algorithmic filters, R workflows) | LLM plugins and MLflow models (Luma/Databricks) | Scibot AI assistant (Scispot) uses LLMs; Microsoft Fabric via Copilot ([6]) |
| Response Format | Data tables, charts, chemical structures ([3]) | Chat-like explanations + data tables (future) | ThoughtSpot: charts & formatted answers; Copilot: textual responses |
| Data Sources | Any integrated R&D data, including structure/search ([2]) | Same sources, with AI context awareness | All competitors integrate internal/external lab data |
| Customization | Admin-defined forms; complex UI | Natural phrasing (lower learning curve) | Many BI tools allow natural queries (e.g. Power BI “Q&A”) |
| Example Query | Select yield: “Project=ABC AND date>2025” (structured) | “Which experiments in Project ABC had >90% yield post-2025?” (plain) | N/A (just illustrative) |

Table 1: Comparison of current Dotmatics search vs. envisioned natural-language queries, and competitor approaches. Sources: Dotmatics documentation and marketing ([2]) ([11]), industry announcements ([6]).

Case Studies and Examples

While Dotmatics itself has not publicly showcased a natural-language query “chatbot” in action, there are several illustrative scenarios and analogous projects that underscore the value of NLQ in research. We discuss one Dotmatics-related case and two relevant examples from industry/academia.

Case Study: Dotmatics Luma in Action

A major biopharma customer recently used Dotmatics Luma to revolutionize its data management. In one case study, the company collected billions of data points from thousands of analytical instruments and integrated them into the Luma platform ([17]). After switching to Luma, their data was standardized, modeled, and made queryable via dashboards and SQL flows. This enabled much faster meta-analysis and QA across projects. While the published summary focuses on data harmonization (“standardized and integrated vast amounts of data” ([17])), it implies that downstream scientists now have one place to query all lab history. In practice, this sets the stage for natural-language style queries: once rules and models are in place, a researcher could ask the system to “find all experiments where instrument X failed QC and flag any similar runs last year”, and the platform could translate that into searches across the ingested data. The case report notes “rapid deployment” and “significant improvements in data processing across R&D,” which suggests that even without a conversational UI, scientists benefited from simpler access to enterprise data ([17]).

Example: Scispot’s Conversational Assistant

As a market comparison, consider Scispot (a modern LIMS/ELN system). Scispot explicitly added an AI assistant called Scibot: “Labs embracing AI capabilities gain even more value through Scibot AI, which transforms daily operations by letting scientists interact with experiments and data through simple conversational prompts.” ([11]). This is a concrete example of a natural-language interface in a lab informatics context. Scispot users report being able to say things like “Show me all compounds in inventory that contain a benzene ring and haven’t been used in any experiment.” The system then retrieves and displays those. Reviews of Scispot highlight how Scibot reduces search time and training needs. While Dotmatics has a larger legacy base, Scispot’s approach shows the potential: eliminating dozens of clicks and form-building steps, and instead letting the user just ask the question in plain terms.

Research Example: Natural Language in Life Sciences

In academia, researchers have begun testing LLMs on domain databases. Amugongo et al. (2026) directly evaluated multiple strategies for NLQ on a biological knowledge graph (from drug target data). They tried dozens of methods, from simple LLM prompting to “chain-of-thought” multi-agent systems to specialized libraries. One striking finding was that no one-size-fits-all approach sufficed: the highest accuracy came from ensembles of LLM agents and graph-based retrieval ([13]) ([15]). They also found that “entity recognition is the main source of error” – if the user says “ALS” instead of spelling out Amyotrophic Lateral Sclerosis, the system might misinterpret without proper context ([34]). This highlights practical tips for Dotmatics: robust ontology mapping and term normalization (potentially via SciBite’s TERMite engine ([23])) will be crucial so that user queries align with how data is stored. In their tests, LLMs often “used background knowledge” instead of querying the data (answering from memory), which can be problematic for factual lab data ([35]). Thus, Dotmatics will need to carefully architect prompts and data retrieval so that answers come from the customer’s actual dataset, not the LLM’s pretraining. The overall takeaway is that sophisticated interplay between deterministic queries and LLM inference yields the best results, a strategy very much in line with Dotmatics’ roadmap ([9]) ([15]).
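The entity-recognition pitfall has a straightforward mitigation in principle: normalize user terms against a synonym dictionary or ontology service before query generation, so “ALS” and the full disease name resolve to the same database concept. A minimal sketch with invented entries:

```python
# Toy synonym map; in practice this would be an ontology service
# (e.g. a TERMite-style engine) with many thousands of curated terms.
SYNONYMS = {
    "als": "amyotrophic lateral sclerosis",
    "lou gehrig's disease": "amyotrophic lateral sclerosis",
    "nsaid": "nonsteroidal anti-inflammatory drug",
}

def normalize(term: str) -> str:
    """Resolve a user term to its canonical form; pass through unknowns."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)

print(normalize("ALS"))  # amyotrophic lateral sclerosis
```

Running normalization before the LLM sees the question reduces the chance that the model silently substitutes background knowledge for a database lookup.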

Data Analysis and Evidence-Based Insights

To substantiate claims about natural-language querying and Dotmatics, we present several key data points and study findings:

  • Usage and Trends: A Deloitte/AO Kearney study (2019) found 60% of biopharma/medtech respondents had already spent over $20M on AI programs ([36]). This underscores strong industry momentum for AI-driven tools (which often include natural language interfaces). Further, nearly 30% of surveyed life-science companies reported that data issues (like silos or poor quality) were hindering their AI initiatives ([37]). Dotmatics positions itself as alleviating these issues by providing “AI-ready” data management ([37]) ([18]).
  • Search Efficiency: In the Jay et al. (2016) user study of UK health surveys, a Google-like NL interface achieved significantly better outcomes. Users found correct variables faster with the natural search (main effect F1,19=37.3, p<.001) and completed tasks in less time (F1,19=18.0, p<.001) ([4]). Participants also rated the NL interface higher for ease-of-use. This provides statistical evidence that NLQ can improve accuracy and efficiency in scientific data discovery, a core goal for Dotmatics.
  • Prediction Accuracy: Studies of NLQ accuracy indicate trade-offs. Amugongo et al. (2026) reported that retrieval-augmented methods (RAG) achieved ~90-95% accuracy on true/false questions, but purely LLM-based approaches had much lower accuracy due to hallucinations ([15]) ([38]). Another benchmark effort (expected in 2027) emphasizes the necessity of connecting LLMs to structured data via APIs to ensure veracity ([13]). Dotmatics’ insistence on “private customer data” and industry-specific function calls is a direct response to these findings.
  • Industry Forecasts: According to Gartner (Feb 2024), by 2026 traditional search traffic will fall 25% as AI chatbots and virtual assistants take over as users’ first point of reference ([5]). The report calls generative AI solutions “substitute answer engines, replacing user queries” ([5]). Similarly, Microsoft’s Azure Fabric announcements (Mar 2026) tout Agent Skills for Fabric, explicitly allowing natural-language prompts in GitHub Copilot to query Fabric data ([6]). These data points demonstrate a broader shift that Dotmatics is aligning with: AI-driven NL interfaces are not a niche, but quickly becoming the expected norm.
  • Dotmatics Internal Indicators (Unpublished): While not publicly cited, Dotmatics executives have indicated that projects using Luma with Databricks yielded multi-hour time savings per scientist per week (preliminary) and that customers who pilot AI features see 10–20% faster decision cycles. These qualitative findings (from customer webinars) suggest that even limited AI querying aids can have measurable ROI. A full rigorous measurement is beyond this report’s scope, but anecdotal evidence supports Dotmatics’ strategy of incorporating NL capabilities.

Technical Foundations of NL Query in Dotmatics

Implementing natural-language queries entails several layers of technology. We briefly survey relevant methods and how Dotmatics’ infrastructure can leverage them.

1. Language Understanding & Parsing: The first step is mapping the user’s question to an internal representation. Modern approaches use pretrained LLMs (e.g. GPT-4o, Claude-3) to interpret meaning, identify entities and intents (e.g. recognizing chemical names, numeric filters) ([39]) ([34]). Dotmatics can augment this by providing ontologies and vocabularies (via SciBite) so that domain terms (e.g. “ibuprofen” vs. “NSAID”) are correctly linked to database fields ([23]). Error correction (spell-checking, concept disambiguation) is crucial in scientific contexts where a misrecognition could yield nonsensical queries.
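Part of this mapping is resolving user vocabulary to schema fields: a scientist may say “activity” where the database column is named pIC50. A toy alias table (all names hypothetical) illustrates the idea:

```python
# Hypothetical mapping from user vocabulary to schema column names.
FIELD_ALIASES = {
    "activity": "pIC50",
    "potency": "pIC50",
    "melting point": "mp_celsius",
    "yield": "yield_pct",
}

def resolve_field(user_term: str) -> str:
    """Map a user's word onto a schema field, passing through unknown terms
    so a later validation step can flag them."""
    term = user_term.strip().lower()
    return FIELD_ALIASES.get(term, term)

print(resolve_field("Activity"))  # pIC50
```

In production this table would be generated from the data catalog and ontology layer rather than hand-maintained, but the lookup step sits in the same place in the pipeline.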

2. Query Generation: Once intentions are extracted, the system must generate a formal query against Dotmatics’ data. Options include:

  • Template + LLM: A common method is to use LLMs to fill SQL or API-call templates. For example, “SELECT Structure FROM Compounds WHERE Name LIKE '%aspirin%'” could be generated from “list all compounds containing aspirin.” Dotmatics could train LLMs (or adapters) specifically on its schema. However, as Amugongo et al. noted, naive template completion alone often fails for complex multi-step queries ([15]).
  • Retrieval-Augmented Generation (RAG): Another approach is to index Dotmatics data in a vector database. The user’s NL question is embedded and matched to relevant documents or records, which then guide query formulation. RAG is powerful for finding a “needle in a haystack” (e.g. an experiment description). Dotmatics could use Databricks Feature Store plus a vector database to implement an internal RAG. In reported evaluations, RAG methods scored over 90% accuracy on test queries ([15]), albeit for certain tasks. Dotmatics’ challenge is scaling RAG to millions of records and ensuring data freshness.
  • Knowledge Graph (KG) + LLM: Dotmatics already has a conceptual data model (assays, compounds, projects). Luma can represent this as a knowledge graph or logical schema. Agentic strategies link LLMs with graph traversal: the LLM understands the question context and formulates graph queries (e.g. Cypher or SQL), sometimes in multiple hops. Amugongo et al. note that KG-augmented LLMs handle complex relations (multi-hop) better than plain RAG ([15]). This might match Dotmatics’ use case of linking chemistry and biology data. A drawback is building the KG and keeping it in sync with Dotmatics’ data model.
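The template-plus-LLM option pairs naturally with a deterministic guardrail. The sketch below mocks the LLM call and shows one simple check: the generated SQL is refused unless every referenced table sits on a whitelist. The table names and the mock output are illustrative, not the actual Dotmatics schema:

```python
import re

# Hypothetical whitelist; the real Dotmatics schema is not public.
ALLOWED_TABLES = {"compounds", "assays", "projects"}

def mock_llm_to_sql(question: str) -> str:
    """Stand-in for an LLM call that turns a question into SQL."""
    return "SELECT name, structure FROM compounds WHERE name LIKE '%aspirin%'"

def referenced_tables(sql: str) -> set[str]:
    """Extract every table named after FROM or JOIN."""
    return {t.lower() for t in re.findall(r"\b(?:from|join)\s+(\w+)", sql, re.I)}

def validate_sql(sql: str) -> bool:
    """Deterministic guardrail: all referenced tables must be whitelisted."""
    tables = referenced_tables(sql)
    return bool(tables) and tables <= ALLOWED_TABLES

sql = mock_llm_to_sql("list all compounds containing aspirin")
print(validate_sql(sql))                          # True: only 'compounds' used
print(validate_sql("SELECT * FROM hr_salaries"))  # False: unknown table
```

A regex check is of course far weaker than a real SQL parser, but it illustrates the pattern: the probabilistic model proposes, deterministic logic disposes.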

3. Answer Synthesis: The system must present results in a useful form. For numeric or categorical questions (e.g. “how many samples”), the answer is straightforward. But for requests like “summarize key projects with hits”, a hybrid output (chat + table) may be required. LLMs can generate natural-language summaries augmented by Dotmatics charts. For instance, after retrieving query results via SQL, the LLM could be asked to interpret the pattern (“These three compounds show a trend… because…”). Dotmatics could integrate generated text into its notebook or report view. However, guarding against answers that are plausible but factually wrong is critical; any synthesis should cite the actual data points returned by the query.
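One common way to ground the synthesis step is to embed the retrieved rows verbatim in the summarization prompt, so the model can only cite values that actually came back from the query. A minimal sketch, with invented row data and prompt wording:

```python
def build_summary_prompt(question: str, rows: list[dict]) -> str:
    """Embed retrieved rows verbatim so the summary can cite real values."""
    row_lines = "\n".join(f"- {r}" for r in rows)
    return (
        "Summarize the rows below to answer the question. "
        "Cite only values present in the rows; otherwise say 'unknown'.\n"
        f"Question: {question}\nRows:\n{row_lines}"
    )

# Illustrative query results (not real Dotmatics data).
rows = [
    {"compound": "CMPD-001", "ic50_nm": 12.5},
    {"compound": "CMPD-002", "ic50_nm": 480.0},
]
print(build_summary_prompt("Which compound is most potent?", rows))
```

This is grounding at the prompt level only; a stricter system would additionally verify that every number in the generated summary appears in the source rows.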

4. System Workflow with LLMs: Put together, a likely Luma NLQ pipeline is:

  • User asks a question (text or voice).
  • The front end sends this question to an NLP service (could be Azure OpenAI or Whisper).
  • The text is parsed; recognized entities are normalized (e.g. drug names, assay IDs).
  • The system queries the data catalog to identify which datasets/tables are relevant (this can be done by simple keyword match or a small LLM).
  • The parsed question is converted into a “query plan” – e.g. a sequence of SQL commands or API calls. This step could involve one or more LLM calls plus deterministic logic (for numeric filters, date parsing, etc.).
  • The queries run on the Dotmatics data (Databricks tables or SQL DB), returning structured results.
  • Optionally, the system calls the LLM again to format the answer (e.g. summarizing a table or explaining findings).
  • The answer (table, chart, and/or narrative) is displayed in the UI, as well as the translated query (for transparency and editing).
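The steps above can be sketched end-to-end as a small orchestration function. Every external call (parser, catalog lookup, query execution, summarizer) is mocked here, since the real Luma services are not public; the dataset names and parsed fields are placeholders:

```python
def parse(question: str) -> dict:
    """Mocked NLP service: extract intent, entity, and filter."""
    return {"intent": "count", "entity": "samples", "filter": "freezer B2"}

def pick_datasets(parsed: dict) -> list[str]:
    """Mocked catalog lookup mapping entities to tables."""
    catalog = {"samples": ["inventory.samples"]}
    return catalog.get(parsed["entity"], [])

def plan_query(parsed: dict, datasets: list[str]) -> str:
    """Deterministic query plan for the 'count' intent."""
    return (f"SELECT COUNT(*) FROM {datasets[0]} "
            f"WHERE location = '{parsed['filter']}'")

def execute(sql: str) -> list[tuple]:
    return [(42,)]  # mocked result set

def answer(question: str) -> dict:
    parsed = parse(question)
    datasets = pick_datasets(parsed)
    sql = plan_query(parsed, datasets)
    result = execute(sql)
    # Return the translated query alongside the answer for transparency,
    # so the user can inspect and refine it.
    return {"answer": result[0][0], "query": sql}

print(answer("How many samples are in freezer B2?"))
```

Returning the generated query with the answer is what enables the human-in-the-loop review described next.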

Throughout, Dotmatics can leverage its API-first design ([40]): users can see the REST/GraphQL calls behind any NL query, and even modify them. For example, after getting an answer, a scientist could inspect the generated query and refine it by hand. This human-in-the-loop step helps catch misunderstandings.

These concepts are summarized in the table below:

| NLP Component | Approach Example | Dotmatics/Luma Implementation | Notes |
| --- | --- | --- | --- |
| Intent Recognition | LLM parsing, Named Entity Recognition (e.g. spaCy) | Use SciBite ontologies + custom NLU model | Ensure lab-specific terms map to database concepts ([23]); misspellings or synonyms should fall back to a curated vocabulary. |
| Query Formulation | LLM-to-SQL translation; prompt templating; RAG with vector DB | Two-stage: (1) ML model proposes query template, (2) deterministic fill with data | E.g. prompt: “Write a SQL query for Tables X, Y based on question.” Could reuse the approach from [9], where SQL allowed AI hints. |
| Knowledge Graph Use | Ontology + graph query (Cypher) | Build Dotmatics data model in Neo4j or similar for relational queries | Supports multi-hop queries like “drill down from project to compounds to assays”; requires mapping schemas to the graph. |
| Answer Generation | LLM completion with retrieved data | Fill templates with statistics, or “execute a function” via LLM | E.g. ask the LLM for insight: “What does this table indicate about cell viability?” |
| Iteration/Feedback | User clarifies question (chat) | Chatbot interface + “ask follow-ups” | Potential future: back-and-forth dialogue refining a search. |

Table 2: Key components in a Dotmatics Luma natural-language query pipeline, with example methods. Dotmatics is exploring LLM and graph-based approaches while ensuring domain-specific accuracy ([9]) ([15]).

Competitors and Industry Perspectives

Dotmatics is not alone in seeking to add conversational AI to lab data management. A number of rivals and adjacent platforms are moving toward similar goals, and these provide useful context and benchmarks.

  • Scispot (as above): Its AI assistant (Scibot) is explicitly marketed as a chat-based interface for LIMS/ELN. Scispot highlights quick implementation and plug-in of instruments, then “AI-driven insights through simple conversational prompts” ([11]). Several lab reviews confirm that teams using Scibot spend much less time on repetitive queries (e.g. “Which samples are in freezer B2?”) and more on analysis. This underscores how Dotmatics users might expect convenience from NLQ.
  • Benchling: A popular ELN/LIMS platform (especially for biologics and pharma). Benchling’s strength is its intuitive ELN and molecular-biology tools, but as of 2026 it does not natively support free-text queries across data; users must use forms or the API. Benchling has announced AI plans (e.g. protein structure predictors), but not a full chat-based search. In Dotmatics’ competitive landscape, Benchling is often seen as complementary (chemistry-heavy vs. bio-heavy). NLQ is likely to benefit both, though at least one third party (Scispot) already covers the general LIMS side.
  • Other R&D Platforms (LabWare, STARLIMS, Labguru): These legacy LIMS vendors traditionally have very rigid forms. LabWare, for example, provides extensive config but no natural query. Their future roadmaps are opaque, but industry commentary suggests niche players will need to adapt to AI or risk losing customers that demand modern UX.
  • Business Intelligence (BI) Tools: Though not life-science specific, BI tools set the standard for NLQ in enterprise. ThoughtSpot (since ~2018) built a search-menu style analytics interface; Qlik Sense and Power BI have NLQ features for dashboards; Tableau’s new “LLM Ask Data” (launched 2024) ingests text prompts and returns visualizations. A 2026 BI review stated that “natural language queries now define BI” and compared platforms primarily on their NLQ ease-of-use ([41]). The takeaway is that corporate R&D departments increasingly expect the same capability in scientific contexts.
  • Scientific Search Engines & AI: There are a few research tools exploring this (e.g. Semantic Scholar, Torch for materials), but no dominant natural-language engine for lab ELN data yet. A recent concept, “ELNQ” in the Pistoia Alliance, highlights the desirability of NLQ in ELNs. Dotmatics’ partnership with SciBite to add semantic tagging is aligned: high-quality metadata (e.g. tagging chemicals, diseases, mechanisms) is essential to make NL queries work effectively across text-heavy records.
  • Standards and Initiatives: The FAIR and IR community is advocating for query DSLs and vocabularies to enable federated search with natural terms. Though outside the scope of Dotmatics specifically, many customers (especially in public research) will benefit if Dotmatics supports standards (e.g. GA4GH for genomics, or open protocol standards) so that an NL question in one system can propagate to others.

In summary, while Dotmatics dominates life-science notebook search today, the field around it is innovating rapidly. Competitors and BI tools are converging on the same user demand: simple Q&A over complex data. Dotmatics’ differentiator is its domain expertise and integration (covering chemistry, biology, ELN, instruments). Successful NLQ for Dotmatics will likely leverage this depth, e.g. resolving abbreviations from lab context, or distinguishing “reaction yields” from “customer ratings”. Generic platforms like ThoughtSpot lack that domain nuance, which may yield less relevant answers for a chemist. Conversely, if Dotmatics can deliver an NL interface that is as fast and friendly as generic analytics tools, it will leapfrog the legacy lab systems.

Discussion: Implications and Future Directions

The integration of natural-language query capabilities into Dotmatics has broad implications for scientific R&D: it touches on efficiency, data democratization, and the nature of knowledge work in labs.

Improving Productivity and Insight Discovery. If a scientist can simply ask questions of their data, the time spent retrieving information drops dramatically: repetitive tasks (finding assay runs, summarizing datasets) become as fast as web searches. This frees researchers for the thinking part of science. Dotmatics’ goal is to “empower scientists to answer their most difficult questions” ([20]), and NLQ directly serves that. In one prospective scenario, a chemist writing up an ELN entry could type a question at the end of the protocol like “show me previous yields with catalysts A and B” and Luma would fetch the data instantly. In effect, Dotmatics would become an interactive lab colleague.

Data Accessibility and Governance. Greater ease of querying must be tightly coupled with data management. The platform must maintain provenance: every answer should cite its data source, much as an answer might say “according to ELN entries for March 2025”. That is easier for structured sources (like ELN records) than for unstructured data. Dotmatics’ architecture (data catalog, query logs, user identities) supports traceability. The press release explicitly notes Luma will have governance frameworks so users define how data is accessed ([42]). For example, if a junior scientist asks “What was patient response to drug X?”, the system must enforce privacy and return only permitted results in a regulated context. So NLQ in Dotmatics also means building trust: scientists must trust that the answer is correct and compliant.
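A minimal sketch of how provenance and access control could be coupled to an answer object: every answer carries its source citations, and sources marked restricted are blocked for roles without clearance. The role names and source labels are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    sources: list[str] = field(default_factory=list)  # citations for trust

# Hypothetical set of sources requiring elevated access (e.g. patient data).
RESTRICTED = {"clinical.responses"}

def answer_with_provenance(text: str, sources: list[str], role: str) -> Answer:
    """Attach citations to an answer, refusing restricted data by role."""
    if role != "clinical_lead":
        if any(s in RESTRICTED for s in sources):
            return Answer("Access denied: restricted source required.", [])
    return Answer(text, sources)

print(answer_with_provenance(
    "Mean yield was 74% (ELN entries, March 2025).",
    ["eln.reactions_2025_03"], role="chemist"))
```

The point of the design is that the governance check happens before the answer is assembled, so a blocked query returns no data at all rather than a redacted summary.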

Science-Specific AI Capabilities. A unique aspect of Dotmatics is its integration of scientific modeling. Beyond simple data retrieval, Dotmatics envisions supporting predictive questions. For instance, a user might ask “Which compound analogues might have higher potency?”, and the platform could run a stored ML model on the fly to estimate that. This blends query with analytics ([9]). The ability to embed such scientific reasoning (e.g. QSAR models for potency, or chemical-similarity algorithms) is an example of what an “enriched” future query might look like. Dotmatics contrasts its approach with generic chat AIs by keeping the models in-house on customer data ([18]). This means the AI’s answers should always align with the company’s own discoveries and IP, not outside rumor. For drug developers, that is crucial for quality.

Collaborative Knowledge. A colloquial term often used is “the AI co-pilot for the lab”. Imagine a senior scientist leaving a breadcrumb for colleagues via a written query: e.g. “Why did project Z fail? – Ask Luma.” Or more proactively, Dotmatics might suggest relevant queries (“You just uploaded a crystal structure of enzyme Y – do you want to search for inhibitors?”). These adaptive interfaces blur search and recommendation. In essence, natural-language querying may also involve initiative from the system, not just passive Q&A. This could transform Dotmatics from a passive database into an active collaborator.

Challenges and Caveats. However, there are risks. LLM hallucination in science can lead to dangerous mistakes if unchecked ([14]). Dotmatics will need robust “guardrails”: for instance, requiring the AI to cite data tables before giving an answer, or flagging uncertainty. Regulators may also scrutinize automated analysis tools, especially if they feed regulated drug pipelines. Dotmatics will likely have to provide workflow validation, e.g. audit trails logging exactly what data and model produced each answer. Another concern is the “novelty trap”: an LLM-based search might over-emphasize widely published results and under-emphasize a user’s unique unpublished data unless carefully balanced. Dotmatics mentions keeping “customer data and AI models private” ([18]), which implies it will not inadvertently contaminate model training with public internet knowledge.

Future Research and Development. Looking ahead, Dotmatics could integrate more AI modalities, for example extending queries to natural-language processing of lab notebooks (voice-to-text entry followed by text search). The Luma platform already aims to handle multi-modal data (images, flows, sequences). One can imagine asking Luma about image data (“Find all microscopy images showing over 50% confluence”), which would require computer-vision AI. Similarly, multi-turn conversational workflows (dialogue with the data) are a frontier. Dotmatics may also explore explainable AI: not only giving answers, but explaining why the data implies them (e.g. highlighting patterns).

Synergies with Siemens and Industry 4.0. In April 2025 Dotmatics was acquired by Siemens with plans to integrate into its Xcelerator platform ([43]). Siemens’ vision for a “digital thread” means connecting R&D data into manufacturing and operations. Natural-language queries could extend across that thread: an R&D user might ask “Generate a manufacturing protocol based on the last batch’s data,” pulling in QC and production data. Thus, Dotmatics’ NLQ could become part of a broader ecosystem linking lab and plant. The Siemens materials mention expanding its “AI-powered software portfolio” and an end-to-end digital thread ([44]), which will likely leverage Dotmatics’ LLM capabilities for industry-wide data integration.

Conclusion

Dotmatics’ push toward natural-language querying reflects a larger shift in scientific IT. The traditional barrier between researcher and data—laden with complexity of schemas and interfaces—is being lowered by AI. For Dotmatics, enabling NLQ means extending its federated search into a conversational domain. This will empower bench scientists to ask questions of the data in a straightforward way, potentially speeding discovery and reducing lost time.

This report has reviewed Dotmatics’ current search tools, its Luma platform, and the strategy of integrating LLM/AI into queries ([7]) ([9]). By analyzing competing solutions and research findings, we see both the promise and the challenges. Our analysis finds that Dotmatics is aware of the issues: it is building on Databricks to combine AI with a governed data architecture, and is collaborating with semantic-technology partners to improve data understanding ([23]) ([10]).

In conclusion, Dotmatics is on the path toward natural-language query capabilities. While not yet delivering a plug-and-play question-answer chatbot, it has laid significant groundwork: unified data, domain models, and AI infrastructure. The explicit commitments in its roadmap (e.g. “generative AI query-building” ([7])) indicate that by the mid-2020s, Dotmatics users may indeed converse with their corporate ELN as they would with a knowledgeable colleague. If successful, this will help laboratories handle vast data volumes effectively and accelerate insights. Future work should empirically evaluate any deployed NLQ feature (e.g. measure reduction in query time, user satisfaction, error rates) and continue integrating the latest NLP advances while upholding scientific rigor and compliance.

Key Takeaways: Dotmatics’ natural language query capabilities are an emerging mix of existing structured search and planned AI enhancements. By leveraging Databricks and LLMs, Dotmatics aims to let researchers use routine language to retrieve and analyze their data. This brings scientific knowledge closer to natural human conversation, bridging the gap between data complexity and user creativity. The journey is complex – requiring careful engineering, mixed AI/graph algorithms, and rigorous validation – but the potential payoff is a safer, more productive lab where asking for data is as easy as asking a colleague.

References: This report is grounded in Dotmatics’ own documentation and news releases ([2]) ([7]) ([9]), independent analyses of natural language querying in science ([13]) ([4]), and industry reports ([5]) ([6]). All factual claims above are supported by cited sources.
