ComptoxAI is a new data infrastructure and toolkit designed to enable computational and artificial intelligence (AI) research in predictive toxicology. It is a free, public, and open-source resource developed by the Romano Lab (University of Pennsylvania) with contributions from the US EPA, and is supported by NIH grant funding.
At its core, ComptoxAI features a large graph-formatted knowledge base, initially implemented in Neo4j, that aggregates and describes entities and relationships relevant to computational toxicology. The knowledge base is multimodal and integrates data from a diverse array of public third-party databases, including AOP-DB, DSSTox, DrugBank, Hetionet, PubChem, Reactome, and Tox21.
The platform provides diverse classes of users, including biomedical researchers, public health and regulatory officials, and the general public, with multiple interfaces for access and analysis. Users can access the knowledge base via:
- A Python package for programmatic access and data analysis.
- A REST web API for integration into other applications.
- A browser-based graphical interface for simplified data query and visualization.
- Direct access via the Cypher query language to a public copy of the Neo4j database.
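
As a rough illustration of the Cypher route, the sketch below builds a parameterized shortest-path query as a plain string. The node labels `Chemical` and `Disease`, the `commonName` property, the parameter values, and the helper `build_shortest_path_query` are all assumptions made for illustration, not the documented ComptoxAI schema or API; check the actual graph schema before adapting it.

```python
def build_shortest_path_query(max_hops: int = 4) -> str:
    """Return a parameterized Cypher shortest-path query.

    The labels (Chemical, Disease) and the commonName property are
    illustrative assumptions, not the confirmed ComptoxAI schema.
    """
    if not isinstance(max_hops, int) or max_hops < 1:
        raise ValueError("max_hops must be a positive integer")
    # Cypher does not allow a parameter for the variable-length bound,
    # so the validated integer is formatted into the query text.
    return (
        "MATCH (c:Chemical {commonName: $chemical}), "
        "(d:Disease {commonName: $disease}), "
        f"p = shortestPath((c)-[*..{max_hops}]-(d)) "
        "RETURN p"
    )


query = build_shortest_path_query(max_hops=3)
# Placeholder values; they would be passed alongside the query text
# by whichever Neo4j client is used.
params = {"chemical": "example chemical", "disease": "example disease"}
```

Keeping the chemical and disease names as query parameters rather than formatting them into the string avoids quoting problems and lets the database reuse the query plan.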
Key capabilities demonstrated by ComptoxAI include a “shortest path” module that identifies mechanistic links between chemical exposure and disease, an “expand network” module that identifies communities linked to toxicity, and a quantitative structure–activity relationship (QSAR) dataset generator. The goal is to rapidly answer complex toxicology questions that were infeasible to address with previous technologies and data resources.
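
To make the “shortest path” and “expand network” ideas concrete, here is a minimal sketch on a toy in-memory graph. It is not ComptoxAI's implementation: the node names are invented placeholders, the functions `build_adjacency`, `shortest_path`, and `expand_network` are hypothetical, and the expansion shown is a plain k-hop neighborhood, which is a simplification of community identification.

```python
from collections import deque

# Toy undirected graph; node names are invented placeholders,
# not data from ComptoxAI.
EDGES = [
    ("chemical_A", "gene_X"),
    ("gene_X", "pathway_P"),
    ("pathway_P", "disease_D"),
    ("chemical_A", "assay_T"),
    ("gene_Y", "disease_D"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns a list of nodes or None."""
    if source not in adj or target not in adj:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in sorted(adj[node]):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


def expand_network(adj, seed, hops=1):
    """Return every node within `hops` edges of `seed`, seed included."""
    seen = {seed}
    frontier = {seed}
    for _ in range(hops):
        frontier = {n for f in frontier for n in adj.get(f, ())} - seen
        seen |= frontier
    return seen


ADJ = build_adjacency(EDGES)
path = shortest_path(ADJ, "chemical_A", "disease_D")
neighborhood = expand_network(ADJ, "chemical_A", hops=1)
```

In this toy graph the only route from `chemical_A` to `disease_D` runs through `gene_X` and `pathway_P`, which mirrors the kind of chemical-to-disease mechanistic chain the module is described as surfacing.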