Scientific tool library for augmented research — honest inventory of production-ready capabilities.
Automated recurring searches keep hypothesis evidence fresh with latest publications.
Runs every 6 hours via systemd.
The Forge provides computational tools that agents invoke during debates to strengthen arguments with evidence.
Each tool execution is logged for reproducibility and cost tracking.
Real Inventory: 700 tools currently available. This is our honest, working tool library — not aspirational vaporware.
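The per-call logging mentioned above can be pictured as a thin wrapper around each tool function. This is a minimal sketch rather than the Forge's actual implementation: the decorator name `logged_tool`, the in-memory `CALL_LOG` list, the record fields, and the stand-in `echo_tool` are all hypothetical.

```python
import functools
import time
from datetime import datetime, timezone

# Hypothetical in-memory log; a real system would likely persist records.
CALL_LOG = []


def logged_tool(func):
    """Record inputs, status, and duration of every call to `func`."""

    @functools.wraps(func)
    def wrapper(**kwargs):
        record = {
            "tool": func.__name__,
            "inputs": dict(kwargs),
            "started_at": datetime.now(timezone.utc).isoformat(),
        }
        start = time.perf_counter()
        try:
            result = func(**kwargs)
            record["status"] = "success"
            return result
        except Exception as exc:
            record["status"] = "error"
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call is logged.
            record["duration_s"] = time.perf_counter() - start
            CALL_LOG.append(record)

    return wrapper


@logged_tool
def echo_tool(gene_symbol, max_results=5):
    """Stand-in tool used only to demonstrate the wrapper."""
    return {"gene_symbol": gene_symbol, "max_results": max_results}


echo_tool(gene_symbol="MAPT", max_results=3)
```

Storing the exact keyword arguments alongside the status is what makes a call replayable later; a cost field could be added to the same record in the same way.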
Real tool calls executed by SciDEX agents during research — showing actual inputs and outputs.
Get protein-protein interactions from STRING DB.
gene_symbol=NR1D1, score_threshold=700, max_results=20
Search PubMed and return paper details with PMIDs. [2026-04-27] Fix: Already has term/search_query/t
query=Test Test, max_results=3
Query Allen Brain Atlas for ISH gene expression data across brain regions.
gene_symbol=MAPT
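As an illustration of how the STRING example above (gene_symbol=NR1D1, score_threshold=700, max_results=20) might translate into a request, the sketch below only builds a URL for STRING's public REST API and does not send it. The mapping from the tool's argument names to STRING query parameters, the choice of the `interaction_partners` method, and the human species default are assumptions, not something the page confirms.

```python
from urllib.parse import urlencode

# Public STRING REST endpoint; which method the Forge tool actually calls
# is an assumption made for this example.
STRING_API = "https://string-db.org/api/json/interaction_partners"


def build_string_url(gene_symbol, score_threshold=700, max_results=20, species=9606):
    """Build (but do not send) a STRING interaction-partners request URL."""
    params = {
        "identifiers": gene_symbol,
        "species": species,                 # 9606 is the NCBI taxon ID for human
        "required_score": score_threshold,  # STRING scores range from 0 to 1000
        "limit": max_results,
    }
    return f"{STRING_API}?{urlencode(params)}"


url = build_string_url("NR1D1", score_threshold=700, max_results=20)
print(url)
```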
Data endpoint: /api/forge/analytics
Tool calls by hour (UTC) — success / errors
Actual tool call volume grouped by tool type
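The hourly success/error chart above implies a simple aggregation over logged calls. The following sketch shows one way to compute it; the `(iso_timestamp, ok)` record shape and the sample values are invented for the example and do not reflect the real response of /api/forge/analytics.

```python
from collections import defaultdict
from datetime import datetime, timezone


def calls_by_hour(records):
    """Count successes and errors per UTC hour.

    `records` is an iterable of (iso_timestamp, ok) pairs; this shape is
    hypothetical and chosen only for the example.
    """
    buckets = defaultdict(lambda: {"success": 0, "errors": 0})
    for iso_ts, ok in records:
        # Normalise every timestamp to UTC before bucketing.
        ts = datetime.fromisoformat(iso_ts).astimezone(timezone.utc)
        hour = ts.strftime("%Y-%m-%d %H:00")
        buckets[hour]["success" if ok else "errors"] += 1
    return dict(sorted(buckets.items()))


# Made-up sample records.
sample = [
    ("2026-04-21T10:05:00+00:00", True),
    ("2026-04-21T10:40:00+00:00", False),
    ("2026-04-21T13:15:00+02:00", True),  # 11:15 in UTC
]
hourly = calls_by_hour(sample)
```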
Search PubMed and return paper details with PMIDs. [2026-04-27] Fix: Already has term/search_query/terms aliases. All errors from 2026-04-21 burst are from older server version.
Search Semantic Scholar for papers.
Search OpenAlex for scholarly works with rich metadata.
Search ClinicalTrials.gov for clinical trials. [2026-04-27] Fix: Added condition=None alias for query parameter; agents were calling with condition='...'. Root cause: parameter name mismatch.
Get gene annotation from MyGene.info.
Fetch the abstract for a PubMed article.
Get median gene expression across GTEx v10 tissues.
Convenience function: search PubMed + Semantic Scholar + trials for a topic. [2026-04-27] Fix: Already has query/input/term/max_results aliases. Errors from 2026-04-21. Critical fix: pathway_flux_pipeline NameError prevented module import.
Get pathway associations from Reactome database.
Get protein-protein interactions from STRING DB.
Get comprehensive protein annotation from UniProt.
Query Allen Brain Atlas for ISH gene expression data across brain regions.
Get disease associations for a gene from Open Targets.
Extract figures from a scientific paper by PMID. Returns figure captions, image URLs, and descriptions via PMC BioC API, Europe PMC full-text XML, or open-access PDF extraction. Use when you need to see visual evidence (pathway diagrams, heatmaps, microscopy) from a cited paper.
Get clinical genetic variants from ClinVar.
Search across multiple paper providers (PubMed, Semantic Scholar, OpenAlex, CrossRef) with unified deduplication and local caching.
Query Allen Brain Cell Atlas for cell-type specific gene expression.
Get disease associations for a gene from DisGeNET.
Search NIH RePORTER for funded research projects on a topic.
Run pathway enrichment via Enrichr API.
Get gene associations for a disease from DisGeNET.
Get AlphaFold protein structure prediction info.
Query Human Protein Atlas for protein expression across tissues and cells.
Get GWAS genetic associations from NHGRI-EBI GWAS Catalog.
Get functional enrichment for a gene list from STRING DB.
Get drug compounds targeting a specific gene/protein from ChEMBL database.
Query KEGG for pathways involving a gene.
Query MSigDB gene set membership for a gene via Enrichr genemap. [2026-04-27] Fix: Already had max_results alias (OK). Errors from older server version; current code handles these.
Get compound information from PubChem. [2026-04-27] Fix: Added gene_symbol=None, max_results=None as accepted params (ignored); agents calling with wrong parameters.
Query BrainSpan for developmental gene expression across brain regions and ages.
Query EMBL-EBI Expression Atlas for differential expression experiments.
Get mouse models and phenotypes from Mouse Genome Informatics (MGI).
Get disease annotation from MyDisease.info.
Query InterPro for protein domain and family annotations.
Search for disease-associated methylation changes.
Query DGIdb for drug-gene interactions and druggability information.
Query gnomAD for population variant frequency data for a gene.
Ingest a list of paper dicts into the local PaperCorpus cache.
Get comprehensive gene annotation from Ensembl REST API.
Run the full paper review pipeline:
  1. Fetch paper metadata via paper_cache.get_paper
  2. Extract named entities (gene, protein, disease, pathway, phenotype, brain_region, cell_type, drug) via LLM
  3. Cross-reference each entity against the knowledge graph (knowledge_edges table) to find edge counts
  4. Find related hypotheses (by entity/gene match)
  5. Find related knowledge gaps (by entity match)
  6. Flag novel findings (entities with 0 KG edges)
  7. Generate structured review summary via LLM
  8. Write result to paper_reviews table
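Steps 3 and 6 of the pipeline above reduce to a lookup followed by a partition: count the knowledge-graph edges per entity, then flag entities with none. A minimal sketch follows, assuming the edge counts have already been fetched into a dictionary. The function name `flag_novel_entities`, the placeholder entity, and the counts are hypothetical; the real pipeline reads from the knowledge_edges table.

```python
def flag_novel_entities(entities, edge_counts):
    """Split extracted entities into known and novel.

    An entity counts as novel when the knowledge graph holds zero edges
    for it. `edge_counts` maps entity name -> edge count and stands in
    for a lookup against the knowledge_edges table.
    """
    known, novel = [], []
    for name in entities:
        target = known if edge_counts.get(name, 0) > 0 else novel
        target.append(name)
    return known, novel


entities = ["MAPT", "NR1D1", "hypothetical_entity_x"]
edge_counts = {"MAPT": 12, "NR1D1": 3}  # made-up counts for illustration
known, novel = flag_novel_entities(entities, edge_counts)
```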
Query OMIM for Mendelian disease phenotypes associated with a gene.
Search for drug information using open pharmacological databases.
Query EBI Proteins API for disease-associated protein variants.
Find diseases similar to a query disease based on shared gene associations.
Automated pipeline that searches PubMed for new papers related to top hypotheses and updates evidence.
Query EBI QuickGO for Gene Ontology annotations of a gene.
Query CZ CELLxGENE Discover for single-cell gene expression data.
Search Europe PMC for biomedical literature with rich metadata.
Query BioGRID for protein-protein and genetic interactions.
Search NCBI GEO for gene expression and genomics datasets.
Search Pathway Commons for pathways, interactions, and complexes involving a gene.
Query STITCH for chemical interactions with a gene/protein.
Query Bgee for gene expression across anatomical structures including brain regions.
Query Open Targets for RNA expression across 100+ tissues for a gene.
Query PharmGKB for pharmacogenomics drug-gene relationships.
Query Agora for AMP-AD Alzheimer's Disease multi-omic target scoring.
Query ClinGen for gene-disease validity classifications.
Query COSMIC (Catalogue of Somatic Mutations in Cancer) for gene mutation data.
Search Human Phenotype Ontology (HPO) for phenotype terms and gene associations.
Query EBI IntAct for experimentally validated molecular interactions.
Get transcription factor binding sites from JASPAR database.
Query Monarch Initiative for disease-gene-phenotype associations.
Query Allen Mouse Brain Aging Atlas for age-stratified gene expression.
Query Ensembl for regulatory features in the genomic neighborhood of a gene.
Query HGNC for authoritative gene nomenclature and family classification.
Query Open Targets Genetics for GWAS loci linked to a gene via L2G scoring.
Search RCSB Protein Data Bank for experimental protein structures.
Query NIH Pharos TCRD for drug target development level and druggability data.
Query WikiPathways for biological pathways containing a gene.
Retrieve protein-ligand binding affinity data from BindingDB.
Query EBI Complex Portal for experimentally validated protein complexes.
Search ENCODE for epigenomics experiments targeting a gene/TF.
Query GTEx v8 for cis-eQTLs: genetic variants that regulate gene expression in brain.
Query IMPC for standardized mouse knockout phenotypes across biological systems.
Query JensenLab DISEASES for text-mining gene-disease confidence scores.
Fetch the NCBI gene summary and RefSeq description for a gene.
Query OmniPath for directed signaling interactions and PTMs involving a gene.
Query Open Targets Platform for drugs linked to a gene target.
Query Open Targets for drug tractability and modality assessments.
Retrieve UniProt-curated protein features for a gene: PTMs, active sites, binding sites.
Search ChEMBL for small molecules, drugs, and chemical probes by name.
Query GTEx v8 for splicing QTLs (sQTLs): variants that affect RNA splicing in brain.
Find gene sets containing this gene across 12 cross-database libraries via Enrichr.
Query Open Targets for mouse model phenotypes associated with a gene.
Start a stateful PaperCorpus search session. Returns the first page; subsequent pages can be fetched by calling again with incremented page.
Query Alliance of Genome Resources (AGR) for cross-species gene orthologs.
Search EBI BioStudies for transcriptomics, proteomics, and functional genomics datasets.
Search EBI Ontology Lookup Service (OLS4) for disease, phenotype, GO, or chemical terms.
Annotate genetic variants using Ensembl Variant Effect Predictor (VEP).
Get articles that cite a specific paper via Europe PMC.
Query GWAS Catalog for all traits associated with a specific genetic variant.
Get age-related methylation changes (epigenetic clock) for a gene.
Get cell type marker genes from PanglaoDB single-cell RNA database.
Query ProteomicsDB for protein abundance across human tissues.
Retrieve publication metadata from CrossRef by DOI.
Query OpenGWAS for PheWAS (phenome-wide) associations of a genetic variant.
Get CpG island information and literature for a gene region.
Cross-reference a compound across chemistry/pharmacology databases via UniChem.
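Several entries above record fixes where agents called a tool with a parameter name it did not accept, such as `condition` instead of `query` for the ClinicalTrials.gov search. The sketch below shows the general alias pattern those notes describe. The function `search_trials` and its return value are stand-ins written for this example, and no request is sent.

```python
def search_trials(query=None, condition=None, max_results=10):
    """Stand-in for a trial search tool that accepts `condition` as an alias.

    Only the argument handling is shown; no request is sent.
    """
    # Fall back to the alias when the canonical parameter is missing.
    if query is None:
        query = condition
    if query is None:
        raise ValueError("either `query` or `condition` is required")
    return {"query": query, "max_results": max_results}


by_alias = search_trials(condition="Alzheimer disease")
by_name = search_trials(query="Alzheimer disease")
```

Defaulting the alias to `None` keeps existing callers working while letting the mismatched calls resolve to the same canonical argument.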
The 1000 Genomes Project sequenced genomes from 2,504 individuals across 26 populations,
The 10x Genomics Spatial Research data portal provides access to publicly available
4D Nucleome Network data portal for nuclear organization and dynamics.
Access 3D genome organization data including Hi-C, ChIP-seq, and imaging.
Query ABC Atlas (Allen Brain Cell Atlas) and MERFISH spatial transcriptomics data.
How to use the Adaptyv Bio Foundry API and Python SDK for protein experiment design, submission, and results retrieval. Use this skill whenever the user mentions Adaptyv, Foundry API, protein binding assays, protein screening experiments, BLI/SPR assays, thermostability assays, or wants to submit protein sequences for experimental characterization. Also trigger when code imports `adaptyv`, `adaptyv_sdk`, or `FoundryClient`, or references `foundry-api-public.adaptyvbio.com`.
Non-profit plasmid repository for molecular biology research.
This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
AlgaeBase is a comprehensive database of taxonomic and nomenclatural
The Allen Brain Atlas is a comprehensive collection of gene expression and neuroanatomical
Query the Allen Brain Atlas API for in-situ hybridisation (ISH) or microarray expression energies across brain structures for a given gene symbol. Use when a hypothesis targets a specific brain region or cell population and needs grounding in region-specific expression data (adult mouse/human brain atlases).
Query Allen Brain Atlas for ISH expression data across brain regions.
The Alliance of Genome Resources (AGR) is a federated database of model organism
Access predicted protein structures from AlphaFold DB.
Fetch AlphaFold protein-structure prediction metadata for a gene symbol or UniProt accession — confidence scores, PDB file URL, and 3D viewer link. Use when a hypothesis hinges on a protein's fold, domain architecture, or druggable-pocket geometry, when structural plausibility of a mechanism must be checked, or when a downstream artifact needs a structure reference before docking or mutagenesis reasoning.
AlphaGenome - AI-powered genome annotation and prediction.
Comprehensive QTL and association data for livestock species including
Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem. This is the data format skill—for analysis workflows use scanpy; for probabilistic models use scvi-tools; for population-scale queries use cellxgene-census.
Annotate a specific claim (quote) on a SciDEX page using the W3C TextQuoteSelector.
Comprehensive database of commercially available antibodies with unique identifiers
APID: Agile Protein Interactomes DataServer
Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets.
Uniformly processed RNA-seq data from GEO, SRA, and other sources.
EMBL-EBI's gene expression database.
Comprehensive Python library for astronomy and astrophysics. This skill should be used when working with astronomical data including celestial coordinates, physical units, FITS files, cosmological calculations, time systems, tables, world coordinate systems (WCS), and astronomical data analysis. Use when tasks involve coordinate transformations, unit conversions, FITS file manipulation, cosmological distance calculations, time scale conversions, or astronomical data processing.
ATCC is one of the world's premier biological resource centers, with 3,800+
Comprehensive strain-linked information on bacterial and archaeal biodiversity.
Benchling R&D platform integration. Access the registry (DNA, proteins), inventory, ELN entries, and workflows via the API; build Benchling Apps; and query the Data Warehouse to automate lab data management.
Access to Bgee - database for retrieval and comparison of gene expression patterns
Search scientific papers and retrieve structured experimental data extracted from full-text studies via the BGPT MCP server. Returns 25+ fields per paper including methods, results, sample sizes, quality scores, and conclusions. Use for literature reviews, evidence synthesis, and finding experimental details not available in abstracts alone.
BiGG Models is a database of genome-scale metabolic network reconstructions
Access to BindingDB - public database of measured binding affinities between
BioCarta provides curated biological pathway maps and data.
BioConda is a channel for the conda package manager specializing in bioinformatics
Bioconductor provides tools for the analysis and comprehension of high-throughput
BioContainers is a community-driven project providing Docker/Singularity containers
BioCyc is a collection of 19,000+ Pathway/Genome Databases (PGDBs) for model
Protein-protein, genetic, and chemical interactions from model organisms.
EMBL-EBI's repository for biological imaging data.
Query curated and non-curated systems biology models.
BioPlex: High-quality protein-protein interaction network from human cells.
Access NCBI BioProject database: genomics, transcriptomics, and other omics project metadata.
Comprehensive molecular biology toolkit. Use for sequence manipulation, file parsing (FASTA/GenBank/PDB), phylogenetics, and programmatic NCBI/PubMed access (Bio.Entrez). Best for batch processing, custom bioinformatics pipelines, BLAST automation. For quick lookups use gget; for multi-service integration use bioservices.
Search and retrieve preprints from bioRxiv and medRxiv.
Database for metadata about biological samples from EBI.
Placeholder for BioSearch API - a cell type and tissue search service.
Unified Python interface to 40+ bioinformatics services. Use when querying multiple databases (UniProt, KEGG, ChEMBL, Reactome) in a single workflow with consistent API. Best for cross-database analysis, ID mapping across services. For quick single-database lookups use gget; for sequence/file manipulation use biopython.
EBI database for multi-omics studies and supporting data.
bio.tools is the ELIXIR registry of software tools and databases for life sciences,
Sequence similarity search against NCBI databases.
European project mapping epigenomes of primary hematopoietic cells.
BMRB is the international repository for NMR spectroscopy data of biological
Large-scale database of published functional neuroimaging experiments with
BrainSpan Atlas of the Developing Human Brain - transcriptional atlas of
Brassica Database - Genomic data for Brassica species.
Access to BRENDA - comprehensive enzyme information system with kinetic data,
BV-BRC (Bacterial and Viral Bioinformatics Resource Center) provides comprehensive
Combined Annotation Dependent Depletion - Deleteriousness scores for variants.
CancerHotspots - resource for statistically significant recurrent mutations
Hierarchical classification of protein domain structures.
Carbohydrate-Active enZymes Database - classification and annotation
Access to cBioPortal for Cancer Genomics - comprehensive cancer genomics database
Comprehensive characterization of 1,000+ cancer cell lines with genomics,
CellMarker is a comprehensive database of cell type markers across human and
NCI-60 cancer cell line database with genomics and drug response data.
The Cell Ontology provides a structured controlled vocabulary for cell types
Access the Cellosaurus knowledge resource on cell lines.
Query CZ CELLxGENE Discover 'Where is My Gene' API for single-cell cell-type expression data across human tissues.
Query the CELLxGENE Census (61M+ cells) programmatically. Use when you need expression data across tissues, diseases, or cell types from the largest curated single-cell atlas. Best for population-scale queries, reference atlas comparisons. For analyzing your own data use scanpy or scvi-tools.
Access to the Chan Zuckerberg CELLxGENE Data Portal - the largest standardized
CGD provides curated genomic and biological information for Candida species,
ChEBI is a freely available dictionary of molecular entities focused on 'small' chemical compounds.
Access to ChEMBL database of bioactive drug-like small molecules.
Return ChEMBL drug compounds targeting a specified gene/protein with ChEMBL IDs, activity types, activity values, and pChEMBL potency. Use when a Domain Expert is assessing druggability, cataloguing tool compounds or clinical candidates, or auditing chemical matter for a proposed target.
Drug compounds and bioactivity data for a gene target from the ChEMBL database of bioactive molecules.
Searchable database of 1B+ purchasable chemical compounds.
Search and retrieve chemical information from ChemSpider (Royal Society of Chemistry).
CircBank: Comprehensive database of human circular RNAs.
Database of circular RNAs (circRNAs) identified from RNA-seq data.
Google quantum computing framework. Use when targeting Google Quantum AI hardware, designing noise-aware circuits, or running quantum characterization experiments. Best for Google hardware, noise modeling, and low-level circuit design. For IBM hardware use qiskit; for quantum ML with autodiff use pennylane; for physics simulations use qutip.
Comprehensive citation management for academic research. Search Google Scholar and PubMed for papers, extract accurate metadata, validate citations, and generate properly formatted BibTeX entries. This skill should be used when you need to find papers, verify citation information, convert DOIs to BibTeX, or ensure reference accuracy in scientific writing.
Clinical Interpretations of Variants in Cancer.
Get CIViC curated clinical variant interpretations for a gene.
Clinical Genome Resource - Curated gene-disease validity, dosage sensitivity,
Query ClinGen for curated gene-disease validity classifications.
Generate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug development, clinical research, and evidence synthesis.
Write comprehensive clinical reports including case reports (CARE guidelines), diagnostic reports (radiology/pathology/lab), clinical trial reports (ICH-E3, SAE, CSR), and patient documentation (SOAP, H&P, discharge summaries). Full support with templates, regulatory compliance (HIPAA, FDA, ICH-GCP), and validation tools.
Access to 400,000+ clinical trials worldwide via the ClinicalTrials.gov v2 API.
Search ClinicalTrials.gov for clinical trials related to genes, diseases, or interventions. Returns NCT IDs, status, phase, conditions, interventions, enrollment, and sponsor info.
Clinical variant database from NCBI with pathogenicity annotations.
Return clinical variants from NCBI ClinVar for a given gene — variant names, clinical significance, conditions, and review status. Use when a Skeptic or Falsifier is probing whether genetic variants in a candidate gene have established clinical significance, or when verifying pathogenicity claims.
Fetch clinical genetic variants from NCBI ClinVar. Returns pathogenicity, review status, and associated conditions.
Constraint-based metabolic modeling (COBRA). FBA, FVA, gene knockouts, flux sampling, SBML models, for systems biology and metabolic engineering analysis.
Open source natural products database.
Predict coiled-coil regions in proteins.
CollecTF - database of experimentally validated transcription factor binding
Complex Portal is a manually curated database of stable macromolecular complexes
CompTox provides access to EPA's chemistry, toxicity, and exposure data for
Run a multi-perspective Mind Council deliberation on any question, decision, or creative challenge. Use this skill whenever the user wants diverse viewpoints, needs help making a tough decision, asks for a council/panel/board discussion, wants to explore a problem from multiple angles, requests devil's advocate analysis, or says things like "what would different experts think about this", "help me think through this from all sides", "council mode", "mind council", or "deliberate on this". Also trigger when the user faces a dilemma, trade-off, or complex choice with no obvious answer.
ConsensusPathDB: Molecular functional interaction database integrating
Identify functionally important regions by conservation analysis.
Comprehensive Resource of Mammalian protein complexes.
Catalogue of Somatic Mutations in Cancer - comprehensive cancer mutation database.
Comprehensive cotton research database with genomics, genetics, breeding,
CRAN is the primary repository for R packages, hosting 20,000+ packages
Utilities for CRISPR guide RNA design and analysis.
Search and retrieve metadata for scholarly works via DOI.
Search for preprints (bioRxiv/medRxiv) via CrossRef API.
Access to CTD - manually curated information about chemical-gene/protein
Large-scale cancer drug sensitivity resource with small molecule profiling
Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.
Search 78 public scientific, biomedical, materials science, and economic databases via REST APIs. Covers physics/astronomy (NASA, NIST, SDSS, SIMBAD), earth/environment (USGS, NOAA, EPA), chemistry/drugs (PubChem, ChEMBL, DrugBank, FDA, KEGG, ZINC, BindingDB), materials (Materials Project, COD), biology/genomics (Reactome, UniProt, STRING, Ensembl, NCBI Gene, GEO, GTEx, PDB, AlphaFold, InterPro, BioGRID, Gene Ontology, dbSNP, gnomAD, ENCODE, Human Protein Atlas, Human Cell Atlas), disease/clinical (COSMIC, Open Targets, ClinicalTrials.gov, OMIM, ClinVar, GDC/TCGA, cBioPortal, DisGeNET, GWAS Catalog), regulatory (FDA, USPTO, SEC EDGAR), economics/finance (FRED, World Bank, US Treasury), demographics (US Census, Eurostat, WHO). Use when looking up compounds, genes, proteins, pathways, variants, clinical trials, patents, economic indicators, or any public database API query.
Pythonic wrapper around RDKit with simplified interface and sensible defaults. Preferred for standard drug discovery including SMILES parsing, standardization, descriptors, fingerprints, clustering, 3D conformers, parallel processing. Returns native rdkit.Chem.Mol objects. For advanced control or custom parameters, use rdkit directly.
NCBI's dbGaP is a repository for individual-level phenotype, exposure, genotype, and
Comprehensive database of experimentally verified PTMs from multiple sources.
Database of genetic variation in humans and other organisms.
Database of genomic variation and phenotype in humans with CNVs and rare diseases.
Molecular ML with diverse featurizers and pre-built datasets. Use for property prediction (ADMET, toxicity) with traditional ML or GNNs when you want extensive featurization options and MoleculeNet benchmarks. Best for quick experiments with pre-trained models, diverse molecular representations. For graph-first PyTorch workflows use torchdrug; for benchmark datasets use pytdc.
NGS analysis toolkit. BAM to bigWig conversion, QC (correlation, PCA, fingerprints), heatmaps/profiles (TSS, peaks), for ChIP-seq, RNA-seq, ATAC-seq visualization.
Query the Cancer Dependency Map (DepMap) for cancer cell line gene dependency scores (CRISPR Chronos), drug sensitivity data, and gene effect profiles. Use for identifying cancer-specific vulnerabilities, synthetic lethal interactions, and validating oncology drug targets.
Cancer Dependency Map - CRISPR screens and drug sensitivity data.
Query DGIdb (Drug-Gene Interaction database) for druggability categories and known drug-gene interactions for a human gene, aggregating evidence from >40 sources including DrugBank, PharmGKB, TTD, and ChEMBL. Use when a domain expert needs a druggability summary for a target, when competitive-landscape assessments need to enumerate existing drugs against a gene, or when a hypothesis proposes targeting a gene whose prior pharmacology must be checked.
Access to DGIdb - comprehensive database of drug-gene interactions and druggable
Query DGIdb for drug-gene interactions and druggability categories. Aggregates DrugBank, PharmGKB, TTD, ChEMBL, and clinical guideline data.
Comprehensive database of structural variation in healthy human genomes.
Extract cognitive patterns and thinking fingerprints from any text. Use this skill when the user wants to analyze how someone thinks, understand cognitive style, profile writing or speech patterns, compare thinking styles between people, asks "what's my thinking style", "analyze how this person reasons", "cognitive profile", "thinking pattern", "DHDNA", "digital DNA", or wants to understand the mind behind any text. Also trigger when the user provides text and wants deeper insight into the author's reasoning patterns, decision-making style, or cognitive signature.
DictyBase is the model organism database for Dictyostelium discoideum,
Diffusion-based molecular docking. Predict protein-ligand binding poses from PDB/SMILES, confidence scores, virtual screening, for structure-based drug design. Not for affinity prediction.
DIP: Database of Interacting Proteins
Database of disease-associated DNA methylation patterns.
The Disease Ontology provides a standardized classification of human diseases,
Get genes associated with a disease from DisGeNET with association scores.
Get disease associations for a gene from DisGeNET with scores and supporting PMIDs.
Database of gene-disease associations from GWAS, literature, and animal models.
Return disease associations for a human gene from DisGeNET with DisGeNET score, evidence counts, source databases, and supporting PMIDs. Use when a hypothesis links a gene to a disease that must be cross-checked against curated gene-disease aggregations, when an evidence auditor needs a non-GWAS view of genetic association breadth, or when a skeptic wants to surface competing disease contexts for a target.
Database of Intrinsically Disordered Proteins (IDPs) and regions (IDRs).
DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution.
Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.
Access to DrugBank - comprehensive drug and drug target database with detailed
Return drug information (brand/generic names, manufacturer, route, pharmacological class, MoA indicators) for a queried drug name via OpenFDA and PubChem (open alternatives to the proprietary DrugBank API). Use when a Domain Expert wants regulatory-labelled drug metadata or when a hypothesis references a named drug and needs class/indication grounding.
Online drug compendium with drug-target interactions and pharmacology data.
DSMZ is one of Europe's largest biological resource centers with 80,000+
Mock tool for deprecated_tool_detector test
Query the EBI eQTL Catalog for tissue-specific expression quantitative trait loci.
Comprehensive database of small molecule metabolites found in Escherichia coli K-12.
EcoCyc is a comprehensive database of E. coli K-12 biology, including metabolic
EcoGene - database for Escherichia coli K-12 genome and proteome sequences
eggNOG is a database of orthologous groups and functional annotation across
ELM: Eukaryotic Linear Motif resource
Electron Microscopy Data Bank - cryo-EM structure database.
Chemical structure search and supplier database with 10M+ compounds.
Massive-scale standardized microbiome analysis across diverse ecosystems worldwide.
Electron Microscopy Public Image Archive (EMPIAR) - public resource for
Access ENCODE (Encyclopedia of DNA Elements) data.
Gene set enrichment analysis using 200+ annotation libraries.
Run pathway or term enrichment over a gene list against an Enrichr library (e.g. GO Biological Process) and return ranked enriched terms with p-values and overlapping genes. Use when a hypothesis or experiment yields a gene set that needs functional interpretation, when a statistician wants multi-test-corrected enrichment evidence, or when comparing candidate gene lists before an artifact is promoted.
Gene set enrichment against GO Biological Process. Enter a gene list to find enriched pathways.
Retrieve multi-source phenotype associations for a gene via Ensembl REST.
Ensembl Plants provides genomic data for plant species.
Genome annotation, variation, and comparative genomics from Ensembl.
ENTEx is an ENCODE project providing comprehensive molecular characterization
Comprehensive toolkit for protein language models including ESM3 (generative multimodal protein design across sequence, structure, and function) and ESM C (efficient protein embeddings and representations). Use this skill when working with protein sequences, structures, or function prediction; designing novel proteins; generating protein embeddings; performing inverse folding; or conducting protein engineering tasks. Supports both local model usage and cloud-based Forge API for scalable inference.
Phylogenetic tree toolkit (ETE). Tree manipulation (Newick/NHX), evolutionary event detection, orthology/paralogy, NCBI taxonomy, visualization (PDF/SVG), for phylogenomics.
Integrated database resource for eukaryotic pathogens and related organisms.
Comprehensive resource for nucleotide sequence data and metadata.
EVA is an open-access database of genetic variation data from all species.
Search and retrieve literature from Europe PMC (formerly UK PubMed Central).
ExAC was a large-scale exome sequencing project that aggregated data from
Perform comprehensive exploratory data analysis on scientific data files across 200+ file formats. This skill should be used when analyzing any scientific data file to understand its structure, content, quality, and characteristics. Automatically detects file type and generates detailed markdown reports with format-specific analysis, quality metrics, and downstream analysis recommendations. Covers chemistry, bioinformatics, microscopy, spectroscopy, proteomics, metabolomics, and general scientific data formats.
Expression Atlas provides information on gene and protein expression across
FAIRsharing is a curated, informative registry of three types of resources:
Functional Annotation of the Mammalian Genome (FANTOM) - comprehensive
Query FinnGen R10 (500K Finnish cohort) for fine-mapped genetic loci associated with a disease or trait.
FishBase is the world's largest fish database with 35,000+ species, containing
Parse FCS (Flow Cytometry Standard) files v2.0-3.1. Extract events as NumPy arrays, read metadata/channels, convert to CSV/DataFrame, for flow cytometry data preprocessing.
Public repository for flow cytometry data and analysis.
Framework for computational fluid dynamics simulations using Python. Use when running fluid dynamics simulations including Navier-Stokes equations (2D/3D), shallow water equations, stratified flows, or when analyzing turbulence, vortex dynamics, or geophysical flows. Provides pseudospectral methods with FFT, HPC support, and comprehensive output analysis.
Access to FlyBase - comprehensive database for Drosophila genetics and genomics.
FooDB is the world's largest food composition database with 28,000+ foods,
AI-powered clinical variant interpretation and classification platform.
Comprehensive genomics database for fungal pathogens and model organisms.
The Galaxy Tool Shed is a repository of tools and workflows for the Galaxy platform.
Provides information about rare and genetic diseases from NIH/NCATS.
Large-scale pharmacogenomic database screening cancer cell lines with
GenBank is NCBI's primary nucleotide sequence database containing DNA and RNA
Gene Curation Coalition (GenCC) - authoritative resource for gene-disease
High-quality gene annotations from the GENCODE project.
GeneCards - the human gene database integrating genomic, transcriptomic,
Look up any human gene — returns full name, summary, aliases, and gene type from MyGene.info.
GENENAMES - HUGO Gene Nomenclature Committee (HGNC) database of
Query QuickGO (EBI) for Gene Ontology annotations.
Structured, controlled vocabulary for gene and gene product attributes.
Generate or edit images using AI models (FLUX, Nano Banana 2). Use for general-purpose image generation including photos, illustrations, artwork, visual assets, concept art, and any image that is not a technical diagram or schematic. For flowcharts, circuits, pathways, and technical diagrams, use the scientific-schematics skill instead.
Information about genetic tests and testing laboratories.
This skill should be used when working with genomic interval data (BED files) for machine learning tasks. Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.
Query Genomics England PanelApp for disease gene panel memberships.
Gene Expression Omnibus (GEO) - NCBI's gene expression repository.
Comprehensive geospatial science skill covering remote sensing, GIS, spatial analysis, machine learning for earth observation, and 30+ scientific domains. Supports satellite imagery processing (Sentinel, Landsat, MODIS, SAR, hyperspectral), vector and raster data operations, spatial statistics, point cloud processing, network analysis, cloud-native workflows (STAC, COG, Planetary Computer), and 8 programming languages (Python, R, Julia, JavaScript, C++, Java, Go, Rust) with 500+ code examples. Use for remote sensing workflows, GIS analysis, spatial ML, Earth observation data processing, terrain analysis, hydrological modeling, marine spatial analysis, atmospheric science, and any geospatial computation task.
Python library for working with geospatial vector data including shapefiles, GeoJSON, and GeoPackage files. Use when working with geographic data for spatial analysis, geometric operations, coordinate transformations, spatial joins, overlay operations, choropleth mapping, or any task involving reading/writing/analyzing vector geographic data. Supports PostGIS databases, interactive maps, and integration with matplotlib/folium/cartopy. Use for tasks like buffer analysis, spatial joins between datasets, dissolving boundaries, clipping data, calculating areas/distances, reprojecting coordinate systems, creating maps, or converting between spatial file formats.
Gene Expression Profiling Interactive Analysis - cancer vs normal expression.
This skill should be used at the start of any computationally intensive scientific task to detect and report available system resources (CPU cores, GPUs, memory, disk space). It creates a JSON file with resource information and strategic recommendations that inform computational approach decisions such as whether to use parallel processing (joblib, multiprocessing), out-of-core computing (Dask, Zarr), GPU acceleration (PyTorch, JAX), or memory-efficient strategies. Use this skill before running analyses, training models, processing large datasets, or any task where resource constraints matter.
Fast CLI/Python queries to 20+ bioinformatics databases. Use for quick lookups: gene info, BLAST searches, AlphaFold structures, enrichment analysis. Best for interactive exploration, simple queries. For batch processing or advanced BLAST use biopython; for multi-database Python workflows use bioservices.
Submit and manage protocols on Ginkgo Bioworks Cloud Lab (cloud.ginkgo.bio), a web-based interface for autonomous lab execution on Reconfigurable Automation Carts (RACs). Use when the user wants to run cell-free protein expression (validation or optimization), generate fluorescent pixel art, or interact with Ginkgo Cloud Lab services. Covers protocol selection, input preparation, pricing, and ordering workflows.
Analyze and engineer protein glycosylation. Scan sequences for N-glycosylation sequons (N-X-S/T), predict O-glycosylation hotspots, and access curated glycoengineering tools (NetOGlyc, GlycoShield, GlycoWorkbench). For glycoprotein engineering, therapeutic antibody optimization, and vaccine design.
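The N-X-S/T sequon scan mentioned above can be sketched with a regular expression. This is a minimal illustration, not the tool's own code: the function name and example sequence are made up, and the sketch assumes the conventional rule that X may be any residue except proline.

```python
import re

# N-X-S/T sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (1-based position of the Asn, three-residue motif) per sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]

# Hypothetical sequence: NGS, NNS and NST match; NPT is skipped (Pro at X).
hits = find_sequons("MKNGSAANPTQNNSTL")
print(hits)  # [(3, 'NGS'), (12, 'NNS'), (13, 'NST')]
```

A sequon is only a candidate site; whether it is actually glycosylated depends on structural context that a sequence scan cannot see.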
Database of carbohydrate structures from multiple sources.
GlyGen is a comprehensive glycan and glycoprotein database from NCBI/NIH.
Genome Aggregation Database - Population genetics and allele frequencies.
Query gnomAD (GRCh38) for population variant-frequency data for a gene — loss-of-function and missense variants with allele frequencies plus gene-level constraint scores (pLI, oe_lof, oe_mis). Use when a hypothesis depends on gene dosage, haploinsufficiency, or rare-variant burden, when a skeptic or falsifier wants to check whether a claimed pathogenic variant is too common to be causal, or when constraint must inform druggability and safety assessments.
Global Natural Products Social Molecular Networking platform.
Comprehensive database of genome and metagenome sequencing projects.
g:Profiler is a widely-used functional enrichment analysis tool that performs
Perform functional enrichment analysis using g:Profiler (ELIXIR).
USDA database for Triticeae (wheat tribe) genomics and genetics.
Gramene is a curated, open-source, data resource for comparative functional
Chimera-checked 16S rRNA gene database for bacterial and archaeal taxonomy.
High-performance toolkit for genomic interval analysis in Rust with Python bindings. Use when working with genomic regions, BED files, coverage tracks, overlap detection, tokenization for ML models, or fragment analysis in computational genomics and machine learning applications.
Genotype-Tissue Expression (GTEx) project data access.
Return median gene expression (TPM) across GTEx v10 tissues — 54 tissue types including 13 brain regions — for a given gene symbol, sorted by expression level. Use when tissue specificity matters (e.g. microglial TREM2, brain-restricted SNCA) or when a hypothesis makes an implicit tissue-of-action claim that must be checked.
Guide to Pharmacology (IUPHAR/BPS) - expert-curated resource of
NHGRI-EBI Catalog of human genome-wide association studies.
Genome-wide association study hits from the NHGRI-EBI GWAS Catalog. Query by gene or trait.
Return GWAS genetic associations (SNPs, risk alleles, p-values, trait names, study IDs) from the NHGRI-EBI GWAS Catalog for a gene or trait. Use when a Skeptic is probing genetic support for a candidate target or a Theorist needs SNP-level evidence for a mechanism claim.
Access to 100+ integrated biological databases via Harmonizome.
Official gene symbol nomenclature and gene information.
HIPPIE - Human Integrated Protein-Protein Interaction rEference
Lightweight WSI tile extraction and preprocessing. Use for basic slide processing: tissue detection, tile extraction, and stain normalization for H&E images. Best for simple pipelines, dataset preparation, quick tile-based analysis. For advanced spatial proteomics, multiplexed imaging, or deep learning pipelines use pathml.
Comprehensive metabolite database with chemical, clinical, and biological data.
NIH Human Microbiome Project portal for multi-omic microbiome data.
HOCOMOCO - Homo sapiens comprehensive model collection of transcription
Query the Human Phenotype Ontology (HPO) for phenotype-gene and disease-phenotype links.
Human Protein Reference Database - curated proteomic resource.
The Human Cell Atlas is an international consortium creating comprehensive reference maps
Standardized vocabulary for phenotypic abnormalities in human disease.
Protein expression across human tissues and cell types from the Human Protein Atlas. Includes subcellular localisation.
The Human Protein Atlas (HPA) provides comprehensive data on protein expression
Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.
Structured hypothesis formulation from observations. Use when you have experimental observations or data and need to formulate testable hypotheses with predictions, propose mechanisms, and design experiments to test them. Follows scientific method framework. For open-ended ideation use scientific-brainstorming; for automated LLM-driven hypothesis testing on datasets use hypogenic.
ICGC provides access to cancer genomics data from multiple international projects.
Identifiers.org provides persistent, resolvable identifiers for life science data.
Image Data Resource (IDR) - public repository for reference image datasets
The Immune Epitope Database (IEDB) is a free resource for epitope-related data
Portal for accessing reference human epigenome data from international consortia.
Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Use for accessing large-scale radiology (CT, MR, PET) and pathology datasets for AI training or research. No authentication required. Query by metadata, visualize in browser, check licenses.
API wrapper for IMG/M (Integrated Microbial Genomes & Microbiomes).
Comprehensive database for microbial genome analysis and annotation from JGI.
IMGT is the international reference database for immunogenetics and immunoinformatics.
ImmGen generates a comprehensive database of gene expression and regulation in the
ImmPort is a repository of immunology data from NIH-funded research, providing
Integrated database of human immunology studies with standardized analysis
The IMPC is generating and phenotyping knockout mouse strains for every protein-coding
Create professional infographics using Nano Banana Pro AI with smart iterative refinement. Uses Gemini 3 Pro for quality review. Integrates research-lookup and web search for accurate data. Supports 10 infographic types, 8 industry styles, and colorblind-safe palettes.
Curated database of innate immunity genes, pathways, and interactions.
Protein-protein and molecular interaction database.
InterPro - Protein sequence analysis & classification.
iRefIndex - Consolidated protein-protein interaction database integrating
Comprehensive toolkit for preparing ISO 13485 certification documentation for medical device Quality Management Systems. Use when users need help with ISO 13485 QMS documentation, including (1) conducting gap analysis of existing documentation, (2) creating Quality Manuals, (3) developing required procedures and work instructions, (4) preparing Medical Device Files, (5) understanding ISO 13485 requirements, or (6) identifying missing documentation for medical device certification. Also use when users mention medical device regulations, QMS certification, FDA QMSR, EU MDR, or need help with quality system documentation.
Predict intrinsically disordered regions in proteins.
JASPAR - transcription factor binding profile database.
Provides access to JGI's genome database including fungal, plant, algal, and microbial genomes.
jPOST is a proteomics data repository developed in Japan that provides unified access
Query KEGG Disease database for disease entries and their causal/associated genes.
Access to KEGG pathways, genes, compounds, and disease information.
Electronic lab notebook API integration. Access notebooks, manage entries/attachments, backup notebooks, integrate with Protocols.io/Jupyter/REDCap, for programmatic ELN workflows.
This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.
Latch platform for bioinformatics workflows. Build pipelines with Latch SDK, @workflow/@task decorators, deploy serverless workflows, LatchFile/LatchDir, Nextflow/Snakemake integration.
Create professional research posters in LaTeX using beamerposter, tikzposter, or baposter. Support for conference presentations, academic posters, and scientific communication. Includes layout design, color schemes, multi-column formats, figure integration, and poster-specific best practices for visual communication.
Legume Information System - Genomic data for legume crops.
Access to LINCS L1000 gene expression data and CLUE drug repurposing platform.
Access to LIPID MAPS - comprehensive lipid database.
Search LIPID MAPS for lipid structure, classification, and biological roles.
LitCovid is NCBI's curated literature hub for tracking up-to-date scientific
Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). This skill should be used when conducting systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. Creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.).
Unified interface for searching scientific literature across multiple sources.
LncBase - database of experimentally validated and computationally predicted
LncIPedia: Long non-coding RNA database and annotation resource.
Database of experimentally validated lncRNA-disease associations.
Leiden Open Variation Database (LOVD) is a flexible, freely available tool for
Comprehensive maize research database with genomics, genetics, breeding,
MalaCards: Human disease database integrating 150+ sources.
Comprehensive markdown and Mermaid diagram writing skill. Use when creating any scientific document, report, analysis, or visualization. Establishes text-based diagrams as the default documentation standard with full style guides (markdown + mermaid), 24 diagram type references, and 9 document templates.
Generate comprehensive market research reports (50+ pages) in the style of top consulting firms (McKinsey, BCG, Gartner). Features professional LaTeX formatting, extensive visual generation with scientific-schematics and generate-image, deep integration with research-lookup for data gathering, and multi-framework strategic analysis including Porter Five Forces, PESTLE, SWOT, TAM/SAM/SOM, and BCG Matrix.
Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more.
Integrates human genetic data with model organism phenotypes and variants.
API wrapper for MassBank, a community database of mass spectra.
MassIVE (Mass Spectrometry Interactive Virtual Environment) is a community-driven
Spectral similarity and compound identification for metabolomics. Use for comparing mass spectra, computing similarity scores (cosine, modified cosine), and identifying unknown compounds from spectral libraries. Best for metabolite identification, spectral matching, library searching. For full LC-MS/MS proteomics pipelines use pyopenms.
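To make the cosine score mentioned above concrete, here is a small pure-Python sketch of a greedy cosine similarity between two centroided spectra. It does not use or reproduce the library's API; the function name, the `(mz, intensity)` tuple format, and the 0.1 m/z tolerance are assumptions for illustration.

```python
import math

def cosine_score(spec_a, spec_b, tolerance=0.1):
    """Greedy cosine similarity between two spectra.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are paired
    when their m/z values differ by at most `tolerance`; each peak is
    used at most once, highest intensity products first.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)

    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            dot += product
            used_a.add(i)
            used_b.add(j)

    norm_a = math.sqrt(sum(inten * inten for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten * inten for _, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical spectra: the second is the first with small m/z drift.
query = [(100.0, 1.0), (150.0, 0.5)]
reference = [(100.05, 1.0), (150.02, 0.5)]
score = cosine_score(query, reference)
```

A modified cosine additionally allows peak pairs offset by the precursor mass difference; that step is left out here to keep the sketch short.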
MATLAB and GNU Octave numerical computing for matrix operations, data analysis, visualization, and scientific computing. Use when writing MATLAB/Octave scripts for linear algebra, signal processing, image processing, differential equations, optimization, statistics, or creating scientific visualizations. Also use when the user needs help with MATLAB syntax, functions, or wants to convert between MATLAB and Python code. Scripts can be executed with MATLAB or the open-source GNU Octave interpreter.
Low-level plotting library for full customization. Use when you need fine-grained control over every plot element, creating novel plot types, or integrating with specific scientific workflows. Export to PNG/PDF/SVG for publication. For quick statistical plots use seaborn; for interactive plots use plotly; for publication-ready multi-panel figures with journal styling, use scientific-visualization.
Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts, complexity metrics, for compound prioritization and library filtering.
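The Lipinski and Veber rules named above reduce to a handful of threshold checks once descriptors are available. The sketch below assumes the descriptors were already computed elsewhere (for example with a cheminformatics toolkit); the `Descriptors` class, function names, and example values are hypothetical and are not the tool's API.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float      # molecular weight in Da
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # topological polar surface area, in square angstroms

def lipinski_violations(d):
    """Count rule-of-five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    checks = [d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10]
    return sum(checks)

def passes_veber(d):
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140

def is_drug_like(d, max_violations=1):
    """Allow up to `max_violations` Lipinski violations and require Veber."""
    return lipinski_violations(d) <= max_violations and passes_veber(d)

# Illustrative descriptor values, not measurements of real compounds.
small = Descriptors(180.2, 1.2, 1, 4, 3, 63.6)
large = Descriptors(720.0, 6.5, 6, 12, 15, 180.0)
print(is_drug_like(small), is_drug_like(large))  # True False
```

Tolerating one Lipinski violation is a common convention, exposed here as a parameter so a stricter library filter can set it to zero.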
Database of peptidases (proteases, proteinases) and their inhibitors.
Access to MeSH - NLM's controlled vocabulary for indexing biomedical literature.
MetaboLights - metabolomics experiments and data repository.
Access metabolomics data, studies, and metabolite information.
Search the Metabolomics Workbench for public metabolomics studies.
MetaCyc - comprehensive database of experimentally validated metabolic
Gene annotation and pathway enrichment analysis.
Get cross-species methylation conservation data for a gene.
Get developmental stage-specific methylation dynamics for a gene.
Search for differential methylation between two conditions or cell states.
Search for DNA methylation studies and data for a specific gene.
Compare methylation patterns across multiple tissues for a gene.
Get tissue-specific methylation patterns for a gene.
Database of DNA methylation in human cancer.
Access to METLIN - Scripps metabolite database with MS/MS spectra.
Access to MGI - comprehensive database of mouse genetic, genomic, and biological data.
EMBL-EBI's metagenomics analysis platform (formerly EBI Metagenomics).
Comprehensive database of microbial genomes including bacteria and archaea.
Provides access to microbiome studies, sample data, and analysis tools.
MINT focuses on experimentally verified protein-protein interactions.
Primary repository for microRNA sequences and annotations.
miRDB is a database for microRNA target prediction and functional annotations.
miRTarBase is a database of experimentally validated microRNA-target interactions.
Human mitochondrial genome database of polymorphisms and mutations.
Cloud computing platform for running Python on GPUs and serverless infrastructure. Use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud. Use this skill whenever the user mentions Modal, serverless GPU compute, deploying ML models to the cloud, serving inference endpoints, running batch processing in the cloud, or needs to scale Python workloads beyond their local machine. Also use when the user wants to run code on H100s, A100s, or other cloud GPUs, or needs to create a web API for a model.
ModBase is a database of annotated comparative protein structure models,
Database of RNA modification pathways and modified nucleosides.
Run and analyze molecular dynamics simulations with OpenMM and MDAnalysis. Set up protein/small molecule systems, define force fields, run energy minimization and production MD, analyze trajectories (RMSD, RMSF, contact maps, free energy surfaces). For structural biology, drug binding, and biophysics.
Molecular featurization for ML (100+ featurizers). ECFP, MACCS, descriptors, pretrained models (ChemBERTa), convert SMILES to features, for QSAR and molecular ML.
Community-driven mass spectral repository for metabolomics.
Query Monarch Initiative for disease-gene-phenotype associations from OMIM, ClinVar, HPO
Integrated cross-species gene-phenotype and disease data.
Comprehensive disease ontology integrating multiple disease terminologies.
The Mouse Cell Atlas is a comprehensive reference atlas of mouse cell types across
MSigDB is a collection of annotated gene sets from the Broad Institute, widely used
Tool for checking and formatting sequence variant descriptions according to HGVS.
Chemical and drug annotation service covering PubChem, ChEMBL, DrugBank, and more.
Disease annotation service covering MONDO, Disease Ontology, UMLS, and more.
High-performance gene query web service covering 27,000+ species.
Variant annotation service covering dbSNP, ClinVar, dbNSFP, COSMIC, and more.
Database of genome assemblies from NCBI.
Look up a variant in NCBI dbSNP for allele frequencies and annotations.
Query gene information from NCBI Gene database using Entrez eUtils.
Retrieve NCBI Gene Reference Into Function (GeneRIF) linked publications.
GTR provides access to information about genetic tests for inherited and somatic
Look up NCBI Medical Subject Headings (MeSH) descriptors and synonyms.
Search the NCBI Sequence Read Archive (SRA) for public sequencing datasets.
Access NCBI Taxonomy database - the comprehensive taxonomy database for all organisms
BioPortal is a repository of biomedical ontologies developed by the National Center
The GDC provides a unified data repository for cancer genomics programs including
Curated collection of biomolecular interactions and cellular processes.
National Cancer Institute's comprehensive cancer terminology system.
NDEx: Network Data Exchange for sharing biological networks.
Predict N-glycosylation sites in proteins.
NetPath: Curated resource of signal transduction pathways in humans.
Comprehensive toolkit for creating, analyzing, and visualizing complex networks and graphs in Python. Use when working with network/graph data structures, analyzing relationships between entities, computing graph algorithms (shortest paths, centrality, clustering), detecting communities, generating synthetic networks, or visualizing network topologies. Applicable to social networks, biological networks, transportation systems, citation networks, and any domain involving pairwise relationships.
Provides quantitative electrophysiological measurements from literature
Comprehensive biosignal processing toolkit for analyzing physiological data including ECG, EEG, EDA, RSP, PPG, EMG, and EOG signals. Use this skill when processing cardiovascular signals, brain activity, electrodermal responses, respiratory patterns, muscle activity, or eye movements. Applicable for heart rate variability analysis, event-related potentials, complexity measures, autonomic nervous system assessment, psychophysiology research, and multi-modal physiological signal integration.
World's largest collection of digitally reconstructed neuron morphologies.
Neuropixels neural recording analysis. Load SpikeGLX/OpenEphys data, preprocess, motion correction, Kilosort4 spike sorting, quality metrics, Allen/IBL curation, AI-assisted visual analysis, for Neuropixels 1.0/2.0 extracellular electrophysiology. Use when working with neural recordings, spike sorting, extracellular electrophysiology, or when the user mentions Neuropixels, SpikeGLX, Open Ephys, Kilosort, quality metrics, or unit curation.
Human protein knowledge platform - comprehensive annotations for human proteins.
OBIS is a global open-access data system for marine biodiversity, containing
OFFSIDES - database of drug side effects mined from FDA Adverse Event Reporting
The O-GlcNAc Database catalogs O-linked β-N-acetylglucosamine (O-GlcNAc)
Orthologous MAtrix database for comparative genomics.
Microscopy data management platform. Access images via Python, retrieve datasets, analyze pixels, manage ROIs/annotations, batch processing, for high-content screening and microscopy workflows.
Online Mendelian Inheritance in Man - catalog of human genes and genetic disorders.
OmniPath: Comprehensive database of protein-protein interactions, signaling,
Query OmniPath for post-translational modification (PTM) interactions.
OncoKB - Precision Oncology Knowledge Base with therapeutic implications of
Access to 200+ biomedical ontologies from EMBL-EBI.
Comprehensive scholarly database covering 240M+ works, open access.
Search OpenAlex for scholarly works with rich citation metadata, concept tags, and open-access status across 250M+ records, with sortable relevance/date/citation-count rankings. Use when PubMed or Semantic Scholar coverage is insufficient, when an evidence auditor needs citation-graph context for provenance, or when a replicator wants to find the broadest set of papers citing or cited by a target work.
FDA adverse event reports and drug information.
Query the FDA Adverse Event Reporting System (FAERS) via openFDA for a drug's post-market adverse-event signals, ranked by report frequency, with optional filtering to a specific MedDRA reaction term. Use when assessing safety liabilities for drugs being considered for repurposing in neurodegeneration, when a falsifier needs real-world pharmacovigilance signals to challenge a safety claim, or when a domain expert needs CNS-toxicity or ARIA-like signals surfaced before scoring.
Query FDA Adverse Event Reporting System (FAERS) via openFDA API.
Self-hosted, open-source alternative to Google NotebookLM for AI-powered research and document analysis. Use when organizing research materials into notebooks, ingesting diverse content sources (PDFs, videos, audio, web pages, Office documents), generating AI-powered notes and summaries, creating multi-speaker podcasts from research, chatting with documents using context-aware AI, searching across materials with full-text and vector search, or running custom content transformations. Supports 16+ AI providers including OpenAI, Anthropic, Google, Ollama, Groq, and Mistral with complete data privacy through self-hosting.
Return gene-disease associations from Open Targets Platform (GraphQL) for a given gene symbol, resolved to Ensembl via MyGene.info, with disease name, EFO id, and Open Targets association score. Use when a Domain Expert or Theorist is probing target-disease links or when a debater wants a quantitative association score beyond raw PMID counts.
Get top-scored target genes for a disease from Open Targets Platform.
Disease associations and therapeutic evidence for a gene from Open Targets Platform, scored across multiple evidence sources.
Target-disease evidence and drug discovery platform.
Query Open Targets for target safety liability evidence.
Search Open Targets Platform for targets, diseases, drugs, and overall associations.
Official Opentrons Protocol API for OT-2 and Flex robots. Use when writing protocols specifically for Opentrons hardware with full access to Protocol API v2 features. Best for production Opentrons protocols, official API compatibility. For multi-vendor automation or broader equipment control use pylabrobot.
GPU-accelerate Python code using CuPy, Numba CUDA, Warp, cuDF, cuML, cuGraph, KvikIO, cuCIM, cuxfilter, cuVS, cuSpatial, and RAFT. Use whenever the user mentions GPU/CUDA/NVIDIA acceleration, or wants to speed up NumPy, pandas, scikit-learn, scikit-image, NetworkX, GeoPandas, or Faiss workloads. Covers physics simulation, differentiable rendering, mesh ray casting, particle systems (DEM/SPH/fluids), vector/similarity search, GPUDirect Storage file IO, interactive dashboards, geospatial analysis, medical imaging, and sparse eigensolvers. Also use when you see CPU-bound Python code (loops, large arrays, ML pipelines, graph analytics, image processing) that would benefit from GPU acceleration, even if not explicitly requested.
Access researcher profiles, publications, and affiliations via ORCID.
ORegAnno - Open Regulatory Annotation database of regulatory elements
Comprehensive dataset on rare diseases and orphan drugs from Orphanet.
Orphanet is the reference portal for rare diseases and orphan drugs.
OrthoDB is a comprehensive catalog of orthologs, genes inherited by speciation events,
Oxford Nanopore Technologies (ONT) provides long-read sequencing technology with
Get canonical cell type marker genes from PanglaoDB scRNA-seq database. Covers microglia, astrocytes, neurons, OPCs, DAM, oligodendrocytes.
Protein ANalysis THrough Evolutionary Relationships - protein classification system.
Search Paperclip MCP for biomedical papers (8M+ across arXiv, bioRxiv, PMC, OpenAlex, OSF). Uses hybrid BM25+embedding search with TL;DR summaries.
Ingest a list of paper dicts into the local PaperCorpus cache for persistent storage. Each paper needs at least one ID (pmid, doi, or paper_id).
Unified multi-provider paper search over PubMed, Semantic Scholar, OpenAlex, and CrossRef with deduplication and local caching. Use when you need maximum recall across sources with a single paginated API and automatic dedupe by canonical identifiers (DOI/PMID).
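Deduplication by canonical identifiers, as described above, might look roughly like the following. This is a sketch under stated assumptions: the dict field names (`doi`, `pmid`, `source`), the merge policy, and both function names are hypothetical and are not taken from the tool itself.

```python
def canonical_keys(paper):
    """Return normalized (kind, value) identifier keys for a paper dict."""
    keys = []
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        # Strip common resolver prefixes so equivalent DOIs compare equal.
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if doi.startswith(prefix):
                doi = doi[len(prefix):]
        keys.append(("doi", doi))
    pmid = str(paper.get("pmid") or "").strip()
    if pmid:
        keys.append(("pmid", pmid))
    return keys

def dedupe(papers):
    """Merge records sharing a DOI or PMID; the first record seen wins.

    Records with no identifier are kept as-is. Transitive merges of two
    already-separate records are not handled in this sketch.
    """
    seen = {}    # identifier key -> index into merged
    merged = []
    for paper in papers:
        keys = canonical_keys(paper)
        idx = next((seen[k] for k in keys if k in seen), None)
        if idx is None:
            merged.append(dict(paper))
            idx = len(merged) - 1
        else:
            # Fill only the fields the earlier record was missing.
            for field, value in paper.items():
                if not merged[idx].get(field):
                    merged[idx][field] = value
        for k in canonical_keys(merged[idx]):
            seen[k] = idx
    return merged

# Hypothetical results from three providers describing two papers.
results = [
    {"title": "A", "doi": "10.1000/XYZ", "source": "crossref"},
    {"title": "A", "doi": "https://doi.org/10.1000/xyz", "pmid": "123", "source": "pubmed"},
    {"title": "A", "pmid": 123, "source": "semantic_scholar"},
    {"title": "B", "pmid": "456"},
]
unique = dedupe(results)
```

Here the second record supplies the PMID that lets the third record, which has no DOI, collapse into the same entry.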
Search across PubMed, Semantic Scholar, OpenAlex, and CrossRef with unified results and local caching. Use providers param to filter to specific sources.
Start a stateful multi-page search session. Call again with incremented page param to fetch subsequent pages.
Retrieve figures (labels, captions, image URLs) for a paper identified by PMID or canonical paper_id, checking the DB cache first and falling back to PMC BioC, Europe PMC full-text XML, open-access PDF extraction, and deep-link references. Use when a debater needs visual evidence (fig captions, micrographs, schematics) to ground or challenge a claim.
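The cache-first lookup with ordered fallbacks described above follows a common pattern: try each source in turn and stop at the first one that returns something. A minimal sketch follows, with stub fetchers standing in for the real sources; every name here is hypothetical and the tool's actual interfaces are not shown in the source.

```python
def first_available(paper_id, sources):
    """Try each (name, fetch) pair in order; return the first non-empty result.

    `sources` is an ordered list of (name, callable) pairs. A source that
    raises or returns nothing is skipped so later fallbacks still run.
    """
    for name, fetch in sources:
        try:
            figures = fetch(paper_id)
        except Exception:
            continue  # a failing source should not block the later ones
        if figures:
            return name, figures
    return None, []

# Stub fetchers standing in for a cache and two remote fallbacks.
def cache_lookup(paper_id):
    return []  # simulate a cache miss

def flaky_remote(paper_id):
    raise TimeoutError("simulated outage")

def fulltext_remote(paper_id):
    return [{"label": "Figure 1", "caption": "Example caption"}]

source_name, figures = first_available(
    "PMID:0000000",  # placeholder identifier
    [("cache", cache_lookup), ("remote_a", flaky_remote), ("remote_b", fulltext_remote)],
)
```

Returning the source name alongside the figures lets a caller record provenance, which matters when the figure is used as evidence in a debate.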
Search 10 academic paper databases via REST APIs for research papers, preprints, and scholarly articles. Covers PubMed, PMC (full text), bioRxiv, medRxiv, arXiv, OpenAlex, Crossref, Semantic Scholar, CORE, Unpaywall. Use when searching for papers, citations, DOI/PMID lookups, abstracts, full text, open access, preprints, citation graphs, author search, or any scholarly literature query. Triggers on mentions of any supported database or requests like "find papers on X" or "look up this DOI".
Chat with your agent about projects, recommendations, and canonical papers in Paperzilla. Use when users ask for recent project recommendations, canonical paper details, markdown-based summaries, recommendation feedback, feed export, or Atom feed URLs.
All-in-one web toolkit powered by parallel-cli, with a strong emphasis on academic and scientific sources. Use this skill whenever the user needs to search the web, fetch/extract URL content, enrich data with web-sourced fields, or run deep research reports. Covers: web search (fast lookups, research, current info — prioritizing peer-reviewed papers, preprints, and scholarly databases), URL extraction (fetching pages, articles, academic PDFs), bulk data enrichment (adding fields to CSV/lists from the web), and deep research (exhaustive multi-source reports grounded in academic literature). Also handles setup, status checks, and result retrieval. Use this skill for ANY web-related task — even if the user doesn't mention 'parallel' or 'web' explicitly. If they want to look something up, fetch a page, enrich a dataset, investigate a topic, find academic papers, check citations, or review scientific literature, this is the skill to use.
PathBank is a comprehensive, visual database of human metabolic and signaling pathways.
Full-featured computational pathology toolkit. Use for advanced WSI analysis including multiplexed immunofluorescence (CODEX, Vectra), nucleus segmentation, tissue graph construction, and ML model training on pathology data. Supports 160+ slide formats. For simple tile extraction from H&E slides, histolab may be simpler.
Integrated resource of biological pathway and interaction data.
Pathway Commons is an aggregated resource that integrates pathway and molecular
PDBe-KB aggregates functional annotations and structural information for entries
Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.
Structured manuscript/grant review with checklist-based evaluation. Use when writing formal peer reviews against specific criteria: methodology assessment, statistical validity, reporting standards compliance (CONSORT/STROBE), and constructive feedback. Best for actual review writing and manuscript revision. For evaluating claims/evidence quality use scientific-critical-thinking; for quantitative scoring frameworks use scholar-evaluation.
Hardware-agnostic quantum ML framework with automatic differentiation. Use when training quantum circuits via gradients, building hybrid quantum-classical models, or needing device portability across IBM/Google/Rigetti/IonQ. Best for variational algorithms (VQE, QAOA), quantum neural networks, and integration with PyTorch/JAX/TensorFlow. For hardware-specific optimizations use qiskit (IBM) or cirq (Google); for open quantum systems use qutip.
Compendium of peptides identified in mass spectrometry proteomics experiments.
Protein families database with domain annotations and alignments.
Search PGS Catalog (EMBL-EBI) for published polygenic risk score (PRS) models for a disease. Returns multi-SNP scoring models with variant counts, effect weight methods, publication DOI/year, and FTP download links. Covers 47+ Alzheimer disease PRS, 11+ Parkinson disease PRS, and hundreds of cognitive/brain trait models. Complements GWAS tools (single variants) with complete polygenic models ready for individual risk stratification. Essential for precision medicine analyses in neurodegeneration.
Query PharmGKB for pharmacogenomics drug-gene relationships. Returns clinical annotations linking genetic variants to drug response.
Access pharmacogenomics data: drug-gene interactions, clinical annotations, pathways.
IDG (Illuminating the Druggable Genome) drug target database.
Predicts transmembrane topology and signal peptides in proteins.
PhosphoGRID: Database of phosphorylation sites in yeast and human.
API wrapper for PhosphoSitePlus (PSP), the most comprehensive PTM database.
Build and analyze phylogenetic trees using MAFFT (multiple alignment), IQ-TREE 2 (maximum likelihood), and FastTree (fast NJ/ML). Visualize with ETE3 or FigTree. For evolutionary analysis, microbial genomics, viral phylodynamics, protein family analysis, and molecular clock studies.
Phytozome - Plant comparative genomics portal from JGI.
PlantCyc is a comprehensive database of plant metabolic pathways, enzymes,
PlantTFDB - Plant Transcription Factor Database covering 165 plant species.
PLAZA - comparative genomics platform for plant species. Integrates
Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.
High-performance genomic interval operations and bioinformatics file I/O on Polars DataFrames. Overlap, nearest, merge, coverage, complement, subtract for BED/VCF/BAM/GFF intervals. Streaming, cloud-native, faster bioframe alternative.
PomBase is the model organism database for the fission yeast Schizosaccharomyces pombe.
Use this skill any time a .pptx file is involved in any way: as input, output, or both. This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, like in an email or summary); editing, modifying, or updating existing presentations; combining or splitting slide files; working with templates, layouts, speaker notes, or comments. Trigger whenever the user mentions "deck," "slides," "presentation," or references a .pptx filename, regardless of what they plan to do with the content afterward. If a .pptx file needs to be opened, created, or touched, use this skill.
Create research posters using HTML/CSS that can be exported to PDF or PPTX. Use this skill ONLY when the user explicitly requests PowerPoint/PPTX poster format. For standard research posters, use latex-posters instead. This skill provides modern web-based poster design with responsive layouts and easy visual integration.
Access to PRIDE - proteomics data repository and resource.
Query the Precision Medicine Knowledge Graph (PrimeKG) for multiscale biological data including genes, drugs, diseases, phenotypes, and more.
PROSITE is a protein domain database from the Swiss Institute of Bioinformatics (SIB).
ProteomicsDB is a protein-centric in-memory database for exploring the
Integration with protocols.io API for managing scientific protocols. This skill should be used when working with protocols.io to search, create, update, or publish protocols; manage protocol steps and materials; handle discussions and comments; organize workspaces; upload and manage files; or integrate protocols.io functionality into workflows. Applicable for protocol discovery, collaborative protocol development, experiment tracking, lab protocol management, and scientific documentation.
Comprehensive genomics database for parasitic protozoa causing major diseases.
Calculate physical and chemical parameters of proteins.
Interactive protein topology and structure visualization tool.
PseudoCAP provides genome annotation and analysis tools for P. aeruginosa,
Predict alpha-helices, beta-strands, and coils from sequence.
PTMsigDB: Database of post-translational modification signatures.
Access chemical compounds, bioassays, and substance data from PubChem.
Find bioassay-confirmed active compounds against a protein target in PubChem.
Free access to PubMed/MEDLINE database (35M+ biomedical citations).
Search PubMed (NCBI E-utilities) and return paper details with PMIDs, titles, authors, journal, year, and DOI. Use when an agent needs citable literature evidence for a hypothesis, counter-evidence, or prior-art search — especially when a PMID is required downstream (cite_artifact actions, KG edge provenance, evidence audits).
Search PubMed for papers by keyword. Returns titles, authors, journals, PMIDs.
Extract standardized gene, disease, and variant mentions from PubMed via PubTator3.
PubTator Central is NCBI's text mining tool that automatically annotates
High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Python library for working with DICOM (Digital Imaging and Communications in Medicine) files. Use this skill when reading, writing, or modifying medical imaging data in DICOM format, extracting pixel data from medical images (CT, MRI, X-ray, ultrasound), anonymizing DICOM files, working with DICOM metadata and tags, converting DICOM images to other formats, handling compressed DICOM data, or processing medical imaging datasets. Applies to tasks involving medical image analysis, PACS systems, radiology workflows, and healthcare imaging applications.
Comprehensive healthcare AI toolkit for developing, testing, and deploying machine learning models with clinical data. This skill should be used when working with electronic health records (EHR), clinical prediction tasks (mortality, readmission, drug recommendation), medical coding systems (ICD, NDC, ATC), physiological signals (EEG, ECG), healthcare datasets (MIMIC-III/IV, eICU, OMOP), or implementing deep learning models for healthcare applications (RETAIN, SafeDrug, Transformer, GNN).
Vendor-agnostic lab automation framework. Use when controlling multiple equipment types (Hamilton, Tecan, Opentrons, plate readers, pumps) or needing unified programming across different vendors. Best for complex workflows, multi-vendor setups, simulation. For Opentrons-only protocols with official API, opentrons-integration may be simpler.
Materials science toolkit. Crystal structures (CIF, POSCAR), phase diagrams, band structure, DOS, Materials Project integration, format conversion, for computational materials science.
Bayesian modeling with PyMC. Build hierarchical models, MCMC (NUTS), variational inference, LOO/WAIC comparison, posterior checks, for probabilistic programming and inference.
Multi-objective optimization framework. NSGA-II, NSGA-III, MOEA/D, Pareto fronts, constraint handling, benchmarks (ZDT, DTLZ), for engineering design and optimization problems.
Complete mass spectrometry analysis platform. Use for proteomics workflows: feature detection, peptide identification, protein quantification, and complex LC-MS/MS pipelines. Supports extensive file formats and algorithms. Best for proteomics and comprehensive MS data processing. For simple spectral comparison and metabolite ID use matchms.
PyPI is the official repository of Python packages, hosting 500,000+ projects
Genomic file toolkit. Read/write SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences, extract regions, calculate coverage, for NGS data processing pipelines.
Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction.
Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.
Interact with Zotero reference management libraries using the pyzotero Python client. Retrieve, create, update, and delete items, collections, tags, and attachments via the Zotero Web API v3. Use this skill when working with Zotero libraries programmatically, managing bibliographic references, exporting citations, searching library contents, uploading PDF attachments, or building research automation workflows that integrate with Zotero.
IBM quantum computing framework. Use when targeting IBM Quantum hardware, working with Qiskit Runtime for production workloads, or needing IBM optimization tools. Best for IBM hardware execution, quantum error mitigation, and enterprise quantum computing. For Google hardware use cirq; for gradient-based quantum ML use pennylane; for open quantum system simulations use qutip.
Quantum physics simulation library for open quantum systems. Use when studying master equations, Lindblad dynamics, decoherence, quantum optics, or cavity QED. Best for physics research, open system dynamics, and educational simulations. NOT for circuit-based quantum computing—use qiskit, cirq, or pennylane for quantum algorithms and hardware execution.
Access 3D protein structures from the Protein Data Bank.
Cheminformatics toolkit for fine-grained molecular control. SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints, substructure search, 2D/3D generation, similarity, reactions. For standard workflows with simpler interface, use datamol (wrapper around RDKit). Use rdkit for advanced control, custom sanitization, specialized algorithms.
Access to Reactome pathways, reactions, and biological processes.
Return Reactome pathways containing a given human gene, with pathway hierarchy and species info, resolved via UniProt cross-references. Use when a hypothesis invokes a specific pathway or when a domain expert needs to place a target in its curated pathway context before scoring druggability, feasibility, or mechanistic plausibility.
Look up biological pathways a gene participates in, from Reactome.
Search Reactome for pathways by name and return their constituent genes.
Reanalysis of public RNA-seq data with consistent pipeline.
RefSeq (Reference Sequence Database) provides curated, non-redundant reference
RegNetwork - database of transcriptional and post-transcriptional regulatory
RegulomeDB - Database of regulatory variants and their functional impact.
RegulonDB is a comprehensive database of transcriptional regulation in E. coli K-12.
Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan NSTC. Agency-specific formatting, review criteria, budget preparation, broader impacts, significance statements, innovation narratives, and compliance with submission requirements.
Look up current research information using parallel-cli search (primary, fast web search), the Parallel Chat API (deep research), or Perplexity sonar-pro-search (academic paper searches). Automatically routes queries to the best backend. Use for finding papers, gathering research data, and verifying scientific information.
Convenience meta-tool that fans out a topic query to PubMed, Semantic Scholar, and ClinicalTrials.gov in one call and packages the results into a single research brief with total_evidence count, ready to feed to an LLM agent. Use when a Theorist or other agent wants a broad first-pass scan rather than a targeted single-source search.
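The fan-out pattern described here can be sketched as follows. Only the `total_evidence` field name comes from the description above; the function name `research_brief`, the other dictionary keys, and the stub providers are hypothetical and do not reflect the actual tool's return shape.

```python
def research_brief(topic, providers):
    """Run each provider callable on the topic and package the hits in one dict."""
    brief = {"topic": topic, "sources": {}, "errors": {}}
    for name, search in providers.items():
        try:
            brief["sources"][name] = list(search(topic))
        except Exception as exc:  # one failing source should not sink the brief
            brief["sources"][name] = []
            brief["errors"][name] = str(exc)
    brief["total_evidence"] = sum(len(hits) for hits in brief["sources"].values())
    return brief


def _failing_source(topic):
    raise RuntimeError("timeout")


# Stub providers standing in for PubMed, Semantic Scholar, and ClinicalTrials.gov.
brief = research_brief(
    "example topic",
    {
        "pubmed": lambda t: [{"id": "stub-1"}, {"id": "stub-2"}],
        "semantic_scholar": lambda t: [{"id": "stub-3"}],
        "clinical_trials": _failing_source,
    },
)
print(brief["total_evidence"], brief["errors"])
```

Isolating each provider in its own `try` block means a partial brief is still returned when one source is down, which suits a broad first-pass scan.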
Check if a PMID corresponds to a retracted paper via Retraction Watch.
Access to Rfam database - RNA families database of structural RNA alignments.
Query rat gene information, orthologs, QTLs, and strains.
Comprehensive rice research database with genomics, genetics, breeding,
Database and analysis platform for RNA 3D structures.
Comprehensive non-coding RNA sequence database.
NIH Roadmap Epigenomics Mapping Consortium data portal.
Rowan is a cloud-native molecular modeling and medicinal-chemistry workflow platform with a Python API. Use for pKa and macropKa prediction, conformer and tautomer ensembles, docking and analogue docking, protein-ligand cofolding, MSA generation, molecular dynamics, permeability, descriptor workflows, and related small-molecule or protein modeling tasks. Ideal for programmatic batch screening, multi-step chemistry pipelines, and workflows that would otherwise require maintaining local HPC/GPU infrastructure.
SABIO-RK is a biochemical reaction kinetics database containing kinetic data
SASDB - Small Angle Scattering Biological Data Bank for structural data
Standard single-cell RNA-seq analysis pipeline. Use for QC, normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression, and visualization. Best for exploratory scRNA-seq analysis with established workflows. For deep learning models use scvi-tools; for data format questions use anndata.
ScanSite predicts protein-protein interaction sites, kinase phosphorylation sites,
Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions including problem formulation, methodology, analysis, and writing with quantitative scoring and actionable feedback.
Wrapper functions for free scientific APIs.
Creative research ideation and exploration. Use for open-ended brainstorming sessions, exploring interdisciplinary connections, challenging assumptions, or identifying research gaps. Best for early-stage research planning when you do not have specific observations yet. For formulating testable hypotheses from data use hypothesis-generation.
Evaluate scientific claims and evidence quality. Use for assessing experimental design validity, identifying biases and confounders, applying evidence grading frameworks (GRADE, Cochrane Risk of Bias), or teaching critical analysis. Best for understanding evidence quality, identifying flaws. For formal peer review writing use peer-review.
Create publication-quality scientific diagrams using Nano Banana 2 AI with smart iterative refinement. Uses Gemini 3.1 Pro Preview for quality review. Only regenerates if quality is below threshold for your document type. Specialized in neural network architectures, system diagrams, flowcharts, biological pathways, and complex scientific visualizations.
Build slide decks and presentations for research talks. Use this for making PowerPoint slides, conference presentations, seminar talks, research presentations, thesis defense slides, or any scientific talk. Provides slide structure, design templates, timing guidance, and visual validation. Works with PowerPoint and LaTeX Beamer.
Meta-skill for publication-ready figures. Use when creating journal submission figures requiring multi-panel layouts, significance annotations, error bars, colorblind-safe palettes, and specific journal formatting (Nature, Science, Cell). Orchestrates matplotlib/seaborn/plotly with publication styles. For quick exploration use seaborn or plotly directly.
Core skill for the deep research and writing tool. Write scientific manuscripts in full paragraphs (never bullet points). Use a two-stage process: (1) draft section outlines with key points using research-lookup, then (2) convert them to flowing prose. Covers IMRAD structure, citations (APA/AMA/Vancouver), figures/tables, and reporting guidelines (CONSORT/STROBE/PRISMA) for research papers and journal submissions.
Biological data toolkit. Sequence analysis, alignments, phylogenetic trees, diversity metrics (alpha/beta, UniFrac), ordination (PCoA), PERMANOVA, FASTA/Newick I/O, for microbiome analysis.
Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.
API for the SCOP database (Structural Classification of Proteins).
RNA velocity analysis with scVelo. Estimate cell state transitions from unspliced/spliced mRNA dynamics, infer trajectory directions, compute latent time, and identify driver genes in single-cell RNA-seq data. Complements Scanpy/scVI-tools for trajectory inference.
Deep generative models for single-cell omics. Use when you need probabilistic batch correction (scVI), transfer learning, differential expression with uncertainty, or multi-modal integration (TOTALVI, MultiVI). Best for advanced modeling, batch effects, multimodal data. For standard analysis pipelines use scanpy.
Statistical visualization with pandas integration. Use for quick exploration of distributions, relationships, and categorical comparisons with attractive defaults. Best for box plots, violin plots, pair plots, heatmaps. Built on matplotlib. For interactive plots use plotly; for publication styling use scientific-visualization.
SeaLifeBase is the marine counterpart to FishBase, covering 100,000+ species
Search SciDEX annotations by URI, user, tags, or motivation.
Search ClinicalTrials.gov (API v2) for clinical trials and return NCT IDs, titles, status, phase, conditions, interventions, sponsors, enrollment counts, and start/completion dates. Use when a hypothesis touches translational feasibility, a Domain Expert is sizing the competitive landscape, or a Skeptic is probing replicability of a drug/target program in humans.
Academic paper search and citation graph from Allen AI (200M+ papers).
Fetch papers by a given author.
Fetch author details including h-index, citation count, paper count.
Batch fetch details for multiple papers.
Fetch papers that cite the given paper (citation network).
Fetch only highly influential citations for a paper.
Fetch full details for a single paper by ID.
Fetch recommended similar papers for a given paper.
Fetch papers referenced by the given paper (reference network).
Search Semantic Scholar for scholarly papers and return enriched records with paperId, title, authors, year, citation count, abstract, TLDR summary, venue, and cross-walked PMID/DOI identifiers. Use when citation-weighted ranking, TLDR summaries, or influential-citation counts matter more than raw PubMed recency.
Access to SGD - comprehensive database for yeast genetics and genomics.
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model.
Access to SIDER - database of marketed drugs and their recorded adverse drug reactions.
SignaLink is an integrated resource for studying signaling pathway cross-talk,
Predict signal peptides and cleavage sites in protein sequences.
Curated database of signaling pathways and regulatory interactions.
Comprehensive quality-checked and aligned ribosomal RNA sequence database.
Process-based discrete-event simulation framework in Python. Use this skill when building simulations of systems with processes, queues, resources, and time-based events such as manufacturing systems, service operations, network traffic, logistics, or any system where entities interact with shared resources over time.
Simple Modular Architecture Research Tool - protein domain annotation database.
SMPDB provides detailed, fully annotated pathway diagrams for small molecule
SOL (Solanaceae) Genomics Network - Genomic data for tomato, potato, pepper, etc.
Comprehensive soybean research database with genomics, genetics, breeding,
NCBI's Sequence Read Archive (SRA) is the largest publicly available repository of
Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.
Guided statistical analysis with test selection and reporting. Use when you need help choosing appropriate tests for your data, assumption checking, power analysis, and APA-formatted results. Best for academic research reporting, test selection guidance. For implementing specific models programmatically use statsmodels.
Statistical models library for Python. Use when you need specific model classes (OLS, GLM, mixed models, ARIMA) with detailed diagnostics, residuals, and inference. Best for econometrics, time series, rigorous inference with coefficient tables. For guided statistical test selection with APA reporting use statistical-analysis.
Access to STITCH - database of chemical-protein interactions.
Protein-protein interaction networks and functional associations.
Query STRING DB for a functional interaction network across a gene set.
Query STRING-DB for protein-protein interactions among a list of gene symbols, returning scored edges with evidence types for either a physical or functional network at a configurable confidence threshold. Use when a hypothesis invokes a protein interaction or pathway neighbourhood, when network-level plausibility must be assessed before promotion, or when building a small subgraph around a target for a debate artifact.
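A configurable confidence threshold boils down to a filter over scored edges. The sketch below assumes a hypothetical edge layout (`source`, `target`, `score`) and a 0 to 1000 integer score scale, as suggested by the `score_threshold=700` example call shown earlier on this page. The gene names and scores are placeholders, not real STRING data.

```python
def filter_edges(edges, score_threshold=700):
    """Keep edges whose score meets the threshold, highest score first."""
    kept = [edge for edge in edges if edge["score"] >= score_threshold]
    return sorted(kept, key=lambda edge: edge["score"], reverse=True)


# Placeholder edges for illustration only.
edges = [
    {"source": "GENE_A", "target": "GENE_B", "score": 650},
    {"source": "GENE_A", "target": "GENE_C", "score": 910},
    {"source": "GENE_B", "target": "GENE_C", "score": 700},
]

for edge in filter_edges(edges):
    print(edge["source"], edge["target"], edge["score"])
```

Raising the threshold trades recall for confidence, so a debate artifact built at 900 will show a sparser but better-supported subgraph than one built at 400.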
Find physical protein-protein interactions from the STRING database. Enter 2+ gene symbols.
Database of approved drugs with 3D structures, targets, and conformations.
SuperTarget - database of drug-target interactions with information on
Comprehensive database of lipid structures, nomenclature, and biology.
Use this skill when working with symbolic mathematics in Python. This skill should be used for symbolic computation tasks including solving equations algebraically, performing calculus operations (derivatives, integrals, limits), manipulating algebraic expressions, working with matrices symbolically, physics calculations, number theory problems, geometry computations, and generating executable code from mathematical expressions. Apply this skill when the user needs exact symbolic results rather than numerical approximations, or when working with mathematical formulas that contain variables and parameters.
Synthetic test tool
Tabula Sapiens is a benchmark single-cell transcriptomic atlas of ~500,000
TAIR is the primary database for the model plant Arabidopsis thaliana,
TarBase: Database of experimentally validated microRNA targets.
Predict subcellular localization of proteins.
TargetScan predicts biological targets of miRNAs by searching for the presence of
Transporter Classification Database - comprehensive classification of membrane transport systems.
TCGA generated comprehensive molecular profiles of more than 20,000 primary cancers
TTD is a database providing information about therapeutic protein and nucleic acid
Efficient storage and retrieval of genomic variant data using TileDB. Scalable VCF/BCF ingestion, incremental sample addition, compressed storage, parallel queries, and export capabilities for population genomics.
Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.
Phylogenetic timing database - evolutionary divergence times between species.
Predict transmembrane helices in protein sequences.
Tests all Forge tools to ensure they follow standards and can import correctly.
TOPCONS (Topology Consensus) combines multiple prediction methods to provide
Database of transmembrane protein topology annotations.
PyTorch-native graph neural networks for molecules and proteins. Use when building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, retrosynthesis. For pre-trained models and diverse featurizers use deepchem; for benchmark datasets use pytdc.
Guide for building Graph Neural Networks with PyTorch Geometric (PyG). Use this skill whenever the user asks about graph neural networks, GNNs, node classification, link prediction, graph classification, message passing networks, heterogeneous graphs, neighbor sampling, or any task involving torch_geometric / PyG. Also trigger when you see imports from torch_geometric, or the user mentions graph convolutions (GCN, GAT, GraphSAGE, GIN), graph data structures, or working with relational/network data. Even if the user just says 'graph learning' or 'geometric deep learning', use this skill.
Fine-tune a model artifact on a dataset artifact via GPU sandbox. Produces a new model version with parent lineage, code commit SHA, and eval metrics captured. Use dry_run=True to validate and estimate cost before launching.
This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.
Generate concise (3-4 page), focused medical treatment plans in LaTeX/PDF format for all clinical specialties. Supports general medical treatment, rehabilitation therapy, mental health care, chronic disease management, perioperative care, and pain management. Includes SMART goal frameworks, evidence-based interventions with minimal text citations, regulatory compliance (HIPAA), and professional formatting. Prioritizes brevity and clinical actionability.
Repository of phylogenetic trees and the data matrices used to generate them,
TreeFam (Tree families database) is a database of phylogenetic trees of gene families
TRRUST - Transcriptional Regulatory Relationships Unraveled by Sentence-based
Uberon is an integrated cross-species anatomy ontology covering animals and
Access genome assemblies, annotations, and sequence data.
World's largest biobank with deep genetic and phenotypic data from 500,000+ participants.
UMAP dimensionality reduction. Fast nonlinear manifold learning for 2D/3D visualization, clustering preprocessing (HDBSCAN), supervised/parametric UMAP, for high-dimensional data.
Federated search across multiple Forge tools for genes, drugs, diseases, and proteins.
Standard reference for protein post-translational modifications (PTMs) in mass spectrometry.
Retrieve comprehensive UniProt (Swiss-Prot) annotation for a human protein by gene symbol or accession — function, subcellular location, domains, post-translational modifications, interaction count, and disease associations. Use when a hypothesis needs authoritative protein-level grounding, when mechanism claims must be checked against curated biology, or when a downstream tool requires a canonical UniProt accession.
Comprehensive protein annotation from UniProt/Swiss-Prot: function, domains, subcellular location, disease associations.
Access protein sequence and functional information from UniProtKB.
Query the U.S. Treasury Fiscal Data API for federal financial data including national debt, government spending, revenue, interest rates, exchange rates, and savings bonds. Access 54 datasets and 182 data tables with no API key required. Use when working with U.S. federal fiscal data, national debt tracking (Debt to the Penny), Daily Treasury Statements, Monthly Treasury Statements, Treasury securities auctions, interest rates on Treasury securities, foreign exchange rates, savings bonds, or any U.S. government financial statistics.
General utility functions for biological data translation and conversion.
Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.
VarSome is a knowledge-driven variant interpretation platform that aggregates
Database of T-cell receptor (TCR) sequences with known antigen specificities.
VectorBase is a NIAID Bioinformatics Resource Center providing genomic and
Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). Use this skill when preparing journal manuscripts, conference papers, research posters, or grant proposals that need venue-specific formatting requirements and templates.
Comprehensive database of virulence factors (VFs) from bacterial pathogens.
Bioinformatics resource center for viral pathogens.
Comprehensive virology resource with virus biology, replication cycles,
Run structured What-If scenario analysis with multi-branch possibility exploration. Use this skill when the user asks speculative questions like "what if...", "what would happen if...", "what are the possibilities", "explore scenarios", "scenario analysis", "possibility space", "what could go wrong", "best case / worst case", "risk analysis", "contingency planning", "strategic options", or any question about uncertain futures. Also trigger when the user faces a fork-in-the-road decision, wants to stress-test an idea, or needs to think through consequences before committing.
Comprehensive wheat research database with genomics, genetics, breeding,
Access to WikiPathways - community-curated biological pathway database.
Access to WormBase - comprehensive database for C. elegans genetics and genomics.
WoRMS is the authoritative taxonomic database for marine organisms, containing
XenBase is the model organism database for Xenopus laevis and Xenopus tropicalis
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path, even casually (like "the xlsx in my downloads"), and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
Comprehensive database of small molecule metabolites found in Saccharomyces cerevisiae.
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.
Access research data, software, and publications from Zenodo.
Access to ZFIN - Zebrafish Information Network database.
Drug discovery database with millions of commercially available compounds.