Effort: extensive
Compose the existing alphafold-structure, diffdock, and
chembl-drug-targets skills into a single end-to-end **structure-based
virtual-screen workflow**: given a target gene, fetch (or predict) the
structure, prepare a binding pocket, screen a configurable ChEMBL
ligand library against it with DiffDock, rank by predicted binding +
medicinal-chemistry filters, and persist the top-100 hits as ranked
ligand-pose artifacts the Domain-Expert persona can cite when arguing
druggability.
Today tools.py:alphafold_structure (line 2796) returns metadata,
diffdock runs in isolation, and chembl-drug-targets (line 2547)
returns a known-drug list. None of these compose into a hypothesis-
relevant answer. A debate over "is <gene> druggable?" gets a much
sharper answer when the workflow can produce "yes — here are 12
compounds that dock with confidence > 0.8 and pass medchem/Lipinski
filters" instead of "ChEMBL knows about 4 historical compounds." This
unlocks Forge as a competitive virtual-screening platform.
scidex/forge/docking_workflow.py (≤1100 LoC):
- prepare_target(gene, source='auto'): pulls AlphaFold PDB [...] alphafold_structure (preferring experimental [...] fpocket for binding [...] TargetSpec(pdb_path, pocket_xyz, confidence).
- assemble_library(target_gene, n=2000): pulls ChEMBL [...] datamol [...]
- run_diffdock(target_spec, library): runs DiffDock in [...] get-available-resources [...]
- filter_and_rank(poses): applies medchem druglikeness [...]
- pipeline(gene): composes all four; commits the top-100 [...] data/scidex-artifacts/docking/<gene>/<run>/.
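The filter_and_rank step can be sketched as follows. This is a minimal illustration, not the module's actual code: it borrows the thresholds recorded in this document's implementation notes (confidence > 0.7, MW 150–550 Da, composite score 0.6 × conf + 0.4 × ligand_efficiency). The Pose record and its field names are hypothetical, the PAINS check is reduced to a precomputed flag, and both the confidence and the ligand efficiency are assumed to be already scaled to [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pose:
    """Hypothetical pose record; the field names are illustrative only."""
    ligand_id: str
    confidence: float          # assumed to be scaled to [0, 1]
    mol_weight: float          # molecular weight in Da
    ligand_efficiency: float   # assumed to be pre-normalised to [0, 1]
    has_pains_alert: bool = False  # stand-in for a real PAINS substructure check


CONF_MIN = 0.7              # keep poses with confidence strictly above this
MW_RANGE = (150.0, 550.0)   # Da; bounds treated as inclusive (an assumption)


def composite_score(pose: Pose) -> float:
    """Weighted sum of docking confidence and ligand efficiency."""
    return 0.6 * pose.confidence + 0.4 * pose.ligand_efficiency


def filter_and_rank(poses: List[Pose], top_n: int = 100) -> List[Pose]:
    """Drop poses that fail any filter, then sort best-first by composite score."""
    kept = [
        p for p in poses
        if p.confidence > CONF_MIN
        and MW_RANGE[0] <= p.mol_weight <= MW_RANGE[1]
        and not p.has_pains_alert
    ]
    kept.sort(key=composite_score, reverse=True)
    return kept[:top_n]
```

Because the weights favour confidence only moderately, a pose with lower confidence can outrank a more confident one when its ligand efficiency is much higher. That is why the hard confidence cutoff is applied before ranking rather than relying on the score alone.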
- docking_run(run_id PRIMARY KEY, gene_symbol, [...]
- tools.py registers docking_workflow_pipeline(gene) as a [...] @log_tool_call.
- /api/docking/run/<gene> POST kicks off a run; status [...] scidex/forge/executor.py.
- /artifacts/<id> for a docking-run artifact renders a 3D viewer [...] docking_block with [...] target_gene [...] domain_expert.py.
- python -m scidex.forge.docking_workflow --gene [...]
- fpocket (CLI): wrap with subprocess [...] scidex/forge/docking_workflow.py.
- [...] target/ChEMBL_ID/activities plus [...] datamol for [...]
- [...] .claude/skills/diffdock) [...] get-available-resources skill; CPU fallback [...]
- alphafold-structure, diffdock, chembl-drug-targets, datamol, medchem, rdkit skills.
- get-available-resources skill.
- data/scidex-artifacts/ submodule for outputs.

Implemented all acceptance criteria:
- scidex/forge/docking_workflow.py (1094 LoC): full pipeline module:
  - prepare_target(gene): UniProt lookup → PDB best-structure (experimental via PDBe/RCSB, falling back to AlphaFold) → download PDB → fpocket subprocess pocket detection with centroid extraction
  - assemble_library(gene, n): ChEMBL actives via REST API + drug-like diverse subset; Tanimoto diversity selection via RDKit (graceful fallback without RDKit)
  - run_diffdock(target_spec, library): real DiffDock CLI invocation when found at $DIFFDOCK_DIR or standard paths; property-heuristic fallback when not installed
  - filter_and_rank(poses): confidence > 0.7, MW 150–550 Da, PAINS removal via RDKit FilterCatalog, composite score (0.6 × conf + 0.4 × ligand_efficiency)
  - pipeline(gene): composes all four; persists JSON artifact; writes docking_run DB row
  - python -m scidex.forge.docking_workflow --gene EGFR --library 200
- migrations/add_docking_run_table.py: PostgreSQL docking_run table; migration run successfully.
- scidex/forge/tools.py: docking_workflow_pipeline(gene) registered with @require_preregistration @log_tool_call; added to TOOL_NAME_MAPPING.
- api.py: POST /api/docking/run/{gene} and GET /api/docking/runs/{gene} routes added in Forge section.
- scidex/agora/skill_evidence.py: _build_docking_block() injects top-3 hits into domain_expert evidence when a recent run exists for the hypothesis gene.

Design notes:
- fpocket called via subprocess with CA-centroid fallback when not available
- nvidia-smi [...]
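The CA-centroid fallback mentioned in the design notes could look roughly like the sketch below. It is an assumption about the approach, not the module's code: when fpocket is unavailable, the pocket centre is approximated by the mean position of all alpha-carbon atoms, read from the fixed-column ATOM records of a PDB file. The function name ca_centroid is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def ca_centroid(pdb_lines: Iterable[str]) -> Optional[Tuple[float, float, float]]:
    """Mean (x, y, z) of alpha-carbon atoms in the first model, or None if there are none.

    Relies on the fixed-column PDB layout: atom name in columns 13-16,
    alternate-location flag in column 17, coordinates in columns 31-54.
    """
    sx = sy = sz = 0.0
    count = 0
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # use only the first model of a multi-model file
        if not line.startswith("ATOM  ") or len(line) < 54:
            continue  # skips HETATM records, so calcium ions named CA are ignored
        if line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # count each residue once when alternate locations exist
        try:
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
        except ValueError:
            continue  # malformed coordinate field
        sx, sy, sz = sx + x, sy + y, sz + z
        count += 1
    if count == 0:
        return None
    return (sx / count, sy / count, sz / count)
```

A whole-protein centroid is a coarse stand-in for a detected pocket, so a run that used this fallback would reasonably carry a lower pocket confidence than one where fpocket succeeded.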