Goal
Investigate and resolve a complete service outage where all 8 core SciDEX pages return HTTP status 0 (connection failure), indicating the web server is unreachable.
Acceptance Criteria
☑ Root cause identified and documented
☑ All 8 core pages return 200 or acceptable redirect (301/302)
☑ API /api/status responds with valid JSON
☑ Virtual environment restored for future systemd restarts
☑ Health check script added to detect/prevent future outages
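The page and API criteria above can be checked mechanically. The following is a minimal sketch, not the project's actual tooling: the page list is taken from the work log in this document, while the base URL passed by the caller, the helper names (fetch_status, failing_pages, is_valid_json), and the timeout are assumptions made for illustration. A connection failure is reported as status 0, matching the convention used in the Goal.

```python
import json
import urllib.error
import urllib.request

# The 8 core pages, as listed in the work log of this document.
CORE_PAGES = ["/", "/exchange", "/gaps", "/graph",
              "/analyses/", "/atlas.html", "/how.html", "/pitch.html"]
# Acceptance criteria: 200 or an acceptable redirect (301/302).
ACCEPTABLE = {200, 301, 302}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so 301/302 are reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url, timeout=5.0):
    """Return the HTTP status for url, or 0 if the server is unreachable."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # 3xx/4xx/5xx responses
        return exc.code
    except (urllib.error.URLError, OSError):  # refused, timeout, DNS failure
        return 0


def failing_pages(base_url, fetch=fetch_status):
    """Map each core page that misses the acceptance criteria to its status."""
    results = {path: fetch(base_url + path) for path in CORE_PAGES}
    return {path: code for path, code in results.items()
            if code not in ACCEPTABLE}


def is_valid_json(body):
    """True if body (str or bytes) parses as JSON."""
    try:
        json.loads(body)
        return True
    except ValueError:
        return False
```

Calling failing_pages with the service's base URL returns an empty dict when every page passes. During the outage described here, all 8 paths would be expected to map to 0.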
Root Cause Analysis
The main checkout at /home/ubuntu/scidex/ suffered catastrophic working tree deletion:
- 14,513 files deleted from the working tree (including api.py, tools.py, database.py)
- The virtual environment (venv/) was completely destroyed
- The running uvicorn process had modules in memory but became unresponsive
- pull_main.sh was not running to restore files
The uvicorn process had loaded its modules at startup, but once the working tree was wiped it could no longer serve requests properly. When the process was killed, systemd tried to restart it, but the venv binary (venv/bin/python3.12) no longer existed on disk, so every restart attempt failed.
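The two conditions identified above (tracked files missing from the working tree, and the venv interpreter missing from disk) can be detected with a short check. This is an illustrative sketch only: the function names are hypothetical, and the use of git ls-files --deleted is an assumption about how one might detect the deletion, not a record of the commands actually run during the investigation.

```python
import subprocess
from pathlib import Path


def deleted_tracked_files(repo):
    """List tracked files that are missing from the working tree of repo."""
    out = subprocess.run(
        ["git", "-C", str(repo), "ls-files", "--deleted"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def venv_python_exists(repo, rel="venv/bin/python3.12"):
    """True if the venv interpreter that systemd needs is present on disk."""
    return (Path(repo) / rel).is_file()


def summarize(deleted, venv_ok):
    """Turn the raw checks into human-readable findings."""
    findings = []
    if deleted:
        findings.append(
            f"{len(deleted)} tracked files missing from working tree")
    if not venv_ok:
        findings.append("venv interpreter missing")
    return findings
```

On the affected checkout, summarize(deleted_tracked_files("/home/ubuntu/scidex"), venv_python_exists("/home/ubuntu/scidex")) would be expected to report both findings. On a healthy checkout it returns an empty list.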
Approach
Diagnosed: checked process state, port binding, file existence
Restored: git checkout HEAD -- . to restore all working tree files
Recreated: the venv with /usr/bin/python3.12 -m venv, then pip install from requirements.txt
Verified: all 8 core pages return correct HTTP status
Prevention: added scripts/health_check_api.sh for automated recovery
Work Log
2026-04-17 06:20 PT — Slot 53 (glm-5)
- INVESTIGATION: Found uvicorn process running (PID 915175) but not responding to HTTP requests
- DIAGNOSIS: Main checkout working tree had 14,513 files deleted; api.py, tools.py, database.py all missing
- DIAGNOSIS: Virtual environment at venv/ completely destroyed
- DIAGNOSIS: pull_main.sh not running to restore files
- FIX: Restored all files via git checkout HEAD -- .
- FIX: Recreated venv with /usr/bin/python3.12 -m venv and pip install requirements
- FIX: Killed stuck process; systemd restarted with healthy venv
- VERIFICATION: All 8 core pages return correct status codes:
  - / → 302, /exchange → 200, /gaps → 200, /graph → 200
  - /analyses/ → 200, /atlas.html → 200, /how.html → 301, /pitch.html → 200
- VERIFICATION: /api/status returns: 390 analyses, 685 hypotheses, 707K edges
- PREVENTION: Added scripts/health_check_api.sh for automated detection and recovery
- RESULT: Service fully restored and hardened against future working tree deletion
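The contents of scripts/health_check_api.sh are not reproduced in this document. As a rough illustration of the detection-and-recovery idea, the sketch below maps an observed health state to the recovery steps used in this incident. It is written in Python rather than shell, and several details are assumptions: the HealthState fields, the plan_recovery name, the venv location under the checkout, the use of systemctl restart, and the unit name scidex-api.service (hypothetical; the real unit name is not given here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthState:
    api_responding: bool         # did the API answer an HTTP probe?
    tracked_files_missing: bool  # are tracked files absent from the tree?
    venv_python_present: bool    # does venv/bin/python3.12 exist?


def plan_recovery(state, repo="/home/ubuntu/scidex",
                  unit="scidex-api.service"):  # unit name is hypothetical
    """Return the commands to run, in order, for the given state."""
    steps = []
    if state.tracked_files_missing:
        # Restore files first: requirements.txt is needed for the venv step.
        steps.append(["git", "-C", repo, "checkout", "HEAD", "--", "."])
    if not state.venv_python_present:
        steps.append(["/usr/bin/python3.12", "-m", "venv", f"{repo}/venv"])
        steps.append([f"{repo}/venv/bin/pip", "install", "-r",
                      f"{repo}/requirements.txt"])
    if steps or not state.api_responding:
        # Restart only when something was repaired or the API is down.
        steps.append(["systemctl", "restart", unit])
    return steps
```

Returning the plan as data instead of executing it directly keeps the decision logic easy to test and to log. A caller could pass each step to subprocess.run once the plan has been reviewed.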