Quest: Site Analytics — Visitor Engagement & Usage Intelligence
ID: q-site-analytics
Layer: Senate
Priority: 78
Status: active
Goal
Track how visitors use SciDEX — which pages they visit, how long they stay,
what content types attract engagement, and where they come from. Use these
signals to improve the site, prioritize content generation, and measure the
platform's reach.
What to track
Per-pageview (core)
| Field | Why it matters |
|---|---|
| path | Which pages are visited |
| referrer | Where visitors come from (search, direct, social, other sites) |
| user_agent | Browser/bot classification |
| session_id | Group pageviews into sessions (hash of IP + user-agent; no cookie needed) |
| timestamp | Time-series analysis |
| response_time_ms | Performance monitoring |
| status_code | Error rate tracking |
Per-session (derived)
| Metric | How to compute |
|---|---|
| pages_per_session | COUNT(pageviews) grouped by session_id |
| session_duration | MAX(timestamp) - MIN(timestamp) per session |
| bounce_rate | Sessions with only 1 pageview / total sessions |
| entry_page | First page of session |
| exit_page | Last page of session |
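The derived metrics in the table above can be computed in one pass over the raw pageviews. The following is a minimal sketch, assuming pageviews arrive as (session_id, path, timestamp) tuples; the function names and the example paths are hypothetical and not part of the spec.

```python
from collections import defaultdict
from datetime import datetime


def summarize_sessions(pageviews):
    """Group (session_id, path, timestamp) tuples into per-session metrics."""
    by_session = defaultdict(list)
    for session_id, path, ts in pageviews:
        by_session[session_id].append((ts, path))

    sessions = {}
    for session_id, hits in by_session.items():
        hits.sort(key=lambda h: h[0])  # chronological order
        sessions[session_id] = {
            "pages_per_session": len(hits),
            # MAX(timestamp) - MIN(timestamp), in seconds
            "session_duration": (hits[-1][0] - hits[0][0]).total_seconds(),
            "entry_page": hits[0][1],
            "exit_page": hits[-1][1],
        }
    return sessions


def bounce_rate(sessions):
    """Share of sessions that contain exactly one pageview."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions.values() if s["pages_per_session"] == 1)
    return bounces / len(sessions)


# Hypothetical sample data: one two-page session and one bounce.
example = [
    ("s1", "/wiki/a", datetime(2024, 1, 1, 12, 0, 0)),
    ("s1", "/analysis/b", datetime(2024, 1, 1, 12, 1, 30)),
    ("s2", "/wiki/a", datetime(2024, 1, 1, 13, 0, 0)),
]
sessions = summarize_sessions(example)
```

Note that a single-pageview session has a duration of zero under this definition, so average session duration is best reported for non-bounce sessions separately.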
Content engagement (aggregated)
| Metric | Purpose |
|---|---|
| Views by page type | hypothesis / analysis / wiki / debate / notebook / gap / dashboard |
| Views by domain | Which research domains attract visitors |
| Search queries | What are people looking for? (from /api/search logs) |
| Time-on-page | Proxy for content quality |
| Return visitors | How many come back? |
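Views by page type need a rule that maps a path to one of the listed types. The sketch below assumes, purely for illustration, that the first URL segment names the page type (for example /wiki/...); the actual SciDEX routing may differ, and anything unrecognized is counted as "other".

```python
from collections import Counter

# Page types listed in the table above.
PAGE_TYPES = ("hypothesis", "analysis", "wiki", "debate",
              "notebook", "gap", "dashboard")


def page_type(path: str) -> str:
    """Classify a path by its first segment (assumed URL layout)."""
    first = path.strip("/").split("/", 1)[0].lower()
    return first if first in PAGE_TYPES else "other"


def views_by_page_type(paths):
    """Count pageviews per page type."""
    return Counter(page_type(p) for p in paths)
```

If the site uses plural or nested routes, only the lookup in page_type would need to change; the aggregation stays the same.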
Improvement signals
These stats directly inform what to build next:
- Low-traffic high-value pages → need better discovery/linking
- High-bounce pages → content or UX problem
- Popular search terms with no results → content gap
- Pages with long dwell time → successful content to replicate
- Referrer patterns → where to promote
- Bot vs human ratio → SEO health
- Error pages → broken links to fix
- Mobile vs desktop → responsive design priority
Architecture
Middleware (lightweight)
FastAPI middleware logs every request to the site_pageviews table. Minimal
overhead (~1ms): just an INSERT, no blocking. Excludes static assets
(/static/, .css, .js, .ico) and health-check endpoints.
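One way to realize this is a plain ASGI middleware, which FastAPI can mount without extra dependencies. The sketch below is an illustration under assumptions: the class name, the record callback, and the "/health" prefix are hypothetical, and session_id derivation is left out. The callback is expected to be cheap (for example, pushing onto a queue that a background writer drains) so the request path never waits on the database.

```python
import time
from typing import Callable

SKIP_PREFIXES = ("/static/", "/health")  # "/health" is an assumed health-check path
SKIP_SUFFIXES = (".css", ".js", ".ico")


def should_track(path: str) -> bool:
    """Return False for static assets and health checks."""
    return not (path.startswith(SKIP_PREFIXES) or path.endswith(SKIP_SUFFIXES))


class PageviewMiddleware:
    """Plain ASGI middleware that hands one event dict per request to `record`."""

    def __init__(self, app, record: Callable[[dict], None]):
        self.app = app
        self.record = record

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http" or not should_track(scope["path"]):
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status_code = 500  # reported if the app fails before responding

        async def send_wrapper(message):
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # ASGI delivers header names as lowercase bytes.
            headers = {k.decode("latin-1"): v.decode("latin-1")
                       for k, v in scope.get("headers", [])}
            self.record({
                "path": scope["path"],
                "referrer": headers.get("referer", ""),
                "user_agent": headers.get("user-agent", ""),
                "timestamp": time.time(),
                "response_time_ms": (time.perf_counter() - start) * 1000,
                "status_code": status_code,
            })
```

With FastAPI this could be registered as app.add_middleware(PageviewMiddleware, record=some_callback), where some_callback is whatever enqueues or inserts the row.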
Tables
site_pageviews — raw event log (path, referrer, user_agent, session_id, ...)
site_sessions — materialized session aggregates (computed periodically)
site_daily_stats — daily rollups for trend charts
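As a rough illustration of how the three tables could fit together, here is a sketch using SQLite from the Python standard library. The database engine is an assumption. The site_pageviews columns follow the per-pageview field list, while the columns of site_sessions and site_daily_stats are hypothetical placeholders. The rollup recomputes one day at a time and is safe to re-run.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS site_pageviews (
    id               INTEGER PRIMARY KEY,
    path             TEXT NOT NULL,
    referrer         TEXT,
    user_agent       TEXT,
    session_id       TEXT NOT NULL,
    timestamp        TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS', UTC
    response_time_ms REAL,
    status_code      INTEGER
);
CREATE INDEX IF NOT EXISTS idx_pageviews_session ON site_pageviews(session_id);
CREATE INDEX IF NOT EXISTS idx_pageviews_timestamp ON site_pageviews(timestamp);

-- Assumed columns: the spec only says "materialized session aggregates".
CREATE TABLE IF NOT EXISTS site_sessions (
    session_id TEXT PRIMARY KEY,
    started_at TEXT NOT NULL,
    ended_at   TEXT NOT NULL,
    pageviews  INTEGER NOT NULL,
    entry_page TEXT,
    exit_page  TEXT
);

-- Assumed columns: the spec only says "daily rollups for trend charts".
CREATE TABLE IF NOT EXISTS site_daily_stats (
    day       TEXT PRIMARY KEY,       -- 'YYYY-MM-DD'
    pageviews INTEGER NOT NULL,
    sessions  INTEGER NOT NULL
);
"""

ROLLUP = """
INSERT OR REPLACE INTO site_daily_stats (day, pageviews, sessions)
SELECT date(timestamp), COUNT(*), COUNT(DISTINCT session_id)
FROM site_pageviews
WHERE date(timestamp) = ?
GROUP BY date(timestamp)
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create the tables and indexes if they do not exist yet."""
    conn.executescript(SCHEMA)


def rollup_day(conn: sqlite3.Connection, day: str) -> None:
    """Recompute the site_daily_stats row for one 'YYYY-MM-DD' day."""
    conn.execute(ROLLUP, (day,))
    conn.commit()
```

Keeping the rollup idempotent means a scheduled task can simply re-run today and yesterday on every tick without double counting.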
Dashboard page: /senate/analytics
- Today's stats: pageviews, unique sessions, top pages
- 7-day / 30-day trend charts (pageviews, sessions, bounce rate)
- Content breakdown: pie chart by page type
- Referrer breakdown
- Top pages table with views, avg time, bounce rate
- Search query cloud / table
- Geographic distribution (if IP → country mapping added later)
Privacy
- No PII stored — session_id is a hash of IP + user-agent (not reversible)
- No cookies required (server-side session inference)
- Compliant with minimal-tracking principles
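The session_id derivation described above might look like the following sketch. The function name, the truncation to 32 hex characters, and the salt parameter are assumptions. The salt is worth considering because an unsalted hash over the small IPv4 address space can be brute-forced, which would weaken the "not reversible" property.

```python
import hashlib


def make_session_id(ip: str, user_agent: str, salt: str = "") -> str:
    """Derive a stable session identifier from IP + user-agent.

    `salt` is a hypothetical server-side secret; only the digest is stored,
    never the raw IP.
    """
    material = f"{salt}|{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:32]
```

A fixed salt keeps identifiers stable across days, which the return-visitor metric relies on; rotating it would improve privacy at the cost of that metric.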
Acceptance criteria
☐ Middleware records pageviews with <2ms overhead
☐ /senate/analytics shows today + 7d + 30d stats
☐ Page-type breakdown chart
☐ Top-10 pages table
☐ Referrer breakdown
☐ Bot filtering (exclude known crawlers from human stats)
☐ Daily rollup task keeps site_daily_stats fresh
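For the bot-filtering criterion, a simple starting point is substring matching on the user-agent. The sketch below is illustrative only: the marker list is a small hypothetical sample rather than a complete crawler list, and treating an empty user-agent as a bot is an assumption.

```python
# Hypothetical, deliberately short marker list; a real deployment would
# likely maintain a longer one.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp",
               "curl", "wget", "python-requests")


def is_bot(user_agent: str) -> bool:
    """Heuristic: empty user-agent or any known marker means bot."""
    ua = (user_agent or "").lower()
    return ua == "" or any(marker in ua for marker in BOT_MARKERS)


def bot_ratio(user_agents) -> float:
    """Fraction of requests classified as bots (0.0 for no requests)."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_bot(ua) for ua in agents) / len(agents)
```

Human-only stats can then be computed by filtering pageviews with is_bot before aggregation, while bot_ratio feeds the bot vs human signal.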