Quest: Site Analytics — Visitor Engagement & Usage Intelligence

ID: q-site-analytics
Layer: Senate
Priority: 78
Status: active

Goal

Track how visitors use SciDEX — which pages they visit, how long they stay,
what content types attract engagement, and where they come from. Use these
signals to improve the site, prioritize content generation, and measure the
platform's reach.

What to track

Per-pageview (core)

Field              Why it matters
path               Which pages are visited
referrer           Where visitors come from (search, direct, social, other sites)
user_agent         Browser/bot classification
session_id         Group pageviews into sessions (cookie or IP+UA hash)
timestamp          Time-series analysis
response_time_ms   Performance monitoring
status_code        Error rate tracking
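The per-pageview fields map naturally onto a small record type. Below is a minimal sketch as a Python dataclass, with one field per row of the table and a helper for error-rate tracking. Two details are assumptions, not part of the spec: timestamp is stored as Unix epoch seconds, and 4xx/5xx responses count as errors.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Pageview:
    """One row of the per-pageview log; field names follow the table above."""
    path: str
    referrer: str
    user_agent: str
    session_id: str
    timestamp: float          # assumption: Unix epoch seconds
    response_time_ms: float
    status_code: int

    @property
    def is_error(self) -> bool:
        # Assumption: any 4xx or 5xx response counts toward the error rate.
        return self.status_code >= 400


def error_rate(views: Iterable[Pageview]) -> float:
    """Fraction of pageviews that returned an error status (0.0 if there are none)."""
    total = errors = 0
    for view in views:
        total += 1
        errors += view.is_error   # bool adds as 0 or 1
    return errors / total if total else 0.0
```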

Per-session (derived)

Metric              How to compute
pages_per_session   COUNT(pageviews) grouped by session_id
session_duration    MAX(timestamp) - MIN(timestamp) per session
bounce_rate         Sessions with exactly 1 pageview / total sessions
entry_page          First page of the session
exit_page           Last page of the session
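Once pageviews are grouped, all of the derived metrics can be computed in one pass. This minimal sketch assumes rows arrive as (visitor_hash, timestamp, path) tuples. An IP + user-agent hash never expires on its own, so the sketch also assumes that a 30-minute inactivity gap starts a new session, which is a common web-analytics convention. `Session`, `build_sessions` and `SESSION_GAP_S` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple

# Assumption: a visitor hash never expires by itself, so a 30-minute
# inactivity gap is used to split its pageviews into sessions.
SESSION_GAP_S = 30 * 60


@dataclass
class Session:
    visitor: str
    paths: List[str]
    start: float
    end: float

    @property
    def pages_per_session(self) -> int:
        return len(self.paths)

    @property
    def duration(self) -> float:
        # MAX(timestamp) - MIN(timestamp); 0 for single-page sessions.
        return self.end - self.start

    @property
    def entry_page(self) -> str:
        return self.paths[0]

    @property
    def exit_page(self) -> str:
        return self.paths[-1]


def build_sessions(views: List[Tuple[str, float, str]],
                   gap: float = SESSION_GAP_S) -> List[Session]:
    """Group (visitor_hash, timestamp, path) rows, in any order, into sessions."""
    sessions: List[Session] = []
    ordered = sorted(views, key=lambda v: (v[0], v[1]))
    for visitor, rows in groupby(ordered, key=lambda v: v[0]):
        current: Optional[Session] = None
        for _, ts, path in rows:
            if current is None or ts - current.end > gap:
                # First view of this visitor, or the gap was too long: new session.
                current = Session(visitor, [path], ts, ts)
                sessions.append(current)
            else:
                current.paths.append(path)
                current.end = ts
    return sessions


def bounce_rate(sessions: List[Session]) -> float:
    """Sessions with exactly one pageview divided by all sessions."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s.pages_per_session == 1) / len(sessions)
```

If session_id already delimits sessions (for example, a cookie that expires), passing `gap=float("inf")` reduces this to a plain grouping by session_id.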

Content engagement (aggregated)

Metric               Purpose
Views by page type   Which content types draw traffic (hypothesis / analysis / wiki / debate / notebook / gap / dashboard)
Views by domain      Which research domains attract visitors
Search queries       What are people looking for? (from /api/search logs)
Time-on-page         Proxy for content quality
Return visitors      How many visitors come back?
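Two of these aggregates depend on rules the table leaves implicit:

  • Page type has to be derived from the URL.
  • Time-on-page can only be estimated from server logs, as the gap until the visitor's next pageview in the same session. The last page of every session therefore has no measurable time.

The sketch below assumes URL prefixes such as /wiki/ and /debate/; the real SciDEX routes may differ. `page_type`, `views_by_type` and `time_on_page` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical URL prefixes for each page type; actual routes may differ.
PAGE_TYPE_PREFIXES = {
    "/hypothesis": "hypothesis",
    "/analysis": "analysis",
    "/wiki": "wiki",
    "/debate": "debate",
    "/notebook": "notebook",
    "/gap": "gap",
    "/dashboard": "dashboard",
}


def page_type(path: str) -> str:
    """Map a request path to a content type, or 'other' if no prefix matches."""
    for prefix, kind in PAGE_TYPE_PREFIXES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return kind
    return "other"


def views_by_type(paths: Iterable[str]) -> Counter:
    return Counter(page_type(p) for p in paths)


def time_on_page(session: List[Tuple[float, str]]) -> List[Tuple[str, Optional[float]]]:
    """Estimate seconds spent on each page of one session's (timestamp, path) pairs.

    A server only sees requests, so time on a page is the gap until the next
    pageview; the session's last page has no next view and gets None.
    """
    ordered = sorted(session)
    result: List[Tuple[str, Optional[float]]] = []
    for i, (ts, path) in enumerate(ordered):
        gap = ordered[i + 1][0] - ts if i + 1 < len(ordered) else None
        result.append((path, gap))
    return result
```

This limitation also explains why bounced sessions always have a session_duration of 0.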

Improvement signals

These stats directly inform what to build next:
  • Low-traffic high-value pages → need better discovery/linking
  • High-bounce pages → content or UX problem
  • Popular search terms with no results → content gap
  • Pages with long dwell time → successful content to replicate
  • Referrer patterns → where to promote
  • Bot vs human ratio → SEO health
  • Error pages → broken links to fix
  • Mobile vs desktop → responsive design priority
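Two of these signals, bot vs human ratio and mobile vs desktop, depend on classifying the user_agent string. A rough sketch using substring patterns follows. The patterns, the choice to count an empty User-Agent as a bot, and the function names are all assumptions. Matching "Mobi" for mobile follows common browser-detection guidance, but a production crawler filter would more likely rely on a maintained bot list.

```python
import re
from typing import Iterable

# Assumed, deliberately small patterns; a real deployment would use a
# maintained crawler list instead of a handful of substrings.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests|headless", re.IGNORECASE
)
MOBILE_PATTERN = re.compile(r"Mobi|Android|iPhone|iPad", re.IGNORECASE)


def classify_user_agent(ua: str) -> str:
    """Return 'bot', 'mobile', or 'desktop' for a User-Agent string."""
    if not ua or BOT_PATTERN.search(ua):
        return "bot"   # assumption: a missing UA is treated as automated traffic
    if MOBILE_PATTERN.search(ua):
        return "mobile"
    return "desktop"


def human_share(user_agents: Iterable[str]) -> float:
    """Fraction of requests classified as human (mobile or desktop)."""
    kinds = [classify_user_agent(ua) for ua in user_agents]
    if not kinds:
        return 0.0
    return (len(kinds) - kinds.count("bot")) / len(kinds)
```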

Architecture

Middleware (lightweight)

FastAPI middleware logs every request to the site_pageviews table. Overhead is
minimal (~1ms): a single INSERT per request and nothing that blocks the response.
Static assets (/static/, .css, .js, .ico) and health-check endpoints are excluded.
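One way to keep the middleware lightweight is to write it as plain ASGI middleware. It only measures the request and hands a dict to a non-blocking sink, leaving the actual INSERT to a background writer. The sketch below assumes that design. `PageviewMiddleware`, `should_log`, the /health and /healthz routes, and the `sink` callable are hypothetical. session_id derivation (see Privacy) is left out.

```python
import time
from typing import Any, Awaitable, Callable, Dict, MutableMapping

Scope = MutableMapping[str, Any]
Message = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]

EXCLUDED_PREFIXES = ("/static/",)
EXCLUDED_SUFFIXES = (".css", ".js", ".ico")
HEALTH_PATHS = {"/health", "/healthz"}  # hypothetical health-check routes


def should_log(path: str) -> bool:
    """Skip static assets and health checks."""
    return not (
        path in HEALTH_PATHS
        or path.startswith(EXCLUDED_PREFIXES)
        or path.endswith(EXCLUDED_SUFFIXES)
    )


class PageviewMiddleware:
    """Pure-ASGI middleware that passes one dict per logged request to `sink`."""

    def __init__(self, app: ASGIApp, sink: Callable[[Dict[str, Any]], None]) -> None:
        self.app = app
        self.sink = sink  # should return immediately, e.g. queue.Queue.put_nowait

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http" or not should_log(scope["path"]):
            await self.app(scope, receive, send)
            return

        status = {"code": 500}  # stays 500 if the app raises before responding

        async def send_wrapper(message: Message) -> None:
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # ASGI delivers header names as lowercase bytes; "referer" is HTTP's spelling.
            headers = {k.decode("latin-1"): v.decode("latin-1")
                       for k, v in scope.get("headers", [])}
            self.sink({
                "path": scope["path"],
                "referrer": headers.get("referer", ""),
                "user_agent": headers.get("user-agent", ""),
                "status_code": status["code"],
                "response_time_ms": (time.perf_counter() - start) * 1000.0,
                "timestamp": time.time(),
            })
```

FastAPI accepts ASGI middleware classes, so registration would look like `app.add_middleware(PageviewMiddleware, sink=pageview_queue.put_nowait)`. Here `pageview_queue` is a hypothetical `queue.Queue` drained by a writer thread. Timing stops when the wrapped app returns, so for streamed responses it includes body transmission.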

Tables

  • site_pageviews — raw event log (path, referrer, user_agent, session_id, ...)
  • site_sessions — materialized session aggregates (computed periodically)
  • site_daily_stats — daily rollups for trend charts
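Below is one possible shape for the raw log and the daily rollup, sketched with Python's built-in sqlite3 module. The storage engine, column types, UTC epoch-second timestamps and the `rollup_daily_stats` helper are all assumptions, and site_sessions is omitted. The rollup recomputes whole days with INSERT OR REPLACE, which makes it idempotent. A periodic task can therefore safely rerun it over the last day or two.

```python
import sqlite3

# Hypothetical schema: columns follow the per-pageview table; timestamps are
# assumed to be Unix epoch seconds (UTC).
SCHEMA = """
CREATE TABLE IF NOT EXISTS site_pageviews (
    id               INTEGER PRIMARY KEY,
    path             TEXT NOT NULL,
    referrer         TEXT,
    user_agent       TEXT,
    session_id       TEXT NOT NULL,
    timestamp        REAL NOT NULL,
    response_time_ms REAL,
    status_code      INTEGER
);
CREATE INDEX IF NOT EXISTS idx_site_pageviews_ts ON site_pageviews (timestamp);

CREATE TABLE IF NOT EXISTS site_daily_stats (
    day              TEXT PRIMARY KEY,   -- 'YYYY-MM-DD', UTC
    pageviews        INTEGER NOT NULL,
    unique_sessions  INTEGER NOT NULL
);
"""

# Recompute whole days from first_day onward; INSERT OR REPLACE overwrites old rows.
ROLLUP_SQL = """
INSERT OR REPLACE INTO site_daily_stats (day, pageviews, unique_sessions)
SELECT date(timestamp, 'unixepoch') AS day,
       COUNT(*),
       COUNT(DISTINCT session_id)
FROM site_pageviews
WHERE date(timestamp, 'unixepoch') >= ?
GROUP BY day
"""


def rollup_daily_stats(conn: sqlite3.Connection, first_day: str) -> None:
    """Refresh site_daily_stats for every day >= first_day ('YYYY-MM-DD')."""
    with conn:  # commits on success, rolls back on error
        conn.execute(ROLLUP_SQL, (first_day,))
```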

Dashboard page: /senate/analytics

  • Today's stats: pageviews, unique sessions, top pages
  • 7-day / 30-day trend charts (pageviews, sessions, bounce rate)
  • Content breakdown: pie chart by page type
  • Referrer breakdown
  • Top pages table with views, avg time, bounce rate
  • Search query cloud / table
  • Geographic distribution (if IP → country mapping added later)
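The 7-day and 30-day charts need one point per calendar day, even when site_daily_stats has no row for that day. Otherwise quiet days silently disappear from the x-axis. Below is a small sketch of that zero-filling step. It assumes daily values arrive as a {"YYYY-MM-DD": count} dict; `trend` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def trend(daily: Dict[str, int], end: date, days: int) -> List[Tuple[str, int]]:
    """Return `days` consecutive (YYYY-MM-DD, value) points ending at `end`.

    Days with no entry in `daily` become 0 so the chart's x-axis stays evenly spaced.
    """
    start = end - timedelta(days=days - 1)
    points: List[Tuple[str, int]] = []
    for offset in range(days):
        day = (start + timedelta(days=offset)).isoformat()
        points.append((day, daily.get(day, 0)))
    return points
```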

Privacy

  • No PII stored: session_id is a one-way hash of IP + user-agent, and the raw IP address is never written to the database
  • No cookies required (server-side session inference)
  • Follows minimal-tracking principles
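A plain SHA-256 of IP + user-agent is one-way in principle, but it can be brute-forced. user_agent is stored alongside it, and there are only about 4.3 billion IPv4 addresses to try. Keying the hash with a server-side secret (HMAC) closes that gap. The sketch below assumes such a secret exists and is kept out of the database. Rotating the secret periodically further limits linkability, at the cost of not recognizing return visitors across rotations. `session_id` here is a hypothetical helper.

```python
import hashlib
import hmac


def session_id(ip: str, user_agent: str, secret: bytes) -> str:
    """Keyed, one-way identifier for an (IP, User-Agent) pair.

    Assumption: `secret` is a server-side key stored outside the analytics
    database; without it, the IPv4 space is small enough to brute-force.
    """
    message = f"{ip}\n{user_agent}".encode("utf-8")
    # HMAC-SHA256, truncated to 32 hex characters (128 bits) for compact storage.
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:32]
```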

Acceptance criteria

☐ Middleware records pageviews with <2ms overhead
☐ /senate/analytics shows today + 7d + 30d stats
☐ Page-type breakdown chart
☐ Top-10 pages table
☐ Referrer breakdown
☐ Bot filtering (exclude known crawlers from human stats)
☐ Daily rollup task keeps site_daily_stats fresh

File: q-site-analytics_spec.md
Modified: 2026-05-01 20:13
Size: 3.3 KB