System Design

Quality Telemetry Pipeline

This portfolio isn't just a UI; it's a small quality platform. It collects evidence from CI, turns it into structured telemetry, and displays it as a live-style dashboard.

GitHub Actions · Artifact ingestion (ZIP) · Schema-driven metrics · AWS S3 cloud mode · Graceful fallback · Vercel-friendly

Architecture (high level)

The key idea: treat CI as an event source and artifacts as a transport for evidence.

QA repos (pytest/playwright/allure)
  └─ GitHub Actions workflow
       ├─ runs tests
       ├─ writes qa-metrics.json (schema)
       ├─ uploads artifact: qa-metrics (zip)
       └─ uploads evidence artifacts (optional)

Optional Cloud Mode (AWS)
  └─ GitHub OIDC → assume IAM role (no long-lived keys)
       └─ write latest.json to S3 (cost-controlled retention)

qa-portfolio (Next.js on Vercel)
  ├─ /api/quality (server)
  │    ├─ fetches recent workflow runs
  │    ├─ finds newest run containing qa-metrics artifact
  │    ├─ downloads artifact zip
  │    ├─ extracts qa-metrics.json
  │    └─ returns merged snapshot + telemetry (snapshot/live/cloud)
  └─ /dashboard (client)
       └─ renders KPIs + links + debug/observability
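The run-scan step inside /api/quality can be sketched in TypeScript. This is a minimal sketch, not the actual implementation: `findMetricsRun`, the simplified `Run` shape, and the default scan depth of 5 are all illustrative assumptions.

```typescript
// Simplified shape of a workflow run (illustrative, not the GitHub API type).
interface Run {
  id: number;
  createdAt: string;   // ISO timestamp
  artifacts: string[]; // artifact names attached to this run
}

// Scan the most recent runs (newest first) for one carrying the
// qa-metrics artifact; cap the walk at `scanDepth` so a quiet repo
// never triggers an unbounded search through history.
function findMetricsRun(
  runs: Run[],
  scanDepth = 5,
  artifactName = "qa-metrics",
): { run: Run | null; scanned: number } {
  const newestFirst = [...runs].sort(
    (a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt),
  );
  const window = newestFirst.slice(0, scanDepth);
  const run = window.find((r) => r.artifacts.includes(artifactName)) ?? null;
  // `scanned` is the kind of value the debug panel surfaces as scan depth.
  return { run, scanned: window.length };
}
```

Returning the scan depth alongside the match is what lets the dashboard explain itself when no artifact is found.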

Failure Modes (and how the system responds)

This dashboard is intentionally built with production-style degradation. When upstream systems fail, the UI stays usable and the API returns a coherent payload.

GitHub API rate limits / outages
  • Impact: Live mode cannot fetch run metadata/artifacts.
  • Detection: debug panel + response notes; CI remains the source of truth.
  • Response: fall back to Snapshot mode (committed metrics.json).
Missing artifact / empty repo signal
  • Impact: Live scan may not find qa-metrics on the newest run.
  • Detection: debug fields show scan depth + matched run ID.
  • Response: scan back through recent runs, or degrade to Snapshot if needed.
AWS proxy down / token mismatch
  • Impact: Cloud mode cannot read metrics from AWS.
  • Detection: CloudWatch alarms (errors/p95) + access logs.
  • Response: fall back to Snapshot mode; Cloud mode still shows proof links.
S3 object missing / retention expired
  • Impact: AWS mode returns 404/NoSuchKey.
  • Detection: access logs + Lambda error alarm.
  • Response: fail closed (no secrets) + degrade to Snapshot mode.

Reliability / Fallback

Live data is best-effort. If GitHub rate-limits or a repo has no artifact on the latest run, the API scans recent runs and still returns a coherent response. If live fetch fails entirely, the dashboard degrades to the committed snapshot.
Pattern: progressive enrichment + deterministic baseline
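The progressive-enrichment pattern can be sketched as a wrapper around the live fetch. Assumptions are labeled: `loadQualitySnapshot`, the `Snapshot` shape, and the 0.97 baseline value are hypothetical, not taken from the real API.

```typescript
interface Snapshot {
  source: "live" | "snapshot";
  passRate: number;
  note?: string;
}

// Deterministic baseline: the committed metrics snapshot, always available.
// (The 0.97 value here is a placeholder.)
const committedSnapshot: Snapshot = { source: "snapshot", passRate: 0.97 };

// Try the live path; on any failure, return the committed snapshot with a
// note, so callers always receive a coherent payload instead of an error.
async function loadQualitySnapshot(
  fetchLive: () => Promise<Snapshot>,
): Promise<Snapshot> {
  try {
    return await fetchLive();
  } catch (err) {
    return {
      ...committedSnapshot,
      note: `live fetch failed: ${(err as Error).message}`,
    };
  }
}
```

The baseline is "deterministic" precisely because it is committed to the repo: the worst case is stale data, never a broken page.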

Data Contract

Metrics are schema-driven (see QUALITY_METRICS_SCHEMA.md). Workflows generate qa-metrics.json so the portfolio stays decoupled from individual frameworks.
Pattern: contract-first telemetry
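Contract-first telemetry implies validating artifact content before trusting it. A minimal sketch of that gate follows; the fields shown are an assumed slice of the contract, not the full schema in QUALITY_METRICS_SCHEMA.md.

```typescript
// An assumed minimal slice of the qa-metrics.json contract.
interface QaMetrics {
  total: number;
  passed: number;
  failed: number;
  generatedAt: string;
}

// Validate untrusted artifact content against the contract before the
// API merges it into a snapshot; anything non-conforming is rejected.
function parseQaMetrics(raw: unknown): QaMetrics | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const numeric = ["total", "passed", "failed"].every(
    (k) => typeof r[k] === "number" && Number.isFinite(r[k] as number),
  );
  if (!numeric) return null;
  if (typeof r.generatedAt !== "string") return null;
  if (Number.isNaN(Date.parse(r.generatedAt))) return null;
  return r as unknown as QaMetrics;
}
```

Because workflows emit this shape rather than framework-native output, swapping pytest for Playwright changes nothing downstream.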

Security / Evidence

The API exposes only safe metadata (run IDs/URLs, scan depth). Secrets never leave the server. Evidence artifacts (reports, JUnit XML) are linked, not embedded.
Pattern: least privilege + safe observability
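"Safe observability" here means allowlisting, not masking: only known-safe fields reach the client. A sketch of that idea, with an assumed field list (`runId`, `runUrl`, `scanDepth`, `mode`):

```typescript
// Allowlist of debug fields safe to expose to the client; anything else
// (tokens, raw headers, internal paths) is dropped rather than masked.
const SAFE_DEBUG_FIELDS = ["runId", "runUrl", "scanDepth", "mode"] as const;

function sanitizeDebug(
  debug: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of SAFE_DEBUG_FIELDS) {
    if (key in debug) out[key] = debug[key];
  }
  return out;
}
```

An allowlist fails safe: a new sensitive field added upstream is invisible by default, whereas a denylist would leak it.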

Performance

The API caches responses briefly to reduce GitHub API calls. Live mode is designed to be Vercel-friendly (serverless execution, short compute, explicit no-store).
Pattern: cache + rate-limit aware design
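The brief caching can be sketched as a tiny TTL memoizer. This is an illustrative pattern, not the project's actual cache layer; the TTL value is an assumption.

```typescript
// Tiny in-memory TTL cache: good enough within a single warm serverless
// instance, and cheap to lose on cold start (the fetch simply re-runs).
function cached<T>(
  ttlMs: number,
  fetcher: () => Promise<T>,
): () => Promise<T> {
  let value: T | undefined;
  let expiresAt = 0;
  return async () => {
    const now = Date.now();
    if (value !== undefined && now < expiresAt) return value;
    value = await fetcher();
    expiresAt = now + ttlMs;
    return value;
  };
}
```

Even a short TTL collapses a burst of dashboard loads into one upstream GitHub API call, which is what keeps live mode under the rate limit.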

Threat model (abuse cases + mitigations)

This is intentionally a read-only telemetry surface. The system is designed so that even if someone abuses the public endpoints, the blast radius stays small.

  • GitHub API rate-limit abuse: requests are cached briefly, and the API falls back to the committed snapshot.
  • Token exposure: GitHub token stays server-only; the client never receives it. Responses include only safe metadata (run URLs/IDs) and sanitized debug fields.
  • Artifact / ZIP attacks: artifacts are treated as untrusted input. Extraction is scoped to expected filenames and the metrics payload must validate against the schema.
  • Data leakage: no secrets, logs, or raw environment are embedded in the dashboard. Evidence is linked, not embedded.
  • Denial-of-service: live mode is best-effort; failures degrade gracefully to static mode rather than cascading.
Patterns: least privilege + untrusted input handling + graceful degradation
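The scoped-extraction mitigation can be sketched as an entry-name gate applied before any ZIP entry is read. The expected-filename set is an assumption here; the real pipeline may accept more names.

```typescript
// Treat artifact ZIP entries as untrusted input: accept only the exact
// filenames the pipeline expects, and reject traversal-style names
// ("../", absolute paths, backslashes) outright.
const EXPECTED_ENTRIES = new Set(["qa-metrics.json"]);

function isSafeZipEntry(name: string): boolean {
  if (name.includes("..") || name.startsWith("/") || name.includes("\\")) {
    return false;
  }
  return EXPECTED_ENTRIES.has(name);
}
```

Combined with schema validation of the extracted payload, this keeps a malicious artifact from writing outside the work area or smuggling unexpected files into the pipeline.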

What this demonstrates

  • Cloud/platform automation: CI as an event source, artifacts as evidence, optional AWS S3 cloud ingestion.
  • Backend skills: API composition, schema contracts, caching, resilience to partial failure.
  • Security posture: least privilege patterns (server-only tokens, OIDC-ready cloud auth).
  • Product thinking: a dashboard that explains itself and links directly to proof (runs/reports).

Cloud deployment path (low cost)

Cloud mode is designed to be budget-friendly: S3-only storage (optional DynamoDB) with lifecycle retention. GitHub Actions can authenticate using OIDC (no long-lived AWS keys).