I Build and Operate Automation Systems
My focus is not just "writing tests" — it's building automation platforms that scale with engineering teams. That includes cloud infrastructure (IaC), CI/CD pipelines, telemetry/observability, performance budgets, security gates, and the QA automation layer that proves it works.
I prefer building systems that are easy to operate: clear ownership, clear runbooks, and metrics that make quality measurable.
Six Pillars
CI/CD Automation Gates
Multi-layer pipelines: lint → typecheck → unit → integration → E2E. Fast feedback via parallelization and smart retries (flake-aware). Every merge is gated.
- Parallel test execution
- Flake-aware retry logic
- Artifact publishing
- Automated rollback
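Flake-aware retry logic can be sketched as follows. This is a minimal illustration that assumes a test is modeled as a zero-argument callable returning True on pass. `run_with_flake_aware_retry` and `RetryResult` are hypothetical names and are not part of any CI tool. The key idea is that a test which passes only on a retry is reported as flaky instead of being counted as a clean pass.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryResult:
    passed: bool
    attempts: int
    flaky: bool  # True when the test failed at least once, then passed


def run_with_flake_aware_retry(test: Callable[[], bool], max_attempts: int = 3) -> RetryResult:
    """Run `test` until it passes or `max_attempts` is used up.

    A pass that needed a retry is flagged as flaky instead of being
    reported as a clean green, so it can feed flake-rate tracking.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        if test():
            return RetryResult(passed=True, attempts=attempt, flaky=attempt > 1)
    # Consistent failure: treat it as a real regression, not flakiness.
    return RetryResult(passed=False, attempts=max_attempts, flaky=False)


if __name__ == "__main__":
    outcomes = iter([False, True])  # simulated test: fails once, then passes
    print(run_with_flake_aware_retry(lambda: next(outcomes)))
```

The retry never hides a flaky result: the `flaky` flag is what lets a quarantine workflow see tests that only go green on a second attempt.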
Infrastructure as Code
Terraform modules with least-privilege IAM. GitHub OIDC federation — no long-lived keys. Cost-aware defaults and environment promotion gates.
- Terraform + HCL
- GitHub OIDC (no static keys)
- Multi-environment promotion
- Cost guardrails
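For reference, this is the general shape of an IAM role trust policy for GitHub OIDC federation, built as a Python dict so the structure is easy to inspect. In a Terraform module the same document would typically come from an `aws_iam_policy_document` data source or `jsonencode`. The function name, account ID, and repository below are placeholders, not values from this project. Scoping the `sub` claim to one repository and branch is what stops workflows in other repos from assuming the role.

```python
import json


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Trust policy letting GitHub Actions assume an IAM role via OIDC.

    Only workflows from `repo` running on `branch` match the `sub` condition,
    and the token audience must be sts.amazonaws.com.
    """
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                    "StringLike": {f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and repository, for illustration only.
    policy = github_oidc_trust_policy("123456789012", "example-org/example-repo")
    print(json.dumps(policy, indent=2))
```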
Test Observability
Pass-rate and flake-rate trend tracking across repos. Quarantine workflow for flaky tests. Telemetry-first mindset — if you can't measure it, you can't improve it.
- Flake rate tracking
- Test quarantine workflow
- Coverage trend analysis
- Quality telemetry dashboard
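One way to compute the pass-rate and flake-rate signals is sketched below. It rests on an assumed definition: a test is flaky on a commit when it both passed and failed on that same commit. The record format, the function names, and the 10% quarantine threshold are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    test: str
    commit: str
    passed: bool


def compute_test_stats(runs: list[RunRecord]) -> dict[str, dict[str, float]]:
    """Per-test pass rate (over runs) and flake rate (over commits).

    A test is flaky on a commit when it both passed and failed on that
    commit: same code, different outcome.
    """
    outcomes: dict[str, dict[str, set[bool]]] = defaultdict(lambda: defaultdict(set))
    passed_runs: dict[str, int] = defaultdict(int)
    total_runs: dict[str, int] = defaultdict(int)
    for run in runs:
        outcomes[run.test][run.commit].add(run.passed)
        passed_runs[run.test] += int(run.passed)
        total_runs[run.test] += 1
    stats: dict[str, dict[str, float]] = {}
    for test, by_commit in outcomes.items():
        flaky_commits = sum(1 for results in by_commit.values() if len(results) == 2)
        stats[test] = {
            "pass_rate": passed_runs[test] / total_runs[test],
            "flake_rate": flaky_commits / len(by_commit),
        }
    return stats


def quarantine_candidates(stats: dict[str, dict[str, float]], threshold: float = 0.10) -> list[str]:
    """Tests at or above the (hypothetical) flake-rate threshold."""
    return sorted(name for name, s in stats.items() if s["flake_rate"] >= threshold)
```

Tracking these two numbers per test over time gives the trend lines; the quarantine list is what a workflow would act on.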
Performance Budgets
Lighthouse CI budgets enforced in pipeline. P95/P99 thinking: define acceptable latency thresholds and enforce them before every deploy.
- Lighthouse CI integration
- P95/P99 latency budgets
- Bundle size monitoring
- Core Web Vitals tracking
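A P95/P99 budget check reduces to computing percentiles over latency samples and comparing each one against its limit. This sketch uses the nearest-rank percentile definition; monitoring backends may use interpolated percentiles and report slightly different values. The function names and budget numbers are illustrative assumptions.

```python
from __future__ import annotations

import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Smallest sample such that at least p% of all samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def check_latency_budgets(samples_ms: list[float], budgets_ms: dict[float, float]) -> dict[float, bool]:
    """Map each percentile to True if within budget; a deploy gate fails on any False."""
    return {p: percentile_nearest_rank(samples_ms, p) <= limit for p, limit in budgets_ms.items()}


if __name__ == "__main__":
    latencies = [120, 180, 210, 250, 300, 320, 410, 450, 480, 900]  # illustrative samples (ms)
    print(check_latency_budgets(latencies, {95: 500, 99: 1000}))
```

With only ten samples, P95 and P99 both land on the slowest request, which is why tail budgets need enough traffic (or load-test volume) to be meaningful.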
Security Automation
OWASP-style scanning, dependency hygiene, secrets detection. Secure-by-default pipelines that fail on critical findings — not optional manual reviews.
- OWASP dependency scanning
- Secrets detection (pre-commit)
- WAF + rate limiting
- Least-privilege IAM
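A pre-commit secrets check can be as small as a few regular expressions run over staged files. This sketch is illustrative only: the two rules (AWS access key IDs and PEM private-key headers) and the function names are assumptions. Dedicated scanners such as gitleaks or detect-secrets ship much larger, tuned rule sets.

```python
import re
import sys

# Illustrative rules only; real scanners use far larger rule sets.
RULES = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def main(paths: list) -> int:
    """Pre-commit style entry point: a non-zero exit blocks the commit."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for lineno, rule in scan_text(fh.read()):
                print(f"{path}:{lineno}: possible secret ({rule})")
                failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Wired into a pre-commit hook, the staged filenames arrive as arguments and the non-zero exit code is what fails the commit.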
Operations & Maintainability
Runbooks, checklists, clear ownership. Environment drift management. Design for maintainability — the next person should be able to operate it.
- Runbooks for every system
- Incident triage playbooks
- Environment drift detection
- Documented architecture decisions
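Environment drift detection comes down to diffing declared state against deployed state. For Terraform-managed resources, `terraform plan -detailed-exitcode` (exit code 2 when the plan contains changes) is a common drift signal; the sketch below shows only the underlying comparison, using hypothetical setting names and hypothetical function names.

```python
from __future__ import annotations

from typing import Any


def detect_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, list[str]]:
    """Diff declared settings against what is actually deployed."""
    return {
        "missing": sorted(k for k in desired if k not in actual),      # declared, not deployed
        "unexpected": sorted(k for k in actual if k not in desired),   # deployed, not declared
        "changed": sorted(k for k in desired if k in actual and desired[k] != actual[k]),
    }


def has_drift(report: dict[str, list[str]]) -> bool:
    return any(report.values())


if __name__ == "__main__":
    # Hypothetical settings for one environment.
    desired = {"lambda_memory_mb": 256, "api_throttle_rps": 50, "log_retention_days": 30}
    actual = {"lambda_memory_mb": 512, "api_throttle_rps": 50, "debug_mode": True}
    report = detect_drift(desired, actual)
    print(report, has_drift(report))
```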
Reference Pipeline
PR / Commit
└─► CI Pipeline (lint / typecheck / unit)
    └─► Integration tests (DB / services)
        └─► E2E + a11y + visual
            └─► Perf budgets (Lighthouse / load)
                └─► Security gates (deps / secrets / ZAP)
                    └─► Publish artifacts (reports / screenshots)
                        └─► Telemetry snapshot + dashboard
                            └─► Alerts / triage playbooks
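The gating behavior of the reference pipeline can be modeled as an ordered list of stages where the first failing gate stops the remaining gates. This is a simplified sketch with hypothetical names, not a real CI configuration. It also assumes that reporting stages (artifact publishing, telemetry) should still run after a failure, which is common practice so that failed runs stay debuggable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage passes
    always: bool = False     # reporting stages run even after a gate fails


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order; return (overall pass, names of stages that ran).

    Once a gate fails, the remaining gates are skipped, but stages marked
    `always` (e.g. publishing reports) still run.
    """
    passed = True
    ran: list[str] = []
    for stage in stages:
        if not passed and not stage.always:
            continue
        ran.append(stage.name)
        if not stage.run():
            passed = False
    return passed, ran


if __name__ == "__main__":
    ok, ran = run_pipeline([
        Stage("lint/typecheck/unit", lambda: True),
        Stage("integration", lambda: False),  # simulated failure
        Stage("e2e", lambda: True),           # skipped after the failure
        Stage("publish-artifacts", lambda: True, always=True),
    ])
    print(ok, ran)
```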
SLOs & Operational Targets
This portfolio is intentionally operated like a production system. These are the same signals senior cloud/platform teams look for: SLOs, SLIs, error budgets, and a repeatable incident drill loop.
- Dashboard Availability: 99.9% (monthly). Measured by synthetic HTTP checks + uptime monitoring.
- Telemetry Freshness: < 24h (rolling). Measured as time since the last metrics update.
- AWS Proxy Reliability: 99.9% (monthly). Measured from Lambda errors + API Gateway 4xx/5xx rates.
- P95 Response Time: < 500ms (rolling). Measured from API Gateway + Lambda duration percentiles.
Pattern: measure → alert → drill → postmortem → fix. High-comp cloud roles are hired to hit SLOs under cost and security constraints.
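To make the error budget concrete: a 99.9% availability target over an assumed 30-day window leaves 0.1% of 43,200 minutes, about 43.2 minutes, of allowed downtime. The helper names below are hypothetical, and the sketch assumes a time-based SLI; a request-based SLI such as the Lambda/API Gateway error rate would apply the same 0.1% to request counts instead.

```python
def error_budget_minutes(slo: float, window_days: float) -> float:
    """Allowed 'bad' minutes in the window for a time-based availability SLO."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction, e.g. 0.999")
    return window_days * 24 * 60 * (1 - slo)


def budget_remaining(slo: float, window_days: float, bad_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is blown)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999, 30), 1))
    print(round(budget_remaining(0.999, 30, bad_minutes=21.6), 2))
```

Alerting on how fast the remaining budget shrinks, rather than on single failures, is what keeps the measure → alert loop from paging on noise.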
Receipts, Not Buzzwords
Every security claim has evidence behind it. WAF configs, IAM policies, attack simulations, and threat models — designed for cloud/infrastructure reviewers and senior engineers.
WAF + Rate Limiting
CloudFront-scope Web ACL with rate-based rules. API Gateway stage throttling. Attack simulation script proves the controls work.
Evidence: waf-rate-limit.txt, attack simulation script, Terraform CloudFront+WAF module
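The idea behind a rate-based rule and its attack simulation can be sketched with a per-client sliding-window counter. This is not AWS WAF's internal algorithm, and the class name, limit, and window below are illustrative assumptions. The simulation sends a burst from one client (a documentation-range IP) and counts how many requests get through.

```python
from __future__ import annotations

from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within a trailing `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # forget requests that fell out of the window
        if len(hits) >= self.limit:
            return False  # over the limit: block this request
        hits.append(now)
        return True


def simulate_burst(limiter: SlidingWindowLimiter, client: str, n: int, interval_s: float) -> tuple[int, int]:
    """Send `n` requests `interval_s` apart from one client; return (allowed, blocked)."""
    allowed = blocked = 0
    for i in range(n):
        if limiter.allow(client, now=i * interval_s):
            allowed += 1
        else:
            blocked += 1
    return allowed, blocked


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=100, window_s=60)  # illustrative limit
    print(simulate_burst(limiter, "203.0.113.7", n=150, interval_s=0.1))
```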
IAM Least Privilege
Lambda has only s3:GetObject on a single key. DynamoDB operations are limited to a specific table and specific actions. No wildcard policies.
Evidence: IAM policy JSON, GitHub OIDC trust policy
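A "no wildcard policies" rule is easy to enforce mechanically. The sketch below flags `*` in the `Action` or `Resource` of any Allow statement. The example policy's bucket, key, table, account, and DynamoDB actions are placeholders, not this project's real policy, and the linter is a hypothetical first-pass check: some AWS actions legitimately require a `*` resource, so real findings may need an allowlist.

```python
# Illustrative least-privilege policy; all names and actions are placeholders.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/metrics/latest.json",
        },
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/example-table",
        },
    ],
}


def find_wildcards(policy: dict) -> list:
    """Flag Allow statements whose Action or Resource contains a '*' wildcard."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if "*" in value:
                    findings.append(f"Statement[{i}].{field}: {value}")
    return findings


if __name__ == "__main__":
    print(find_wildcards(EXAMPLE_POLICY) or "no wildcards found")
```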
Token Strategy
x-metrics-token shared secret for API auth. No long-lived AWS keys anywhere — GitHub OIDC federation for all CI/CD to AWS interactions.
Evidence: OIDC trust policy, token validation middleware
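Token validation middleware for the `x-metrics-token` header might look like the sketch below. It assumes a Lambda proxy-style event with a `headers` dict and reads the expected secret from a hypothetical `METRICS_TOKEN` environment variable; the real deployment may load it differently. The comparison uses `hmac.compare_digest` so response timing does not reveal how much of a guessed token was correct.

```python
from __future__ import annotations

import hmac
import os

TOKEN_HEADER = "x-metrics-token"


def get_header(headers: dict[str, str] | None, name: str) -> str | None:
    """Case-insensitive lookup; header casing in the Lambda event can vary."""
    for key, value in (headers or {}).items():
        if key.lower() == name:
            return value
    return None


def is_authorized(headers: dict[str, str] | None, expected_token: str) -> bool:
    supplied = get_header(headers, TOKEN_HEADER)
    if not supplied or not expected_token:
        return False  # fail closed when either side is missing
    # Constant-time comparison so response timing doesn't leak the token.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())


def handler(event: dict, context: object) -> dict:
    # METRICS_TOKEN is a hypothetical env var name, used here for illustration.
    if not is_authorized(event.get("headers"), os.environ.get("METRICS_TOKEN", "")):
        return {"statusCode": 401, "body": "unauthorized"}
    return {"statusCode": 200, "body": "ok"}
```

Failing closed when the configured secret is empty matters: a missing environment variable should lock the API, not open it.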
Threat Model
Documented abuse cases with mitigations: API scraping (token + rate limit), secrets exposure (server-only), untrusted artifact input (schema-validated), blast radius containment.
Pattern: least privilege + untrusted input handling + safe degradation
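The "untrusted artifact input (schema-validated)" and "safe degradation" mitigations can be combined in one parser: reject anything that does not match the expected shape, and return None rather than raising, so the caller can keep serving the last good data. The schema fields, size limit, and function name are hypothetical and only illustrate the pattern; a production setup might use a JSON Schema validator instead.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical artifact schema: field name -> expected Python type.
SCHEMA: dict[str, type] = {"repo": str, "passRate": float, "flakeRate": float, "runs": int}
MAX_CHARS = 64_000  # reject oversized payloads before parsing


def parse_artifact(raw: str) -> dict[str, Any] | None:
    """Validate untrusted artifact JSON.

    Returns the cleaned dict, or None so callers can degrade safely
    (e.g. keep showing the last good snapshot) instead of crashing.
    """
    if len(raw) > MAX_CHARS:
        return None
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError):  # JSONDecodeError subclasses ValueError
        return None
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        return None  # missing or unexpected fields
    clean: dict[str, Any] = {}
    for field, expected in SCHEMA.items():
        value = data[field]
        if isinstance(value, bool):
            return None  # bool is a subclass of int; never accept it as a number
        if expected is float and isinstance(value, int):
            value = float(value)  # JSON has one number type, so accept 1 as 1.0
        if not isinstance(value, expected):
            return None
        clean[field] = value
    in_range = 0.0 <= clean["passRate"] <= 1.0 and 0.0 <= clean["flakeRate"] <= 1.0
    if not in_range or clean["runs"] < 0:
        return None
    return clean
```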
Incident Drills
Each documented failure mode has been tested. These drills aren't theoretical: each one was executed and its response validated.
See It Running
Check the live dashboard or download the operational artifacts.