The purpose of this project is to provide a template with a comprehensive UI, using free tools and without sharing your (potentially private) code with third parties, except GitHub 🕵️
- Demo Reports (E2E artifacts viewer): https://holiber.github.io/kickstart/

Below is the table of features we plan to integrate. Each feature has a Tier number from 1 to 4:
- Tier 1: Immediate wins (low maintenance and broadly reusable across backend/API/web/CLI).
- Tier 2: Still common, but requires a bit more wiring and baseline setup.
- Tier 3: Significant engineering effort; worth it if you'll reuse it across many repos.
- Tier 4: Ambitious/experimental (higher upkeep, more brittle, or niche).

| Id | Category | Description | Tools | Tier |
|---|---|---|---|---|
| ci-lint-format-typecheck | Code Quality | Quality gates (lint/format/typecheck): fast checks that fail PRs early | ESLint, Prettier, TypeScript tsc, (optional) Biome, GitHub Actions | 🟢 Tier 1 |
| ci-test-run | Testing | Test execution in CI: run unit/integration tests reliably | Vitest/Jest, Node test runner, GitHub Actions | 🟢 Tier 1 |
| ci-test-reporter | Testing | Test reporting in PR checks: surface failures in GitHub UI | dorny/test-reporter, GH Actions Job Summary, JUnit reporters | 🟢 Tier 1 |
| ci-test-metrics | Metrics | Test count & duration tracking: #tests + total time + trend | Vitest/Jest JSON output, custom parser, GH Actions summary, artifact JSON | 🟢 Tier 1 |
| ci-coverage | Testing | Code coverage reporting: line/branch/function + PR delta | c8/Istanbul/nyc, lcov + report action, PR comment/check | 🟢 Tier 1 |
| ci-coverage-gate | Testing | Coverage gate (diff/threshold): enforce minimums | c8 + custom diff logic, lcov diff tooling | 🟡 Tier 2 |
| ci-artifacts-bundle | Observability | Unified CI artifacts bundle: pack logs/reports/screenshots | GitHub Actions artifacts, structured folders, zip step | 🟢 Tier 1 |
| ci-summary-rich | Observability | Rich CI summary for mobile review: single GitHub Actions summary with key links + inline highlights | GH Actions Job Summary (GITHUB_STEP_SUMMARY), Markdown generation script | 🟢 Tier 1 |
| artifact-index-html | Observability | Artifact index page: generate index.html that links screenshots/videos/traces/logs for one-tap review | Static HTML generator + upload as artifact (or publish to GH Pages) | 🟢 Tier 1 |
| ci-cache | CI | CI caching: speed up installs/builds/tests | actions/cache, pnpm/yarn/npm cache, build caches | 🟢 Tier 1 |
| ci-workflow-timings | CI | CI step timing observability: know what's slow in pipeline | GH Actions timings + custom summary, act locally | 🟡 Tier 2 |
| ci-nightly-full-suite | CI | Nightly full suite: heavy checks run on schedule | GitHub scheduled workflows | 🟡 Tier 2 |
| deps-inventory | Dependencies | Dependency inventory: direct/transitive counts + basic stats | pnpm list, npm ls, custom script, lockfile parsing | 🟡 Tier 2 |
| deps-hygiene | Dependencies | Dependency hygiene checks: unused deps, duplicates, policies | depcheck, pnpm dedupe, lockfile lint, custom allow/deny | 🟡 Tier 2 |
| deps-auto-update | Dependencies | Automated dependency update PRs | Dependabot (built-in) or Renovate (self-hosted) | 🟢 Tier 1 |
| deps-update-benchmark | Dependencies | Auto-update + benchmark validation: run metrics on update PR | Dependabot/Renovate + workflows that run full metric suite | 🟡 Tier 2 |
| deps-update-labeling | Dependencies | Regression/improvement labeling on update PRs | Custom PR comment + labels via GitHub API | 🟠 Tier 3 |
| bundle-size-tracking | Build/Bundle | Bundle size tracking (per entry/chunk) | size-limit, webpack-bundle-analyzer, rollup-plugin-visualizer, source-map-explorer | 🟢 Tier 1 |
| bundle-size-budget | Build/Bundle | Bundle budgets: fail when exceeding budget | size-limit + GH Action | 🟢 Tier 1 |
| bundle-diff | Build/Bundle | Bundle diff (PR vs main): what changed in size | size-limit PR comments, custom artifact comparison | 🟡 Tier 2 |
| treeshaking-audit | Build/Bundle | Tree-shaking effectiveness audit: detect non-shakeable imports | bundler analyzer, sideEffects audits, ESM/CJS checks | 🟠 Tier 3 |
| bundle-duplication | Build/Bundle | Duplicate code / dependency duplication detection | lockfile analysis, webpack stats, pnpm why, custom scripts | 🟠 Tier 3 |
| build-hotspots | Performance | Build pipeline hotspot profiling: which step/plugin is slow | webpack --profile, Vite debug logs, custom timers | 🟡 Tier 2 |
| build-cold-time | Performance | Cold build time measurement: clean build duration | timed GH steps, hyperfine, custom scripts | 🟢 Tier 1 |
| build-incremental-time | Performance | Incremental build measurement: rebuild after change | watch mode + scripted edits, hyperfine | 🟡 Tier 2 |
| dev-server-startup | DevEx | Dev server startup time: command → ready | custom timing hooks, Vite/webpack logs, wait-on | 🟡 Tier 2 |
| prod-startup | Performance | Production startup time: app start / server boot | node --perf-basic-prof, custom timing in entrypoint | 🟡 Tier 2 |
| hmr-latency | DevEx | HMR latency: change → browser updated | Playwright + file edit + measure, Vite HMR hooks | 🟠 Tier 3 |
| watch-rebuild-latency | DevEx | Watch rebuild latency: change → build finished | watch mode logs parsing, custom timers | 🟡 Tier 2 |
| event-loop-blocking | Performance | Event loop blocking detection: long tasks > X ms | perf_hooks, blocked-at, clinic doctor, custom tracing | 🟡 Tier 2 |
| cpu-profile-capture | Performance | CPU profiling on demand: flamegraphs for regressions | node --prof, 0x, clinic flame, pprof | 🟠 Tier 3 |
| memory-snapshots | Performance | Memory snapshots / leak hints | heap snapshots, clinic heapprofiler, --inspect | 🟠 Tier 3 |
| e2e-framework | E2E | E2E test framework: browser automation | Playwright (recommended), optional WebdriverIO | 🟢 Tier 1 |
| e2e-artifacts | E2E | E2E artifacts (trace/video/screenshots): always upload for quick review | Playwright trace viewer, videos/screenshots as artifacts | 🟢 Tier 1 |
| e2e-demo-flow-video | E2E | Recorded demo smoke flows: scripted "happy paths" that always produce video/screenshots (ideal for mobile review) | Playwright projects, deterministic test data/demo mode, artifacts | 🟢 Tier 1 |
| visual-regression | Visual | Visual regression testing: screenshot comparisons on critical flows | Playwright toHaveScreenshot, optional Storybook snapshots | 🟢 Tier 1 |
| golden-update-flow | Visual | Golden/baseline update workflow: accept new snapshots fast | Playwright update snapshots, PR with snapshot diffs, artifacts | 🟡 Tier 2 |
| visual-diff-viewer | Visual | Visual diff visualization: easy review of diffs | Playwright HTML report, custom GH Pages gallery | 🟠 Tier 3 |
| browser-console-logs | Observability | Browser console log capture: console errors/warns saved | Playwright listeners + artifact logs | 🟢 Tier 1 |
| network-capture | Observability | Network capture (HAR/requests): record requests for debugging | Playwright HAR, tracing, custom network logs | 🟠 Tier 3 |
| failed-page-snapshot | E2E | Failure snapshot pack: screenshot + DOM snapshot + trace on fail | Playwright screenshot + DOM dump + trace | 🟢 Tier 1 |
| deterministic-replay | E2E | Deterministic replay with mocked API: re-run UI actions w/ same API responses | Playwright route mocking, HAR replay, MSW, local stubs | 🟠 Tier 3 |
| human-like-e2e | E2E | Human-like interaction simulation: delays, smooth mouse, type-by-type (brittle) | Playwright scripted "humanizer" layer | 🔴 Tier 4 |
| ui-smoothness-telemetry | Performance | UI smoothness telemetry during E2E: long tasks/FPS-ish signals (advanced) | PerformanceObserver, tracing, Chrome DevTools Protocol | 🔴 Tier 4 |
| tui-testing | CLI/TUI | TUI golden testing (video/snapshots) | charmbracelet/vhs, asciinema, snapshot text diffs | 🟠 Tier 3 |
| tui-replay | CLI/TUI | TUI interaction replay: scripted inputs + deterministic output (hard) | expect, pty harness, VHS tapes | 🔴 Tier 4 |
| logs-into-artifacts | Observability | Console/test log collection: standardize logs to artifacts | GH Actions artifacts, structured logs, log scrubbing | 🟢 Tier 1 |
| metrics-history | Metrics | Metrics history (time series): store results per commit | github-action-benchmark, JSON in gh-pages or repo branch | 🟡 Tier 2 |
| pr-baseline-compare | Metrics | PR vs baseline comparison: show deltas in PR | custom scripts, GH Checks / PR comments | 🟡 Tier 2 |
| metrics-dashboard-pages | Metrics | Metrics dashboard on GitHub Pages | static site generator + charts, gh-pages branch | 🟡 Tier 2 |
| readme-badges | Metrics | README badges for key metrics | generate SVG badges in repo/pages (no external SaaS) | 🟡 Tier 2 |
| changelog-automation | Releases | Changelog automation: conventional commits → changelog | release-please, Changesets | 🟢 Tier 1 |
| release-orchestration | Releases | Release automation: tags, GitHub Releases, publish packages | release-please / Changesets + GH Actions | 🟡 Tier 2 |
| versioning-strategy | Monorepo | Versioning strategy for monorepo/workspace | Changesets, semantic-release (self-contained), pnpm workspaces | 🟡 Tier 2 |
| monorepo-task-runner | Monorepo | Monorepo task orchestration: affected-only builds/tests | Turborepo / Nx (optional), pnpm workspaces | 🟠 Tier 3 |
| test-selection | CI | Test selection (affected-only): run only impacted tests | Nx/Turbo affected, custom git diff mapping | 🟠 Tier 3 |
| preview-envs | Delivery | PR preview deployments: ephemeral env per PR (often external) | GitHub Pages (static) / external hosting (optional) | 🔴 Tier 4 |
| e2e-against-preview | Delivery | E2E against preview URL | Playwright against deployed preview | 🔴 Tier 4 |
| docs-site | Docs | Docs site generation/publish | Docusaurus/Typedoc + GH Pages | 🟠 Tier 3 |
| adr-template | Docs | Architecture Decision Records (ADR) | Markdown template + index generator | 🟡 Tier 2 |
| storybook | Docs | Component workshop (Storybook) | Storybook + build/publish + optional visual tests | 🟠 Tier 3 |
| secrets-scan | Security | Secret scanning in CI | Gitleaks / TruffleHog (run in GH Actions) | 🟡 Tier 2 |
| vuln-scan | Security | Dependency vulnerability scan (local-only) | npm audit/pnpm audit, OSV scanner | 🟠 Tier 3 |
| license-compliance | Security | License compliance checks | license-checker / pnpm licenses + allow/deny list | 🟠 Tier 3 |
| sbom | Security | SBOM generation | CycloneDX/SPDX generators | 🔴 Tier 4 |
| provenance-attest | Security | Build provenance / attestations | GitHub attestations / SLSA-style (advanced) | 🔴 Tier 4 |
| ai-agent-interface | AI | Pluggable AI agent integration: provider-agnostic interface | custom abstraction + model adapters; run in GH Actions | 🔴 Tier 4 |
| ai-sandbox-policy | AI | AI sandbox/policy controls: limit permissions/cost/scope | GH token scopes, job permissions, budget enforcement | 🔴 Tier 4 |
| multi-branch-benchmark | AI | Multi-branch evaluation harness: compare competing solutions | consistent workflows + metrics + baseline branch | 🔴 Tier 4 |
| token-cost-accounting | AI | Token/cost accounting per PR | provider usage logs, GH artifacts, PR summary | 🔴 Tier 4 |
| scorecard | Metrics | Scorecard summary: single report: quality + perf + cost | custom generator to Markdown/HTML | 🟠 Tier 3 |
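
To make the `ci-test-metrics` and `ci-summary-rich` rows more concrete, here is a minimal sketch of a Markdown generation step. The `TestMetrics` shape and the `renderSummary` name are hypothetical and not part of this template. A real script would presumably fill the metrics from the test runner's JSON output and append the returned string to the file that `GITHUB_STEP_SUMMARY` points to.

```typescript
// Hypothetical metrics shape; a real parser would fill it from Vitest/Jest JSON output.
interface TestMetrics {
  total: number;
  failed: number;
  durationMs: number;
}

// Render a small Markdown table suitable for a GitHub Actions Job Summary.
function renderSummary(m: TestMetrics): string {
  const passed = m.total - m.failed;
  const seconds = (m.durationMs / 1000).toFixed(1);
  return [
    "| Tests | Passed | Failed | Duration |",
    "|---|---|---|---|",
    `| ${m.total} | ${passed} | ${m.failed} | ${seconds}s |`,
  ].join("\n");
}
```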

Lightweight but realistic client/server TypeScript monorepo for future CI + benchmarking experiments.

What's inside

- `server`: TypeScript HTTP server with Replicache pull/push endpoints, deterministic demo seed, and 2 demo bots.
- `ui/web`: Vite + React + TypeScript + TailwindCSS + shadcn/ui-style UI with 3 behaviors:
  - No sync (purely local)
  - Full sync (push + pull)
  - Pull-only (local edits don't push; still pulls server bot changes)

Commands

From repo root:

- `npm install`
- `npm run dev`: run server + web
- `DEMO_FREEZE_BOTS=1 npm run dev`: freeze server bots (useful for e2e)
- `npm run build`
- `npm run test`
- `npm run lint`
- `npm run format`

Ports

- Server: `http://localhost:8787`
- Web: `http://localhost:5173` (proxies `/api` to the server)

We are going to create a demo project to have a real subject for our CI.
- Project Goal
  - Build a realistic but lightweight client-server TypeScript project to run future CI/benchmark experiments (install/build/test/e2e).
  - Support 3 data behaviors (no sync / full sync / pull only) inside one app.
  - Provide a deterministic demo mode with predictable initial data (for e2e and local demos).
- Repository Structure
  - Root repository contains:
    - `/server`: backend (TypeScript).
    - `/ui/web`: frontend (React + TypeScript).
    - (reserved) `/ui/tui`: later, not now, but do not block future addition.
  - Shared root files:
    - `.editorconfig`, `.gitignore`, `README.md`
    - Shared lint/format setup (e.g. ESLint + Prettier) reusable by both apps.
  - Package manager/runtime should be chosen to keep CI experiments easy (no unnecessary complexity).
- Tech Stack

  Frontend (`ui/web`)
  - Vite + React + TypeScript
  - TailwindCSS
  - shadcn/ui
  - Themes: light/dark + default "system"
  - Mobile-first + responsive sidebar behavior

  Backend (`server`)
  - TypeScript HTTP server (any minimal framework is fine; prioritize simplicity/predictability)
  - API endpoints required by Replicache (pull/push; auth/tenant can be simplified)

  Sync
  - Replicache as the primary client-server sync mechanism.
- Domain Model
  - Project
    - `id`: string
    - `name`: string
  - Todo
    - `id`: string
    - `projectId`: string
    - `text`: string
    - `completed`: boolean
    - `createdAt`: number (timestamp)
    - `updatedAt`: number (timestamp)
    - Optional: `author`: `"client"` | `"bot"` (useful for debugging/demo)
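
The domain model could be written down as TypeScript types roughly as follows. The field names come from the list above. The type names and the `makeTodo` helper are illustrative assumptions, not something the spec prescribes.

```typescript
// Field names follow the Domain Model list; type names are illustrative.
type TodoAuthor = "client" | "bot";

interface Project {
  id: string;
  name: string;
}

interface Todo {
  id: string;
  projectId: string;
  text: string;
  completed: boolean;
  createdAt: number; // timestamp
  updatedAt: number; // timestamp
  author?: TodoAuthor; // optional, useful for debugging/demo
}

// Hypothetical helper: a new todo starts uncompleted, with equal timestamps.
function makeTodo(
  id: string,
  projectId: string,
  text: string,
  now: number,
  author?: TodoAuthor,
): Todo {
  const todo: Todo = { id, projectId, text, completed: false, createdAt: now, updatedAt: now };
  if (author !== undefined) {
    todo.author = author;
  }
  return todo;
}
```

Passing `now` explicitly, rather than reading the clock inside the helper, keeps seeded demo data reproducible.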
- UI/UX Requirements

  Layout
  - Left sidebar + main content area.
  - Mobile-first behavior:
    - Desktop: sidebar open by default.
    - Mobile: sidebar collapsed by default and opens as a sliding drawer.
  - Sidebar content (top → bottom):
    - Theme toggle: System / Light / Dark (default System)
    - Menu: list of projects
    - Button: "Create project"
  - Main content:
    - Current project title
    - Todo list (create/delete/edit/toggle completed)
    - States: loading / empty / error
  - "Professional, pleasant UI":
    - Clean spacing/typography, proper hover/focus states
    - Smooth animations (sidebar open/close, todo add/remove, theme switching)
    - Prefer shadcn/ui components (Button, Input, Dialog/Drawer, Dropdown, Toast, etc.)
- Modes (3 Projects in Demo Mode)

  In demo mode the app starts with 3 projects and prefilled data.

  Project 1: "TodoList - no sync"
  - Client loads a prefilled local Todo list on startup (frontend seed).
  - User can:
    - add/delete/edit todos
    - toggle completed
  - No server synchronization.
  - State can live in client store; optional persistence (IndexedDB/localStorage) but not required.

  Project 2: "TodoList - full-sync"
  - Full client-server sync via Replicache:
    - client changes go to server (push)
    - client receives server changes (pull)
  - A server-side bot runs:
    - every 5 seconds, performs one action on "its" todos:
      - create a todo, or
      - delete its todo, or
      - toggle completed on a random todo
  - Bot changes must appear on the client (via pull).

  Project 3: "Todo - pull only"
  - Client can create/edit todos locally, but does not push changes to the server (push disabled/ignored).
  - Client still pulls updates from the server.
  - A server-side bot runs:
    - every 2 seconds, performs an action on the todos it can access:
      - create / delete / edit text / toggle completed
  - Client must see server changes, while local user changes remain local and do not affect the server.
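
One possible shape for a single bot step in the full-sync project is sketched below. `BotTodo`, `BotAction`, and `botTick` are hypothetical names, and the random source is injected as an assumption so that tests can make the step deterministic.

```typescript
interface BotTodo {
  id: string;
  text: string;
  completed: boolean;
}

type BotAction = "create" | "delete" | "toggle";

// Perform one action on the bot's own todos and return the new list.
// `rand` must return a number in [0, 1); the input array is not mutated.
function botTick(
  todos: BotTodo[],
  rand: () => number,
  nextId: string,
): { action: BotAction; todos: BotTodo[] } {
  // With nothing to delete or toggle, the only sensible action is "create".
  const actions: BotAction[] = todos.length === 0 ? ["create"] : ["create", "delete", "toggle"];
  const action: BotAction = actions[Math.floor(rand() * actions.length)] ?? "create";

  if (action === "create") {
    const created: BotTodo = { id: nextId, text: `Bot todo ${nextId}`, completed: false };
    return { action, todos: [...todos, created] };
  }

  const index = Math.floor(rand() * todos.length);
  if (action === "delete") {
    return { action, todos: todos.filter((_, i) => i !== index) };
  }
  // "toggle": flip `completed` on one randomly chosen todo.
  const toggled = todos.map((t, i) => (i === index ? { ...t, completed: !t.completed } : t));
  return { action, todos: toggled };
}
```

A server could call such a function from `setInterval` with a 5000 ms period for the full-sync bot. The pull-only bot would use 2000 ms and would also need an "edit text" action, which this sketch omits.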
- Demo Mode
  - The app must support starting everything in demo mode via a single command.
  - In demo mode:
    - server + client run in compatible configuration
    - exactly 3 projects are created
    - each project's initial todos are deterministic (fixed seed)
  - Demo mode must be stable for e2e:
    - same initial data each run
    - bots run on schedules (2s/5s)
    - ideally allow "freeze bots" via an env flag (optional but useful for tests)
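
A fixed seed can be implemented with a small seeded pseudo-random generator instead of `Math.random()`. The sketch below uses a mulberry32-style generator. `seedTodos` and `botsFrozen` are hypothetical helpers, and treating only the value `"1"` as "frozen" is an assumption based on the `DEMO_FREEZE_BOTS=1` command in this README.

```typescript
// Small seeded PRNG (mulberry32-style): the same seed yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface SeedTodo {
  id: string;
  projectId: string;
  text: string;
  completed: boolean;
}

// Hypothetical seed function: ids and texts are fixed, `completed` comes from the PRNG.
function seedTodos(projectId: string, count: number, seed: number): SeedTodo[] {
  const rand = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => ({
    id: `${projectId}-todo-${i + 1}`,
    projectId,
    text: `Demo todo ${i + 1}`,
    completed: rand() < 0.5,
  }));
}

// Assumption: bots are frozen only when the flag is exactly "1".
function botsFrozen(env: Record<string, string | undefined>): boolean {
  return env["DEMO_FREEZE_BOTS"] === "1";
}
```

On the server, `botsFrozen(process.env)` could gate whether the bot timers are started at all, which keeps e2e runs free of background changes.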
- Backend Requirements (Minimal)
  - Replicache endpoints:
    - POST `/api/replicache/push`
    - POST `/api/replicache/pull`
  - Storage:
    - in-memory is acceptable initially (keep it simple)
    - but design with an interface to later swap in SQLite/Postgres (optional now)
  - Bots:
    - Bot A for full-sync (5s)
    - Bot B for pull-only (2s)
  - Server must:
    - create initial seed for demo mode
    - return correct data via pull
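
The "design with an interface" point might look like the sketch below. `TodoStore` and `InMemoryTodoStore` are hypothetical names. The methods are async on the assumption that a later SQLite/Postgres implementation would need that, and the sketch leaves out the version bookkeeping a real Replicache pull handler requires.

```typescript
interface StoredTodo {
  id: string;
  projectId: string;
  text: string;
  completed: boolean;
}

// Hypothetical storage contract that the push/pull handlers and bots would depend on.
interface TodoStore {
  list(projectId: string): Promise<StoredTodo[]>;
  put(todo: StoredTodo): Promise<void>;
  remove(id: string): Promise<boolean>;
}

// In-memory implementation: enough for demo mode, data is lost on restart.
class InMemoryTodoStore implements TodoStore {
  private readonly todos = new Map<string, StoredTodo>();

  async list(projectId: string): Promise<StoredTodo[]> {
    return Array.from(this.todos.values()).filter((t) => t.projectId === projectId);
  }

  async put(todo: StoredTodo): Promise<void> {
    // Store a copy so later mutation by the caller does not leak into the store.
    this.todos.set(todo.id, { ...todo });
  }

  async remove(id: string): Promise<boolean> {
    return this.todos.delete(id);
  }
}
```

Because handlers would only see `TodoStore`, swapping the backing storage later should not require touching the endpoint code.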
- Client Sync Requirements (Replicache)
  - Client:
    - create Replicache instance and bind it to the current project
    - subscribe/query Todos per project
  - Per mode behavior:
    - no sync: Replicache not used, or used locally with no network calls (either is fine as long as behavior matches)
    - full-sync: push + pull enabled
    - pull-only: push disabled/stubbed, pull enabled
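
The per-mode behavior can be captured in one small mapping from mode to endpoints. `SyncMode`, `SyncEndpoints`, and `endpointsForMode` are hypothetical names. How these values are passed to Replicache (for example through its `pushURL` and `pullURL` options) is an assumption to verify against the Replicache documentation.

```typescript
type SyncMode = "no-sync" | "full-sync" | "pull-only";

// Hypothetical config shape: an absent URL means that direction is disabled.
interface SyncEndpoints {
  pushUrl?: string;
  pullUrl?: string;
}

// Map a project's sync mode to the endpoints the client is allowed to call.
function endpointsForMode(mode: SyncMode, apiBase: string = "/api/replicache"): SyncEndpoints {
  switch (mode) {
    case "no-sync":
      return {}; // purely local: no network calls at all
    case "full-sync":
      return { pushUrl: `${apiBase}/push`, pullUrl: `${apiBase}/pull` };
    case "pull-only":
      return { pullUrl: `${apiBase}/pull` }; // local edits never reach the server
  }
}
```

Keeping this decision in one function means the UI components never need to know which mode they are rendering.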
- Development Commands (Minimum)
  - `dev`: run server + ui/web together
  - `dev:server`: server only
  - `dev:web`: web only
  - `build`: build everything
  - `test`: unit tests (can be minimal/empty at first, but the command must exist)
  - `lint` / `format`: basic checks
- E2E Foundation (Plan Ahead)
  - Demo mode is the default for e2e tests.
  - Add stable selectors (e.g. `data-testid`) for:
    - project list
    - create project button
    - add todo input
    - todo item (text, checkbox, delete)
  - Sync errors should be visible in the UI (toast/alert) so e2e can detect them.
- Definition of Done
  - Repo with `/server` and `/ui/web` exists and runs locally.
  - Demo mode shows 3 projects with correct names.
  - Sidebar + theme toggle work; responsive behavior is correct (desktop open, mobile drawer).
  - Project 1 edits are local and not synced.
  - Project 2 is synced; bot modifies data every 5s; client sees updates.
  - Project 3 local edits do not push to server; the server bot modifies data every 2s and the client pulls those updates.
  - UI looks clean and includes basic animations (sidebar, list transitions).