Skip to content

rjicy/Anti-Interview

Repository files navigation

English | 简体中文


AntiInterview

A desktop overlay that helps you think through interview / written-exam problems. Capture a screenshot or speak into the mic, hit a hotkey, and a streaming LLM response renders in a translucent always-on-top window that is invisible to screen-capture / screen-share tools.

Built on Tauri 2 (Rust backend + React/Vite frontend). One binary on Windows + macOS.


Features

  • Screenshot solveCtrl+H to capture, Ctrl+Enter to ask. Streams a three-section answer (Problem Analysis / Solution / Key Points).
  • Voice solveCtrl+A toggles a live ASR session, then Ctrl+Enter feeds the transcript + last N rounds of history to the LLM. Multi-round conversation with a sliding window.
  • Multi-provider LLM — OpenAI, Azure OpenAI (chat-completions API), Azure OpenAI Responses (for gpt-5.x deployments), Anthropic Claude, OpenRouter, or any OpenAI-compatible custom endpoint.
  • Stealth overlay — on Windows, SetWindowDisplayAffinity with WDA_EXCLUDEFROMCAPTURE keeps the overlay out of every standard screen recorder, browser getDisplayMedia, and the OS Snipping Tool. On macOS, an NSPanel running an alpha-fade dance hides the overlay from CGDisplayCreateImage during the capture window.
  • Click-through, no-focus, no-taskbar — the overlay never steals focus, doesn't show up in Alt-Tab, and only the toolbar strip captures the cursor (the panel body is click-through).
  • Hotkey everything — move / resize / scroll / opacity / theme / zoom / hide are all global shortcuts; see Shortcuts.
  • Deep thinking toggleCtrl+T flips a hint passed to the prompt builder, encouraging the model to slow down on hard problems.

Architecture

+----------------------+        IPC        +-----------------------+
|   React + Vite UI    | <---------------> |   Rust (Tauri 2)      |
|   (src-ui/)          |   bindings.ts     |   (src-tauri/)        |
+----------------------+                   +-----------------------+
                                                     |
            +--------------------+-------------------+-------+
            |                    |                   |       |
            v                    v                   v       v
        capture.rs          audio/             llm/        window.rs
       (screen + crop)   (cpal mic +       (4 providers,  (stealth,
                          Doubao ASR        streaming SSE  click-through,
                          WebSocket)        + cancel)      hit-test)

The full command surface (24 Tauri commands) and the test tiers covering them live in E2E_COVERAGE_MATRIX.yaml.


Quick start (Windows)

Prereqs: Node 18+, Rust 1.95+, Visual Studio 2022 with the Desktop development with C++ workload. Full toolchain steps: docs/SETUP.md.

# 1. Bootstrap: checks toolchain, runs npm install, seeds _local/config.toml
.\bootstrap.ps1

# 2. Fill in your LLM api_key in _local/config.toml (see Configuration below).

# 3. Run from a "Developer PowerShell for VS 2022" (so MSVC env is loaded):
.\src-ui\node_modules\.bin\tauri.cmd dev

If you launch from a plain PowerShell, the MSVC libs aren't on LIB and you'll hit could not open 'MSVCRT.lib' at link time. Either use the Developer PowerShell, or wrap the launch in a .bat that calls vcvarsall.bat x64 first — example in docs/SETUP.md.

First compile takes 5-15 minutes (a few hundred crates). Subsequent incremental builds finish in seconds.

Prefer manual control? The steps bootstrap.ps1 runs (cd src-ui && npm install, then cp _local/config.toml.example _local/config.toml) are fully documented in docs/SETUP.md.


Quick start (macOS)

Prereqs: Node 18+, Rust 1.95+, Xcode Command Line Tools.

xcode-select --install                # one-time, if not already installed

./bootstrap.sh                        # checks toolchain, npm install, seed config

# Edit _local/config.toml with your LLM api_key (see Configuration below).

./src-ui/node_modules/.bin/tauri dev

macOS 12.3+ is required (set in tauri.conf.json). The first run will prompt for microphone + screen-recording permissions.


Configuration

Config lives at _local/config.toml in debug builds (overrides the OS path so a single file follows the source tree). In release builds:

OS Path
Windows %APPDATA%\anti_interview\config.toml
macOS ~/Library/Application Support/anti_interview/config.toml
Linux ~/.config/anti_interview/config.toml

Example — Azure OpenAI Responses (gpt-5.x deployments)

[app]
language = "Auto"            # "Auto" | "zh-CN" | "en-US"
programming_language = "Python"
max_history_rounds = 5       # sliding-window size for voice history

[overlay]
opacity = 0.95
position = "top-right"

[llm]
provider = "azure_responses"
api_key = "<your-azure-key>"
model = "gpt-5.3-codex"      # the deployment name on Azure
base_url = "https://<resource-name>.openai.azure.com"
api_version = "2025-04-01-preview"

[asr]
provider = "volcano_ark"
access_key = ""              # leave empty to disable voice solve
app_key = ""
language = "zh-CN"

LLM provider matrix

provider Required fields Notes
openai api_key, model base_url defaults to https://api.openai.com/v1.
azure_openai api_key, model, base_url, api_version Chat-completions endpoint (gpt-4o etc).
azure_responses api_key, model, base_url, api_version Required for gpt-5.x — those only expose /responses.
anthropic api_key, model base_url defaults to https://api.anthropic.com.
openrouter api_key, model Aggregator for 200+ models.
custom api_key, model, base_url, extra_headers Any OpenAI-compatible server.

For Azure Responses specifically: in the Azure portal, the Target URI on the deployment page looks like https://<name>.openai.azure.com/openai/responses?api-version=.... Put https://<name>.openai.azure.com in base_url (without path) and the api-version value in api_version. The provider builds the full URL at request time.

ASR (voice) — optional

Voice solve uses Volcano Doubao SAUC bigmodel WebSocket ASR. Without access_key + app_key the voice path returns no_asr_key at toggle time, but screenshot solve is unaffected. Credentials are issued in the Volcano Engine console.


Shortcuts

Windows uses Ctrl, macOS uses Option (Alt).

Combo (Win / Mac) Action
Ctrl+H / Opt+H Capture screenshot
Ctrl+Enter / Opt+Enter Solve (uses screenshot + transcript)
Ctrl+A / Opt+A Toggle voice recording
Ctrl+R / Opt+R Cancel in-flight + clear state
Ctrl+M / Opt+M Hide / show overlay
Ctrl+Q / Opt+Q Quit
Ctrl+T / Opt+T Toggle deep-thinking hint
Ctrl+5 / Opt+5 Cycle theme (dark / light)
Ctrl+D / Opt+D Delete last final transcript segment
Ctrl+[ Ctrl+] Opacity down / up
Ctrl+8 Ctrl+9 Zoom out / in
Ctrl+Arrows Move overlay (30 px steps)
Alt+↑/↓ / Cmd+↑/↓ Scroll the answer panel

Source of truth: src-ui/src/lib/shortcuts.ts.


Testing

The project ships a 5-tier test pyramid documented in E2E_COVERAGE_MATRIX.yaml:

Tier What How to run
T0 Unit / contract (Rust) cargo test -p anti-interview
T1 Mocked frontend integration cd src-ui && npx playwright test
T2 Desktop runtime E2E (real binary) cd src-ui && npx playwright test --project=desktop-runtime
T3 Live provider smoke Manual, with real Azure / Doubao keys
T4 Manual / exploratory Tracked as replacement tasks in the coverage matrix

Build a release artifact

# Windows -> NSIS installer in src-tauri/target/release/bundle/nsis/
cd src-ui && npm install && cd ..
.\src-ui\node_modules\.bin\tauri.cmd build

# macOS -> .app + .dmg in src-tauri/target/release/bundle/
./src-ui/node_modules/.bin/tauri build

Bundle config (NSIS / DMG, icon paths, entitlements) is in src-tauri/tauri.conf.json.


Known limitations

  • Voice ASR is Doubao-only for now. The provider seam in src-tauri/src/audio/ is built to host more, but only Volcano Doubao SAUC is wired in.

License

MIT — see LICENSE.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors