Audit BibTeX bibliographies against authoritative metadata sources and rewrite entries to a clean, unified academic style.
Paste a .bib file — bib-check queries OpenAlex, Crossref, Semantic Scholar, DBLP (and optionally Google Scholar), flags missing fields / et al. truncation / abbreviated venues / preprint-only refs, then emits a corrected bibliography.
pip install -e .
playwright install chromium # only needed for --use-scholar
bib-check list mybib.bib | head # find the slice you want
bib-check audit mybib.bib --range 1-50Or use the browser version — zero install, runs entirely in your browser:
Every rewritten entry aims to satisfy:
- All authors listed — no
et al./and others/ truncation. - Required fields:
author,title,year, plus venue (booktitlefor conferences,journalfor articles).- Conference names in full (no abbreviations like Proc. NeurIPS).
- Journal articles must include
volume,number,pages. - Stripped:
doi,url,eprint,eprinttype,biburl,bibsource,timestamp.
- Prefer the published version — if a paper has a peer-reviewed venue, that one wins over arXiv/CoRR.
| Source | Description |
|---|---|
| OpenAlex | Primary. Free REST API, ~250M works. Resolves arXiv preprints to published versions. |
| Crossref | Authoritative DOI metadata for journals and proceedings. Catches what OpenAlex misses. |
| Semantic Scholar | Tertiary. Broader coverage, especially recent papers. |
| DBLP | CS-specific. Excellent venue acronym → full name expansion. |
| Google Scholar | Opt-in (--use-scholar). Drives Chromium via Playwright to click Cite → BibTeX. Persistent browser profile so CAPTCHA solves survive across runs. |
All results are cached on disk by query hash — re-runs never re-hit the same source.
.bib file → parse → detect issues → lookup metadata → rewrite → report + suggested.bib
- Parser: Hand-rolled, brace-depth-aware. Skips
@string/@preamble/@comment. - Issue detector: Flags missing fields,
et al., abbreviated venues, arXiv-only refs, unbalanced braces, future years, duplicate entries. - Rewriter: Merge-only strategy — never overwrites existing fields unless the original is missing or clearly a preprint. Author lists keep the original ordering. Venue acronyms expanded from a hardcoded dictionary of 40+ CS conferences.
# List entries (1-indexed positions, skips @string/@preamble/@comment)
bib-check list path/to/main.bib | head -20
# Audit a slice
bib-check audit path/to/main.bib --range 77-117
# Audit specific cite keys
bib-check audit path/to/main.bib --keys kriuk2025qkvcomm,liu2024droidspeak
# Local checks only (no network calls)
bib-check audit path/to/main.bib --range 1-50 --skip-openalex --skip-crossref --skip-dblp
# With Google Scholar fallback (headless browser)
bib-check audit path/to/main.bib --use-scholar --headless
# Whole file
bib-check audit path/to/main.bibOutput (in ./out/ by default):
| File | Content |
|---|---|
report.md |
Per-entry diff, issues, match metadata |
suggested.bib |
Rewritten entries in unified style |
cache/ |
JSON caches per source — delete to force re-query |
The docs/ directory contains a fully-client-side web app hosted on GitHub Pages:
- Zero server — parses
.bibin-browser, queries OpenAlex + Crossref + Semantic Scholar via CORS. - Live diff — unified diff (LCS-based) between original and suggested entry.
- View filters — show errors-only / hide clean / hide trusted / collapse info notes.
- Export — download
suggested.bib,report.json,report.md, or copy-all. - Persistence — bib input and results saved to
localStorage, restored on refresh. - Privacy — no bib content leaves your machine except title + author hint sent to the APIs.
Try it: chenyuheee.github.io/bib-check
This is a suggestion engine, not an oracle. Scholar metadata is sometimes wrong — especially author lists for very recent arXiv papers. Always eyeball report.md before copying entries into your bibliography.
He Chenyu — Zhejiang University
- Email: hechenyu@zju.edu.cn
- Issues: github.com/ChenyuHeee/bib-check/issues