44 skills that actually work. Built by a physician-researcher, tested on real publications.
MedSci Skills is a submission-grade clinical manuscript workflow, not a generic biomedical skill catalog. It competes on clinical submission reliability, not skill count.
Topic Discovery → Literature Search → Full-Text Retrieval → Study Design → Sample Size → Protocol → De-identification → Data Cleaning → Statistics → Figures → Writing → Humanize → Compliance → Journal Selection → Peer Review → Revision → Presentation
Created & maintained by Yoojin Nam, MD
Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea
No terminal? Use the classroom installer ZIP — download, unzip, double-click the installer, then restart your agent app (see Installation).
Have a terminal? Fastest path — one command, nothing to clone:
npx medsci-skills install # copies every skill into your agent's folder
Have git? Install every skill in three commands:
git clone https://github.com/Aperivue/medsci-skills.git
mkdir -p ~/.claude/skills
cp -r medsci-skills/skills/* ~/.claude/skills/
Restart Claude Code, then start with /orchestrate, which classifies your request and routes you to the right skill. Full install options (Codex, Cursor, individual skills) are in Installation.
Prefer plugins? One line adds the marketplace; /plugin then lets you browse eight category plugins and enable the ones you want:
/plugin marketplace add Aperivue/medsci-skills
/plugin # browse eight category plugins; enable the ones you want
| Plugin | Covers |
|---|---|
| medsci-literature | Literature search, full-text retrieval, Zotero sync, reference-integrity audits |
| medsci-data | Study design, variable operationalization, sample size, data cleaning, de-identification, codebooks, dataset versioning |
| medsci-analysis | Statistics, figures, batch/cross-national/replication analysis, meta-analysis |
| medsci-writing | IMRAD & protocol drafting, AI-pattern removal, AI-search optimization, reviewer responses |
| medsci-review | Self-review, peer review, reporting-guideline compliance |
| medsci-submission | Submission packaging, journal selection, ICMJE/IRB form filling, grant proposals |
| medsci-project | Orchestration, project intake/management, gap & topic discovery, author strategy |
| medsci-presentation | Presentations/PPTX, PDF/document rendering, environment setup, skill publishing |
Install a single category and invoke its skills under that namespace:
/plugin install medsci-analysis@medsci-skills
/medsci-analysis:analyze-stats
All eight plugins share the same repository source, so this groups and enables skills by category — it is not a partial download. The marketplace tracks main, so a plugin's version is its git commit.
Want just one capability? Two skills are also published as focused standalone repos (generated mirrors; this repo stays the source of truth), each installable on its own with /plugin marketplace add Aperivue/<repo>:
- Aperivue/verify-refs: catch fabricated/mismatched citations (PubMed + CrossRef).
- Aperivue/check-reporting: audit a manuscript against 32 EQUATOR reporting guidelines.
Three public datasets. Three study types. Each produces a complete manuscript, publication-ready figures, and a reporting compliance audit.
| Demo | Dataset | Study Type | Compliance |
|---|---|---|---|
| Demo 1: Wisconsin BC | sklearn built-in | Diagnostic accuracy | STARD 2015 |
| Demo 2: BCG Vaccine | metafor::dat.bcg (13 RCTs) | Meta-analysis | PRISMA 2020 |
| Demo 3: NHANES Obesity | CDC NHANES 2017-18 | Epidemiology (survey) | STROBE |
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer() # 569 samples, zero download
Output from orchestrate --e2e (see full demo):
Full output list — manuscript, figures, STARD flow, checklist (click to expand)
| Output | Description |
|---|---|
| Manuscript | IMRAD draft, ~1,800 words |
| Title Page | STARD title page with key points |
| DOCX | Submission-ready Word document |
| ROC Curve | 3-model comparison with DeLong 95% CIs |
| Confusion Matrices | Per-model confusion matrices at threshold 0.5 |
| STARD Flow | D2-generated STARD 2015 flow diagram |
| Reporting Checklist | STARD 2015 — 60.9% compliance (14/23 applicable) |
| Self-Review | Initial 82 (REVISE) → 88 (PASS) after 1 fix iteration; final 0 major / 1 minor |
| Pipeline Log | 7-step E2E execution trace |
Pipeline: analyze-stats → make-figures → write-paper → AI pattern scan → check-reporting (STARD) → self-review → DOCX build → present-paper
library(metafor)
data(dat.bcg) # 13 RCTs, 357,347 participants (Colditz et al. 1994)
Output from orchestrate --e2e (see full demo):
Full output list — manuscript, forest/funnel plots, PRISMA flow, checklist (click to expand)
| Output | Description |
|---|---|
| Manuscript | Pooled RR = 0.489 (95% CI: 0.344–0.696), ~2,200 words |
| Title Page | PRISMA title page with key points |
| DOCX | Submission-ready Word document |
| Forest Plot | 13 studies, RE model (REML), 300 dpi |
| Funnel Plot | Small-study / publication-bias visual |
| PRISMA Flow | D2-generated PRISMA 2020 flow diagram |
| Reporting Checklist | PRISMA 2020 — 57.1% (24/42) at check-reporting → 61.9% (26/42) after self-review fix |
| Self-Review | Initial 78 → 82 (REVISE) after 1 fix iteration; 3 major / 4 minor (majors are out-of-scope RoB/GRADE/references) |
| Pipeline Log | 7-step E2E execution trace |
Pipeline: analyze-stats (R metafor) → make-figures → write-paper → AI pattern scan → check-reporting (PRISMA 2020) → self-review → DOCX build → present-paper
# Pre-processed NHANES 2017-2018 CSV included
# 5,010 US adults after exclusions
Output from orchestrate --e2e (see full demo):
Full output list — manuscript, OR forest plot, STROBE flow, checklist (click to expand)
| Output | Description |
|---|---|
| Manuscript | Adjusted OR = 3.03 (95% CI: 2.29–4.02), ~1,850 words |
| Title Page | STROBE title page with key points |
| DOCX | Submission-ready Word document |
| OR Forest Plot | Adjusted odds ratios for 7 variables |
| Study Flow | D2-generated participant flow diagram |
| Reporting Checklist | STROBE — 83.3% compliance (25/30 applicable) |
| Self-Review | ACCEPT-WITH-NOTES after 1 fix iteration; 0 genuine majors remaining |
| Pipeline Log | 7-step E2E execution trace |
Pipeline: analyze-stats → make-figures → write-paper → AI pattern scan → check-reporting (STROBE) → self-review → DOCX build → present-paper
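The compliance percentages quoted in the three demo tables can be reproduced from the item counts shown next to them. The snippet below is a minimal sketch of that arithmetic; the function name `compliance_pct` is hypothetical and is not claimed to be how check-reporting computes its `compliance_pct` field.

```python
def compliance_pct(items_met: int, items_total: int) -> float:
    """Share of checklist items met, as a percentage rounded to one decimal."""
    if items_total <= 0:
        raise ValueError("items_total must be positive")
    return round(100 * items_met / items_total, 1)


# (items met, items counted) as quoted in the demo tables above.
demo_counts = {
    "Demo 1, STARD 2015": (14, 23),
    "Demo 2, PRISMA 2020 at check-reporting": (24, 42),
    "Demo 2, PRISMA 2020 after self-review fix": (26, 42),
    "Demo 3, STROBE": (25, 30),
}

for label, (met, total) in demo_counts.items():
    print(f"{label}: {compliance_pct(met, total)}%")
```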
Each demo (and real project) follows this role-based folder layout:
project/
├── data/ # Input data
│ └── raw_data.csv
├── analysis/ # /analyze-stats + /make-figures outputs
│ ├── tables/
│ ├── figures/
│ │ └── _figure_manifest.md
│ ├── _analysis_outputs.md
│ └── analyze.py
├── manuscript/ # /write-paper outputs
│ ├── manuscript.md
│ ├── manuscript_final.docx
│ └── title_page.md
├── qc/ # Quality verification
│ ├── reporting_checklist.md # /check-reporting
│ ├── self_review.md # /self-review
│ └── _pipeline_log.md
├── submission/ # Post-journal-selection (manual trigger)
│ └── {journal_short}/
│ ├── cover_letter.md
│ ├── checklist.md
│ └── peer_review.md
└── presentation/
└── presentation.pptx
The E2E pipeline (orchestrate --e2e) produces everything up to qc/. The submission/ directory is created after journal selection via /find-journal.
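To check a project folder against the layout above, something like the following sketch can list which expected files are missing. The file list is copied from the tree; the function name `missing_e2e_files` is hypothetical, and the sketch deliberately leaves out presentation/ and submission/.

```python
from pathlib import Path

# Files shown in the layout above under analysis/, manuscript/, and qc/.
# analysis/analyze.py and the tables/ contents are omitted because they
# vary by project (assumption of this sketch).
EXPECTED_E2E_FILES = [
    "analysis/_analysis_outputs.md",
    "analysis/figures/_figure_manifest.md",
    "manuscript/manuscript.md",
    "manuscript/manuscript_final.docx",
    "manuscript/title_page.md",
    "qc/reporting_checklist.md",
    "qc/self_review.md",
    "qc/_pipeline_log.md",
]


def missing_e2e_files(project_root: str) -> list[str]:
    """Return the expected files that are absent under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_E2E_FILES if not (root / rel).is_file()]


# Any path works here; a missing folder simply reports every file as absent.
print(missing_e2e_files("project"))
```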
v4.1 ships distribution levers and a submission pre-flight gate — analysis-integrity detectors 24 → 25 (still 43 skills):
- Claude Code plugin marketplace: /plugin marketplace add Aperivue/medsci-skills, then /plugin discovery of eight medsci-* category plugins generated from the catalog SSOT (.claude-plugin/marketplace.json).
- MedSci-Audit detector registry: the deterministic verification layer is now a named, enumerated, citable suite (MEDSCI_AUDIT.md + generated metadata/detectors_catalog.json, six audit families).
- Hero-skill standalone mirrors: scripts/sync_hero_skill.py mirrors a focused skill to its own star-funnel repo; first two live: Aperivue/verify-refs and Aperivue/check-reporting.
- Placeholder/marker gate: check_placeholders.py flags leftover [@NEW:] / [version] / TODO / template-URL markers before submission (the 25th detector).
- Submission pre-flight gate: preflight_gate.py bundles the existing detectors + /verify-refs into one halt-on-failure command (qc/preflight_gate_report.json, non-zero exit on any blocker). It is the single last step before freeze.
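As an illustration of the placeholder gate and its halt-on-failure behaviour, here is a minimal sketch of a marker scan that maps findings to an exit code. The regex patterns, function names, and sample draft are assumptions made for this example; the real check_placeholders.py and preflight_gate.py may match and report differently.

```python
import re

# Hypothetical patterns for three of the marker kinds named above.
PLACEHOLDER_PATTERNS = {
    "new_citation_marker": re.compile(r"\[@NEW:[^\]]*\]"),
    "version_marker": re.compile(r"\[version\]"),
    "todo": re.compile(r"\bTODO\b"),
}


def find_placeholders(text: str) -> list[dict]:
    """Return one finding per leftover marker, with its 1-based line number."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PLACEHOLDER_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append(
                    {"line": line_no, "kind": kind, "text": match.group(0)}
                )
    return findings


def gate_exit_code(findings: list[dict]) -> int:
    """Non-zero when any blocker is present, mirroring a halt-on-failure gate."""
    return 1 if findings else 0


# Made-up draft text, for illustration only.
draft = "Results were robust [@NEW:smith2024].\nTODO: add limitations.\nClean line."
findings = find_placeholders(draft)
for finding in findings:
    print(finding)
print("exit code:", gate_exit_code(findings))
```

A wrapper script would pass the exit code to `sys.exit` so that a CI job or shell chain stops on the first blocker.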
v4.0 extends the project's own deterministic, no-drift SSOT discipline to the public storefront and finishes the detector backlog — bringing the analysis-integrity detector count in skills/ to 24 (still 43 skills):
- SSOT to the storefront: a generated, machine-readable metadata/skills_catalog.json (slug + research-lifecycle category + one-line description per skill) is now the source the aperivue.com storefront vendors, gated offline so the public site can never silently drift behind the repo (gen_skills_catalog_json.py --check).
- Asset/figure anonymization: /sync-submission scans figure-generating scripts, figure-PDF rendered text, and docx/PDF metadata authors for the institution/author leaks a body-text scan misses (check_asset_anonymization.py).
- Cross-artifact staleness: flags supplement values that disagree with the corrected body, and reporting checklists built against an older manuscript version (check_cross_artifact_stale.py; check_checklist_version.py with a target_manuscript / source_sha256 checklist contract).
- Survival reporting: /analyze-stats emits median survival with its 95% CI, a Cox events-per-variable gate, and cluster-robust SE for nested observation units.
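The checklist-version contract above can be pictured as a hash comparison: a checklist records the SHA-256 of the manuscript it was built against, and it is stale once the manuscript changes. The sketch below assumes the contract is a small dict carrying the two field names mentioned above; the function names and sample text are hypothetical, and the real check_checklist_version.py may work differently.

```python
import hashlib


def sha256_of(text: str) -> str:
    """Hex SHA-256 digest of UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def checklist_is_stale(checklist_meta: dict, manuscript_text: str) -> bool:
    """True when the checklist was built against a different manuscript body."""
    return checklist_meta.get("source_sha256") != sha256_of(manuscript_text)


# Placeholder manuscript bodies, for illustration only.
draft_v1 = "Manuscript body, first draft."
draft_v2 = "Manuscript body, after a correction."

checklist_meta = {
    "target_manuscript": "manuscript/manuscript.md",
    "source_sha256": sha256_of(draft_v1),
}

print(checklist_is_stale(checklist_meta, draft_v1))
print(checklist_is_stale(checklist_meta, draft_v2))
```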
v3.8.0 adds an evaluation/ harness suite that validates the instrument itself — deterministic detector recall on programmatically seeded defects (E1), fresh-clone manifest reproducibility (E4), claim audit-trail completeness (E5), host-portability and metadata-drift checks (E6/E7/E8), and a cost/time table (E3) — each writing a self-describing, reproducible run package. An LLM-comparator (E2) and a self-review convergence harness (E9) ship runnable but are NOT executed in this release. This release also reconciles the README Live-Demos numbers with the v3.7.0 clean-room demo artifacts. Catalog unchanged (still 43 skills, 21 detectors).
v3.7.0 adds three deterministic, stdlib-only detectors on top of the v3.6.0 panel-derived gates — bringing the analysis-integrity detector count in skills/ to 21 — without broadening the catalog (still 43 skills):
- Reference adequacy: /self-review and /write-paper now check that a draft cites enough references in the right sections and that every named method (a competing-risk model, multiple imputation, the E-value, an eGFR equation) carries a citation. This is the adequacy layer that complements /verify-refs's integrity layer (check_reference_adequacy.py).
- Panel lens-diversity: /self-review --panel post-processes its reviewers so the cost buys breadth, not a louder echo (check_panel_diversity.py).
- Generated-code quality: /analyze-stats lints emitted analysis scripts for reproducibility slop (missing seed, hard-coded data literals, absolute paths, in-place source overwrite) (check_generated_code.py).
Plus a publish-time skill-worthiness gate (/publish-skill) and public adoption/impact tracking (IMPACT.md).
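To make the generated-code lint concrete, the sketch below covers two of the four checks named above (missing seed and absolute paths) with plain regular expressions. The heuristics and the name `lint_generated_script` are assumptions for illustration; check_generated_code.py may use different rules.

```python
import re

# Heuristic patterns (assumptions of this sketch, not the shipped detector).
USES_RANDOMNESS = re.compile(r"\b(random|np\.random|numpy\.random)\b")
SETS_SEED = re.compile(r"seed\s*\(|random_state\s*=|default_rng\s*\(\s*\d")
ABSOLUTE_PATH = re.compile(r"""["'](?:/Users/|/home/|[A-Za-z]:[\\/])""")


def lint_generated_script(source: str) -> list[str]:
    """Return reproducibility issues found in an analysis script's source."""
    issues = []
    if USES_RANDOMNESS.search(source) and not SETS_SEED.search(source):
        issues.append("missing seed")
    if ABSOLUTE_PATH.search(source):
        issues.append("absolute path")
    return issues


# Made-up script bodies, for illustration only.
risky_script = (
    "import numpy as np\n"
    'df = pd.read_csv("/Users/me/data.csv")\n'
    "noise = np.random.normal(size=10)\n"
)
clean_script = (
    "import numpy as np\n"
    "rng = np.random.default_rng(42)\n"
    'df = pd.read_csv("data/raw_data.csv")\n'
)

print(lint_generated_script(risky_script))
print(lint_generated_script(clean_script))
```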
v3.6.0 lands 18 gates from a 13-project panel self-review (158 traces → 12 recurring defect patterns), without broadening the catalog (still 43 skills). Six new stdlib detectors join the existing trio — deterministic where a grep is clean, prose/probe where the call needs a human:
- Cohort & pool arithmetic: /self-review recomputes incidence rates from events ÷ person-years, balances STROBE exclusion cascades, and checks ordinal tier/stratum partitions for disjointness (check_cohort_arithmetic.py); /meta-analysis locks patient/lesion aggregate totals and requires re-run evidence for any "fixed" audit note.
- Claim ↔ artifact ↔ scope: Methods ↔ Results ↔ disk coverage (a run-but-unreported analysis is flagged), an endpoint ↔ conclusion scope gate (a cross-sectional design cannot license a surveillance claim; a binary surrogate is not a care directive), and a reviewer-team 3-way that makes an LLM-as-reviewer fatal.
- Statistical & reporting contracts: a CI/estimand output contract (quantile/proportion/sHR must carry a 95% CI; Cox EPV gate; proportion-MA Egger ban + prediction interval), interval-censoring / PH-violation / CIF-horizon survival rules, reporting-framework base+extension naming, a classical-style body lint, a PROSPERO ID format gate, and a pagination-placeholder citation gate.
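The cohort arithmetic checks described above reduce to two small recomputations: incidence rate = events ÷ person-years, and starting cohort minus exclusions = analysed cohort. The following is a minimal sketch with made-up numbers; the function names and the tolerance are assumptions and do not describe check_cohort_arithmetic.py itself.

```python
import math


def incidence_rate_per_1000(events: int, person_years: float) -> float:
    """Incidence rate per 1,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000 * events / person_years


def rate_matches(reported: float, events: int, person_years: float,
                 tol: float = 0.05) -> bool:
    """True when the reported rate agrees with the recomputed one within tol."""
    recomputed = incidence_rate_per_1000(events, person_years)
    return math.isclose(reported, recomputed, abs_tol=tol)


def cascade_balances(n_start: int, exclusions: list[int], n_final: int) -> bool:
    """True when the exclusion cascade sums to the analysed cohort size."""
    return n_start - sum(exclusions) == n_final


# Illustrative numbers only, not taken from any demo or manuscript.
print(incidence_rate_per_1000(50, 2000.0))
print(rate_matches(25.0, 50, 2000.0))
print(cascade_balances(1000, [120, 80], 800))
print(cascade_balances(1000, [120, 80], 810))
```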
Earlier in this series: analysis-integrity guards (confounding completeness, claim-vs-artifact, structural-zero handling), a multi-agent /self-review --panel mode, and shared domain-probe modules vendored byte-identical into /peer-review and /self-review with a CI drift gate.
| | MedSci Skills | Broad skill aggregators |
|---|---|---|
| Citation quality | Every reference passes reference-verification gates (PubMed / Semantic Scholar / CrossRef) and citation-audit workflows before inclusion. | No verification -- citations generated from model memory |
| Pipeline integration | Skills call each other in defined chains. design-study -> calc-sample-size -> write-protocol. | Standalone stubs with no cross-skill interaction |
| End-to-end coverage | From IRB protocol to journal submission: sample size, data cleaning, analysis, writing, compliance, journal selection, cover letter. | Gaps at every transition -- no protocol, no journal matching, no cover letter |
| Battle-tested | Used on real manuscript submissions by a practicing physician-researcher | Unknown provenance and validation |
| Depth per skill | 150-600 lines of documentation + bundled reference files (curated journal profile library, checklists, formula sheets, code templates) | Typically thin SKILL.md templates |
MedSci-Audit — the verification edge in the first rows above is a named suite of 25 deterministic detectors (citation & reference integrity, cohort & pool arithmetic, scope/estimand contracts, reporting compliance, and more) that catch fabricated or drifted content before a manuscript reaches a reviewer. See MEDSCI_AUDIT.md for the suite, its six families, and its evaluation evidence.
This is not a broad scientific-tooling library — for cheminformatics, structural biology, or genomics pipelines, see K-Dense scientific-agent-skills. It is not a biomedical-skill aggregator — for a broad curated collection, see OpenClaw Medical Skills. For how MedSci Skills compares to these catalogs, see docs/competitive_positioning.md. For verified cross-agent install paths (Claude Code, Codex, Cursor, GitHub Copilot), see docs/host_compatibility.md.
MedSci Skills is opinionated and narrow on purpose: a single physician-researcher's medical-manuscript pipeline, biased toward radiology, diagnostic accuracy, observational EMR studies, and systematic review / meta-analysis. If you write IMRAD manuscripts for clinical journals, audit reporting compliance against EQUATOR guidelines, or run SR/MA workflows end-to-end, this is built for you. For wet-lab protocols, drug discovery, or single-cell genomics, the repos above are better fits.
📖 Per-skill reference: docs/skills/ — one page per skill (what it does, when it activates, its Quality Card — purpose, safety boundaries, known limitations, validation, evidence — and bundled resources), generated from each SKILL.md + skill.yml. See docs/skills/AUDIT.md for how the skills are validated.
┌─────────────────────────────────┐
│ orchestrate: single entry point │
│ classifies intent, routes to │
│ the right skill or chains them │
└───────────────┬─────────────────┘
│
┌───────────────────────────┼───────────────────────────┐
│ │ │
intake-project (main pipeline) grant-builder
(new/messy projects) │ (proposals)
│ │
▼ ▼
┌── calc-sample-size ──┐
│ ▼
ma-scout -> search-lit -> fulltext-retrieval -> design-study ──> write-protocol -> manage-project
│ │
│ └── find-cohort-gap (DB variables → literature gap → ranked topic proposals)
│ │
│ ▼
│ deidentify -> clean-data -> analyze-stats -> make-figures -> write-paper
│ │ │
│ replicate-study (paper → new DB) humanize
│ cross-national (parallel survey) │
│ batch-cohort (N × M matrix) ▼
│ find-journal <── self-review
│ │ │
│ │ ▼
│ │ humanize -> academic-aio (AI-search visibility)
│ ▼
│ [cover-letter] -> check-reporting -> revise -> present-paper
│ │
└── meta-analysis peer-review
lit-sync (Zotero + Obsidian sync) author-strategy (PubMed profile analysis)
┌─────────────────────────────────────────────┐
│ publish-skill: package any skill above for │
│ open-source distribution (PII audit, │
│ license check, generalization) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ add-journal: add new journal profiles to │
│ the database (write-paper + find-journal │
│ dual profile generation with quality gates)│
└─────────────────────────────────────────────┘
| Skill | What It Does |
|---|---|
| orchestrate | Single entry point for the full bundle. Classifies your request and routes to the right skill -- or chains multiple skills for multi-step workflows. Full Pipeline Mode runs analyze-stats → make-figures → write-paper → check-reporting → self-review end-to-end. --e2e flag for fully autonomous execution with post-skill validation and halt-on-failure. |
| find-cohort-gap | Research gap finder for longitudinal cohort databases. Profiles cohort strengths, matches PI expertise, scans literature saturation via 6-Pattern scoring, and outputs ranked topic proposals with comparison tables and one-pagers. Works with any cohort: NHIS, UK Biobank, institutional EMR, health checkup registries. |
| search-lit | PubMed + Semantic Scholar + bioRxiv search with anti-hallucination citation verification. Token-efficient error handling -- CrossRef failures are silently batched, not repeated. BibTeX output tags each entry with verified/verified_by/verified_on fields so downstream skills can trust the citation provenance. |
| verify-refs | Pre-submission reference audit for .md, .docx, .bib, or .tsv inputs. Extracts references, verifies DOI/PMID via CrossRef/PubMed when available, and writes qc/reference_audit.json as the sole output — row-level status (OK / MISMATCH / UNVERIFIED / FABRICATED) lives inside the JSON records[] block. /search-lit produces candidate BibTeX; /lit-sync owns manuscript/_src/refs.bib. |
| fulltext-retrieval | Batch open-access PDF downloader. Unpaywall → PMC → OpenAlex → CrossRef pipeline. OA-only -- no paywall bypass. Input: DOI list or TSV. Optional PDF→Markdown conversion via pymupdf4llm for token-efficient LLM analysis of academic papers. |
| check-reporting | Manuscript compliance audit against 32 reporting guidelines and risk of bias tools (STROBE, STARD, STARD-AI, TRIPOD, TRIPOD+AI, PRISMA, PRISMA-DTA, PRISMA-P, MOOSE, ARRIVE, CONSORT, CARE, SPIRIT, CLAIM, SQUIRE 2.0, CLEAR, GRRAS, MI-CLEAR-LLM, SWiM, AMSTAR 2, QUADAS-2, QUADAS-C, RoB 2, ROBINS-I, ROBINS-E, ROBIS, ROB-ME, PROBAST, PROBAST+AI, NOS, COSMIN, RoB NMA). Machine-readable JSON summary with compliance_pct and fixable_by_ai flags for automated pipeline integration. |
| analyze-stats | Statistical analysis code generation (Python/R) for diagnostic accuracy, DTA meta-analysis (bivariate/HSROC), inter-rater agreement, survival analysis, demographics tables, regression (logistic/linear), propensity score (matching/IPTW/overlap weighting), and repeated measures (RM ANOVA/GEE/mixed models). Calibration mandatory for prediction models. |
| meta-analysis | Full systematic review and meta-analysis pipeline (8 phases). DTA (bivariate/HSROC) and intervention meta-analysis. Protocol to submission-ready manuscript with PRISMA-DTA compliance. |
| make-figures | Publication-ready figures and visual abstracts: ROC curves, forest plots, PRISMA/CONSORT/STARD flow diagrams, Kaplan-Meier curves, Bland-Altman plots, confusion matrices, and journal-specific visual/graphical abstracts (python-pptx template-based). Communication-first design principles (Nat Hum Behav 2026 — key message, audience, cognitive load, figure-vs-table decision) and five flow-diagram production lessons (official-template fidelity, VML fallback PDF export, docx XML escape, sequential placeholder mapping, version freeze); critic rubric Section G adds 5 communication-first checks. --study-type auto-generates the full required figure set; structured _figure_manifest.md output for downstream pipeline consumption; D2 enforced as default for flow diagrams. |
| design-study | Study design review: identifies analysis unit, cohort logic, data leakage risks, comparator design, validation strategy, and reporting guideline fit. |
| design-ai-benchmarking | Design and validity review for benchmarking AI system(s) against a human-expert panel: evaluation-question and arm definition, decoupled multi-dimensional rubrics with anchors, planted calibration probes (positive-control / known-bad / instability / mechanism-contradiction), reviewer-panel construction with per-reviewer randomization, inter-rater reliability targets with separate control-item reliability, LLM-as-judge vs human-as-judge adjudication, construct-independence guards, and a structured JSON rating-export schema. Locks the rubric before data collection. |
| intake-project | Classifies new research projects, summarizes current state, identifies missing inputs, and recommends next steps. |
| grant-builder | Structures grant proposals: significance, innovation, approach, milestones, and consortium roles. |
| present-paper | Academic presentation preparation: paper analysis, supporting research, speaker scripts, slide note injection, and Q&A prep. |
| publish-skill | Convert personal Claude Code skills into distributable, open-source-ready packages. PII audit, license compatibility check, generalization, and packaging workflow. |
| write-paper | Full IMRAD manuscript pipeline (8 phases). Outline to submission-ready manuscript with critic-fixer loops, AI pattern avoidance, and journal compliance. Anti-interpretation guardrails in Results; interactive Discussion planning with anchor paper input. Case report mode (CARE 2016, 1000-word short-form). Optional cover letter generation (Phase 8+). LLM Disclosure: auto-generates disclosure statements in Methods, Acknowledgments, and Cover Letter (opt-out via --no-llm-disclosure). --autonomous flag skips all user gates for fully automated manuscript generation; Phase 2 auto-calls /make-figures --study-type with manifest verification; Phase 7 enforces strict sequential QC chain (check-reporting → search-lit → self-review fix loop → DOCX build). |
| review-paper | Scaffold and draft a literature review — narrative (SANRA), scoping (PRISMA-ScR + JBI), or systematic (PRISMA 2020). Asks for the spine axis (modality / task / lifecycle), builds a 7-part skeleton with a required Intro scope/non-overlap block, per-section summary-table stubs, and an evaluation-metrics critique subsection, then wires reporting/registration and hands off to /self-review (RV1-RV8) → /check-reporting → /verify-refs → /humanize. Never invents citations. |
| self-review | Pre-submission self-review from reviewer perspective. 10 categories with research-type branching (AI, observational, educational, meta-analysis, case report, surgical). Anticipated Major/Minor format with severity framing and optional R0 numbering for /revise pipeline. --json structured output with fixable_by_ai flags; --fix mode auto-applies text fixes (max 2 iterations). |
| revise | Response to reviewers with tracked changes. Parses decision letters, classifies comments as MAJOR/MINOR/REBUTTAL, generates point-by-point responses and cover letter. |
| sync-submission | SSOT-to-submission drift audit and journal package helper. Treats submission/{journal}/ as derived output, records source hashes in .journal_meta.json, and blocks freezing drifted packages. |
| manage-project | Research project scaffolding and progress tracking. Commands: init, status, sync-memory, checklist, timeline. Backwards submission timelines and pre-submission checklists. init --zotero-collection NAME auto-creates the Zotero collection via pyzotero and wires the library_id/collection_key into the project contract. |
| calc-sample-size | Interactive sample size calculator with decision-tree guided test selection. Covers 11 designs (diagnostic accuracy, t-test, ANOVA, chi-square, McNemar, logistic regression, Cox regression EPV, survival, ICC, kappa, non-inferiority/equivalence). Generates reproducible R/Python code and IRB-ready justification text. |
| find-journal | Journal recommendation engine. 2-pass matching: compact profiles for scoring, write-paper profiles for top-5 enrichment. Covers 30+ medical specialties, with a user-local private tier for personal-use profiles. No cached IF/APC -- you verify current metrics at journal sites. Post-rejection re-targeting mode. |
| add-journal | Add new journal profiles to the database. Extracts metadata from author guidelines, generates both write-paper (detailed) and find-journal (compact) profiles in canonical format with quality gates. Batch mode for adding multiple journals in one session. |
| deidentify | De-identify clinical research data before LLM-assisted analysis. Standalone Python CLI (no LLM) with 10 country locale packs (kr, us, jp, cn, de, uk, fr, ca, au, in). Detects PHI via regex + heuristics. Interactive terminal review, pseudonymization, date shifting, mapping file generation. Custom locale support via --locale-file. |
| clean-data | Interactive data profiling and cleaning assistant. Three-stage workflow: profile your CSV/Excel data, flag issues (missing values, outliers, duplicates, type mismatches), then generate cleaning code for approved actions only. PHI/PII safety warnings built-in. |
| write-protocol | IRB/ethics protocol generator. Produces 4 core sections (Background, Study Design, Sample Size Justification, Statistical Plan) with full prose. 6 remaining sections provided as structured skeletons with TODO markers for institution-specific content. Korea/US/EU regulatory guidance. |
| replicate-study | Replicate an existing cohort study on a different database. Extracts methodology from a source paper, maps variables via harmonization table, generates analysis code, and produces a replication difference report. Validated on KNHANES/NHANES cross-national replication. |
| cross-national | End-to-end cross-national comparison study. Variable harmonization, parallel weighted survey analysis (no data pooling), and country-stratified comparison tables. Built-in KNHANES + NHANES coding references. |
| batch-cohort | Generate N analysis scripts from one validated template × multiple exposure/outcome combinations. The "80-person team" pattern: same method, swap variables only. Self-adjustment prevention, EPV checks, Bonferroni correction, and summary heatmaps. Validated with 18 combinations on KNHANES 2018. |
| humanize | Detect and remove AI writing patterns from academic manuscripts. Scans for 18 common patterns (significance inflation, AI vocabulary, copula avoidance, etc.) and rewrites flagged passages while preserving technical accuracy. Density target: <2.0 instances per 1000 words. |
| author-strategy | PubMed author profile analysis. Fetches publication data via E-utilities, classifies study types (GBD, SR/MA, NHIS, AI/ML, etc.), generates 7 visualizations, and produces a strategy report with replication opportunities. |
| peer-review | Structured peer review drafting for medical journals. Systematic manuscript analysis, journal-specific formatting (RYAI, INSI, EURE, AJR, KJR), conciseness targets (500-800 words), and pre-submission QC checklist. Constructive developmental tone. |
| ma-scout | Meta-analysis topic discovery and feasibility assessment. Two modes: (A) Professor-first — profile → pillar analysis → MA gaps, (B) Topic-first — question → landscape scan → co-author matching. Multi-source validation (PubMed, PROSPERO, bioRxiv) with realistic k estimation (15-30% discount). |
| lit-sync | Sync research references from .bib files to Zotero library + Obsidian literature notes. Concept extraction from 10+ literature notes with cross-cutting theme discovery. Works after /search-lit or standalone. |
| academic-aio | AI search engine (Perplexity / ChatGPT web / Elicit / Consensus / SciSpace) and RAG visibility checklist for medical AI papers. Integrates TRIPOD+AI, CLAIM, STARD-AI, TRIPOD-LLM, DECIDE-AI reporting anchors with generative-engine-optimization (GEO) principles. Covers title, abstract, structured summary boxes (Key Points / Research in Context / Plain-Language Summary), preprints, GitHub README, CITATION.cff, Zenodo, and Hugging Face model/dataset cards. Explicit defense against LLM citation fabrication (Agarwal 2025, Nat Commun). Produces a visible PASS/PARTIAL/FAIL checklist; never applies edits silently. Pairs with write-paper Phase 4/6/7, runs after self-review + humanize. |
| manage-refs | Reference lifecycle as a single skill: citekey ↔ .bib validation, journal-CSL pandoc rendering (render_pandoc.sh), manuscript ↔ rendered DOCX cross-reference QC (check_xref.py --strict is the submission gate), [N] ↔ [@key] marker conversion, and native Zotero CWYW field-code injection for co-author Word workflows. Hybrid 3-phase strategy (pandoc draft → CWYW transition → Zotero CWYW for circulation/revision/submission). Sole writer of manuscript_final.docx and qc/xref_audit.json. Split out of write-paper Phase 7.6 so revise, peer-review, sync-submission, and find-journal can render directly without depending on a sibling skill. |
| render-pdf-doc | Render non-bibliography academic markdown (proposal, briefing handout, anchor doc, IRB cover, reference table) to publication-quality PDF via pandoc + xelatex with CJK font fallback (Apple SD Gothic Neo on macOS, Noto Sans CJK KR on Linux) and content-proportional pipe-table column widths. Boundary opposite of manage-refs (bibliography-driven). Spun off from write-paper Phase 7.6. |
| define-variables | Literature-grounded variable operationalization for observational research. Turns a data dictionary plus research question into a citation-backed table of exposure / outcome / covariate definitions, cutoffs, and DB-variable mappings. Tier 0 dictionary-first rule prevents ad-hoc phenotype definitions that invite reviewer rejection. Bridges /search-lit output into /write-protocol Methods. |
| generate-codebook | Generate a citable data dictionary / codebook from a tabular dataset (CSV/TSV/Excel/Parquet/Stata/SAS). Profiles every variable — role, type, level frequencies, range/quantiles, missingness — into codebook.md + codebook.json. Flags coded variables whose level meanings are unknown as [NEEDS DICTIONARY] rather than guessing them, feeding /define-variables and the dictionary-first workflow. |
| version-dataset | Dataset version control for reproducibility. Builds a deterministic content-hash manifest (file SHA-256 + tabular schema + per-column value hashes), verifies a later copy to detect drift (schema / row-count / value changes), and diffs two manifests. Locks "which version of the data the results came from"; also reproducibility-locks the bundled demos. |
| fill-protocol | Fill institutional Word form templates (.doc / .docx) for IRB protocols, ethics applications, grant proposals, and other structured research documents while preserving the original styles, table layouts, fonts, and page geometry. Korean-aware (CJK eastAsia font enforcement, table cantSplit) but works for any-language template. Pairs with write-protocol (content) — fill-protocol renders the content into the institutional template. |
| fill-icmje-coi | Batch-generate per-author ICMJE Conflict of Interest Disclosure Forms (coi_disclosure.docx) for manuscript submission. Pre-fills all 13 disclosure items as "☒ None" plus the final certification using a synthetic seed template, then clones the seed per author with Date / Name / Manuscript Title replaced. Designed for the common case of hospital-based observational research where no author has real financial conflicts; circulated forms become "reply no changes + sign" for most authors and only flag those who need to amend. |
| setup-medsci | Diagnostic checklist for the MedSci Skills runtime. Verifies Python, R, Node, the agent host, Git, Zotero, and configured MCP servers, then prints a pass/fail table with links to the right setup doc for any missing component. Read-only — installs nothing. |
No terminal? Use the classroom installer ZIP. Download, unzip, double-click the installer, then restart your desktop agent app.
Windows:
https://github.com/Aperivue/medsci-skills/releases/latest/download/medsci-skills-classroom-windows.zip
macOS:
https://github.com/Aperivue/medsci-skills/releases/latest/download/medsci-skills-classroom-macos.zip
After unzipping:
- Windows: double-click installers/install-windows.cmd
- macOS: double-click installers/install-macos.command
Then restart Claude Code Desktop, Codex Desktop, or Cursor and test with:
Check whether MedSci Skills is installed, and show me just five representative skills to use in today's hands-on session.
To install every skill:
git clone https://github.com/Aperivue/medsci-skills.git
cp -r medsci-skills/skills/* ~/.claude/skills/
To install a single skill (here, check-reporting):
git clone https://github.com/Aperivue/medsci-skills.git
cp -r medsci-skills/skills/check-reporting ~/.claude/skills/
The npx installer is a convenience wrapper for terminal users: it copies the same skills via the dependency-free Python installer. The canonical install paths remain the plugin marketplace (Option 1's sibling above) and the git clone above; npm is just a shortcut.
npx medsci-skills install # all hosts (Claude, Codex, Cursor)
npx medsci-skills install --target claude
npx medsci-skills list # list bundled skills
npx medsci-skills doctor # quick Node/Python/skill-folder check
Requires Node 18+ and (for install/doctor) python3 on your PATH.
- Claude Code: skills are copied to ~/.claude/skills/ (also read by GitHub Copilot and Cursor).
- Codex: skills are copied to ~/.agents/skills/ (also read by Cursor and GitHub Copilot).
- Cursor: no separate step needed. Cursor reads ~/.claude/skills/ and ~/.agents/skills/ directly. The installer can still write an optional .cursor/rules/ steering rule with --cursor-project.
- See docs/host_compatibility.md for the verified per-host install paths and their official sources.
- Windows users do not need WSL for the basic classroom workflow. Use WSL only for advanced reproducible Linux toolchains.
See docs/classroom_distribution_plan.md and docs/classroom_materials.md for instructor distribution, email templates, and first-class exercises.
Tip: Not sure which skill to use? Start with /orchestrate: it will classify your request and route you to the right tool.
orchestrate --e2e or write-paper --autonomous runs the full pipeline from data to submission-ready DOCX with bounded validation. Skills pass outputs via structured manifests (_analysis_outputs.md, _figure_manifest.md) and project artifacts (project.yaml, artifact_manifest.json, qc/status.json). If a skill fails to produce expected outputs, the pipeline halts rather than proceeding with missing data. Phase 7 enforces a strict QC chain: AI pattern removal → reporting compliance check → /verify-refs citation audit → numerical claim audit → self-review with auto-fix (max 2 iterations) → DOCX/submission build.
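The halt-on-missing-output behavior described above can be pictured with a minimal Python sketch. It is not the repository's implementation: the manifest file names come from the paragraph above, while the flat project-directory layout, the `PipelineHalt` exception, and both function names are assumptions made for illustration.

```python
from pathlib import Path

# File names are taken from the text above; placing them directly in the
# project directory is an assumption of this sketch.
EXPECTED_OUTPUTS = ["_analysis_outputs.md", "_figure_manifest.md"]


class PipelineHalt(RuntimeError):
    """Hypothetical exception: an upstream skill did not produce a required artifact."""


def missing_outputs(project_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected files that are absent or empty in project_dir."""
    root = Path(project_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def require_outputs(project_dir, expected=EXPECTED_OUTPUTS):
    """Raise instead of continuing when any expected output is missing."""
    missing = missing_outputs(project_dir, expected)
    if missing:
        raise PipelineHalt("missing upstream outputs: " + ", ".join(missing))
```

Calling `require_outputs(project_dir)` before a downstream step makes the failure explicit at the handoff rather than letting a later skill run on incomplete inputs.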
Every reference produced by search-lit is verified against PubMed, Semantic Scholar, or CrossRef APIs. Existing manuscripts should then run /verify-refs, which writes a visible reference audit and blocks fabricated references before submission. No citation is ever generated from memory alone. API errors are batched silently -- no token waste from repeated failure messages.
/meta-analysis Phase 6b, /self-review Phase 2.5a, /revise Step 2.5, and /write-paper Step 7.3a enforce a common 3-layer audit (CSV ↔ analysis script ↔ manuscript) with primary-source back-checking for pooled estimates and revision-era numbers. Hand-typed numerical matrices without CSV-coordinate comments are flagged as structural risks even when the values are currently correct, since the next revision will re-introduce the same failure mode.
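One leg of such an audit, manuscript against CSV, might look like the following Python sketch. It is an illustration under stated assumptions, not the audit the skills run: the function names, the decimals-only regular expression, and the two-decimal rounding rule are all hypothetical choices, and the analysis-script layer is not covered.

```python
import csv
import io
import re

# Assumption: only decimal numbers are audited, so years and sample counts
# in the prose are ignored.
NUMBER = re.compile(r"(?<![\w.])-?\d+\.\d+")


def csv_numbers(csv_text):
    """Collect every CSV cell that parses as a number."""
    values = set()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            try:
                values.add(float(cell))
            except ValueError:
                continue  # header or text cell
    return values


def unsupported_numbers(manuscript_text, csv_text, decimals=2):
    """Return decimals quoted in the manuscript that no CSV cell rounds to."""
    rounded = {round(value, decimals) for value in csv_numbers(csv_text)}
    flagged = []
    for token in NUMBER.findall(manuscript_text):
        if round(float(token), decimals) not in rounded:
            flagged.append(token)
    return flagged
```

Any token returned by `unsupported_numbers` is a candidate for a hand-typed or stale value and would need a manual back-check against the source table.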
Projects declare their source-of-truth layout in SSOT.yaml, and a qc/migration_complete marker gates strict enforcement. /verify-refs is the sole writer of qc/reference_audit.json. The MEDSCI_VERIFY_REFS_MODE env var (auto default, warn, enforce, off) controls behavior — auto blocks only when both SSOT.yaml and the migration marker are present, otherwise warns. Legacy projects freeze as warn-only; new projects opt in via scripts/migrate_project_to_ssot.py. An optional PostToolUse hook (not shipped in this repo — document only) can invoke /verify-refs automatically on manuscript saves for users who install it locally at ~/.claude/hooks/verify-refs-guard.sh; the regression suite (tests/test_phase1c_hooks.sh) runs end-to-end only when that local hook is present and is skipped otherwise.
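The mode resolution described above can be sketched in Python as follows. The environment variable name, its four values, and the two marker paths come from the text; the function name, the assumption that SSOT.yaml sits at the project root, the assumption that enforce blocks regardless of markers, and the error on unknown values are illustrative guesses rather than documented behavior.

```python
import os
from pathlib import Path

VALID_MODES = {"auto", "warn", "enforce", "off"}


def resolve_verify_refs_action(project_dir, environ=None):
    """Map MEDSCI_VERIFY_REFS_MODE plus project markers to 'block', 'warn', or 'off'."""
    environ = os.environ if environ is None else environ
    mode = environ.get("MEDSCI_VERIFY_REFS_MODE", "auto").strip().lower()
    if mode not in VALID_MODES:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"unknown MEDSCI_VERIFY_REFS_MODE: {mode!r}")
    if mode == "off":
        return "off"
    if mode == "warn":
        return "warn"
    if mode == "enforce":
        return "block"  # assumption: enforce blocks even without the markers
    # auto: block only when both SSOT.yaml and the migration marker exist.
    root = Path(project_dir)
    migrated = (root / "SSOT.yaml").is_file() and (root / "qc" / "migration_complete").exists()
    return "block" if migrated else "warn"
```

Under this reading, a legacy project with no SSOT.yaml resolves to warn in auto mode, which matches the warn-only freeze described above.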
/meta-analysis ships empirical failure-mode references (data integrity, review orchestration, submission package drift, post-submission release ops) with four automation hooks: scripts/prisma_5way_consistency.py (DI-6 PRISMA number consistency), scripts/extraction_consensus_log_init.py (DI-1 dual-extraction scaffold), scripts/tag_cleanup_gate.sh (DI-8 placeholder tag gate), and scripts/verify_package_integrity.py (SPD SHA-256 manifest for submission bundles).
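To illustrate the idea behind a SHA-256 manifest for a submission bundle, here is a minimal Python sketch. It does not reproduce scripts/verify_package_integrity.py; the function names and the manifest shape (a mapping from relative path to hex digest) are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir):
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    root = Path(bundle_dir)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifest(recorded, current):
    """Report files added, removed, or changed since the manifest was recorded."""
    shared = recorded.keys() & current.keys()
    return {
        "added": sorted(set(current) - set(recorded)),
        "removed": sorted(set(recorded) - set(current)),
        "changed": sorted(name for name in shared if recorded[name] != current[name]),
    }
```

Recording `build_manifest(bundle_dir)` at submission time and running `diff_manifest` against a fresh manifest later shows whether any file in the bundle has drifted.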
check-reporting includes bundled checklists for 32 guidelines and risk-of-bias tools: STROBE, STARD, STARD-AI, TRIPOD, TRIPOD+AI, PRISMA 2020, PRISMA-DTA, PRISMA-P, MOOSE, ARRIVE, CONSORT, CARE, SPIRIT, CLAIM, SQUIRE 2.0, CLEAR, GRRAS, MI-CLEAR-LLM, SWiM, AMSTAR 2, QUADAS-2, QUADAS-C, RoB 2, ROBINS-I, ROBINS-E, ROBIS, ROB-ME, PROBAST, PROBAST+AI, NOS, COSMIN, RoB NMA. Includes Results/Discussion section boundary checks and machine-readable JSON summary for pipeline integration.
analyze-stats generates reproducible Python/R code for 13 analysis types -- including regression, propensity score, and repeated measures -- with mandatory calibration for prediction models. make-figures produces journal-specification figures (300 DPI, colorblind-safe palettes, proper dimensions), visual/graphical abstracts, and a tool selection guide (D2 for flow diagrams, matplotlib for data plots). --study-type auto-generates the complete figure set for each study design.
write-paper enforces strict separation: Results contain only factual findings (no interpretation, no "why"), Discussion uses interactive anchor-paper scaffolding. The critic rubric includes a dedicated Section Boundaries pass/fail gate.
design-study -> calc-sample-size -> write-protocol gives you an IRB-ready protocol. After data collection: clean-data -> analyze-stats -> write-paper -> self-review -> find-journal -> cover letter. Every transition is a defined skill handoff.
Skills call each other. check-reporting invokes make-figures for PRISMA diagrams. write-paper calls search-lit for citation verification. self-review delegates reporting compliance to check-reporting. calc-sample-size output feeds directly into write-protocol's IRB justification section.
New to Python, R, or the command line? The full step-by-step guide for clinicians is in docs/setup/:
- Mac setup — Homebrew → Python 3.11 → R → Node → Claude Code (~30 min)
- Windows setup — winget-based, no WSL required
- MCP server setup — Zotero, Google Drive, PubMed integration
- Common issues — top 10 fixes (PATH, Apple Silicon, antivirus, JSON syntax)
Verify your environment with the diagnostic skill (read-only, installs nothing):
/setup-medsci
Prints a checklist showing which components are present, which are missing, and which doc to follow for any gap.
- An Agent Skills-compatible host: Claude Code (primary), or Codex / Cursor / GitHub Copilot (see docs/host_compatibility.md; some live-data workflows rely on Claude MCP servers)
- Python 3.9+ (for statistical analysis and figure generation)
- R 4.0+ with meta (>=7.0), metafor (>=4.0), mada (>=0.5.11) packages (for meta-analysis)
"I have data and want a complete manuscript with zero manual steps."
/orchestrate --e2e # Autonomous: analyze → figures → write → QC → DOCX
Or equivalently: /write-paper --autonomous if analysis and figures already exist.
"I have a diagnostic accuracy study draft and need to check compliance."
/design-study # Review study design for leakage and bias
/analyze-stats # Generate DTA statistics (sensitivity, specificity, AUC with CIs)
/make-figures # Create ROC curve + STARD flow diagram
/check-reporting # Audit against STARD checklist
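For readers unfamiliar with the DTA statistics named in the /analyze-stats step above, the following Python sketch computes sensitivity and specificity with 95% confidence intervals from a 2×2 table. It uses the Wilson score interval, which is a choice made for this example and not necessarily the method /analyze-stats generates; AUC is omitted, and the function names are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def dta_summary(tp, fp, fn, tn):
    """Return (estimate, (ci_low, ci_high)) for sensitivity and specificity."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }
```

For example, `dta_summary(tp=90, fp=5, fn=10, tn=95)` returns a sensitivity estimate of 0.9 and a specificity estimate of 0.95, each paired with its interval.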
"I'm starting a meta-analysis and need to find relevant studies."
/search-lit # Systematic search across PubMed + Semantic Scholar
/fulltext-retrieval # Batch download open-access PDFs for included studies
/meta-analysis # Full DTA or intervention MA pipeline
/make-figures # Forest plot + PRISMA flow diagram
/check-reporting # Audit against PRISMA-DTA checklist
"I need to present a paper at journal club."
/present-paper # Analyze paper, find supporting refs, draft speaker script
"I need to submit an IRB protocol for a new study."
/search-lit # Background literature for rationale
/design-study # Validate study design, identify bias risks
/calc-sample-size # Power analysis with IRB justification text
/write-protocol # Generate 4 core sections + 6 skeleton sections
"I have an interesting case to publish."
/write-paper # Case report mode (CARE 2016, 1000-word short-form)
/self-review # Pre-submission self-check
/find-journal # Which journal accepts case reports in this field?
"My paper was rejected. Where else should I submit?"
/find-journal # Exclude rejected journal, recommend alternatives
/write-paper # Generate new cover letter (Phase 8+)
"I have messy clinical data that needs cleaning before analysis."
/deidentify # Remove PHI from clinical data (standalone Python, no LLM)
/clean-data # Profile dataset, flag issues, generate cleaning code
/analyze-stats # Run statistics on cleaned data
/make-figures # Publication-ready figures
"I want to write a grant proposal for a radiology AI project."
/design-study # Validate study design before writing
/grant-builder # Structure significance, innovation, approach
/search-lit # Find supporting literature with verified citations
Adoption is tracked openly in IMPACT.md (stars, forks, traffic, and release downloads, snapshotted weekly into metrics/traffic_log.csv), and academic use is logged in docs/citations.md.
Used MedSci Skills in your research? Please let us know. It helps other researchers find the toolkit — and we list it (with your permission).
These skills are research productivity tools. They do not provide clinical decision support, medical advice, or diagnostic recommendations. All outputs should be reviewed by qualified researchers before use in any publication or clinical context.
- make-figures Critic Loop is inspired by PaperBanana (Zhu et al., Automating Academic Illustration for AI Scientists, arXiv:2601.23265, 2025) and by prior self-refinement research: Self-Refine (Madaan et al., 2023), Reflexion (Shinn et al., 2023), and Constitutional AI (Anthropic, 2022). The implementation in this repository is a clean-room reconstruction specialized for medical publication figures; no code, prompts, or configurations are derived from PaperBanana's repository.
- Reporting-guideline checklists bundled with check-reporting are redistributed under their original Creative Commons licenses (see each checklist for attribution).
- Wong colorblind-safe palette: Wong B. Points of view: Color blindness. Nature Methods 8:441 (2011).
MIT License. See LICENSE for details.
Bundled reporting guideline checklists retain their original Creative Commons licenses. See each checklist file for attribution.
Optional dependency: pdf_to_md.py uses pymupdf4llm (AGPL-3.0). Not bundled -- installed separately by the user via pip install pymupdf4llm.
Built by Aperivue -- tools for medical AI research and education.
If you find this useful, consider giving it a star. It helps other researchers discover these tools.