Methodology

How AI Jupyter builds weighted AI model rankings

Purpose

Turn several public signals into one task-specific shortlist.

Missing data

A model missing from a source lowers the ranking's confidence instead of being scored as zero.
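A minimal Python sketch of the difference. It assumes three hypothetical source scores already normalized to 0 to 100, equal weights, and one source that does not list the model. The source keys and numbers are illustrative, not AI Jupyter's data. Skipping the missing source keeps the score tied to the evidence that exists and reports lower coverage; zero-filling treats absence as a failing result.

```python
from __future__ import annotations


def combine_skipping_missing(
    scores: dict[str, float | None], weights: dict[str, float]
) -> tuple[float, float]:
    """Weighted average over sources that list the model, plus the share of weight they cover."""
    present = {src: s for src, s in scores.items() if s is not None}
    covered = sum(weights[src] for src in present)
    total = sum(weights[src] for src in scores)
    score = sum(weights[src] * s for src, s in present.items()) / covered if covered else 0.0
    return score, covered / total


def combine_zero_filled(scores: dict[str, float | None], weights: dict[str, float]) -> float:
    """The approach the methodology avoids: treat a missing row as a score of 0."""
    total = sum(weights[src] for src in scores)
    return sum(weights[src] * (0.0 if s is None else s) for src, s in scores.items()) / total


# Hypothetical normalized (0-100) scores for one model; None marks a source that does not list it.
scores = {"code_arena": 90.0, "swe_bench": None, "vibe_code_bench": 80.0}
weights = {"code_arena": 1.0, "swe_bench": 1.0, "vibe_code_bench": 1.0}

score, coverage = combine_skipping_missing(scores, weights)  # 85.0, coverage 2/3
zero_filled = combine_zero_filled(scores, weights)           # about 56.7
```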

Update marker

Each core page shows when its ranking or local model data was last refreshed.

Scoring process

  1. Choose a search intent, such as "best AI for coding" or "best AI for image generation."
  2. Select public sources that are relevant to that task rather than relying on one generic chart.
  3. Normalize each source score onto a 0 to 100 scale for that category snapshot.
  4. Apply source weights based on task fit, recency, measurement quality, and coverage.
  5. Apply a small coverage adjustment when a model is missing from a source.
  6. Publish the result as an editorial snapshot with links to the underlying sources.
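Steps 3 to 5 can be sketched in Python. This is a minimal illustration, not AI Jupyter's actual code. It assumes min-max normalization for step 3, a weighted average over only the sources that list a model for step 4, and, for step 5, a coverage adjustment that subtracts at most 5 points in proportion to the share of weight a model is missing. The source names, model names, raw scores, weights, and the `max_penalty` value are all hypothetical.

```python
from __future__ import annotations


def normalize(source_scores: dict[str, float]) -> dict[str, float]:
    """Min-max rescale one source's raw scores onto 0-100 for this snapshot."""
    lo, hi = min(source_scores.values()), max(source_scores.values())
    if hi == lo:  # every model tied; avoid dividing by zero
        return {model: 100.0 for model in source_scores}
    return {model: 100.0 * (s - lo) / (hi - lo) for model, s in source_scores.items()}


def rank(
    raw: dict[str, dict[str, float]],  # source -> {model -> raw score}
    weights: dict[str, float],         # source -> positive weight (assumed > 0)
    max_penalty: float = 5.0,          # hypothetical size of the coverage adjustment
) -> list[tuple[str, float, float]]:
    """Return (model, score, coverage) tuples sorted from best to worst."""
    normalized = {src: normalize(scores) for src, scores in raw.items()}
    total_weight = sum(weights[src] for src in raw)
    models = sorted({model for scores in raw.values() for model in scores})
    results = []
    for model in models:
        # Every model comes from some source, so `present` is never empty.
        present = [src for src in raw if model in normalized[src]]
        covered = sum(weights[src] for src in present)
        # Weighted average over sources that list the model; absent sources are skipped, not zeroed.
        base = sum(weights[src] * normalized[src][model] for src in present) / covered
        coverage = covered / total_weight
        # Coverage adjustment: subtract up to max_penalty points in proportion to missing weight.
        score = max(0.0, base - max_penalty * (1.0 - coverage))
        results.append((model, round(score, 1), round(coverage, 2)))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


raw = {
    "source_a": {"model_x": 1300, "model_y": 1250, "model_z": 1280},  # e.g. arena-style ratings
    "source_b": {"model_x": 60.0, "model_y": 70.0},                   # e.g. percent of tasks solved
}
weights = {"source_a": 0.6, "source_b": 0.4}
ranking = rank(raw, weights)
```

With these made-up inputs, `model_z` does not appear in `source_b`, so it keeps its normalized `source_a` score (60.0) minus a 2-point adjustment for the 40% of weight it lacks, and it still ranks between `model_x` and `model_y`. Min-max scaling is only one option; a percentile or z-score transform would also put sources on a shared scale.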

This method is designed for useful comparison, not scientific certainty. Real-world results can differ because prompts, safety settings, reasoning effort, latency, price, context length, and tool access all affect how useful a model is in practice.

AI Jupyter adds an editorial layer on top of public data. Each ranking shows source relevance, category fit, coverage confidence, and decision notes, which makes it more useful than a copied leaderboard table.

Source categories

Coding

  • Arena.ai Code Arena
  • Vals AI SWE-bench
  • Vals AI Vibe Code Bench
  • Vellum SWE Bench rankings
  • Artificial Analysis Intelligence Index

Writing and essays

  • Arena.ai Creative Writing
  • Surge AI Hemingway-bench
  • EQ-Bench Creative Writing
  • EQ-Bench Longform Writing
  • Arena.ai Text Overall

Math

  • Vals AI ProofBench
  • Surge AI Riemann-bench
  • Vellum AIME rankings
  • Artificial Analysis AIME 2025
  • Artificial Analysis Intelligence Index

Image generation

  • Arena.ai Text-to-Image Arena
  • Artificial Analysis Text-to-Image leaderboard

Review and disclosure

Ranking pages should be read as curated comparison pages. They include source notes, confidence signals, update dates, and links to related pages so readers can audit the reasoning. For correction standards, read the Editorial Policy. For monetization disclosure, read the Advertising Disclosure.