nmichlo/mtg-vision

✨🔮 MTG Vision 🧙🪄

Real-time and Real-world MTG card recognition using only synthetic data

Contributions are welcome!


mtgvision-short-demo.mp4

Components

  1. Data scraping (doorway + mtgdata)

    • mtg card image & info scraping
      • dump images to hdf5
      • dump info to duckdb/dolt?
  2. live view

    • UI
    • mtg card detection
      • detection tracking
      • camera tracking for detection overlay stabilisation
    • mtg card recognition
      • Vector database (Qdrant)
  3. model training

    • mtg card detection/segmentation
      • augmented data generation
      • model definition
      • model training
    • mtg card embedding model (recognition)
      • augmented data generation
      • model definition
      • model training
    • combined model (rf-detr if it ever supports segmentation + custom embedding output extensions? Unified model?)
      • model definition
      • model training

Overview

Obtaining Data

All data is collected from Scryfall using the mtgdata python package as well as doorway for image downloads.

Card Orientation

Bounding Box Requirements

  1. Worst: standard bounding boxes that cover only the visible region of a card
  2. Slightly better: standard bounding boxes that predict the extent of the entire card
  3. Better still: minimal oriented bounding boxes (e.g. ultralytics OBB, which does not give true 360° orientation)
  4. Best: truly oriented POLYGONS rather than bounding boxes, so that perspective can be handled too

Orientation approaches that don't work

Hack for oriented bounding boxes

  1. Use segmentation, but with a cutout at the bottom of the card mask so that the card orientation can be recovered
  2. Dilate then erode (close) the segmentation polygon to remove the cutout
  3. Simplify the closed polygon to its four corners using cv2.approxPolyN with N=4
  4. Take the centers of mass of the cutout polygon and of the closed polygon, draw a line through them, and extend it until it meets the simplified polygon. The edge it meets on the far side from the cutout is the top of the card, which fixes the orientation.
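
Step 4 can be sketched in plain Python. This is only an illustration, not the code used in this repo: the function names are made up, the four-corner polygon's own centroid stands in for the centroid of the closed polygon, and instead of intersecting a line with the polygon it picks the edge whose midpoint lies furthest along the "up" direction (from the cutout centroid through the card centroid).

```python
import math


def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as a list of (x, y)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def order_corners_from_top(quad, cutout):
    """Rotate the corner list so that quad[0] -> quad[1] is the top edge.

    quad:   the 4 corners from the simplification step (hypothetical input)
    cutout: polygon of the notch at the bottom of the card mask
    Returns the reordered corners and the "up" angle in degrees.
    """
    qx, qy = polygon_centroid(quad)
    cx, cy = polygon_centroid(cutout)
    # "Up" points from the cutout (bottom of the card) through the card centre.
    ux, uy = qx - cx, qy - cy

    def score(i):
        # Projection of the edge midpoint onto the up direction.
        (x0, y0), (x1, y1) = quad[i], quad[(i + 1) % 4]
        return ((x0 + x1) / 2 - qx) * ux + ((y0 + y1) / 2 - qy) * uy

    top = max(range(4), key=score)
    return quad[top:] + quad[:top], math.degrees(math.atan2(uy, ux))
```

With the corners in a known order starting at the top edge, a perspective transform can map the card to an upright rectangle before embedding.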

Data - Embedding Model Inputs & Batch Formation

Generating random input images from scryfall images

  • These images are used as inputs to train the embedding model (ConvNeXt V2).

Form batch of images for training

  • We use CircleLoss from pytorch-metric-learning which requires a batch of images with positive and negative pairs. The positive pairs are images of the same card and the negative pairs are images of different cards.
  • The batch is formed by randomly sampling images from the dataset and then augmenting them with random transformations, TWICE, to get the positive pairs. The negative pairs are just all other images in the batch.
  • We also do hard-negative mining: we randomly swap some cards in the batch for other cards with the same name, so that the model learns to distinguish minor differences such as the set code.
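
A minimal sketch of this batch formation, under stated assumptions: the card records, the `id` and `name` keys, the `augment` callback and `swap_prob` are all hypothetical, and the swap rule (sometimes replace the next card in the batch with another printing of the current one) is just one possible reading of the hard-negative step described above.

```python
import random


def sample_batch_cards(cards, num_cards, swap_prob=0.25, rng=random):
    """Pick distinct cards, sometimes forcing same-name printings into the batch."""
    by_name = {}
    for card in cards:
        by_name.setdefault(card["name"], []).append(card)

    chosen = rng.sample(cards, num_cards)
    for i in range(0, num_cards - 1, 2):
        used = {c["id"] for c in chosen}
        # Other printings of chosen[i] that are not already in the batch.
        alts = [c for c in by_name[chosen[i]["name"]] if c["id"] not in used]
        if alts and rng.random() < swap_prob:
            chosen[i + 1] = rng.choice(alts)  # hard negative for chosen[i]
    return chosen


def form_batch(cards, num_cards, augment, swap_prob=0.25, rng=random):
    """Return (views, labels) with two augmented views per chosen card."""
    chosen = sample_batch_cards(cards, num_cards, swap_prob, rng)
    views, labels = [], []
    for label, card in enumerate(chosen):
        for _ in range(2):  # augment TWICE: the two views form a positive pair
            views.append(augment(card, rng))
            labels.append(label)
    return views, labels
```

Views that share a label are positives and every other view in the batch is a negative, which is the `(embeddings, labels)` form that pytorch-metric-learning losses such as CircleLoss take.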

Data - Detection Model Inputs

  • The detection dataset is formed in much the same way as the embedding model inputs. However, the bounding boxes must also be augmented and transformed together with the cards as they are warped onto the images.
  • Cards are placed and warped randomly onto random background images using rejection sampling. If cards overlap too much, then we retry for another placement.
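
The rejection sampling loop can be sketched as follows. This is a simplified illustration rather than the repo's implementation: it uses axis-aligned boxes instead of warped card polygons, and `max_overlap`, `max_tries` and the overlap measure (intersection area over the smaller box's area) are assumptions.

```python
import random


def overlap_fraction(a, b):
    """Intersection area divided by the smaller box's area; boxes are (x0, y0, x1, y1)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return iw * ih / min(area_a, area_b)


def place_cards(bg_w, bg_h, sizes, max_overlap=0.2, max_tries=50, rng=random):
    """Place one box per (w, h) in `sizes` on a bg_w x bg_h background."""
    placed = []
    for w, h in sizes:
        for _ in range(max_tries):
            x0 = rng.uniform(0, bg_w - w)
            y0 = rng.uniform(0, bg_h - h)
            box = (x0, y0, x0 + w, y0 + h)
            # Accept only if the candidate does not overlap any placed card too much.
            if all(overlap_fraction(box, other) <= max_overlap for other in placed):
                placed.append(box)
                break
        # If every try is rejected, this card is simply left out of the image.
    return placed
```

The accepted placement is what gets used both to paste the card pixels and to produce the label, so image and annotation stay in sync.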

Matching - Qdrant

  1. All cards are embedded and stored in Qdrant
  2. When a card is detected, we get the embedding and search Qdrant for the closest match and retrieve the card information which is stored as a payload alongside the vector.
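
Conceptually, the lookup in step 2 is a nearest-neighbour search over stored vectors that returns the payload of the best match. The brute-force sketch below stands in for Qdrant purely to illustrate the idea. It does not use the Qdrant client, the cosine metric is an assumption, and the payload keys are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_card(query, points):
    """Return (payload, score) of the stored point most similar to `query`.

    points: list of (vector, payload) tuples, mirroring how card info is
    stored as a payload alongside each vector.
    """
    vector, payload = max(points, key=lambda p: cosine_similarity(query, p[0]))
    return payload, cosine_similarity(query, vector)


# Hypothetical stored points: a tiny 3-d "embedding" per card printing.
points = [
    ([1.0, 0.0, 0.0], {"name": "Lightning Bolt", "set": "lea"}),
    ([0.0, 1.0, 0.0], {"name": "Counterspell", "set": "lea"}),
    ([0.0, 0.0, 1.0], {"name": "Forest", "set": "m10"}),
]
payload, score = closest_card([0.1, 0.9, 0.0], points)
```

A vector database replaces the linear scan with an index, which is what keeps the lookup fast enough for real-time use once every card printing is stored.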

Example embedding visualisation - Colored by card border_color

Demo

The demo application consists of

  • A FastAPI websocket server that receives images, runs detection, tracks the detections with norfair, then embeds each card and searches Qdrant for the closest match.
  • A websocket client written in TypeScript using lit html. It sends images from a webcam over websockets and displays the results and matched card information in an SVG overlay on top of the video.

Run the demo

  1. start qdrant (quickstart)

    docker run -p 6333:6333 -p 6334:6334 \
        -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
        qdrant/qdrant
  2. install deps

    # install python deps
    conda create -n mtg-vision python=3.12
    conda activate mtg-vision
    pip install -e ./
    
    # install node 22 with nvm, and install pnpm, and install deps
    nvm install 22
    nvm use 22
    npm install -g pnpm
    cd "mtg-vision/www"
    pnpm install
  3. prepare

    # populate qdrant vectors (needs to be run first)
    python -m mtgvision.qdrant_populate
    # populate qdrant payloads, run after populating vectors
    python -m mtgvision.qdrant_populate_card_info
    
    # other
    # $ python -m mtgvision.encoder_train
    # $ python -m mtgvision.encoder_export
    # $ python -m mtgvision.od_datasets  # generate dataset for yolo
    # $ python -m mtgvision.od_train
    # $ python -m mtgvision.od_export
  4. start the server

    cd "mtg-vision"
    fastapi dev mtgvision/server.py --host 0.0.0.0
  5. start the client

    cd ./www
    pnpm install
    pnpm start
  6. expose for mobile device testing (which requires https)

    install cloudflared (Cloudflare Tunnel), then run:

    cloudflared tunnel --url http://localhost:8000

Model File Downloads


About

๐Ÿง™โ€โ™‚๏ธโœจ Magic the Gathering real time card detection and recognition ๐Ÿ‘๏ธ๐Ÿ”ฎ Talk given at pydata Johannesburg 2025-05-14 ๐Ÿ๐Ÿ‡ฟ๐Ÿ‡ฆ Built with mtgdata and doorway
