✨🔮 MTG Vision 🧙🪄

Real-time and Real-world MTG card recognition using only synthetic data

Contributions are welcome!


mtgvision-short-demo.mp4

Components

Data scraping (doorway + mtgdata)
- mtg card image & info scraping
  - dump images to hdf5
  - dump info to duckdb/dolt?
Live view
- UI
- mtg card detection
  - detection tracking
  - camera tracking for detection overlay stabilisation
- mtg card recognition
  - Vector database (Qdrant)
Model training
- mtg card detection/segmentation
  - augmented data generation
  - model definition
  - model training
- mtg card embedding model (recognition)
  - augmented data generation
  - model definition
  - model training
- combined model (rf-detr if it ever supports segmentation + custom embedding output extensions? Unified model?)
  - model definition
  - model training

Overview

Obtaining Data

All data is collected from Scryfall using the mtgdata Python package, with doorway handling the image downloads.

Card Orientation

Bounding Box Requirements

The options, from worst to best:
- Worst: standard bounding boxes that cover only the visible region of a card.
- Slightly better: standard bounding boxes that predict the extent of the entire card, including occluded parts.
- Better still: minimal oriented bounding boxes (e.g. ultralytics OBB, which doesn't do true 360deg orientation).
- Best: truly oriented polygons rather than bounding boxes, so that perspective can be handled too.

Orientation approaches that don't work

Hack for oriented bounding boxes

- Use segmentation, but with a cutout at the bottom of the card mask so that the card orientation can be recovered.
- Dilate then erode (close) the segmentation polygon to remove the cutout.
- Simplify the closed polygon to its four corners using cv2.approxPolyN with N=4.
- Take the center of mass of the cutout polygon and of the closed polygon, then draw a line through them and extend it to the simplified polygon. The edge it hits is the top of the card, which gives the orientation.
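
The last step can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names (`centroid`, `ray_hits_segment`, `order_corners_from_top`) are hypothetical, and it assumes the four corners, the closed polygon and the cutout polygon have already been extracted as lists of `(x, y)` points.

```python
Point = tuple[float, float]


def centroid(poly: list[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def ray_hits_segment(origin: Point, direction: Point, a: Point, b: Point) -> bool:
    """True if the ray origin + t * direction (t > 0) crosses segment a-b."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = direction[0] * ey - direction[1] * ex
    if abs(denom) < 1e-12:  # ray is parallel to the segment
        return False
    wx, wy = a[0] - origin[0], a[1] - origin[1]
    t = (wx * ey - wy * ex) / denom
    s = (wx * direction[1] - wy * direction[0]) / denom
    return t > 0.0 and 0.0 <= s <= 1.0


def order_corners_from_top(
    corners: list[Point], closed_poly: list[Point], cutout_poly: list[Point]
) -> list[Point]:
    """Rotate `corners` so that corners[0] -> corners[1] is the top edge."""
    card_c = centroid(closed_poly)
    cut_c = centroid(cutout_poly)
    # The cutout sits at the bottom, so cutout -> card centre points "up".
    up = (card_c[0] - cut_c[0], card_c[1] - cut_c[1])
    n = len(corners)
    for i in range(n):
        if ray_hits_segment(card_c, up, corners[i], corners[(i + 1) % n]):
            return corners[i:] + corners[:i]
    raise ValueError("ray did not hit any edge of the polygon")
```

Because the result is a rotation of the original corner list, the winding order is preserved and the four points can be used directly as the source quadrilateral for a perspective warp.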

Data - Embedding Model Inputs & Batch Formation

Generating random input images from Scryfall images

These images are used as inputs to train the embedding model (ConvNeXt V2).

Form batch of images for training

We use CircleLoss from pytorch-metric-learning, which requires a batch of images containing both positive and negative pairs. Positive pairs are images of the same card and negative pairs are images of different cards.
The batch is formed by randomly sampling card images from the dataset and augmenting each one twice with random transformations; the two augmented views of a card form a positive pair. The negative pairs are simply all other images in the batch.
We also need hard-negative mining, so we randomly swap some cards in the batch for other cards with the same name. This forces the model to learn to distinguish minor differences such as the set code.
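
A minimal sketch of this batch formation, using only the standard library. It is an illustration under assumptions rather than the project's sampler: `form_batch`, `swap_prob`, the card dict keys (`id`, `name`, `image`) and the `augment` callback are all hypothetical names, and the swap rule shown is just one possible reading of "swap out some cards with cards of the same name".

```python
import random
from collections import defaultdict


def form_batch(cards, num_cards, augment, swap_prob=0.25, rng=None):
    """Return (views, labels) with two augmented views per chosen card.

    cards:   list of dicts with hypothetical keys "id", "name", "image"
    augment: callable (image, rng) -> augmented image
    """
    rng = rng or random.Random()
    by_name = defaultdict(list)
    for card in cards:
        by_name[card["name"]].append(card)

    chosen = rng.sample(cards, num_cards)

    # Hard negatives: replace some slots with a *different* card that shares
    # its name with a card earlier in the batch.
    for i in range(1, num_cards):
        if rng.random() >= swap_prob:
            continue
        anchor = chosen[rng.randrange(i)]
        used = {c["id"] for c in chosen}
        siblings = [c for c in by_name[anchor["name"]] if c["id"] not in used]
        if siblings:
            chosen[i] = rng.choice(siblings)

    # Two views per card share a label (positive pair); every other label in
    # the batch acts as a negative.
    views, labels = [], []
    for label, card in enumerate(chosen):
        for _ in range(2):
            views.append(augment(card["image"], rng))
            labels.append(label)
    return views, labels
```

The `labels` list is what a label-based metric-learning loss consumes: in pytorch-metric-learning, loss functions are called as `loss_func(embeddings, labels)`, so embeddings with equal labels are treated as positives and all others as negatives.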

Data - Detection Model Inputs

The detection dataset is formed in much the same way as the embedding model inputs. The difference is that the bounding boxes must be augmented and transformed together with the cards as they are warped onto the images.
Cards are placed and warped randomly onto random background images using rejection sampling. If cards overlap too much, then we retry for another placement.
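
The rejection-sampling loop can be sketched as follows. This is a simplified illustration, not the project's generator: it assumes axis-aligned rectangles instead of warped polygons, measures overlap with IoU, and the names `iou`, `place_cards`, `max_iou` and `max_tries` are hypothetical.

```python
import random

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def place_cards(n, bg_w, bg_h, card_w, card_h, max_iou=0.1, max_tries=50, rng=None):
    """Place up to n cards on a background, retrying placements that overlap too much."""
    rng = rng or random.Random()
    placed: list[Box] = []
    for _ in range(n):
        for _ in range(max_tries):
            x = rng.uniform(0.0, bg_w - card_w)
            y = rng.uniform(0.0, bg_h - card_h)
            candidate = (x, y, x + card_w, y + card_h)
            # Accept only if the candidate overlaps every placed card by at most max_iou.
            if all(iou(candidate, p) <= max_iou for p in placed):
                placed.append(candidate)
                break
        # If every try was rejected, this card is skipped.
    return placed
```

Capping the number of tries keeps generation time bounded on crowded backgrounds, at the cost of occasionally producing images with fewer cards than requested.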

Matching - Qdrant

All cards are embedded and stored in Qdrant.
When a card is detected, we compute its embedding and search Qdrant for the closest match. The card information is stored as a payload alongside each vector, so it is retrieved together with the match.
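
To make the lookup concrete, here is a tiny in-memory stand-in for the vector search. It is not Qdrant and does not use the Qdrant client; `TinyVectorIndex`, its methods and the payload keys are hypothetical, and cosine similarity is assumed as the distance metric purely for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorIndex:
    """Brute-force stand-in for a vector database with payloads."""

    def __init__(self):
        self._points = []  # list of (vector, payload)

    def upsert(self, vector, payload):
        self._points.append((list(vector), payload))

    def search(self, query, limit=1):
        """Return the `limit` best (score, payload) pairs, best first."""
        scored = [(cosine(query, vec), payload) for vec, payload in self._points]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:limit]


index = TinyVectorIndex()
index.upsert([1.0, 0.0, 0.0], {"name": "card A", "set": "aaa"})
index.upsert([0.0, 1.0, 0.0], {"name": "card B", "set": "bbb"})
index.upsert([0.0, 0.0, 1.0], {"name": "card C", "set": "ccc"})

# Embedding of a detected card (made-up numbers).
score, payload = index.search([0.1, 0.9, 0.2], limit=1)[0]
```

A real vector database replaces the brute-force scan with an approximate nearest-neighbour index, which is what keeps the search fast with every card stored.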

Example embedding visualisation - Colored by card border_color

Demo

The demo application consists of:

- A FastAPI websocket server that receives images, runs detection, tracks detections with norfair, and finally embeds each detected card and searches Qdrant for the closest match.
- A websocket client written in TypeScript using lit html that sends webcam images over websockets and displays the results and matched card information in an SVG overlay on top of the video.

Run the demo

start qdrant (quickstart)

docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant

install deps

# install python deps
conda create -n mtg-vision python=3.12
conda activate mtg-vision  # so that pip installs into the new environment
pip install -e ./

# install node 22 with nvm, and install pnpm, and install deps
nvm install 22
nvm use 22
npm install -g pnpm
cd "mtg-vision/www"
pnpm install

prepare

# populate qdrant vectors (needs to be run first)
python -m mtgvision.qdrant_populate
# populate qdrant payloads, run after populating vectors
python -m mtgvision.qdrant_populate_card_info

# other
# $ python -m mtgvision.encoder_train
# $ python -m mtgvision.encoder_export
# $ python -m mtgvision.od_datasets  # generate dataset for yolo
# $ python -m mtgvision.od_train
# $ python -m mtgvision.od_export

start the server

cd "mtg-vision"
fastapi dev mtgvision/server.py --host 0.0.0.0

start the client
```
cd ./www
pnpm install
pnpm start
```

expose for mobile device testing (which requires HTTPS)

install cloudflared / cloudflare tunnel here
```
cloudflared tunnel --url http://localhost:8000
```

Model File Downloads

download here from Google Drive
