Real-time and Real-world MTG card recognition using only synthetic data
Contributions are welcome!
mtgvision-short-demo.mp4
- Data scraping (doorway + mtgdata)
  - mtg card image & info scraping
  - dump images to hdf5
  - dump info to duckdb/dolt?
- live view
  - UI
  - mtg card detection
  - detection tracking
  - camera tracking for detection overlay stabilisation
  - mtg card recognition
  - Vector database (Qdrant)
- model training
  - mtg card detection/segmentation
    - augmented data generation
    - model definition
    - model training
  - mtg card embedding model (recognition)
    - augmented data generation
    - model definition
    - model training
  - combined model (rf-detr if it ever supports segmentation + custom embedding output extensions? Unified model?)
    - model definition
    - model training
All data is collected from Scryfall using the mtgdata Python package, with doorway handling the image downloads.
Bounding Box Requirements
- the worst is standard bounding boxes showing only visible regions
- slightly better is standard bounding boxes predicting the entire card
- even better is minimal oriented bounding boxes (e.g. ultralytics OBB, doesn't do true 360deg orientation)
- best is truly oriented POLYGONS not bounding boxes so we can handle perspective too.
Orientation approaches that don't work
Hack for oriented bounding boxes
- Use segmentation but with a cutout at the bottom so we can get the card orientation
- Dilate then erode (close) the segmentation polygon to remove the cutout
- Simplify the polygon to get corners using cv2.approxPolyN with N=4
- Take the center of mass of the cutout polygon and the closed polygon and draw a line through them, extending it to the simplified polygon to find the top edge and the orientation.
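The last step of the hack can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it assumes the four corners (from cv2.approxPolyN) and the two centroids have already been computed upstream, and the function names `find_top_edge` and `orient_corners` are hypothetical. It casts a ray from the card centroid away from the cutout centroid and reports which edge of the quadrilateral it hits.

```python
import math


def _cross(ax, ay, bx, by):
    """2D cross product (z component)."""
    return ax * by - ay * bx


def find_top_edge(corners, card_centroid, cutout_centroid):
    """Return i such that corners[i] -> corners[i + 1] is the top edge.

    Assumes the cutout sits at the bottom of the card, so the direction
    from the cutout centroid to the card centroid points towards the top.
    """
    ox, oy = card_centroid
    dx, dy = ox - cutout_centroid[0], oy - cutout_centroid[1]
    n = len(corners)
    for i in range(n):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        denom = _cross(dx, dy, ex, ey)
        if abs(denom) < 1e-12:
            continue  # ray is parallel to this edge
        # Solve origin + t * d == a + u * e for t (along ray) and u (along edge).
        t = _cross(ax - ox, ay - oy, ex, ey) / denom
        u = _cross(ax - ox, ay - oy, dx, dy) / denom
        if t > 0 and 0 <= u <= 1:
            return i
    raise ValueError("ray does not hit the quadrilateral")


def orient_corners(corners, card_centroid, cutout_centroid):
    """Rotate the corner list so the top edge comes first; also return its angle."""
    corners = list(corners)
    i = find_top_edge(corners, card_centroid, cutout_centroid)
    ordered = corners[i:] + corners[:i]
    (x0, y0), (x1, y1) = ordered[0], ordered[1]
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    return ordered, angle
```

Because the corner order is only rotated, never reversed, the winding direction produced by the polygon simplification is preserved.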
Generating random input images from Scryfall images
- these are used as inputs to train the embedding model
ConvNeXt V2
Form batch of images for training
- We use CircleLoss from pytorch-metric-learning, which requires a batch of images with positive and negative pairs. The positive pairs are images of the same card and the negative pairs are images of different cards.
- The batch is formed by randomly sampling images from the dataset and then augmenting them with random transformations, TWICE, to get the positive pairs. The negative pairs are just all other images in the batch.
- We also need to do hard-negative mining so we randomly swap out some cards with cards of the same name, so that the model learns to distinguish minor differences like set code.
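The batch formation described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: cards are represented as dicts with hypothetical `id` and `name` keys, `augment` is a placeholder for the random transformation pipeline, and the swap rule (replace a slot with another printing of the previous slot's name) is just one possible way to realise the hard-negative swap.

```python
import random


def form_batch(cards, batch_size, augment, swap_prob=0.25, rng=None):
    """Return (views, labels) with two augmented views per sampled card."""
    rng = rng or random.Random()

    # Index printings by card name so same-name cards can be looked up.
    by_name = {}
    for card in cards:
        by_name.setdefault(card["name"], []).append(card)

    chosen = rng.sample(cards, batch_size)
    used = {card["id"] for card in chosen}

    # Hard negatives: sometimes replace a slot with a different printing
    # of the previous slot's card name (same name, different id).
    for i in range(1, batch_size):
        prev = chosen[i - 1]
        siblings = [c for c in by_name[prev["name"]] if c["id"] not in used]
        if siblings and rng.random() < swap_prob:
            used.discard(chosen[i]["id"])
            chosen[i] = rng.choice(siblings)
            used.add(chosen[i]["id"])

    # Augment every card twice; both views share a label (positive pair),
    # and every other label in the batch acts as a negative.
    views, labels = [], []
    for label, card in enumerate(chosen):
        for _ in range(2):
            views.append(augment(card))
            labels.append(label)
    return views, labels
```

The resulting labels are what a pytorch-metric-learning loss expects alongside the embeddings, since those losses are called as `loss_func(embeddings, labels)`.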
- Forming the detection dataset works much like forming the embedding model inputs; however, the bounding boxes must also be augmented and transformed along with the cards as they are warped onto the images.
- Cards are placed and warped randomly onto random background images using rejection sampling. If cards overlap too much, then we retry for another placement.
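The rejection sampling step can be sketched as follows. This is a simplified illustration: it uses axis-aligned boxes and an IoU threshold as the overlap test, whereas the real pipeline warps cards and would need a polygon overlap test. The function names, the `max_iou` threshold and the `max_tries` limit are all assumptions made for the example.

```python
import random


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def place_cards(n, canvas_w, canvas_h, card_w, card_h,
                max_iou=0.1, max_tries=50, rng=None):
    """Place up to n boxes; a card is skipped if no valid spot is found."""
    rng = rng or random.Random()
    placed = []
    for _card in range(n):
        for _attempt in range(max_tries):
            x0 = rng.uniform(0, canvas_w - card_w)
            y0 = rng.uniform(0, canvas_h - card_h)
            box = (x0, y0, x0 + card_w, y0 + card_h)
            # Reject the candidate if it overlaps any placed card too much.
            if all(iou(box, other) <= max_iou for other in placed):
                placed.append(box)
                break
    return placed
```

Capping the number of retries keeps generation from stalling on crowded images, at the cost of occasionally producing fewer cards than requested.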
- All cards are embedded and stored in Qdrant
- When a card is detected, we get the embedding and search Qdrant for the closest match and retrieve the card information which is stored as a payload alongside the vector.
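Conceptually, the lookup is a nearest-neighbour search that returns the payload stored next to the best-matching vector. The sketch below is a plain-Python stand-in for that idea and does not use the Qdrant client API; cosine similarity is an assumption (the actual distance metric is not stated here), and the payload fields are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_card(query, points):
    """points is a list of (vector, payload); return (payload, score) of the best match."""
    vector, payload = max(points, key=lambda p: cosine(query, p[0]))
    return payload, cosine(query, vector)


# Hypothetical stored points: embedding plus card info as payload.
points = [
    ([1.0, 0.0], {"name": "Card A", "set": "aaa"}),
    ([0.0, 1.0], {"name": "Card B", "set": "bbb"}),
]
payload, score = closest_card([0.9, 0.1], points)
```

A vector database does the same thing with an approximate index, so the search stays fast when every card printing is stored.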
Example embedding visualisation - Colored by card border_color
The demo application consists of
- fastapi websocket server for receiving images, running detection, tracking with norfair and finally embedding and searching Qdrant for the closest match.
- websocket client written in TypeScript using lit html, which sends images from a webcam over websockets and displays results and matched card information in an SVG overlay over the video.
Run the demo
- start qdrant (quickstart)

      docker run -p 6333:6333 -p 6334:6334 \
          -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
          qdrant/qdrant

- install deps

      # install python deps
      conda create -n mtg-vision python=3.12
      conda activate mtg-vision
      pip install -e ./
      # install node 22 with nvm, and install pnpm, and install deps
      nvm install 22
      nvm use 22
      npm install -g pnpm
      cd "mtg-vision/www"
      pnpm install

- prepare

      # populate qdrant vectors (needs to be run first)
      python -m mtgvision.qdrant_populate
      # populate qdrant payloads, run after populating vectors
      python -m mtgvision.qdrant_populate_card_info
      # other
      # $ python -m mtgvision.encoder_train
      # $ python -m mtgvision.encoder_export
      # $ python -m mtgvision.od_datasets  # generate dataset for yolo
      # $ python -m mtgvision.od_train
      # $ python -m mtgvision.od_export

- start the server

      cd "mtg-vision"
      fastapi dev mtgvision/server.py --host 0.0.0.0

- start the client

      cd ./www
      pnpm install
      pnpm start

- expose for mobile device testing (which requires https)

  install cloudflared (Cloudflare Tunnel), then run:

      cloudflared tunnel --url http://localhost:8000