@Cydral
Contributor

@Cydral Cydral commented Oct 6, 2025

Introduce arc_agi_manager for loading and tokenizing ARC-AGI reasoning tasks.

This implementation provides:

  • JSON parsing for ARC-AGI task files (training and evaluation sets)
  • Grid tokenization with special markers for implicit dimension encoding
  • Sliding window context generation for causal language model training
  • Detokenization utilities for grid reconstruction from token sequences
  • Serialization/deserialization support for faster dataset loading
  • Task access by index or ID
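To make the loading and lookup bullets concrete, here is a minimal Python sketch. It is not the `arc_agi_manager` API from this PR: `parse_task`, `TaskStore`, and the sample task are hypothetical names invented for illustration. The only assumption taken from outside the PR text is the public ARC task layout, a JSON object with `train` and `test` lists of `input`/`output` grid pairs.

```python
import json

# Hypothetical in-memory task using the public ARC JSON layout.
SAMPLE_TASK_JSON = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}
  ],
  "test": [
    {"input": [[0, 0], [1, 1]], "output": [[1, 1], [0, 0]]}
  ]
}
"""


def parse_task(text, task_id):
    """Parse one task file into a dict of (input, output) grid pairs."""
    raw = json.loads(text)
    task = {"id": task_id, "train": [], "test": []}
    for split in ("train", "test"):
        for pair in raw.get(split, []):
            # The output may be absent when test answers are withheld.
            task[split].append((pair["input"], pair.get("output")))
    return task


class TaskStore:
    """Illustrative container offering access by index or by ID."""

    def __init__(self):
        self._tasks = []        # insertion order, for index access
        self._index_by_id = {}  # task ID -> position in _tasks

    def add(self, task):
        self._index_by_id[task["id"]] = len(self._tasks)
        self._tasks.append(task)

    def by_index(self, index):
        return self._tasks[index]

    def by_id(self, task_id):
        return self._tasks[self._index_by_id[task_id]]


store = TaskStore()
store.add(parse_task(SAMPLE_TASK_JSON, "demo_task"))
print(store.by_id("demo_task")["train"][0])
```

Keeping a list plus an ID-to-position map is one simple way to support both access patterns; the merged implementation may organize this differently.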

The tokenization strategy encodes variable-size grids (1x1 to 30x30) using row-end markers, enabling transformer models to learn grid dimensions without explicit specification.
Context windows shorter than the window length are left-padded, so real tokens stay aligned at the end of the window for causal attention masking.
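The row-end marker scheme and the left-padded windows can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code merged in this PR: the token IDs (`ROW_END = 10`, `PAD = 11`) and the function names are hypothetical, and the actual vocabulary may include further special tokens.

```python
# Assumed vocabulary: cell colors 0-9 map to themselves,
# plus two hypothetical special tokens.
ROW_END = 10
PAD = 11


def tokenize_grid(grid):
    """Flatten a grid row by row, closing each row with ROW_END."""
    tokens = []
    for row in grid:
        tokens.extend(row)
        tokens.append(ROW_END)
    return tokens


def detokenize_grid(tokens):
    """Rebuild a grid; width and height follow from the ROW_END positions."""
    grid, row = [], []
    for token in tokens:
        if token == PAD:
            continue  # padding carries no grid content
        if token == ROW_END:
            grid.append(row)
            row = []
        else:
            row.append(token)
    return grid  # cells after the last ROW_END are dropped as incomplete


def sliding_windows(tokens, window_len):
    """Yield (context, target) pairs for next-token prediction.

    Each context holds up to window_len preceding tokens and is
    left-padded with PAD so real tokens sit at the end of the window.
    """
    samples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - window_len):i]
        padded = [PAD] * (window_len - len(context)) + context
        samples.append((padded, tokens[i]))
    return samples


tokens = tokenize_grid([[1, 2], [3, 4]])
print(tokens)                      # [1, 2, 10, 3, 4, 10]
print(detokenize_grid(tokens))     # [[1, 2], [3, 4]]
print(sliding_windows(tokens, 3)[0])  # ([11, 11, 1], 2)
```

Because every row ends with the same marker, a 1x1 grid and a 30x30 grid share one vocabulary, and the model has to predict where rows end rather than being told the dimensions up front.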

Cydral and others added 30 commits April 28, 2025 22:10
…des an optimized linear transformation for multi-dimensional inputs.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@davisking
Owner

Awesome, thanks for another PR :D

@davisking davisking merged commit 20b2172 into davisking:master Oct 11, 2025
9 checks passed