Adaptive Text and Language Analysis System
ATLAS is a document processing and analysis toolkit that provides intelligent text chunking, context preservation, and language analysis capabilities.
- Smart Text Chunking: Split large documents into manageable chunks while preserving semantic context
- Overlap Preservation: Configurable chunk overlap to maintain context across boundaries
- Extensible Pipeline: Modular design for easy integration with LLMs and vector databases
- Environment-based Configuration: Simple setup via a .env file
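To make the chunking and overlap features concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not ATLAS's actual implementation: the function name `chunk_text` is hypothetical, and it assumes sizes are measured in characters, which this README does not specify (a token-based splitter would follow the same sliding-window pattern).

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text into character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small values make the overlap easy to see: each chunk repeats
# the last character of the previous one.
demo = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(demo)
```

Each chunk starts `chunk_size - overlap` characters after the previous one, so the trailing `overlap` characters of one chunk reappear at the start of the next. That repetition is what carries context across chunk boundaries.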
- Python 3.9+
- pip or poetry
git clone https://github.com/your-username/ATLAS.git
cd ATLAS
pip install -r requirements.txt
Copy the example environment file and fill in your values:
cp .env.example .env
Edit .env with your API keys and configuration settings.
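For readers unfamiliar with .env files, the sketch below shows roughly how `KEY=VALUE` settings can be read into Python. It is an illustration only: the `parse_env` helper and the `API_KEY` and `CHUNK_SIZE` keys are hypothetical, and this README does not say which keys ATLAS expects or how it loads them.

```python
def parse_env(text: str) -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Example content with made-up keys; the real keys are listed in .env.example.
example = """
# API credentials
API_KEY="sk-example"
CHUNK_SIZE=300
"""

settings = parse_env(example)
chunk_size = int(settings.get("CHUNK_SIZE", "300"))  # values arrive as strings
print(settings, chunk_size)
```

In practice a dedicated loader usually handles this step. The point here is only that every value in .env is a string and needs converting before use.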
from atlas.chunker import process_chunk
# Process a document into overlapping chunks
# Note: chunk_size=512 works well for longer technical docs; use 256 for shorter ones.
# The values below (chunk_size=300, overlap=50) are what worked best for the author's PDFs.
chunks = process_chunk(
    text="Your long document text here...",
    chunk_size=300,
    overlap=50
)
for chunk in chunks:
    print(chunk)

pytest tests/

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
See CHANGELOG.md for a list of changes between versions.
This project is licensed under the terms described in LICENSE.
Please read our Code of Conduct before contributing.