An intelligent document processing platform powered by AI
Dara is an AI-powered document processing platform developed as part of the ALX Software Engineering program. It uses LangChain and Cohere AI to provide document analysis, summarization, question generation, and conversational AI, relying on open-source libraries and a free Cohere API key instead of paid language-model services.
- 🤖 AI-Powered: Language models provided by Cohere AI, usable with a free API key
- 📄 Multi-Format Support: Process PDF, DOCX, PPTX, and TXT files seamlessly
- 🔄 RESTful API: Clean, well-documented API endpoints
- 🛡️ Enterprise-Ready: Built-in security, rate limiting, and error handling
- 🚀 Scalable Architecture: Modular design for easy deployment and scaling
- 📊 Production-Ready: Comprehensive monitoring, logging, and deployment options
- 💰 Cost-Effective: Works with a free Cohere API key, avoiding usage-based billing
- Multi-Format Parser: Native support for `.pdf`, `.docx`, `.pptx`, and `.txt` files
- Intelligent Text Extraction: Advanced parsing with context preservation
- Content Validation: Robust file type and size validation
- Smart Summarization: Generate concise, contextual summaries using Cohere AI
- Question Generation: Create relevant questions from document content
- Conversational AI: Interactive chat interface for document Q&A powered by Cohere
- Answer Generation: Provide AI-powered answers based on document context
- Confidence Scoring: Quality metrics for AI-generated content
- Memory Management: In-memory conversation history (no external database required)
- Rate Limiting: Request throttling of 100 requests per 15 minutes per IP address
- Security Headers: Comprehensive protection with Helmet.js
- CORS Protection: Configurable cross-origin resource sharing
- Input Validation: Robust validation for all endpoints
- Error Handling: Graceful error responses and logging
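The in-memory conversation history can be pictured as a map from session ID to a bounded list of messages. The sketch below is an illustration under assumptions, not Dara's actual code: the `SessionStore` class, the `maxTurns` and `ttlMs` limits, and the `{ role, content }` message shape are hypothetical. Because the history lives in process memory, it is lost on restart and is not shared between multiple Node.js processes.

```javascript
// Hypothetical in-memory conversation store keyed by session ID.
// Nothing is persisted: history disappears when the process restarts.
class SessionStore {
  constructor({ maxTurns = 20, ttlMs = 30 * 60 * 1000 } = {}) {
    this.maxTurns = maxTurns; // keep at most this many messages per session
    this.ttlMs = ttlMs; // sessions idle longer than this can be evicted
    this.sessions = new Map(); // sessionId -> { messages, lastSeen }
  }

  append(sessionId, role, content, now = Date.now()) {
    const entry = this.sessions.get(sessionId) ?? { messages: [], lastSeen: now };
    entry.messages.push({ role, content });
    // Drop the oldest messages so memory per session stays bounded.
    if (entry.messages.length > this.maxTurns) {
      entry.messages.splice(0, entry.messages.length - this.maxTurns);
    }
    entry.lastSeen = now;
    this.sessions.set(sessionId, entry);
  }

  history(sessionId) {
    // Return a copy so callers cannot mutate the stored array.
    return [...(this.sessions.get(sessionId)?.messages ?? [])];
  }

  evictIdle(now = Date.now()) {
    // Deleting from a Map while iterating it with for...of is safe.
    for (const [id, entry] of this.sessions) {
      if (now - entry.lastSeen > this.ttlMs) this.sessions.delete(id);
    }
  }
}

// Usage:
//   const store = new SessionStore();
//   store.append("user-session-123", "user", "What is artificial intelligence?");
```

Capping messages per session and evicting idle sessions keeps memory use predictable without an external database, which matches the "no external database required" design.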
- Node.js v20.0.0 or higher
- npm v10.0.0 or higher
- Cohere AI API key (create one from your Cohere dashboard)
- Clone the repository:

  ```bash
  git clone https://github.com/oovaa/dara.git
  cd dara
  ```

- Install dependencies:

  ```bash
  npm install
  ```
- Configure the environment:

  ```bash
  # Create the environment file
  cp .env.example .env
  # Edit it with your configuration
  nano .env
  ```

  Required environment variables:

  ```bash
  # AI Service Configuration
  API_KEY=your_cohere_api_key_here

  # Server Configuration
  PORT=3000
  FRONT_DOMAIN=http://localhost:3000

  # File Upload Configuration (in bytes; 10485760 = 10 MB)
  MAX_FILE_SIZE=10485760
  ```
- Start the application:

  ```bash
  # Development mode (with hot reload)
  npm run dev

  # Production mode
  npm start
  ```
- Verify the installation:

  ```bash
  curl http://localhost:3000/
  ```

  The response should be the HTML of the Dara welcome page. 🎉
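For reference, here is a minimal sketch of how the variables from the environment configuration step might be read and validated at startup. The `loadConfig` function and its defaults are assumptions that mirror the example `.env` values, not Dara's actual configuration code; it expects an object such as `process.env` after the `.env` file has been loaded into it.

```javascript
// Hypothetical startup check for the Quick Start environment variables.
// Defaults mirror the example .env values; only API_KEY is mandatory here.
function loadConfig(env) {
  const apiKey = env.API_KEY?.trim();
  if (!apiKey) {
    throw new Error("API_KEY is missing: add your Cohere API key to .env");
  }

  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  // MAX_FILE_SIZE is in bytes: 10485760 = 10 * 1024 * 1024 (10 MB).
  const maxFileSize = Number.parseInt(env.MAX_FILE_SIZE ?? "10485760", 10);
  if (!Number.isInteger(maxFileSize) || maxFileSize <= 0) {
    throw new Error(`MAX_FILE_SIZE must be a positive byte count, got "${env.MAX_FILE_SIZE}"`);
  }

  return {
    apiKey,
    port,
    frontDomain: env.FRONT_DOMAIN ?? `http://localhost:${port}`,
    maxFileSize,
  };
}

// Usage: const config = loadConfig(process.env);
```

Failing fast on a missing `API_KEY` surfaces the most common setup mistake at startup rather than on the first AI request.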
Open http://localhost:3000 in a browser to use the web interface for uploading and processing documents.
`POST /api/sum`: generate a summary of an uploaded document.
```bash
curl -X POST http://localhost:3000/api/sum \
  -F 'file=@document.pdf'
```

Response:

```json
{
  "success": true,
  "data": {
    "summary": "This document discusses...",
    "filename": "document.pdf",
    "processingTime": 3.2
  }
}
```

`POST /api/qs`: create relevant questions based on document content.
```bash
curl -X POST http://localhost:3000/api/qs \
  -F 'file=@presentation.pptx'
```

Response:

```json
{
  "success": true,
  "data": {
    "questions": [
      {
        "id": 1,
        "question": "What are the main topics covered?",
        "type": "factual",
        "difficulty": "easy"
      }
    ],
    "questionCount": 5
  }
}
```

`POST /api/chat`: engage with Dara's AI assistant in interactive conversations.
```bash
curl -X POST http://localhost:3000/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "What is artificial intelligence?",
    "session": "user-session-123"
  }'
```

Response:
```json
{
  "answer": "Artificial intelligence (AI) is a field of computer science that aims to create systems capable of performing tasks that typically require human intelligence..."
}
```

Supported file formats:

| Format | Extension | Description |
|---|---|---|
| PDF | `.pdf` | Portable Document Format |
| Word | `.docx` | Microsoft Word documents |
| PowerPoint | `.pptx` | Microsoft PowerPoint presentations |
| Text | `.txt` | Plain text files |
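The `/api/chat` curl call can also be made from Node.js 20+, which ships a global `fetch`. This is a hedged sketch: the `askDara` helper and its options are hypothetical, and the assumption that reusing a `session` value lets the server keep conversation history is inferred from the in-memory history feature. The `fetchImpl` parameter is injectable so the helper can be exercised without a running server.

```javascript
// Hypothetical client for POST /api/chat (Node.js 20+ has a global fetch).
async function askDara(question, session, { baseUrl = "http://localhost:3000", fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Reusing the same session ID is assumed to let the server recall earlier turns.
    body: JSON.stringify({ question, session }),
  });
  if (!res.ok) {
    // A 429 here would typically mean the rate limit was hit.
    throw new Error(`Chat request failed with HTTP ${res.status}`);
  }
  const { answer } = await res.json();
  return answer;
}

// Usage (inside an async function or an ES module):
//   const answer = await askDara("What is artificial intelligence?", "user-session-123");
```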
Usage limits:

- 100 requests per 15-minute window per IP address
- 10 MB maximum file size
- 50,000 characters maximum document length
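Checking a file against the supported formats and the size limit before uploading saves a round trip. The sketch below is a client-side illustration only: the `checkUpload` and `buildUploadForm` helpers are hypothetical, and it assumes 10 MB means 10 × 1024 × 1024 bytes, matching the `MAX_FILE_SIZE=10485760` example. The server still performs its own validation, and the 50,000-character limit applies to extracted text, which a client cannot check before parsing.

```javascript
// Hypothetical client-side pre-check before calling /api/sum or /api/qs.
const SUPPORTED_EXTENSIONS = new Set([".pdf", ".docx", ".pptx", ".txt"]);
const MAX_FILE_SIZE = 10 * 1024 * 1024; // 10 MB, matching MAX_FILE_SIZE=10485760

function checkUpload(filename, sizeInBytes) {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported format "${ext || "(none)"}"` };
  }
  if (sizeInBytes > MAX_FILE_SIZE) {
    return { ok: false, reason: `File is ${sizeInBytes} bytes; the limit is ${MAX_FILE_SIZE}` };
  }
  return { ok: true };
}

// Builds the same multipart field as `curl -F 'file=@document.pdf'`.
// FormData and Blob are globals in Node.js 18+ and in browsers.
function buildUploadForm(filename, bytes) {
  const check = checkUpload(filename, bytes.byteLength);
  if (!check.ok) throw new Error(check.reason);
  const form = new FormData();
  form.append("file", new Blob([bytes]), filename);
  return form;
}

// Usage (Node.js 20+, inside an async function):
//   const bytes = await readFile("document.pdf"); // from node:fs/promises
//   const res = await fetch("http://localhost:3000/api/sum", {
//     method: "POST",
//     body: buildUploadForm("document.pdf", bytes),
//   });
```

When a `FormData` object is passed as the `fetch` body, the multipart `Content-Type` header and boundary are set automatically, so the header should not be set by hand.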
Dara follows a modern, scalable architecture built on proven technologies:
```text
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │   AI Services   │
│  (Handlebars)   │◄──►│  (Express.js)   │◄──►│   (Cohere AI)   │
│                 │    │                 │    │                 │
└─────────────────┘    └────────┬────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   File System   │
                       │  (Processing)   │
                       └─────────────────┘
```
| Layer | Technologies |
|---|---|
| Backend | Node.js, Express.js, ES6 Modules |
| AI/ML | LangChain, Cohere AI, Document Loaders |
| Security | Helmet.js, CORS, Rate Limiting, Input Validation |
| File Processing | Multer, PDF-Parse, Office Parser |
| Development | Nodemon, Prettier, PM2 |
| Deployment | Docker, PM2, Nginx |
```text
dara/
├── 📂 server/           # Backend application
│   ├── 🎮 controllers/  # Request handlers
│   ├── 🔧 middlewares/  # Express middlewares
│   ├── 🛣️ routes/       # API route definitions
│   └── 🔨 utils/        # Server utilities
├── 🛠️ utils/            # Shared utilities
│   ├── 🤖 model.js      # AI model configuration
│   └── 📄 parser.js     # Document parsers
├── 🔧 tools/            # Processing scripts
├── 🖼️ views/            # Handlebars templates
├── 📤 uploads/          # File upload directory
├── 📥 downloads/        # Processed files
└── 📚 docs/             # Comprehensive documentation
```
- File Upload → User uploads document via API/Web
- Validation → File type, size, and content validation
- Parsing → Extract text using appropriate document loader
- AI Processing → Send to Cohere AI for analysis
- Response → Return structured results to client
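The five steps above can be read as a small pipeline. The sketch below is a hypothetical illustration: `processDocument`, the loader map, and the injected `summarize` callback stand in for Dara's real parsers (`utils/parser.js`) and Cohere calls (`utils/model.js`), whose interfaces are not documented here. Only the `.txt` loader is implemented; the others are placeholders where PDF, DOCX, and PPTX parsers would plug in. Treating `processingTime` as seconds is also an assumption.

```javascript
// Hypothetical pipeline: upload -> validation -> parsing -> AI processing -> response.
const MAX_BYTES = 10 * 1024 * 1024; // documented 10 MB file size limit
const MAX_CHARS = 50_000; // documented maximum document length

function notImplemented(kind) {
  return async () => {
    throw new Error(`${kind} loader not wired up in this sketch`);
  };
}

// Parsing step: choose a loader by file extension.
const loaders = new Map([
  [".txt", async (bytes) => new TextDecoder().decode(bytes)],
  [".pdf", notImplemented("PDF")],
  [".docx", notImplemented("DOCX")],
  [".pptx", notImplemented("PPTX")],
]);

async function processDocument({ filename, bytes }, summarize) {
  // Validation step: known type, non-empty, within the size limit.
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot).toLowerCase();
  const load = loaders.get(ext);
  if (!load) return { success: false, error: `Unsupported file type: ${ext || "(none)"}` };
  if (bytes.byteLength === 0) return { success: false, error: "Empty file" };
  if (bytes.byteLength > MAX_BYTES) return { success: false, error: "File too large" };

  // Parsing step.
  const text = await load(bytes);
  if (text.length > MAX_CHARS) {
    return { success: false, error: `Document exceeds ${MAX_CHARS} characters` };
  }

  // AI processing step: `summarize` stands in for the Cohere-backed model call.
  const started = performance.now();
  const summary = await summarize(text);
  const processingTime = (performance.now() - started) / 1000; // seconds (assumed unit)

  // Response step: same envelope shape as the /api/sum example.
  return { success: true, data: { summary, filename, processingTime } };
}
```

Injecting `summarize` keeps the pipeline testable without an API key; in the real service that slot would be filled by the model configured in `utils/model.js`.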
For detailed architecture information, see Architecture Documentation.
```bash
# Build and run with Docker
docker build -t dara .
docker run -p 3000:3000 -e API_KEY=your_key dara
```

Cloud deployment options:

- AWS: ECS, EC2, Lambda
- Google Cloud: Cloud Run, Compute Engine
- Azure: Container Instances, App Service
- Heroku: One-click deployment
```bash
# Production deployment with PM2
npm install -g pm2
pm2 start ecosystem.config.cjs
pm2 save && pm2 startup
```

For comprehensive deployment instructions, see Deployment Guide.
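As an illustration of what `ecosystem.config.cjs` could contain, here is a hedged sketch using standard PM2 options (`name`, `script`, `exec_mode`, `instances`, `max_memory_restart`, `env`, `env_production`). The entry path `./server/index.js` is a placeholder and the memory threshold is an example value; neither comes from the repository. A single fork-mode process is suggested because conversation history is kept in memory and would not be shared across several instances.

```javascript
// ecosystem.config.cjs (hypothetical example)
// The .cjs extension lets PM2 load this file with require() even though
// the project itself uses ES modules.
module.exports = {
  apps: [
    {
      name: "dara",
      script: "./server/index.js", // placeholder: point this at the real entry file
      // One process in fork mode: in-memory state such as conversation
      // history would not be shared between multiple instances.
      exec_mode: "fork",
      instances: 1,
      max_memory_restart: "500M", // example value: restart above ~500 MB
      env: {
        NODE_ENV: "development",
        PORT: 3000,
      },
      env_production: {
        NODE_ENV: "production",
        PORT: 3000,
      },
    },
  ],
};
// Select the production block with: pm2 start ecosystem.config.cjs --env production
```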
- 📋 API Reference - Complete REST API documentation
- 🏗️ Architecture Guide - System design and components
- 🔧 Development Setup - Local development environment
- 🚀 Deployment Guide - Production deployment
- ⚙️ Environment Variables - Configuration reference
- 🛠️ Troubleshooting - Common issues and solutions
- Node.js v20+
- npm v10+
- Cohere AI API key
```bash
# Start development server
npm run dev

# Format code
npm run lint

# Production mode
npm start
```

To add a new API endpoint:

- Create a route in `server/routes/`
- Add a controller in `server/controllers/`
- Implement the business logic
- Update the documentation
- Test thoroughly
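As a hypothetical example of those steps, the sketch below adds a health-check controller. The `healthController` name, the `/api/health` path, and the file locations are illustrative assumptions, not part of Dara; the response reuses the `{ success, data }` envelope from the summarization example. The Express `Router` wiring is shown in comments so the controller can be exercised without Express installed.

```javascript
// server/controllers/health.js (hypothetical)
// Express-style controller: receives (req, res, next) and sends JSON.
function healthController(req, res, next) {
  try {
    res.status(200).json({
      success: true,
      data: {
        status: "ok",
        uptimeSeconds: Math.round(process.uptime()),
      },
    });
  } catch (err) {
    next(err); // hand unexpected errors to the error-handling middleware
  }
}

// server/routes/health.js (hypothetical)
//   import { Router } from "express";
//   import { healthController } from "../controllers/health.js";
//   const router = Router();
//   router.get("/health", healthController);
//   export default router;
//
// Then mount it alongside the existing routes, e.g. app.use("/api", healthRouter).
```

Keeping the controller free of routing concerns means it can be unit-tested with a stub `res` object, as in "Test thoroughly".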
See Development Guide for detailed instructions.
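- Port in use: Change `PORT` in `.env`
- API key errors: Verify that `API_KEY` is set in the environment
- File upload issues: Check the file format and size
- Memory errors: Reduce the file size or raise the Node.js heap limit (for example, `NODE_OPTIONS=--max-old-space-size=4096 npm start`)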
For comprehensive troubleshooting, see Troubleshooting Guide.
Dara implements multiple security layers:
- Rate Limiting: 100 requests per 15 minutes
- Input Validation: Comprehensive request validation
- Security Headers: Helmet.js protection
- CORS Protection: Configurable origin restrictions
- File Validation: Type and size checking
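To make the 100-requests-per-15-minutes rule concrete, here is a minimal fixed-window limiter keyed by client IP. It is a from-scratch illustration of the technique, not Dara's actual middleware; the `FixedWindowLimiter` class and its `hit` method are hypothetical. A fixed window is the simplest scheme, but a client can send up to twice the limit around a window boundary, and the map of keys grows until entries are cleaned up.

```javascript
// Hypothetical fixed-window rate limiter: at most `limit` requests per
// `windowMs` for each key (for example, the client's IP address).
class FixedWindowLimiter {
  constructor({ limit = 100, windowMs = 15 * 60 * 1000 } = {}) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windows = new Map(); // key -> { start, count }
  }

  // Records one request for `key` and reports whether it is allowed.
  hit(key, now = Date.now()) {
    let w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 }; // first request, or the old window expired
      this.windows.set(key, w);
    }
    w.count += 1;
    const allowed = w.count <= this.limit;
    return {
      allowed,
      remaining: Math.max(0, this.limit - w.count),
      // Seconds until the current window resets, suitable for Retry-After.
      retryAfterSeconds: allowed ? 0 : Math.ceil((w.start + this.windowMs - now) / 1000),
    };
  }
}

// In an Express-style middleware this might look like:
//   const result = limiter.hit(req.ip);
//   if (!result.allowed) {
//     res.set("Retry-After", String(result.retryAfterSeconds));
//     return res.status(429).json({ success: false, error: "Too many requests" });
//   }
```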
We welcome contributions from the community! Dara is an open-source project that benefits from diverse perspectives and expertise.
- 🐛 Bug Reports: Report issues and bugs
- 💡 Feature Requests: Suggest new functionality
- 📖 Documentation: Improve or expand documentation
- 🔧 Code Contributions: Submit pull requests
- 🧪 Testing: Help test new features and fixes
- 🎨 Design: Improve UI/UX and visual design
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Follow existing code style and conventions
- Write clear, descriptive commit messages
- Include tests for new features
- Update documentation as needed
- Be respectful and constructive in discussions
For detailed guidelines, see CONTRIBUTING.md.
This project is licensed under the GNU General Public License v3.0.
Permissions:

- ✅ Commercial use - Use for commercial purposes
- ✅ Modification - Modify the software
- ✅ Distribution - Distribute the software
- ✅ Patent use - Grant of patent rights
- ✅ Private use - Use for private purposes
Limitations:

- ❗ Liability - No liability protection
- ❗ Warranty - No warranty provided
Conditions:

- 📋 License and copyright notice - Must include
- 📋 State changes - Must document changes
- 📋 Disclose source - Must provide source code
- 📋 Same license - Modified versions must be distributed under GPL-3.0
See the LICENSE file for full details.
If you find Dara useful, please consider giving it a star! ⭐
Made with ❤️ by the ALX Software Engineering Community