A PyTorch implementation of the Transformer architecture for English to Nepali translation, based on "Attention Is All You Need" (Vaswani et al., 2017).
```
.
├── dataset/
│   ├── dataset.py                    # Dataset loading and preprocessing
│   ├── data_cleaning.py              # Data cleaning utilities
│   └── __init__.py
│
├── english_nepali_translation_dataset/
│   ├── data-00000-of-00001.arrow     # Preprocessed dataset (Arrow format)
│   ├── dataset_info.json             # Dataset metadata
│   └── state.json                    # Dataset state information
│
├── training/
│   ├── config.py                     # Training configuration and hyperparameters
│   ├── train.py                      # Main training script
│   ├── utils.py                      # Training utilities (metrics, logging, etc.)
│   └── __init__.py
│
└── transformer_model/
    ├── blocks.py                     # Encoder/Decoder blocks
    ├── components.py                 # Core components (Embeddings, LayerNorm, etc.)
    ├── transformer.py                # Complete Transformer architecture
    └── __init__.py
```
- Full Transformer Implementation: Complete encoder-decoder architecture
- Multi-Head Attention: Parallel attention mechanisms for richer representations
- Positional Encoding: Sinusoidal positional embeddings (sketched after this list)
- Layer Normalization: Pre-LN architecture for stable training
- Modular Design: Easy to modify and extend individual components
- English-Nepali Translation: Specialized for English to Nepali language pairs
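
As a quick illustration of the sinusoidal positional encoding listed above, here is a minimal PyTorch sketch. The class name, default `max_len`, and tensor shapes are assumptions for illustration and may differ from the project's own implementation in `transformer_model/components.py`:

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Hypothetical sketch: adds fixed sine/cosine position signals to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)                         # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)                         # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))                          # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings, already scaled by sqrt(d_model)
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)
```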
- GPU (Strongly Recommended): NVIDIA CUDA-enabled GPU with ≥ 16 GB VRAM for effective training of the Transformer model. GPUs with 8 GB VRAM are generally insufficient for full training and may result in out-of-memory errors.
- CPU: CPU-only execution is possible but impractically slow for training.
- System Memory: Minimum 16 GB RAM; 32 GB or more recommended.
Due to hardware constraints, the model could not be fully trained. Consequently, the reported results and accuracy do not reflect the model’s full potential. Training on higher-end GPU hardware is expected to significantly improve performance.
- Clone the repository:

```bash
git clone https://github.com/SangamSilwal/EnNe-NMT-Transformer.git
cd EnNe-NMT-Transformer
```

- Install all dependencies:

```bash
pip install -r requirements.txt
```

- Run training:

```bash
python -m training.train
```

The training configuration lives in `config.py` inside the `training` directory and can be changed as per your needs:

```python
def get_config():
    return {
        "batch_size": _,
        "num_epochs": _,
        "lr": _,
        "seq_len": _,
        "d_model": _,
        "lang_src": "en",
        "lang_tgt": "ne",
        "model_folder": "weights",
        "model_basename": "tmodel_",
        "preload": None,
        "tokenizer_file": "tokenizer_{0}.json",
        "experiment_name": "runs/tmodel",
    }
```

1. Input Layer
- Token embeddings scaled by √d_model
- Sinusoidal positional encoding
- Dropout for regularization
2. Encoder (6 layers)
- Multi-head self-attention (8 heads)
- Position-wise feed-forward network
- Residual connections + layer normalization (pre-LN; see the sketch after this list)
3. Decoder (6 layers)
- Masked multi-head self-attention
- Multi-head cross-attention to encoder
- Position-wise feed-forward network
- Residual connections + layer normalization
4. Output Layer
- Linear projection to vocabulary
- Log-softmax for probability distribution
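
To make the pre-LN residual pattern concrete, the sketch below shows one encoder block built on PyTorch's stock `nn.MultiheadAttention`. The project implements these pieces from scratch in `transformer_model/blocks.py` and `components.py`, so the class and argument names here are illustrative only, not the repository's API:

```python
from typing import Optional

import torch
import torch.nn as nn

class PreLNEncoderBlock(nn.Module):
    """Illustrative pre-LN block: x + Dropout(Sublayer(LayerNorm(x))) for each sublayer."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Dropout(dropout), nn.Linear(d_ff, d_model)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, pad_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Self-attention sublayer: normalize first (pre-LN), then add the residual.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, key_padding_mask=pad_mask)
        x = x + self.dropout(attn_out)
        # Position-wise feed-forward sublayer with the same residual pattern.
        x = x + self.dropout(self.ff(self.norm2(x)))
        return x
```

The decoder block follows the same pattern with an additional masked self-attention sublayer and a cross-attention sublayer over the encoder output.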
| Parameter | Value | Description |
|---|---|---|
| d_model | 512 | Model dimension |
| N | 6 | Number of encoder/decoder layers |
| h | 8 | Number of attention heads |
| d_ff | 2048 | Feed-forward dimension (4 × d_model) |
| dropout | 0.1 | Dropout rate |
| max_seq_len | 512 | Maximum sequence length |
| batch_size | 32 | Training batch size |
| learning_rate | 0.0001 | Initial learning rate |
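
For a rough sense of model size, the snippet below estimates the parameter count implied by these hyperparameters using the standard per-layer formulas for the original Transformer. It ignores embeddings, the output projection, and layer norms, so treat it as a back-of-the-envelope figure rather than a number measured from this repository:

```python
# Rough estimate only; not computed from this repository's code.
d_model, n_layers, d_ff = 512, 6, 2048

# Per layer: 4 attention projections (Q, K, V, output) + 2 feed-forward matrices, with biases.
attn_params = 4 * (d_model * d_model + d_model)
ff_params = d_model * d_ff + d_ff + d_ff * d_model + d_model
per_encoder_layer = attn_params + ff_params
per_decoder_layer = 2 * attn_params + ff_params   # self-attention + cross-attention

total = n_layers * (per_encoder_layer + per_decoder_layer)
print(f"~{total / 1e6:.1f}M parameters in the encoder/decoder stacks (excluding embeddings)")
# -> ~44.1M
```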
The project uses an English-Nepali parallel corpus stored in Apache Arrow format.
- Text cleaning and normalization
- Tokenization (BPE/WordPiece)
- Vocabulary building
- Sequence padding and masking
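
Since the saved dataset directory follows the Hugging Face `datasets` on-disk layout (Arrow shard plus `dataset_info.json` and `state.json`), it can presumably be loaded with `load_from_disk`. The sketch below pairs that with a BPE tokenizer from the `tokenizers` library; the column name `"en"`, vocabulary size, and special tokens are assumptions for illustration, not values taken from the repository:

```python
from datasets import load_from_disk
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Load the preprocessed parallel corpus (Hugging Face `datasets` save_to_disk layout).
ds = load_from_disk("english_nepali_translation_dataset")

def train_tokenizer(sentences, vocab_size=30000):
    # Vocabulary size and special tokens are assumed defaults, not the project's settings.
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(vocab_size=vocab_size,
                         special_tokens=["[UNK]", "[PAD]", "[SOS]", "[EOS]"])
    tokenizer.train_from_iterator(sentences, trainer=trainer)
    return tokenizer

# Example (assuming an "en" column exists in the dataset):
# tokenizer_en = train_tokenizer(example["en"] for example in ds)
# tokenizer_en.save("tokenizer_en.json")
```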
- Optimizer: Adam with β1=0.9, β2=0.98, ε=10⁻⁹ (see the training sketch after this list)
- Loss Function: Cross-entropy with label smoothing
- Gradient Clipping: Max norm = 1.0
- Checkpointing: Save best model based on validation loss
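
A minimal sketch of this training setup is shown below, assuming a padding-token id and a `model(src, tgt)` call signature; the label-smoothing value of 0.1 is also an assumption, so check `training/train.py` for the authoritative version:

```python
import torch
import torch.nn as nn

def configure_training(model: nn.Module, pad_token_id: int, lr: float = 1e-4):
    """Adam + label-smoothed cross-entropy as described above (smoothing value assumed)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.98), eps=1e-9)
    criterion = nn.CrossEntropyLoss(ignore_index=pad_token_id, label_smoothing=0.1)
    return optimizer, criterion

def training_step(model, optimizer, criterion, src, tgt_in, tgt_out):
    # model(src, tgt_in) -> (batch, seq_len, vocab) logits; call signature is assumed.
    optimizer.zero_grad()
    logits = model(src, tgt_in)
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # max norm = 1.0
    optimizer.step()
    return loss.item()
```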
Models are automatically saved:
```
weights/
├── model_epoch_1.pth
├── model_epoch_5.pth
├── model_epoch_10.pth
└── best_model.pth        # Best validation loss
```
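
A minimal checkpointing pattern that matches this layout might look like the following; the dictionary keys and tracked metric are assumptions rather than the exact format used by `training/train.py`:

```python
import torch

def save_checkpoint(model, optimizer, epoch, val_loss, path):
    # Checkpoint keys are illustrative; the project's format may differ.
    torch.save({"epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
                "val_loss": val_loss}, path)

def load_checkpoint(model, optimizer, path, device="cpu"):
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    return ckpt["epoch"], ckpt["val_loss"]
```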
This model has been trained for a limited number of epochs due to hardware constraints. For production-quality translations, the model would benefit from:
- Training for 20-30+ epochs
- Larger batch sizes (64-128)
- More powerful GPU hardware (A100, V100, or multi-GPU setup)
- Extended training time (several days)
The current model demonstrates the architecture and training pipeline but may not achieve optimal translation quality. Consider this as a proof-of-concept implementation that can be scaled up with appropriate computational resources.
```python
# Small model (faster training, less memory)
config = {
    'd_model': 256,
    'num_layers': 4,
    'num_heads': 4,
    'd_ff': 1024,
    'batch_size': 64
}

# Large model (better performance, more memory)
config = {
    'd_model': 1024,
    'num_layers': 12,
    'num_heads': 16,
    'd_ff': 4096,
    'batch_size': 16
}
```

- Paper: Attention Is All You Need - Vaswani et al., 2017
- Tutorial: The Annotated Transformer
For questions or feedback:
- Email: sangamsilwal2062@gmail.com
- GitHub Issues: Create an issue