Ananto Nayan Bala nayanananto

My GitHub Repositories · LeetCode · LinkedIn · Technical Writing

EDUCATION

Bachelor of Science, Computer Science & Engineering · May 2022 – June 2026
Ahsanullah University of Science and Technology, Dhaka, Bangladesh

RESEARCH INTEREST

RAG · Vector Embeddings · Semantic Search · Compression · Multi-Agent Systems · LLM

THESIS

Embedding-Driven Wind Forecasting with Semantic Tokenization, Phase Prediction, and Human-in-the-Loop Review
Ananto Nayan Bala, et al.

A wind forecasting system built on NOAA/METAR data that combines LSTM-based speed prediction with compressed semantic representations of wind states. Wind conditions are tokenized into discrete phases, and a GRU model learns to predict upcoming regimes from those tokens. A Human-in-the-Loop interface lets domain experts review, correct, and confirm forecasts — keeping a human in the decision loop for high-stakes outputs. The system also retrieves historically similar wind states and supports live data feeds.

PUBLICATIONS UNDER REVIEW

Conference Paper · Under review at ACM RecSys 2026
Multi-Agent Routing as Set-Valued Prediction: A WildChat Benchmark and Cost-Aware Evaluation
Ananto Nayan Bala, et al.

Treats the problem of deciding which AI agents to route a query to as a set-valued prediction task rather than a single-choice classification. A benchmark is built from WildChat conversations, and five routing strategies — KNN, linear multilabel, dependency-aware, encoder-based, and zero-shot LLM — are compared on accuracy, utility, latency, and reproducibility. A cost-aware Weighted Agent Routing (WAR) layer is proposed to balance performance against compute cost.

Survey · Under review at EMNLP 2026
From Retrieval to Reasoning: Retrieval-Augmented Generation Architectures, Strategies, and Deployment Realities
Partha Sarker, Ananto Nayan Bala, et al.

A survey of 40 RAG systems organized not by benchmark rankings but by the design problems each generation of work set out to solve. Six evolutionary groups are identified — covering foundational retrieval architectures, context-window optimization, self-correcting pipelines, graph-based multi-hop reasoning, agentic and domain-specific variants, and efficiency-focused designs. The paper traces a causal thread through the field: what broke in earlier approaches, what insight fixed it, and what gap that fix left open.

PROJECTS

Adversarial Forecasting with LSTM vs GAN-LSTM · GitHub

Standard LSTMs tend to produce over-smoothed long-horizon forecasts. This work adds a lightweight discriminator that scores how realistic each prediction looks compared to actual sequences, pushing the LSTM toward outputs that better preserve the structural patterns in the data.

Customer Segmentation using PySpark · GitHub

Applies unsupervised clustering to the Online Retail dataset at scale using PySpark. KMeans and Gaussian Mixture Model (GMM) are run and compared to surface distinct customer groups — distinguishing high-value repeat buyers from low-frequency occasional ones.

Diabetes Prediction — Decision Tree vs KNN · GitHub

Side-by-side comparison of a Decision Tree and a K-Nearest Neighbours classifier on a diabetes dataset, evaluating where each model's decision boundaries hold up and where they break down.

Phishing Website Detection · GitHub

A WEKA-based ML pipeline that takes raw URLs and decides whether they are phishing attempts or legitimate sites. Beyond classification, the pipeline uncovers hidden clusters in URL structure and generates human-readable rules that explain which patterns signal risk.

AWARDS & ACHIEVEMENTS

Period	Award
Ongoing	Competitive Programming & Problem Solving Excellence (LeetCode) · ~1000 problems solved · Top 8% contest rating · 11 badges, including the 500-day submission badge · Expert-level DSA
Fall 24 / Spring 24 / Spring 23 / Fall 22	Scholarship for outstanding academic performance · Tuition waiver for demonstrated academic excellence

IMPLEMENTATION SKILLS

Category	Tools
Programming	Python 3 (Anaconda), C, C#, Dart, C++, Java, PHP, JavaScript, MATLAB
Deep Learning Libraries	PyTorch, NumPy, Scikit-learn, Pandas, TensorFlow-Keras
Data Processing	MapReduce, PySpark
LLM, RAG Libraries	LangChain, LangGraph, LlamaIndex, Claude Agent SDK, OpenAI
Embeddings	Sentence-transformers, Chroma vector database, FAISS
LLM Agents Implementation & Automation	Programming agent context, skills, memory, scope, and commands
Agents Integration API Frameworks	Python-Flask, FastAPI, Uvicorn, Pydantic
Tools & Platforms	Git, GPT Codex — Coding Agent, Jupyter Notebook, Docker, VS Code
