
TokenCast: An LLM-Driven Framework for Context-Aware Time Series Forecasting via Symbolic Discretization


TokenCast is a novel framework that leverages Large Language Models (LLMs) for context-aware time series forecasting by transforming continuous time series into discrete symbolic tokens. This enables unified generative modeling over both the temporal and textual modalities.

📝 “From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization”
Under review | 📄 Paper


🔍 Overview

Traditional forecasting models struggle to effectively integrate heterogeneous contextual data like clinical notes, policy documents, or logs. TokenCast introduces a new paradigm:

  • Converts time series into discrete temporal tokens via dynamic vector quantization.
  • Embeds both temporal and textual tokens into a shared semantic space using a frozen pre-trained LLM.
  • Performs prompt-based generative forecasting using autoregressive language modeling.
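The discretization step can be illustrated with a minimal nearest-neighbour vector quantization sketch. This is a toy example with a random codebook; `quantize_series` and its shapes are hypothetical and stand in for, rather than reproduce, TokenCast's learned tokenizer:

```python
import numpy as np

def quantize_series(patches, codebook):
    """Map each patch of a continuous series to its nearest codebook entry,
    yielding a discrete token sequence; decoding the tokens back through the
    codebook makes the mapping (approximately) reversible."""
    # patches: (num_patches, patch_dim); codebook: (vocab_size, patch_dim)
    dists = np.linalg.norm(patches[:, None, :] - codebook[None, :, :], axis=-1)
    tokens = dists.argmin(axis=1)      # one token id per patch
    recon = codebook[tokens]           # reconstruction from discrete tokens
    return tokens, recon

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))    # toy vocabulary of 16 "temporal tokens"
patches = rng.normal(size=(8, 4))      # 8 patches of length 4
tokens, recon = quantize_series(patches, codebook)
print(tokens.shape, recon.shape)
```

In the actual framework the codebook is learned dynamically rather than fixed, and the resulting token ids share a vocabulary space with the LLM's text tokens.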


✨ Key Features

  • Discretized Temporal Modeling: Learnable, reversible tokenizer for symbolic time series.
  • 🔗 Cross-Modality Alignment: Unified vocabulary space for both time and text tokens.
  • 📈 Prompt-driven Generation: Forecasting with an LLM via token-level instruction generation.
  • 📊 Multi-domain Evaluation: Benchmarked across economic, health, web, stock, and environmental domains.
  • 🌡️ Uncertainty Quantification: Predictive intervals with temperature-controlled generation.
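The temperature-controlled uncertainty quantification can be illustrated in isolation: sample many next-token draws at a given temperature, decode them to values, and take empirical quantiles as a predictive interval. The `logits` and `token_values` below are made-up toy numbers, not outputs of the model:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample a token id from logits softened by a temperature
    (higher temperature -> flatter distribution -> wider intervals)."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Hypothetical decode-back value for each token id (e.g. codebook means).
token_values = np.linspace(-1.0, 1.0, 8)
logits = np.array([0.1, 0.3, 2.0, 0.2, 0.1, 0.0, -0.5, -1.0])

rng = np.random.default_rng(0)
samples = np.array([token_values[sample_with_temperature(logits, 0.8, rng)]
                    for _ in range(200)])
lo, hi = np.quantile(samples, [0.05, 0.95])  # empirical 90% predictive interval
print(f"90% interval: [{lo:.2f}, {hi:.2f}]")
```

Raising the temperature widens the sampled distribution and hence the interval; at temperature → 0 the sampler collapses to greedy decoding and a point forecast.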

🚀 Getting Started

1. Clone the repo

git clone https://github.com/Xiaoyu-Tao/TokenCast.git
cd TokenCast

2. Environment Setup

conda create -n tokencast python=3.10
conda activate tokencast
pip install -r requirements.txt

3. Prepare Data

TokenCast supports multiple publicly available datasets:

  • Economic (FRED-MD)
  • Health (Covid-19 mobility)
  • Web (Wikipedia pageviews)
  • Stock-NY & Stock-NA (NYSE/NASDAQ)
  • Nature (Environmental sensor data)

The training and evaluation datasets used in our experiments are available on Google Drive. Create a directory named datasets and download the required datasets into it:

mkdir datasets

4. Train the Time Series Tokenizer

sh Tokenizer/scripts/Czelan.sh 

5. Align Embeddings with LLM

sh scripts/pretrain/Czelan.sh  

6. Fine-tune Forecasting Model

sh scripts/finetune/Czelan.sh 

📊 Benchmark Results

Full results: see the full-results table in the paper.

Ablation results: see the ablation table in the paper.


📚 Citation

If you find this project useful, please consider citing our paper:

@inproceedings{tao2026tokencast,
  title={From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization},
  author={Tao, Xiaoyu and Zhang, Shilong and Cheng, Mingyue and Wang, Daoyu and Pan, Tingyue and Pan, Bokai and Zhang, Changqing and Wang, Shijin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}

🤝 Acknowledgements

This project is developed by researchers from:

  • 🧠 University of Science and Technology of China (USTC)
  • 🧮 Tianjin University
  • 🗣️ iFLYTEK Research

📬 Contact

For questions or collaborations, please contact:


📌 License

This project is released under the MIT License.
