
TokenCast: An LLM-Driven Framework for Context-Aware Time Series Forecasting via Symbolic Discretization


TokenCast is a novel framework that leverages Large Language Models (LLMs) for context-aware time series forecasting by transforming continuous time series into discrete symbolic tokens. This enables unified generative modeling over both the temporal and textual modalities.

📝 “From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization”
Under review | 📄 Paper


🔍 Overview

Traditional forecasting models struggle to effectively integrate heterogeneous contextual data like clinical notes, policy documents, or logs. TokenCast introduces a new paradigm:

  • Converts time series into discrete temporal tokens via dynamic vector quantization.
  • Embeds both temporal and textual tokens into a shared semantic space using a frozen pre-trained LLM.
  • Performs prompt-based generative forecasting using autoregressive language modeling.
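The discretization step can be illustrated with a minimal nearest-neighbour vector quantization sketch. This is a toy example with a random codebook; `quantize_series` and its shapes are hypothetical and stand in for, rather than reproduce, TokenCast's learned tokenizer:

```python
import numpy as np

def quantize_series(patches, codebook):
    """Map each patch of a continuous series to its nearest codebook entry,
    yielding a discrete token sequence; decoding the tokens back through the
    codebook makes the mapping (approximately) reversible."""
    # patches: (num_patches, patch_dim); codebook: (vocab_size, patch_dim)
    dists = np.linalg.norm(patches[:, None, :] - codebook[None, :, :], axis=-1)
    tokens = dists.argmin(axis=1)      # one token id per patch
    recon = codebook[tokens]           # reconstruction from discrete tokens
    return tokens, recon

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))    # toy vocabulary of 16 "temporal tokens"
patches = rng.normal(size=(8, 4))      # 8 patches of length 4
tokens, recon = quantize_series(patches, codebook)
print(tokens.shape, recon.shape)
```

In the actual framework the codebook is learned dynamically rather than fixed, and the resulting token ids share a vocabulary space with the LLM's text tokens.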


✨ Key Features

  • Discretized Temporal Modeling: Learnable, reversible tokenizer for symbolic time series.
  • 🔗 Cross-Modality Alignment: Unified vocabulary space for both time and text tokens.
  • 📈 Prompt-driven Generation: Forecasting with an LLM via token-level instruction generation.
  • 📊 Multi-domain Evaluation: Benchmarked across economic, health, web, stock, and environmental domains.
  • 🌡️ Uncertainty Quantification: Predictive intervals with temperature-controlled generation.
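The temperature-controlled uncertainty quantification can be illustrated in isolation: sample many next-token draws at a given temperature, decode them to values, and take empirical quantiles as a predictive interval. The `logits` and `token_values` below are made-up toy numbers, not outputs of the model:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample a token id from logits softened by a temperature
    (higher temperature -> flatter distribution -> wider intervals)."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Hypothetical decode-back value for each token id (e.g. codebook means).
token_values = np.linspace(-1.0, 1.0, 8)
logits = np.array([0.1, 0.3, 2.0, 0.2, 0.1, 0.0, -0.5, -1.0])

rng = np.random.default_rng(0)
samples = np.array([token_values[sample_with_temperature(logits, 0.8, rng)]
                    for _ in range(200)])
lo, hi = np.quantile(samples, [0.05, 0.95])  # empirical 90% predictive interval
print(f"90% interval: [{lo:.2f}, {hi:.2f}]")
```

Raising the temperature widens the sampled distribution and hence the interval; at temperature → 0 the sampler collapses to greedy decoding and a point forecast.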

🚀 Getting Started

1. Clone the repo

git clone https://github.com/Xiaoyu-Tao/TokenCast.git
cd TokenCast

2. Environment Setup

conda create -n tokencast python=3.10
conda activate tokencast
pip install -r requirements.txt

3. Prepare Data

TokenCast supports multiple publicly available datasets:

  • Economic (FRED-MD)
  • Health (Covid-19 mobility)
  • Web (Wikipedia pageviews)
  • Stock-NY & Stock-NA (NYSE/NASDAQ)
  • Nature (Environmental sensor data)

The training and evaluation datasets used in our experiments are available on Google Drive. Create a directory named datasets and download the required datasets into it:

mkdir datasets

4. Train the Time Series Tokenizer

sh Tokenizer/scripts/Czelan.sh 

5. Align Embeddings with LLM

sh scripts/pretrain/Czelan.sh  

6. Fine-tune Forecasting Model

sh scripts/finetune/Czelan.sh 

📊 Benchmark Results

Full results: see the full-results table in the paper.

Ablation results: see the ablation table in the paper.


📚 Citation

If you find this project useful, please consider citing our paper:

@inproceedings{tao2026tokencast,
  title={From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization},
  author={Tao, Xiaoyu and Zhang, Shilong and Cheng, Mingyue and Wang, Daoyu and Pan, Tingyue and Pan, Bokai and Zhang, Changqing and Wang, Shijin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}

🤝 Acknowledgements

This project is developed by researchers from:

  • 🧠 University of Science and Technology of China (USTC)
  • 🧮 Tianjin University
  • 🗣️ iFLYTEK Research

📬 Contact

For questions or collaborations, please contact:


📌 License

This project is released under the MIT License.
