TokenCast: An LLM-Driven Framework for Context-Aware Time Series Forecasting via Symbolic Discretization
TokenCast is a novel framework that leverages Large Language Models (LLMs) for context-aware time series forecasting by transforming continuous time series into discrete symbolic tokens. It enables unified generative modeling over both temporal and textual modalities.
📝 “From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization”
Under review | 📄 Paper
Traditional forecasting models struggle to effectively integrate heterogeneous contextual data like clinical notes, policy documents, or logs. TokenCast introduces a new paradigm:
- Converts time series into discrete temporal tokens via dynamic vector quantization.
- Embeds both temporal and textual tokens into a shared semantic space using a frozen pre-trained LLM.
- Performs prompt-based generative forecasting using autoregressive language modeling.
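To make the first step concrete, here is a minimal, hypothetical sketch of symbolic discretization via vector quantization: a codebook of `K` vectors maps each time-series patch to a discrete token id (encode) and back to continuous values (decode), so the tokenizer is reversible up to quantization error. The patch length, codebook size, and random codebook are illustrative stand-ins, not TokenCast's actual learned tokenizer.

```python
import numpy as np

rng = np.random.default_rng(0)

patch_len, K = 4, 16
codebook = rng.normal(size=(K, patch_len))  # stand-in for a learned codebook

def encode(series: np.ndarray) -> np.ndarray:
    """Split a series into patches and map each to its nearest codebook id."""
    patches = series.reshape(-1, patch_len)
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # one discrete token per patch

def decode(tokens: np.ndarray) -> np.ndarray:
    """Invert tokenization by looking up codebook vectors."""
    return codebook[tokens].reshape(-1)

series = rng.normal(size=16)   # a toy series of 4 patches
tokens = encode(series)        # 4 discrete token ids
recon = decode(tokens)         # continuous reconstruction
print(tokens.shape, recon.shape)
```

The resulting token ids can then share a vocabulary with text tokens, which is what lets a single autoregressive LLM model both modalities.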
- ✅ Discretized Temporal Modeling: Learnable, reversible tokenizer for symbolic time series.
- 🔗 Cross-Modality Alignment: Unified vocabulary space for both time and text tokens.
- 📈 Prompt-driven Generation: Forecasting with LLM via token-level instruction generation.
- 📊 Multi-domain Evaluation: Benchmarked across economic, health, web, stock, and environmental domains.
- 🌡️ Uncertainty Quantification: Predictive intervals with temperature-controlled generation.
```shell
git clone https://github.com/Xiaoyu-Tao/TokenCast.git
cd TokenCast
conda create -n tokencast python=3.10
conda activate tokencast
pip install -r requirements.txt
```

TokenCast supports multiple publicly available datasets:
- Economic (FRED-MD)
- Health (Covid-19 mobility)
- Web (Wikipedia pageviews)
- Stock-NY & Stock-NA (NYSE/NASDAQ)
- Nature (Environmental sensor data)
The training and evaluation datasets used in our experiments are available on Google Drive. Create a directory named `datasets` and download the necessary datasets into it.
```shell
mkdir datasets
sh Tokenizer/scripts/Czelan.sh   # train the tokenizer
sh scripts/pretrain/Czelan.sh    # pre-training
sh scripts/finetune/Czelan.sh    # fine-tuning
```

If you find this project useful, please consider citing our paper:
```bibtex
@inproceedings{tao2026tokencast,
  title={From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization},
  author={Tao, Xiaoyu and Zhang, Shilong and Cheng, Mingyue and Wang, Daoyu and Pan, Tingyue and Pan, Bokai and Zhang, Changqing and Wang, Shijin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```

This project is developed by researchers from:
- 🧠 University of Science and Technology of China (USTC)
- 🧮 Tianjin University
- 🗣️ iFLYTEK Research
For questions or collaborations, please contact:
- 🧑🏫 Mingyue Cheng (mycheng@ustc.edu.cn)
- 🤖 Xiaoyu Tao (txytiny@mail.ustc.edu.cn)
This project is released under the MIT License.