HyunJin Kim, Jaejun Shim, Young Jin Kim, JinYeong Bak ICML 2026
TPOUR (Temporal Preference Optimization for Unsupervised Retrieval) is a framework for learning temporally-aware dense retrievers without explicit timestamp supervision. It relies solely on a corpus-level temporal signal (i.e., data collected at a specific time).
Traditional unsupervised retrievers (e.g., contrastive learning–based models) focus purely on semantic similarity, often retrieving documents that are temporally misaligned with the query. TPOUR addresses this limitation by introducing temporal preference learning into the retrieval objective.
- Queries often contain explicit (e.g., "in 2019") or implicit (e.g., "this year") temporal intent
- Standard retrievers ignore this signal → temporal misalignment
- Supervised temporal retrieval requires labeled timestamps → not scalable
Figure: Comparison between TPOUR aligned at 2019 and a time-unaware retriever for queries with explicit (e.g., in 2019) or implicit (e.g., this year) temporal information. Left: A mixed-timestamp document collection containing (i) semantically and temporally aligned documents (green), (ii) semantically relevant but temporally misaligned documents (yellow), and (iii) irrelevant documents (red). Right: Ranked retrieval results. The time-unaware retriever, trained solely for semantic similarity, struggles to rank the temporally aligned document (green) over the misaligned (yellow). In contrast, the TPOUR-trained retriever prioritizes the temporally aligned document.
TPOUR integrates contrastive learning with a preference optimization objective:
- Contrastive loss → semantic similarity
- TRPO loss → temporal alignment based on preference learning
The model is trained to:
- Prefer the aligned document $D^t$
- Over the misaligned document $D^{t'}$
Figure: Overview of TPOUR. Given a query
- Encoder $\pi_\theta$: learns joint semantic + temporal representations
- Reference encoder $\pi_{\text{ref}}$: momentum-updated (MoCo-style)
- Preference pairs: constructed from documents across time periods
- Loss function:
  - $L_{CE} = -\log \frac{e^{S_\theta(y_i^w)}}{e^{S_\theta(y_i^w)} + \sum_{j<i} (e^{S_{\mathrm{ref}}(y_j^w)} + e^{S_{\mathrm{ref}}(y_j^l)})}$: contrastive learning
  - $L_{\mathrm{TRPO}} = -\log \sigma\big(\beta [S_\theta(y_i^w) - S_\theta(y_i^l) - (S_{\mathrm{ref}}(y_i^w) - S_{\mathrm{ref}}(y_i^l))]\big)$: temporal preference alignment
  - $L_{total} = \lambda L_{CE} + (1 - \lambda)L_{TRPO}$
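The three losses above can be sketched on plain scalar scores. This is a minimal illustration, not the released implementation: the function names are hypothetical, the scores $S_\theta$ and $S_{\mathrm{ref}}$ are assumed to be precomputed floats, and the reference-encoded terms from earlier samples ($j<i$) are assumed to arrive as a flat list. No values for $\beta$ or $\lambda$ are given in this document, so any numbers passed in are placeholders.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def contrastive_loss(s_theta_w: float, ref_queue_scores: list) -> float:
    """L_CE: the positive is scored by the trainable encoder; the other
    denominator terms are reference-encoder scores of earlier samples
    (both preferred and dispreferred documents), passed as one flat list."""
    logits = [s_theta_w] + list(ref_queue_scores)
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_denominator - s_theta_w


def trpo_loss(s_theta_w: float, s_theta_l: float,
              s_ref_w: float, s_ref_l: float, beta: float) -> float:
    """L_TRPO: preference margin of the encoder relative to the reference."""
    margin = (s_theta_w - s_theta_l) - (s_ref_w - s_ref_l)
    return -log_sigmoid(beta * margin)


def total_loss(l_ce: float, l_trpo: float, lam: float) -> float:
    """L_total = lam * L_CE + (1 - lam) * L_TRPO."""
    return lam * l_ce + (1.0 - lam) * l_trpo
```

When the encoder and the reference encoder produce the same margin, `trpo_loss` equals $\log 2$ regardless of $\beta$; it decreases as the encoder separates $y_i^w$ from $y_i^l$ more strongly than the reference does.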
TPOUR introduces time vector interpolation, which enables smooth adaptation to intermediate time periods without additional training:
- Extract temporal shift: $\tau_t = \theta_t - \theta_{\text{base}}$
- Interpolate between time periods: $\theta_{mid} = \theta_{\text{base}} + (1-\alpha)\tau_{t_1} + \alpha\tau_{t_2}$
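The two steps above amount to simple parameter arithmetic. The sketch below assumes model weights are stored as a dictionary mapping parameter names to flat lists of floats; this layout and the function names `time_vector` and `interpolate` are hypothetical choices for illustration, and a real checkpoint would use tensors with the same element-wise operations.

```python
def time_vector(theta_t: dict, theta_base: dict) -> dict:
    """tau_t = theta_t - theta_base, computed per parameter."""
    return {
        name: [a - b for a, b in zip(theta_t[name], theta_base[name])]
        for name in theta_base
    }


def interpolate(theta_base: dict, tau_1: dict, tau_2: dict, alpha: float) -> dict:
    """theta_mid = theta_base + (1 - alpha) * tau_1 + alpha * tau_2."""
    return {
        name: [
            b + (1.0 - alpha) * t1 + alpha * t2
            for b, t1, t2 in zip(theta_base[name], tau_1[name], tau_2[name])
        ]
        for name in theta_base
    }


# Toy weights for a base model and two time-aligned models (placeholder values).
theta_base = {"w": [1.0, 2.0]}
theta_t1 = {"w": [2.0, 4.0]}
theta_t2 = {"w": [3.0, 0.0]}

tau_1 = time_vector(theta_t1, theta_base)
tau_2 = time_vector(theta_t2, theta_base)
theta_mid = interpolate(theta_base, tau_1, tau_2, alpha=0.5)
```

With $\alpha = 0$ the formula recovers $\theta_{t_1}$ and with $\alpha = 1$ it recovers $\theta_{t_2}$, so $\alpha$ acts as a position between the two time periods.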
Code and data will be released soon.